LinuxPPC-Dev Archive on lore.kernel.org
* Re: [PATCH 11/25] soc: fsl: qe: qe_common: Fix misnamed function attribute 'addr'
From: Lee Jones @ 2020-11-13  7:15 UTC (permalink / raw)
  To: Leo Li
  Cc: Software, Inc, linux-kernel@vger.kernel.org, act, Dan Malek,
	Vitaly Bordug, Scott Wood, linuxppc-dev@lists.ozlabs.org,
	linux-arm-kernel@lists.infradead.org, Qiang Zhao
In-Reply-To: <VE1PR04MB66877659A67152AE02CF443F8FE70@VE1PR04MB6687.eurprd04.prod.outlook.com>

On Thu, 12 Nov 2020, Leo Li wrote:

> 
> 
> > -----Original Message-----
> > From: Lee Jones <lee.jones@linaro.org>
> > Sent: Thursday, November 12, 2020 4:33 AM
> > To: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> > Qiang Zhao <qiang.zhao@nxp.com>; Leo Li <leoyang.li@nxp.com>; Scott
> > Wood <scottwood@freescale.com>; act <dmalek@jlc.net>; Dan Malek
> > <dan@embeddedalley.com>; Software, Inc <source@mvista.com>; Vitaly
> > Bordug <vbordug@ru.mvista.com>; linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH 11/25] soc: fsl: qe: qe_common: Fix misnamed function
> > attribute 'addr'
> > 
> > On Tue, 03 Nov 2020, Lee Jones wrote:
> > 
> > > Fixes the following W=1 kernel build warning(s):
> > >
> > >  drivers/soc/fsl/qe/qe_common.c:237: warning: Function parameter or
> > member 'addr' not described in 'cpm_muram_dma'
> > >  drivers/soc/fsl/qe/qe_common.c:237: warning: Excess function parameter
> > 'offset' description in 'cpm_muram_dma'
> > >
> > > Cc: Qiang Zhao <qiang.zhao@nxp.com>
> > > Cc: Li Yang <leoyang.li@nxp.com>
> > > Cc: Scott Wood <scottwood@freescale.com>
> > > Cc: act <dmalek@jlc.net>
> > > Cc: Dan Malek <dan@embeddedalley.com>
> > > Cc: "Software, Inc" <source@mvista.com>
> > > Cc: Vitaly Bordug <vbordug@ru.mvista.com>
> > > Cc: linuxppc-dev@lists.ozlabs.org
> > > Signed-off-by: Lee Jones <lee.jones@linaro.org>
> > > ---
> > >  drivers/soc/fsl/qe/qe_common.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/soc/fsl/qe/qe_common.c
> > > b/drivers/soc/fsl/qe/qe_common.c index 75075591f6308..497a7e0fd0272
> > > 100644
> > > --- a/drivers/soc/fsl/qe/qe_common.c
> > > +++ b/drivers/soc/fsl/qe/qe_common.c
> > > @@ -231,7 +231,7 @@ EXPORT_SYMBOL(cpm_muram_offset);
> > >
> > >  /**
> > >   * cpm_muram_dma - turn a muram virtual address into a DMA address
> > > - * @offset: virtual address from cpm_muram_addr() to convert
> > > + * @addr: virtual address from cpm_muram_addr() to convert
> > >   */
> > >  dma_addr_t cpm_muram_dma(void __iomem *addr)  {
> > 
> > Any idea who will pick this up?
> 
> I can pick them up through my tree, but I haven't created the
> for-next branch for the next kernel yet.  Will look through this
> series soon.  Thanks.

Thank you Leo.

There's no rush.  Just trying to ensure they don't get forgotten.

-- 
Lee Jones [李琼斯]
Senior Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog


* [PATCH] macintosh: windfarm: Use NULL to compare with pointer-typed value rather than 0
From: Xu Wang @ 2020-11-13  7:33 UTC (permalink / raw)
  To: benh, linuxppc-dev; +Cc: linux-kernel

Compare pointer-typed values to NULL rather than 0.

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
---
 drivers/macintosh/windfarm_pm121.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/macintosh/windfarm_pm121.c b/drivers/macintosh/windfarm_pm121.c
index ab467b9c31be..62826844b584 100644
--- a/drivers/macintosh/windfarm_pm121.c
+++ b/drivers/macintosh/windfarm_pm121.c
@@ -650,7 +650,7 @@ static void pm121_create_cpu_fans(void)
 
 	/* First, locate the PID params in SMU SBD */
 	hdr = smu_get_sdb_partition(SMU_SDB_CPUPIDDATA_ID, NULL);
-	if (hdr == 0) {
+	if (hdr == NULL) {
 		printk(KERN_WARNING "pm121: CPU PID fan config not found.\n");
 		goto fail;
 	}
@@ -969,7 +969,7 @@ static int pm121_init_pm(void)
 	const struct smu_sdbp_header *hdr;
 
 	hdr = smu_get_sdb_partition(SMU_SDB_SENSORTREE_ID, NULL);
-	if (hdr != 0) {
+	if (hdr != NULL) {
 		struct smu_sdbp_sensortree *st =
 			(struct smu_sdbp_sensortree *)&hdr[1];
 		pm121_mach_model = st->model_id;
-- 
2.17.1
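
For readers unfamiliar with the convention, here is a minimal userspace
sketch of the same idiom. The struct and both helpers are hypothetical
stand-ins, not the SMU API; the point is only that "hdr != NULL" and
"hdr != 0" evaluate identically, while the NULL form makes the pointer
intent explicit (checkers such as sparse typically flag the plain
integer form).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct smu_sdbp_header; fields are made up. */
struct toy_header {
	int id;
	int len;
};

/* Hypothetical lookup: returns a pointer on a match, NULL otherwise. */
static const struct toy_header *toy_get_partition(const struct toy_header *tbl,
						  size_t count, int id)
{
	size_t i;

	assert(tbl != NULL || count == 0);
	for (i = 0; i < count; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}

/* Returns 1 when the partition exists, 0 when the lookup came back NULL. */
static int toy_partition_present(const struct toy_header *tbl, size_t count,
				 int id)
{
	const struct toy_header *hdr = toy_get_partition(tbl, count, id);

	/* Equivalent to "hdr != 0", but states the pointer intent directly. */
	return hdr != NULL;
}
```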



* [PATCH 2/5] mm: Introduce pXX_leaf_size()
From: Peter Zijlstra @ 2020-11-13 11:19 UTC (permalink / raw)
  To: kan.liang, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	eranian
  Cc: linux-arch, ak, catalin.marinas, peterz, linuxppc-dev, willy,
	linux-kernel, dave.hansen, npiggin, aneesh.kumar, sparclinux,
	will, davem, kirill.shutemov
In-Reply-To: <20201113111901.743573013@infradead.org>

A number of architectures have non-pagetable aligned huge/large pages.
For such architectures a leaf can actually be part of a larger TLB
entry.

Provide generic helpers to determine the TLB size of a page-table
leaf.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/pgtable.h |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1536,4 +1536,20 @@ typedef unsigned int pgtbl_mod_mask;
 #define pmd_leaf(x)	0
 #endif
 
+#ifndef pgd_leaf_size
+#define pgd_leaf_size(x) PGD_SIZE
+#endif
+#ifndef p4d_leaf_size
+#define p4d_leaf_size(x) P4D_SIZE
+#endif
+#ifndef pud_leaf_size
+#define pud_leaf_size(x) PUD_SIZE
+#endif
+#ifndef pmd_leaf_size
+#define pmd_leaf_size(x) PMD_SIZE
+#endif
+#ifndef pte_leaf_size
+#define pte_leaf_size(x) PAGE_SIZE
+#endif
+
 #endif /* _LINUX_PGTABLE_H */
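
The override mechanism used above can be sketched in userspace as
follows. This is an illustration under stated assumptions, not kernel
code: the page sizes, the TOY_PMD_BIG flag and the "arch" function are
all made up. An architecture that defines a same-named macro keeps its
own helper; every level it leaves alone picks up the generic fallback.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE UINT64_C(4096)                  /* assumed 4 KiB base page */
#define PMD_SIZE  (UINT64_C(512) * PAGE_SIZE)     /* assumed 2 MiB PMD leaf */

static_assert(PMD_SIZE == UINT64_C(2097152), "toy PMD size is 2 MiB");

typedef struct { uint64_t val; } pte_t;
typedef struct { uint64_t val; } pmd_t;

/* --- "arch" header: overrides only the PMD level --- */
#define TOY_PMD_BIG (UINT64_C(1) << 1)  /* hypothetical "part of 8 MiB entry" */

static inline uint64_t pmd_leaf_size(pmd_t pmd)
{
	return (pmd.val & TOY_PMD_BIG) ? 4 * PMD_SIZE : PMD_SIZE;
}
#define pmd_leaf_size pmd_leaf_size     /* marks the helper as provided */

/* --- "generic" header: fallbacks for whatever the arch left undefined --- */
#ifndef pmd_leaf_size
#define pmd_leaf_size(x) PMD_SIZE
#endif
#ifndef pte_leaf_size
#define pte_leaf_size(x) PAGE_SIZE
#endif
```

Here pmd_leaf_size() resolves to the arch function while
pte_leaf_size() falls back to PAGE_SIZE.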




* [PATCH 4/5] arm64/mm: Implement pXX_leaf_size() support
From: Peter Zijlstra @ 2020-11-13 11:19 UTC (permalink / raw)
  To: kan.liang, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	eranian
  Cc: linux-arch, ak, catalin.marinas, peterz, linuxppc-dev, willy,
	linux-kernel, dave.hansen, npiggin, aneesh.kumar, sparclinux,
	will, davem, kirill.shutemov
In-Reply-To: <20201113111901.743573013@infradead.org>

ARM64 has non-pagetable aligned large page support with PTE_CONT; when
this bit is set, the page is part of a super-page. Match the hugetlb
code and support these super pages for PTE and PMD levels.

This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate TLB
page sizes.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/pgtable.h |    3 +++
 1 file changed, 3 insertions(+)

--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -503,6 +503,9 @@ extern pgprot_t phys_mem_access_prot(str
 				 PMD_TYPE_SECT)
 #define pmd_leaf(pmd)		pmd_sect(pmd)
 
+#define pmd_leaf_size(pmd)	(pmd_cont(pmd) ? CONT_PMD_SIZE : PMD_SIZE)
+#define pte_leaf_size(pte)	(pte_cont(pte) ? CONT_PTE_SIZE : PAGE_SIZE)
+
 #if defined(CONFIG_ARM64_64K_PAGES) || CONFIG_PGTABLE_LEVELS < 3
 static inline bool pud_sect(pud_t pud) { return false; }
 static inline bool pud_table(pud_t pud) { return true; }
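
To make the sizes concrete, here is a small userspace model of the two
macros. All TOY_* names are hypothetical, and the numbers assume a
4 KiB granule with 16 entries per contiguous run and the contiguous
hint in bit 52 of the descriptor; other configurations use different
values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for a 4 KiB granule; other granules use other counts. */
#define TOY_PAGE_SIZE      UINT64_C(4096)
#define TOY_PMD_SIZE       (UINT64_C(512) * TOY_PAGE_SIZE)   /* 2 MiB */
#define TOY_CONT_PTES      16
#define TOY_CONT_PMDS      16
#define TOY_CONT_PTE_SIZE  (TOY_CONT_PTES * TOY_PAGE_SIZE)   /* 64 KiB */
#define TOY_CONT_PMD_SIZE  (TOY_CONT_PMDS * TOY_PMD_SIZE)    /* 32 MiB */

/* Contiguous hint modelled as bit 52 of the raw descriptor (assumption). */
#define TOY_CONT           (UINT64_C(1) << 52)

static_assert(TOY_CONT_PTE_SIZE == UINT64_C(65536), "64 KiB run of PTEs");

/* Size of the TLB entry that a last-level descriptor belongs to. */
static uint64_t toy_pte_leaf_size(uint64_t pte)
{
	return (pte & TOY_CONT) ? TOY_CONT_PTE_SIZE : TOY_PAGE_SIZE;
}

/* Same idea one level up, for block mappings. */
static uint64_t toy_pmd_leaf_size(uint64_t pmd)
{
	return (pmd & TOY_CONT) ? TOY_CONT_PMD_SIZE : TOY_PMD_SIZE;
}
```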




* [PATCH 1/5] mm/gup: Provide gup_get_pte() more generic
From: Peter Zijlstra @ 2020-11-13 11:19 UTC (permalink / raw)
  To: kan.liang, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	eranian
  Cc: linux-arch, ak, catalin.marinas, peterz, linuxppc-dev, willy,
	linux-kernel, dave.hansen, npiggin, aneesh.kumar, sparclinux,
	will, davem, kirill.shutemov
In-Reply-To: <20201113111901.743573013@infradead.org>

In order to write another lockless page-table walker, we need
gup_get_pte() exposed. While doing that, rename it to
ptep_get_lockless() to match the existing ptep_get() naming.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/pgtable.h |   55 +++++++++++++++++++++++++++++++++++++++++++++
 mm/gup.c                |   58 ------------------------------------------------
 2 files changed, 56 insertions(+), 57 deletions(-)

--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -258,6 +258,61 @@ static inline pte_t ptep_get(pte_t *ptep
 }
 #endif
 
+#ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
+/*
+ * WARNING: only to be used in the get_user_pages_fast() implementation.
+ *
+ * With get_user_pages_fast(), we walk down the pagetables without taking any
+ * locks.  For this we would like to load the pointers atomically, but sometimes
+ * that is not possible (e.g. without expensive cmpxchg8b on x86_32 PAE).  What
+ * we do have is the guarantee that a PTE will only either go from not present
+ * to present, or present to not present or both -- it will not switch to a
+ * completely different present page without a TLB flush in between; something
+ * that we are blocking by holding interrupts off.
+ *
+ * Setting ptes from not present to present goes:
+ *
+ *   ptep->pte_high = h;
+ *   smp_wmb();
+ *   ptep->pte_low = l;
+ *
+ * And present to not present goes:
+ *
+ *   ptep->pte_low = 0;
+ *   smp_wmb();
+ *   ptep->pte_high = 0;
+ *
+ * We must ensure here that the load of pte_low sees 'l' IFF pte_high sees 'h'.
+ * We load pte_high *after* loading pte_low, which ensures we don't see an older
+ * value of pte_high.  *Then* we recheck pte_low, which ensures that we haven't
+ * picked up a changed pte high. We might have gotten rubbish values from
+ * pte_low and pte_high, but we are guaranteed that pte_low will not have the
+ * present bit set *unless* it is 'l'. Because get_user_pages_fast() only
+ * operates on present ptes we're safe.
+ */
+static inline pte_t ptep_get_lockless(pte_t *ptep)
+{
+	pte_t pte;
+
+	do {
+		pte.pte_low = ptep->pte_low;
+		smp_rmb();
+		pte.pte_high = ptep->pte_high;
+		smp_rmb();
+	} while (unlikely(pte.pte_low != ptep->pte_low));
+
+	return pte;
+}
+#else /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+/*
+ * We require that the PTE can be read atomically.
+ */
+static inline pte_t ptep_get_lockless(pte_t *ptep)
+{
+	return ptep_get(ptep);
+}
+#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #ifndef __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2079,62 +2079,6 @@ static void put_compound_head(struct pag
 	put_page(page);
 }
 
-#ifdef CONFIG_GUP_GET_PTE_LOW_HIGH
-
-/*
- * WARNING: only to be used in the get_user_pages_fast() implementation.
- *
- * With get_user_pages_fast(), we walk down the pagetables without taking any
- * locks.  For this we would like to load the pointers atomically, but sometimes
- * that is not possible (e.g. without expensive cmpxchg8b on x86_32 PAE).  What
- * we do have is the guarantee that a PTE will only either go from not present
- * to present, or present to not present or both -- it will not switch to a
- * completely different present page without a TLB flush in between; something
- * that we are blocking by holding interrupts off.
- *
- * Setting ptes from not present to present goes:
- *
- *   ptep->pte_high = h;
- *   smp_wmb();
- *   ptep->pte_low = l;
- *
- * And present to not present goes:
- *
- *   ptep->pte_low = 0;
- *   smp_wmb();
- *   ptep->pte_high = 0;
- *
- * We must ensure here that the load of pte_low sees 'l' IFF pte_high sees 'h'.
- * We load pte_high *after* loading pte_low, which ensures we don't see an older
- * value of pte_high.  *Then* we recheck pte_low, which ensures that we haven't
- * picked up a changed pte high. We might have gotten rubbish values from
- * pte_low and pte_high, but we are guaranteed that pte_low will not have the
- * present bit set *unless* it is 'l'. Because get_user_pages_fast() only
- * operates on present ptes we're safe.
- */
-static inline pte_t gup_get_pte(pte_t *ptep)
-{
-	pte_t pte;
-
-	do {
-		pte.pte_low = ptep->pte_low;
-		smp_rmb();
-		pte.pte_high = ptep->pte_high;
-		smp_rmb();
-	} while (unlikely(pte.pte_low != ptep->pte_low));
-
-	return pte;
-}
-#else /* CONFIG_GUP_GET_PTE_LOW_HIGH */
-/*
- * We require that the PTE can be read atomically.
- */
-static inline pte_t gup_get_pte(pte_t *ptep)
-{
-	return ptep_get(ptep);
-}
-#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */
-
 static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
 					    unsigned int flags,
 					    struct page **pages)
@@ -2160,7 +2104,7 @@ static int gup_pte_range(pmd_t pmd, unsi
 
 	ptem = ptep = pte_offset_map(&pmd, addr);
 	do {
-		pte_t pte = gup_get_pte(ptep);
+		pte_t pte = ptep_get_lockless(ptep);
 		struct page *head, *page;
 
 		/*
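
The low/high read protocol described in the comment above can be
modelled in userspace with C11 atomics. This is a sketch under
assumptions, not the kernel implementation: struct split_pte and both
helpers are hypothetical, and the acquire/release fences merely stand
in for smp_rmb()/smp_wmb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Toy two-word entry that cannot be loaded in a single access. */
struct split_pte {
	_Atomic uint32_t low;   /* carries the "present" bit in this model */
	_Atomic uint32_t high;
};

/* Writer, not-present -> present: publish high first, then low. */
static void split_pte_set(struct split_pte *p, uint32_t low, uint32_t high)
{
	assert(p != NULL);
	atomic_store_explicit(&p->high, high, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);   /* role of smp_wmb() */
	atomic_store_explicit(&p->low, low, memory_order_relaxed);
}

/* Reader: load low, then high, then re-check low and retry if it moved. */
static uint64_t split_pte_read(struct split_pte *p)
{
	uint32_t low, high;

	assert(p != NULL);
	do {
		low = atomic_load_explicit(&p->low, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);  /* role of smp_rmb() */
		high = atomic_load_explicit(&p->high, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (low != atomic_load_explicit(&p->low, memory_order_relaxed));

	return ((uint64_t)high << 32) | low;
}
```

The retry loop only guarantees that a present low word is paired with
its matching high word; as the comment notes, that is sufficient for
callers that act only on present entries.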




* [PATCH 3/5] perf/core: Fix arch_perf_get_page_size()
From: Peter Zijlstra @ 2020-11-13 11:19 UTC (permalink / raw)
  To: kan.liang, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	eranian
  Cc: linux-arch, ak, catalin.marinas, peterz, linuxppc-dev, willy,
	linux-kernel, dave.hansen, npiggin, aneesh.kumar, sparclinux,
	will, davem, kirill.shutemov
In-Reply-To: <20201113111901.743573013@infradead.org>

The (new) page-table walker in arch_perf_get_page_size() is broken in
various ways. Specifically, while it is used in a lockless manner, it
doesn't depend on CONFIG_HAVE_FAST_GUP, doesn't use the proper _lockless
offset methods, and isn't careful to read each entry only once.

Also the hugetlb support is broken due to calling pte_page() without
first checking pte_special().

Rewrite the whole thing to be a proper lockless page-table walker and
employ the new pXX_leaf_size() pgtable functions to determine the TLB
size without looking at the page-frames.

Fixes: 51b646b2d9f8 ("perf,mm: Handle non-page-table-aligned hugetlbfs")
Fixes: 8d97e71811aa ("perf/core: Add PERF_SAMPLE_DATA_PAGE_SIZE")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm64/include/asm/pgtable.h    |    3 +
 arch/sparc/include/asm/pgtable_64.h |   13 ++++
 arch/sparc/mm/hugetlbpage.c         |   19 ++++--
 include/linux/pgtable.h             |   16 +++++
 kernel/events/core.c                |  102 +++++++++++++-----------------------
 5 files changed, 82 insertions(+), 71 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7001,90 +7001,62 @@ static u64 perf_virt_to_phys(u64 virt)
 	return phys_addr;
 }
 
-#ifdef CONFIG_MMU
-
 /*
- * Return the MMU page size of a given virtual address.
- *
- * This generic implementation handles page-table aligned huge pages, as well
- * as non-page-table aligned hugetlbfs compound pages.
- *
- * If an architecture supports and uses non-page-table aligned pages in their
- * kernel mapping it will need to provide it's own implementation of this
- * function.
+ * Return the MMU/TLB page size of a given virtual address.
  */
-__weak u64 arch_perf_get_page_size(struct mm_struct *mm, unsigned long addr)
+static u64 perf_get_tlb_page_size(struct mm_struct *mm, unsigned long addr)
 {
-	struct page *page;
-	pgd_t *pgd;
-	p4d_t *p4d;
-	pud_t *pud;
-	pmd_t *pmd;
-	pte_t *pte;
+	u64 size = 0;
 
-	pgd = pgd_offset(mm, addr);
-	if (pgd_none(*pgd))
-		return 0;
+#ifdef CONFIG_HAVE_FAST_GUP
+	pgd_t *pgdp, pgd;
+	p4d_t *p4dp, p4d;
+	pud_t *pudp, pud;
+	pmd_t *pmdp, pmd;
+	pte_t *ptep, pte;
 
-	p4d = p4d_offset(pgd, addr);
-	if (!p4d_present(*p4d))
+	pgdp = pgd_offset(mm, addr);
+	pgd = READ_ONCE(*pgdp);
+	if (pgd_none(pgd))
 		return 0;
 
-	if (p4d_leaf(*p4d))
-		return 1ULL << P4D_SHIFT;
+	if (pgd_leaf(pgd))
+		return pgd_leaf_size(pgd);
 
-	pud = pud_offset(p4d, addr);
-	if (!pud_present(*pud))
+	p4dp = p4d_offset_lockless(pgdp, pgd, addr);
+	p4d = READ_ONCE(*p4dp);
+	if (!p4d_present(p4d))
 		return 0;
 
-	if (pud_leaf(*pud)) {
-#ifdef pud_page
-		page = pud_page(*pud);
-		if (PageHuge(page))
-			return page_size(compound_head(page));
-#endif
-		return 1ULL << PUD_SHIFT;
-	}
+	if (p4d_leaf(p4d))
+		return p4d_leaf_size(p4d);
 
-	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd))
+	pudp = pud_offset_lockless(p4dp, p4d, addr);
+	pud = READ_ONCE(*pudp);
+	if (!pud_present(pud))
 		return 0;
 
-	if (pmd_leaf(*pmd)) {
-#ifdef pmd_page
-		page = pmd_page(*pmd);
-		if (PageHuge(page))
-			return page_size(compound_head(page));
-#endif
-		return 1ULL << PMD_SHIFT;
-	}
+	if (pud_leaf(pud))
+		return pud_leaf_size(pud);
 
-	pte = pte_offset_map(pmd, addr);
-	if (!pte_present(*pte)) {
-		pte_unmap(pte);
+	pmdp = pmd_offset_lockless(pudp, pud, addr);
+	pmd = READ_ONCE(*pmdp);
+	if (!pmd_present(pmd))
 		return 0;
-	}
 
-	page = pte_page(*pte);
-	if (PageHuge(page)) {
-		u64 size = page_size(compound_head(page));
-		pte_unmap(pte);
-		return size;
-	}
-
-	pte_unmap(pte);
-	return PAGE_SIZE;
-}
+	if (pmd_leaf(pmd))
+		return pmd_leaf_size(pmd);
 
-#else
+	ptep = pte_offset_map(&pmd, addr);
+	pte = ptep_get_lockless(ptep);
+	if (pte_present(pte))
+		size = pte_leaf_size(pte);
+	pte_unmap(ptep);
+#endif /* CONFIG_HAVE_FAST_GUP */
 
-static u64 arch_perf_get_page_size(struct mm_struct *mm, unsigned long addr)
-{
-	return 0;
+	return size;
 }
 
-#endif
-
 static u64 perf_get_page_size(unsigned long addr)
 {
 	struct mm_struct *mm;
@@ -7109,7 +7081,7 @@ static u64 perf_get_page_size(unsigned l
 		mm = &init_mm;
 	}
 
-	size = arch_perf_get_page_size(mm, addr);
+	size = perf_get_tlb_page_size(mm, addr);
 
 	local_irq_restore(flags);
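
The "read each entry only once" rule from the changelog can be
illustrated with a two-level toy walker. Everything here is a
hypothetical userspace model (table layout, flag bits, sizes); the
volatile load stands in for READ_ONCE(). Each level is snapshotted
into a local variable and every later test uses that snapshot, so a
concurrent update cannot make two checks disagree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single volatile load, modelling READ_ONCE() for a 64-bit entry. */
#define READ_ONCE_U64(p) (*(const volatile uint64_t *)(p))

#define ENT_PRESENT   (UINT64_C(1) << 0)   /* hypothetical flag bits */
#define ENT_LEAF      (UINT64_C(1) << 1)

#define TOY_PAGE_SIZE UINT64_C(4096)
#define TOY_PMD_SIZE  (UINT64_C(512) * TOY_PAGE_SIZE)

struct toy_mm {
	uint64_t pmd[4];        /* upper level: four entries */
	uint64_t pte[4][512];   /* one lower-level table per upper entry */
};

/* Returns the mapping size covering addr, or 0 if nothing is mapped. */
static uint64_t toy_tlb_page_size(const struct toy_mm *mm, uint64_t addr)
{
	unsigned int pmd_idx = (unsigned int)((addr / TOY_PMD_SIZE) % 4);
	unsigned int pte_idx = (unsigned int)((addr / TOY_PAGE_SIZE) % 512);
	uint64_t pmd, pte;

	assert(mm != NULL);

	pmd = READ_ONCE_U64(&mm->pmd[pmd_idx]);   /* snapshot, then test */
	if (!(pmd & ENT_PRESENT))
		return 0;
	if (pmd & ENT_LEAF)
		return TOY_PMD_SIZE;

	pte = READ_ONCE_U64(&mm->pte[pmd_idx][pte_idx]);
	if (!(pte & ENT_PRESENT))
		return 0;
	return TOY_PAGE_SIZE;
}
```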
 




* [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE
From: Peter Zijlstra @ 2020-11-13 11:19 UTC (permalink / raw)
  To: kan.liang, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	eranian
  Cc: linux-arch, ak, catalin.marinas, peterz, linuxppc-dev, willy,
	linux-kernel, dave.hansen, npiggin, aneesh.kumar, sparclinux,
	will, davem, kirill.shutemov

Hi,

These patches provide generic infrastructure to determine TLB page size from
page table entries alone. Perf will use this (for either data or code address)
to aid in profiling TLB issues.

While most architectures only have page table aligned large pages, some
(notably ARM64, Sparc64 and Power) provide non page table aligned large pages
and need to provide their own implementation of these functions.

I've provided (completely untested) implementations for ARM64 and Sparc64, but
failed to penetrate the _many_ Power MMUs. I'm hoping Nick or Aneesh can help
me out there.



* [PATCH 5/5] sparc64/mm: Implement pXX_leaf_size() support
From: Peter Zijlstra @ 2020-11-13 11:19 UTC (permalink / raw)
  To: kan.liang, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	eranian
  Cc: linux-arch, ak, catalin.marinas, peterz, linuxppc-dev, willy,
	linux-kernel, dave.hansen, npiggin, aneesh.kumar, sparclinux,
	will, davem, kirill.shutemov
In-Reply-To: <20201113111901.743573013@infradead.org>

Sparc64 has non-pagetable aligned large page support; wire up the
pXX_leaf_size() functions to report the correct TLB page size.

This enables PERF_SAMPLE_{DATA,CODE}_PAGE_SIZE to report accurate TLB
page sizes.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/sparc/include/asm/pgtable_64.h |   13 +++++++++++++
 arch/sparc/mm/hugetlbpage.c         |   19 +++++++++++++------
 2 files changed, 26 insertions(+), 6 deletions(-)

--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -1121,6 +1121,19 @@ extern unsigned long cmdline_memory_size
 
 asmlinkage void do_sparc64_fault(struct pt_regs *regs);
 
+#ifdef CONFIG_HUGETLB_PAGE
+
+#define pud_leaf_size pud_leaf_size
+extern unsigned long pud_leaf_size(pud_t pud);
+
+#define pmd_leaf_size pmd_leaf_size
+extern unsigned long pmd_leaf_size(pmd_t pmd);
+
+#define pte_leaf_size pte_leaf_size
+extern unsigned long pte_leaf_size(pte_t pte);
+
+#endif /* CONFIG_HUGETLB_PAGE */
+
 #endif /* !(__ASSEMBLY__) */
 
 #endif /* !(_SPARC64_PGTABLE_H) */
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -247,14 +247,17 @@ static unsigned int sun4u_huge_tte_to_sh
 	return shift;
 }
 
-static unsigned int huge_tte_to_shift(pte_t entry)
+static unsigned long tte_to_shift(pte_t entry)
 {
-	unsigned long shift;
-
 	if (tlb_type == hypervisor)
-		shift = sun4v_huge_tte_to_shift(entry);
-	else
-		shift = sun4u_huge_tte_to_shift(entry);
+		return sun4v_huge_tte_to_shift(entry);
+
+	return sun4u_huge_tte_to_shift(entry);
+}
+
+static unsigned int huge_tte_to_shift(pte_t entry)
+{
+	unsigned long shift = tte_to_shift(entry);
 
 	if (shift == PAGE_SHIFT)
 		WARN_ONCE(1, "tto_to_shift: invalid hugepage tte=0x%lx\n",
@@ -272,6 +275,10 @@ static unsigned long huge_tte_to_size(pt
 	return size;
 }
 
+unsigned long pud_leaf_size(pud_t pud) { return 1UL << tte_to_shift((pte_t)pud); }
+unsigned long pmd_leaf_size(pmd_t pmd) { return 1UL << tte_to_shift((pte_t)pmd); }
+unsigned long pte_leaf_size(pte_t pte) { return 1UL << tte_to_shift((pte_t)pte); }
+
 pte_t *huge_pte_alloc(struct mm_struct *mm,
 			unsigned long addr, unsigned long sz)
 {




* Re: [PATCH 2/5] mm: Introduce pXX_leaf_size()
From: Peter Zijlstra @ 2020-11-13 11:45 UTC (permalink / raw)
  To: kan.liang, mingo, acme, mark.rutland, alexander.shishkin, jolsa,
	eranian
  Cc: linux-arch, ak, catalin.marinas, linuxppc-dev, willy,
	linux-kernel, dave.hansen, npiggin, aneesh.kumar, sparclinux,
	will, davem, kirill.shutemov
In-Reply-To: <20201113113426.465239104@infradead.org>

On Fri, Nov 13, 2020 at 12:19:03PM +0100, Peter Zijlstra wrote:
> A number of architectures have non-pagetable aligned huge/large pages.
> For such architectures a leaf can actually be part of a larger TLB
> entry.
> 
> Provide generic helpers to determine the TLB size of a page-table
> leaf.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/pgtable.h |   16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1536,4 +1536,20 @@ typedef unsigned int pgtbl_mod_mask;
>  #define pmd_leaf(x)	0
>  #endif
>  
> +#ifndef pgd_leaf_size
> +#define pgd_leaf_size(x) PGD_SIZE

Argh, I lost a refresh, that should've been:

+#define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT)


> +#endif
> +#ifndef p4d_leaf_size
> +#define p4d_leaf_size(x) P4D_SIZE
> +#endif
> +#ifndef pud_leaf_size
> +#define pud_leaf_size(x) PUD_SIZE
> +#endif
> +#ifndef pmd_leaf_size
> +#define pmd_leaf_size(x) PMD_SIZE
> +#endif
> +#ifndef pte_leaf_size
> +#define pte_leaf_size(x) PAGE_SIZE
> +#endif
> +
>  #endif /* _LINUX_PGTABLE_H */
> 
> 


* Re: [RFC PATCH kernel 2/2] powerpc/pci: Remove LSI mappings on device teardown
From: Cédric Le Goater @ 2020-11-13 12:06 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Rob Herring, Marc Zyngier, linux-kernel, Oliver O'Halloran,
	Frederic Barrat, Qian Cai, Thomas Gleixner, Michal Suchánek
In-Reply-To: <20201027090655.14118-3-aik@ozlabs.ru>

On 10/27/20 10:06 AM, Alexey Kardashevskiy wrote:
> From: Oliver O'Halloran <oohall@gmail.com>
> 
> When a passthrough IO adapter is removed from a pseries machine using hash
> MMU and the XIVE interrupt mode, the POWER hypervisor expects the guest OS
> to clear all page table entries related to the adapter. If some are still
> present, the RTAS call which isolates the PCI slot returns error 9001
> "valid outstanding translations" and the removal of the IO adapter fails.
> This is because when the PHBs are scanned, Linux maps automatically the
> INTx interrupts in the Linux interrupt number space but these are never
> removed.
> 
> This problem can be fixed by adding the corresponding unmap operation when
> the device is removed. There's no pcibios_* hook for the remove case, but
> the same effect can be achieved using a bus notifier.
> 
> Cc: Cédric Le Goater <clg@kaod.org>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>


Reviewed-by: Cédric Le Goater <clg@kaod.org>

Thanks for taking care of this.

C. 

> ---
>  arch/powerpc/kernel/pci-common.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
> index be108616a721..95f4e173368a 100644
> --- a/arch/powerpc/kernel/pci-common.c
> +++ b/arch/powerpc/kernel/pci-common.c
> @@ -404,6 +404,27 @@ static int pci_read_irq_line(struct pci_dev *pci_dev)
>  	return 0;
>  }
>  
> +static int ppc_pci_unmap_irq_line(struct notifier_block *nb,
> +			       unsigned long action, void *data)
> +{
> +	struct pci_dev *pdev = to_pci_dev(data);
> +
> +	if (action == BUS_NOTIFY_DEL_DEVICE)
> +		irq_dispose_mapping(pdev->irq);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block ppc_pci_unmap_irq_notifier = {
> +	.notifier_call = ppc_pci_unmap_irq_line,
> +};
> +
> +static int ppc_pci_register_irq_notifier(void)
> +{
> +	return bus_register_notifier(&pci_bus_type, &ppc_pci_unmap_irq_notifier);
> +}
> +arch_initcall(ppc_pci_register_irq_notifier);
> +
>  /*
>   * Platform support for /proc/bus/pci/X/Y mmap()s.
>   *  -- paulus.
> 
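
The notifier approach in the quoted patch can be pictured with a tiny
userspace model. None of this is the kernel notifier API: the chain,
the action codes and the device struct are hypothetical. It only shows
the shape of the fix, namely a callback that ignores every event except
device removal and undoes the IRQ mapping there.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_ADD_DEVICE = 1, TOY_DEL_DEVICE = 2 };   /* made-up actions */

struct toy_device {
	int irq;
	int irq_mapped;   /* 1 while the toy "mapping" exists */
};

struct toy_notifier {
	int (*call)(struct toy_notifier *nb, unsigned long action, void *data);
	struct toy_notifier *next;
};

static struct toy_notifier *toy_chain;

/* Push a notifier onto the head of the chain. */
static void toy_register(struct toy_notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = toy_chain;
	toy_chain = nb;
}

/* Deliver one event to every registered notifier. */
static void toy_notify(unsigned long action, void *data)
{
	struct toy_notifier *nb;

	for (nb = toy_chain; nb != NULL; nb = nb->next)
		nb->call(nb, action, data);
}

/* Drop the mapping on removal; ignore all other events. */
static int toy_unmap_irq(struct toy_notifier *nb, unsigned long action,
			 void *data)
{
	struct toy_device *dev = data;

	(void)nb;
	if (action == TOY_DEL_DEVICE)
		dev->irq_mapped = 0;
	return 0;
}

static struct toy_notifier toy_unmap_irq_notifier = {
	.call = toy_unmap_irq,
};
```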



* Re: [PATCH 0/5] perf/mm: Fix PERF_SAMPLE_*_PAGE_SIZE
From: Christophe Leroy @ 2020-11-13 13:44 UTC (permalink / raw)
  To: Peter Zijlstra, kan.liang, mingo, acme, mark.rutland,
	alexander.shishkin, jolsa, eranian
  Cc: linux-arch, ak, catalin.marinas, linuxppc-dev, willy,
	linux-kernel, dave.hansen, npiggin, aneesh.kumar, sparclinux,
	will, davem, kirill.shutemov
In-Reply-To: <20201113111901.743573013@infradead.org>

Hi

Le 13/11/2020 à 12:19, Peter Zijlstra a écrit :
> Hi,
> 
> These patches provide generic infrastructure to determine TLB page size from
> page table entries alone. Perf will use this (for either data or code address)
> to aid in profiling TLB issues.
> 
> While most architectures only have page table aligned large pages, some
> (notably ARM64, Sparc64 and Power) provide non page table aligned large pages
> and need to provide their own implementation of these functions.
> 
> I've provided (completely untested) implementations for ARM64 and Sparc64, but
> failed to penetrate the _many_ Power MMUs. I'm hoping Nick or Aneesh can help
> me out there.
> 

I can help with powerpc 8xx. It is a 32-bit powerpc. The PGD has 1024 entries, which means each
entry maps 4M.

Page sizes are 4k, 16k, 512k and 8M.

For the 8M pages we use hugepd with a single entry. The two related PGD entries point to the same 
hugepd.

For the other sizes, they are in standard page tables. 16k pages appear 4 times in the page table. 
512k entries appear 128 times in the page table.

When the PGD entry has the _PMD_PAGE_8M bits set, the PMD entry points to a hugepd which holds the
single 8M entry.

In the PTE, we have two bits: _PAGE_SPS and _PAGE_HUGE

_PAGE_HUGE means it is a 512k page
_PAGE_SPS means it is not a 4k page

The kernel can be built with either 4k pages or 16k pages as the standard page size. It doesn't
change the page table layout though.

Hope this is clear. Now I don't really know how to wire that up to your series.

Christophe
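
One possible reading of the 8xx description above, as a userspace
sketch. The bit positions and all TOY_* names are hypothetical; only
the meaning of the two flags and the arithmetic (4k slots per page, 4M
per PGD entry) come from the text. 8M pages are not decoded here since
they are reached through the hugepd rather than a PTE flag.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions; the real ones live in the 8xx headers. */
#define TOY_PAGE_SPS   (UINT32_C(1) << 0)   /* "not a 4k page" */
#define TOY_PAGE_HUGE  (UINT32_C(1) << 1)   /* "512k page" */

#define TOY_SZ_4K      UINT32_C(0x1000)
#define TOY_SZ_16K     UINT32_C(0x4000)
#define TOY_SZ_512K    UINT32_C(0x80000)
#define TOY_SZ_4M      UINT32_C(0x400000)
#define TOY_SZ_8M      UINT32_C(0x800000)

/* Page size implied by the two PTE flags. */
static uint32_t toy_8xx_pte_size(uint32_t pte)
{
	if (pte & TOY_PAGE_HUGE)
		return TOY_SZ_512K;
	if (pte & TOY_PAGE_SPS)
		return TOY_SZ_16K;
	return TOY_SZ_4K;
}

/* Number of 4k-granular page-table slots one page of this size fills. */
static uint32_t toy_8xx_pte_slots(uint32_t size)
{
	assert(size % TOY_SZ_4K == 0);
	return size / TOY_SZ_4K;
}

/* Number of 4M PGD entries that point at the same mapping. */
static uint32_t toy_8xx_pgd_entries(uint32_t size)
{
	return size <= TOY_SZ_4M ? 1 : size / TOY_SZ_4M;
}
```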


* [PATCH] arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
From: Arnd Bergmann @ 2020-11-13 14:59 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-arch, linux-snps-arc, Arnd Bergmann, Vineet Gupta,
	Albert Ou, Paul Walmsley, Russell King, Stefan Agner, linux-mips,
	Minchan Kim, Thomas Bogendoerfer, Paul Mackerras,
	Kirill A . Shutemov, Palmer Dabbelt, linux-riscv, Nitin Gupta,
	linuxppc-dev, Mike Rapoport, linux-arm-kernel

From: Arnd Bergmann <arnd@arndb.de>

Stefan Agner reported a bug when using zsram on 32-bit Arm machines
with RAM above the 4GB address boundary:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = a27bd01c
  [00000000] *pgd=236a0003, *pmd=1ffa64003
  Internal error: Oops: 207 [#1] SMP ARM
  Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
  CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
  Hardware name: BCM2711
  PC is at zs_map_object+0x94/0x338
  LR is at zram_bvec_rw.constprop.0+0x330/0xa64
  pc : [<c0602b38>]    lr : [<c0bda6a0>]    psr: 60000013
  sp : e376bbe0  ip : 00000000  fp : c1e2921c
  r10: 00000002  r9 : c1dda730  r8 : 00000000
  r7 : e8ff7a00  r6 : 00000000  r5 : 02f9ffa0  r4 : e3710000
  r3 : 000fdffe  r2 : c1e0ce80  r1 : ebf979a0  r0 : 00000000
  Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
  Control: 30c5383d  Table: 235c2a80  DAC: fffffffd
  Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
  Stack: (0xe376bbe0 to 0xe376c000)

As it turns out, zsram needs to know the maximum memory size, which
is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.

The same problem will be hit on all 32-bit architectures that have a
physical address space larger than 4GB and happen to not enable sparsemem
and include asm/sparsemem.h from asm/pgtable.h.

After the initial discussion, I suggested just always defining
MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
set, or provoking a build error otherwise. This addresses all
configurations that can currently have this runtime bug, but
leaves all other configurations unchanged.

I looked up the possible number of bits in source code and
datasheets, here is what I found:

 - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
 - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
   support more than 32 bits, even though supersections in theory allow
   up to 40 bits as well.
 - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
   XPA supports up to 60 bits in theory, but 40 bits are more than
   anyone will ever ship
 - On PowerPC, there are three different implementations of 36 bit
   addressing, but 32-bit is used without CONFIG_PTE_64BIT
 - On RISC-V, the normal page table format can support 34 bit
   addressing. There is no highmem support on RISC-V, so anything
   above 2GB is unused, but it might be useful to eventually support
   CONFIG_ZRAM for high pages.

Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
Cc: Stefan Agner <stefan@agner.ch>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: linux-mips@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
If everyone is happy with this version, I would suggest merging this as
a bugfix through my asm-generic tree for linux-5.10. I originally
said I'd send individual patches for each architecture tree, but
I now think this is easier and better documents what is going on.
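
To illustrate why the assumed number of physical address bits matters,
here is a userspace sketch. It assumes an allocator that packs a page
frame number into a field of (physmem_bits - PAGE_SHIFT) bits;
toy_pack_pfn() and the 4 KiB page size are hypothetical. If the field
is sized for 32 bits, the frame number of the first page above 4GB is
truncated to zero, while a 40-bit assumption preserves it.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12   /* assumed 4 KiB pages */

/*
 * Keep only as many PFN bits as a packed field sized for
 * "physmem_bits" of physical address space can hold.
 */
static uint64_t toy_pack_pfn(uint64_t pfn, unsigned int physmem_bits)
{
	unsigned int pfn_bits;

	assert(physmem_bits > TOY_PAGE_SHIFT);
	pfn_bits = physmem_bits - TOY_PAGE_SHIFT;
	assert(pfn_bits < 64);

	return pfn & ((UINT64_C(1) << pfn_bits) - 1);
}
```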
---
 arch/arc/include/asm/pgtable.h               |  2 ++
 arch/arm/include/asm/pgtable-2level.h        |  2 ++
 arch/arm/include/asm/pgtable-3level.h        |  2 ++
 arch/mips/include/asm/pgtable-32.h           |  3 +++
 arch/powerpc/include/asm/book3s/32/pgtable.h |  2 ++
 arch/powerpc/include/asm/nohash/32/pgtable.h |  2 ++
 arch/riscv/include/asm/pgtable-32.h          |  2 ++
 include/linux/pgtable.h                      | 13 +++++++++++++
 8 files changed, 28 insertions(+)

diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
index f1ed17edb085..163641726a2b 100644
--- a/arch/arc/include/asm/pgtable.h
+++ b/arch/arc/include/asm/pgtable.h
@@ -134,8 +134,10 @@
 
 #ifdef CONFIG_ARC_HAS_PAE40
 #define PTE_BITS_NON_RWX_IN_PD1	(0xff00000000 | PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #else
 #define PTE_BITS_NON_RWX_IN_PD1	(PAGE_MASK | _PAGE_CACHEABLE)
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /**************************************************************************
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 3502c2f746ca..baf7d0204eb5 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -75,6 +75,8 @@
 #define PTE_HWTABLE_OFF		(PTE_HWTABLE_PTRS * sizeof(pte_t))
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u32))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS	32
+
 /*
  * PMD_SHIFT determines the size of the area a second-level page table can map
  * PGDIR_SHIFT determines what a third-level page table entry can map
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index fbb6693c3352..2b85d175e999 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -25,6 +25,8 @@
 #define PTE_HWTABLE_OFF		(0)
 #define PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(u64))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
+
 /*
  * PGDIR_SHIFT determines the size a top-level page table entry can map.
  */
diff --git a/arch/mips/include/asm/pgtable-32.h b/arch/mips/include/asm/pgtable-32.h
index a950fc1ddb4d..6c0532d7b211 100644
--- a/arch/mips/include/asm/pgtable-32.h
+++ b/arch/mips/include/asm/pgtable-32.h
@@ -154,6 +154,7 @@ static inline void pmd_clear(pmd_t *pmdp)
 
 #if defined(CONFIG_XPA)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 40
 #define pte_pfn(x)		(((unsigned long)((x).pte_high >> _PFN_SHIFT)) | (unsigned long)((x).pte_low << _PAGE_PRESENT_SHIFT))
 static inline pte_t
 pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -169,6 +170,7 @@ pfn_pte(unsigned long pfn, pgprot_t prot)
 
 #elif defined(CONFIG_PHYS_ADDR_T_64BIT) && defined(CONFIG_CPU_MIPS32)
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #define pte_pfn(x)		((unsigned long)((x).pte_high >> 6))
 
 static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
@@ -183,6 +185,7 @@ static inline pte_t pfn_pte(unsigned long pfn, pgprot_t prot)
 
 #else
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #ifdef CONFIG_CPU_VR41XX
 #define pte_pfn(x)		((unsigned long)((x).pte >> (PAGE_SHIFT + 2)))
 #define pfn_pte(pfn, prot)	__pte(((pfn) << (PAGE_SHIFT + 2)) | pgprot_val(prot))
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 36443cda8dcf..1376be95e975 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -36,8 +36,10 @@ static inline bool pte_user(pte_t pte)
  */
 #ifdef CONFIG_PTE_64BIT
 #define PTE_RPN_MASK	(~((1ULL << PTE_RPN_SHIFT) - 1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #else
 #define PTE_RPN_MASK	(~((1UL << PTE_RPN_SHIFT) - 1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /*
diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index ee2243ba96cf..96522f7f0618 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -153,8 +153,10 @@ int map_kernel_page(unsigned long va, phys_addr_t pa, pgprot_t prot);
  */
 #if defined(CONFIG_PPC32) && defined(CONFIG_PTE_64BIT)
 #define PTE_RPN_MASK	(~((1ULL << PTE_RPN_SHIFT) - 1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 36
 #else
 #define PTE_RPN_MASK	(~((1UL << PTE_RPN_SHIFT) - 1))
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
 #endif
 
 /*
diff --git a/arch/riscv/include/asm/pgtable-32.h b/arch/riscv/include/asm/pgtable-32.h
index b0ab66e5fdb1..5b2e79e5bfa5 100644
--- a/arch/riscv/include/asm/pgtable-32.h
+++ b/arch/riscv/include/asm/pgtable-32.h
@@ -14,4 +14,6 @@
 #define PGDIR_SIZE      (_AC(1, UL) << PGDIR_SHIFT)
 #define PGDIR_MASK      (~(PGDIR_SIZE - 1))
 
+#define MAX_POSSIBLE_PHYSMEM_BITS 34
+
 #endif /* _ASM_RISCV_PGTABLE_32_H */
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 71125a4676c4..e237004d498d 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1427,6 +1427,19 @@ typedef unsigned int pgtbl_mod_mask;
 
 #endif /* !__ASSEMBLY__ */
 
+#if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT)
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+/*
+ * ZSMALLOC needs to know the highest PFN on 32-bit architectures
+ * with physical address space extension, but falls back to
+ * BITS_PER_LONG otherwise.
+ */
+#error Missing MAX_POSSIBLE_PHYSMEM_BITS definition
+#else
+#define MAX_POSSIBLE_PHYSMEM_BITS 32
+#endif
+#endif
+
 #ifndef has_transparent_hugepage
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define has_transparent_hugepage() 1
-- 
2.27.0



* Re: [PATCH 3/3] powerpc: rewrite atomics to use ARCH_ATOMIC
From: Boqun Feng @ 2020-11-13 15:30 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Christophe Leroy, linux-arch, Arnd Bergmann, Peter Zijlstra,
	linuxppc-dev, linux-kernel, Alexey Kardashevskiy, Will Deacon
In-Reply-To: <20201111110723.3148665-4-npiggin@gmail.com>

Hi Nicholas,

On Wed, Nov 11, 2020 at 09:07:23PM +1000, Nicholas Piggin wrote:
> All the cool kids are doing it.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  arch/powerpc/include/asm/atomic.h  | 681 ++++++++++-------------------
>  arch/powerpc/include/asm/cmpxchg.h |  62 +--
>  2 files changed, 248 insertions(+), 495 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
> index 8a55eb8cc97b..899aa2403ba7 100644
> --- a/arch/powerpc/include/asm/atomic.h
> +++ b/arch/powerpc/include/asm/atomic.h
> @@ -11,185 +11,285 @@
>  #include <asm/cmpxchg.h>
>  #include <asm/barrier.h>
>  
> +#define ARCH_ATOMIC
> +
> +#ifndef CONFIG_64BIT
> +#include <asm-generic/atomic64.h>
> +#endif
> +
>  /*
>   * Since *_return_relaxed and {cmp}xchg_relaxed are implemented with
>   * a "bne-" instruction at the end, so an isync is enough as a acquire barrier
>   * on the platform without lwsync.
>   */
>  #define __atomic_acquire_fence()					\
> -	__asm__ __volatile__(PPC_ACQUIRE_BARRIER "" : : : "memory")
> +	asm volatile(PPC_ACQUIRE_BARRIER "" : : : "memory")
>  
>  #define __atomic_release_fence()					\
> -	__asm__ __volatile__(PPC_RELEASE_BARRIER "" : : : "memory")
> +	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory")
>  
> -static __inline__ int atomic_read(const atomic_t *v)
> -{
> -	int t;
> +#define __atomic_pre_full_fence		smp_mb
>  
> -	__asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));
> +#define __atomic_post_full_fence	smp_mb
>  

Do you need to define __atomic_{pre,post}_full_fence for PPC? IIRC, they
default to smp_mb__{before,after}_atomic(), which are already smp_mb()
by default on PPC.

> -	return t;
> +#define arch_atomic_read(v)			__READ_ONCE((v)->counter)
> +#define arch_atomic_set(v, i)			__WRITE_ONCE(((v)->counter), (i))
> +#ifdef CONFIG_64BIT
> +#define ATOMIC64_INIT(i)			{ (i) }
> +#define arch_atomic64_read(v)			__READ_ONCE((v)->counter)
> +#define arch_atomic64_set(v, i)			__WRITE_ONCE(((v)->counter), (i))
> +#endif
> +
[...]
>  
> +#define ATOMIC_FETCH_OP_UNLESS_RELAXED(name, type, dtype, width, asm_op) \
> +static inline int arch_##name##_relaxed(type *v, dtype a, dtype u)	\

I don't think we have atomic_fetch_*_unless_relaxed() in the atomic
APIs; ditto for:

	atomic_fetch_add_unless_relaxed()
	atomic_inc_not_zero_relaxed()
	atomic_dec_if_positive_relaxed()

We don't have _acquire() and _release() variants for them either. And
if you don't define their fully-ordered versions (e.g.
atomic_inc_not_zero()), atomic-arch-fallback.h will use read and cmpxchg
to implement them, which I think is not what we want.
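
To illustrate what I mean by the read-and-cmpxchg fallback, here is a
user-space sketch using C11 <stdatomic.h> instead of the kernel
primitives. The *_sketch() names are made up for this example, and the
code actually generated by atomic-arch-fallback.h may differ in detail:

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space analogue of a generic fetch_add_unless():
 * read the counter once, then retry a compare-and-exchange until it
 * succeeds or the counter equals 'u'. Returns the old value.
 */
static int fetch_add_unless_sketch(atomic_int *v, int a, int u)
{
	int c = atomic_load_explicit(v, memory_order_relaxed);

	do {
		if (c == u)
			break;
		/* On failure, 'c' is updated with the current value. */
	} while (!atomic_compare_exchange_weak(v, &c, c + a));

	return c;
}

/* inc_not_zero expressed on top of the helper above. */
static bool inc_not_zero_sketch(atomic_int *v)
{
	return fetch_add_unless_sketch(v, 1, 0) != 0;
}
```

On PPC this shape means a separate load plus a cmpxchg loop, rather
than a single larx/stcx. sequence with the "unless" test inside it.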

[...]
>  
>  #endif /* __KERNEL__ */
>  #endif /* _ASM_POWERPC_ATOMIC_H_ */
> diff --git a/arch/powerpc/include/asm/cmpxchg.h b/arch/powerpc/include/asm/cmpxchg.h
> index cf091c4c22e5..181f7e8b3281 100644
> --- a/arch/powerpc/include/asm/cmpxchg.h
> +++ b/arch/powerpc/include/asm/cmpxchg.h
> @@ -192,7 +192,7 @@ __xchg_relaxed(void *ptr, unsigned long x, unsigned int size)
>       		(unsigned long)_x_, sizeof(*(ptr))); 			     \
>    })
>  
> -#define xchg_relaxed(ptr, x)						\
> +#define arch_xchg_relaxed(ptr, x)					\
>  ({									\
>  	__typeof__(*(ptr)) _x_ = (x);					\
>  	(__typeof__(*(ptr))) __xchg_relaxed((ptr),			\
> @@ -448,35 +448,7 @@ __cmpxchg_relaxed(void *ptr, unsigned long old, unsigned long new,
>  	return old;
>  }
>  
> -static __always_inline unsigned long
> -__cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
> -		  unsigned int size)
> -{
> -	switch (size) {
> -	case 1:
> -		return __cmpxchg_u8_acquire(ptr, old, new);
> -	case 2:
> -		return __cmpxchg_u16_acquire(ptr, old, new);
> -	case 4:
> -		return __cmpxchg_u32_acquire(ptr, old, new);
> -#ifdef CONFIG_PPC64
> -	case 8:
> -		return __cmpxchg_u64_acquire(ptr, old, new);
> -#endif
> -	}
> -	BUILD_BUG_ON_MSG(1, "Unsupported size for __cmpxchg_acquire");
> -	return old;
> -}
> -#define cmpxchg(ptr, o, n)						 \
> -  ({									 \
> -     __typeof__(*(ptr)) _o_ = (o);					 \
> -     __typeof__(*(ptr)) _n_ = (n);					 \
> -     (__typeof__(*(ptr))) __cmpxchg((ptr), (unsigned long)_o_,		 \
> -				    (unsigned long)_n_, sizeof(*(ptr))); \
> -  })
> -
> -

If you remove {atomic_}cmpxchg{,_acquire}() and use the versions
provided by atomic-arch-fallback.h, then a failed cmpxchg() or
cmpxchg_acquire() will still result in a full barrier or an acquire
barrier after the RMW operation. That barrier is not necessary, and
this is probably not what we want?
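
In C11 terms (a user-space analogy only, not the powerpc
implementation), the difference looks roughly like the two helpers
below. Both names are hypothetical; the second one mimics a fallback
that builds _acquire from _relaxed plus an unconditional fence:

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Acquire ordering only when the exchange succeeds; a failed compare
 * is a relaxed load and pays for no trailing barrier.
 */
static bool cmpxchg_acquire_sketch(atomic_int *p, int *old, int desired)
{
	return atomic_compare_exchange_strong_explicit(p, old, desired,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/*
 * Relaxed exchange followed by a fence that runs on both the success
 * and the failure path.
 */
static bool cmpxchg_acquire_fallback_sketch(atomic_int *p, int *old,
					    int desired)
{
	bool ok = atomic_compare_exchange_strong_explicit(p, old, desired,
							  memory_order_relaxed,
							  memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);
	return ok;
}
```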

Regards,
Boqun

> -#define cmpxchg_local(ptr, o, n)					 \
> +#define arch_cmpxchg_local(ptr, o, n)					 \
>    ({									 \
>       __typeof__(*(ptr)) _o_ = (o);					 \
>       __typeof__(*(ptr)) _n_ = (n);					 \
> @@ -484,7 +456,7 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
>  				    (unsigned long)_n_, sizeof(*(ptr))); \
>    })
>  
> -#define cmpxchg_relaxed(ptr, o, n)					\
> +#define arch_cmpxchg_relaxed(ptr, o, n)					\
>  ({									\
>  	__typeof__(*(ptr)) _o_ = (o);					\
>  	__typeof__(*(ptr)) _n_ = (n);					\
> @@ -493,38 +465,20 @@ __cmpxchg_acquire(void *ptr, unsigned long old, unsigned long new,
>  			sizeof(*(ptr)));				\
>  })
>  
> -#define cmpxchg_acquire(ptr, o, n)					\
> -({									\
> -	__typeof__(*(ptr)) _o_ = (o);					\
> -	__typeof__(*(ptr)) _n_ = (n);					\
> -	(__typeof__(*(ptr))) __cmpxchg_acquire((ptr),			\
> -			(unsigned long)_o_, (unsigned long)_n_,		\
> -			sizeof(*(ptr)));				\
> -})
>  #ifdef CONFIG_PPC64
> -#define cmpxchg64(ptr, o, n)						\
> -  ({									\
> -	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
> -	cmpxchg((ptr), (o), (n));					\
> -  })
> -#define cmpxchg64_local(ptr, o, n)					\
> +#define arch_cmpxchg64_local(ptr, o, n)					\
>    ({									\
>  	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
> -	cmpxchg_local((ptr), (o), (n));					\
> +	arch_cmpxchg_local((ptr), (o), (n));				\
>    })
> -#define cmpxchg64_relaxed(ptr, o, n)					\
> -({									\
> -	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
> -	cmpxchg_relaxed((ptr), (o), (n));				\
> -})
> -#define cmpxchg64_acquire(ptr, o, n)					\
> +#define arch_cmpxchg64_relaxed(ptr, o, n)				\
>  ({									\
>  	BUILD_BUG_ON(sizeof(*(ptr)) != 8);				\
> -	cmpxchg_acquire((ptr), (o), (n));				\
> +	arch_cmpxchg_relaxed((ptr), (o), (n));				\
>  })
>  #else
>  #include <asm-generic/cmpxchg-local.h>
> -#define cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
> +#define arch_cmpxchg64_local(ptr, o, n) __cmpxchg64_local_generic((ptr), (o), (n))
>  #endif
>  
>  #endif /* __KERNEL__ */
> -- 
> 2.23.0
> 


* [PATCH] ocxl: Mmio invalidation support
From: Christophe Lombard @ 2020-11-13 15:33 UTC (permalink / raw)
  To: linuxppc-dev, fbarrat

OpenCAPI 4.0/5.0 with TLBI/SLBI snooping is not used because of
performance problems: the PAU has to process all incoming TLBI/SLBI
commands, which causes them to back up on the PowerBus.

When the Address Translation Mode requires TLB and SLB Invalidate
operations to be initiated using MMIO registers, a set of registers like
the following is used:
• XTS MMIO ATSD0 LPARID register
• XTS MMIO ATSD0 AVA register
• XTS MMIO ATSD0 launch register, write access initiates a shoot down
• XTS MMIO ATSD0 status register

The MMIO based mechanism also blocks the NPU/PAU from snooping TLBIE
commands from the PowerBus.

The Shootdown commands (ATSD) will be generated using MMIO registers
in the NPU/PAU and sent to the device.
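
As an illustration of how the launch register value is assembled, here
is a user-space sketch. PPC_BIT() and PPC_BITMASK() are re-declared
locally with 64-bit IBM bit numbering (bit 0 is the most significant
bit), which is an assumption about the kernel helpers, and
atsd_launch_value() is a hypothetical name, not a function added by
this patch. The field masks and SETFIELD() follow the definitions added
to ocxl_internal.h:

```c
#include <stdint.h>

/* Local stand-ins for the kernel helpers (assumed 64-bit, IBM numbering). */
#define PPC_BIT(bit)		(1ULL << (63 - (bit)))
#define PPC_BITMASK(bs, be)	((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

/* Shift of the least significant bit set in the mask. */
#define MASK_TO_LSH(m)		(__builtin_ffsll(m) - 1)
/* Return v with the field selected by mask m set to val. */
#define SETFIELD(m, v, val)	\
	(((v) & ~(m)) | ((((uint64_t)(val)) << MASK_TO_LSH(m)) & (m)))

#define XTS_ATSD_LNCH_R			PPC_BIT(0)
#define XTS_ATSD_LNCH_RIC		PPC_BITMASK(1, 2)
#define XTS_ATSD_LNCH_IS		PPC_BITMASK(11, 12)
#define XTS_ATSD_LNCH_PRS		PPC_BIT(13)
#define XTS_ATSD_LNCH_AP		PPC_BITMASK(15, 17)
#define XTS_ATSD_LNCH_PID		PPC_BITMASK(19, 38)
#define XTS_ATSD_LNCH_OCAPI_SINGLETON	PPC_BIT(41)

/* Hypothetical helper: build the value written to the launch register. */
static uint64_t atsd_launch_value(uint64_t pid, int by_addr)
{
	uint64_t val = XTS_ATSD_LNCH_R;

	if (by_addr) {
		/* Invalidate only the TLB entry for the target VA. */
		val = SETFIELD(XTS_ATSD_LNCH_RIC, val, 0x0);
		val = SETFIELD(XTS_ATSD_LNCH_IS, val, 0x0);
	} else {
		/* Invalidate everything cached for the matching PID. */
		val = SETFIELD(XTS_ATSD_LNCH_RIC, val, 0x2);
		val = SETFIELD(XTS_ATSD_LNCH_IS, val, 0x1);
		val |= XTS_ATSD_LNCH_OCAPI_SINGLETON;
	}
	val |= XTS_ATSD_LNCH_PRS;			/* process scope */
	val = SETFIELD(XTS_ATSD_LNCH_AP, val, 0x5);	/* 64KB pages */
	val = SETFIELD(XTS_ATSD_LNCH_PID, val, pid);
	return val;
}
```

In the driver the resulting value is written to the launch register
and the status register is then polled until the pending bit clears.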

Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pnv-ocxl.h   |   2 +
 arch/powerpc/platforms/powernv/ocxl.c |  19 +++
 drivers/misc/ocxl/link.c              | 180 ++++++++++++++++++++++----
 drivers/misc/ocxl/ocxl_internal.h     |  46 ++++++-
 drivers/misc/ocxl/trace.h             | 125 ++++++++++++++++++
 5 files changed, 348 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h b/arch/powerpc/include/asm/pnv-ocxl.h
index d37ededca3ee..4a23abcc347b 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -28,4 +28,6 @@ int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask, void **p
 void pnv_ocxl_spa_release(void *platform_data);
 int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle);
 
+extern int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
+			     uint64_t lpcr);
 #endif /* _ASM_PNV_OCXL_H */
diff --git a/arch/powerpc/platforms/powernv/ocxl.c b/arch/powerpc/platforms/powernv/ocxl.c
index ecdad219d704..100546ea635f 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -483,3 +483,22 @@ int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
 	return rc;
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
+
+int pnv_ocxl_map_lpar(struct pci_dev *dev, uint64_t lparid,
+		      uint64_t lpcr)
+{
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
+	struct pnv_phb *phb = hose->private_data;
+	u32 bdfn;
+	int rc;
+
+	bdfn = (dev->bus->number << 8) | dev->devfn;
+	rc = opal_npu_map_lpar(phb->opal_id, bdfn, lparid, lpcr);
+	if (rc) {
+		dev_err(&dev->dev, "Error mapping device to LPAR: %d\n", rc);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(pnv_ocxl_map_lpar);
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index fd73d3bc0eb6..9b5b77d40734 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -4,6 +4,8 @@
 #include <linux/mutex.h>
 #include <linux/mm_types.h>
 #include <linux/mmu_context.h>
+#include <linux/mm.h>
+#include <linux/mmu_notifier.h>
 #include <asm/copro.h>
 #include <asm/pnv-ocxl.h>
 #include <asm/xive.h>
@@ -33,6 +35,31 @@
 
 #define SPA_PE_VALID		0x80000000
 
+struct spa;
+
+/*
+ * A opencapi link can be used be by several PCI functions. We have
+ * one link per device slot.
+ *
+ * A linked list of opencapi links should suffice, as there's a
+ * limited number of opencapi slots on a system and lookup is only
+ * done when the device is probed
+ */
+struct ocxl_link {
+	struct list_head list;
+	struct kref ref;
+	int domain;
+	int bus;
+	int dev;
+	u64 mmio_atsd; /* ATSD physical address */
+	void __iomem *base;    /* ATSD register virtual address */
+	spinlock_t atsd_lock; // to serialize shootdowns
+	atomic_t irq_available;
+	struct spa *spa;
+	void *platform_data;
+};
+static struct list_head links_list = LIST_HEAD_INIT(links_list);
+static DEFINE_MUTEX(links_list_lock);
 
 struct pe_data {
 	struct mm_struct *mm;
@@ -41,6 +68,8 @@ struct pe_data {
 	/* opaque pointer to be passed to the above callback */
 	void *xsl_err_data;
 	struct rcu_head rcu;
+	struct ocxl_link *link;
+	struct mmu_notifier mmu_notifier;
 };
 
 struct spa {
@@ -69,27 +98,6 @@ struct spa {
 	} xsl_fault;
 };
 
-/*
- * A opencapi link can be used be by several PCI functions. We have
- * one link per device slot.
- *
- * A linked list of opencapi links should suffice, as there's a
- * limited number of opencapi slots on a system and lookup is only
- * done when the device is probed
- */
-struct ocxl_link {
-	struct list_head list;
-	struct kref ref;
-	int domain;
-	int bus;
-	int dev;
-	atomic_t irq_available;
-	struct spa *spa;
-	void *platform_data;
-};
-static struct list_head links_list = LIST_HEAD_INIT(links_list);
-static DEFINE_MUTEX(links_list_lock);
-
 enum xsl_response {
 	CONTINUE,
 	ADDRESS_ERROR,
@@ -126,6 +134,8 @@ static void ack_irq(struct spa *spa, enum xsl_response r)
 	}
 }
 
+static const struct mmu_notifier_ops ocxl_mmu_notifier_ops;
+
 static void xsl_fault_handler_bh(struct work_struct *fault_work)
 {
 	vm_fault_t flt = 0;
@@ -376,6 +386,7 @@ static void free_spa(struct ocxl_link *link)
 
 static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_link)
 {
+	struct pci_controller *hose = pci_bus_to_host(dev->bus);
 	struct ocxl_link *link;
 	int rc;
 
@@ -403,6 +414,22 @@ static int alloc_link(struct pci_dev *dev, int PE_mask, struct ocxl_link **out_l
 	if (rc)
 		goto err_xsl_irq;
 
+	/* Since OpenCAPI 5.0, Address Translation Mode requires TLB
+	 * and SLB Invalidate operations to be initiated using MMIO
+	 * registers
+	 */
+	if (of_property_read_u64_index(hose->dn, "ibm,mmio-atsd",
+				       0, &link->mmio_atsd)) {
+		dev_info(&dev->dev, "No available ATSD found\n");
+	}
+	if (link->mmio_atsd) {
+		link->base = ioremap(link->mmio_atsd, 24);
+		if (!link->base)
+			dev_warn(&dev->dev, "ioremap failed - mmio_atsd: %#llx\n", link->mmio_atsd);
+		else
+			pnv_ocxl_map_lpar(dev, mfspr(SPRN_LPID), 0);
+	}
+
 	*out_link = link;
 	return 0;
 
@@ -464,12 +491,101 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle)
 {
 	struct ocxl_link *link = (struct ocxl_link *) link_handle;
 
+	if (link->base) {
+		iounmap(link->base);
+		link->base = NULL;
+	}
 	mutex_lock(&links_list_lock);
 	kref_put(&link->ref, release_xsl);
 	mutex_unlock(&links_list_lock);
 }
 EXPORT_SYMBOL_GPL(ocxl_link_release);
 
+static void tlb_invalidate(struct ocxl_link *link,
+			   unsigned long pid,
+			   unsigned long addr)
+{
+	unsigned long timeout = jiffies + (HZ * OCXL_ATSD_TIMEOUT);
+	uint64_t val;
+	int pend;
+
+	if (!link->base)
+		return;
+
+	spin_lock(&link->atsd_lock);
+	if (addr) {
+		/* load Abbreviated Virtual Address register with
+		 * the necessary value
+		 */
+		val = SETFIELD(XTS_ATSD_AVA_AVA, 0ull, addr >> (63-51));
+		out_be64(link->base + XTS_ATSD_AVA, val);
+		eieio();
+		trace_ocxl_mmu_notifier_mmio_atsd_ava(val, pid);
+	}
+
+	/* Write access initiates a shoot down to initiate the
+	 * TLB Invalidate command
+	 */
+	val = XTS_ATSD_LNCH_R;
+	if (addr) {
+		val = SETFIELD(XTS_ATSD_LNCH_RIC, val, 0b00);
+		val = SETFIELD(XTS_ATSD_LNCH_IS, val, 0b00);
+	} else {
+		val = SETFIELD(XTS_ATSD_LNCH_RIC, val, 0b10);
+		val = SETFIELD(XTS_ATSD_LNCH_IS, val, 0b01);
+		val |= XTS_ATSD_LNCH_OCAPI_SINGLETON;
+	}
+	val |= XTS_ATSD_LNCH_PRS;
+	val = SETFIELD(XTS_ATSD_LNCH_AP, val, 0b101);
+	val = SETFIELD(XTS_ATSD_LNCH_PID, val, pid);
+	out_be64(link->base + XTS_ATSD_LNCH, val);
+	trace_ocxl_mmu_notifier_mmio_atsd_lnch(val, addr, pid);
+
+	/* Poll the ATSD status register to determine when the
+	* TLB Invalidate has been completed.
+	*/
+	val = in_be64(link->base + XTS_ATSD_STAT);
+	pend = val >> 63;
+	trace_ocxl_mmu_notifier_mmio_atsd_stat(val, addr, pid);
+
+	while (pend) {
+		if (time_after_eq(jiffies, timeout)) {
+			pr_err("%s - Timeout while reading XTS MMIO ATSD status register (val=%#llx, pidr=0x%lx)\n",
+			       __func__, val, pid);
+			spin_unlock(&link->atsd_lock);
+			return;
+		}
+		cpu_relax();
+		val = in_be64(link->base + XTS_ATSD_STAT);
+		pend = val >> 63;
+	}
+	spin_unlock(&link->atsd_lock);
+	trace_ocxl_mmu_notifier_mmio_atsd_stat(val, addr, pid);
+}
+
+static void invalidate_range_end(struct mmu_notifier *mn,
+				 const struct mmu_notifier_range *range)
+{
+	struct pe_data *pe_data = container_of(mn, struct pe_data, mmu_notifier);
+	struct ocxl_link *link = pe_data->link;
+	struct mm_struct *mm = mn->mm;
+	unsigned long addr, pid, page_size = PAGE_SIZE;
+
+	pid = mm->context.id;
+	trace_ocxl_mmu_notifier_range(range->start, range->end, pid);
+
+	for (addr = range->start; addr < range->end; addr += page_size)
+		tlb_invalidate(link, pid, addr);
+}
+
+static const struct mmu_notifier_ops ocxl_mmu_notifier_ops = {
+	/* invalidate_range_end() is called when all pages in the
+	 * range have been unmapped and the pages have been freed by
+	 * the VM
+	 */
+	.invalidate_range_end = invalidate_range_end,
+};
+
 static u64 calculate_cfg_state(bool kernel)
 {
 	u64 state;
@@ -517,7 +633,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 		goto unlock;
 	}
 
-	pe_data = kmalloc(sizeof(*pe_data), GFP_KERNEL);
+	pe_data = kzalloc(sizeof(*pe_data), GFP_KERNEL);
 	if (!pe_data) {
 		rc = -ENOMEM;
 		goto unlock;
@@ -526,9 +642,13 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	pe_data->mm = mm;
 	pe_data->xsl_err_cb = xsl_err_cb;
 	pe_data->xsl_err_data = xsl_err_data;
+	pe_data->link = link;
+	pe_data->mmu_notifier.ops = &ocxl_mmu_notifier_ops;
 
 	memset(pe, 0, sizeof(struct ocxl_process_element));
 	pe->config_state = cpu_to_be64(calculate_cfg_state(pidr == 0));
+	pe->pasid = cpu_to_be32(pasid << (31 - 19));
+	pe->bdf = cpu_to_be32(1 << (31 - 15));
 	pe->lpid = cpu_to_be32(mfspr(SPRN_LPID));
 	pe->pid = cpu_to_be32(pidr);
 	pe->tid = cpu_to_be32(tidr);
@@ -540,8 +660,17 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
 	 * by the nest MMU. If we have a kernel context, TLBIs are
 	 * already global.
 	 */
-	if (mm)
+	if (mm) {
 		mm_context_add_copro(mm);
+		if (link->base) {
+			/* Use MMIO registers for the TLB and SLB
+			 * Invalidate operations.
+			 */
+			trace_init_mmu_notifier(pasid, mm->context.id);
+			mmu_notifier_register(&pe_data->mmu_notifier, mm);
+		}
+	}
+
 	/*
 	 * Barrier is to make sure PE is visible in the SPA before it
 	 * is used by the device. It also helps with the global TLBI
@@ -672,6 +801,11 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 		WARN(1, "Couldn't find pe data when removing PE\n");
 	} else {
 		if (pe_data->mm) {
+			if (link->base) {
+				trace_release_mmu_notifier(pasid, pe_data->mm->context.id);
+				mmu_notifier_unregister(&pe_data->mmu_notifier, pe_data->mm);
+				tlb_invalidate(link, pe_data->mm->context.id, 0ull);
+			}
 			mm_context_remove_copro(pe_data->mm);
 			mmdrop(pe_data->mm);
 		}
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 0bad0a123af6..35d8be3cd270 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -8,6 +8,48 @@
 #include <linux/list.h>
 #include <misc/ocxl.h>
 
+/* Find left shift from first set bit in mask */
+#define MASK_TO_LSH(m)		(__builtin_ffsl(m) - 1)
+
+/* Set field fname of oval to fval
+ * NOTE: oval isn't modified, the combined result is returned
+ */
+#define SETFIELD(m, v, val)				\
+	(((v) & ~(m)) |	((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
+
+#define OCXL_ATSD_TIMEOUT		1
+
+/* 5.9.3.3 TLB Management Instructions - PowerISA tags workbook */
+#define XTS_ATSD_LNCH		0x00
+#define   XTS_ATSD_LNCH_R	PPC_BIT(0)		/* Radix Invalidate */
+#define   XTS_ATSD_LNCH_RIC	PPC_BITMASK(1,2)	/* Radix Invalidation Control
+							 * 0b00 Just invalidate TLB.
+							 * 0b01 Invalidate just Page Walk Cache.
+							 * 0b10 Invalidate TLB, Page Walk Cache, and any
+							 * caching of Partition and Process Table Entries.
+							 */
+#define   XTS_ATSD_LNCH_LP	PPC_BITMASK(3, 10)	/* Number and Page Size of translations to be invalidated (HPT only ?) */
+#define   XTS_ATSD_LNCH_IS	PPC_BITMASK(11, 12)	/* Invalidation Criteria
+							 * 0b00 Invalidate just the target VA.
+							 * 0b01 Invalidate matching PID.
+							 */
+#define   XTS_ATSD_LNCH_PRS	PPC_BIT(13)		/* 0b1: Process Scope, 0b0: Partition Scope */
+#define   XTS_ATSD_LNCH_B	PPC_BIT(14)		/* Invalidation Flag */
+#define   XTS_ATSD_LNCH_AP	PPC_BITMASK(15, 17)	/* Actual Page Size to be invalidated
+							 * 000 4KB
+							 * 101 64KB
+							 * 001 2MB
+							 * 010 1GB
+							 */
+#define   XTS_ATSD_LNCH_L	PPC_BIT(18)		/* Defines the large page select (L=0b0 for 4KB pages, L=0b1 for large pages) */
+#define   XTS_ATSD_LNCH_PID	PPC_BITMASK(19, 38)	/* Process ID */
+#define   XTS_ATSD_LNCH_F	PPC_BIT(39)		/* NoFlush – Assumed to be 0b0 */
+#define   XTS_ATSD_LNCH_OCAPI_SLBI	PPC_BIT(40)
+#define   XTS_ATSD_LNCH_OCAPI_SINGLETON	PPC_BIT(41)
+#define XTS_ATSD_AVA		0x08
+#define   XTS_ATSD_AVA_AVA	PPC_BITMASK(0, 51) /* instead of 35 */
+#define XTS_ATSD_STAT		0x10
+
 #define MAX_IRQ_PER_LINK	2000
 #define MAX_IRQ_PER_CONTEXT	MAX_IRQ_PER_LINK
 
@@ -84,7 +126,9 @@ struct ocxl_context {
 
 struct ocxl_process_element {
 	__be64 config_state;
-	__be32 reserved1[11];
+	__be32 pasid;
+	__be32 bdf;
+	__be32 reserved1[9];
 	__be32 lpid;
 	__be32 tid;
 	__be32 pid;
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index 17e21cb2addd..6171069d071a 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -8,6 +8,131 @@
 
 #include <linux/tracepoint.h>
 
+
+TRACE_EVENT(ocxl_mmu_notifier_range,
+	TP_PROTO(unsigned long start, unsigned long end, unsigned long pidr),
+	TP_ARGS(start, end, pidr),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->start = start;
+		__entry->end = end;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("start=0x%lx end=0x%lx pidr=0x%lx",
+		__entry->start,
+		__entry->end,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_mmu_notifier_mmio_atsd_ava,
+	TP_PROTO(u64 val, unsigned long pidr),
+	TP_ARGS(val, pidr),
+
+	TP_STRUCT__entry(
+		__field(u64, val)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->val = val;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("ATSD AVA: 0x%llx pidr=0x%lx",
+		__entry->val, __entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_mmu_notifier_mmio_atsd_lnch,
+	TP_PROTO(u64 val, unsigned long addr, unsigned long pidr),
+	TP_ARGS(val, addr, pidr),
+
+	TP_STRUCT__entry(
+		__field(u64, val)
+		__field(unsigned long, addr)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->val = val;
+		__entry->addr = addr;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("ATSD LNCH: 0x%llx addr=0x%lx pidr=0x%lx",
+		__entry->val, __entry->addr, __entry->pidr
+	)
+);
+
+TRACE_EVENT(ocxl_mmu_notifier_mmio_atsd_stat,
+	TP_PROTO(u64 val, unsigned long addr, unsigned long pidr),
+	TP_ARGS(val, addr, pidr),
+
+	TP_STRUCT__entry(
+		__field(u64, val)
+		__field(unsigned long, addr)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->val = val;
+		__entry->addr = addr;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("ATSD STAT: 0x%llx addr=0x%lx pidr=0x%lx",
+		__entry->val, __entry->addr, __entry->pidr
+	)
+);
+
+TRACE_EVENT(init_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
+TRACE_EVENT(release_mmu_notifier,
+	TP_PROTO(int pasid, unsigned long pidr),
+	TP_ARGS(pasid, pidr),
+
+	TP_STRUCT__entry(
+		__field(int, pasid)
+		__field(unsigned long, pidr)
+	),
+
+	TP_fast_assign(
+		__entry->pasid = pasid;
+		__entry->pidr = pidr;
+	),
+
+	TP_printk("pasid=%d, pidr=0x%lx",
+		__entry->pasid,
+		__entry->pidr
+	)
+);
+
 DECLARE_EVENT_CLASS(ocxl_context,
 	TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
 	TP_ARGS(pid, spa, pasid, pidr, tidr),
-- 
2.28.0



* Re: [PATCH net-next 01/12] ibmvnic: Ensure that subCRQ entry reads are ordered
From: Brian King @ 2020-11-13 16:14 UTC (permalink / raw)
  To: Thomas Falcon, netdev
  Cc: cforno12, ljp, ricklind, dnbanerg, drt, sukadev, linuxppc-dev
In-Reply-To: <1605208207-1896-2-git-send-email-tlfalcon@linux.ibm.com>

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



* Re: [PATCH net-next 02/12] ibmvnic: Introduce indirect subordinate Command Response Queue buffer
From: Brian King @ 2020-11-13 16:17 UTC (permalink / raw)
  To: Thomas Falcon, netdev
  Cc: cforno12, ljp, ricklind, dnbanerg, drt, sukadev, linuxppc-dev
In-Reply-To: <1605208207-1896-3-git-send-email-tlfalcon@linux.ibm.com>

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>


-- 
Brian King
Power Linux I/O
IBM Linux Technology Center



* Re: [PATCH] arch: pgtable: define MAX_POSSIBLE_PHYSMEM_BITS where needed
From: Thomas Bogendoerfer @ 2020-11-13 15:53 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arch, linux-snps-arc, Arnd Bergmann, Minchan Kim,
	Vineet Gupta, Paul Walmsley, Russell King, Stefan Agner,
	linux-mips, linux-mm, Albert Ou, Paul Mackerras,
	Kirill A . Shutemov, Palmer Dabbelt, linux-riscv, Nitin Gupta,
	linuxppc-dev, Mike Rapoport, linux-arm-kernel
In-Reply-To: <20201113145932.10994-1-arnd@kernel.org>

On Fri, Nov 13, 2020 at 03:59:32PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> Stefan Agner reported a bug when using zsram on 32-bit Arm machines
> with RAM above the 4GB address boundary:
> 
>   Unable to handle kernel NULL pointer dereference at virtual address 00000000
>   pgd = a27bd01c
>   [00000000] *pgd=236a0003, *pmd=1ffa64003
>   Internal error: Oops: 207 [#1] SMP ARM
>   Modules linked in: mdio_bcm_unimac(+) brcmfmac cfg80211 brcmutil raspberrypi_hwmon hci_uart crc32_arm_ce bcm2711_thermal phy_generic genet
>   CPU: 0 PID: 123 Comm: mkfs.ext4 Not tainted 5.9.6 #1
>   Hardware name: BCM2711
>   PC is at zs_map_object+0x94/0x338
>   LR is at zram_bvec_rw.constprop.0+0x330/0xa64
>   pc : [<c0602b38>]    lr : [<c0bda6a0>]    psr: 60000013
>   sp : e376bbe0  ip : 00000000  fp : c1e2921c
>   r10: 00000002  r9 : c1dda730  r8 : 00000000
>   r7 : e8ff7a00  r6 : 00000000  r5 : 02f9ffa0  r4 : e3710000
>   r3 : 000fdffe  r2 : c1e0ce80  r1 : ebf979a0  r0 : 00000000
>   Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
>   Control: 30c5383d  Table: 235c2a80  DAC: fffffffd
>   Process mkfs.ext4 (pid: 123, stack limit = 0x495a22e6)
>   Stack: (0xe376bbe0 to 0xe376c000)
> 
> As it turns out, zsram needs to know the maximum memory size, which
> is defined in MAX_PHYSMEM_BITS when CONFIG_SPARSEMEM is set, or in
> MAX_POSSIBLE_PHYSMEM_BITS on the x86 architecture.
> 
> The same problem will be hit on all 32-bit architectures that have a
> physical address space larger than 4GB and happen to not enable sparsemem
> and include asm/sparsemem.h from asm/pgtable.h.
> 
> After the initial discussion, I suggested just always defining
> MAX_POSSIBLE_PHYSMEM_BITS whenever CONFIG_PHYS_ADDR_T_64BIT is
> set, or provoking a build error otherwise. This addresses all
> configurations that can currently have this runtime bug, but
> leaves all other configurations unchanged.
> 
> I looked up the possible number of bits in source code and
> datasheets, here is what I found:
> 
>  - on ARC, CONFIG_ARC_HAS_PAE40 controls whether 32 or 40 bits are used
>  - on ARM, CONFIG_LPAE enables 40 bit addressing, without it we never
>    support more than 32 bits, even though supersections in theory allow
>    up to 40 bits as well.
>  - on MIPS, some MIPS32r1 or later chips support 36 bits, and MIPS32r5
>    XPA supports up to 60 bits in theory, but 40 bits are more than
>    anyone will ever ship
>  - On PowerPC, there are three different implementations of 36 bit
>    addressing, but 32-bit is used without CONFIG_PTE_64BIT
>  - On RISC-V, the normal page table format can support 34 bit
>    addressing. There is no highmem support on RISC-V, so anything
>    above 2GB is unused, but it might be useful to eventually support
>    CONFIG_ZRAM for high pages.
> 
> Fixes: 61989a80fb3a ("staging: zsmalloc: zsmalloc memory allocation library")
> Fixes: 02390b87a945 ("mm/zsmalloc: Prepare to variable MAX_PHYSMEM_BITS")
> Cc: Stefan Agner <stefan@agner.ch>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Nitin Gupta <ngupta@vflare.org>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Vineet Gupta <vgupta@synopsys.com>
> Cc: linux-snps-arc@lists.infradead.org
> Cc: Russell King <linux@armlinux.org.uk>
> Cc: linux-arm-kernel@lists.infradead.org
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Cc: linux-mips@vger.kernel.org
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Cc: Paul Mackerras <paulus@samba.org>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Paul Walmsley <paul.walmsley@sifive.com>
> Cc: Palmer Dabbelt <palmer@dabbelt.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: linux-riscv@lists.infradead.org
> Link: https://lore.kernel.org/linux-mm/bdfa44bf1c570b05d6c70898e2bbb0acf234ecdf.1604762181.git.stefan@agner.ch/
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> If everyone is happy with this version, I would suggest merging this as
> a bugfix through my asm-generic tree for linux-5.10. I originally
> said I'd send individual patches for each architecture tree, but
> I now think this is easier and better documents what is going on.
> ---
>  arch/mips/include/asm/pgtable-32.h           |  3 +++

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]
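
For a sense of scale, the address widths listed in the quoted commit
message map to the sizes below. This is only a back-of-the-envelope
sketch; phys_gib is a hypothetical helper name and is not part of the
kernel tree.

```sh
#!/bin/sh
# Illustrative only: how much physical memory a given number of
# address bits can cover. The bit counts are taken from the list in
# the quoted commit message; phys_gib is a made-up helper.
phys_gib() {
	bits=$1
	# 2^bits bytes expressed in GiB (2^30 bytes); valid for bits >= 30
	echo $(( 1 << (bits - 30) ))
}

for bits in 32 34 36 40; do
	printf '%s bits -> %s GiB\n' "$bits" "$(phys_gib "$bits")"
done
```

The 32-bit row is the 4 GiB boundary mentioned in the commit message;
every wider configuration is one where the limit has to be stated
explicitly.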


* Re: [PATCH 1/6] ibmvfc: byte swap login_buf.resp values in attribute show functions
From: Tyrel Datwyler @ 2020-11-13 19:39 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: martin.petersen, linux-scsi, linux-kernel, james.bottomley,
	brking, linuxppc-dev
In-Reply-To: <20201112093752.GA24235@infradead.org>

On 11/12/20 1:37 AM, Christoph Hellwig wrote:
> On Wed, Nov 11, 2020 at 07:04:37PM -0600, Tyrel Datwyler wrote:
>> Both ibmvfc_show_host_(capabilities|npiv_version) functions retrieve
>> values from vhost->login_buf.resp buffer. This is the MAD response
>> buffer from the VIOS and as such any multi-byte non-string values are in
>> big endian format.
>>
>> Byte swap these values to host cpu endian format for better human
>> readability.
> 
> The whole series creates tons of pointlessly over 80 char lines.
> Please do a quick fixup.
> 

The checkpatch script only warns at 100 char lines these days. To be fair,
though, I did have two lines go over that limit by a couple of characters,
there are a couple of commit log typos, and I had an if keyword with no space
before the opening parenthesis. So, I'll happily re-spin.

However, for my info going forward, is the SCSI subsystem sticking to 80 char
lines as a hard limit?

-Tyrel
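
A quick way to check a patch against either limit before re-spinning
could look like the sketch below. It is not what checkpatch does
internally; long_added_lines is a hypothetical helper, and the only
assumption borrowed from kernel style is that a tab counts as 8 columns.

```sh
#!/bin/sh
# long_added_lines LIMIT < patch
# Hypothetical helper: report lines added by a patch that are wider
# than LIMIT columns once tabs are expanded to 8 columns.
long_added_lines() {
	limit=$1
	# Keep added lines only ('+' prefix), skip the '+++ b/file' header
	# (this also skips added lines whose content starts with '++'),
	# strip the leading '+', expand tabs, then print over-long lines.
	grep '^+' | grep -v '^+++' | sed 's/^+//' | expand -t 8 |
		awk -v limit="$limit" \
			'length($0) > limit { print length($0) ": " $0 }'
}
```

Usage would be along the lines of "long_added_lines 80 < 0001-foo.patch",
where the patch file name is a placeholder.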


* Re: Error: invalid switch -me200
From: Nick Desaulniers @ 2020-11-13 19:42 UTC (permalink / raw)
  To: Nathan Chancellor, Michael Ellerman
  Cc: kbuild-all, kernel test robot, Fāng-ruì Sòng,
	Masahiro Yamada, LKML, clang-built-linux, linuxppc-dev
In-Reply-To: <20201113190824.GA1477315@ubuntu-m3-large-x86>

+ MPE, PPC

On Fri, Nov 13, 2020 at 11:08 AM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> On Fri, Nov 13, 2020 at 09:28:03AM -0800, Fāng-ruì Sòng wrote:
> > On Thu, Nov 12, 2020 at 7:22 PM kernel test robot <lkp@intel.com> wrote:
> > >
> > > Hi Fangrui,
> > >
> > > FYI, the error/warning still remains.
> > >
> > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > head:   585e5b17b92dead8a3aca4e3c9876fbca5f7e0ba
> > > commit: ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51 Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation
> > > date:   4 months ago
> > > config: powerpc-randconfig-r031-20201113 (attached as .config)

^ randconfig

> > > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 9e0c35655b6e8186baef8840b26ba4090503b554)
> > > reproduce (this is a W=1 build):
> > >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> > >         chmod +x ~/bin/make.cross
> > >         # install powerpc cross compiling tool for clang build
> > >         # apt-get install binutils-powerpc-linux-gnu
> > >         # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
> > >         git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > >         git fetch --no-tags linus master
> > >         git checkout ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
> > >         # save the attached .config to linux build tree
> > >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc
> > >
> > > If you fix the issue, kindly add following tag as appropriate
> > > Reported-by: kernel test robot <lkp@intel.com>
> > >
> > > All errors (new ones prefixed by >>):
> > >
> > >    Assembler messages:
> > > >> Error: invalid switch -me200
> > > >> Error: unrecognized option -me200
> > >    clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
> > >    make[2]: *** [scripts/Makefile.build:281: scripts/mod/empty.o] Error 1
> > >    make[2]: Target '__build' not remade because of errors.
> > >    make[1]: *** [Makefile:1174: prepare0] Error 2
> > >    make[1]: Target 'prepare' not remade because of errors.
> > >    make: *** [Makefile:185: __sub-make] Error 2
> > >    make: Target 'prepare' not remade because of errors.
> > >
> > > ---
> > > 0-DAY CI Kernel Test Service, Intel Corporation
> > > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> >
> > This can be ignored. The LLVM integrated assembler does not recognize
> > -me200 (-Wa,-me200 in arch/powerpc/Makefile). I guess the GNU as -m
> > option is similar to .arch or .machine and controls what instructions
> > are recognized. The integrated assembler tends to support all
> > instructions (conditionally supporting some instructions has some
> > challenges; in the end I patched it to parse but ignore `.arch` for
> > x86-64 and `.machine ppc64` for ppc64).
> >
> > (In addition, e200 is a 32-bit Power ISA microprocessor. 32-bit
> > support may get less attention in LLVM.)
>
> This is also not a clang specific issue, I see the exact same error
> with GCC 10.2.0 and binutils 2.35.
>
> $ make -skj64 ARCH=powerpc CROSS_COMPILE=powerpc64-linux- olddefconfig vmlinux

Does using a non 64b triple produce the same failure?

> ...
> Error: invalid switch -me200
> Error: unrecognized option -me200

There's a block in arch/powerpc/Makefile:
cpu-as-$(CONFIG_40x)     += -Wa,-m405
cpu-as-$(CONFIG_44x)     += -Wa,-m440
cpu-as-$(CONFIG_ALTIVEC) += $(call as-option,-Wa$(comma)-maltivec)
cpu-as-$(CONFIG_E200)    += -Wa,-me200
cpu-as-$(CONFIG_E500)    += -Wa,-me500

Are those all broken configs, or is Kconfig messed up such that
randconfig can select these when it should not?
-- 
Thanks,
~Nick Desaulniers


* [PATCH 2/2] kbuild: Disable CONFIG_LD_ORPHAN_WARN for ld.lld 10.0.1
From: Nathan Chancellor @ 2020-11-13 19:55 UTC (permalink / raw)
  To: Masahiro Yamada, Michal Marek, Kees Cook
  Cc: linuxppc-dev, kernelci . org bot, linux-kbuild, Catalin Marinas,
	Mark Brown, x86, Nick Desaulniers, Russell King, linux-kernel,
	clang-built-linux, Arvind Sankar, Ingo Molnar, Borislav Petkov,
	Thomas Gleixner, Will Deacon, Nathan Chancellor, linux-arm-kernel
In-Reply-To: <20201113195553.1487659-1-natechancellor@gmail.com>

ld.lld 10.0.1 spews a bunch of various warnings about .rela sections,
along with a few others. Newer versions of ld.lld do not have these
warnings. As a result, do not add '--orphan-handling=warn' to
LDFLAGS_vmlinux if ld.lld's version is not new enough.

Reported-by: Arvind Sankar <nivedita@alum.mit.edu>
Reported-by: kernelci.org bot <bot@kernelci.org>
Reported-by: Mark Brown <broonie@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Link: https://github.com/ClangBuiltLinux/linux/issues/1193
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 MAINTAINERS            |  1 +
 init/Kconfig           |  6 +++++-
 scripts/lld-version.sh | 20 ++++++++++++++++++++
 3 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100755 scripts/lld-version.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 3da6d8c154e4..4b83d3591ec7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4284,6 +4284,7 @@ B:	https://github.com/ClangBuiltLinux/linux/issues
 C:	irc://chat.freenode.net/clangbuiltlinux
 F:	Documentation/kbuild/llvm.rst
 F:	scripts/clang-tools/
+F:	scripts/lld-version.sh
 K:	\b(?i:clang|llvm)\b
 
 CLEANCACHE API
diff --git a/init/Kconfig b/init/Kconfig
index a270716562de..40c9ca60ac1d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -47,6 +47,10 @@ config CLANG_VERSION
 	int
 	default $(shell,$(srctree)/scripts/clang-version.sh $(CC))
 
+config LLD_VERSION
+	int
+	default $(shell,$(srctree)/scripts/lld-version.sh $(LD))
+
 config CC_CAN_LINK
 	bool
 	default $(success,$(srctree)/scripts/cc-can-link.sh $(CC) $(CLANG_FLAGS) $(m64-flag)) if 64BIT
@@ -1349,7 +1353,7 @@ config LD_DEAD_CODE_DATA_ELIMINATION
 	  own risk.
 
 config LD_ORPHAN_WARN
-	def_bool ARCH_WANT_LD_ORPHAN_WARN && $(ld-option,--orphan-handling=warn)
+	def_bool ARCH_WANT_LD_ORPHAN_WARN && $(ld-option,--orphan-handling=warn) && (!LD_IS_LLD || LLD_VERSION >= 110000)
 
 config SYSCTL
 	bool
diff --git a/scripts/lld-version.sh b/scripts/lld-version.sh
new file mode 100755
index 000000000000..cc779f412e39
--- /dev/null
+++ b/scripts/lld-version.sh
@@ -0,0 +1,20 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# ld.lld-version ld.lld-command
+#
+# Print the linker version of `ld.lld-command' in a 5 or 6-digit form
+# such as `100001' for ld.lld 10.0.1 etc.
+
+linker="$*"
+
+if ! ( $linker --version | grep -q LLD ); then
+	echo 0
+	exit 1
+fi
+
+VERSION=$($linker --version | cut -d ' ' -f 2)
+MAJOR=$(echo $VERSION | cut -d . -f 1)
+MINOR=$(echo $VERSION | cut -d . -f 2)
+PATCHLEVEL=$(echo $VERSION | cut -d . -f 3)
+printf "%d%02d%02d\\n" $MAJOR $MINOR $PATCHLEVEL
-- 
2.29.2
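
The version gate in this patch depends on the numeric encoding that
scripts/lld-version.sh prints. The sketch below reproduces only that
encoding and the ">= 110000" comparison from init/Kconfig; it takes the
version string directly instead of parsing "ld.lld --version", and the
function names encode_version and orphan_warn_ok are hypothetical.

```sh
#!/bin/sh
# encode_version MAJOR.MINOR.PATCH
# Prints MAJOR, then MINOR and PATCH zero-padded to two digits each,
# the same format string the patch uses.
encode_version() {
	major=$(echo "$1" | cut -d . -f 1)
	minor=$(echo "$1" | cut -d . -f 2)
	patch=$(echo "$1" | cut -d . -f 3)
	printf '%d%02d%02d\n' "$major" "$minor" "$patch"
}

# orphan_warn_ok VERSION
# Succeeds when the encoded version meets the threshold used in the
# LD_ORPHAN_WARN condition (LLD_VERSION >= 110000).
orphan_warn_ok() {
	[ "$(encode_version "$1")" -ge 110000 ]
}

for v in 10.0.1 11.0.0; do
	if orphan_warn_ok "$v"; then
		echo "ld.lld $v: new enough for --orphan-handling=warn"
	else
		echo "ld.lld $v: too old, warning stays off"
	fi
done
```

With this encoding 10.0.1 becomes 100001, which is below the threshold,
and 11.0.0 becomes 110000, which meets it.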



* [PATCH 1/2] kbuild: Hoist '--orphan-handling' into Kconfig
From: Nathan Chancellor @ 2020-11-13 19:55 UTC (permalink / raw)
  To: Masahiro Yamada, Michal Marek, Kees Cook
  Cc: linuxppc-dev, linux-kbuild, Catalin Marinas, x86,
	Nick Desaulniers, Russell King, linux-kernel, clang-built-linux,
	Arvind Sankar, Ingo Molnar, Borislav Petkov, Thomas Gleixner,
	Will Deacon, Nathan Chancellor, linux-arm-kernel

Currently, '--orphan-handling=warn' is spread out across four different
architectures in their respective Makefiles, which makes it a little
unruly to deal with in case it needs to be disabled for a specific
linker version (in this case, ld.lld 10.0.1).

To make it easier to control this, hoist this warning into Kconfig and
the main Makefile so that disabling it is simpler, as the warning will
only be enabled in a couple places (main Makefile and a couple of
compressed boot folders that blow away LDFLAGS_vmlinux) and making it
conditional is easier due to Kconfig syntax. One small additional
benefit of this is saving a call to ld-option on incremental builds
because we will have already evaluated it for CONFIG_LD_ORPHAN_WARN.

To keep the list of supported architectures the same, introduce
CONFIG_ARCH_WANT_LD_ORPHAN_WARN, which an architecture can select to
gain this automatically after all of the sections are specified and size
asserted. A special thanks to Kees Cook for the help text on this
config.

Link: https://github.com/ClangBuiltLinux/linux/issues/1187
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
---
 Makefile                          | 6 ++++++
 arch/Kconfig                      | 9 +++++++++
 arch/arm/Kconfig                  | 1 +
 arch/arm/Makefile                 | 4 ----
 arch/arm/boot/compressed/Makefile | 4 +++-
 arch/arm64/Kconfig                | 1 +
 arch/arm64/Makefile               | 4 ----
 arch/powerpc/Kconfig              | 1 +
 arch/powerpc/Makefile             | 1 -
 arch/x86/Kconfig                  | 1 +
 arch/x86/Makefile                 | 3 ---
 arch/x86/boot/compressed/Makefile | 4 +++-
 init/Kconfig                      | 3 +++
 13 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/Makefile b/Makefile
index 008aba5f1a20..c443afd61886 100644
--- a/Makefile
+++ b/Makefile
@@ -984,6 +984,12 @@ ifeq ($(CONFIG_RELR),y)
 LDFLAGS_vmlinux	+= --pack-dyn-relocs=relr
 endif
 
+# We never want expected sections to be placed heuristically by the
+# linker. All sections should be explicitly named in the linker script.
+ifeq ($(CONFIG_LD_ORPHAN_WARN),y)
+LDFLAGS_vmlinux += --orphan-handling=warn
+endif
+
 # Align the bit size of userspace programs with the kernel
 KBUILD_USERCFLAGS  += $(filter -m32 -m64 --target=%, $(KBUILD_CFLAGS))
 KBUILD_USERLDFLAGS += $(filter -m32 -m64 --target=%, $(KBUILD_CFLAGS))
diff --git a/arch/Kconfig b/arch/Kconfig
index 56b6ccc0e32d..ba4e966484ab 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1028,6 +1028,15 @@ config HAVE_STATIC_CALL_INLINE
 	bool
 	depends on HAVE_STATIC_CALL
 
+config ARCH_WANT_LD_ORPHAN_WARN
+	bool
+	help
+	  An arch should select this symbol once all linker sections are explicitly
+	  included, size-asserted, or discarded in the linker scripts. This is
+	  important because we never want expected sections to be placed heuristically
+	  by the linker, since the locations of such sections can change between linker
+	  versions.
+
 source "kernel/gcov/Kconfig"
 
 source "scripts/gcc-plugins/Kconfig"
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index fe2f17eb2b50..002e0cf025f5 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -35,6 +35,7 @@ config ARM
 	select ARCH_USE_CMPXCHG_LOCKREF
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT if MMU
 	select ARCH_WANT_IPC_PARSE_VERSION
+	select ARCH_WANT_LD_ORPHAN_WARN
 	select BINFMT_FLAT_ARGVP_ENVP_ON_STACK
 	select BUILDTIME_TABLE_SORT if MMU
 	select CLONE_BACKWARDS
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 4d76eab2b22d..e15f76ca2887 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -16,10 +16,6 @@ LDFLAGS_vmlinux	+= --be8
 KBUILD_LDFLAGS_MODULE	+= --be8
 endif
 
-# We never want expected sections to be placed heuristically by the
-# linker. All sections should be explicitly named in the linker script.
-LDFLAGS_vmlinux += $(call ld-option, --orphan-handling=warn)
-
 GZFLAGS		:=-9
 #KBUILD_CFLAGS	+=-pipe
 
diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
index 47f001ca5499..c6f9f3b61c5f 100644
--- a/arch/arm/boot/compressed/Makefile
+++ b/arch/arm/boot/compressed/Makefile
@@ -129,7 +129,9 @@ LDFLAGS_vmlinux += --no-undefined
 # Delete all temporary local symbols
 LDFLAGS_vmlinux += -X
 # Report orphan sections
-LDFLAGS_vmlinux += $(call ld-option, --orphan-handling=warn)
+ifeq ($(CONFIG_LD_ORPHAN_WARN),y)
+LDFLAGS_vmlinux += --orphan-handling=warn
+endif
 # Next argument is a linker script
 LDFLAGS_vmlinux += -T
 
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1515f6f153a0..a6b5b7ef40ae 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -81,6 +81,7 @@ config ARM64
 	select ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT
 	select ARCH_WANT_FRAME_POINTERS
 	select ARCH_WANT_HUGE_PMD_SHARE if ARM64_4K_PAGES || (ARM64_16K_PAGES && !ARM64_VA_BITS_36)
+	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARM_AMBA
 	select ARM_ARCH_TIMER
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 5789c2d18d43..6a87d592bd00 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -28,10 +28,6 @@ LDFLAGS_vmlinux	+= --fix-cortex-a53-843419
   endif
 endif
 
-# We never want expected sections to be placed heuristically by the
-# linker. All sections should be explicitly named in the linker script.
-LDFLAGS_vmlinux += $(call ld-option, --orphan-handling=warn)
-
 ifeq ($(CONFIG_ARM64_USE_LSE_ATOMICS), y)
   ifneq ($(CONFIG_ARM64_LSE_ATOMICS), y)
 $(warning LSE atomics not supported by binutils)
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9f13fe08492..5181872f9452 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -152,6 +152,7 @@ config PPC
 	select ARCH_USE_QUEUED_SPINLOCKS	if PPC_QUEUED_SPINLOCKS
 	select ARCH_WANT_IPC_PARSE_VERSION
 	select ARCH_WANT_IRQS_OFF_ACTIVATE_MM
+	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WEAK_RELEASE_ACQUIRE
 	select BINFMT_ELF
 	select BUILDTIME_TABLE_SORT
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index a4d56f0a41d9..d9eb0da845e1 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -123,7 +123,6 @@ endif
 LDFLAGS_vmlinux-y := -Bstatic
 LDFLAGS_vmlinux-$(CONFIG_RELOCATABLE) := -pie
 LDFLAGS_vmlinux	:= $(LDFLAGS_vmlinux-y)
-LDFLAGS_vmlinux += $(call ld-option,--orphan-handling=warn)
 
 ifdef CONFIG_PPC64
 ifeq ($(call cc-option-yn,-mcmodel=medium),y)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f6946b81f74a..fbf26e0f7a6a 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -100,6 +100,7 @@ config X86
 	select ARCH_WANT_DEFAULT_BPF_JIT	if X86_64
 	select ARCH_WANTS_DYNAMIC_TASK_STRUCT
 	select ARCH_WANT_HUGE_PMD_SHARE
+	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_THP_SWAP		if X86_64
 	select BUILDTIME_TABLE_SORT
 	select CLKEVT_I8253
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 154259f18b8b..1bf21746f4ce 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -209,9 +209,6 @@ ifdef CONFIG_X86_64
 LDFLAGS_vmlinux += -z max-page-size=0x200000
 endif
 
-# We never want expected sections to be placed heuristically by the
-# linker. All sections should be explicitly named in the linker script.
-LDFLAGS_vmlinux += $(call ld-option, --orphan-handling=warn)
 
 archscripts: scripts_basic
 	$(Q)$(MAKE) $(build)=arch/x86/tools relocs
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index ee249088cbfe..fa1c9f83436c 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -61,7 +61,9 @@ KBUILD_LDFLAGS += $(call ld-option,--no-ld-generated-unwind-info)
 # Compressed kernel should be built as PIE since it may be loaded at any
 # address by the bootloader.
 LDFLAGS_vmlinux := -pie $(call ld-option, --no-dynamic-linker)
-LDFLAGS_vmlinux += $(call ld-option, --orphan-handling=warn)
+ifeq ($(CONFIG_LD_ORPHAN_WARN),y)
+LDFLAGS_vmlinux += --orphan-handling=warn
+endif
 LDFLAGS_vmlinux += -T
 
 hostprogs	:= mkpiggy
diff --git a/init/Kconfig b/init/Kconfig
index c9446911cf41..a270716562de 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1348,6 +1348,9 @@ config LD_DEAD_CODE_DATA_ELIMINATION
 	  present. This option is not well tested yet, so use at your
 	  own risk.
 
+config LD_ORPHAN_WARN
+	def_bool ARCH_WANT_LD_ORPHAN_WARN && $(ld-option,--orphan-handling=warn)
+
 config SYSCTL
 	bool
 

base-commit: f8394f232b1eab649ce2df5c5f15b0e528c92091
-- 
2.29.2
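
The resulting LD_ORPHAN_WARN condition is a conjunction of an
architecture opt-in and a linker probe. The sketch below models that
logic in shell. It assumes the Kconfig ld-option macro amounts to
running the linker with -v plus the flag and checking the exit status;
treat that as an assumption about scripts/Kconfig.include, not a
quotation. ld_supports and ld_orphan_warn are hypothetical names.

```sh
#!/bin/sh
# ld_supports LINKER FLAG
# Prints y if "LINKER -v FLAG" exits successfully, n otherwise.
# $linker is left unquoted on purpose so a command with arguments works.
ld_supports() {
	linker=$1
	flag=$2
	if $linker -v "$flag" >/dev/null 2>&1; then
		echo y
	else
		echo n
	fi
}

# ld_orphan_warn ARCH_WANTS LINKER
# Models: def_bool ARCH_WANT_LD_ORPHAN_WARN && $(ld-option,...)
ld_orphan_warn() {
	arch_wants=$1
	linker=$2
	if [ "$arch_wants" = y ] &&
	   [ "$(ld_supports "$linker" --orphan-handling=warn)" = y ]; then
		echo y
	else
		echo n
	fi
}

# Probe whatever linker is configured; the answer depends on the host.
ld_orphan_warn y "${LD:-ld}"
```

Because the probe result ends up in the config symbol, the Makefiles
only need to test CONFIG_LD_ORPHAN_WARN, which is the saving on
incremental builds that the commit message mentions.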



* Re: Error: invalid switch -me200
From: Nathan Chancellor @ 2020-11-13 20:04 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: kbuild-all, kernel test robot, Fāng-ruì Sòng,
	Masahiro Yamada, LKML, clang-built-linux, linuxppc-dev
In-Reply-To: <CAKwvOdkEtTQhDRFRV_d66FyhQBe536vRbOW=fQjesiHz3dfeBA@mail.gmail.com>

On Fri, Nov 13, 2020 at 11:42:03AM -0800, Nick Desaulniers wrote:
> + MPE, PPC
> 
> On Fri, Nov 13, 2020 at 11:08 AM Nathan Chancellor
> <natechancellor@gmail.com> wrote:
> >
> > On Fri, Nov 13, 2020 at 09:28:03AM -0800, Fāng-ruì Sòng wrote:
> > > On Thu, Nov 12, 2020 at 7:22 PM kernel test robot <lkp@intel.com> wrote:
> > > >
> > > > Hi Fangrui,
> > > >
> > > > FYI, the error/warning still remains.
> > > >
> > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > > head:   585e5b17b92dead8a3aca4e3c9876fbca5f7e0ba
> > > > commit: ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51 Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation
> > > > date:   4 months ago
> > > > config: powerpc-randconfig-r031-20201113 (attached as .config)
> 
> ^ randconfig
> 
> > > > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 9e0c35655b6e8186baef8840b26ba4090503b554)
> > > > reproduce (this is a W=1 build):
> > > >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> > > >         chmod +x ~/bin/make.cross
> > > >         # install powerpc cross compiling tool for clang build
> > > >         # apt-get install binutils-powerpc-linux-gnu
> > > >         # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
> > > >         git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > >         git fetch --no-tags linus master
> > > >         git checkout ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
> > > >         # save the attached .config to linux build tree
> > > >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc
> > > >
> > > > If you fix the issue, kindly add following tag as appropriate
> > > > Reported-by: kernel test robot <lkp@intel.com>
> > > >
> > > > All errors (new ones prefixed by >>):
> > > >
> > > >    Assembler messages:
> > > > >> Error: invalid switch -me200
> > > > >> Error: unrecognized option -me200
> > > >    clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
> > > >    make[2]: *** [scripts/Makefile.build:281: scripts/mod/empty.o] Error 1
> > > >    make[2]: Target '__build' not remade because of errors.
> > > >    make[1]: *** [Makefile:1174: prepare0] Error 2
> > > >    make[1]: Target 'prepare' not remade because of errors.
> > > >    make: *** [Makefile:185: __sub-make] Error 2
> > > >    make: Target 'prepare' not remade because of errors.
> > > >
> > > > ---
> > > > 0-DAY CI Kernel Test Service, Intel Corporation
> > > > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> > >
> > > This can be ignored. The LLVM integrated assembler does not recognize
> > > -me200 (-Wa,-me200 in arch/powerpc/Makefile). I guess the GNU as -m
> > > option is similar to .arch or .machine and controls what instructions
> > > are recognized. The integrated assembler tends to support all
> > > instructions (conditionally supporting some instructions has some
> > > challenges; in the end I patched it to parse but ignore `.arch` for
> > > x86-64 and `.machine ppc64` for ppc64).
> > >
> > > (In addition, e200 is a 32-bit Power ISA microprocessor. 32-bit
> > > support may get less attention in LLVM.)
> >
> > This is also not a clang specific issue, I see the exact same error
> > with GCC 10.2.0 and binutils 2.35.
> >
> > $ make -skj64 ARCH=powerpc CROSS_COMPILE=powerpc64-linux- olddefconfig vmlinux
> 
> Does using a non 64b triple produce the same failure?

Yes, CROSS_COMPILE=powerpc-linux- produces the same failure.

> > ...
> > Error: invalid switch -me200
> > Error: unrecognized option -me200
> 
> There's a block in arch/powerpc/Makefile:
> cpu-as-$(CONFIG_40x)     += -Wa,-m405
> cpu-as-$(CONFIG_44x)     += -Wa,-m440
> cpu-as-$(CONFIG_ALTIVEC) += $(call as-option,-Wa$(comma)-maltivec)
> cpu-as-$(CONFIG_E200)    += -Wa,-me200
> cpu-as-$(CONFIG_E500)    += -Wa,-me500
> 
> Are those all broken configs, or is Kconfig messed up such that
> randconfig can select these when it should not?

Hmmm, looks like this flag does not exist in mainline binutils? There is
a thread in 2010 about this that Segher commented on:

https://lore.kernel.org/linuxppc-dev/9859E645-954D-4D07-8003-FFCD2391AB6E@kernel.crashing.org/

Guess this config should be eliminated?

Cheers,
Nathan
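
In the quoted Makefile block, -maltivec goes through as-option while
-me200 is added unconditionally, which is why an assembler without that
switch fails outright. A rough shell equivalent of such a probe is
sketched below. It mirrors the idea of kbuild's as-option as I
understand it (assemble an empty input with the flag and look at the
exit status) and is not the kbuild implementation; as_accepts is a
hypothetical name.

```sh
#!/bin/sh
# as_accepts COMPILER FLAG
# Prints y if COMPILER can assemble an empty input with FLAG, n if not.
# $compiler is left unquoted on purpose so "ccache gcc" style works.
as_accepts() {
	compiler=$1
	flag=$2
	tmp=$(mktemp) || return 1
	if $compiler "$flag" -c -x assembler /dev/null -o "$tmp" \
			>/dev/null 2>&1; then
		result=y
	else
		result=n
	fi
	rm -f "$tmp"
	echo "$result"
}

# Example probe; the answer depends entirely on the installed toolchain.
as_accepts "${CC:-cc}" -Wa,-me200
```

Wrapping the flag in such a check would hide the build error, but it
would not make the E200 configuration any more buildable.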


* Re: Error: invalid switch -me200
From: Nick Desaulniers @ 2020-11-13 20:14 UTC (permalink / raw)
  To: Nathan Chancellor, Michael Ellerman, Segher Boessenkool,
	Linus Torvalds, Arnd Bergmann, Brian Cain
  Cc: kbuild-all, kernel test robot, Fāng-ruì Sòng,
	Masahiro Yamada, LKML, clang-built-linux, linuxppc-dev
In-Reply-To: <20201113200444.GA1496675@ubuntu-m3-large-x86>

On Fri, Nov 13, 2020 at 12:04 PM Nathan Chancellor
<natechancellor@gmail.com> wrote:
>
> On Fri, Nov 13, 2020 at 11:42:03AM -0800, Nick Desaulniers wrote:
> > + MPE, PPC
> >
> > On Fri, Nov 13, 2020 at 11:08 AM Nathan Chancellor
> > <natechancellor@gmail.com> wrote:
> > >
> > > On Fri, Nov 13, 2020 at 09:28:03AM -0800, Fāng-ruì Sòng wrote:
> > > > On Thu, Nov 12, 2020 at 7:22 PM kernel test robot <lkp@intel.com> wrote:
> > > > >
> > > > > Hi Fangrui,
> > > > >
> > > > > FYI, the error/warning still remains.
> > > > >
> > > > > tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> > > > > head:   585e5b17b92dead8a3aca4e3c9876fbca5f7e0ba
> > > > > commit: ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51 Makefile: Fix GCC_TOOLCHAIN_DIR prefix for Clang cross compilation
> > > > > date:   4 months ago
> > > > > config: powerpc-randconfig-r031-20201113 (attached as .config)
> >
> > ^ randconfig
> >
> > > > > compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 9e0c35655b6e8186baef8840b26ba4090503b554)
> > > > > reproduce (this is a W=1 build):
> > > > >         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
> > > > >         chmod +x ~/bin/make.cross
> > > > >         # install powerpc cross compiling tool for clang build
> > > > >         # apt-get install binutils-powerpc-linux-gnu
> > > > >         # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
> > > > >         git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > > > >         git fetch --no-tags linus master
> > > > >         git checkout ca9b31f6bb9c6aa9b4e5f0792f39a97bbffb8c51
> > > > >         # save the attached .config to linux build tree
> > > > >         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc
> > > > >
> > > > > If you fix the issue, kindly add following tag as appropriate
> > > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > >
> > > > > All errors (new ones prefixed by >>):
> > > > >
> > > > >    Assembler messages:
> > > > > >> Error: invalid switch -me200
> > > > > >> Error: unrecognized option -me200
> > > > >    clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
> > > > >    make[2]: *** [scripts/Makefile.build:281: scripts/mod/empty.o] Error 1
> > > > >    make[2]: Target '__build' not remade because of errors.
> > > > >    make[1]: *** [Makefile:1174: prepare0] Error 2
> > > > >    make[1]: Target 'prepare' not remade because of errors.
> > > > >    make: *** [Makefile:185: __sub-make] Error 2
> > > > >    make: Target 'prepare' not remade because of errors.
> > > > >
> > > > > ---
> > > > > 0-DAY CI Kernel Test Service, Intel Corporation
> > > > > https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
> > > >
> > > > This can be ignored. The LLVM integrated assembler does not recognize
> > > > -me200 (-Wa,-me200 in arch/powerpc/Makefile). I guess the GNU as -m
> > > > option is similar to .arch or .machine and controls what instructions
> > > > are recognized. The integrated assembler tends to support all
> > > > instructions (conditionally supporting some instructions has some
> > > > challenges; in the end I patched it to parse but ignore `.arch` for
> > > > x86-64 and `.machine ppc64` for ppc64).
> > > >
> > > > (In addition, e200 is a 32-bit Power ISA microprocessor. 32-bit
> > > > support may get less attention in LLVM.)
> > >
> > > This is also not a clang specific issue, I see the exact same error
> > > with GCC 10.2.0 and binutils 2.35.
> > >
> > > $ make -skj64 ARCH=powerpc CROSS_COMPILE=powerpc64-linux- olddefconfig vmlinux
> >
> > Does using a non 64b triple produce the same failure?
>
> Yes, CROSS_COMPILE=powerpc-linux- produces the same failure.
>
> > > ...
> > > Error: invalid switch -me200
> > > Error: unrecognized option -me200
> >
> > There's a block in arch/powerpc/Makefile:
> > cpu-as-$(CONFIG_40x)     += -Wa,-m405
> > cpu-as-$(CONFIG_44x)     += -Wa,-m440
> > cpu-as-$(CONFIG_ALTIVEC) += $(call as-option,-Wa$(comma)-maltivec)
> > cpu-as-$(CONFIG_E200)    += -Wa,-me200
> > cpu-as-$(CONFIG_E500)    += -Wa,-me500
> >
> > Are those all broken configs, or is Kconfig messed up such that
> > randconfig can select these when it should not?
>
> Hmmm, looks like this flag does not exist in mainline binutils? There is
> a thread in 2010 about this that Segher commented on:
>
> https://lore.kernel.org/linuxppc-dev/9859E645-954D-4D07-8003-FFCD2391AB6E@kernel.crashing.org/
>
> Guess this config should be eliminated?

If we're going to get pestered by 0day bot randconfigs over code
that's not possible to build, I'm all for deleting it.  I doubt we'll
be seeing patches from anyone to binutils for supporting these.

What has the kernel's policy been for code in tree that other folks
can't build (without proprietary tools)? (ARCH=hexagon is pretty close
to not toeing the line here, not sure ICC actually works either).
-- 
Thanks,
~Nick Desaulniers


* Re: [PATCH v2 05/19] powerpc: interrupt handler wrapper functions
From: kernel test robot @ 2020-11-13 21:52 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev
  Cc: clang-built-linux, kbuild-all, Nicholas Piggin
In-Reply-To: <20201111094410.3038123-6-npiggin@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 13653 bytes --]

Hi Nicholas,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on v5.10-rc3 next-20201113]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-interrupt-wrappers/20201111-183954
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-randconfig-r003-20201113 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 9e0c35655b6e8186baef8840b26ba4090503b554)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/0day-ci/linux/commit/36805b0ebcf1760588efad86b8b5db5344329148
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Nicholas-Piggin/powerpc-interrupt-wrappers/20201111-183954
        git checkout 36805b0ebcf1760588efad86b8b5db5344329148
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:45:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insw, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:16:1: note: expanded from here
   __do_insw
   ^
   arch/powerpc/include/asm/io.h:542:56: note: expanded from macro '__do_insw'
   #define __do_insw(p, b, n)      readsw((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from arch/powerpc/kvm/booke.c:15:
   In file included from include/linux/kvm_host.h:7:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:18:1: note: expanded from here
   __do_insl
   ^
   arch/powerpc/include/asm/io.h:543:56: note: expanded from macro '__do_insl'
   #define __do_insl(p, b, n)      readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                          ~~~~~~~~~~~~~~~~~~~~~^
   In file included from arch/powerpc/kvm/booke.c:15:
   In file included from include/linux/kvm_host.h:7:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:20:1: note: expanded from here
   __do_outsb
   ^
   arch/powerpc/include/asm/io.h:544:58: note: expanded from macro '__do_outsb'
   #define __do_outsb(p, b, n)     writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from arch/powerpc/kvm/booke.c:15:
   In file included from include/linux/kvm_host.h:7:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:22:1: note: expanded from here
   __do_outsw
   ^
   arch/powerpc/include/asm/io.h:545:58: note: expanded from macro '__do_outsw'
   #define __do_outsw(p, b, n)     writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   In file included from arch/powerpc/kvm/booke.c:15:
   In file included from include/linux/kvm_host.h:7:
   In file included from include/linux/hardirq.h:10:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:604:
   arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:601:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:24:1: note: expanded from here
   __do_outsl
   ^
   arch/powerpc/include/asm/io.h:546:58: note: expanded from macro '__do_outsl'
   #define __do_outsl(p, b, n)     writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                           ~~~~~~~~~~~~~~~~~~~~~^
   arch/powerpc/kvm/booke.c:600:6: warning: no previous prototype for function 'kvmppc_watchdog_func' [-Wmissing-prototypes]
   void kvmppc_watchdog_func(struct timer_list *t)
        ^
   arch/powerpc/kvm/booke.c:600:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void kvmppc_watchdog_func(struct timer_list *t)
   ^
   static 
>> arch/powerpc/kvm/booke.c:922:3: error: implicit declaration of function 'timer_interrupt' [-Werror,-Wimplicit-function-declaration]
                   timer_interrupt(&regs);
                   ^
   arch/powerpc/kvm/booke.c:922:3: note: did you mean 'hrtimer_interrupt'?
   include/linux/hrtimer.h:319:13: note: 'hrtimer_interrupt' declared here
   extern void hrtimer_interrupt(struct clock_event_device *dev);
               ^
>> arch/powerpc/kvm/booke.c:935:3: error: implicit declaration of function 'performance_monitor_exception' [-Werror,-Wimplicit-function-declaration]
                   performance_monitor_exception(&regs);
                   ^
>> arch/powerpc/kvm/booke.c:942:3: error: implicit declaration of function 'unknown_exception' [-Werror,-Wimplicit-function-declaration]
                   unknown_exception(&regs);
                   ^
   arch/powerpc/kvm/booke.c:984:5: warning: no previous prototype for function 'kvmppc_handle_exit' [-Wmissing-prototypes]
   int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr)
       ^
   arch/powerpc/kvm/booke.c:984:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   int kvmppc_handle_exit(struct kvm_vcpu *vcpu, unsigned int exit_nr)
   ^
   static 
   arch/powerpc/kvm/booke.c:1909:6: warning: no previous prototype for function 'kvm_guest_protect_msr' [-Wmissing-prototypes]
   void kvm_guest_protect_msr(struct kvm_vcpu *vcpu, ulong prot_bitmap, bool set)
        ^
   arch/powerpc/kvm/booke.c:1909:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void kvm_guest_protect_msr(struct kvm_vcpu *vcpu, ulong prot_bitmap, bool set)
   ^
   static 
   9 warnings and 3 errors generated.
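
The repeated -Wnull-pointer-arithmetic warnings above all point at the `(PCI_IO_ADDR)_IO_BASE+(p)` pattern. Clang presumably sees `_IO_BASE` evaluate to a constant 0 in this randconfig, so the cast yields a null pointer before the offset is added. The sketch below is only an illustration of one way to avoid the construct: do the addition on integers and cast once at the end. All `demo_*` and `DEMO_*` names are hypothetical and are not kernel API; this is not a proposed change to io.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an I/O base that evaluates to 0. */
#define DEMO_IO_BASE 0

/*
 * Form an I/O "address" from a base and a port offset using integer
 * arithmetic only. A pointer is created by the final cast, so no
 * pointer arithmetic is ever performed on a null pointer.
 */
static inline volatile void *demo_io_addr(uintptr_t base, unsigned long port)
{
	return (volatile void *)(base + port);
}

/* Return the numeric value of the computed address for inspection. */
static inline uintptr_t demo_io_addr_value(uintptr_t base, unsigned long port)
{
	return (uintptr_t)demo_io_addr(base, port);
}

/* Small self-check: a zero base plus a port offset gives the offset. */
static inline void demo_io_addr_selfcheck(void)
{
	assert(demo_io_addr_value(DEMO_IO_BASE, 0x3f8) == 0x3f8);
}
```

The integer-to-pointer conversion is implementation-defined rather than undefined, which is why this form does not trip the warning.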

vim +/timer_interrupt +922 arch/powerpc/kvm/booke.c

4e642ccbd6a3f14 Alexander Graf   2012-02-20  903  
6328e593c3df5e8 Bharat Bhushan   2012-06-20  904  /*
6328e593c3df5e8 Bharat Bhushan   2012-06-20  905   * For interrupts needed to be handled by host interrupt handlers,
6328e593c3df5e8 Bharat Bhushan   2012-06-20  906   * corresponding host handler are called from here in similar way
6328e593c3df5e8 Bharat Bhushan   2012-06-20  907   * (but not exact) as they are called from low level handler
6328e593c3df5e8 Bharat Bhushan   2012-06-20  908   * (such as from arch/powerpc/kernel/head_fsl_booke.S).
6328e593c3df5e8 Bharat Bhushan   2012-06-20  909   */
4e642ccbd6a3f14 Alexander Graf   2012-02-20  910  static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
4e642ccbd6a3f14 Alexander Graf   2012-02-20  911  				     unsigned int exit_nr)
4e642ccbd6a3f14 Alexander Graf   2012-02-20  912  {
4e642ccbd6a3f14 Alexander Graf   2012-02-20  913  	struct pt_regs regs;
73e75b416ffcfa3 Hollis Blanchard 2008-12-02  914  
d30f6e480055e5b Scott Wood       2011-12-20  915  	switch (exit_nr) {
d30f6e480055e5b Scott Wood       2011-12-20  916  	case BOOKE_INTERRUPT_EXTERNAL:
4e642ccbd6a3f14 Alexander Graf   2012-02-20  917  		kvmppc_fill_pt_regs(&regs);
4e642ccbd6a3f14 Alexander Graf   2012-02-20  918  		do_IRQ(&regs);
d30f6e480055e5b Scott Wood       2011-12-20  919  		break;
d30f6e480055e5b Scott Wood       2011-12-20  920  	case BOOKE_INTERRUPT_DECREMENTER:
4e642ccbd6a3f14 Alexander Graf   2012-02-20  921  		kvmppc_fill_pt_regs(&regs);
4e642ccbd6a3f14 Alexander Graf   2012-02-20 @922  		timer_interrupt(&regs);
d30f6e480055e5b Scott Wood       2011-12-20  923  		break;
5f17ce8b954a2ff Tiejun Chen      2013-05-13  924  #if defined(CONFIG_PPC_DOORBELL)
d30f6e480055e5b Scott Wood       2011-12-20  925  	case BOOKE_INTERRUPT_DOORBELL:
4e642ccbd6a3f14 Alexander Graf   2012-02-20  926  		kvmppc_fill_pt_regs(&regs);
4e642ccbd6a3f14 Alexander Graf   2012-02-20  927  		doorbell_exception(&regs);
d30f6e480055e5b Scott Wood       2011-12-20  928  		break;
d30f6e480055e5b Scott Wood       2011-12-20  929  #endif
d30f6e480055e5b Scott Wood       2011-12-20  930  	case BOOKE_INTERRUPT_MACHINE_CHECK:
d30f6e480055e5b Scott Wood       2011-12-20  931  		/* FIXME */
d30f6e480055e5b Scott Wood       2011-12-20  932  		break;
7cc1e8ee78f469e Alexander Graf   2012-02-22  933  	case BOOKE_INTERRUPT_PERFORMANCE_MONITOR:
7cc1e8ee78f469e Alexander Graf   2012-02-22  934  		kvmppc_fill_pt_regs(&regs);
7cc1e8ee78f469e Alexander Graf   2012-02-22 @935  		performance_monitor_exception(&regs);
7cc1e8ee78f469e Alexander Graf   2012-02-22  936  		break;
6328e593c3df5e8 Bharat Bhushan   2012-06-20  937  	case BOOKE_INTERRUPT_WATCHDOG:
6328e593c3df5e8 Bharat Bhushan   2012-06-20  938  		kvmppc_fill_pt_regs(&regs);
6328e593c3df5e8 Bharat Bhushan   2012-06-20  939  #ifdef CONFIG_BOOKE_WDT
6328e593c3df5e8 Bharat Bhushan   2012-06-20  940  		WatchdogException(&regs);
6328e593c3df5e8 Bharat Bhushan   2012-06-20  941  #else
6328e593c3df5e8 Bharat Bhushan   2012-06-20 @942  		unknown_exception(&regs);
6328e593c3df5e8 Bharat Bhushan   2012-06-20  943  #endif
6328e593c3df5e8 Bharat Bhushan   2012-06-20  944  		break;
6328e593c3df5e8 Bharat Bhushan   2012-06-20  945  	case BOOKE_INTERRUPT_CRITICAL:
845ac985cf8e3d5 Tudor Laurentiu  2015-05-18  946  		kvmppc_fill_pt_regs(&regs);
6328e593c3df5e8 Bharat Bhushan   2012-06-20  947  		unknown_exception(&regs);
6328e593c3df5e8 Bharat Bhushan   2012-06-20  948  		break;
ce11e48b7fdd256 Bharat Bhushan   2013-07-04  949  	case BOOKE_INTERRUPT_DEBUG:
ce11e48b7fdd256 Bharat Bhushan   2013-07-04  950  		/* Save DBSR before preemption is enabled */
ce11e48b7fdd256 Bharat Bhushan   2013-07-04  951  		vcpu->arch.dbsr = mfspr(SPRN_DBSR);
ce11e48b7fdd256 Bharat Bhushan   2013-07-04  952  		kvmppc_clear_dbsr();
ce11e48b7fdd256 Bharat Bhushan   2013-07-04  953  		break;
d30f6e480055e5b Scott Wood       2011-12-20  954  	}
4e642ccbd6a3f14 Alexander Graf   2012-02-20  955  }
4e642ccbd6a3f14 Alexander Graf   2012-02-20  956  
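
The three errors marked with >> share one cause: at the call sites in booke.c no prototype for `timer_interrupt`, `performance_monitor_exception` or `unknown_exception` is visible, so clang treats each call as an implicit declaration. Which header is supposed to provide those prototypes after this series is not shown in the report, so the sketch below does not name one. It is a minimal, self-contained illustration, with hypothetical `demo_*` names, of a dispatcher that compiles cleanly once the prototypes are in scope.

```c
#include <assert.h>

/* Hypothetical register frame; stands in for struct pt_regs. */
struct demo_regs {
	unsigned long nip;
	int handled;
};

/*
 * Prototypes that would normally come from a shared header. With these
 * visible, the calls below are checked against the real signatures
 * instead of triggering -Wimplicit-function-declaration.
 */
void demo_timer_interrupt(struct demo_regs *regs);
void demo_unknown_exception(struct demo_regs *regs);

enum demo_exit { DEMO_EXIT_DECREMENTER, DEMO_EXIT_CRITICAL };

/* Dispatcher modelled loosely on kvmppc_restart_interrupt(). */
static void demo_restart_interrupt(struct demo_regs *regs, enum demo_exit nr)
{
	assert(regs);

	switch (nr) {
	case DEMO_EXIT_DECREMENTER:
		demo_timer_interrupt(regs);
		break;
	case DEMO_EXIT_CRITICAL:
		demo_unknown_exception(regs);
		break;
	}
}

/* Definitions, normally located in another translation unit. */
void demo_timer_interrupt(struct demo_regs *regs) { regs->handled = 1; }
void demo_unknown_exception(struct demo_regs *regs) { regs->handled = -1; }
```

Deleting the two prototype lines and moving the definitions into a separate file reproduces the same class of error under -Werror.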

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27196 bytes --]


* Re: Error: invalid switch -me200
From: Segher Boessenkool @ 2020-11-14  0:20 UTC (permalink / raw)
  To: Nick Desaulniers
  Cc: Arnd Bergmann, kbuild-all, Brian Cain,
	Fāng-ruì Sòng, Masahiro Yamada, linuxppc-dev, LKML,
	clang-built-linux, Nathan Chancellor, Linus Torvalds,
	kernel test robot
In-Reply-To: <CAKwvOdkBSGPaKmQY1nERVe4_n19Q=MUtuwdond=FJAAF9N9Zhg@mail.gmail.com>

On Fri, Nov 13, 2020 at 12:14:18PM -0800, Nick Desaulniers wrote:
> > > > Error: invalid switch -me200
> > > > Error: unrecognized option -me200
> > >
> > > 251 cpu-as-$(CONFIG_E200)   += -Wa,-me200
> > >
> > > Are those all broken configs, or is Kconfig messed up such that
> > > randconfig can select these when it should not?
> >
> > Hmmm, looks like this flag does not exist in mainline binutils? There is
> > a thread in 2010 about this that Segher commented on:
> >
> > https://lore.kernel.org/linuxppc-dev/9859E645-954D-4D07-8003-FFCD2391AB6E@kernel.crashing.org/
> >
> > Guess this config should be eliminated?

The help text for this config option says that e200 is used in 55xx,
and there *is* an -me5500 GAS flag (which probably does this same
thing, too).  But is any of this tested, or useful, or wanted?

Maybe Christophe knows, cc:ed.


Segher
