LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH] powerpc/mce: check if event info is valid
From: Ganesh @ 2021-09-17  8:54 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: mahesh, npiggin
In-Reply-To: <20210806132307.367688-1-ganeshgr@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 851 bytes --]

On 8/6/21 6:53 PM, Ganesh Goudar wrote:

> Check if the event info is valid before printing the
> event information. When a fwnmi enabled nested kvm guest
> hits a machine check exception L0 and L2 would generate
> machine check event info, But L1 would not generate any
> machine check event info as it won't go through 0x200
> vector and prints some unwanted message.
>
> To fix this, 'in_use' variable in machine check event info is
> no more in use, rename it to 'valid' and check if the event
> information is valid before logging the event information.
>
> without this patch L1 would print following message for
> exceptions encountered in L2, as event structure will be
> empty in L1.
>
> "Machine Check Exception, Unknown event version 0".
>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---

Hi mpe, Any comments on this patch.


[-- Attachment #2: Type: text/html, Size: 1255 bytes --]

^ permalink raw reply

* [powerpc:merge] BUILD SUCCESS a0175bcd45a4dd0de301dca8fd17d74fcb3ce240
From: kernel test robot @ 2021-09-17 10:30 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: a0175bcd45a4dd0de301dca8fd17d74fcb3ce240  powerpc/ci: Switch GCC 4.9.4 builds to 5.5.0

elapsed time: 1069m

configs tested: 105
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                                 defconfig
arm64                            allyesconfig
arm64                               defconfig
arm                              allyesconfig
arm                              allmodconfig
i386                 randconfig-c001-20210916
sh                          rsk7269_defconfig
powerpc                     sequoia_defconfig
mips                       bmips_be_defconfig
riscv                             allnoconfig
powerpc                 mpc85xx_cds_defconfig
s390                       zfcpdump_defconfig
powerpc                      arches_defconfig
m68k                       m5275evb_defconfig
powerpc                     powernv_defconfig
mips                          rm200_defconfig
mips                     cu1000-neo_defconfig
arm                          ep93xx_defconfig
xtensa                  audio_kc705_defconfig
xtensa                          iss_defconfig
powerpc                 linkstation_defconfig
sh                ecovec24-romimage_defconfig
mips                          malta_defconfig
arm                     am200epdkit_defconfig
x86_64               randconfig-c001-20210916
arm                  randconfig-c002-20210916
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
nds32                             allnoconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
parisc                              defconfig
s390                             allyesconfig
s390                             allmodconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
sparc                               defconfig
i386                                defconfig
arc                              allyesconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
x86_64               randconfig-a016-20210916
x86_64               randconfig-a013-20210916
x86_64               randconfig-a012-20210916
x86_64               randconfig-a011-20210916
x86_64               randconfig-a014-20210916
x86_64               randconfig-a015-20210916
i386                 randconfig-a016-20210916
i386                 randconfig-a015-20210916
i386                 randconfig-a011-20210916
i386                 randconfig-a012-20210916
i386                 randconfig-a013-20210916
i386                 randconfig-a014-20210916
riscv                randconfig-r042-20210916
s390                 randconfig-r044-20210916
arc                  randconfig-r043-20210916
riscv                    nommu_k210_defconfig
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
x86_64                    rhel-8.3-kselftests
um                           x86_64_defconfig
um                             i386_defconfig
x86_64                           allyesconfig
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

clang tested configs:
riscv                randconfig-c006-20210916
x86_64               randconfig-c007-20210916
mips                 randconfig-c004-20210916
powerpc              randconfig-c003-20210916
arm                  randconfig-c002-20210916
i386                 randconfig-c001-20210916
s390                 randconfig-c005-20210916
i386                 randconfig-a004-20210916
i386                 randconfig-a005-20210916
i386                 randconfig-a006-20210916
i386                 randconfig-a002-20210916
i386                 randconfig-a003-20210916
i386                 randconfig-a001-20210916
x86_64               randconfig-a002-20210916
x86_64               randconfig-a003-20210916
x86_64               randconfig-a006-20210916
x86_64               randconfig-a004-20210916
x86_64               randconfig-a005-20210916
x86_64               randconfig-a001-20210916
hexagon              randconfig-r045-20210916
hexagon              randconfig-r041-20210916

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* Re: [PATCH] swiotlb: set IO TLB segment size via cmdline
From: Roman Skakun @ 2021-09-17 10:53 UTC (permalink / raw)
  To: Jan Beulich
  Cc: linux-doc, Peter Zijlstra, Viresh Kumar,
	Linux Kernel Mailing List, Paul Mackerras, Will Deacon,
	Boris Ostrovsky, Marek Szyprowski, Stefano Stabellini,
	Jonathan Corbet, Christoph Hellwig, xen-devel, Paul E. McKenney,
	Konrad Rzeszutek Wilk, Muchun Song, Thomas Gleixner,
	Juergen Gross, Thomas Bogendoerfer, Andrii Anisov, linuxppc-dev,
	Randy Dunlap, linux-mips, iommu, Roman Skakun, Andrew Morton,
	Lu Baolu, Robin Murphy, Mike Rapoport, Maciej W. Rozycki
In-Reply-To: <84ef7ff7-2c9c-113a-4a2c-cef54a6ded51@suse.com>

Hi Jan,

>>>  In order to be sure to catch all uses like this one (including ones
>>>  which make it upstream in parallel to yours), I think you will want
>>>  to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.
>>
>> I don't understand your point. Can you clarify this?
>
> There's a concrete present example: I have a patch pending adding
> another use of IO_TLB_SEGSIZE. This use would need to be replaced
> like you do here in several places. The need for the additional
> replacement would be quite obvious (from a build failure) if you
> renamed the manifest constant. Without renaming, it'll take
> someone running into an issue on a live system, which I consider
> far worse. This is because a simple re-basing of one of the
> patches on top of the other will not point out the need for the
> extra replacement, nor would a test build (with both patches in
> place).

It's reasonable.
I will change the original IO_TLB_SEGSIZE to IO_TLB_DEFAULT_SEGSIZE in the
next patch series.

Thanks.

ср, 15 сент. 2021 г. в 16:50, Jan Beulich <jbeulich@suse.com>:
>
> On 15.09.2021 15:37, Roman Skakun wrote:
> >>> From: Roman Skakun <roman_skakun@epam.com>
> >>>
> >>> It is possible when default IO TLB size is not
> >>> enough to fit a long buffers as described here [1].
> >>>
> >>> This patch makes a way to set this parameter
> >>> using cmdline instead of recompiling a kernel.
> >>>
> >>> [1] https://www.xilinx.com/support/answers/72694.html
> >>
> >>  I'm not convinced the swiotlb use describe there falls under "intended
> >>  use" - mapping a 1280x720 framebuffer in a single chunk?
> >
> > I had the same issue while mapping DMA chuck ~4MB for gem fb when
> > using xen vdispl.
> > I got the next log:
> > [ 142.030421] rcar-fcp fea2f000.fcp: swiotlb buffer is full (sz:
> > 3686400 bytes), total 32768 (slots), used 32 (slots)
> >
> > It happened when I tried to map bounce buffer, which has a large size.
> > The default size if 128(IO_TLB_SEGSIZE) * 2048(IO_TLB_SHIFT) = 262144
> > bytes, but we requested 3686400 bytes.
> > When I change IO_TLB_SEGSIZE to 2048. (2048(IO_TLB_SEGSIZE)  *
> > 2048(IO_TLB_SHIFT) = 4194304bytes).
> > It makes possible to retrieve a bounce buffer for requested size.
> > After changing this value, the problem is gone.
>
> But the question remains: Why does the framebuffer need to be mapped
> in a single giant chunk?
>
> >>  In order to be sure to catch all uses like this one (including ones
> >>  which make it upstream in parallel to yours), I think you will want
> >>  to rename the original IO_TLB_SEGSIZE to e.g. IO_TLB_DEFAULT_SEGSIZE.
> >
> > I don't understand your point. Can you clarify this?
>
> There's a concrete present example: I have a patch pending adding
> another use of IO_TLB_SEGSIZE. This use would need to be replaced
> like you do here in several places. The need for the additional
> replacement would be quite obvious (from a build failure) if you
> renamed the manifest constant. Without renaming, it'll take
> someone running into an issue on a live system, which I consider
> far worse. This is because a simple re-basing of one of the
> patches on top of the other will not point out the need for the
> extra replacement, nor would a test build (with both patches in
> place).
>
> Jan
>


-- 
Best Regards, Roman.

^ permalink raw reply

* [PATCH] powerpc/32: Don't use a struct based type for pte_t
From: Christophe Leroy @ 2021-09-17 10:54 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

Long time ago we had a config item called STRICT_MM_TYPECHECKS
to build the kernel with pte_t defined as a structure in order
to perform additional build checks or build it with pte_t
defined as a simple type in order to get simpler generated code.

Commit 670eea924198 ("powerpc/mm: Always use STRICT_MM_TYPECHECKS")
made the struct based definition the only one, considering that the
generated code was similar in both cases.

That's right on ppc64 because the ABI is such that the content of a
struct having a single simple type element is passed as register,
but on ppc32 such a structure is passed via the stack like any
structure.

Simple test function:

	pte_t test(pte_t pte)
	{
		return pte;
	}

Before this patch we get

	c00108ec <test>:
	c00108ec:	81 24 00 00 	lwz     r9,0(r4)
	c00108f0:	91 23 00 00 	stw     r9,0(r3)
	c00108f4:	4e 80 00 20 	blr

So, for PPC32, restore the simple type behaviour we got before
commit 670eea924198, but instead of adding a config option to
activate type check, do it when __CHECKER__ is set so that type
checking is performed by 'sparse' and provides feedback like:

	arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types)
	arch/powerpc/mm/pgtable.c:466:16:    expected unsigned long
	arch/powerpc/mm/pgtable.c:466:16:    got struct pte_t [usertype] x

With this patch we now get

	c0010890 <test>:
	c0010890:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/nohash/32/pgtable.h |  2 +-
 arch/powerpc/include/asm/pgtable-types.h     | 13 ++++++++++++-
 arch/powerpc/mm/pgtable.c                    |  2 +-
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index f06ae00f2a65..34ce50da1850 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -245,7 +245,7 @@ static int number_of_cells_per_pte(pmd_t *pmd, pte_basic_t val, int huge)
 static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *p,
 				     unsigned long clr, unsigned long set, int huge)
 {
-	pte_basic_t *entry = &p->pte;
+	pte_basic_t *entry = (pte_basic_t *)p;
 	pte_basic_t old = pte_val(*p);
 	pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
 	int num, i;
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index d11b4c61d686..e5ed4580a62c 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -5,14 +5,25 @@
 /* PTE level */
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
 typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
-#else
+#elif defined(__CHECKER__) || !defined(CONFIG_PPC32)
 typedef struct { pte_basic_t pte; } pte_t;
+#else
+typedef pte_basic_t pte_t;
 #endif
+
+#if defined(__CHECKER__) || !defined(CONFIG_PPC32)
 #define __pte(x)	((pte_t) { (x) })
 static inline pte_basic_t pte_val(pte_t x)
 {
 	return x.pte;
 }
+#else
+#define __pte(x)	((pte_t)(x))
+static inline pte_basic_t pte_val(pte_t x)
+{
+	return x;
+}
+#endif
 
 /* PMD level */
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cd16b407f47e..ce9482383144 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -271,7 +271,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_
 {
 	pmd_t *pmd = pmd_off(mm, addr);
 	pte_basic_t val;
-	pte_basic_t *entry = &ptep->pte;
+	pte_basic_t *entry = (pte_basic_t *)ptep;
 	int num, i;
 
 	/*
-- 
2.31.1


^ permalink raw reply related

* Re: [PATCH] powerpc: warn on emulation of dcbz instruction
From: Benjamin Herrenschmidt @ 2021-09-17 12:33 UTC (permalink / raw)
  To: David Laight, 'Christophe Leroy', Paul Mackerras,
	Michael Ellerman
  Cc: Finn Thain, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org, Stan Johnson
In-Reply-To: <2fc6953808854149965e60f340eca94b@AcuMS.aculab.com>

On Thu, 2021-09-16 at 14:36 +0000, David Laight wrote:
> > Does userspace accesses non-cached memory directly ?
> 
> 
> It probably can if a driver mmaps PCI space directly into user space.
> 
> That certainly works on x86-64.

The posterchild for that is Xorg

Cheers,
Ben.



^ permalink raw reply

* [PATCH v2] powerpc/8xx: Simplify TLB handling
From: Christophe Leroy @ 2021-09-17 13:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

In the old days, TLB handling for 8xx was using tlbie and tlbia
instructions directly as much as possible.

But commit f048aace29e0 ("powerpc/mm: Add SMP support to no-hash
TLB handling") broke that by introducing out-of-line unnecessary
complex functions for booke/smp which don't have tlbie/tlbia
instructions and require more complex handling.

Restore direct use of tlbie and tlbia for 8xx which is never SMP.

With this patch we now get

	c00ecc68 <ptep_clear_flush>:
	c00ecc68:	39 00 00 00 	li      r8,0
	c00ecc6c:	81 46 00 00 	lwz     r10,0(r6)
	c00ecc70:	91 06 00 00 	stw     r8,0(r6)
	c00ecc74:	7c 00 2a 64 	tlbie   r5,r0
	c00ecc78:	7c 00 04 ac 	hwsync
	c00ecc7c:	91 43 00 00 	stw     r10,0(r3)
	c00ecc80:	4e 80 00 20 	blr

Before it was

	c0012880 <local_flush_tlb_page>:
	c0012880:	2c 03 00 00 	cmpwi   r3,0
	c0012884:	41 82 00 54 	beq     c00128d8 <local_flush_tlb_page+0x58>
	c0012888:	81 22 00 00 	lwz     r9,0(r2)
	c001288c:	81 43 00 20 	lwz     r10,32(r3)
	c0012890:	39 29 00 01 	addi    r9,r9,1
	c0012894:	91 22 00 00 	stw     r9,0(r2)
	c0012898:	2c 0a 00 00 	cmpwi   r10,0
	c001289c:	41 82 00 10 	beq     c00128ac <local_flush_tlb_page+0x2c>
	c00128a0:	81 2a 01 dc 	lwz     r9,476(r10)
	c00128a4:	2c 09 ff ff 	cmpwi   r9,-1
	c00128a8:	41 82 00 0c 	beq     c00128b4 <local_flush_tlb_page+0x34>
	c00128ac:	7c 00 22 64 	tlbie   r4,r0
	c00128b0:	7c 00 04 ac 	hwsync
	c00128b4:	81 22 00 00 	lwz     r9,0(r2)
	c00128b8:	39 29 ff ff 	addi    r9,r9,-1
	c00128bc:	2c 09 00 00 	cmpwi   r9,0
	c00128c0:	91 22 00 00 	stw     r9,0(r2)
	c00128c4:	4c a2 00 20 	bclr+   4,eq
	c00128c8:	81 22 00 70 	lwz     r9,112(r2)
	c00128cc:	71 29 00 04 	andi.   r9,r9,4
	c00128d0:	4d 82 00 20 	beqlr
	c00128d4:	48 65 76 74 	b       c0669f48 <preempt_schedule>
	c00128d8:	81 22 00 00 	lwz     r9,0(r2)
	c00128dc:	39 29 00 01 	addi    r9,r9,1
	c00128e0:	91 22 00 00 	stw     r9,0(r2)
	c00128e4:	4b ff ff c8 	b       c00128ac <local_flush_tlb_page+0x2c>
...
	c00ecdc8 <ptep_clear_flush>:
	c00ecdc8:	94 21 ff f0 	stwu    r1,-16(r1)
	c00ecdcc:	39 20 00 00 	li      r9,0
	c00ecdd0:	93 c1 00 08 	stw     r30,8(r1)
	c00ecdd4:	83 c6 00 00 	lwz     r30,0(r6)
	c00ecdd8:	91 26 00 00 	stw     r9,0(r6)
	c00ecddc:	93 e1 00 0c 	stw     r31,12(r1)
	c00ecde0:	7c 08 02 a6 	mflr    r0
	c00ecde4:	7c 7f 1b 78 	mr      r31,r3
	c00ecde8:	7c 83 23 78 	mr      r3,r4
	c00ecdec:	7c a4 2b 78 	mr      r4,r5
	c00ecdf0:	90 01 00 14 	stw     r0,20(r1)
	c00ecdf4:	4b f2 5a 8d 	bl      c0012880 <local_flush_tlb_page>
	c00ecdf8:	93 df 00 00 	stw     r30,0(r31)
	c00ecdfc:	7f e3 fb 78 	mr      r3,r31
	c00ece00:	80 01 00 14 	lwz     r0,20(r1)
	c00ece04:	83 c1 00 08 	lwz     r30,8(r1)
	c00ece08:	83 e1 00 0c 	lwz     r31,12(r1)
	c00ece0c:	7c 08 03 a6 	mtlr    r0
	c00ece10:	38 21 00 10 	addi    r1,r1,16
	c00ece14:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Remove tracing of tlbie and tlbia as it generates circular dependency problems. Will see later if it can be added back. Not a big issue though.
---
 arch/powerpc/include/asm/nohash/tlbflush.h | 15 +++++++++++++++
 arch/powerpc/mm/nohash/tlb.c               |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/nohash/tlbflush.h b/arch/powerpc/include/asm/nohash/tlbflush.h
index 1edb7243e515..c08d25e3e626 100644
--- a/arch/powerpc/include/asm/nohash/tlbflush.h
+++ b/arch/powerpc/include/asm/nohash/tlbflush.h
@@ -32,11 +32,26 @@ extern void flush_tlb_range(struct vm_area_struct *vma, unsigned long start,
 			    unsigned long end);
 extern void flush_tlb_kernel_range(unsigned long start, unsigned long end);
 
+#ifdef CONFIG_PPC_8xx
+static inline void local_flush_tlb_mm(struct mm_struct *mm)
+{
+	unsigned int pid = READ_ONCE(mm->context.id);
+
+	if (pid != MMU_NO_CONTEXT)
+		asm volatile ("sync; tlbia; isync" : : : "memory");
+}
+
+static inline void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
+{
+	asm volatile ("tlbie %0; sync" : : "r" (vmaddr) : "memory");
+}
+#else
 extern void local_flush_tlb_mm(struct mm_struct *mm);
 extern void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr);
 
 extern void __local_flush_tlb_page(struct mm_struct *mm, unsigned long vmaddr,
 				   int tsize, int ind);
+#endif
 
 #ifdef CONFIG_SMP
 extern void flush_tlb_mm(struct mm_struct *mm);
diff --git a/arch/powerpc/mm/nohash/tlb.c b/arch/powerpc/mm/nohash/tlb.c
index 5872f69141d5..6d8cb68f6efb 100644
--- a/arch/powerpc/mm/nohash/tlb.c
+++ b/arch/powerpc/mm/nohash/tlb.c
@@ -185,6 +185,7 @@ EXPORT_PER_CPU_SYMBOL(next_tlbcam_idx);
  *    processor
  */
 
+#ifndef CONFIG_PPC_8xx
 /*
  * These are the base non-SMP variants of page and mm flushing
  */
@@ -218,6 +219,7 @@ void local_flush_tlb_page(struct vm_area_struct *vma, unsigned long vmaddr)
 			       mmu_get_tsize(mmu_virtual_psize), 0);
 }
 EXPORT_SYMBOL(local_flush_tlb_page);
+#endif
 
 /*
  * And here are the SMP non-local implementations
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2] powerpc/32: Don't use a struct based type for pte_t
From: Christophe Leroy @ 2021-09-17 13:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

Long time ago we had a config item called STRICT_MM_TYPECHECKS
to build the kernel with pte_t defined as a structure in order
to perform additional build checks or build it with pte_t
defined as a simple type in order to get simpler generated code.

Commit 670eea924198 ("powerpc/mm: Always use STRICT_MM_TYPECHECKS")
made the struct based definition the only one, considering that the
generated code was similar in both cases.

That's right on ppc64 because the ABI is such that the content of a
struct having a single simple type element is passed as register,
but on ppc32 such a structure is passed via the stack like any
structure.

Simple test function:

	pte_t test(pte_t pte)
	{
		return pte;
	}

Before this patch we get

	c00108ec <test>:
	c00108ec:	81 24 00 00 	lwz     r9,0(r4)
	c00108f0:	91 23 00 00 	stw     r9,0(r3)
	c00108f4:	4e 80 00 20 	blr

So, for PPC32, restore the simple type behaviour we got before
commit 670eea924198, but instead of adding a config option to
activate type check, do it when __CHECKER__ is set so that type
checking is performed by 'sparse' and provides feedback like:

	arch/powerpc/mm/pgtable.c:466:16: warning: incorrect type in return expression (different base types)
	arch/powerpc/mm/pgtable.c:466:16:    expected unsigned long
	arch/powerpc/mm/pgtable.c:466:16:    got struct pte_t [usertype] x

With this patch we now get

	c0010890 <test>:
	c0010890:	4e 80 00 20 	blr

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
v2: Properly handle 8xx 16k pages
---
 arch/powerpc/include/asm/nohash/32/pgtable.h |  2 +-
 arch/powerpc/include/asm/pgtable-types.h     | 14 +++++++++++++-
 arch/powerpc/mm/pgtable.c                    |  2 +-
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/nohash/32/pgtable.h b/arch/powerpc/include/asm/nohash/32/pgtable.h
index f06ae00f2a65..34ce50da1850 100644
--- a/arch/powerpc/include/asm/nohash/32/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/32/pgtable.h
@@ -245,7 +245,7 @@ static int number_of_cells_per_pte(pmd_t *pmd, pte_basic_t val, int huge)
 static inline pte_basic_t pte_update(struct mm_struct *mm, unsigned long addr, pte_t *p,
 				     unsigned long clr, unsigned long set, int huge)
 {
-	pte_basic_t *entry = &p->pte;
+	pte_basic_t *entry = (pte_basic_t *)p;
 	pte_basic_t old = pte_val(*p);
 	pte_basic_t new = (old & ~(pte_basic_t)clr) | set;
 	int num, i;
diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
index d11b4c61d686..c60199fc6fa6 100644
--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -5,14 +5,26 @@
 /* PTE level */
 #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
 typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
-#else
+#elif defined(__CHECKER__) || !defined(CONFIG_PPC32)
 typedef struct { pte_basic_t pte; } pte_t;
+#else
+typedef pte_basic_t pte_t;
 #endif
+
+#if defined(__CHECKER__) || !defined(CONFIG_PPC32) || \
+    (defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES))
 #define __pte(x)	((pte_t) { (x) })
 static inline pte_basic_t pte_val(pte_t x)
 {
 	return x.pte;
 }
+#else
+#define __pte(x)	((pte_t)(x))
+static inline pte_basic_t pte_val(pte_t x)
+{
+	return x;
+}
+#endif
 
 /* PMD level */
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cd16b407f47e..ce9482383144 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -271,7 +271,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_
 {
 	pmd_t *pmd = pmd_off(mm, addr);
 	pte_basic_t val;
-	pte_basic_t *entry = &ptep->pte;
+	pte_basic_t *entry = (pte_basic_t *)ptep;
 	int num, i;
 
 	/*
-- 
2.31.1


^ permalink raw reply related

* RE: [PATCH v2] powerpc/32: Don't use a struct based type for pte_t
From: David Laight @ 2021-09-17 14:32 UTC (permalink / raw)
  To: 'Christophe Leroy', Benjamin Herrenschmidt,
	Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
In-Reply-To: <c904599f33aaf6bb7ee2836a9ff8368509e0d78d.1631887042.git.christophe.leroy@csgroup.eu>

From: Christophe Leroy
> Sent: 17 September 2021 14:58
> 
> Long time ago we had a config item called STRICT_MM_TYPECHECKS
> to build the kernel with pte_t defined as a structure in order
> to perform additional build checks or build it with pte_t
> defined as a simple type in order to get simpler generated code.
> 
...
> diff --git a/arch/powerpc/include/asm/pgtable-types.h b/arch/powerpc/include/asm/pgtable-types.h
> index d11b4c61d686..c60199fc6fa6 100644
> --- a/arch/powerpc/include/asm/pgtable-types.h
> +++ b/arch/powerpc/include/asm/pgtable-types.h
> @@ -5,14 +5,26 @@
>  /* PTE level */
>  #if defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES)
>  typedef struct { pte_basic_t pte, pte1, pte2, pte3; } pte_t;
> -#else
> +#elif defined(__CHECKER__) || !defined(CONFIG_PPC32)
>  typedef struct { pte_basic_t pte; } pte_t;
> +#else
> +typedef pte_basic_t pte_t;
>  #endif
> +
> +#if defined(__CHECKER__) || !defined(CONFIG_PPC32) || \
> +    (defined(CONFIG_PPC_8xx) && defined(CONFIG_PPC_16K_PAGES))
>  #define __pte(x)	((pte_t) { (x) })
>  static inline pte_basic_t pte_val(pte_t x)
>  {
>  	return x.pte;
>  }
> +#else
> +#define __pte(x)	((pte_t)(x))
> +static inline pte_basic_t pte_val(pte_t x)
> +{
> +	return x;
> +}
> +#endif

Would it be better to define:
static inline pte_basic_*pte_basic(pte_t *x)
{
#if xxx
	return x;
#else
	return &x->pte;
#endif
}

Then pte_val(x) is always *pt_basic(x)
and the casts like:

> -	pte_basic_t *entry = &ptep->pte;
> +	pte_basic_t *entry = (pte_basic_t *)ptep;

can go away.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply

* Re: [PATCH v5 6/6] sched/fair: Consider SMT in ASYM_PACKING load balance
From: Vincent Guittot @ 2021-09-17 15:25 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
	Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
	Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
	Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
	Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
	Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
	linuxppc-dev
In-Reply-To: <20210917010044.GA23727@ranerica-svr.sc.intel.com>

On Fri, 17 Sept 2021 at 03:01, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> On Wed, Sep 15, 2021 at 05:43:44PM +0200, Vincent Guittot wrote:
> > On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
> > <ricardo.neri-calderon@linux.intel.com> wrote:
> > >
> > > When deciding to pull tasks in ASYM_PACKING, it is necessary not only to
> > > check for the idle state of the destination CPU, dst_cpu, but also of
> > > its SMT siblings.
> > >
> > > If dst_cpu is idle but its SMT siblings are busy, performance suffers
> > > if it pulls tasks from a medium priority CPU that does not have SMT
> > > siblings.
> > >
> > > Implement asym_smt_can_pull_tasks() to inspect the state of the SMT
> > > siblings of both dst_cpu and the CPUs in the candidate busiest group.
> > >
> > > Cc: Aubrey Li <aubrey.li@intel.com>
> > > Cc: Ben Segall <bsegall@google.com>
> > > Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> > > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Quentin Perret <qperret@google.com>
> > > Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> > > Cc: Steven Rostedt <rostedt@goodmis.org>
> > > Cc: Tim Chen <tim.c.chen@linux.intel.com>
> > > Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> > > Reviewed-by: Len Brown <len.brown@intel.com>
> > > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > > ---
> > > Changes since v4:
> > >   * Use sg_lb_stats::sum_nr_running the idle state of a scheduling group.
> > >     (Vincent, Peter)
> > >   * Do not even idle CPUs in asym_smt_can_pull_tasks(). (Vincent)
> > >   * Updated function documentation and corrected a typo.
> > >
> > > Changes since v3:
> > >   * Removed the arch_asym_check_smt_siblings() hook. Discussions with the
> > >     powerpc folks showed that this patch should not impact them. Also, more
> > >     recent powerpc processor no longer use asym_packing. (PeterZ)
> > >   * Removed unnecessary local variable in asym_can_pull_tasks(). (Dietmar)
> > >   * Removed unnecessary check for local CPUs when the local group has zero
> > >     utilization. (Joel)
> > >   * Renamed asym_can_pull_tasks() as asym_smt_can_pull_tasks() to reflect
> > >     the fact that it deals with SMT cases.
> > >   * Made asym_smt_can_pull_tasks() return false for !CONFIG_SCHED_SMT so
> > >     that callers can deal with non-SMT cases.
> > >
> > > Changes since v2:
> > >   * Reworded the commit message to reflect updates in code.
> > >   * Corrected misrepresentation of dst_cpu as the CPU doing the load
> > >     balancing. (PeterZ)
> > >   * Removed call to arch_asym_check_smt_siblings() as it is now called in
> > >     sched_asym().
> > >
> > > Changes since v1:
> > >   * Don't bailout in update_sd_pick_busiest() if dst_cpu cannot pull
> > >     tasks. Instead, reclassify the candidate busiest group, as it
> > >     may still be selected. (PeterZ)
> > >   * Avoid an expensive and unnecessary call to cpumask_weight() when
> > >     determining if a sched_group is comprised of SMT siblings.
> > >     (PeterZ).
> > > ---
> > >  kernel/sched/fair.c | 94 +++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 94 insertions(+)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 26db017c14a3..8d763dd0174b 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -8597,10 +8597,98 @@ group_type group_classify(unsigned int imbalance_pct,
> > >         return group_has_spare;
> > >  }
> > >
> > > +/**
> > > + * asym_smt_can_pull_tasks - Check whether the load balancing CPU can pull tasks
> > > + * @dst_cpu:   Destination CPU of the load balancing
> > > + * @sds:       Load-balancing data with statistics of the local group
> > > + * @sgs:       Load-balancing statistics of the candidate busiest group
> > > + * @sg:                The candidate busiest group
> > > + *
> > > + * Check the state of the SMT siblings of both @sds::local and @sg and decide
> > > + * if @dst_cpu can pull tasks.
> > > + *
> > > + * If @dst_cpu does not have SMT siblings, it can pull tasks if two or more of
> > > + * the SMT siblings of @sg are busy. If only one CPU in @sg is busy, pull tasks
> > > + * only if @dst_cpu has higher priority.
> > > + *
> > > + * If both @dst_cpu and @sg have SMT siblings, and @sg has exactly one more
> > > + * busy CPU than @sds::local, let @dst_cpu pull tasks if it has higher priority.
> > > + * Bigger imbalances in the number of busy CPUs will be dealt with in
> > > + * update_sd_pick_busiest().
> > > + *
> > > + * If @sg does not have SMT siblings, only pull tasks if all of the SMT siblings
> > > + * of @dst_cpu are idle and @sg has lower priority.
> > > + */
> > > +static bool asym_smt_can_pull_tasks(int dst_cpu, struct sd_lb_stats *sds,
> > > +                                   struct sg_lb_stats *sgs,
> > > +                                   struct sched_group *sg)
> > > +{
> > > +#ifdef CONFIG_SCHED_SMT
> > > +       bool local_is_smt, sg_is_smt;
> > > +       int sg_busy_cpus;
> > > +
> > > +       local_is_smt = sds->local->flags & SD_SHARE_CPUCAPACITY;
> > > +       sg_is_smt = sg->flags & SD_SHARE_CPUCAPACITY;
> > > +
> > > +       sg_busy_cpus = sgs->group_weight - sgs->idle_cpus;
> > > +
> > > +       if (!local_is_smt) {
> > > +               /*
> > > +                * If we are here, @dst_cpu is idle and does not have SMT
> > > +                * siblings. Pull tasks if candidate group has two or more
> > > +                * busy CPUs.
> > > +                */
> > > +               if (sg_is_smt && sg_busy_cpus >= 2)
> >
> > Do you really need to test sg_is_smt ? if sg_busy_cpus >= 2 then
> > sd_is_smt must be true ?
>
> Thank you very much for your feedback Vincent!
>
> Yes, it is true that sg_busy_cpus >=2 is only true if @sg is SMT. I will
> remove this check.
>
> >
> > Also, This is the default behavior where we want to even the number of
> > busy cpu. Shouldn't you return false and fall back to the default
> > behavior ?
>
> This is also true.
>
> >
> > That being said, the default behavior tries to even the number of idle
> > cpus which is easier to compute and is equal to even the number of
> > busy cpus in "normal" system with the same number of cpus in groups
> > but this is not the case here. It could be good to change the default
> > behavior to even the number of busy cpus and that you use the default
> > behavior here. Additional condition will be used to select the busiest
> > group like more busy cpu or more number of running tasks
>
> That is a very good observation. Checking the number of idle CPUs
> assumes that both groups have the same number of CPUs. I'll look into
> modifying the default behavior.

Because this change will impact default smt/smp system, we might
prefer to do that in a separate step

With the removal of the condition !sds->local_stat.sum_nr_running
which seems useless because dst_cpu is idle and not SMT, this patch
looks good to me

>
> >
> > > +                       return true;
> > > +
> > > +               /*
> > > +                * @dst_cpu does not have SMT siblings. @sg may have SMT
> > > +                * siblings and only one is busy. In such case, @dst_cpu
> > > +                * can help if it has higher priority and is idle (i.e.,
> > > +                * it has no running tasks).
> >
> > The previous comment above assume that "@dst_cpu is idle" but now you
> > need to check that sds->local_stat.sum_nr_running == 0
>
> But we already know that, right? We are here because in
> update_sg_lb_stats() we determine that dst CPU is idle (env->idle !=
> CPU_NOT_IDLE).
>
> Thanks and BR,
> Ricardo

^ permalink raw reply

* Re: [PATCH v5 2/6] sched/topology: Introduce sched_group::flags
From: Vincent Guittot @ 2021-09-17 15:26 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
	Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
	Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
	Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
	Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
	Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
	linuxppc-dev
In-Reply-To: <20210911011819.12184-3-ricardo.neri-calderon@linux.intel.com>

On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> There exist situations in which the load balance needs to know the
> properties of the CPUs in a scheduling group. When using asymmetric
> packing, for instance, the load balancer needs to know not only the
> state of dst_cpu but also of its SMT siblings, if any.
>
> Use the flags of the child scheduling domains to initialize scheduling
> group flags. This will reflect the properties of the CPUs in the
> group.
>
> A subsequent changeset will make use of these new flags. No functional
> changes are introduced.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
> Changes since v4:
>   * None
>
> Changes since v3:
>   * Clear the flags of the scheduling groups of a domain if its child is
>     destroyed.
>   * Minor rewording of the commit message.
>
> Changes since v2:
>   * Introduced this patch.
>
> Changes since v1:
>   * N/A
> ---
>  kernel/sched/sched.h    |  1 +
>  kernel/sched/topology.c | 21 ++++++++++++++++++---
>  2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 3d3e5793e117..86ab33ce529d 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -1809,6 +1809,7 @@ struct sched_group {
>         unsigned int            group_weight;
>         struct sched_group_capacity *sgc;
>         int                     asym_prefer_cpu;        /* CPU of highest priority in group */
> +       int                     flags;
>
>         /*
>          * The CPUs this group covers.
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 4e8698e62f07..c56faae461d9 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -716,8 +716,20 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
>                 tmp = sd;
>                 sd = sd->parent;
>                 destroy_sched_domain(tmp);
> -               if (sd)
> +               if (sd) {
> +                       struct sched_group *sg = sd->groups;
> +
> +                       /*
> +                        * sched groups hold the flags of the child sched
> +                        * domain for convenience. Clear such flags since
> +                        * the child is being destroyed.
> +                        */
> +                       do {
> +                               sg->flags = 0;
> +                       } while (sg != sd->groups);
> +
>                         sd->child = NULL;
> +               }
>         }
>
>         for (tmp = sd; tmp; tmp = tmp->parent)
> @@ -916,10 +928,12 @@ build_group_from_child_sched_domain(struct sched_domain *sd, int cpu)
>                 return NULL;
>
>         sg_span = sched_group_span(sg);
> -       if (sd->child)
> +       if (sd->child) {
>                 cpumask_copy(sg_span, sched_domain_span(sd->child));
> -       else
> +               sg->flags = sd->child->flags;
> +       } else {
>                 cpumask_copy(sg_span, sched_domain_span(sd));
> +       }
>
>         atomic_inc(&sg->ref);
>         return sg;
> @@ -1169,6 +1183,7 @@ static struct sched_group *get_group(int cpu, struct sd_data *sdd)
>         if (child) {
>                 cpumask_copy(sched_group_span(sg), sched_domain_span(child));
>                 cpumask_copy(group_balance_mask(sg), sched_group_span(sg));
> +               sg->flags = child->flags;
>         } else {
>                 cpumask_set_cpu(cpu, sched_group_span(sg));
>                 cpumask_set_cpu(cpu, group_balance_mask(sg));
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH v5 3/6] sched/fair: Optimize checking for group_asym_packing
From: Vincent Guittot @ 2021-09-17 15:26 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
	Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
	Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
	Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
	Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
	Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
	linuxppc-dev
In-Reply-To: <20210911011819.12184-4-ricardo.neri-calderon@linux.intel.com>

On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> sched_asmy_prefer() always returns false when called on the local group. By
> checking local_group, we can avoid additional checks and invoking
> sched_asmy_prefer() when it is not needed. No functional changes are
> introduced.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
> Changes since v4:
>   * None
>
> Changes since v3:
>   * Further rewording of the commit message. (Len)
>
> Changes since v2:
>   * Reworded the commit message for clarity. (Peter Z)
>
> Changes since v1:
>   * None
> ---
>  kernel/sched/fair.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff69f245b939..7a054f528bcc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8657,7 +8657,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>         }
>
>         /* Check if dst CPU is idle and preferred to this group */
> -       if (env->sd->flags & SD_ASYM_PACKING &&
> +       if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
>             env->idle != CPU_NOT_IDLE &&
>             sgs->sum_h_nr_running &&
>             sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH v5 4/6] sched/fair: Provide update_sg_lb_stats() with sched domain statistics
From: Vincent Guittot @ 2021-09-17 15:27 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
	Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
	Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
	Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
	Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
	Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
	linuxppc-dev
In-Reply-To: <20210911011819.12184-5-ricardo.neri-calderon@linux.intel.com>

On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> Before deciding to pull tasks when using asymmetric packing of tasks,
> on some architectures (e.g., x86) it is necessary to know not only the
> state of dst_cpu but also of its SMT siblings. The decision to classify
> a candidate busiest group as group_asym_packing is done in
> update_sg_lb_stats(). Give this function access to the scheduling domain
> statistics, which contains the statistics of the local group.
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
> Changes since v4:
>   * None
>
> Changes since v3:
>   * None
>
> Changes since v2:
>   * Introduced this patch.
>
> Changes since v1:
>   * N/A
> ---
>  kernel/sched/fair.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7a054f528bcc..c5851260b4d8 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8605,6 +8605,7 @@ group_type group_classify(unsigned int imbalance_pct,
>   * @sg_status: Holds flag indicating the status of the sched_group
>   */
>  static inline void update_sg_lb_stats(struct lb_env *env,
> +                                     struct sd_lb_stats *sds,
>                                       struct sched_group *group,
>                                       struct sg_lb_stats *sgs,
>                                       int *sg_status)
> @@ -8613,7 +8614,7 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>
>         memset(sgs, 0, sizeof(*sgs));
>
> -       local_group = cpumask_test_cpu(env->dst_cpu, sched_group_span(group));
> +       local_group = group == sds->local;
>
>         for_each_cpu_and(i, sched_group_span(group), env->cpus) {
>                 struct rq *rq = cpu_rq(i);
> @@ -9176,7 +9177,7 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
>                                 update_group_capacity(env->sd, env->dst_cpu);
>                 }
>
> -               update_sg_lb_stats(env, sg, sgs, &sg_status);
> +               update_sg_lb_stats(env, sds, sg, sgs, &sg_status);
>
>                 if (local_group)
>                         goto next_group;
> --
> 2.17.1
>

^ permalink raw reply

* Re: [PATCH v5 5/6] sched/fair: Carve out logic to mark a group for asymmetric packing
From: Vincent Guittot @ 2021-09-17 15:27 UTC (permalink / raw)
  To: Ricardo Neri
  Cc: Juri Lelli, Aubrey Li, Srikar Dronamraju, Ravi V. Shankar,
	Peter Zijlstra (Intel), Ricardo Neri, Ben Segall,
	Srinivas Pandruvada, Joel Fernandes (Google), Ingo Molnar,
	Rafael J . Wysocki, Steven Rostedt, Mel Gorman, Len Brown,
	Nicholas Piggin, Aubrey Li, Dietmar Eggemann, Tim Chen,
	Quentin Perret, Daniel Bristot de Oliveira, linux-kernel,
	linuxppc-dev
In-Reply-To: <20210911011819.12184-6-ricardo.neri-calderon@linux.intel.com>

On Sat, 11 Sept 2021 at 03:19, Ricardo Neri
<ricardo.neri-calderon@linux.intel.com> wrote:
>
> Create a separate function, sched_asym(). A subsequent changeset will
> introduce logic to deal with SMT in conjunction with asmymmetric
> packing. Such logic will need the statistics of the scheduling
> group provided as argument. Update them before calling sched_asym().
>
> Cc: Aubrey Li <aubrey.li@intel.com>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Quentin Perret <qperret@google.com>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> Reviewed-by: Len Brown <len.brown@intel.com>
> Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>

Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>

> ---
> Changes since v4:
>   * None
>
> Changes since v3:
>   * Remove a redundant check for the local group in sched_asym().
>     (Dietmar)
>   * Reworded commit message for clarity. (Len)
>
> Changes since v2:
>   * Introduced this patch.
>
> Changes since v1:
>   * N/A
> ---
>  kernel/sched/fair.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c5851260b4d8..26db017c14a3 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8597,6 +8597,13 @@ group_type group_classify(unsigned int imbalance_pct,
>         return group_has_spare;
>  }
>
> +static inline bool
> +sched_asym(struct lb_env *env, struct sd_lb_stats *sds,  struct sg_lb_stats *sgs,
> +          struct sched_group *group)
> +{
> +       return sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu);
> +}
> +
>  /**
>   * update_sg_lb_stats - Update sched_group's statistics for load balancing.
>   * @env: The load balancing environment.
> @@ -8657,18 +8664,17 @@ static inline void update_sg_lb_stats(struct lb_env *env,
>                 }
>         }
>
> +       sgs->group_capacity = group->sgc->capacity;
> +
> +       sgs->group_weight = group->group_weight;
> +
>         /* Check if dst CPU is idle and preferred to this group */
>         if (!local_group && env->sd->flags & SD_ASYM_PACKING &&
> -           env->idle != CPU_NOT_IDLE &&
> -           sgs->sum_h_nr_running &&
> -           sched_asym_prefer(env->dst_cpu, group->asym_prefer_cpu)) {
> +           env->idle != CPU_NOT_IDLE && sgs->sum_h_nr_running &&
> +           sched_asym(env, sds, sgs, group)) {
>                 sgs->group_asym_packing = 1;
>         }
>
> -       sgs->group_capacity = group->sgc->capacity;
> -
> -       sgs->group_weight = group->group_weight;
> -
>         sgs->group_type = group_classify(env->sd->imbalance_pct, group, sgs);
>
>         /* Computing avg_load makes sense only when group is overloaded */
> --
> 2.17.1
>

^ permalink raw reply

* [PATCH v2 0/8] bpf powerpc: Add BPF_PROBE_MEM support in powerpc JIT compiler
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
  To: naveen.n.rao, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai, Hari Bathini

Patch #1 & #2 are simple cleanup patches. Patch #3 refactors JIT
compiler code with the aim to simplify adding BPF_PROBE_MEM support.
Patch #4 introduces PPC_RAW_BRANCH() macro instead of open coding
branch instruction. Patch #5 & #7 add BPF_PROBE_MEM support for PPC64
& PPC32 JIT compilers respectively. Patch #6 & #8 add explicit
addr > TASK_SIZE_MAX check to handle bad userspace pointers for
PPC64 & PPC32 cases respectively.

Link to v1 posted by Ravi:

  https://lore.kernel.org/all/20210706073211.349889-1-ravi.bangoria@linux.ibm.com/
  ("[PATCH 0/4] bpf powerpc: Add BPF_PROBE_MEM support for 64bit JIT")


Hari Bathini (4):
  bpf powerpc: refactor JIT compiler code
  powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
  bpf ppc32: Add BPF_PROBE_MEM support for JIT
  bpf ppc32: Add addr > TASK_SIZE_MAX explicit check

Ravi Bangoria (4):
  bpf powerpc: Remove unused SEEN_STACK
  bpf powerpc: Remove extra_pass from bpf_jit_build_body()
  bpf ppc64: Add BPF_PROBE_MEM support for JIT
  bpf ppc64: Add addr > TASK_SIZE_MAX explicit check

 arch/powerpc/include/asm/ppc-opcode.h |   2 +
 arch/powerpc/net/bpf_jit.h            |  19 ++--
 arch/powerpc/net/bpf_jit_comp.c       |  75 ++++++++++++++--
 arch/powerpc/net/bpf_jit_comp32.c     | 123 +++++++++++++++++++++-----
 arch/powerpc/net/bpf_jit_comp64.c     | 114 ++++++++++++++++--------
 5 files changed, 260 insertions(+), 73 deletions(-)

-- 
2.31.1


^ permalink raw reply

* [PATCH v2 1/8] bpf powerpc: Remove unused SEEN_STACK
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
  To: naveen.n.rao, mpe, ast, daniel
  Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
	kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai
In-Reply-To: <20210917153047.177141-1-hbathini@linux.ibm.com>

From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

SEEN_STACK is unused on PowerPC. Remove it. Also, have
SEEN_TAILCALL use 0x40000000.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---

* No changes in v2.


 arch/powerpc/net/bpf_jit.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 99fad093f43e..d6267e93027a 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -116,8 +116,7 @@ static inline bool is_nearbranch(int offset)
 #define COND_LE		(CR0_GT | COND_CMP_FALSE)
 
 #define SEEN_FUNC	0x20000000 /* might call external helpers */
-#define SEEN_STACK	0x40000000 /* uses BPF stack */
-#define SEEN_TAILCALL	0x80000000 /* uses tail calls */
+#define SEEN_TAILCALL	0x40000000 /* uses tail calls */
 
 #define SEEN_VREG_MASK	0x1ff80000 /* Volatile registers r3-r12 */
 #define SEEN_NVREG_MASK	0x0003ffff /* Non volatile registers r14-r31 */
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2 2/8] bpf powerpc: Remove extra_pass from bpf_jit_build_body()
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
  To: naveen.n.rao, mpe, ast, daniel
  Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
	kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai
In-Reply-To: <20210917153047.177141-1-hbathini@linux.ibm.com>

From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

In case of extra_pass, usual JIT passes are always skipped. So,
extra_pass is always false while calling bpf_jit_build_body() and
thus it can be removed.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---

Changes in v2:
* Updated the changelog wording a bit.


 arch/powerpc/net/bpf_jit.h        | 2 +-
 arch/powerpc/net/bpf_jit_comp.c   | 6 +++---
 arch/powerpc/net/bpf_jit_comp32.c | 4 ++--
 arch/powerpc/net/bpf_jit_comp64.c | 4 ++--
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index d6267e93027a..411c63d945c7 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -166,7 +166,7 @@ static inline void bpf_clear_seen_register(struct codegen_context *ctx, int i)
 
 void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs, bool extra_pass);
+		       u32 *addrs);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 53aefee3fe70..c5c9e8ad1de7 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -149,7 +149,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
 	/* Scouting faux-generate pass 0 */
-	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
+	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
 		/* We hit something illegal or unsupported. */
 		fp = org_fp;
 		goto out_addrs;
@@ -162,7 +162,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	 */
 	if (cgctx.seen & SEEN_TAILCALL) {
 		cgctx.idx = 0;
-		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
+		if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
 			fp = org_fp;
 			goto out_addrs;
 		}
@@ -210,7 +210,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		/* Now build the prologue, body code & epilogue for real. */
 		cgctx.idx = 0;
 		bpf_jit_build_prologue(code_base, &cgctx);
-		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
+		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
 		bpf_jit_build_epilogue(code_base, &cgctx);
 
 		if (bpf_jit_enable > 1)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index beb12cbc8c29..b60b59426a24 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -266,7 +266,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs, bool extra_pass)
+		       u32 *addrs)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
@@ -860,7 +860,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		case BPF_JMP | BPF_CALL:
 			ctx->seen |= SEEN_FUNC;
 
-			ret = bpf_jit_get_func_addr(fp, &insn[i], extra_pass,
+			ret = bpf_jit_get_func_addr(fp, &insn[i], false,
 						    &func_addr, &func_addr_fixed);
 			if (ret < 0)
 				return ret;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index b87a63dba9c8..2a87da50d9a4 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -272,7 +272,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs, bool extra_pass)
+		       u32 *addrs)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
@@ -769,7 +769,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		case BPF_JMP | BPF_CALL:
 			ctx->seen |= SEEN_FUNC;
 
-			ret = bpf_jit_get_func_addr(fp, &insn[i], extra_pass,
+			ret = bpf_jit_get_func_addr(fp, &insn[i], false,
 						    &func_addr, &func_addr_fixed);
 			if (ret < 0)
 				return ret;
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2 4/8] powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
  To: naveen.n.rao, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210917153047.177141-1-hbathini@linux.ibm.com>

Define and use PPC_RAW_BRANCH() macro instead of open coding it. This
macro is used while adding BPF_PROBE_MEM support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v2:
* New patch to introduce PPC_RAW_BRANCH() macro.


 arch/powerpc/include/asm/ppc-opcode.h | 2 ++
 arch/powerpc/net/bpf_jit.h            | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index baea657bc868..f50213e2a3e0 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -566,6 +566,8 @@
 #define PPC_RAW_MTSPR(spr, d)		(0x7c0003a6 | ___PPC_RS(d) | __PPC_SPR(spr))
 #define PPC_RAW_EIEIO()			(0x7c0006ac)
 
+#define PPC_RAW_BRANCH(addr)		(PPC_INST_BRANCH | ((addr) & 0x03fffffc))
+
 /* Deal with instructions that older assemblers aren't aware of */
 #define	PPC_BCCTR_FLUSH		stringify_in_c(.long PPC_INST_BCCTR_FLUSH)
 #define	PPC_CP_ABORT		stringify_in_c(.long PPC_RAW_CP_ABORT)
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 411c63d945c7..0c8f885b8f48 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -24,8 +24,8 @@
 #define EMIT(instr)		PLANT_INSTR(image, ctx->idx, instr)
 
 /* Long jump; (unconditional 'branch') */
-#define PPC_JMP(dest)		EMIT(PPC_INST_BRANCH |			      \
-				     (((dest) - (ctx->idx * 4)) & 0x03fffffc))
+#define PPC_JMP(dest)		EMIT(PPC_RAW_BRANCH((dest) - (ctx->idx * 4)))
+
 /* blr; (unconditional 'branch' with link) to absolute address */
 #define PPC_BL_ABS(dest)	EMIT(PPC_INST_BL |			      \
 				     (((dest) - (unsigned long)(image + ctx->idx)) & 0x03fffffc))
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2 5/8] bpf ppc64: Add BPF_PROBE_MEM support for JIT
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
  To: naveen.n.rao, mpe, ast, daniel
  Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
	kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210917153047.177141-1-hbathini@linux.ibm.com>

From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains two instructions,
first instruction clears dest_reg and 2nd jumps to next instruction
in the BPF code. extable 'insn' field contains relative offset of
the instruction and 'fixup' field contains relative offset of the
fixup entry. Example layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| ld   r27,4(r3)   |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r4)    |
             |                  |
             |                  |
             |------------------|
   0x4280 -->| li  r27,0        |  \ fixup entry
             | b   0x4024       |  /
   0x4288 -->| li  r3,0         |
             | b   0x40b0       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffec |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffec |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v2:
* Used JITing code after refactoring.
* Replaced 'xor reg,reg,reg' with 'li reg,0' where appropriate.
* Avoided unnecessary init during declaration.


 arch/powerpc/net/bpf_jit.h        |  5 ++-
 arch/powerpc/net/bpf_jit_comp.c   | 25 ++++++++++----
 arch/powerpc/net/bpf_jit_comp32.c |  2 +-
 arch/powerpc/net/bpf_jit_comp64.c | 57 ++++++++++++++++++++++++++++++-
 4 files changed, 80 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 0c8f885b8f48..6357c71c26eb 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -141,8 +141,11 @@ struct codegen_context {
 	unsigned int idx;
 	unsigned int stack_size;
 	int b2p[ARRAY_SIZE(b2p)];
+	unsigned int exentry_idx;
 };
 
+#define BPF_FIXUP_LEN	8 /* Two instructions */
+
 static inline void bpf_flush_icache(void *start, void *end)
 {
 	smp_wmb();	/* smp write barrier */
@@ -166,7 +169,7 @@ static inline void bpf_clear_seen_register(struct codegen_context *ctx, int i)
 
 void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func);
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs);
+		       u32 *addrs, int pass);
 void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index c5c9e8ad1de7..e92bd79d3bac 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -101,6 +101,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	struct bpf_prog *tmp_fp;
 	bool bpf_blinded = false;
 	bool extra_pass = false;
+	u32 extable_len;
+	u32 fixup_len;
 
 	if (!fp->jit_requested)
 		return org_fp;
@@ -131,7 +133,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		image = jit_data->image;
 		bpf_hdr = jit_data->header;
 		proglen = jit_data->proglen;
-		alloclen = proglen + FUNCTION_DESCR_SIZE;
 		extra_pass = true;
 		goto skip_init_ctx;
 	}
@@ -149,7 +150,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
 	/* Scouting faux-generate pass 0 */
-	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0)) {
 		/* We hit something illegal or unsupported. */
 		fp = org_fp;
 		goto out_addrs;
@@ -162,7 +163,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	 */
 	if (cgctx.seen & SEEN_TAILCALL) {
 		cgctx.idx = 0;
-		if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, 0)) {
 			fp = org_fp;
 			goto out_addrs;
 		}
@@ -177,8 +178,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	bpf_jit_build_prologue(0, &cgctx);
 	bpf_jit_build_epilogue(0, &cgctx);
 
+	fixup_len = fp->aux->num_exentries * BPF_FIXUP_LEN;
+	extable_len = fp->aux->num_exentries * sizeof(struct exception_table_entry);
+
 	proglen = cgctx.idx * 4;
-	alloclen = proglen + FUNCTION_DESCR_SIZE;
+	alloclen = proglen + FUNCTION_DESCR_SIZE + fixup_len + extable_len;
 
 	bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4, bpf_jit_fill_ill_insns);
 	if (!bpf_hdr) {
@@ -186,6 +190,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		goto out_addrs;
 	}
 
+	if (extable_len) {
+		fp->aux->extable = (void *)image + FUNCTION_DESCR_SIZE +
+				   proglen + fixup_len;
+	}
+
 skip_init_ctx:
 	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
 
@@ -210,7 +219,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		/* Now build the prologue, body code & epilogue for real. */
 		cgctx.idx = 0;
 		bpf_jit_build_prologue(code_base, &cgctx);
-		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
+		if (bpf_jit_build_body(fp, code_base, &cgctx, addrs, pass)) {
+			bpf_jit_binary_free(bpf_hdr);
+			fp = org_fp;
+			goto out_addrs;
+		}
 		bpf_jit_build_epilogue(code_base, &cgctx);
 
 		if (bpf_jit_enable > 1)
@@ -234,7 +247,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 	fp->bpf_func = (void *)image;
 	fp->jited = 1;
-	fp->jited_len = alloclen;
+	fp->jited_len = proglen + FUNCTION_DESCR_SIZE;
 
 	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
 	bpf_jit_binary_lock_ro(bpf_hdr);
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index c8ae14c316e3..94641b7be387 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -266,7 +266,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs)
+		       u32 *addrs, int pass)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 78b28f25555c..2fc10995f243 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -270,9 +270,54 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 	/* out: */
 }
 
+/*
+ * The caller should check for (BPF_MODE(code) == BPF_PROBE_MEM) before calling
+ * this function, as this only applies to BPF_PROBE_MEM, for now.
+ */
+static int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass,
+				 struct codegen_context *ctx, int dst_reg)
+{
+	off_t offset;
+	unsigned long pc;
+	struct exception_table_entry *ex;
+	u32 *fixup;
+
+	/* Populate extable entries only in the last pass */
+	if (pass != 2)
+		return 0;
+
+	if (!fp->aux->extable ||
+	    WARN_ON_ONCE(ctx->exentry_idx >= fp->aux->num_exentries))
+		return -EINVAL;
+
+	pc = (unsigned long)&image[ctx->idx - 1];
+
+	fixup = (void *)fp->aux->extable -
+		(fp->aux->num_exentries * BPF_FIXUP_LEN) +
+		(ctx->exentry_idx * BPF_FIXUP_LEN);
+
+	fixup[0] = PPC_RAW_LI(dst_reg, 0);
+	fixup[1] = PPC_RAW_BRANCH((long)(pc + 4) - (long)&fixup[1]);
+
+	ex = &fp->aux->extable[ctx->exentry_idx];
+
+	offset = pc - (long)&ex->insn;
+	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
+		return -ERANGE;
+	ex->insn = offset;
+
+	offset = (long)fixup - (long)&ex->fixup;
+	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
+		return -ERANGE;
+	ex->fixup = offset;
+
+	ctx->exentry_idx++;
+	return 0;
+}
+
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
-		       u32 *addrs)
+		       u32 *addrs, int pass)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
@@ -714,12 +759,16 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		 */
 		/* dst = *(u8 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_B:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_B:
 		/* dst = *(u16 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_H:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_H:
 		/* dst = *(u32 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_W:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_W:
 		/* dst = *(u64 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_DW:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
 			switch (size) {
 			case BPF_B:
 				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
@@ -737,6 +786,12 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 
 			if ((size != BPF_DW) && insn_is_zext(&insn[i + 1]))
 				addrs[++i] = ctx->idx * 4;
+
+			if (BPF_MODE(code) == BPF_PROBE_MEM) {
+				ret = bpf_add_extable_entry(fp, image, pass, ctx, dst_reg);
+				if (ret)
+					return ret;
+			}
 			break;
 
 		/*
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2 6/8] bpf ppc64: Add addr > TASK_SIZE_MAX explicit check
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
  To: naveen.n.rao, mpe, ast, daniel
  Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
	kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210917153047.177141-1-hbathini@linux.ibm.com>

From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

On PPC64 with KUAP enabled, any kernel code which wants to
access userspace needs to be surrounded by disable-enable KUAP.
But that is not happening for BPF_PROBE_MEM load instruction.
So, when BPF program tries to access invalid userspace address,
page-fault handler considers it as bad KUAP fault:

  Kernel attempted to read user page (d0000000) - exploit attempt? (uid: 0)

Considering the fact that PTR_TO_BTF_ID (which uses BPF_PROBE_MEM
mode) could either be a valid kernel pointer or NULL but should
never be a pointer to userspace address, execute BPF_PROBE_MEM load
only if addr > TASK_SIZE_MAX, otherwise set dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v2:
* Refactored the code based on Christophe's comments.


 arch/powerpc/net/bpf_jit_comp64.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 2fc10995f243..eb28dbc67151 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -769,6 +769,29 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		/* dst = *(u64 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_DW:
 		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+			/*
+			 * As PTR_TO_BTF_ID that uses BPF_PROBE_MEM mode could either be a valid
+			 * kernel pointer or NULL but not a userspace address, execute BPF_PROBE_MEM
+			 * load only if addr > TASK_SIZE_MAX, otherwise set dst_reg=0 and move on.
+			 */
+			if (BPF_MODE(code) == BPF_PROBE_MEM) {
+				unsigned int adjusted_idx;
+
+				/*
+				 * Check if 'off' is word aligned because PPC_BPF_LL()
+				 * (BPF_DW case) generates two instructions if 'off' is not
+				 * word-aligned and one instruction otherwise.
+				 */
+				adjusted_idx = ((BPF_SIZE(code) == BPF_DW) && (off & 3)) ? 1 : 0;
+
+				EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], src_reg, off));
+				PPC_LI64(b2p[TMP_REG_2], TASK_SIZE_MAX);
+				EMIT(PPC_RAW_CMPLD(b2p[TMP_REG_1], b2p[TMP_REG_2]));
+				PPC_BCC(COND_GT, (ctx->idx + 4) * 4);
+				EMIT(PPC_RAW_LI(dst_reg, 0));
+				PPC_JMP((ctx->idx + 2 + adjusted_idx) * 4);
+			}
+
 			switch (size) {
 			case BPF_B:
 				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2 3/8] bpf powerpc: refactor JIT compiler code
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
  To: naveen.n.rao, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210917153047.177141-1-hbathini@linux.ibm.com>

Refactor powerpc JITing. This simplifies adding BPF_PROBE_MEM support.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v2:
* New patch to refactor a bit of JITing code.


 arch/powerpc/net/bpf_jit_comp32.c | 50 +++++++++++---------
 arch/powerpc/net/bpf_jit_comp64.c | 76 ++++++++++++++++---------------
 2 files changed, 68 insertions(+), 58 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index b60b59426a24..c8ae14c316e3 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -276,17 +276,17 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 	u32 exit_addr = addrs[flen];
 
 	for (i = 0; i < flen; i++) {
-		u32 code = insn[i].code;
 		u32 dst_reg = bpf_to_ppc(ctx, insn[i].dst_reg);
-		u32 dst_reg_h = dst_reg - 1;
 		u32 src_reg = bpf_to_ppc(ctx, insn[i].src_reg);
-		u32 src_reg_h = src_reg - 1;
 		u32 tmp_reg = bpf_to_ppc(ctx, TMP_REG);
+		u32 true_cond, code = insn[i].code;
+		u32 dst_reg_h = dst_reg - 1;
+		u32 src_reg_h = src_reg - 1;
+		u32 size = BPF_SIZE(code);
 		s16 off = insn[i].off;
 		s32 imm = insn[i].imm;
 		bool func_addr_fixed;
 		u64 func_addr;
-		u32 true_cond;
 
 		/*
 		 * addrs[] maps a BPF bytecode address into a real offset from
@@ -809,25 +809,33 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		/*
 		 * BPF_LDX
 		 */
-		case BPF_LDX | BPF_MEM | BPF_B: /* dst = *(u8 *)(ul) (src + off) */
-			EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
-			if (!fp->aux->verifier_zext)
-				EMIT(PPC_RAW_LI(dst_reg_h, 0));
-			break;
-		case BPF_LDX | BPF_MEM | BPF_H: /* dst = *(u16 *)(ul) (src + off) */
-			EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
-			if (!fp->aux->verifier_zext)
-				EMIT(PPC_RAW_LI(dst_reg_h, 0));
-			break;
-		case BPF_LDX | BPF_MEM | BPF_W: /* dst = *(u32 *)(ul) (src + off) */
-			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
-			if (!fp->aux->verifier_zext)
+		/* dst = *(u8 *)(ul) (src + off) */
+		case BPF_LDX | BPF_MEM | BPF_B:
+		/* dst = *(u16 *)(ul) (src + off) */
+		case BPF_LDX | BPF_MEM | BPF_H:
+		/* dst = *(u32 *)(ul) (src + off) */
+		case BPF_LDX | BPF_MEM | BPF_W:
+		/* dst = *(u64 *)(ul) (src + off) */
+		case BPF_LDX | BPF_MEM | BPF_DW:
+			switch (size) {
+			case BPF_B:
+				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
+				break;
+			case BPF_H:
+				EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
+				break;
+			case BPF_W:
+				EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
+				break;
+			case BPF_DW:
+				EMIT(PPC_RAW_LWZ(dst_reg_h, src_reg, off));
+				EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off + 4));
+				break;
+			}
+
+			if ((size != BPF_DW) && !fp->aux->verifier_zext)
 				EMIT(PPC_RAW_LI(dst_reg_h, 0));
 			break;
-		case BPF_LDX | BPF_MEM | BPF_DW: /* dst = *(u64 *)(ul) (src + off) */
-			EMIT(PPC_RAW_LWZ(dst_reg_h, src_reg, off));
-			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off + 4));
-			break;
 
 		/*
 		 * Doubleword load
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 2a87da50d9a4..78b28f25555c 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -282,16 +282,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 	u32 exit_addr = addrs[flen];
 
 	for (i = 0; i < flen; i++) {
-		u32 code = insn[i].code;
 		u32 dst_reg = b2p[insn[i].dst_reg];
 		u32 src_reg = b2p[insn[i].src_reg];
+		u32 tmp_idx, code = insn[i].code;
+		u32 size = BPF_SIZE(code);
 		s16 off = insn[i].off;
 		s32 imm = insn[i].imm;
 		bool func_addr_fixed;
-		u64 func_addr;
-		u64 imm64;
+		u64 func_addr, imm64;
 		u32 true_cond;
-		u32 tmp_idx;
 
 		/*
 		 * addrs[] maps a BPF bytecode address into a real offset from
@@ -638,35 +637,34 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		 */
 		case BPF_STX | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = src */
 		case BPF_ST | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = imm */
-			if (BPF_CLASS(code) == BPF_ST) {
-				EMIT(PPC_RAW_LI(b2p[TMP_REG_1], imm));
-				src_reg = b2p[TMP_REG_1];
-			}
-			EMIT(PPC_RAW_STB(src_reg, dst_reg, off));
-			break;
 		case BPF_STX | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = src */
 		case BPF_ST | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = imm */
-			if (BPF_CLASS(code) == BPF_ST) {
-				EMIT(PPC_RAW_LI(b2p[TMP_REG_1], imm));
-				src_reg = b2p[TMP_REG_1];
-			}
-			EMIT(PPC_RAW_STH(src_reg, dst_reg, off));
-			break;
 		case BPF_STX | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = src */
 		case BPF_ST | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = imm */
-			if (BPF_CLASS(code) == BPF_ST) {
-				PPC_LI32(b2p[TMP_REG_1], imm);
-				src_reg = b2p[TMP_REG_1];
-			}
-			EMIT(PPC_RAW_STW(src_reg, dst_reg, off));
-			break;
 		case BPF_STX | BPF_MEM | BPF_DW: /* (u64 *)(dst + off) = src */
 		case BPF_ST | BPF_MEM | BPF_DW: /* *(u64 *)(dst + off) = imm */
 			if (BPF_CLASS(code) == BPF_ST) {
-				PPC_LI32(b2p[TMP_REG_1], imm);
+				if ((size == BPF_B) || (size == BPF_H))
+					EMIT(PPC_RAW_LI(b2p[TMP_REG_1], imm));
+				else /* size == BPF_W || size == BPF_DW */
+					PPC_LI32(b2p[TMP_REG_1], imm);
 				src_reg = b2p[TMP_REG_1];
 			}
-			PPC_BPF_STL(src_reg, dst_reg, off);
+
+			switch (size) {
+			case BPF_B:
+				EMIT(PPC_RAW_STB(src_reg, dst_reg, off));
+				break;
+			case BPF_H:
+				EMIT(PPC_RAW_STH(src_reg, dst_reg, off));
+				break;
+			case BPF_W:
+				EMIT(PPC_RAW_STW(src_reg, dst_reg, off));
+				break;
+			case BPF_DW:
+				PPC_BPF_STL(src_reg, dst_reg, off);
+				break;
+			}
 			break;
 
 		/*
@@ -716,25 +714,29 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		 */
 		/* dst = *(u8 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_B:
-			EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
-			if (insn_is_zext(&insn[i + 1]))
-				addrs[++i] = ctx->idx * 4;
-			break;
 		/* dst = *(u16 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_H:
-			EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
-			if (insn_is_zext(&insn[i + 1]))
-				addrs[++i] = ctx->idx * 4;
-			break;
 		/* dst = *(u32 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_W:
-			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
-			if (insn_is_zext(&insn[i + 1]))
-				addrs[++i] = ctx->idx * 4;
-			break;
 		/* dst = *(u64 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_DW:
-			PPC_BPF_LL(dst_reg, src_reg, off);
+			switch (size) {
+			case BPF_B:
+				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
+				break;
+			case BPF_H:
+				EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
+				break;
+			case BPF_W:
+				EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
+				break;
+			case BPF_DW:
+				PPC_BPF_LL(dst_reg, src_reg, off);
+				break;
+			}
+
+			if ((size != BPF_DW) && insn_is_zext(&insn[i + 1]))
+				addrs[++i] = ctx->idx * 4;
 			break;
 
 		/*
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2 7/8] bpf ppc32: Add BPF_PROBE_MEM support for JIT
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
  To: naveen.n.rao, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210917153047.177141-1-hbathini@linux.ibm.com>

BPF load instruction with BPF_PROBE_MEM mode can cause a fault
inside kernel. Append exception table for such instructions
within BPF program.

Unlike other archs which uses extable 'fixup' field to pass dest_reg
and nip, BPF exception table on PowerPC follows the generic PowerPC
exception table design, where it populates both fixup and extable
sections within BPF program. fixup section contains 3 instructions,
first 2 instructions clear dest_reg (lower & higher 32-bit registers)
and last instruction jumps to next instruction in the BPF code.
extable 'insn' field contains relative offset of the instruction and
'fixup' field contains relative offset of the fixup entry. Example
layout of BPF program with extable present:

             +------------------+
             |                  |
             |                  |
   0x4020 -->| lwz   r28,4(r4)  |
             |                  |
             |                  |
   0x40ac -->| lwz  r3,0(r24)   |
             | lwz  r4,4(r24)   |
             |                  |
             |                  |
             |------------------|
   0x4278 -->| li  r28,0        |  \
             | li  r27,0        |  | fixup entry
             | b   0x4024       |  /
   0x4284 -->| li  r4,0         |
             | li  r3,0         |
             | b   0x40b4       |
             |------------------|
   0x4290 -->| insn=0xfffffd90  |  \ extable entry
             | fixup=0xffffffe4 |  /
   0x4298 -->| insn=0xfffffe14  |
             | fixup=0xffffffe8 |
             +------------------+

   (Addresses shown here are chosen random, not real)

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v2:
* New patch to add BPF_PROBE_MEM support for PPC32.


 arch/powerpc/net/bpf_jit.h        |  7 +++++
 arch/powerpc/net/bpf_jit_comp.c   | 50 +++++++++++++++++++++++++++++++
 arch/powerpc/net/bpf_jit_comp32.c | 30 +++++++++++++++++++
 arch/powerpc/net/bpf_jit_comp64.c | 48 ++---------------------------
 4 files changed, 89 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index 6357c71c26eb..6a591ef88006 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -144,7 +144,11 @@ struct codegen_context {
 	unsigned int exentry_idx;
 };
 
+#ifdef CONFIG_PPC32
+#define BPF_FIXUP_LEN	12 /* Three instructions */
+#else
 #define BPF_FIXUP_LEN	8 /* Two instructions */
+#endif
 
 static inline void bpf_flush_icache(void *start, void *end)
 {
@@ -174,6 +178,9 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
 void bpf_jit_realloc_regs(struct codegen_context *ctx);
 
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct codegen_context *ctx,
+			  int insn_idx, int jmp_off, int dst_reg);
+
 #endif
 
 #endif
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index e92bd79d3bac..a1753b8c78c8 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -271,3 +271,53 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 
 	return fp;
 }
+
+/*
+ * The caller should check for (BPF_MODE(code) == BPF_PROBE_MEM) before calling
+ * this function, as this only applies to BPF_PROBE_MEM, for now.
+ */
+int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass, struct codegen_context *ctx,
+			  int insn_idx, int jmp_off, int dst_reg)
+{
+	off_t offset;
+	unsigned long pc;
+	struct exception_table_entry *ex;
+	u32 *fixup;
+
+	/* Populate extable entries only in the last pass */
+	if (pass != 2)
+		return 0;
+
+	if (!fp->aux->extable ||
+	    WARN_ON_ONCE(ctx->exentry_idx >= fp->aux->num_exentries))
+		return -EINVAL;
+
+	pc = (unsigned long)&image[insn_idx];
+
+	fixup = (void *)fp->aux->extable -
+		(fp->aux->num_exentries * BPF_FIXUP_LEN) +
+		(ctx->exentry_idx * BPF_FIXUP_LEN);
+
+	fixup[0] = PPC_RAW_LI(dst_reg, 0);
+#ifdef CONFIG_PPC32
+	fixup[1] = PPC_RAW_LI(dst_reg - 1, 0); /* clear higher 32-bit register too */
+	fixup[2] = PPC_RAW_BRANCH((long)(pc + jmp_off) - (long)&fixup[2]);
+#else
+	fixup[1] = PPC_RAW_BRANCH((long)(pc + jmp_off) - (long)&fixup[1]);
+#endif
+
+	ex = &fp->aux->extable[ctx->exentry_idx];
+
+	offset = pc - (long)&ex->insn;
+	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
+		return -ERANGE;
+	ex->insn = offset;
+
+	offset = (long)fixup - (long)&ex->fixup;
+	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
+		return -ERANGE;
+	ex->fixup = offset;
+
+	ctx->exentry_idx++;
+	return 0;
+}
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 94641b7be387..c6262289dcc4 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -811,12 +811,16 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		 */
 		/* dst = *(u8 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_B:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_B:
 		/* dst = *(u16 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_H:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_H:
 		/* dst = *(u32 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_W:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_W:
 		/* dst = *(u64 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_DW:
+		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
 			switch (size) {
 			case BPF_B:
 				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
@@ -835,6 +839,32 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 
 			if ((size != BPF_DW) && !fp->aux->verifier_zext)
 				EMIT(PPC_RAW_LI(dst_reg_h, 0));
+
+			if (BPF_MODE(code) == BPF_PROBE_MEM) {
+				int insn_idx = ctx->idx - 1;
+				int jmp_off = 4;
+
+				/*
+				 * In case of BPF_DW, two lwz instructions are emitted, one
+				 * for higher 32-bit and another for lower 32-bit. So, set
+				 * ex->insn to the first of the two and jump over both
+				 * instructions in fixup.
+				 *
+				 * Similarly, with !verifier_zext, two instructions are
+				 * emitted for BPF_B/H/W case. So, set ex-insn to the
+				 * instruction that could fault and skip over both
+				 * instructions.
+				 */
+				if ((size == BPF_DW) || !fp->aux->verifier_zext) {
+					insn_idx -= 1;
+					jmp_off += 4;
+				}
+
+				ret = bpf_add_extable_entry(fp, image, pass, ctx, insn_idx,
+							    jmp_off, dst_reg);
+				if (ret)
+					return ret;
+			}
 			break;
 
 		/*
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index eb28dbc67151..10cc9f04843c 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -270,51 +270,6 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 	/* out: */
 }
 
-/*
- * The caller should check for (BPF_MODE(code) == BPF_PROBE_MEM) before calling
- * this function, as this only applies to BPF_PROBE_MEM, for now.
- */
-static int bpf_add_extable_entry(struct bpf_prog *fp, u32 *image, int pass,
-				 struct codegen_context *ctx, int dst_reg)
-{
-	off_t offset;
-	unsigned long pc;
-	struct exception_table_entry *ex;
-	u32 *fixup;
-
-	/* Populate extable entries only in the last pass */
-	if (pass != 2)
-		return 0;
-
-	if (!fp->aux->extable ||
-	    WARN_ON_ONCE(ctx->exentry_idx >= fp->aux->num_exentries))
-		return -EINVAL;
-
-	pc = (unsigned long)&image[ctx->idx - 1];
-
-	fixup = (void *)fp->aux->extable -
-		(fp->aux->num_exentries * BPF_FIXUP_LEN) +
-		(ctx->exentry_idx * BPF_FIXUP_LEN);
-
-	fixup[0] = PPC_RAW_LI(dst_reg, 0);
-	fixup[1] = PPC_RAW_BRANCH((long)(pc + 4) - (long)&fixup[1]);
-
-	ex = &fp->aux->extable[ctx->exentry_idx];
-
-	offset = pc - (long)&ex->insn;
-	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
-		return -ERANGE;
-	ex->insn = offset;
-
-	offset = (long)fixup - (long)&ex->fixup;
-	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
-		return -ERANGE;
-	ex->fixup = offset;
-
-	ctx->exentry_idx++;
-	return 0;
-}
-
 /* Assemble the body code between the prologue & epilogue */
 int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
 		       u32 *addrs, int pass)
@@ -811,7 +766,8 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 				addrs[++i] = ctx->idx * 4;
 
 			if (BPF_MODE(code) == BPF_PROBE_MEM) {
-				ret = bpf_add_extable_entry(fp, image, pass, ctx, dst_reg);
+				ret = bpf_add_extable_entry(fp, image, pass, ctx, ctx->idx - 1,
+							    4, dst_reg);
 				if (ret)
 					return ret;
 			}
-- 
2.31.1


^ permalink raw reply related

* [PATCH v2 8/8] bpf ppc32: Add addr > TASK_SIZE_MAX explicit check
From: Hari Bathini @ 2021-09-17 15:30 UTC (permalink / raw)
  To: naveen.n.rao, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai, Hari Bathini
In-Reply-To: <20210917153047.177141-1-hbathini@linux.ibm.com>

With KUAP enabled, any kernel code which wants to access userspace
needs to be surrounded by disable-enable KUAP. But that is not
happening for BPF_PROBE_MEM load instruction. Though PPC32 does not
support read protection, considering the fact that PTR_TO_BTF_ID
(which uses BPF_PROBE_MEM mode) could either be a valid kernel pointer
or NULL but should never be a pointer to userspace address, execute
BPF_PROBE_MEM load only if addr > TASK_SIZE_MAX, otherwise set
dst_reg=0 and move on.

This will catch NULL, valid or invalid userspace pointers. Only bad
kernel pointer will be handled by BPF exception table.

[Alexei suggested for x86]
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
---

Changes in v2:
* New patch to handle bad userspace pointers on PPC32. 


 arch/powerpc/net/bpf_jit_comp32.c | 39 +++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index c6262289dcc4..faa8047fcf4a 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -821,6 +821,45 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
 		/* dst = *(u64 *)(ul) (src + off) */
 		case BPF_LDX | BPF_MEM | BPF_DW:
 		case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+			/*
+			 * As PTR_TO_BTF_ID that uses BPF_PROBE_MEM mode could either be a valid
+			 * kernel pointer or NULL but not a userspace address, execute BPF_PROBE_MEM
+			 * load only if addr > TASK_SIZE_MAX, otherwise set dst_reg=0 and move on.
+			 */
+			if (BPF_MODE(code) == BPF_PROBE_MEM) {
+				bool extra_insn_needed = false;
+				unsigned int adjusted_idx;
+
+				/*
+				 * For BPF_DW case, "li reg_h,0" would be needed when
+				 * !fp->aux->verifier_zext. Adjust conditional branch'ing
+				 * address accordingly.
+				 */
+				if ((size == BPF_DW) && !fp->aux->verifier_zext)
+					extra_insn_needed = true;
+
+				/*
+				 * Need to jump two instructions instead of one for BPF_DW case
+				 * as there are two load instructions for dst_reg_h & dst_reg
+				 * respectively.
+				 */
+				adjusted_idx = (size == BPF_DW) ? 1 : 0;
+
+				EMIT(PPC_RAW_ADDI(b2p[TMP_REG], src_reg, off));
+				PPC_LI32(_R0, TASK_SIZE_MAX);
+				EMIT(PPC_RAW_CMPLW(b2p[TMP_REG], _R0));
+				PPC_BCC(COND_GT, (ctx->idx + 4 + (extra_insn_needed ? 1 : 0)) * 4);
+				EMIT(PPC_RAW_LI(dst_reg, 0));
+				/*
+				 * Note that "li reg_h,0" is emitted for BPF_B/H/W case,
+				 * if necessary. So, jump there insted of emitting an
+				 * additional "li reg_h,0" instruction.
+				 */
+				if (extra_insn_needed)
+					EMIT(PPC_RAW_LI(dst_reg_h, 0));
+				PPC_JMP((ctx->idx + 2 + adjusted_idx) * 4);
+			}
+
 			switch (size) {
 			case BPF_B:
 				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
-- 
2.31.1


^ permalink raw reply related

* Re: [PATCH v2 1/8] bpf powerpc: Remove unused SEEN_STACK
From: Christophe Leroy @ 2021-09-17 16:02 UTC (permalink / raw)
  To: Hari Bathini, naveen.n.rao, mpe, ast, daniel
  Cc: Ravi Bangoria, songliubraving, netdev, john.fastabend, andrii,
	kpsingh, paulus, yhs, bpf, linuxppc-dev, kafai
In-Reply-To: <20210917153047.177141-2-hbathini@linux.ibm.com>



Le 17/09/2021 à 17:30, Hari Bathini a écrit :
> From: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> 
> SEEN_STACK is unused on PowerPC. Remove it. Also, have
> SEEN_TAILCALL use 0x40000000.
> 
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>

> ---
> 
> * No changes in v2.
> 
> 
>   arch/powerpc/net/bpf_jit.h | 3 +--
>   1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index 99fad093f43e..d6267e93027a 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -116,8 +116,7 @@ static inline bool is_nearbranch(int offset)
>   #define COND_LE		(CR0_GT | COND_CMP_FALSE)
>   
>   #define SEEN_FUNC	0x20000000 /* might call external helpers */
> -#define SEEN_STACK	0x40000000 /* uses BPF stack */
> -#define SEEN_TAILCALL	0x80000000 /* uses tail calls */
> +#define SEEN_TAILCALL	0x40000000 /* uses tail calls */
>   
>   #define SEEN_VREG_MASK	0x1ff80000 /* Volatile registers r3-r12 */
>   #define SEEN_NVREG_MASK	0x0003ffff /* Non volatile registers r14-r31 */
> 

^ permalink raw reply

* Re: [PATCH v2 3/8] bpf powerpc: refactor JIT compiler code
From: Christophe Leroy @ 2021-09-17 16:10 UTC (permalink / raw)
  To: Hari Bathini, naveen.n.rao, mpe, ast, daniel
  Cc: songliubraving, netdev, john.fastabend, andrii, kpsingh, paulus,
	yhs, bpf, linuxppc-dev, kafai
In-Reply-To: <20210917153047.177141-4-hbathini@linux.ibm.com>



Le 17/09/2021 à 17:30, Hari Bathini a écrit :
> Refactor powerpc JITing. This simplifies adding BPF_PROBE_MEM support.

Could you describe a bit more what you are refactoring exactly ?


> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
> ---
> 
> Changes in v2:
> * New patch to refactor a bit of JITing code.
> 
> 
>   arch/powerpc/net/bpf_jit_comp32.c | 50 +++++++++++---------
>   arch/powerpc/net/bpf_jit_comp64.c | 76 ++++++++++++++++---------------
>   2 files changed, 68 insertions(+), 58 deletions(-)
> 
> diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
> index b60b59426a24..c8ae14c316e3 100644
> --- a/arch/powerpc/net/bpf_jit_comp32.c
> +++ b/arch/powerpc/net/bpf_jit_comp32.c
> @@ -276,17 +276,17 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
>   	u32 exit_addr = addrs[flen];
>   
>   	for (i = 0; i < flen; i++) {
> -		u32 code = insn[i].code;
>   		u32 dst_reg = bpf_to_ppc(ctx, insn[i].dst_reg);
> -		u32 dst_reg_h = dst_reg - 1;
>   		u32 src_reg = bpf_to_ppc(ctx, insn[i].src_reg);
> -		u32 src_reg_h = src_reg - 1;
>   		u32 tmp_reg = bpf_to_ppc(ctx, TMP_REG);
> +		u32 true_cond, code = insn[i].code;
> +		u32 dst_reg_h = dst_reg - 1;
> +		u32 src_reg_h = src_reg - 1;

All changes above seems unneeded and not linked to the current patch. 
Please leave cosmetic changes outside and focus on necessary changes.

> +		u32 size = BPF_SIZE(code);
>   		s16 off = insn[i].off;
>   		s32 imm = insn[i].imm;
>   		bool func_addr_fixed;
>   		u64 func_addr;
> -		u32 true_cond;
>   
>   		/*
>   		 * addrs[] maps a BPF bytecode address into a real offset from
> @@ -809,25 +809,33 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
>   		/*
>   		 * BPF_LDX
>   		 */
> -		case BPF_LDX | BPF_MEM | BPF_B: /* dst = *(u8 *)(ul) (src + off) */
> -			EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
> -			if (!fp->aux->verifier_zext)
> -				EMIT(PPC_RAW_LI(dst_reg_h, 0));
> -			break;
> -		case BPF_LDX | BPF_MEM | BPF_H: /* dst = *(u16 *)(ul) (src + off) */
> -			EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
> -			if (!fp->aux->verifier_zext)
> -				EMIT(PPC_RAW_LI(dst_reg_h, 0));
> -			break;
> -		case BPF_LDX | BPF_MEM | BPF_W: /* dst = *(u32 *)(ul) (src + off) */
> -			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
> -			if (!fp->aux->verifier_zext)
> +		/* dst = *(u8 *)(ul) (src + off) */
> +		case BPF_LDX | BPF_MEM | BPF_B:
> +		/* dst = *(u16 *)(ul) (src + off) */
> +		case BPF_LDX | BPF_MEM | BPF_H:
> +		/* dst = *(u32 *)(ul) (src + off) */
> +		case BPF_LDX | BPF_MEM | BPF_W:
> +		/* dst = *(u64 *)(ul) (src + off) */
> +		case BPF_LDX | BPF_MEM | BPF_DW:
Why changing the location of the comments ? I found it more readable before.

> +			switch (size) {
> +			case BPF_B:
> +				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
> +				break;
> +			case BPF_H:
> +				EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
> +				break;
> +			case BPF_W:
> +				EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
> +				break;
> +			case BPF_DW:
> +				EMIT(PPC_RAW_LWZ(dst_reg_h, src_reg, off));
> +				EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off + 4));
> +				break;
> +			}

BPF_B, BPF_H, ... are not part of an enum. Are you sure GCC is happy to 
have no default ?

> +
> +			if ((size != BPF_DW) && !fp->aux->verifier_zext)

You don't need () around size != BPF_DW

>   				EMIT(PPC_RAW_LI(dst_reg_h, 0));
>   			break;
> -		case BPF_LDX | BPF_MEM | BPF_DW: /* dst = *(u64 *)(ul) (src + off) */
> -			EMIT(PPC_RAW_LWZ(dst_reg_h, src_reg, off));
> -			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off + 4));
> -			break;
>   
>   		/*
>   		 * Doubleword load
> diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
> index 2a87da50d9a4..78b28f25555c 100644
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -282,16 +282,15 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
>   	u32 exit_addr = addrs[flen];
>   
>   	for (i = 0; i < flen; i++) {
> -		u32 code = insn[i].code;
>   		u32 dst_reg = b2p[insn[i].dst_reg];
>   		u32 src_reg = b2p[insn[i].src_reg];
> +		u32 tmp_idx, code = insn[i].code;
> +		u32 size = BPF_SIZE(code);
>   		s16 off = insn[i].off;
>   		s32 imm = insn[i].imm;
>   		bool func_addr_fixed;
> -		u64 func_addr;
> -		u64 imm64;
> +		u64 func_addr, imm64;
>   		u32 true_cond;
> -		u32 tmp_idx;

All changes other than the addition of 'size' seems unneeded and not 
linked to the current patch.

>   
>   		/*
>   		 * addrs[] maps a BPF bytecode address into a real offset from
> @@ -638,35 +637,34 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
>   		 */
>   		case BPF_STX | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = src */
>   		case BPF_ST | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = imm */
> -			if (BPF_CLASS(code) == BPF_ST) {
> -				EMIT(PPC_RAW_LI(b2p[TMP_REG_1], imm));
> -				src_reg = b2p[TMP_REG_1];
> -			}
> -			EMIT(PPC_RAW_STB(src_reg, dst_reg, off));
> -			break;
>   		case BPF_STX | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = src */
>   		case BPF_ST | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = imm */
> -			if (BPF_CLASS(code) == BPF_ST) {
> -				EMIT(PPC_RAW_LI(b2p[TMP_REG_1], imm));
> -				src_reg = b2p[TMP_REG_1];
> -			}
> -			EMIT(PPC_RAW_STH(src_reg, dst_reg, off));
> -			break;
>   		case BPF_STX | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = src */
>   		case BPF_ST | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = imm */
> -			if (BPF_CLASS(code) == BPF_ST) {
> -				PPC_LI32(b2p[TMP_REG_1], imm);
> -				src_reg = b2p[TMP_REG_1];
> -			}
> -			EMIT(PPC_RAW_STW(src_reg, dst_reg, off));
> -			break;
>   		case BPF_STX | BPF_MEM | BPF_DW: /* (u64 *)(dst + off) = src */
>   		case BPF_ST | BPF_MEM | BPF_DW: /* *(u64 *)(dst + off) = imm */
>   			if (BPF_CLASS(code) == BPF_ST) {
> -				PPC_LI32(b2p[TMP_REG_1], imm);
> +				if ((size == BPF_B) || (size == BPF_H))
> +					EMIT(PPC_RAW_LI(b2p[TMP_REG_1], imm));
> +				else /* size == BPF_W || size == BPF_DW */
> +					PPC_LI32(b2p[TMP_REG_1], imm);

I think you can use PPC_LI32() for all cases, it should generate the 
same code.

>   				src_reg = b2p[TMP_REG_1];
>   			}
> -			PPC_BPF_STL(src_reg, dst_reg, off);
> +
> +			switch (size) {
> +			case BPF_B:
> +				EMIT(PPC_RAW_STB(src_reg, dst_reg, off));
> +				break;
> +			case BPF_H:
> +				EMIT(PPC_RAW_STH(src_reg, dst_reg, off));
> +				break;
> +			case BPF_W:
> +				EMIT(PPC_RAW_STW(src_reg, dst_reg, off));
> +				break;
> +			case BPF_DW:
> +				PPC_BPF_STL(src_reg, dst_reg, off);
> +				break;
> +			}
>   			break;
>   
>   		/*
> @@ -716,25 +714,29 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
>   		 */
>   		/* dst = *(u8 *)(ul) (src + off) */
>   		case BPF_LDX | BPF_MEM | BPF_B:
> -			EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
> -			if (insn_is_zext(&insn[i + 1]))
> -				addrs[++i] = ctx->idx * 4;
> -			break;
>   		/* dst = *(u16 *)(ul) (src + off) */
>   		case BPF_LDX | BPF_MEM | BPF_H:
> -			EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
> -			if (insn_is_zext(&insn[i + 1]))
> -				addrs[++i] = ctx->idx * 4;
> -			break;
>   		/* dst = *(u32 *)(ul) (src + off) */
>   		case BPF_LDX | BPF_MEM | BPF_W:
> -			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
> -			if (insn_is_zext(&insn[i + 1]))
> -				addrs[++i] = ctx->idx * 4;
> -			break;
>   		/* dst = *(u64 *)(ul) (src + off) */
>   		case BPF_LDX | BPF_MEM | BPF_DW:
> -			PPC_BPF_LL(dst_reg, src_reg, off);
> +			switch (size) {
> +			case BPF_B:
> +				EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
> +				break;
> +			case BPF_H:
> +				EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
> +				break;
> +			case BPF_W:
> +				EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
> +				break;
> +			case BPF_DW:
> +				PPC_BPF_LL(dst_reg, src_reg, off);
> +				break;
> +			}
> +
> +			if ((size != BPF_DW) && insn_is_zext(&insn[i + 1]))
> +				addrs[++i] = ctx->idx * 4;
>   			break;
>   
>   		/*
> 

^ permalink raw reply

* Re: [PATCH v2 4/8] powerpc/ppc-opcode: introduce PPC_RAW_BRANCH() macro
From: LEROY Christophe @ 2021-09-17 16:14 UTC (permalink / raw)
  To: Hari Bathini, naveen.n.rao@linux.ibm.com, mpe@ellerman.id.au,
	ast@kernel.org, daniel@iogearbox.net
  Cc: songliubraving@fb.com, netdev@vger.kernel.org,
	john.fastabend@gmail.com, andrii@kernel.org, kpsingh@kernel.org,
	paulus@samba.org, yhs@fb.com, bpf@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, kafai@fb.com
In-Reply-To: <20210917153047.177141-5-hbathini@linux.ibm.com>



Le 17/09/2021 à 17:30, Hari Bathini a écrit :
> Define and use PPC_RAW_BRANCH() macro instead of open coding it. This
> macro is used while adding BPF_PROBE_MEM support.
> 
> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>

Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>

> ---
> 
> Changes in v2:
> * New patch to introduce PPC_RAW_BRANCH() macro.
> 
> 
>   arch/powerpc/include/asm/ppc-opcode.h | 2 ++
>   arch/powerpc/net/bpf_jit.h            | 4 ++--
>   2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
> index baea657bc868..f50213e2a3e0 100644
> --- a/arch/powerpc/include/asm/ppc-opcode.h
> +++ b/arch/powerpc/include/asm/ppc-opcode.h
> @@ -566,6 +566,8 @@
>   #define PPC_RAW_MTSPR(spr, d)		(0x7c0003a6 | ___PPC_RS(d) | __PPC_SPR(spr))
>   #define PPC_RAW_EIEIO()			(0x7c0006ac)
>   
> +#define PPC_RAW_BRANCH(addr)		(PPC_INST_BRANCH | ((addr) & 0x03fffffc))
> +
>   /* Deal with instructions that older assemblers aren't aware of */
>   #define	PPC_BCCTR_FLUSH		stringify_in_c(.long PPC_INST_BCCTR_FLUSH)
>   #define	PPC_CP_ABORT		stringify_in_c(.long PPC_RAW_CP_ABORT)
> diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
> index 411c63d945c7..0c8f885b8f48 100644
> --- a/arch/powerpc/net/bpf_jit.h
> +++ b/arch/powerpc/net/bpf_jit.h
> @@ -24,8 +24,8 @@
>   #define EMIT(instr)		PLANT_INSTR(image, ctx->idx, instr)
>   
>   /* Long jump; (unconditional 'branch') */
> -#define PPC_JMP(dest)		EMIT(PPC_INST_BRANCH |			      \
> -				     (((dest) - (ctx->idx * 4)) & 0x03fffffc))
> +#define PPC_JMP(dest)		EMIT(PPC_RAW_BRANCH((dest) - (ctx->idx * 4)))
> +
>   /* blr; (unconditional 'branch' with link) to absolute address */
>   #define PPC_BL_ABS(dest)	EMIT(PPC_INST_BL |			      \
>   				     (((dest) - (unsigned long)(image + ctx->idx)) & 0x03fffffc))
> 

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox