LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] powerpc/book3s64/kuap: Improve error reporting with KUAP
From: Michael Ellerman @ 2020-12-15 11:09 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, Aneesh Kumar K.V
In-Reply-To: <160756607519.1313423.7828205073526616704.b4-ty@ellerman.id.au>

Michael Ellerman <patch-notifications@ellerman.id.au> writes:
> On Tue, 8 Dec 2020 08:45:39 +0530, Aneesh Kumar K.V wrote:
>> This partially reverts commit eb232b162446 ("powerpc/book3s64/kuap: Improve
>> error reporting with KUAP") and update the fault handler to print
>> 
>> [   55.022514] Kernel attempted to access user page (7e6725b70000) - exploit attempt? (uid: 0)
>> [   55.022528] BUG: Unable to handle kernel data access on read at 0x7e6725b70000
>> [   55.022533] Faulting instruction address: 0xc000000000e8b9bc
>> [   55.022540] Oops: Kernel access of bad area, sig: 11 [#1]
>> ....
>> 
>> [...]
>
> Applied to powerpc/next.
>
> [1/1] powerpc/book3s64/kuap: Improve error reporting with KUAP
>       https://git.kernel.org/powerpc/c/eb232b1624462752dc916d9015b31ecdac0a01f1

This is wrong, the script was confused by two patches with the exact
same subject. See the other mail.

cheers

^ permalink raw reply

* Re: [PATCH v7 updated 21/22 ] powerpc/book3s64/kup: Check max key supported before enabling kup
From: Michael Ellerman @ 2020-12-15 11:19 UTC (permalink / raw)
  To: linuxppc-dev, mpe, Aneesh Kumar K.V
In-Reply-To: <20201202043854.76406-1-aneesh.kumar@linux.ibm.com>

On Fri, 27 Nov 2020 10:14:02 +0530, Aneesh Kumar K.V wrote:
> Don't enable KUEP/KUAP if we support less than or equal to 3 keys.
> 
> [...]

Applied to powerpc/next.

[21/22] powerpc/book3s64/kup: Check max key supported before enabling kup
        https://git.kernel.org/powerpc/c/61130e203dca3ba1f0c510eb12f7a4294e31a834

cheers

^ permalink raw reply

* Re: [PATCH 2/3] kbuild: LD_VERSION redenomination
From: Thomas Bogendoerfer @ 2020-12-15 13:48 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: linux-kbuild, Dominique Martinet, linuxppc-dev, linux-kernel,
	Jiaxun Yang, linux-mips, Paul Mackerras, Catalin Marinas,
	Huacai Chen, Will Deacon, linux-arm-kernel
In-Reply-To: <20201212165431.150750-2-masahiroy@kernel.org>

On Sun, Dec 13, 2020 at 01:54:30AM +0900, Masahiro Yamada wrote:
> Commit ccbef1674a15 ("Kbuild, lto: add ld-version and ld-ifversion
> macros") introduced scripts/ld-version.sh for GCC LTO.
> 
> At that time, this script handled 5 version fields because GCC LTO
> needed the downstream binutils. (https://lkml.org/lkml/2014/4/8/272)
> 
> The code snippet from the submitted patch was as follows:
> 
>     # We need HJ Lu's Linux binutils because mainline binutils does not
>     # support mixing assembler and LTO code in the same ld -r object.
>     # XXX check if the gcc plugin ld is the expected one too
>     # XXX some Fedora binutils should also support it. How to check for that?
>     ifeq ($(call ld-ifversion,-ge,22710001,y),y)
>         ...
> 
> However, GCC LTO was not merged into the mainline after all.
> (https://lkml.org/lkml/2014/4/8/272)
> 
> So, the 4th and 5th fields were never used, and finally removed by
> commit 0d61ed17dd30 ("ld-version: Drop the 4th and 5th version
> components").
> 
> Since then, the last 4-digits returned by this script is always zeros.
> 
> Remove the meaningless last 4-digits. This makes the version format
> consistent with GCC_VERSION, CLANG_VERSION, LLD_VERSION.
> 
> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
> ---
> 
>  arch/mips/loongson64/Platform | 2 +-
>  arch/mips/vdso/Kconfig        | 2 +-

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply

* Re: [PATCH] powerpc/vas: Fix IRQ name allocation
From: Cédric Le Goater @ 2020-12-15 12:33 UTC (permalink / raw)
  To: Haren Myneni, linuxppc-dev; +Cc: Sukadev Bhattiprolu
In-Reply-To: <facf50fec946b5ee85f8151c4e539acf60cc149e.camel@linux.ibm.com>

On 12/15/20 11:56 AM, Haren Myneni wrote:
> On Sat, 2020-12-12 at 15:27 +0100, Cédric Le Goater wrote:
>> The VAS device allocates a generic interrupt to handle page faults
>> but
>> the IRQ name doesn't show under /proc. This is because it's on
>> stack. Allocate the name.
>>
>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> 
> Thanks for fixing.

I was wondering where those ^B interrupt numbers were coming from.

/proc/interrupts looks better now: 

     36:  ...   0  XIVE-IRQ 50331732 Edge      vas-6
     40:  ...   0  XIVE-IRQ 33554504 Edge      vas-4
     72:  ...   0  XIVE-IRQ 16777304 Edge      vas-2
    124:  ...   0  XIVE-IRQ      124 Edge      vas-0


> 
> Acked-by: Haren Myneni <haren@linux.ibm.com>
> 
>> ---
>>
>>  I didn't understand this part in init_vas_instance() :
>>
>> 	if (vinst->virq) {
>> 		rc = vas_irq_fault_window_setup(vinst);
>> 		/*
>> 		 * Fault window is used only for user space send
>> windows.
>> 		 * So if vinst->virq is NULL, tx_win_open returns
>> -ENODEV
>> 		 * for user space.
>> 		 */
>> 		if (rc)
>> 			vinst->virq = 0;
>> 	}
>>
>>  If the IRQ cannot be requested, the device probing should fail but
>>  it's not today. The use of 'vinst->virq' is suspicious.
> 
> VAS raises an interrupt only when NX sees fault on request buffers and
> faults can happen only for user space requests. So Fault window setup
> is needed for user space requests. For kernel requests, continue even
> if IRQ / fault_window_setup is failed. 
>
> When window open request is issued from user space, kernel returns
> -ENODEV if vinst->virq = 0 (means fault window setup is failed). 

It looks ok to deactivate a feature (page faulting for user space 
requests) if vas_setup_fault_window() fails but if the IRQ layer 
routine request_threaded_irq() fails, something is really wrong 
in the system and we should stop probing IMO.

We should probably move the IRQ request after allocating/mapping 
the XIVE IPI IRQ.

this test is always true : 

	if (vinst->virq) {
		rc = vas_irq_fault_window_setup(vinst);
	
since above, we did : 

	vinst->virq = irq_create_mapping(NULL, hwirq);
	if (!vinst->virq) {
		pr_err("Inst%d: Unable to map global irq %d\n",
				vinst->vas_id, hwirq);
		return -EINVAL;
	}

Cheers,

C.


> 
> 
>>
>>  arch/powerpc/platforms/powernv/vas.h |  1 +
>>  arch/powerpc/platforms/powernv/vas.c | 11 ++++++++---
>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/platforms/powernv/vas.h
>> b/arch/powerpc/platforms/powernv/vas.h
>> index 70f793e8f6cc..c7db3190baca 100644
>> --- a/arch/powerpc/platforms/powernv/vas.h
>> +++ b/arch/powerpc/platforms/powernv/vas.h
>> @@ -340,6 +340,7 @@ struct vas_instance {
>>  	struct vas_window *rxwin[VAS_COP_TYPE_MAX];
>>  	struct vas_window *windows[VAS_WINDOWS_PER_CHIP];
>>  
>> +	char *name;
>>  	char *dbgname;
>>  	struct dentry *dbgdir;
>>  };
>> diff --git a/arch/powerpc/platforms/powernv/vas.c
>> b/arch/powerpc/platforms/powernv/vas.c
>> index 598e4cd563fb..b65256a63e87 100644
>> --- a/arch/powerpc/platforms/powernv/vas.c
>> +++ b/arch/powerpc/platforms/powernv/vas.c
>> @@ -28,12 +28,10 @@ static DEFINE_PER_CPU(int, cpu_vas_id);
>>  
>>  static int vas_irq_fault_window_setup(struct vas_instance *vinst)
>>  {
>> -	char devname[64];
>>  	int rc = 0;
>>  
>> -	snprintf(devname, sizeof(devname), "vas-%d", vinst->vas_id);
>>  	rc = request_threaded_irq(vinst->virq, vas_fault_handler,
>> -				vas_fault_thread_fn, 0, devname,
>> vinst);
>> +				vas_fault_thread_fn, 0, vinst->name,
>> vinst);
>>  
>>  	if (rc) {
>>  		pr_err("VAS[%d]: Request IRQ(%d) failed with %d\n",
>> @@ -80,6 +78,12 @@ static int init_vas_instance(struct
>> platform_device *pdev)
>>  	if (!vinst)
>>  		return -ENOMEM;
>>  
>> +	vinst->name = kasprintf(GFP_KERNEL, "vas-%d", vasid);
>> +	if (!vinst->name) {
>> +		kfree(vinst);
>> +		return -ENOMEM;
>> +	}
>> +
>>  	INIT_LIST_HEAD(&vinst->node);
>>  	ida_init(&vinst->ida);
>>  	mutex_init(&vinst->mutex);
>> @@ -162,6 +166,7 @@ static int init_vas_instance(struct
>> platform_device *pdev)
>>  	return 0;
>>  
>>  free_vinst:
>> +	kfree(vinst->name);
>>  	kfree(vinst);
>>  	return -ENODEV;
>>  
> 


^ permalink raw reply

* Re: [PATCH] powerpc/boot: Fix build of dts/fsl
From: Masahiro Yamada @ 2020-12-15 18:02 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <20201215032906.473460-1-mpe@ellerman.id.au>

On Tue, Dec 15, 2020 at 12:29 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> The lkp robot reported that some configs fail to build, for example
> mpc85xx_smp_defconfig, with:
>
>   cc1: fatal error: opening output file arch/powerpc/boot/dts/fsl/.mpc8540ads.dtb.dts.tmp: No such file or directory
>
> This bisects to:
>   cc8a51ca6f05 ("kbuild: always create directories of targets")
>
> Although that commit claims to be about in-tree builds, it somehow
> breaks out-of-tree builds. But presumably it's just exposing a latent
> bug in our Makefiles.
>
> We can fix it by adding to targets for dts/fsl in the same way that we
> do for dts.
>
> Fixes: cc8a51ca6f05 ("kbuild: always create directories of targets")
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> ---
>  arch/powerpc/boot/Makefile | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 68a7534454cd..c3e084cceaed 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -372,6 +372,8 @@ initrd-y := $(filter-out $(image-y), $(initrd-y))
>  targets        += $(image-y) $(initrd-y)
>  targets += $(foreach x, dtbImage uImage cuImage simpleImage treeImage, \
>                 $(patsubst $(x).%, dts/%.dtb, $(filter $(x).%, $(image-y))))
> +targets += $(foreach x, dtbImage uImage cuImage simpleImage treeImage, \
> +               $(patsubst $(x).%, dts/fsl/%.dtb, $(filter $(x).%, $(image-y))))
>
>  $(addprefix $(obj)/, $(initrd-y)): $(obj)/ramdisk.image.gz
>
> --
> 2.25.1
>


Some freescale dts files are right under arch/powerpc/boot/dts/,
but some are in the fsl/ subdirectory.
I do not understand the policy.


If "fsl/" is a very special case,
I just thought we could add a new syntax, fslimage-y,
but I do not mind either way.


fslimage-$(CONFIG_MPC8540_ADS) += cuImage.mpc8540ads

targets += $(foreach x, dtbImage uImage cuImage simpleImage treeImage, \
               $(patsubst $(x).%, dts/fsl/%.dtb, $(filter $(x).%,
$(fslimage-y))))


This Makefile is wrong over-all anyway.




-- 
Best Regards
Masahiro Yamada

^ permalink raw reply

* [RFC PATCH] treewide: remove bzip2 compression support
From: Alex Xu (Hello71) @ 2020-12-15 19:03 UTC (permalink / raw)
  To: linux-kernel, linux-kbuild, linux-arm-kernel, linux-aspeed,
	linux-mips, openrisc, linux-parisc, linuxppc-dev, linux-riscv,
	linux-s390, linux-sh, linux-xtensa
  Cc: torvalds, Alex Xu (Hello71)
In-Reply-To: <20201215190315.8681-1-alex_y_xu.ref@yahoo.ca>

bzip2 is either slower or larger than every other supported algorithm,
according to benchmarks at [0]. It is far slower to decompress than any
other algorithm, and still larger than lzma, xz, and zstd.

[0] https://lore.kernel.org/lkml/1588791882.08g1378g67.none@localhost/

Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
---
 Documentation/x86/boot.rst                 |   8 +-
 arch/arm/configs/aspeed_g4_defconfig       |   1 -
 arch/arm/configs/aspeed_g5_defconfig       |   1 -
 arch/arm/configs/ezx_defconfig             |   1 -
 arch/arm/configs/imote2_defconfig          |   1 -
 arch/arm/configs/lpc18xx_defconfig         |   1 -
 arch/arm/configs/vf610m4_defconfig         |   1 -
 arch/arm64/boot/Makefile                   |   5 +-
 arch/mips/Kconfig                          |   1 -
 arch/mips/Makefile                         |   3 -
 arch/mips/boot/Makefile                    |  14 -
 arch/mips/boot/compressed/Makefile         |   1 -
 arch/mips/boot/compressed/decompress.c     |   4 -
 arch/mips/configs/ath25_defconfig          |   1 -
 arch/mips/configs/pistachio_defconfig      |   1 -
 arch/openrisc/configs/simple_smp_defconfig |   1 -
 arch/parisc/Kconfig                        |   1 -
 arch/parisc/boot/compressed/Makefile       |   5 +-
 arch/parisc/boot/compressed/misc.c         |   4 -
 arch/powerpc/configs/skiroot_defconfig     |   1 -
 arch/riscv/boot/Makefile                   |   3 -
 arch/riscv/configs/nommu_k210_defconfig    |   1 -
 arch/riscv/configs/nommu_virt_defconfig    |   1 -
 arch/s390/Kconfig                          |   1 -
 arch/s390/boot/compressed/Makefile         |   5 +-
 arch/s390/boot/compressed/decompressor.c   |   8 -
 arch/sh/Kconfig                            |   1 -
 arch/sh/Makefile                           |   3 +-
 arch/sh/boot/Makefile                      |  11 +-
 arch/sh/boot/compressed/Makefile           |   5 +-
 arch/sh/boot/compressed/misc.c             |   8 -
 arch/sh/configs/sdk7786_defconfig          |   1 -
 arch/x86/Kconfig                           |   1 -
 arch/x86/boot/compressed/Makefile          |   9 +-
 arch/x86/boot/compressed/misc.c            |   4 -
 arch/x86/include/asm/boot.h                |   4 +-
 arch/xtensa/configs/cadence_csp_defconfig  |   1 -
 arch/xtensa/configs/nommu_kc705_defconfig  |   1 -
 include/linux/decompress/bunzip2.h         |  11 -
 init/Kconfig                               |  22 +-
 init/do_mounts_rd.c                        |   1 -
 kernel/configs/tiny.config                 |   1 -
 lib/Kconfig                                |   3 -
 lib/Makefile                               |   1 -
 lib/decompress.c                           |   5 -
 lib/decompress_bunzip2.c                   | 756 ---------------------
 scripts/Makefile.lib                       |   8 +-
 scripts/Makefile.package                   |   1 -
 scripts/package/buildtar                   |   2 +-
 usr/Kconfig                                |  26 +-
 usr/Makefile                               |   3 +-
 51 files changed, 22 insertions(+), 942 deletions(-)
 delete mode 100644 include/linux/decompress/bunzip2.h
 delete mode 100644 lib/decompress_bunzip2.c

diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
index abb9fc164657..741eebc10140 100644
--- a/Documentation/x86/boot.rst
+++ b/Documentation/x86/boot.rst
@@ -781,10 +781,10 @@ Protocol:	2.08+
   The payload may be compressed. The format of both the compressed and
   uncompressed data should be determined using the standard magic
   numbers.  The currently supported compression formats are gzip
-  (magic numbers 1F 8B or 1F 9E), bzip2 (magic number 42 5A), LZMA
-  (magic number 5D 00), XZ (magic number FD 37), LZ4 (magic number
-  02 21) and ZSTD (magic number 28 B5). The uncompressed payload is
-  currently always ELF (magic number 7F 45 4C 46).
+  (magic numbers 1F 8B or 1F 9E), LZMA (magic number 5D 00), XZ (magic
+  number FD 37), LZ4 (magic number 02 21) and ZSTD (magic number 28 B5).
+  Formerly supported was bzip2 (magic number 42 5A). The uncompressed
+  payload is currently always ELF (magic number 7F 45 4C 46).
 
 ============	==============
 Field name:	payload_length
diff --git a/arch/arm/configs/aspeed_g4_defconfig b/arch/arm/configs/aspeed_g4_defconfig
index 58d293b63581..f2f5dcd0e59c 100644
--- a/arch/arm/configs/aspeed_g4_defconfig
+++ b/arch/arm/configs/aspeed_g4_defconfig
@@ -8,7 +8,6 @@ CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=16
 CONFIG_CGROUPS=y
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZO is not set
 # CONFIG_RD_LZ4 is not set
 # CONFIG_UID16 is not set
diff --git a/arch/arm/configs/aspeed_g5_defconfig b/arch/arm/configs/aspeed_g5_defconfig
index 047975eccefb..5d045b2902d6 100644
--- a/arch/arm/configs/aspeed_g5_defconfig
+++ b/arch/arm/configs/aspeed_g5_defconfig
@@ -8,7 +8,6 @@ CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=16
 CONFIG_CGROUPS=y
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZO is not set
 # CONFIG_RD_LZ4 is not set
 # CONFIG_UID16 is not set
diff --git a/arch/arm/configs/ezx_defconfig b/arch/arm/configs/ezx_defconfig
index 81665b7abf83..422592786e01 100644
--- a/arch/arm/configs/ezx_defconfig
+++ b/arch/arm/configs/ezx_defconfig
@@ -4,7 +4,6 @@ CONFIG_SYSVIPC=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_SYSFS_DEPRECATED_V2=y
 CONFIG_BLK_DEV_INITRD=y
-CONFIG_RD_BZIP2=y
 CONFIG_RD_LZMA=y
 CONFIG_EXPERT=y
 # CONFIG_COMPAT_BRK is not set
diff --git a/arch/arm/configs/imote2_defconfig b/arch/arm/configs/imote2_defconfig
index ae15a2a33802..04e23ec01af6 100644
--- a/arch/arm/configs/imote2_defconfig
+++ b/arch/arm/configs/imote2_defconfig
@@ -3,7 +3,6 @@ CONFIG_SYSVIPC=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_SYSFS_DEPRECATED_V2=y
 CONFIG_BLK_DEV_INITRD=y
-CONFIG_RD_BZIP2=y
 CONFIG_RD_LZMA=y
 CONFIG_EXPERT=y
 # CONFIG_COMPAT_BRK is not set
diff --git a/arch/arm/configs/lpc18xx_defconfig b/arch/arm/configs/lpc18xx_defconfig
index be882ea0eee4..12e69dfa18dc 100644
--- a/arch/arm/configs/lpc18xx_defconfig
+++ b/arch/arm/configs/lpc18xx_defconfig
@@ -1,7 +1,6 @@
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_PREEMPT=y
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_XZ is not set
 # CONFIG_RD_LZO is not set
diff --git a/arch/arm/configs/vf610m4_defconfig b/arch/arm/configs/vf610m4_defconfig
index a89f035c3b01..02c7acc7cd09 100644
--- a/arch/arm/configs/vf610m4_defconfig
+++ b/arch/arm/configs/vf610m4_defconfig
@@ -1,6 +1,5 @@
 CONFIG_NAMESPACES=y
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_XZ is not set
 # CONFIG_RD_LZ4 is not set
diff --git a/arch/arm64/boot/Makefile b/arch/arm64/boot/Makefile
index cd3414898d10..d3207561c078 100644
--- a/arch/arm64/boot/Makefile
+++ b/arch/arm64/boot/Makefile
@@ -16,14 +16,11 @@
 
 OBJCOPYFLAGS_Image :=-O binary -R .note -R .note.gnu.build-id -R .comment -S
 
-targets := Image Image.bz2 Image.gz Image.lz4 Image.lzma Image.lzo
+targets := Image Image.gz Image.lz4 Image.lzma Image.lzo
 
 $(obj)/Image: vmlinux FORCE
 	$(call if_changed,objcopy)
 
-$(obj)/Image.bz2: $(obj)/Image FORCE
-	$(call if_changed,bzip2)
-
 $(obj)/Image.gz: $(obj)/Image FORCE
 	$(call if_changed,gzip)
 
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 6b762bebff33..d25d900294bb 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1880,7 +1880,6 @@ endif # CPU_LOONGSON2F
 config SYS_SUPPORTS_ZBOOT
 	bool
 	select HAVE_KERNEL_GZIP
-	select HAVE_KERNEL_BZIP2
 	select HAVE_KERNEL_LZ4
 	select HAVE_KERNEL_LZMA
 	select HAVE_KERNEL_LZO
diff --git a/arch/mips/Makefile b/arch/mips/Makefile
index 0d0f29d662c9..2dd53daf6fbb 100644
--- a/arch/mips/Makefile
+++ b/arch/mips/Makefile
@@ -328,14 +328,12 @@ boot-y			+= vmlinux.srec
 ifeq ($(shell expr $(load-y) \< 0xffffffff80000000 2> /dev/null), 0)
 boot-y			+= uImage
 boot-y			+= uImage.bin
-boot-y			+= uImage.bz2
 boot-y			+= uImage.gz
 boot-y			+= uImage.lzma
 boot-y			+= uImage.lzo
 endif
 boot-y			+= vmlinux.itb
 boot-y			+= vmlinux.gz.itb
-boot-y			+= vmlinux.bz2.itb
 boot-y			+= vmlinux.lzma.itb
 boot-y			+= vmlinux.lzo.itb
 
@@ -429,7 +427,6 @@ define archhelp
 	echo '  vmlinuz.srec         - SREC zboot image'
 	echo '  uImage               - U-Boot image'
 	echo '  uImage.bin           - U-Boot image (uncompressed)'
-	echo '  uImage.bz2           - U-Boot image (bz2)'
 	echo '  uImage.gz            - U-Boot image (gzip)'
 	echo '  uImage.lzma          - U-Boot image (lzma)'
 	echo '  uImage.lzo           - U-Boot image (lzo)'
diff --git a/arch/mips/boot/Makefile b/arch/mips/boot/Makefile
index a3da2c5d63c2..78f70e3576cd 100644
--- a/arch/mips/boot/Makefile
+++ b/arch/mips/boot/Makefile
@@ -24,7 +24,6 @@ strip-flags   := $(addprefix --remove-section=,$(drop-sections))
 hostprogs := elf2ecoff
 
 suffix-y			:= bin
-suffix-$(CONFIG_KERNEL_BZIP2)	:= bz2
 suffix-$(CONFIG_KERNEL_GZIP)	:= gz
 suffix-$(CONFIG_KERNEL_LZMA)	:= lzma
 suffix-$(CONFIG_KERNEL_LZO)	:= lzo
@@ -54,14 +53,10 @@ UIMAGE_ENTRYADDR = $(VMLINUX_ENTRY_ADDRESS)
 # Compressed vmlinux images
 #
 
-extra-y += vmlinux.bin.bz2
 extra-y += vmlinux.bin.gz
 extra-y += vmlinux.bin.lzma
 extra-y += vmlinux.bin.lzo
 
-$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin FORCE
-	$(call if_changed,bzip2)
-
 $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
 	$(call if_changed,gzip)
 
@@ -77,7 +72,6 @@ $(obj)/vmlinux.bin.lzo: $(obj)/vmlinux.bin FORCE
 
 targets += uImage
 targets += uImage.bin
-targets += uImage.bz2
 targets += uImage.gz
 targets += uImage.lzma
 targets += uImage.lzo
@@ -85,9 +79,6 @@ targets += uImage.lzo
 $(obj)/uImage.bin: $(obj)/vmlinux.bin FORCE
 	$(call if_changed,uimage,none)
 
-$(obj)/uImage.bz2: $(obj)/vmlinux.bin.bz2 FORCE
-	$(call if_changed,uimage,bzip2)
-
 $(obj)/uImage.gz: $(obj)/vmlinux.bin.gz FORCE
 	$(call if_changed,uimage,gzip)
 
@@ -122,7 +113,6 @@ $(obj)/vmlinux.its.S: $(addprefix $(srctree)/arch/mips/$(PLATFORM)/,$(ITS_INPUTS
 
 targets += vmlinux.its
 targets += vmlinux.gz.its
-targets += vmlinux.bz2.its
 targets += vmlinux.lzma.its
 targets += vmlinux.lzo.its
 
@@ -142,9 +132,6 @@ $(obj)/vmlinux.its: $(obj)/vmlinux.its.S $(VMLINUX) FORCE
 $(obj)/vmlinux.gz.its: $(obj)/vmlinux.its.S $(VMLINUX) FORCE
 	$(call if_changed,cpp_its_S,gzip,vmlinux.bin.gz)
 
-$(obj)/vmlinux.bz2.its: $(obj)/vmlinux.its.S $(VMLINUX)  FORCE
-	$(call if_changed,cpp_its_S,bzip2,vmlinux.bin.bz2)
-
 $(obj)/vmlinux.lzma.its: $(obj)/vmlinux.its.S $(VMLINUX) FORCE
 	$(call if_changed,cpp_its_S,lzma,vmlinux.bin.lzma)
 
@@ -153,7 +140,6 @@ $(obj)/vmlinux.lzo.its: $(obj)/vmlinux.its.S $(VMLINUX) FORCE
 
 targets += vmlinux.itb
 targets += vmlinux.gz.itb
-targets += vmlinux.bz2.itb
 targets += vmlinux.lzma.itb
 targets += vmlinux.lzo.itb
 
diff --git a/arch/mips/boot/compressed/Makefile b/arch/mips/boot/compressed/Makefile
index d66511825fe1..8fbd72b466e6 100644
--- a/arch/mips/boot/compressed/Makefile
+++ b/arch/mips/boot/compressed/Makefile
@@ -70,7 +70,6 @@ $(obj)/vmlinux.bin: $(KBUILD_IMAGE) FORCE
 	$(call if_changed,objcopy)
 
 tool_$(CONFIG_KERNEL_GZIP)    = gzip
-tool_$(CONFIG_KERNEL_BZIP2)   = bzip2
 tool_$(CONFIG_KERNEL_LZ4)     = lz4
 tool_$(CONFIG_KERNEL_LZMA)    = lzma
 tool_$(CONFIG_KERNEL_LZO)     = lzo
diff --git a/arch/mips/boot/compressed/decompress.c b/arch/mips/boot/compressed/decompress.c
index c61c641674e6..ac7ccab2bb52 100644
--- a/arch/mips/boot/compressed/decompress.c
+++ b/arch/mips/boot/compressed/decompress.c
@@ -52,10 +52,6 @@ void error(char *x)
 #include "../../../../lib/decompress_inflate.c"
 #endif
 
-#ifdef CONFIG_KERNEL_BZIP2
-#include "../../../../lib/decompress_bunzip2.c"
-#endif
-
 #ifdef CONFIG_KERNEL_LZ4
 #include "../../../../lib/decompress_unlz4.c"
 #endif
diff --git a/arch/mips/configs/ath25_defconfig b/arch/mips/configs/ath25_defconfig
index 7143441f5476..1e12d3018c15 100644
--- a/arch/mips/configs/ath25_defconfig
+++ b/arch/mips/configs/ath25_defconfig
@@ -4,7 +4,6 @@ CONFIG_SYSVIPC=y
 CONFIG_HIGH_RES_TIMERS=y
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_RD_GZIP is not set
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_XZ is not set
 # CONFIG_RD_LZO is not set
 # CONFIG_RD_LZ4 is not set
diff --git a/arch/mips/configs/pistachio_defconfig b/arch/mips/configs/pistachio_defconfig
index b9adf15ebbec..ad31439400c6 100644
--- a/arch/mips/configs/pistachio_defconfig
+++ b/arch/mips/configs/pistachio_defconfig
@@ -14,7 +14,6 @@ CONFIG_CGROUP_FREEZER=y
 CONFIG_NAMESPACES=y
 CONFIG_USER_NS=y
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_LZO is not set
 # CONFIG_RD_LZ4 is not set
diff --git a/arch/openrisc/configs/simple_smp_defconfig b/arch/openrisc/configs/simple_smp_defconfig
index ff49d868e040..74a5fe83aa17 100644
--- a/arch/openrisc/configs/simple_smp_defconfig
+++ b/arch/openrisc/configs/simple_smp_defconfig
@@ -3,7 +3,6 @@ CONFIG_NO_HZ=y
 CONFIG_LOG_BUF_SHIFT=14
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_RD_GZIP is not set
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_XZ is not set
 # CONFIG_RD_LZO is not set
diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig
index b234e8154cbd..4eee43c1e7d9 100644
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -22,7 +22,6 @@ config PARISC
 	select BUILDTIME_TABLE_SORT
 	select HAVE_PCI
 	select HAVE_PERF_EVENTS
-	select HAVE_KERNEL_BZIP2
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZ4
 	select HAVE_KERNEL_LZMA
diff --git a/arch/parisc/boot/compressed/Makefile b/arch/parisc/boot/compressed/Makefile
index dff453687530..2c9403ebb96a 100644
--- a/arch/parisc/boot/compressed/Makefile
+++ b/arch/parisc/boot/compressed/Makefile
@@ -9,7 +9,7 @@ KCOV_INSTRUMENT := n
 GCOV_PROFILE := n
 UBSAN_SANITIZE := n
 
-targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
+targets := vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz
 targets += vmlinux.bin.xz vmlinux.bin.lzma vmlinux.bin.lzo vmlinux.bin.lz4
 targets += misc.o piggy.o sizes.h head.o real2.o firmware.o
 targets += real2.S firmware.c
@@ -64,7 +64,6 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 vmlinux.bin.all-y := $(obj)/vmlinux.bin
 
 suffix-$(CONFIG_KERNEL_GZIP)  := gz
-suffix-$(CONFIG_KERNEL_BZIP2) := bz2
 suffix-$(CONFIG_KERNEL_LZ4)  := lz4
 suffix-$(CONFIG_KERNEL_LZMA)  := lzma
 suffix-$(CONFIG_KERNEL_LZO)  := lzo
@@ -72,8 +71,6 @@ suffix-$(CONFIG_KERNEL_XZ)  := xz
 
 $(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y)
 	$(call if_changed,gzip)
-$(obj)/vmlinux.bin.bz2: $(vmlinux.bin.all-y)
-	$(call if_changed,bzip2)
 $(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y)
 	$(call if_changed,lz4)
 $(obj)/vmlinux.bin.lzma: $(vmlinux.bin.all-y)
diff --git a/arch/parisc/boot/compressed/misc.c b/arch/parisc/boot/compressed/misc.c
index 2d395998f524..247a0b138cb1 100644
--- a/arch/parisc/boot/compressed/misc.c
+++ b/arch/parisc/boot/compressed/misc.c
@@ -42,10 +42,6 @@ static unsigned long free_mem_end_ptr;
 #include "../../../../lib/decompress_inflate.c"
 #endif
 
-#ifdef CONFIG_KERNEL_BZIP2
-#include "../../../../lib/decompress_bunzip2.c"
-#endif
-
 #ifdef CONFIG_KERNEL_LZ4
 #include "../../../../lib/decompress_unlz4.c"
 #endif
diff --git a/arch/powerpc/configs/skiroot_defconfig b/arch/powerpc/configs/skiroot_defconfig
index b806a5d3a695..28139c0294e9 100644
--- a/arch/powerpc/configs/skiroot_defconfig
+++ b/arch/powerpc/configs/skiroot_defconfig
@@ -11,7 +11,6 @@ CONFIG_IKCONFIG_PROC=y
 CONFIG_LOG_BUF_SHIFT=20
 CONFIG_BLK_DEV_INITRD=y
 # CONFIG_RD_GZIP is not set
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_LZO is not set
 # CONFIG_RD_LZ4 is not set
diff --git a/arch/riscv/boot/Makefile b/arch/riscv/boot/Makefile
index c59fca695f9d..944ea5165225 100644
--- a/arch/riscv/boot/Makefile
+++ b/arch/riscv/boot/Makefile
@@ -31,9 +31,6 @@ $(obj)/loader.o: $(src)/loader.S $(obj)/Image
 $(obj)/loader: $(obj)/loader.o $(obj)/Image $(obj)/loader.lds FORCE
 	$(Q)$(LD) -T $(obj)/loader.lds -o $@ $(obj)/loader.o
 
-$(obj)/Image.bz2: $(obj)/Image FORCE
-	$(call if_changed,bzip2)
-
 $(obj)/Image.lz4: $(obj)/Image FORCE
 	$(call if_changed,lz4)
 
diff --git a/arch/riscv/configs/nommu_k210_defconfig b/arch/riscv/configs/nommu_k210_defconfig
index cd1df62b13c7..a71b615fa1b1 100644
--- a/arch/riscv/configs/nommu_k210_defconfig
+++ b/arch/riscv/configs/nommu_k210_defconfig
@@ -3,7 +3,6 @@ CONFIG_LOG_BUF_SHIFT=15
 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=12
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_INITRAMFS_FORCE=y
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_XZ is not set
 # CONFIG_RD_LZO is not set
diff --git a/arch/riscv/configs/nommu_virt_defconfig b/arch/riscv/configs/nommu_virt_defconfig
index e046a0babde4..fb72b5d6c0ec 100644
--- a/arch/riscv/configs/nommu_virt_defconfig
+++ b/arch/riscv/configs/nommu_virt_defconfig
@@ -2,7 +2,6 @@
 CONFIG_LOG_BUF_SHIFT=16
 CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=12
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_XZ is not set
 # CONFIG_RD_LZO is not set
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index a60cc523d810..4bdfe1d64836 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -153,7 +153,6 @@ config S390
 	select HAVE_FUTEX_CMPXCHG if FUTEX
 	select HAVE_GCC_PLUGINS
 	select HAVE_GENERIC_VDSO
-	select HAVE_KERNEL_BZIP2
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZ4
 	select HAVE_KERNEL_LZMA
diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
index de18dab518bb..db11dc6264aa 100644
--- a/arch/s390/boot/compressed/Makefile
+++ b/arch/s390/boot/compressed/Makefile
@@ -12,7 +12,7 @@ KASAN_SANITIZE := n
 
 obj-y	:= $(if $(CONFIG_KERNEL_UNCOMPRESSED),,decompressor.o) info.o
 obj-all := $(obj-y) piggy.o syms.o
-targets	:= vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2
+targets	:= vmlinux.lds vmlinux vmlinux.bin vmlinux.bin.gz
 targets += vmlinux.bin.xz vmlinux.bin.lzma vmlinux.bin.lzo vmlinux.bin.lz4
 targets += info.bin syms.bin vmlinux.syms $(obj-all)
 
@@ -58,7 +58,6 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 vmlinux.bin.all-y := $(obj)/vmlinux.bin
 
 suffix-$(CONFIG_KERNEL_GZIP)  := .gz
-suffix-$(CONFIG_KERNEL_BZIP2) := .bz2
 suffix-$(CONFIG_KERNEL_LZ4)  := .lz4
 suffix-$(CONFIG_KERNEL_LZMA)  := .lzma
 suffix-$(CONFIG_KERNEL_LZO)  := .lzo
@@ -66,8 +65,6 @@ suffix-$(CONFIG_KERNEL_XZ)  := .xz
 
 $(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,gzip)
-$(obj)/vmlinux.bin.bz2: $(vmlinux.bin.all-y) FORCE
-	$(call if_changed,bzip2)
 $(obj)/vmlinux.bin.lz4: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,lz4)
 $(obj)/vmlinux.bin.lzma: $(vmlinux.bin.all-y) FORCE
diff --git a/arch/s390/boot/compressed/decompressor.c b/arch/s390/boot/compressed/decompressor.c
index 3061b11c4d27..87395950cc69 100644
--- a/arch/s390/boot/compressed/decompressor.c
+++ b/arch/s390/boot/compressed/decompressor.c
@@ -28,11 +28,7 @@ extern char _end[];
 extern unsigned char _compressed_start[];
 extern unsigned char _compressed_end[];
 
-#ifdef CONFIG_HAVE_KERNEL_BZIP2
-#define BOOT_HEAP_SIZE	0x400000
-#else
 #define BOOT_HEAP_SIZE	0x10000
-#endif
 
 static unsigned long free_mem_ptr = (unsigned long) _end;
 static unsigned long free_mem_end_ptr = (unsigned long) _end + BOOT_HEAP_SIZE;
@@ -41,10 +37,6 @@ static unsigned long free_mem_end_ptr = (unsigned long) _end + BOOT_HEAP_SIZE;
 #include "../../../../lib/decompress_inflate.c"
 #endif
 
-#ifdef CONFIG_KERNEL_BZIP2
-#include "../../../../lib/decompress_bunzip2.c"
-#endif
-
 #ifdef CONFIG_KERNEL_LZ4
 #include "../../../../lib/decompress_unlz4.c"
 #endif
diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig
index 159da4ed578f..df4113457afd 100644
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -42,7 +42,6 @@ config SUPERH
 	select HAVE_HW_BREAKPOINT
 	select HAVE_IDE if HAS_IOPORT_MAP
 	select HAVE_IOREMAP_PROT if MMU && !X2TLB
-	select HAVE_KERNEL_BZIP2
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZMA
 	select HAVE_KERNEL_LZO
diff --git a/arch/sh/Makefile b/arch/sh/Makefile
index 2faebfd72eca..6fdaa8d2d835 100644
--- a/arch/sh/Makefile
+++ b/arch/sh/Makefile
@@ -189,7 +189,7 @@ endif
 
 libs-y			:= arch/sh/lib/	$(libs-y)
 
-BOOT_TARGETS = uImage uImage.bz2 uImage.gz uImage.lzma uImage.xz uImage.lzo \
+BOOT_TARGETS = uImage uImage.gz uImage.lzma uImage.xz uImage.lzo \
 	       uImage.srec uImage.bin zImage vmlinux.bin vmlinux.srec \
 	       romImage
 PHONY += $(BOOT_TARGETS)
@@ -220,7 +220,6 @@ define archhelp
 	@echo '  uImage.srec	           - Create an S-record for U-Boot'
 	@echo '  uImage.bin	           - Kernel-only image for U-Boot (bin)'
 	@echo '* uImage.gz	           - Kernel-only image for U-Boot (gzip)'
-	@echo '  uImage.bz2	           - Kernel-only image for U-Boot (bzip2)'
 	@echo '  uImage.lzma	           - Kernel-only image for U-Boot (lzma)'
 	@echo '  uImage.xz	           - Kernel-only image for U-Boot (xz)'
 	@echo '  uImage.lzo	           - Kernel-only image for U-Boot (lzo)'
diff --git a/arch/sh/boot/Makefile b/arch/sh/boot/Makefile
index 58592dfa5cb6..8f742b8285a5 100644
--- a/arch/sh/boot/Makefile
+++ b/arch/sh/boot/Makefile
@@ -21,14 +21,13 @@ CONFIG_PHYSICAL_START	?= $(CONFIG_MEMORY_START)
 
 suffix-y := bin
 suffix-$(CONFIG_KERNEL_GZIP)	:= gz
-suffix-$(CONFIG_KERNEL_BZIP2)	:= bz2
 suffix-$(CONFIG_KERNEL_LZMA)	:= lzma
 suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO)	:= lzo
 
 targets := zImage vmlinux.srec romImage uImage uImage.srec uImage.gz \
-	   uImage.bz2 uImage.lzma uImage.xz uImage.lzo uImage.bin
-extra-y += vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
+	   uImage.lzma uImage.xz uImage.lzo uImage.bin
+extra-y += vmlinux.bin vmlinux.bin.gz vmlinux.bin.lzma \
 	   vmlinux.bin.xz vmlinux.bin.lzo
 subdir- := compressed romimage
 
@@ -68,9 +67,6 @@ $(obj)/vmlinux.bin: vmlinux FORCE
 $(obj)/vmlinux.bin.gz: $(obj)/vmlinux.bin FORCE
 	$(call if_changed,gzip)
 
-$(obj)/vmlinux.bin.bz2: $(obj)/vmlinux.bin FORCE
-	$(call if_changed,bzip2)
-
 $(obj)/vmlinux.bin.lzma: $(obj)/vmlinux.bin FORCE
 	$(call if_changed,lzma)
 
@@ -80,9 +76,6 @@ $(obj)/vmlinux.bin.xz: $(obj)/vmlinux.bin FORCE
 $(obj)/vmlinux.bin.lzo: $(obj)/vmlinux.bin FORCE
 	$(call if_changed,lzo)
 
-$(obj)/uImage.bz2: $(obj)/vmlinux.bin.bz2
-	$(call if_changed,uimage,bzip2)
-
 $(obj)/uImage.gz: $(obj)/vmlinux.bin.gz
 	$(call if_changed,uimage,gzip)
 
diff --git a/arch/sh/boot/compressed/Makefile b/arch/sh/boot/compressed/Makefile
index 589d2d8a573d..641b16826383 100644
--- a/arch/sh/boot/compressed/Makefile
+++ b/arch/sh/boot/compressed/Makefile
@@ -6,8 +6,7 @@
 #
 
 targets		:= vmlinux vmlinux.bin vmlinux.bin.gz \
-		   vmlinux.bin.bz2 vmlinux.bin.lzma \
-		   vmlinux.bin.xz vmlinux.bin.lzo \
+		   vmlinux.bin.lzma vmlinux.bin.xz vmlinux.bin.lzo \
 		   head_32.o misc.o piggy.o
 
 OBJECTS = $(obj)/head_32.o $(obj)/misc.o $(obj)/cache.o
@@ -57,8 +56,6 @@ vmlinux.bin.all-y := $(obj)/vmlinux.bin
 
 $(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,gzip)
-$(obj)/vmlinux.bin.bz2: $(vmlinux.bin.all-y) FORCE
-	$(call if_changed,bzip2)
 $(obj)/vmlinux.bin.lzma: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,lzma)
 $(obj)/vmlinux.bin.xz: $(vmlinux.bin.all-y) FORCE
diff --git a/arch/sh/boot/compressed/misc.c b/arch/sh/boot/compressed/misc.c
index a03b6680a9d9..bf0d4446f7a6 100644
--- a/arch/sh/boot/compressed/misc.c
+++ b/arch/sh/boot/compressed/misc.c
@@ -44,20 +44,12 @@ extern int _end;
 static unsigned long free_mem_ptr;
 static unsigned long free_mem_end_ptr;
 
-#ifdef CONFIG_HAVE_KERNEL_BZIP2
-#define HEAP_SIZE	0x400000
-#else
 #define HEAP_SIZE	0x10000
-#endif
 
 #ifdef CONFIG_KERNEL_GZIP
 #include "../../../../lib/decompress_inflate.c"
 #endif
 
-#ifdef CONFIG_KERNEL_BZIP2
-#include "../../../../lib/decompress_bunzip2.c"
-#endif
-
 #ifdef CONFIG_KERNEL_LZMA
 #include "../../../../lib/decompress_unlzma.c"
 #endif
diff --git a/arch/sh/configs/sdk7786_defconfig b/arch/sh/configs/sdk7786_defconfig
index 61bec46ebd66..486e8762cc44 100644
--- a/arch/sh/configs/sdk7786_defconfig
+++ b/arch/sh/configs/sdk7786_defconfig
@@ -29,7 +29,6 @@ CONFIG_USER_NS=y
 CONFIG_PID_NS=y
 CONFIG_NET_NS=y
 CONFIG_BLK_DEV_INITRD=y
-CONFIG_RD_BZIP2=y
 CONFIG_RD_LZMA=y
 CONFIG_RD_LZO=y
 # CONFIG_COMPAT_BRK is not set
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index eeb87fce9c6f..177a84c2f822 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -186,7 +186,6 @@ config X86
 	select HAVE_IDE
 	select HAVE_IOREMAP_PROT
 	select HAVE_IRQ_TIME_ACCOUNTING
-	select HAVE_KERNEL_BZIP2
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_LZ4
 	select HAVE_KERNEL_LZMA
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 40b8fd375d52..868c61b6d692 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -7,13 +7,13 @@
 # vmlinuz is:
 #	decompression code (*.o)
 #	asm globals (piggy.S), including:
-#		vmlinux.bin.(gz|bz2|lzma|...)
+#		vmlinux.bin.(gz|lzma|...)
 #
 # vmlinux.bin is:
 #	vmlinux stripped of debugging and comments
 # vmlinux.bin.all is:
 #	vmlinux.bin + vmlinux.relocs
-# vmlinux.bin.(gz|bz2|lzma|...) is:
+# vmlinux.bin.(gz|lzma|...) is:
 #	(see scripts/Makefile.lib size_append)
 #	compressed vmlinux.bin.all + u32 size of vmlinux.bin.all
 
@@ -25,7 +25,7 @@ OBJECT_FILES_NON_STANDARD	:= y
 # Prevents link failures: __sanitizer_cov_trace_pc() is not linked in.
 KCOV_INSTRUMENT		:= n
 
-targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
+targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.lzma \
 	vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 vmlinux.bin.zst
 
 KBUILD_CFLAGS := -m$(BITS) -O2
@@ -120,8 +120,6 @@ vmlinux.bin.all-$(CONFIG_X86_NEED_RELOCS) += $(obj)/vmlinux.relocs
 
 $(obj)/vmlinux.bin.gz: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,gzip)
-$(obj)/vmlinux.bin.bz2: $(vmlinux.bin.all-y) FORCE
-	$(call if_changed,bzip2)
 $(obj)/vmlinux.bin.lzma: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,lzma)
 $(obj)/vmlinux.bin.xz: $(vmlinux.bin.all-y) FORCE
@@ -134,7 +132,6 @@ $(obj)/vmlinux.bin.zst: $(vmlinux.bin.all-y) FORCE
 	$(call if_changed,zstd22)
 
 suffix-$(CONFIG_KERNEL_GZIP)	:= gz
-suffix-$(CONFIG_KERNEL_BZIP2)	:= bz2
 suffix-$(CONFIG_KERNEL_LZMA)	:= lzma
 suffix-$(CONFIG_KERNEL_XZ)	:= xz
 suffix-$(CONFIG_KERNEL_LZO) 	:= lzo
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 267e7f93050e..b8ef48b240cd 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -55,10 +55,6 @@ static int lines, cols;
 #include "../../../../lib/decompress_inflate.c"
 #endif
 
-#ifdef CONFIG_KERNEL_BZIP2
-#include "../../../../lib/decompress_bunzip2.c"
-#endif
-
 #ifdef CONFIG_KERNEL_LZMA
 #include "../../../../lib/decompress_unlzma.c"
 #endif
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 9191280d9ea3..460243445b13 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -24,9 +24,7 @@
 # error "Invalid value for CONFIG_PHYSICAL_ALIGN"
 #endif
 
-#if defined(CONFIG_KERNEL_BZIP2)
-# define BOOT_HEAP_SIZE		0x400000
-#elif defined(CONFIG_KERNEL_ZSTD)
+#if defined(CONFIG_KERNEL_ZSTD)
 /*
  * Zstd needs to allocate the ZSTD_DCtx in order to decompress the kernel.
  * The ZSTD_DCtx is ~160KB, so set the heap size to 192KB because it is a
diff --git a/arch/xtensa/configs/cadence_csp_defconfig b/arch/xtensa/configs/cadence_csp_defconfig
index fc240737b14d..fb069d8a7c83 100644
--- a/arch/xtensa/configs/cadence_csp_defconfig
+++ b/arch/xtensa/configs/cadence_csp_defconfig
@@ -15,7 +15,6 @@ CONFIG_SCHED_AUTOGROUP=y
 CONFIG_RELAY=y
 CONFIG_BLK_DEV_INITRD=y
 CONFIG_INITRAMFS_SOURCE="$$KERNEL_INITRAMFS_SOURCE"
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_XZ is not set
 # CONFIG_RD_LZO is not set
diff --git a/arch/xtensa/configs/nommu_kc705_defconfig b/arch/xtensa/configs/nommu_kc705_defconfig
index 88b2e222d4bf..b78b404f4f8f 100644
--- a/arch/xtensa/configs/nommu_kc705_defconfig
+++ b/arch/xtensa/configs/nommu_kc705_defconfig
@@ -15,7 +15,6 @@ CONFIG_NAMESPACES=y
 CONFIG_SCHED_AUTOGROUP=y
 CONFIG_RELAY=y
 CONFIG_BLK_DEV_INITRD=y
-# CONFIG_RD_BZIP2 is not set
 # CONFIG_RD_LZMA is not set
 # CONFIG_RD_XZ is not set
 # CONFIG_RD_LZO is not set
diff --git a/include/linux/decompress/bunzip2.h b/include/linux/decompress/bunzip2.h
deleted file mode 100644
index 5860163942a4..000000000000
--- a/include/linux/decompress/bunzip2.h
+++ /dev/null
@@ -1,11 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef DECOMPRESS_BUNZIP2_H
-#define DECOMPRESS_BUNZIP2_H
-
-int bunzip2(unsigned char *inbuf, long len,
-	    long (*fill)(void*, unsigned long),
-	    long (*flush)(void*, unsigned long),
-	    unsigned char *output,
-	    long *pos,
-	    void(*error)(char *x));
-#endif
diff --git a/init/Kconfig b/init/Kconfig
index b77c60f8b963..fdb50763ec50 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -180,9 +180,6 @@ config BUILD_SALT
 config HAVE_KERNEL_GZIP
 	bool
 
-config HAVE_KERNEL_BZIP2
-	bool
-
 config HAVE_KERNEL_LZMA
 	bool
 
@@ -204,7 +201,7 @@ config HAVE_KERNEL_UNCOMPRESSED
 choice
 	prompt "Kernel compression mode"
 	default KERNEL_GZIP
-	depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_BZIP2 || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO || HAVE_KERNEL_LZ4 || HAVE_KERNEL_ZSTD || HAVE_KERNEL_UNCOMPRESSED
+	depends on HAVE_KERNEL_GZIP || HAVE_KERNEL_LZMA || HAVE_KERNEL_XZ || HAVE_KERNEL_LZO || HAVE_KERNEL_LZ4 || HAVE_KERNEL_ZSTD || HAVE_KERNEL_UNCOMPRESSED
 	help
 	  The linux kernel is a kind of self-extracting executable.
 	  Several compression algorithms are available, which differ
@@ -212,11 +209,6 @@ choice
 	  Compression speed is only relevant when building a kernel.
 	  Decompression speed is relevant at each boot.
 
-	  If you have any problems with bzip2 or lzma compressed
-	  kernels, mail me (Alain Knaff) <alain@knaff.lu>. (An older
-	  version of this functionality (bzip2 only), for 2.4, was
-	  supplied by Christian Ludwig)
-
 	  High compression options are mostly useful for users, who
 	  are low on disk space (embedded systems), but for whom ram
 	  size matters less.
@@ -230,22 +222,12 @@ config KERNEL_GZIP
 	  The old and tried gzip compression. It provides a good balance
 	  between compression ratio and decompression speed.
 
-config KERNEL_BZIP2
-	bool "Bzip2"
-	depends on HAVE_KERNEL_BZIP2
-	help
-	  Its compression ratio and speed is intermediate.
-	  Decompression speed is slowest among the choices.  The kernel
-	  size is about 10% smaller with bzip2, in comparison to gzip.
-	  Bzip2 uses a large amount of memory. For modern kernels you
-	  will need at least 8MB RAM or more for booting.
-
 config KERNEL_LZMA
 	bool "LZMA"
 	depends on HAVE_KERNEL_LZMA
 	help
 	  This compression algorithm's ratio is best.  Decompression speed
-	  is between gzip and bzip2.  Compression is slowest.
+	  is similar to XZ.  Compression is slowest.
 	  The kernel size is about 33% smaller with LZMA in comparison to gzip.
 
 config KERNEL_XZ
diff --git a/init/do_mounts_rd.c b/init/do_mounts_rd.c
index ac021ae6e6fa..5b7114743c9a 100644
--- a/init/do_mounts_rd.c
+++ b/init/do_mounts_rd.c
@@ -48,7 +48,6 @@ static int __init crd_load(decompress_fn deco);
  *	cramfs
  *	squashfs
  *	gzip
- *	bzip2
  *	lzma
  *	xz
  *	lzo
diff --git a/kernel/configs/tiny.config b/kernel/configs/tiny.config
index 8a44b93da0f3..2ca3fb830ca7 100644
--- a/kernel/configs/tiny.config
+++ b/kernel/configs/tiny.config
@@ -1,7 +1,6 @@
 # CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE is not set
 CONFIG_CC_OPTIMIZE_FOR_SIZE=y
 # CONFIG_KERNEL_GZIP is not set
-# CONFIG_KERNEL_BZIP2 is not set
 # CONFIG_KERNEL_LZMA is not set
 CONFIG_KERNEL_XZ=y
 # CONFIG_KERNEL_LZO is not set
diff --git a/lib/Kconfig b/lib/Kconfig
index b46a9fd122c8..1d33e5e70f0e 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -324,9 +324,6 @@ config DECOMPRESS_GZIP
 	select ZLIB_INFLATE
 	tristate
 
-config DECOMPRESS_BZIP2
-	tristate
-
 config DECOMPRESS_LZMA
 	tristate
 
diff --git a/lib/Makefile b/lib/Makefile
index d415fc7067c5..c50f3f0111c2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -193,7 +193,6 @@ obj-$(CONFIG_XZ_DEC) += xz/
 obj-$(CONFIG_RAID6_PQ) += raid6/
 
 lib-$(CONFIG_DECOMPRESS_GZIP) += decompress_inflate.o
-lib-$(CONFIG_DECOMPRESS_BZIP2) += decompress_bunzip2.o
 lib-$(CONFIG_DECOMPRESS_LZMA) += decompress_unlzma.o
 lib-$(CONFIG_DECOMPRESS_XZ) += decompress_unxz.o
 lib-$(CONFIG_DECOMPRESS_LZO) += decompress_unlzo.o
diff --git a/lib/decompress.c b/lib/decompress.c
index ab3fc90ffc64..7e272d7fd27a 100644
--- a/lib/decompress.c
+++ b/lib/decompress.c
@@ -7,7 +7,6 @@
 
 #include <linux/decompress/generic.h>
 
-#include <linux/decompress/bunzip2.h>
 #include <linux/decompress/unlzma.h>
 #include <linux/decompress/unxz.h>
 #include <linux/decompress/inflate.h>
@@ -23,9 +22,6 @@
 #ifndef CONFIG_DECOMPRESS_GZIP
 # define gunzip NULL
 #endif
-#ifndef CONFIG_DECOMPRESS_BZIP2
-# define bunzip2 NULL
-#endif
 #ifndef CONFIG_DECOMPRESS_LZMA
 # define unlzma NULL
 #endif
@@ -51,7 +47,6 @@ struct compress_format {
 static const struct compress_format compressed_formats[] __initconst = {
 	{ {0x1f, 0x8b}, "gzip", gunzip },
 	{ {0x1f, 0x9e}, "gzip", gunzip },
-	{ {0x42, 0x5a}, "bzip2", bunzip2 },
 	{ {0x5d, 0x00}, "lzma", unlzma },
 	{ {0xfd, 0x37}, "xz", unxz },
 	{ {0x89, 0x4c}, "lzo", unlzo },
diff --git a/lib/decompress_bunzip2.c b/lib/decompress_bunzip2.c
deleted file mode 100644
index c72c865032fa..000000000000
--- a/lib/decompress_bunzip2.c
+++ /dev/null
@@ -1,756 +0,0 @@
-/*	Small bzip2 deflate implementation, by Rob Landley (rob@landley.net).
-
-	Based on bzip2 decompression code by Julian R Seward (jseward@acm.org),
-	which also acknowledges contributions by Mike Burrows, David Wheeler,
-	Peter Fenwick, Alistair Moffat, Radford Neal, Ian H. Witten,
-	Robert Sedgewick, and Jon L. Bentley.
-
-	This code is licensed under the LGPLv2:
-		LGPL (http://www.gnu.org/copyleft/lgpl.html
-*/
-
-/*
-	Size and speed optimizations by Manuel Novoa III  (mjn3@codepoet.org).
-
-	More efficient reading of Huffman codes, a streamlined read_bunzip()
-	function, and various other tweaks.  In (limited) tests, approximately
-	20% faster than bzcat on x86 and about 10% faster on arm.
-
-	Note that about 2/3 of the time is spent in read_unzip() reversing
-	the Burrows-Wheeler transformation.  Much of that time is delay
-	resulting from cache misses.
-
-	I would ask that anyone benefiting from this work, especially those
-	using it in commercial products, consider making a donation to my local
-	non-profit hospice organization in the name of the woman I loved, who
-	passed away Feb. 12, 2003.
-
-		In memory of Toni W. Hagan
-
-		Hospice of Acadiana, Inc.
-		2600 Johnston St., Suite 200
-		Lafayette, LA 70503-3240
-
-		Phone (337) 232-1234 or 1-800-738-2226
-		Fax   (337) 232-1297
-
-		https://www.hospiceacadiana.com/
-
-	Manuel
- */
-
-/*
-	Made it fit for running in Linux Kernel by Alain Knaff (alain@knaff.lu)
-*/
-
-
-#ifdef STATIC
-#define PREBOOT
-#else
-#include <linux/decompress/bunzip2.h>
-#endif /* STATIC */
-
-#include <linux/decompress/mm.h>
-#include <linux/crc32poly.h>
-
-#ifndef INT_MAX
-#define INT_MAX 0x7fffffff
-#endif
-
-/* Constants for Huffman coding */
-#define MAX_GROUPS		6
-#define GROUP_SIZE   		50	/* 64 would have been more efficient */
-#define MAX_HUFCODE_BITS 	20	/* Longest Huffman code allowed */
-#define MAX_SYMBOLS 		258	/* 256 literals + RUNA + RUNB */
-#define SYMBOL_RUNA		0
-#define SYMBOL_RUNB		1
-
-/* Status return values */
-#define RETVAL_OK			0
-#define RETVAL_LAST_BLOCK		(-1)
-#define RETVAL_NOT_BZIP_DATA		(-2)
-#define RETVAL_UNEXPECTED_INPUT_EOF	(-3)
-#define RETVAL_UNEXPECTED_OUTPUT_EOF	(-4)
-#define RETVAL_DATA_ERROR		(-5)
-#define RETVAL_OUT_OF_MEMORY		(-6)
-#define RETVAL_OBSOLETE_INPUT		(-7)
-
-/* Other housekeeping constants */
-#define BZIP2_IOBUF_SIZE		4096
-
-/* This is what we know about each Huffman coding group */
-struct group_data {
-	/* We have an extra slot at the end of limit[] for a sentinal value. */
-	int limit[MAX_HUFCODE_BITS+1];
-	int base[MAX_HUFCODE_BITS];
-	int permute[MAX_SYMBOLS];
-	int minLen, maxLen;
-};
-
-/* Structure holding all the housekeeping data, including IO buffers and
-   memory that persists between calls to bunzip */
-struct bunzip_data {
-	/* State for interrupting output loop */
-	int writeCopies, writePos, writeRunCountdown, writeCount, writeCurrent;
-	/* I/O tracking data (file handles, buffers, positions, etc.) */
-	long (*fill)(void*, unsigned long);
-	long inbufCount, inbufPos /*, outbufPos*/;
-	unsigned char *inbuf /*,*outbuf*/;
-	unsigned int inbufBitCount, inbufBits;
-	/* The CRC values stored in the block header and calculated from the
-	data */
-	unsigned int crc32Table[256], headerCRC, totalCRC, writeCRC;
-	/* Intermediate buffer and its size (in bytes) */
-	unsigned int *dbuf, dbufSize;
-	/* These things are a bit too big to go on the stack */
-	unsigned char selectors[32768];		/* nSelectors = 15 bits */
-	struct group_data groups[MAX_GROUPS];	/* Huffman coding tables */
-	int io_error;			/* non-zero if we have IO error */
-	int byteCount[256];
-	unsigned char symToByte[256], mtfSymbol[256];
-};
-
-
-/* Return the next nnn bits of input.  All reads from the compressed input
-   are done through this function.  All reads are big endian */
-static unsigned int INIT get_bits(struct bunzip_data *bd, char bits_wanted)
-{
-	unsigned int bits = 0;
-
-	/* If we need to get more data from the byte buffer, do so.
-	   (Loop getting one byte at a time to enforce endianness and avoid
-	   unaligned access.) */
-	while (bd->inbufBitCount < bits_wanted) {
-		/* If we need to read more data from file into byte buffer, do
-		   so */
-		if (bd->inbufPos == bd->inbufCount) {
-			if (bd->io_error)
-				return 0;
-			bd->inbufCount = bd->fill(bd->inbuf, BZIP2_IOBUF_SIZE);
-			if (bd->inbufCount <= 0) {
-				bd->io_error = RETVAL_UNEXPECTED_INPUT_EOF;
-				return 0;
-			}
-			bd->inbufPos = 0;
-		}
-		/* Avoid 32-bit overflow (dump bit buffer to top of output) */
-		if (bd->inbufBitCount >= 24) {
-			bits = bd->inbufBits&((1 << bd->inbufBitCount)-1);
-			bits_wanted -= bd->inbufBitCount;
-			bits <<= bits_wanted;
-			bd->inbufBitCount = 0;
-		}
-		/* Grab next 8 bits of input from buffer. */
-		bd->inbufBits = (bd->inbufBits << 8)|bd->inbuf[bd->inbufPos++];
-		bd->inbufBitCount += 8;
-	}
-	/* Calculate result */
-	bd->inbufBitCount -= bits_wanted;
-	bits |= (bd->inbufBits >> bd->inbufBitCount)&((1 << bits_wanted)-1);
-
-	return bits;
-}
-
-/* Unpacks the next block and sets up for the inverse burrows-wheeler step. */
-
-static int INIT get_next_block(struct bunzip_data *bd)
-{
-	struct group_data *hufGroup = NULL;
-	int *base = NULL;
-	int *limit = NULL;
-	int dbufCount, nextSym, dbufSize, groupCount, selector,
-		i, j, k, t, runPos, symCount, symTotal, nSelectors, *byteCount;
-	unsigned char uc, *symToByte, *mtfSymbol, *selectors;
-	unsigned int *dbuf, origPtr;
-
-	dbuf = bd->dbuf;
-	dbufSize = bd->dbufSize;
-	selectors = bd->selectors;
-	byteCount = bd->byteCount;
-	symToByte = bd->symToByte;
-	mtfSymbol = bd->mtfSymbol;
-
-	/* Read in header signature and CRC, then validate signature.
-	   (last block signature means CRC is for whole file, return now) */
-	i = get_bits(bd, 24);
-	j = get_bits(bd, 24);
-	bd->headerCRC = get_bits(bd, 32);
-	if ((i == 0x177245) && (j == 0x385090))
-		return RETVAL_LAST_BLOCK;
-	if ((i != 0x314159) || (j != 0x265359))
-		return RETVAL_NOT_BZIP_DATA;
-	/* We can add support for blockRandomised if anybody complains.
-	   There was some code for this in busybox 1.0.0-pre3, but nobody ever
-	   noticed that it didn't actually work. */
-	if (get_bits(bd, 1))
-		return RETVAL_OBSOLETE_INPUT;
-	origPtr = get_bits(bd, 24);
-	if (origPtr >= dbufSize)
-		return RETVAL_DATA_ERROR;
-	/* mapping table: if some byte values are never used (encoding things
-	   like ascii text), the compression code removes the gaps to have fewer
-	   symbols to deal with, and writes a sparse bitfield indicating which
-	   values were present.  We make a translation table to convert the
-	   symbols back to the corresponding bytes. */
-	t = get_bits(bd, 16);
-	symTotal = 0;
-	for (i = 0; i < 16; i++) {
-		if (t&(1 << (15-i))) {
-			k = get_bits(bd, 16);
-			for (j = 0; j < 16; j++)
-				if (k&(1 << (15-j)))
-					symToByte[symTotal++] = (16*i)+j;
-		}
-	}
-	/* How many different Huffman coding groups does this block use? */
-	groupCount = get_bits(bd, 3);
-	if (groupCount < 2 || groupCount > MAX_GROUPS)
-		return RETVAL_DATA_ERROR;
-	/* nSelectors: Every GROUP_SIZE many symbols we select a new
-	   Huffman coding group.  Read in the group selector list,
-	   which is stored as MTF encoded bit runs.  (MTF = Move To
-	   Front, as each value is used it's moved to the start of the
-	   list.) */
-	nSelectors = get_bits(bd, 15);
-	if (!nSelectors)
-		return RETVAL_DATA_ERROR;
-	for (i = 0; i < groupCount; i++)
-		mtfSymbol[i] = i;
-	for (i = 0; i < nSelectors; i++) {
-		/* Get next value */
-		for (j = 0; get_bits(bd, 1); j++)
-			if (j >= groupCount)
-				return RETVAL_DATA_ERROR;
-		/* Decode MTF to get the next selector */
-		uc = mtfSymbol[j];
-		for (; j; j--)
-			mtfSymbol[j] = mtfSymbol[j-1];
-		mtfSymbol[0] = selectors[i] = uc;
-	}
-	/* Read the Huffman coding tables for each group, which code
-	   for symTotal literal symbols, plus two run symbols (RUNA,
-	   RUNB) */
-	symCount = symTotal+2;
-	for (j = 0; j < groupCount; j++) {
-		unsigned char length[MAX_SYMBOLS], temp[MAX_HUFCODE_BITS+1];
-		int	minLen,	maxLen, pp;
-		/* Read Huffman code lengths for each symbol.  They're
-		   stored in a way similar to mtf; record a starting
-		   value for the first symbol, and an offset from the
-		   previous value for everys symbol after that.
-		   (Subtracting 1 before the loop and then adding it
-		   back at the end is an optimization that makes the
-		   test inside the loop simpler: symbol length 0
-		   becomes negative, so an unsigned inequality catches
-		   it.) */
-		t = get_bits(bd, 5)-1;
-		for (i = 0; i < symCount; i++) {
-			for (;;) {
-				if (((unsigned)t) > (MAX_HUFCODE_BITS-1))
-					return RETVAL_DATA_ERROR;
-
-				/* If first bit is 0, stop.  Else
-				   second bit indicates whether to
-				   increment or decrement the value.
-				   Optimization: grab 2 bits and unget
-				   the second if the first was 0. */
-
-				k = get_bits(bd, 2);
-				if (k < 2) {
-					bd->inbufBitCount++;
-					break;
-				}
-				/* Add one if second bit 1, else
-				 * subtract 1.  Avoids if/else */
-				t += (((k+1)&2)-1);
-			}
-			/* Correct for the initial -1, to get the
-			 * final symbol length */
-			length[i] = t+1;
-		}
-		/* Find largest and smallest lengths in this group */
-		minLen = maxLen = length[0];
-
-		for (i = 1; i < symCount; i++) {
-			if (length[i] > maxLen)
-				maxLen = length[i];
-			else if (length[i] < minLen)
-				minLen = length[i];
-		}
-
-		/* Calculate permute[], base[], and limit[] tables from
-		 * length[].
-		 *
-		 * permute[] is the lookup table for converting
-		 * Huffman coded symbols into decoded symbols.  base[]
-		 * is the amount to subtract from the value of a
-		 * Huffman symbol of a given length when using
-		 * permute[].
-		 *
-		 * limit[] indicates the largest numerical value a
-		 * symbol with a given number of bits can have.  This
-		 * is how the Huffman codes can vary in length: each
-		 * code with a value > limit[length] needs another
-		 * bit.
-		 */
-		hufGroup = bd->groups+j;
-		hufGroup->minLen = minLen;
-		hufGroup->maxLen = maxLen;
-		/* Note that minLen can't be smaller than 1, so we
-		   adjust the base and limit array pointers so we're
-		   not always wasting the first entry.  We do this
-		   again when using them (during symbol decoding).*/
-		base = hufGroup->base-1;
-		limit = hufGroup->limit-1;
-		/* Calculate permute[].  Concurrently, initialize
-		 * temp[] and limit[]. */
-		pp = 0;
-		for (i = minLen; i <= maxLen; i++) {
-			temp[i] = limit[i] = 0;
-			for (t = 0; t < symCount; t++)
-				if (length[t] == i)
-					hufGroup->permute[pp++] = t;
-		}
-		/* Count symbols coded for at each bit length */
-		for (i = 0; i < symCount; i++)
-			temp[length[i]]++;
-		/* Calculate limit[] (the largest symbol-coding value
-		 *at each bit length, which is (previous limit <<
-		 *1)+symbols at this level), and base[] (number of
-		 *symbols to ignore at each bit length, which is limit
-		 *minus the cumulative count of symbols coded for
-		 *already). */
-		pp = t = 0;
-		for (i = minLen; i < maxLen; i++) {
-			pp += temp[i];
-			/* We read the largest possible symbol size
-			   and then unget bits after determining how
-			   many we need, and those extra bits could be
-			   set to anything.  (They're noise from
-			   future symbols.)  At each level we're
-			   really only interested in the first few
-			   bits, so here we set all the trailing
-			   to-be-ignored bits to 1 so they don't
-			   affect the value > limit[length]
-			   comparison. */
-			limit[i] = (pp << (maxLen - i)) - 1;
-			pp <<= 1;
-			base[i+1] = pp-(t += temp[i]);
-		}
-		limit[maxLen+1] = INT_MAX; /* Sentinal value for
-					    * reading next sym. */
-		limit[maxLen] = pp+temp[maxLen]-1;
-		base[minLen] = 0;
-	}
-	/* We've finished reading and digesting the block header.  Now
-	   read this block's Huffman coded symbols from the file and
-	   undo the Huffman coding and run length encoding, saving the
-	   result into dbuf[dbufCount++] = uc */
-
-	/* Initialize symbol occurrence counters and symbol Move To
-	 * Front table */
-	for (i = 0; i < 256; i++) {
-		byteCount[i] = 0;
-		mtfSymbol[i] = (unsigned char)i;
-	}
-	/* Loop through compressed symbols. */
-	runPos = dbufCount = symCount = selector = 0;
-	for (;;) {
-		/* Determine which Huffman coding group to use. */
-		if (!(symCount--)) {
-			symCount = GROUP_SIZE-1;
-			if (selector >= nSelectors)
-				return RETVAL_DATA_ERROR;
-			hufGroup = bd->groups+selectors[selector++];
-			base = hufGroup->base-1;
-			limit = hufGroup->limit-1;
-		}
-		/* Read next Huffman-coded symbol. */
-		/* Note: It is far cheaper to read maxLen bits and
-		   back up than it is to read minLen bits and then an
-		   additional bit at a time, testing as we go.
-		   Because there is a trailing last block (with file
-		   CRC), there is no danger of the overread causing an
-		   unexpected EOF for a valid compressed file.  As a
-		   further optimization, we do the read inline
-		   (falling back to a call to get_bits if the buffer
-		   runs dry).  The following (up to got_huff_bits:) is
-		   equivalent to j = get_bits(bd, hufGroup->maxLen);
-		 */
-		while (bd->inbufBitCount < hufGroup->maxLen) {
-			if (bd->inbufPos == bd->inbufCount) {
-				j = get_bits(bd, hufGroup->maxLen);
-				goto got_huff_bits;
-			}
-			bd->inbufBits =
-				(bd->inbufBits << 8)|bd->inbuf[bd->inbufPos++];
-			bd->inbufBitCount += 8;
-		};
-		bd->inbufBitCount -= hufGroup->maxLen;
-		j = (bd->inbufBits >> bd->inbufBitCount)&
-			((1 << hufGroup->maxLen)-1);
-got_huff_bits:
-		/* Figure how many bits are in next symbol and
-		 * unget extras */
-		i = hufGroup->minLen;
-		while (j > limit[i])
-			++i;
-		bd->inbufBitCount += (hufGroup->maxLen - i);
-		/* Huffman decode value to get nextSym (with bounds checking) */
-		if ((i > hufGroup->maxLen)
-			|| (((unsigned)(j = (j>>(hufGroup->maxLen-i))-base[i]))
-				>= MAX_SYMBOLS))
-			return RETVAL_DATA_ERROR;
-		nextSym = hufGroup->permute[j];
-		/* We have now decoded the symbol, which indicates
-		   either a new literal byte, or a repeated run of the
-		   most recent literal byte.  First, check if nextSym
-		   indicates a repeated run, and if so loop collecting
-		   how many times to repeat the last literal. */
-		if (((unsigned)nextSym) <= SYMBOL_RUNB) { /* RUNA or RUNB */
-			/* If this is the start of a new run, zero out
-			 * counter */
-			if (!runPos) {
-				runPos = 1;
-				t = 0;
-			}
-			/* Neat trick that saves 1 symbol: instead of
-			   or-ing 0 or 1 at each bit position, add 1
-			   or 2 instead.  For example, 1011 is 1 << 0
-			   + 1 << 1 + 2 << 2.  1010 is 2 << 0 + 2 << 1
-			   + 1 << 2.  You can make any bit pattern
-			   that way using 1 less symbol than the basic
-			   or 0/1 method (except all bits 0, which
-			   would use no symbols, but a run of length 0
-			   doesn't mean anything in this context).
-			   Thus space is saved. */
-			t += (runPos << nextSym);
-			/* +runPos if RUNA; +2*runPos if RUNB */
-
-			runPos <<= 1;
-			continue;
-		}
-		/* When we hit the first non-run symbol after a run,
-		   we now know how many times to repeat the last
-		   literal, so append that many copies to our buffer
-		   of decoded symbols (dbuf) now.  (The last literal
-		   used is the one at the head of the mtfSymbol
-		   array.) */
-		if (runPos) {
-			runPos = 0;
-			if (dbufCount+t >= dbufSize)
-				return RETVAL_DATA_ERROR;
-
-			uc = symToByte[mtfSymbol[0]];
-			byteCount[uc] += t;
-			while (t--)
-				dbuf[dbufCount++] = uc;
-		}
-		/* Is this the terminating symbol? */
-		if (nextSym > symTotal)
-			break;
-		/* At this point, nextSym indicates a new literal
-		   character.  Subtract one to get the position in the
-		   MTF array at which this literal is currently to be
-		   found.  (Note that the result can't be -1 or 0,
-		   because 0 and 1 are RUNA and RUNB.  But another
-		   instance of the first symbol in the mtf array,
-		   position 0, would have been handled as part of a
-		   run above.  Therefore 1 unused mtf position minus 2
-		   non-literal nextSym values equals -1.) */
-		if (dbufCount >= dbufSize)
-			return RETVAL_DATA_ERROR;
-		i = nextSym - 1;
-		uc = mtfSymbol[i];
-		/* Adjust the MTF array.  Since we typically expect to
-		 *move only a small number of symbols, and are bound
-		 *by 256 in any case, using memmove here would
-		 *typically be bigger and slower due to function call
-		 *overhead and other assorted setup costs. */
-		do {
-			mtfSymbol[i] = mtfSymbol[i-1];
-		} while (--i);
-		mtfSymbol[0] = uc;
-		uc = symToByte[uc];
-		/* We have our literal byte.  Save it into dbuf. */
-		byteCount[uc]++;
-		dbuf[dbufCount++] = (unsigned int)uc;
-	}
-	/* At this point, we've read all the Huffman-coded symbols
-	   (and repeated runs) for this block from the input stream,
-	   and decoded them into the intermediate buffer.  There are
-	   dbufCount many decoded bytes in dbuf[].  Now undo the
-	   Burrows-Wheeler transform on dbuf.  See
-	   http://dogma.net/markn/articles/bwt/bwt.htm
-	 */
-	/* Turn byteCount into cumulative occurrence counts of 0 to n-1. */
-	j = 0;
-	for (i = 0; i < 256; i++) {
-		k = j+byteCount[i];
-		byteCount[i] = j;
-		j = k;
-	}
-	/* Figure out what order dbuf would be in if we sorted it. */
-	for (i = 0; i < dbufCount; i++) {
-		uc = (unsigned char)(dbuf[i] & 0xff);
-		dbuf[byteCount[uc]] |= (i << 8);
-		byteCount[uc]++;
-	}
-	/* Decode first byte by hand to initialize "previous" byte.
-	   Note that it doesn't get output, and if the first three
-	   characters are identical it doesn't qualify as a run (hence
-	   writeRunCountdown = 5). */
-	if (dbufCount) {
-		if (origPtr >= dbufCount)
-			return RETVAL_DATA_ERROR;
-		bd->writePos = dbuf[origPtr];
-		bd->writeCurrent = (unsigned char)(bd->writePos&0xff);
-		bd->writePos >>= 8;
-		bd->writeRunCountdown = 5;
-	}
-	bd->writeCount = dbufCount;
-
-	return RETVAL_OK;
-}
-
-/* Undo burrows-wheeler transform on intermediate buffer to produce output.
-   If start_bunzip was initialized with out_fd =-1, then up to len bytes of
-   data are written to outbuf.  Return value is number of bytes written or
-   error (all errors are negative numbers).  If out_fd!=-1, outbuf and len
-   are ignored, data is written to out_fd and return is RETVAL_OK or error.
-*/
-
-static int INIT read_bunzip(struct bunzip_data *bd, char *outbuf, int len)
-{
-	const unsigned int *dbuf;
-	int pos, xcurrent, previous, gotcount;
-
-	/* If last read was short due to end of file, return last block now */
-	if (bd->writeCount < 0)
-		return bd->writeCount;
-
-	gotcount = 0;
-	dbuf = bd->dbuf;
-	pos = bd->writePos;
-	xcurrent = bd->writeCurrent;
-
-	/* We will always have pending decoded data to write into the output
-	   buffer unless this is the very first call (in which case we haven't
-	   Huffman-decoded a block into the intermediate buffer yet). */
-
-	if (bd->writeCopies) {
-		/* Inside the loop, writeCopies means extra copies (beyond 1) */
-		--bd->writeCopies;
-		/* Loop outputting bytes */
-		for (;;) {
-			/* If the output buffer is full, snapshot
-			 * state and return */
-			if (gotcount >= len) {
-				bd->writePos = pos;
-				bd->writeCurrent = xcurrent;
-				bd->writeCopies++;
-				return len;
-			}
-			/* Write next byte into output buffer, updating CRC */
-			outbuf[gotcount++] = xcurrent;
-			bd->writeCRC = (((bd->writeCRC) << 8)
-				^bd->crc32Table[((bd->writeCRC) >> 24)
-				^xcurrent]);
-			/* Loop now if we're outputting multiple
-			 * copies of this byte */
-			if (bd->writeCopies) {
-				--bd->writeCopies;
-				continue;
-			}
-decode_next_byte:
-			if (!bd->writeCount--)
-				break;
-			/* Follow sequence vector to undo
-			 * Burrows-Wheeler transform */
-			previous = xcurrent;
-			pos = dbuf[pos];
-			xcurrent = pos&0xff;
-			pos >>= 8;
-			/* After 3 consecutive copies of the same
-			   byte, the 4th is a repeat count.  We count
-			   down from 4 instead *of counting up because
-			   testing for non-zero is faster */
-			if (--bd->writeRunCountdown) {
-				if (xcurrent != previous)
-					bd->writeRunCountdown = 4;
-			} else {
-				/* We have a repeated run, this byte
-				 * indicates the count */
-				bd->writeCopies = xcurrent;
-				xcurrent = previous;
-				bd->writeRunCountdown = 5;
-				/* Sometimes there are just 3 bytes
-				 * (run length 0) */
-				if (!bd->writeCopies)
-					goto decode_next_byte;
-				/* Subtract the 1 copy we'd output
-				 * anyway to get extras */
-				--bd->writeCopies;
-			}
-		}
-		/* Decompression of this block completed successfully */
-		bd->writeCRC = ~bd->writeCRC;
-		bd->totalCRC = ((bd->totalCRC << 1) |
-				(bd->totalCRC >> 31)) ^ bd->writeCRC;
-		/* If this block had a CRC error, force file level CRC error. */
-		if (bd->writeCRC != bd->headerCRC) {
-			bd->totalCRC = bd->headerCRC+1;
-			return RETVAL_LAST_BLOCK;
-		}
-	}
-
-	/* Refill the intermediate buffer by Huffman-decoding next
-	 * block of input */
-	/* (previous is just a convenient unused temp variable here) */
-	previous = get_next_block(bd);
-	if (previous) {
-		bd->writeCount = previous;
-		return (previous != RETVAL_LAST_BLOCK) ? previous : gotcount;
-	}
-	bd->writeCRC = 0xffffffffUL;
-	pos = bd->writePos;
-	xcurrent = bd->writeCurrent;
-	goto decode_next_byte;
-}
-
-static long INIT nofill(void *buf, unsigned long len)
-{
-	return -1;
-}
-
-/* Allocate the structure, read file header.  If in_fd ==-1, inbuf must contain
-   a complete bunzip file (len bytes long).  If in_fd!=-1, inbuf and len are
-   ignored, and data is read from file handle into temporary buffer. */
-static int INIT start_bunzip(struct bunzip_data **bdp, void *inbuf, long len,
-			     long (*fill)(void*, unsigned long))
-{
-	struct bunzip_data *bd;
-	unsigned int i, j, c;
-	const unsigned int BZh0 =
-		(((unsigned int)'B') << 24)+(((unsigned int)'Z') << 16)
-		+(((unsigned int)'h') << 8)+(unsigned int)'0';
-
-	/* Figure out how much data to allocate */
-	i = sizeof(struct bunzip_data);
-
-	/* Allocate bunzip_data.  Most fields initialize to zero. */
-	bd = *bdp = malloc(i);
-	if (!bd)
-		return RETVAL_OUT_OF_MEMORY;
-	memset(bd, 0, sizeof(struct bunzip_data));
-	/* Setup input buffer */
-	bd->inbuf = inbuf;
-	bd->inbufCount = len;
-	if (fill != NULL)
-		bd->fill = fill;
-	else
-		bd->fill = nofill;
-
-	/* Init the CRC32 table (big endian) */
-	for (i = 0; i < 256; i++) {
-		c = i << 24;
-		for (j = 8; j; j--)
-			c = c&0x80000000 ? (c << 1)^(CRC32_POLY_BE) : (c << 1);
-		bd->crc32Table[i] = c;
-	}
-
-	/* Ensure that file starts with "BZh['1'-'9']." */
-	i = get_bits(bd, 32);
-	if (((unsigned int)(i-BZh0-1)) >= 9)
-		return RETVAL_NOT_BZIP_DATA;
-
-	/* Fourth byte (ascii '1'-'9'), indicates block size in units of 100k of
-	   uncompressed data.  Allocate intermediate buffer for block. */
-	bd->dbufSize = 100000*(i-BZh0);
-
-	bd->dbuf = large_malloc(bd->dbufSize * sizeof(int));
-	if (!bd->dbuf)
-		return RETVAL_OUT_OF_MEMORY;
-	return RETVAL_OK;
-}
-
-/* Example usage: decompress src_fd to dst_fd.  (Stops at end of bzip2 data,
-   not end of file.) */
-STATIC int INIT bunzip2(unsigned char *buf, long len,
-			long (*fill)(void*, unsigned long),
-			long (*flush)(void*, unsigned long),
-			unsigned char *outbuf,
-			long *pos,
-			void(*error)(char *x))
-{
-	struct bunzip_data *bd;
-	int i = -1;
-	unsigned char *inbuf;
-
-	if (flush)
-		outbuf = malloc(BZIP2_IOBUF_SIZE);
-
-	if (!outbuf) {
-		error("Could not allocate output buffer");
-		return RETVAL_OUT_OF_MEMORY;
-	}
-	if (buf)
-		inbuf = buf;
-	else
-		inbuf = malloc(BZIP2_IOBUF_SIZE);
-	if (!inbuf) {
-		error("Could not allocate input buffer");
-		i = RETVAL_OUT_OF_MEMORY;
-		goto exit_0;
-	}
-	i = start_bunzip(&bd, inbuf, len, fill);
-	if (!i) {
-		for (;;) {
-			i = read_bunzip(bd, outbuf, BZIP2_IOBUF_SIZE);
-			if (i <= 0)
-				break;
-			if (!flush)
-				outbuf += i;
-			else
-				if (i != flush(outbuf, i)) {
-					i = RETVAL_UNEXPECTED_OUTPUT_EOF;
-					break;
-				}
-		}
-	}
-	/* Check CRC and release memory */
-	if (i == RETVAL_LAST_BLOCK) {
-		if (bd->headerCRC != bd->totalCRC)
-			error("Data integrity error when decompressing.");
-		else
-			i = RETVAL_OK;
-	} else if (i == RETVAL_UNEXPECTED_OUTPUT_EOF) {
-		error("Compressed file ends unexpectedly");
-	}
-	if (!bd)
-		goto exit_1;
-	if (bd->dbuf)
-		large_free(bd->dbuf);
-	if (pos)
-		*pos = bd->inbufPos;
-	free(bd);
-exit_1:
-	if (!buf)
-		free(inbuf);
-exit_0:
-	if (flush)
-		free(outbuf);
-	return i;
-}
-
-#ifdef PREBOOT
-STATIC int INIT __decompress(unsigned char *buf, long len,
-			long (*fill)(void*, unsigned long),
-			long (*flush)(void*, unsigned long),
-			unsigned char *outbuf, long olen,
-			long *pos,
-			void (*error)(char *x))
-{
-	return bunzip2(buf, len - 4, fill, flush, outbuf, pos, error);
-}
-#endif
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 94133708889d..f05111c59a3b 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -343,10 +343,7 @@ $(obj)/%.dt.yaml: $(src)/%.dts $(DTC) $(DT_TMP_SCHEMA) FORCE
 
 dtc-tmp = $(subst $(comma),_,$(dot-target).dts.tmp)
 
-# Bzip2
-# ---------------------------------------------------------------------------
-
-# Bzip2 and LZMA do not include size in file... so we have to fake that;
+# LZMA does not include size in file... so we have to fake that;
 # append the size as a 32-bit littleendian number as gzip does.
 size_append = printf $(shell						\
 dec_size=0;								\
@@ -363,9 +360,6 @@ printf "%08x\n" $$dec_size |						\
 	}								\
 )
 
-quiet_cmd_bzip2 = BZIP2   $@
-      cmd_bzip2 = { cat $(real-prereqs) | $(KBZIP2) -9; $(size_append); } > $@
-
 # Lzma
 # ---------------------------------------------------------------------------
 
diff --git a/scripts/Makefile.package b/scripts/Makefile.package
index f952fb64789d..2c4a9dbc7e6e 100644
--- a/scripts/Makefile.package
+++ b/scripts/Makefile.package
@@ -127,7 +127,6 @@ util/PERF-VERSION-GEN $(CURDIR)/$(perf-tar)/);              \
 tar rf $(perf-tar).tar $(perf-tar)/HEAD $(perf-tar)/PERF-VERSION-FILE; \
 rm -r $(perf-tar);                                                  \
 $(if $(findstring tar-src,$@),,                                     \
-$(if $(findstring bz2,$@),$(KBZIP2),                                 \
 $(if $(findstring gz,$@),$(KGZIP),                                  \
 $(if $(findstring xz,$@),$(XZ),                                     \
 $(error unknown target $@))))                                       \
diff --git a/scripts/package/buildtar b/scripts/package/buildtar
index 936198a90477..91bcb50579c7 100755
--- a/scripts/package/buildtar
+++ b/scripts/package/buildtar
@@ -118,7 +118,7 @@ case "${ARCH}" in
 		fi
 		;;
 	arm64)
-		for i in Image.bz2 Image.gz Image.lz4 Image.lzma Image.lzo ; do
+		for i in Image.gz Image.lz4 Image.lzma Image.lzo ; do
 			if [ -f "${objtree}/arch/arm64/boot/${i}" ] ; then
 				cp -v -- "${objtree}/arch/arm64/boot/${i}" "${tmpdir}/boot/vmlinuz-${KERNELRELEASE}"
 				break
diff --git a/usr/Kconfig b/usr/Kconfig
index 2599bc21c1b2..ba5ecb054df5 100644
--- a/usr/Kconfig
+++ b/usr/Kconfig
@@ -60,14 +60,6 @@ config RD_GZIP
 	  Support loading of a gzip encoded initial ramdisk or cpio buffer.
 	  If unsure, say Y.
 
-config RD_BZIP2
-	bool "Support initial ramdisk/ramfs compressed using bzip2"
-	default y
-	select DECOMPRESS_BZIP2
-	help
-	  Support loading of a bzip2 encoded initial ramdisk or cpio buffer
-	  If unsure, say N.
-
 config RD_LZMA
 	bool "Support initial ramdisk/ramfs compressed using LZMA"
 	default y
@@ -143,19 +135,6 @@ config INITRAMFS_COMPRESSION_GZIP
 	  supported by your build system as the gzip tool is present by default
 	  on most distros.
 
-config INITRAMFS_COMPRESSION_BZIP2
-	bool "Bzip2"
-	depends on RD_BZIP2
-	help
-	  It's compression ratio and speed is intermediate. Decompression speed
-	  is slowest among the choices. The initramfs size is about 10% smaller
-	  with bzip2, in comparison to gzip. Bzip2 uses a large amount of
-	  memory. For modern kernels you will need at least 8MB RAM or more for
-	  booting.
-
-	  If you choose this, keep in mind that you need to have the bzip2 tool
-	  available to be able to compress the initram.
-
 config INITRAMFS_COMPRESSION_LZMA
 	bool "LZMA"
 	depends on RD_LZMA
@@ -175,9 +154,8 @@ config INITRAMFS_COMPRESSION_XZ
 	help
 	  XZ uses the LZMA2 algorithm and has a large dictionary which may cause
 	  problems on memory constrained systems. The initramfs size is about
-	  30% smaller with XZ in comparison to gzip. Decompression speed is
-	  better than that of bzip2 but worse than gzip and LZO. Compression is
-	  slow.
+	  30% smaller with XZ in comparison to gzip. Compression and
+	  decompression are slowest.
 
 	  If you choose this, keep in mind that you may need to install the xz
 	  tool to be able to compress the initram.
diff --git a/usr/Makefile b/usr/Makefile
index b1a81a40eab1..367bc5d4c759 100644
--- a/usr/Makefile
+++ b/usr/Makefile
@@ -3,14 +3,13 @@
 # kbuild file for usr/ - including initramfs image
 #
 
-# cmd_bzip2, cmd_lzma, cmd_lzo, cmd_lz4 from scripts/Makefile.lib appends the
+# cmd_lzma, cmd_lzo, cmd_lz4 from scripts/Makefile.lib appends the
 # size at the end of the compressed file, which unfortunately does not work
 # with unpack_to_rootfs(). Make size_append no-op.
 override size_append := :
 
 compress-y					:= shipped
 compress-$(CONFIG_INITRAMFS_COMPRESSION_GZIP)	:= gzip
-compress-$(CONFIG_INITRAMFS_COMPRESSION_BZIP2)	:= bzip2
 compress-$(CONFIG_INITRAMFS_COMPRESSION_LZMA)	:= lzma
 compress-$(CONFIG_INITRAMFS_COMPRESSION_XZ)	:= xzmisc
 compress-$(CONFIG_INITRAMFS_COMPRESSION_LZO)	:= lzo
-- 
2.29.2


^ permalink raw reply related

* Re: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings
From: Enrico Weigelt, metux IT consult @ 2020-12-15 20:12 UTC (permalink / raw)
  To: Thomas Gleixner, Michael Ellerman,
	Enrico Weigelt, metux IT consult, linux-kernel
  Cc: linux-s390, hpa, linux-parisc, deller, x86, linux-um,
	James.Bottomley, mingo, paulus, richard, bp, linuxppc-dev, jdike,
	anton.ivanov
In-Reply-To: <87y2i7298s.fsf@nanos.tec.linutronix.de>

On 09.12.20 00:01, Thomas Gleixner wrote:

> There are a few situations why it is invoked or not:
> 
>   1) The original x86 usage is not longer using it because it complains
>      rightfully about a vector being raised which has no interrupt
>      descriptor associated to it. So the original reason for naming it
>      vector is gone long ago. It emits:
> 
>      pr_emerg_ratelimited("%s: %d.%u No irq handler for vector\n",
>                           __func__, smp_processor_id(), vector);
> 
>      Directly from the x86 C entry point without ever invoking that
>      function.  Pretty popular error message due to some AMD BIOS
>      wreckage. :)

Of course, the term "vector" should be replaced by something like
"irqnr" or "virq", but I didn't have name changes within scope - just
wanted to fix the printing of that number, as i've stupled over it while
working on something different and wondered why the number differed from
what I had expected, until I seen that it prints hex instead of decimal.

But if you prefer a more complete cleanup, I'll be happy to do it.

>   3) It's invoked from __handle_domain_irq() when the 'hwirq' which is
>      handed in by the caller does not resolve to a mapped Linux
>      interrupt which is pretty much the same as the x86 situation above
>      in #1, but it prints useless data.
> 
>      It prints 'irq' which is invalid but it does not print the really
>      interesting 'hwirq' which was handed in by the caller and did
>      not resolve.

I wouldn't say the irq-nr isn't interesting. In my particular case it
was quite what I've been looking for. But you're right, hwirq should
also be printed.

>      In this case the Linux irq number is uninteresting as it is known
>      to be invalid and simply is not mapped and therefore does not
>      exist.

In my case it came in from generic_handle_irq(), and in this case this
irq number (IMHO) has been valid, but nobody handled it, so it went to
ack_bad_irq.

Of course, if this function is meant as a fallback to ack some not
otherwise handled IRQ on the hw, the linux irq number indeed isn't quite
helpful (unless we expect that code to do a lookup to the hw irq).

... rethinking this further ... shouldn't we also pass in even more data
(eg. irq_desc, irqchip, ...), so this function can check which hw to
actually talk to ?

>   4) It's invoked from the dummy irq chip which is installed for a
>      couple of truly virtual interrupts where the invocation of
>      dummy_irq_chip::irq_ack() is indicating wreckage.
> 
>      In that case the Linux irq number is the thing which is printed.
> 
> So no. It's not just inconsistent it's in some places outright
> wrong. What we really want is:
> 
> ack_bad_irq(int hwirq, int virq)

is 'int' correct here ?

BTW: I also wonder why the virq is unsigned int, while hwirq (eg. in
struct irq_data) is unsigned long. shouldn't the virtual number space
be at least as big (or even bigger) than the hw one ?

 {
>         if (hwirq >= 0)
>            print_useful_info(hwirq);
>         if (virq > 0)
>            print_useful_info(virq);
>         arch_try_to_ack(hwirq, virq);
> }
>     
> for this to make sense. Just fixing the existing printk() to be less
> wrong is not really an improvement.

Okay, makes sense.

OTOH: since both callers (dummychip.c, handle.c) already dump out before
ack_bad_irq(), do we need to print out anything at all ?

I've also seen that many archs increase a counter (some use long, others
atomic_t) - should we also consolidate this in an arch-independent way
in handle.c (or does kstat_incr_irqs_this_cpu already do this) ?

--mtx

-- 
---
Hinweis: unverschlüsselte E-Mails können leicht abgehört und manipuliert
werden ! Für eine vertrauliche Kommunikation senden Sie bitte ihren
GPG/PGP-Schlüssel zu.
---
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering
info@metux.net -- +49-151-27565287

^ permalink raw reply

* Re: [RFC PATCH] treewide: remove bzip2 compression support
From: Michal Suchánek @ 2020-12-15 21:51 UTC (permalink / raw)
  To: Alex Xu (Hello71)
  Cc: linux-s390, linux-parisc, linux-aspeed, linux-kbuild, torvalds,
	linux-xtensa, linux-sh, linux-mips, linux-kernel, openrisc,
	linux-riscv, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20201215190315.8681-1-alex_y_xu@yahoo.ca>

Hello,

On Tue, Dec 15, 2020 at 02:03:15PM -0500, Alex Xu (Hello71) wrote:
> bzip2 is either slower or larger than every other supported algorithm,
> according to benchmarks at [0]. It is far slower to decompress than any
> other algorithm, and still larger than lzma, xz, and zstd.
> 
> [0] https://lore.kernel.org/lkml/1588791882.08g1378g67.none@localhost/

Sounds cool. I wonder how many people will complain that their
distribution migrated to bzip2 but got stuck there and now new kernels
won't work on there with some odd tool or another :p

> @@ -212,11 +209,6 @@ choice
>  	  Compression speed is only relevant when building a kernel.
>  	  Decompression speed is relevant at each boot.
>  
> -	  If you have any problems with bzip2 or lzma compressed
> -	  kernels, mail me (Alain Knaff) <alain@knaff.lu>. (An older
> -	  version of this functionality (bzip2 only), for 2.4, was
> -	  supplied by Christian Ludwig)
> -
Shouldn't the LZMA part be preserved here?

Thanks

Michal

^ permalink raw reply

* Re: [PATCH] arch: fix 'unexpected IRQ trap at vector' warnings
From: Thomas Gleixner @ 2020-12-15 22:12 UTC (permalink / raw)
  To: Enrico Weigelt, metux IT consult, Michael Ellerman,
	Enrico Weigelt, metux IT consult, linux-kernel
  Cc: linux-s390, hpa, linux-parisc, deller, x86, linux-um,
	James.Bottomley, mingo, paulus, richard, bp, linuxppc-dev, jdike,
	anton.ivanov
In-Reply-To: <33001e60-cbfc-f114-55bf-f347f21fee9b@metux.net>

On Tue, Dec 15 2020 at 21:12, Enrico Weigelt wrote:
> On 09.12.20 00:01, Thomas Gleixner wrote:
>>   3) It's invoked from __handle_domain_irq() when the 'hwirq' which is
>>      handed in by the caller does not resolve to a mapped Linux
>>      interrupt which is pretty much the same as the x86 situation above
>>      in #1, but it prints useless data.
>> 
>>      It prints 'irq' which is invalid but it does not print the really
>>      interesting 'hwirq' which was handed in by the caller and did
>>      not resolve.
>
> I wouldn't say the irq-nr isn't interesting. In my particular case it
> was quite what I've been looking for. But you're right, hwirq should
> also be printed.

The number is _not_ interesting in this case. It's useless because the
function does:

    irq = hwirq;

    if (lookup)
        irq = find_mapping(hwirq);

    if (!irq || irq >= nr_irqs)
       -> BAD

So irq is completely useless because find_mapping() returns 0 if there
is no mapping and if irq >= nr_irqs then there was no lookup and the
hwirq number is bogus.

In both cases the only interesting information is that hwirq does not
resolve to a valid Linux interrupt number and which hwirq number caused
that.

>>      In this case the Linux irq number is uninteresting as it is known
>>      to be invalid and simply is not mapped and therefore does not
>>      exist.
>
> In my case it came in from generic_handle_irq(), and in this case this
> irq number (IMHO) has been valid, but nobody handled it, so it went to
> ack_bad_irq.

generic_handle_irq() _is_ a different function which is only invoked
when there is a valid Linux interrupt number and then the ack_bad_irq()
is invoked from a different place. See below.

> Of course, if this function is meant as a fallback to ack some not
> otherwise handled IRQ on the hw, the linux irq number indeed isn't quite
> helpful (unless we expect that code to do a lookup to the hw irq).

If there is no valid linux irq number then there is no lookup. And you
can't look it up from the hardware either.

If you look really then you find out that there is exactly _ONE_
architecture which does anything else than incrementing a counter and/or
printing stuff: X86, which has a big fat comment explaining why. The
only way to ack an interrupt on X86 is to issue EOI on the local APIC,
i.e. it does _not_ need any further information.

> ... rethinking this further ... shouldn't we also pass in even more data
> (eg. irq_desc, irqchip, ...), so this function can check which hw to
> actually talk to ?

There are 3 ways to get there:

      1) via dummy chip which obviously has no hardware associated

      2) via handle_bad_irq() which prints the info already

      3) __handle_domain_irq() which cannot print anything and obviously
         cannot figure out the hw to talk to because there is no irq
         descriptor associated.

>>   4) It's invoked from the dummy irq chip which is installed for a
>>      couple of truly virtual interrupts where the invocation of
>>      dummy_irq_chip::irq_ack() is indicating wreckage.
>> 
>>      In that case the Linux irq number is the thing which is printed.
>> 
>> So no. It's not just inconsistent it's in some places outright
>> wrong. What we really want is:
>> 
>> ack_bad_irq(int hwirq, int virq)
>
> is 'int' correct here ?

This was just for illustration.

> BTW: I also wonder why the virq is unsigned int, while hwirq (eg. in
> struct irq_data) is unsigned long. shouldn't the virtual number space
> be at least as big (or even bigger) than the hw one ?

Only if there are no irqdomain mappings and the virq space is 1:1 mapped
to the hwirq space. Systems with > 4G interrupts are pretty unlikely.

Also hwirq can be completely artificial and encode information about
interrupts which are composed, i.e. PCI/MSI. See pci_msi_domain_calc_hwirq().

>  {
>>         if (hwirq >= 0)
>>            print_useful_info(hwirq);
>>         if (virq > 0)
>>            print_useful_info(virq);
>>         arch_try_to_ack(hwirq, virq);
>> }
>>     
>> for this to make sense. Just fixing the existing printk() to be less
>> wrong is not really an improvement.
>
> Okay, makes sense.
>
> OTOH: since both callers (dummychip.c, handle.c) already dump out before
> ack_bad_irq(), do we need to print out anything at all ?

Not all callers print something, but yes this could do with some general
cleanup.

> I've also seen that many archs increase a counter (some use long, others
> atomic_t) - should we also consolidate this in an arch-independent way
> in handle.c (or does kstat_incr_irqs_this_cpu already do this) ?

kstat_incr_irqs_this_cpu(desc) operates on the irq descriptor which
requires that an irq descriptor exists in the first place.

The error counter is independent of that, but yes there is room for
consolidation.

Thanks,

        tglx

^ permalink raw reply

* Re: [RFC PATCH] treewide: remove bzip2 compression support
From: Alex Xu (Hello71) @ 2020-12-15 23:39 UTC (permalink / raw)
  To: linux-kernel, linux-kbuild, linux-arm-kernel, linux-aspeed,
	linux-mips, openrisc, linux-parisc, linuxppc-dev, linux-riscv,
	linux-s390, linux-sh, linux-xtensa
In-Reply-To: <20201215190315.8681-1-alex_y_xu@yahoo.ca>

Excerpts from Alex Xu (Hello71)'s message of December 15, 2020 2:03 pm:
> bzip2 is either slower or larger than every other supported algorithm,
> according to benchmarks at [0]. It is far slower to decompress than any
> other algorithm, and still larger than lzma, xz, and zstd.
> 
> [0] https://lore.kernel.org/lkml/1588791882.08g1378g67.none@localhost/
> 
> Signed-off-by: Alex Xu (Hello71) <alex_y_xu@yahoo.ca>

Upon further research, I found that bzip2 removal was already 
implemented as part of zstd addition, but were apparently abandoned in 
an effort to get zstd in. I will check those patches and try sending 
those instead. Thanks to all reviewers for comments on this patch.

^ permalink raw reply

* [powerpc:next] BUILD SUCCESS c15d1f9d03a0f4f68bf52dffdd541c8054e6de35
From: kernel test robot @ 2020-12-16  1:13 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  next
branch HEAD: c15d1f9d03a0f4f68bf52dffdd541c8054e6de35  powerpc: Add config fragment for disabling -Werror

elapsed time: 724m

configs tested: 175
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                                 defconfig
arm64                            allyesconfig
arm64                               defconfig
arm                              allyesconfig
arm                              allmodconfig
sh                                  defconfig
ia64                        generic_defconfig
arm                             ezx_defconfig
mips                   sb1250_swarm_defconfig
mips                     decstation_defconfig
arm64                            alldefconfig
powerpc                       holly_defconfig
powerpc                 mpc837x_mds_defconfig
sh                         microdev_defconfig
sh                           se7712_defconfig
powerpc                      bamboo_defconfig
m68k                       m5275evb_defconfig
powerpc                  iss476-smp_defconfig
powerpc                           allnoconfig
arm                           efm32_defconfig
powerpc                  mpc885_ads_defconfig
powerpc               mpc834x_itxgp_defconfig
arc                           tb10x_defconfig
powerpc                     ksi8560_defconfig
h8300                       h8s-sim_defconfig
powerpc                     powernv_defconfig
arm                         assabet_defconfig
ia64                      gensparse_defconfig
openrisc                 simple_smp_defconfig
powerpc                 mpc836x_mds_defconfig
arm                    vt8500_v6_v7_defconfig
riscv                    nommu_k210_defconfig
mips                            ar7_defconfig
powerpc                       ppc64_defconfig
arm                          ep93xx_defconfig
powerpc                        warp_defconfig
mips                     cu1830-neo_defconfig
powerpc                       ebony_defconfig
riscv                            alldefconfig
powerpc                     tqm8540_defconfig
c6x                        evmc6457_defconfig
powerpc                        fsp2_defconfig
h8300                    h8300h-sim_defconfig
xtensa                       common_defconfig
arm                         shannon_defconfig
m68k                          hp300_defconfig
m68k                         apollo_defconfig
powerpc                      walnut_defconfig
arm                      footbridge_defconfig
arm                        cerfcube_defconfig
xtensa                    xip_kc705_defconfig
sh                           se7724_defconfig
arm                            lart_defconfig
ia64                         bigsur_defconfig
um                             i386_defconfig
powerpc                 mpc85xx_cds_defconfig
arm                        clps711x_defconfig
powerpc                     ep8248e_defconfig
arm                       netwinder_defconfig
arm                           h5000_defconfig
powerpc                     kmeter1_defconfig
arm                      integrator_defconfig
h8300                     edosk2674_defconfig
arm                          moxart_defconfig
arc                                 defconfig
powerpc                 mpc8315_rdb_defconfig
mips                         tb0219_defconfig
powerpc                    adder875_defconfig
c6x                        evmc6678_defconfig
arm                            xcep_defconfig
arm                        multi_v7_defconfig
powerpc                       maple_defconfig
mips                           ip27_defconfig
mips                        bcm47xx_defconfig
arm                     davinci_all_defconfig
powerpc                 linkstation_defconfig
m68k                            mac_defconfig
powerpc64                        alldefconfig
arm                           sunxi_defconfig
arm                            zeus_defconfig
mips                         tb0287_defconfig
powerpc                     sequoia_defconfig
mips                  decstation_64_defconfig
x86_64                           allyesconfig
powerpc                      ppc64e_defconfig
arm                  colibri_pxa270_defconfig
arm                            mps2_defconfig
arm                       omap2plus_defconfig
powerpc                 mpc834x_itx_defconfig
sh                        sh7763rdp_defconfig
xtensa                          iss_defconfig
sh                           sh2007_defconfig
arm                       mainstone_defconfig
m68k                          atari_defconfig
m68k                        m5272c3_defconfig
arm                        neponset_defconfig
sh                        dreamcast_defconfig
parisc                              defconfig
s390                             allyesconfig
m68k                        m5407c3_defconfig
sh                               j2_defconfig
parisc                generic-32bit_defconfig
mips                         tb0226_defconfig
arm                           corgi_defconfig
parisc                           alldefconfig
mips                      pic32mzda_defconfig
mips                           xway_defconfig
arm                         mv78xx0_defconfig
powerpc                 mpc832x_mds_defconfig
arm                           u8500_defconfig
powerpc                     tqm8560_defconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
arc                              allyesconfig
nds32                             allnoconfig
c6x                              allyesconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
sh                               allmodconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
i386                               tinyconfig
i386                                defconfig
sparc                               defconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
x86_64               randconfig-a003-20201215
x86_64               randconfig-a006-20201215
x86_64               randconfig-a002-20201215
x86_64               randconfig-a005-20201215
x86_64               randconfig-a004-20201215
x86_64               randconfig-a001-20201215
i386                 randconfig-a001-20201215
i386                 randconfig-a004-20201215
i386                 randconfig-a003-20201215
i386                 randconfig-a002-20201215
i386                 randconfig-a006-20201215
i386                 randconfig-a005-20201215
i386                 randconfig-a014-20201215
i386                 randconfig-a013-20201215
i386                 randconfig-a012-20201215
i386                 randconfig-a011-20201215
i386                 randconfig-a015-20201215
i386                 randconfig-a016-20201215
riscv                            allyesconfig
riscv                    nommu_virt_defconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                   rhel
x86_64                    rhel-7.6-kselftests
x86_64                                  kexec

clang tested configs:
x86_64               randconfig-a016-20201215
x86_64               randconfig-a012-20201215
x86_64               randconfig-a013-20201215
x86_64               randconfig-a015-20201215
x86_64               randconfig-a014-20201215
x86_64               randconfig-a011-20201215

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* [powerpc:merge] BUILD SUCCESS a1d4aa500bfb93c4ea6eb9a3c5c9cb6720ed8f46
From: kernel test robot @ 2020-12-16  1:13 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev

tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git  merge
branch HEAD: a1d4aa500bfb93c4ea6eb9a3c5c9cb6720ed8f46  Automatic merge of 'next' into merge (2020-12-15 23:50)

elapsed time: 727m

configs tested: 134
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm                                 defconfig
arm64                            allyesconfig
arm64                               defconfig
arm                              allyesconfig
arm                              allmodconfig
arc                 nsimosci_hs_smp_defconfig
riscv                            allyesconfig
mips                      bmips_stb_defconfig
arm                        realview_defconfig
mips                     decstation_defconfig
arm64                            alldefconfig
powerpc                       holly_defconfig
powerpc                 mpc837x_mds_defconfig
arm                        shmobile_defconfig
sh                ecovec24-romimage_defconfig
arm                      pxa255-idp_defconfig
h8300                       h8s-sim_defconfig
powerpc                     powernv_defconfig
arm                         assabet_defconfig
ia64                      gensparse_defconfig
openrisc                 simple_smp_defconfig
powerpc                 mpc836x_mds_defconfig
arm                    vt8500_v6_v7_defconfig
riscv                    nommu_k210_defconfig
mips                            ar7_defconfig
powerpc                       ppc64_defconfig
arm                          ep93xx_defconfig
powerpc                        warp_defconfig
nios2                         10m50_defconfig
mips                       bmips_be_defconfig
xtensa                    xip_kc705_defconfig
sh                           se7724_defconfig
arm                            lart_defconfig
um                             i386_defconfig
ia64                         bigsur_defconfig
powerpc                 mpc85xx_cds_defconfig
arm                        clps711x_defconfig
powerpc                     ep8248e_defconfig
arm                       netwinder_defconfig
arm                           h5000_defconfig
powerpc                     kmeter1_defconfig
arm                      integrator_defconfig
powerpc                     pq2fads_defconfig
powerpc                     tqm5200_defconfig
sh                        sh7763rdp_defconfig
arm                          collie_defconfig
arm                      jornada720_defconfig
arm                     davinci_all_defconfig
powerpc                 linkstation_defconfig
m68k                            mac_defconfig
powerpc64                        alldefconfig
arm                           sunxi_defconfig
arm                            zeus_defconfig
mips                         tb0287_defconfig
powerpc                     sequoia_defconfig
mips                  decstation_64_defconfig
arm                       omap2plus_defconfig
powerpc                 mpc834x_itx_defconfig
xtensa                          iss_defconfig
sh                           sh2007_defconfig
sh                        dreamcast_defconfig
arm                           corgi_defconfig
parisc                           alldefconfig
mips                      pic32mzda_defconfig
mips                           xway_defconfig
arm                         mv78xx0_defconfig
ia64                             allmodconfig
ia64                                defconfig
ia64                             allyesconfig
m68k                             allmodconfig
m68k                                defconfig
m68k                             allyesconfig
nios2                               defconfig
arc                              allyesconfig
nds32                             allnoconfig
c6x                              allyesconfig
nds32                               defconfig
nios2                            allyesconfig
csky                                defconfig
alpha                               defconfig
alpha                            allyesconfig
xtensa                           allyesconfig
h8300                            allyesconfig
arc                                 defconfig
sh                               allmodconfig
parisc                              defconfig
s390                             allyesconfig
parisc                           allyesconfig
s390                                defconfig
i386                             allyesconfig
sparc                            allyesconfig
sparc                               defconfig
i386                               tinyconfig
i386                                defconfig
mips                             allyesconfig
mips                             allmodconfig
powerpc                          allyesconfig
powerpc                          allmodconfig
powerpc                           allnoconfig
x86_64               randconfig-a003-20201215
x86_64               randconfig-a006-20201215
x86_64               randconfig-a002-20201215
x86_64               randconfig-a005-20201215
x86_64               randconfig-a004-20201215
x86_64               randconfig-a001-20201215
i386                 randconfig-a001-20201215
i386                 randconfig-a004-20201215
i386                 randconfig-a003-20201215
i386                 randconfig-a002-20201215
i386                 randconfig-a006-20201215
i386                 randconfig-a005-20201215
i386                 randconfig-a014-20201215
i386                 randconfig-a013-20201215
i386                 randconfig-a012-20201215
i386                 randconfig-a011-20201215
i386                 randconfig-a015-20201215
i386                 randconfig-a016-20201215
riscv                    nommu_virt_defconfig
riscv                             allnoconfig
riscv                               defconfig
riscv                          rv32_defconfig
riscv                            allmodconfig
x86_64                                   rhel
x86_64                           allyesconfig
x86_64                    rhel-7.6-kselftests
x86_64                              defconfig
x86_64                               rhel-8.3
x86_64                                  kexec

clang tested configs:
x86_64               randconfig-a016-20201215
x86_64               randconfig-a012-20201215
x86_64               randconfig-a013-20201215
x86_64               randconfig-a015-20201215
x86_64               randconfig-a014-20201215
x86_64               randconfig-a011-20201215

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply

* Re: [PATCH] powerpc/boot: Fix build of dts/fsl
From: Michael Ellerman @ 2020-12-16  2:41 UTC (permalink / raw)
  To: Masahiro Yamada; +Cc: linuxppc-dev
In-Reply-To: <CAK7LNASJcp=U9sKo9FVdkGNWXu7TGDL2zE-hFQymtfvUhY5+wA@mail.gmail.com>

Masahiro Yamada <masahiroy@kernel.org> writes:
> On Tue, Dec 15, 2020 at 12:29 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
>>
>> The lkp robot reported that some configs fail to build, for example
>> mpc85xx_smp_defconfig, with:
>>
>>   cc1: fatal error: opening output file arch/powerpc/boot/dts/fsl/.mpc8540ads.dtb.dts.tmp: No such file or directory
>>
>> This bisects to:
>>   cc8a51ca6f05 ("kbuild: always create directories of targets")
>>
>> Although that commit claims to be about in-tree builds, it somehow
>> breaks out-of-tree builds. But presumably it's just exposing a latent
>> bug in our Makefiles.
>>
>> We can fix it by adding to targets for dts/fsl in the same way that we
>> do for dts.
>>
>> Fixes: cc8a51ca6f05 ("kbuild: always create directories of targets")
>> Reported-by: kernel test robot <lkp@intel.com>
>> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
>> ---
>>  arch/powerpc/boot/Makefile | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
>> index 68a7534454cd..c3e084cceaed 100644
>> --- a/arch/powerpc/boot/Makefile
>> +++ b/arch/powerpc/boot/Makefile
>> @@ -372,6 +372,8 @@ initrd-y := $(filter-out $(image-y), $(initrd-y))
>>  targets        += $(image-y) $(initrd-y)
>>  targets += $(foreach x, dtbImage uImage cuImage simpleImage treeImage, \
>>                 $(patsubst $(x).%, dts/%.dtb, $(filter $(x).%, $(image-y))))
>> +targets += $(foreach x, dtbImage uImage cuImage simpleImage treeImage, \
>> +               $(patsubst $(x).%, dts/fsl/%.dtb, $(filter $(x).%, $(image-y))))
>>
>>  $(addprefix $(obj)/, $(initrd-y)): $(obj)/ramdisk.image.gz
>>
>
>
> Some freescale dts files are right under arch/powerpc/boot/dts/,
> but some are in the fsl/ subdirectory.
> I do not understand the policy.

There isn't a policy. Best I can tell Kumar felt like it would be
cleaner to have a separate directory for (some of) the Freescale DTS
files, when he initially submitted them ~9 years ago.


> If "fsl/" is a very special case,
> I just thought we could add a new syntax, fslimage-y,
> but I do not mind either way.

OK. If you don't mind I'll merge my patch as a quick fix for now.

Then we can probably move all the fsl/ files up one level and avoid the
problem entirely in future.

> fslimage-$(CONFIG_MPC8540_ADS) += cuImage.mpc8540ads
>
> targets += $(foreach x, dtbImage uImage cuImage simpleImage treeImage, \
>                $(patsubst $(x).%, dts/fsl/%.dtb, $(filter $(x).%,
> $(fslimage-y))))
>
>
> This Makefile is wrong over-all anyway.

Excellent.

cheers

^ permalink raw reply

* Re: [PATCH v2 07/13] powerpc: Increase NR_IRQS range to support more KVM guests
From: Michael Ellerman @ 2020-12-16  2:49 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev; +Cc: Greg Kurz
In-Reply-To: <9fca102b-b1b2-84b0-085f-96965f126e58@kaod.org>

Cédric Le Goater <clg@kaod.org> writes:
> On 12/11/20 12:51 AM, Michael Ellerman wrote:
>> Cédric Le Goater <clg@kaod.org> writes:
>>> PowerNV systems can handle up to 4K guests and 1M interrupt numbers
>>> per chip. Increase the range of allowed interrupts to support a larger
>>> number of guests.
>>>
>>> Reviewed-by: Greg Kurz <groug@kaod.org>
>>> Signed-off-by: Cédric Le Goater <clg@kaod.org>
>>> ---
>>>  arch/powerpc/Kconfig | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
>>> index 5181872f9452..c250fbd430d1 100644
>>> --- a/arch/powerpc/Kconfig
>>> +++ b/arch/powerpc/Kconfig
>>> @@ -66,7 +66,7 @@ config NEED_PER_CPU_PAGE_FIRST_CHUNK
>>>  
>>>  config NR_IRQS
>>>  	int "Number of virtual interrupt numbers"
>>> -	range 32 32768
>>> +	range 32 1048576
>>>  	default "512"
>>>  	help
>>>  	  This defines the number of virtual interrupt numbers the kernel
>> 
>> We should really do what other arches do, and size this appropriately
>> based on the config, rather than asking users to guess what size they
>> need.
>> 
>> But I guess I'll take this for now, and we can do something fancier
>> later.
>
> I was thinking on adding a property to OPAL to size the HW interrupt 
> number space. Is that it ?

That's a separate issue. NR_IRQS is about the maximum number of Linux
interrupts, and it's a compile time limit.

In the old days there was an array of irq_desc[NR_IRQS] in .data, so you
didn't want NR_IRQS to be too big. These days we don't do that, because
of the sparse IRQ support, but I don't know if it's completely free to
make NR_IRQS arbitrarily large at build time.

> That would be good because it's increasing from 20bits on P9 to 24bits
> on P10.

That's probably still helpful, it might mean we can shrink some
structures at runtime.

> I am checking other arches.

Thanks.

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/boot: Fix build of dts/fsl
From: Masahiro Yamada @ 2020-12-16  5:23 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <87zh2exyjp.fsf@mpe.ellerman.id.au>

On Wed, Dec 16, 2020 at 11:41 AM Michael Ellerman <mpe@ellerman.id.au> wrote:
>
> Masahiro Yamada <masahiroy@kernel.org> writes:
> > On Tue, Dec 15, 2020 at 12:29 PM Michael Ellerman <mpe@ellerman.id.au> wrote:
> >>
> >> The lkp robot reported that some configs fail to build, for example
> >> mpc85xx_smp_defconfig, with:
> >>
> >>   cc1: fatal error: opening output file arch/powerpc/boot/dts/fsl/.mpc8540ads.dtb.dts.tmp: No such file or directory
> >>
> >> This bisects to:
> >>   cc8a51ca6f05 ("kbuild: always create directories of targets")
> >>
> >> Although that commit claims to be about in-tree builds, it somehow
> >> breaks out-of-tree builds. But presumably it's just exposing a latent
> >> bug in our Makefiles.
> >>
> >> We can fix it by adding to targets for dts/fsl in the same way that we
> >> do for dts.
> >>
> >> Fixes: cc8a51ca6f05 ("kbuild: always create directories of targets")
> >> Reported-by: kernel test robot <lkp@intel.com>
> >> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
> >> ---
> >>  arch/powerpc/boot/Makefile | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> >> index 68a7534454cd..c3e084cceaed 100644
> >> --- a/arch/powerpc/boot/Makefile
> >> +++ b/arch/powerpc/boot/Makefile
> >> @@ -372,6 +372,8 @@ initrd-y := $(filter-out $(image-y), $(initrd-y))
> >>  targets        += $(image-y) $(initrd-y)
> >>  targets += $(foreach x, dtbImage uImage cuImage simpleImage treeImage, \
> >>                 $(patsubst $(x).%, dts/%.dtb, $(filter $(x).%, $(image-y))))
> >> +targets += $(foreach x, dtbImage uImage cuImage simpleImage treeImage, \
> >> +               $(patsubst $(x).%, dts/fsl/%.dtb, $(filter $(x).%, $(image-y))))
> >>
> >>  $(addprefix $(obj)/, $(initrd-y)): $(obj)/ramdisk.image.gz
> >>
> >
> >
> > Some freescale dts files are right under arch/powerpc/boot/dts/,
> > but some are in the fsl/ subdirectory.
> > I do not understand the policy.
>
> There isn't a policy. Best I can tell Kumar felt like it would be
> cleaner to have a separate directory for (some of) the Freescale DTS
> files, when he initially submitted them ~9 years ago.
>
>
> > If "fsl/" is a very special case,
> > I just thought we could add a new syntax, fslimage-y,
> > but I do not mind either way.
>
> OK. If you don't mind I'll merge my patch as a quick fix for now.
>
> Then we can probably move all the fsl/ files up one level and avoid the
> problem entirely in future.


Yes. I think it is OK.

As for PPC, most of the DT files are freescale.

Even if you separated DT files in vendor directories,
the majority would go into fsl/.



> > fslimage-$(CONFIG_MPC8540_ADS) += cuImage.mpc8540ads
> >
> > targets += $(foreach x, dtbImage uImage cuImage simpleImage treeImage, \
> >                $(patsubst $(x).%, dts/fsl/%.dtb, $(filter $(x).%,
> > $(fslimage-y))))
> >
> >
> > This Makefile is wrong over-all anyway.
>
> Excellent.

You can pass V=2 to see why targets under arch/powerpc/boot/
are needlessly rebuilt.

This Makefile is already too cluttered, and I do not have
much time to look into it.






-- 
Best Regards
Masahiro Yamada

^ permalink raw reply

* [PATCH v2 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
From: Bharata B Rao @ 2020-12-16  8:54 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aneesh.kumar, npiggin, Bharata B Rao, david
In-Reply-To: <20201216085447.1265433-1-bharata@linux.ibm.com>

Implement H_RPT_INVALIDATE hcall and add KVM capability
KVM_CAP_PPC_RPT_INVALIDATE to indicate the support for the same.

This hcall does two types of TLB invalidations:

1. Process-scoped invalidations for guests with LPCR[GTSE]=0.
   This is currently not used in KVM as GTSE is not usually
   disabled in KVM.
2. Partition-scoped invalidations that an L1 hypervisor does on
   behalf of an L2 guest. This replaces the uses of the existing
   hcall H_TLB_INVALIDATE.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst                |  17 +++
 .../include/asm/book3s/64/tlbflush-radix.h    |  18 +++
 arch/powerpc/include/asm/kvm_book3s.h         |   3 +
 arch/powerpc/kvm/book3s_hv.c                  | 121 ++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_nested.c           |  94 ++++++++++++++
 arch/powerpc/kvm/powerpc.c                    |   3 +
 arch/powerpc/mm/book3s64/radix_tlb.c          |   4 -
 include/uapi/linux/kvm.h                      |   1 +
 8 files changed, 257 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index e00a66d72372..5ce237c0d707 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6014,6 +6014,23 @@ KVM_EXIT_X86_RDMSR and KVM_EXIT_X86_WRMSR exit notifications which user space
 can then handle to implement model specific MSR handling and/or user notifications
 to inform a user that an MSR was not handled.
 
+7.22 KVM_CAP_PPC_RPT_INVALIDATE
+------------------------------
+
+:Capability: KVM_CAP_PPC_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is always enabled.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 94439e0cefc9..aace7e9b2397 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include <asm/hvcall.h>
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
@@ -21,6 +25,20 @@ static inline u64 psize_to_rpti_pgsize(unsigned long psize)
 	return H_RPTI_PAGE_ALL;
 }
 
+static inline int rpti_pgsize_to_psize(unsigned long page_size)
+{
+	if (page_size == H_RPTI_PAGE_4K)
+		return MMU_PAGE_4K;
+	if (page_size == H_RPTI_PAGE_64K)
+		return MMU_PAGE_64K;
+	if (page_size == H_RPTI_PAGE_2M)
+		return MMU_PAGE_2M;
+	if (page_size == H_RPTI_PAGE_1G)
+		return MMU_PAGE_1G;
+	else
+		return MMU_PAGE_64K; /* Default */
+}
+
 static inline int mmu_get_ap(int psize)
 {
 	return mmu_psize_defs[psize].ap;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index d32ec9ae73bd..0f1c5fa6e8ce 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -298,6 +298,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
+			 unsigned long type, unsigned long pg_sizes,
+			 unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
 			  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index e3b1839fc251..adf2d1191581 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -904,6 +904,118 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
 	return yield_count;
 }
 
+static inline void do_tlb_invalidate_all(unsigned long rb, unsigned long rs)
+{
+	asm volatile("ptesync" : : : "memory");
+	asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+		: : "r"(rb), "i"(1), "i"(1), "i"(RIC_FLUSH_ALL), "r"(rs)
+		: "memory");
+	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+}
+
+static inline void do_tlb_invalidate_pwc(unsigned long rb, unsigned long rs)
+{
+	asm volatile("ptesync" : : : "memory");
+	asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+		: : "r"(rb), "i"(1), "i"(1), "i"(RIC_FLUSH_PWC), "r"(rs)
+		: "memory");
+	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+}
+
+static inline void do_tlb_invalidate_tlb(unsigned long rb, unsigned long rs)
+{
+	asm volatile("ptesync" : : : "memory");
+	asm volatile(PPC_TLBIE_5(%0, %4, %3, %2, %1)
+		: : "r"(rb), "i"(1), "i"(1), "i"(RIC_FLUSH_TLB), "r"(rs)
+		: "memory");
+	asm volatile("eieio; tlbsync; ptesync" : : : "memory");
+}
+
+static void do_tlb_invalidate(unsigned long rs, unsigned long target,
+			      unsigned long type, unsigned long page_size,
+			      unsigned long ap, unsigned long start,
+			      unsigned long end)
+{
+	unsigned long rb;
+	unsigned long addr = start;
+
+	if ((type & H_RPTI_TYPE_ALL) == H_RPTI_TYPE_ALL) {
+		rb = PPC_BIT(53); /* IS = 1 */
+		do_tlb_invalidate_all(rb, rs);
+		return;
+	}
+
+	if (type & H_RPTI_TYPE_PWC) {
+		rb = PPC_BIT(53); /* IS = 1 */
+		do_tlb_invalidate_pwc(rb, rs);
+	}
+
+	if (!addr && end == -1) { /* PID */
+		rb = PPC_BIT(53); /* IS = 1 */
+		do_tlb_invalidate_tlb(rb, rs);
+	} else { /* EA */
+		do {
+			rb = addr & ~(PPC_BITMASK(52, 63));
+			rb |= ap << PPC_BITLSHIFT(58);
+			do_tlb_invalidate_tlb(rb, rs);
+			addr += page_size;
+		} while (addr < end);
+	}
+}
+
+static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
+				    unsigned long pid, unsigned long target,
+				    unsigned long type, unsigned long pg_sizes,
+				    unsigned long start, unsigned long end)
+{
+	unsigned long rs, ap, psize;
+
+	if (!kvm_is_radix(vcpu->kvm))
+		return H_FUNCTION;
+
+	if (end < start)
+		return H_P5;
+
+	if (type & H_RPTI_TYPE_NESTED) {
+		if (!nesting_enabled(vcpu->kvm))
+			return H_FUNCTION;
+
+		/* Support only cores as target */
+		if (target != H_RPTI_TARGET_CMMU)
+			return H_P2;
+
+		return kvmhv_h_rpti_nested(vcpu, pid,
+					   (type & ~H_RPTI_TYPE_NESTED),
+					    pg_sizes, start, end);
+	}
+
+	rs = pid << PPC_BITLSHIFT(31);
+	rs |= vcpu->kvm->arch.lpid;
+
+	if (pg_sizes & H_RPTI_PAGE_64K) {
+		psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_64K);
+		ap = mmu_get_ap(psize);
+		do_tlb_invalidate(rs, target, type, (1UL << 16), ap, start,
+				  end);
+	}
+
+	if (pg_sizes & H_RPTI_PAGE_2M) {
+		psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_2M);
+		ap = mmu_get_ap(psize);
+		do_tlb_invalidate(rs, target, type, (1UL << 21), ap, start,
+				  end);
+	}
+
+	if (pg_sizes & H_RPTI_PAGE_1G) {
+		psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_1G);
+		ap = mmu_get_ap(psize);
+		do_tlb_invalidate(rs, target, type, (1UL << 30), ap, start,
+				  end);
+	}
+
+	return H_SUCCESS;
+}
+
 int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 {
 	unsigned long req = kvmppc_get_gpr(vcpu, 3);
@@ -1112,6 +1224,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		 */
 		ret = kvmppc_h_svm_init_abort(vcpu->kvm);
 		break;
+	case H_RPT_INVALIDATE:
+		ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
+					      kvmppc_get_gpr(vcpu, 5),
+					      kvmppc_get_gpr(vcpu, 6),
+					      kvmppc_get_gpr(vcpu, 7),
+					      kvmppc_get_gpr(vcpu, 8),
+					      kvmppc_get_gpr(vcpu, 9));
+		break;
 
 	default:
 		return RESUME_HOST;
@@ -1158,6 +1278,7 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
 	case H_XIRR_X:
 #endif
 	case H_PAGE_INIT:
+	case H_RPT_INVALIDATE:
 		return 1;
 	}
 
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 33b58549a9aa..a54ba4b1d4a7 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1149,6 +1149,100 @@ long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu)
 	return H_SUCCESS;
 }
 
+static long do_tlb_invalidate_nested_tlb(struct kvm_vcpu *vcpu,
+					 unsigned long lpid,
+					 unsigned long page_size,
+					 unsigned long ap,
+					 unsigned long start,
+					 unsigned long end)
+{
+	unsigned long addr = start;
+	int ret;
+
+	do {
+		ret = kvmhv_emulate_tlbie_tlb_addr(vcpu, lpid, ap,
+						   get_epn(addr));
+		if (ret)
+			return ret;
+		addr += page_size;
+	} while (addr < end);
+
+	return ret;
+}
+
+static long do_tlb_invalidate_nested_all(struct kvm_vcpu *vcpu,
+					 unsigned long lpid)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_nested_guest *gp;
+
+	gp = kvmhv_get_nested(kvm, lpid, false);
+	if (gp) {
+		kvmhv_emulate_tlbie_lpid(vcpu, gp, RIC_FLUSH_ALL);
+		kvmhv_put_nested(gp);
+	}
+	return H_SUCCESS;
+}
+
+long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
+			 unsigned long type, unsigned long pg_sizes,
+			 unsigned long start, unsigned long end)
+{
+	struct kvm_nested_guest *gp;
+	long ret;
+	unsigned long psize, ap;
+
+	/*
+	 * If L2 lpid isn't valid, we need to return H_PARAMETER.
+	 * Nested KVM issues a L2 lpid flush call when creating
+	 * partition table entries for L2. This happens even before
+	 * the corresponding shadow lpid is created in HV. Until
+	 * this is fixed, ignore such flush requests.
+	 */
+	gp = kvmhv_find_nested(vcpu->kvm, lpid);
+	if (!gp)
+		return H_SUCCESS;
+
+	if ((type & H_RPTI_TYPE_NESTED_ALL) == H_RPTI_TYPE_NESTED_ALL)
+		return do_tlb_invalidate_nested_all(vcpu, lpid);
+
+	if ((type & H_RPTI_TYPE_TLB) == H_RPTI_TYPE_TLB) {
+		if (pg_sizes & H_RPTI_PAGE_64K) {
+			psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_64K);
+			ap = mmu_get_ap(psize);
+
+			ret = do_tlb_invalidate_nested_tlb(vcpu, lpid,
+							   (1UL << 16),
+							   ap, start, end);
+			if (ret)
+				return H_P4;
+		}
+
+		if (pg_sizes & H_RPTI_PAGE_2M) {
+			psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_2M);
+			ap = mmu_get_ap(psize);
+
+			ret = do_tlb_invalidate_nested_tlb(vcpu, lpid,
+							   (1UL << 21),
+							   ap, start, end);
+			if (ret)
+				return H_P4;
+		}
+
+		if (pg_sizes & H_RPTI_PAGE_1G) {
+			psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_1G);
+			ap = mmu_get_ap(psize);
+
+			ret = do_tlb_invalidate_nested_tlb(vcpu, lpid,
+							   (1UL << 30),
+							   ap, start, end);
+			if (ret)
+				return H_P4;
+		}
+	}
+	return H_SUCCESS;
+}
+
 /* Used to convert a nested guest real address to a L1 guest real address */
 static int kvmhv_translate_addr_nested(struct kvm_vcpu *vcpu,
 				       struct kvm_nested_guest *gp,
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 13999123b735..172a89187116 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -678,6 +678,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = hv_enabled && kvmppc_hv_ops->enable_svm &&
 			!kvmppc_hv_ops->enable_svm(NULL);
 		break;
+	case KVM_CAP_PPC_RPT_INVALIDATE:
+		r = 1;
+		break;
 #endif
 	default:
 		r = 0;
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index b487b489d4b6..3a2b12d1d49b 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -18,10 +18,6 @@
 #include <asm/cputhreads.h>
 #include <asm/plpar_wrappers.h>
 
-#define RIC_FLUSH_TLB 0
-#define RIC_FLUSH_PWC 1
-#define RIC_FLUSH_ALL 2
-
 /*
  * tlbiel instruction for radix, set invalidation
  * i.e., r=1 and is=01 or is=10 or is=11
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index ca41220b40b8..c9ece825299e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1053,6 +1053,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_X86_USER_SPACE_MSR 188
 #define KVM_CAP_X86_MSR_FILTER 189
 #define KVM_CAP_ENFORCE_PV_FEATURE_CPUID 190
+#define KVM_CAP_PPC_RPT_INVALIDATE 191
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 2/2] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM
From: Bharata B Rao @ 2020-12-16  8:54 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aneesh.kumar, npiggin, Bharata B Rao, david
In-Reply-To: <20201216085447.1265433-1-bharata@linux.ibm.com>

In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 27 +++++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_nested.c    | 12 ++++++++++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index bb35490400e9..7ea5459022cb 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include <asm/pte-walk.h>
 #include <asm/ultravisor.h>
 #include <asm/kvm_book3s_uvmem.h>
+#include <asm/plpar_wrappers.h>
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,19 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
 	}
 
 	psi = shift_to_mmu_psize(pshift);
-	rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-				lpid, rb);
+
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+		rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
+					lpid, rb);
+	} else {
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_TLB,
+					    psize_to_rpti_pgsize(psi),
+					    addr, addr + psize);
+	}
+
 	if (rc)
 		pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +345,14 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned int lpid)
 		return;
 	}
 
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-				lpid, TLBIEL_INVAL_SET_LPID);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
+					lpid, TLBIEL_INVAL_SET_LPID);
+	else
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+					    0, -1UL);
 	if (rc)
 		pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index a54ba4b1d4a7..9dc694288757 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include <asm/pgalloc.h>
 #include <asm/pte-walk.h>
 #include <asm/reg.h>
+#include <asm/plpar_wrappers.h>
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -402,8 +403,15 @@ static void kvmhv_flush_lpid(unsigned int lpid)
 		return;
 	}
 
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-				lpid, TLBIEL_INVAL_SET_LPID);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
+					lpid, TLBIEL_INVAL_SET_LPID);
+	else
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+					    H_RPTI_TYPE_PAT,
+					    H_RPTI_PAGE_ALL, 0, -1UL);
 	if (rc)
 		pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.26.2


^ permalink raw reply related

* [PATCH v2 0/2] Support for H_RPT_INVALIDATE in PowerPC KVM
From: Bharata B Rao @ 2020-12-16  8:54 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aneesh.kumar, npiggin, Bharata B Rao, david

This patchset adds support for the new hcall H_RPT_INVALIDATE
and replaces the nested tlb flush calls with this new hcall
if support for the same exists.

Changes in v2:
-------------
- Not enabling the hcall by default now, userspace can enable it when
  required.
- Added implementation for process-scoped invalidations in the hcall.

v1: https://lore.kernel.org/linuxppc-dev/20201019112642.53016-1-bharata@linux.ibm.com/T/#t

H_RPT_INVALIDATE
================
Syntax:
int64   /* H_Success: Return code on successful completion */
        /* H_Busy - repeat the call with the same */
        /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
        hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation lookaside information */
              uint64 pid,       /* PID/LPID to invalidate */
              uint64 target,    /* Invalidation target */
              uint64 type,      /* Type of lookaside information */
              uint64 pageSizes,     /* Page sizes */
              uint64 start,     /* Start of Effective Address (EA) range (inclusive) */
              uint64 end)       /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU        0x01 /* All virtual processors in the partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU        0x04 /* All nest/accelerator agents in use by the partition */

A combination of the above can be specified, except core and core local.

Type of translation to invalidate (type)
---------------------------------------
NESTED       0x0001  /* Invalidate nested guest partition-scope */
TLB          0x0002  /* Invalidate TLB */
PWC          0x0004  /* Invalidate Page Walk Cache */
PRT          0x0008  /* Invalidate Process Table Entries if NESTED is clear */
PAT          0x0008  /* Invalidate Partition Table Entries if NESTED is set */

A combination of the above can be specified.

Page size mask (pageSizes)
--------------------------
4K              0x01
64K             0x02
2M              0x04
1G              0x08
All sizes       (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
           matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
  different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and end
  should be a valid Quadrant address and  end > start.
* Return H_NotSupported if the partition is not in running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid addresses.
  Else start and end should be aligned to 4kB (lower 11 bits clear).
* If NESTED is clear, then invalidate process scoped lookaside information.
  Else pid specifies a nested LPID, and the invalidation is performed
  on nested guest partition table and nested guest partition scope real
  addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
  quadrant 0 spaces, Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside invalidation
  range, which allows a caller to optimally invalidate ranges that may
  contain mixed page sizes.
* Return H_SUCCESS on success.

Bharata B Rao (2):
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst                |  17 +++
 .../include/asm/book3s/64/tlbflush-radix.h    |  18 +++
 arch/powerpc/include/asm/kvm_book3s.h         |   3 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c        |  27 +++-
 arch/powerpc/kvm/book3s_hv.c                  | 121 ++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_nested.c           | 106 ++++++++++++++-
 arch/powerpc/kvm/powerpc.c                    |   3 +
 arch/powerpc/mm/book3s64/radix_tlb.c          |   4 -
 include/uapi/linux/kvm.h                      |   1 +
 9 files changed, 289 insertions(+), 11 deletions(-)

-- 
2.26.2


^ permalink raw reply

* [RFC PATCH v1 4/7] powerpc/bpf: Move common functions into bpf_jit_comp.c
From: Christophe Leroy @ 2020-12-16 10:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, naveen.n.rao, sandipan
  Cc: netdev, bpf, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1608112796.git.christophe.leroy@csgroup.eu>

Move into bpf_jit_comp.c the functions that will remain common to
PPC64 and PPC32 when we add support of EBPF for PPC32.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/net/Makefile         |   2 +-
 arch/powerpc/net/bpf_jit.h        |   6 +
 arch/powerpc/net/bpf_jit_comp.c   | 269 ++++++++++++++++++++++++++++++
 arch/powerpc/net/bpf_jit_comp64.c | 258 +---------------------------
 4 files changed, 281 insertions(+), 254 deletions(-)
 create mode 100644 arch/powerpc/net/bpf_jit_comp.c

diff --git a/arch/powerpc/net/Makefile b/arch/powerpc/net/Makefile
index 52c939cef5b2..969cde177880 100644
--- a/arch/powerpc/net/Makefile
+++ b/arch/powerpc/net/Makefile
@@ -2,4 +2,4 @@
 #
 # Arch-specific network modules
 #
-obj-$(CONFIG_BPF_JIT) += bpf_jit_comp64.o
+obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o bpf_jit_comp64.o
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index b8fa6908fc5e..b34abfce15a6 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -143,6 +143,12 @@ static inline void bpf_set_seen_register(struct codegen_context *ctx, int i)
 	ctx->seen |= 1 << (31 - i);
 }
 
+void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func);
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
+		       u32 *addrs, bool extra_pass);
+void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx);
+void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx);
+
 #endif
 
 #endif
diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
new file mode 100644
index 000000000000..efac89964873
--- /dev/null
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -0,0 +1,269 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * eBPF JIT compiler
+ *
+ * Copyright 2016 Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
+ *		  IBM Corporation
+ *
+ * Based on the powerpc classic BPF JIT compiler by Matt Evans
+ */
+#include <linux/moduleloader.h>
+#include <asm/cacheflush.h>
+#include <asm/asm-compat.h>
+#include <linux/netdevice.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <asm/kprobes.h>
+#include <linux/bpf.h>
+
+#include "bpf_jit.h"
+
+static void bpf_jit_fill_ill_insns(void *area, unsigned int size)
+{
+	memset32(area, BREAKPOINT_INSTRUCTION, size / 4);
+}
+
+/* Fix the branch target addresses for subprog calls */
+static int bpf_jit_fixup_subprog_calls(struct bpf_prog *fp, u32 *image,
+				       struct codegen_context *ctx, u32 *addrs)
+{
+	const struct bpf_insn *insn = fp->insnsi;
+	bool func_addr_fixed;
+	u64 func_addr;
+	u32 tmp_idx;
+	int i, ret;
+
+	for (i = 0; i < fp->len; i++) {
+		/*
+		 * During the extra pass, only the branch target addresses for
+		 * the subprog calls need to be fixed. All other instructions
+		 * can left untouched.
+		 *
+		 * The JITed image length does not change because we already
+		 * ensure that the JITed instruction sequence for these calls
+		 * are of fixed length by padding them with NOPs.
+		 */
+		if (insn[i].code == (BPF_JMP | BPF_CALL) &&
+		    insn[i].src_reg == BPF_PSEUDO_CALL) {
+			ret = bpf_jit_get_func_addr(fp, &insn[i], true,
+						    &func_addr,
+						    &func_addr_fixed);
+			if (ret < 0)
+				return ret;
+
+			/*
+			 * Save ctx->idx as this would currently point to the
+			 * end of the JITed image and set it to the offset of
+			 * the instruction sequence corresponding to the
+			 * subprog call temporarily.
+			 */
+			tmp_idx = ctx->idx;
+			ctx->idx = addrs[i] / 4;
+			bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+
+			/*
+			 * Restore ctx->idx here. This is safe as the length
+			 * of the JITed sequence remains unchanged.
+			 */
+			ctx->idx = tmp_idx;
+		}
+	}
+
+	return 0;
+}
+
+struct powerpc64_jit_data {
+	struct bpf_binary_header *header;
+	u32 *addrs;
+	u8 *image;
+	u32 proglen;
+	struct codegen_context ctx;
+};
+
+bool bpf_jit_needs_zext(void)
+{
+	return true;
+}
+
+struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
+{
+	u32 proglen;
+	u32 alloclen;
+	u8 *image = NULL;
+	u32 *code_base;
+	u32 *addrs;
+	struct powerpc64_jit_data *jit_data;
+	struct codegen_context cgctx;
+	int pass;
+	int flen;
+	struct bpf_binary_header *bpf_hdr;
+	struct bpf_prog *org_fp = fp;
+	struct bpf_prog *tmp_fp;
+	bool bpf_blinded = false;
+	bool extra_pass = false;
+
+	if (!fp->jit_requested)
+		return org_fp;
+
+	tmp_fp = bpf_jit_blind_constants(org_fp);
+	if (IS_ERR(tmp_fp))
+		return org_fp;
+
+	if (tmp_fp != org_fp) {
+		bpf_blinded = true;
+		fp = tmp_fp;
+	}
+
+	jit_data = fp->aux->jit_data;
+	if (!jit_data) {
+		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+		if (!jit_data) {
+			fp = org_fp;
+			goto out;
+		}
+		fp->aux->jit_data = jit_data;
+	}
+
+	flen = fp->len;
+	addrs = jit_data->addrs;
+	if (addrs) {
+		cgctx = jit_data->ctx;
+		image = jit_data->image;
+		bpf_hdr = jit_data->header;
+		proglen = jit_data->proglen;
+		alloclen = proglen + FUNCTION_DESCR_SIZE;
+		extra_pass = true;
+		goto skip_init_ctx;
+	}
+
+	addrs = kcalloc(flen + 1, sizeof(*addrs), GFP_KERNEL);
+	if (addrs == NULL) {
+		fp = org_fp;
+		goto out_addrs;
+	}
+
+	memset(&cgctx, 0, sizeof(struct codegen_context));
+
+	/* Make sure that the stack is quadword aligned. */
+	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
+
+	/* Scouting faux-generate pass 0 */
+	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
+		/* We hit something illegal or unsupported. */
+		fp = org_fp;
+		goto out_addrs;
+	}
+
+	/*
+	 * If we have seen a tail call, we need a second pass.
+	 * This is because bpf_jit_emit_common_epilogue() is called
+	 * from bpf_jit_emit_tail_call() with a not yet stable ctx->seen.
+	 */
+	if (cgctx.seen & SEEN_TAILCALL) {
+		cgctx.idx = 0;
+		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
+			fp = org_fp;
+			goto out_addrs;
+		}
+	}
+
+	/*
+	 * Pretend to build prologue, given the features we've seen.  This will
+	 * update ctgtx.idx as it pretends to output instructions, then we can
+	 * calculate total size from idx.
+	 */
+	bpf_jit_build_prologue(0, &cgctx);
+	bpf_jit_build_epilogue(0, &cgctx);
+
+	proglen = cgctx.idx * 4;
+	alloclen = proglen + FUNCTION_DESCR_SIZE;
+
+	bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4, bpf_jit_fill_ill_insns);
+	if (!bpf_hdr) {
+		fp = org_fp;
+		goto out_addrs;
+	}
+
+skip_init_ctx:
+	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
+
+	if (extra_pass) {
+		/*
+		 * Do not touch the prologue and epilogue as they will remain
+		 * unchanged. Only fix the branch target address for subprog
+		 * calls in the body.
+		 *
+		 * This does not change the offsets and lengths of the subprog
+		 * call instruction sequences and hence, the size of the JITed
+		 * image as well.
+		 */
+		bpf_jit_fixup_subprog_calls(fp, code_base, &cgctx, addrs);
+
+		/* There is no need to perform the usual passes. */
+		goto skip_codegen_passes;
+	}
+
+	/* Code generation passes 1-2 */
+	for (pass = 1; pass < 3; pass++) {
+		/* Now build the prologue, body code & epilogue for real. */
+		cgctx.idx = 0;
+		bpf_jit_build_prologue(code_base, &cgctx);
+		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
+		bpf_jit_build_epilogue(code_base, &cgctx);
+
+		if (bpf_jit_enable > 1)
+			pr_info("Pass %d: shrink = %d, seen = 0x%x\n", pass,
+				proglen - (cgctx.idx * 4), cgctx.seen);
+	}
+
+skip_codegen_passes:
+	if (bpf_jit_enable > 1)
+		/*
+		 * Note that we output the base address of the code_base
+		 * rather than image, since opcodes are in code_base.
+		 */
+		bpf_jit_dump(flen, proglen, pass, code_base);
+
+#ifdef PPC64_ELF_ABI_v1
+	/* Function descriptor nastiness: Address + TOC */
+	((u64 *)image)[0] = (u64)code_base;
+	((u64 *)image)[1] = local_paca->kernel_toc;
+#endif
+
+	fp->bpf_func = (void *)image;
+	fp->jited = 1;
+	fp->jited_len = alloclen;
+
+	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
+	if (!fp->is_func || extra_pass) {
+		bpf_prog_fill_jited_linfo(fp, addrs);
+out_addrs:
+		kfree(addrs);
+		kfree(jit_data);
+		fp->aux->jit_data = NULL;
+	} else {
+		jit_data->addrs = addrs;
+		jit_data->ctx = cgctx;
+		jit_data->proglen = proglen;
+		jit_data->image = image;
+		jit_data->header = bpf_hdr;
+	}
+
+out:
+	if (bpf_blinded)
+		bpf_jit_prog_release_other(fp, fp == org_fp ? tmp_fp : org_fp);
+
+	return fp;
+}
+
+/* Overriding bpf_jit_free() as we don't set images read-only. */
+void bpf_jit_free(struct bpf_prog *fp)
+{
+	unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
+	struct bpf_binary_header *bpf_hdr = (void *)addr;
+
+	if (fp->jited)
+		bpf_jit_binary_free(bpf_hdr);
+
+	bpf_prog_unlock_free(fp);
+}
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 89599e75028c..fe0b5937f9fe 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -69,7 +69,7 @@ static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
 	BUG();
 }
 
-static void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 {
 	int i;
 
@@ -136,7 +136,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
 	}
 }
 
-static void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
+void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
 {
 	bpf_jit_emit_common_epilogue(image, ctx);
 
@@ -171,8 +171,7 @@ static void bpf_jit_emit_func_call_hlp(u32 *image, struct codegen_context *ctx,
 	EMIT(PPC_RAW_BLRL());
 }
 
-static void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx,
-				       u64 func)
+void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func)
 {
 	unsigned int i, ctx_idx = ctx->idx;
 
@@ -273,9 +272,8 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 }
 
 /* Assemble the body code between the prologue & epilogue */
-static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
-			      struct codegen_context *ctx,
-			      u32 *addrs, bool extra_pass)
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
+		       u32 *addrs, bool extra_pass)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
@@ -995,249 +993,3 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 
 	return 0;
 }
-
-/* Fix the branch target addresses for subprog calls */
-static int bpf_jit_fixup_subprog_calls(struct bpf_prog *fp, u32 *image,
-				       struct codegen_context *ctx, u32 *addrs)
-{
-	const struct bpf_insn *insn = fp->insnsi;
-	bool func_addr_fixed;
-	u64 func_addr;
-	u32 tmp_idx;
-	int i, ret;
-
-	for (i = 0; i < fp->len; i++) {
-		/*
-		 * During the extra pass, only the branch target addresses for
-		 * the subprog calls need to be fixed. All other instructions
-		 * can left untouched.
-		 *
-		 * The JITed image length does not change because we already
-		 * ensure that the JITed instruction sequence for these calls
-		 * are of fixed length by padding them with NOPs.
-		 */
-		if (insn[i].code == (BPF_JMP | BPF_CALL) &&
-		    insn[i].src_reg == BPF_PSEUDO_CALL) {
-			ret = bpf_jit_get_func_addr(fp, &insn[i], true,
-						    &func_addr,
-						    &func_addr_fixed);
-			if (ret < 0)
-				return ret;
-
-			/*
-			 * Save ctx->idx as this would currently point to the
-			 * end of the JITed image and set it to the offset of
-			 * the instruction sequence corresponding to the
-			 * subprog call temporarily.
-			 */
-			tmp_idx = ctx->idx;
-			ctx->idx = addrs[i] / 4;
-			bpf_jit_emit_func_call_rel(image, ctx, func_addr);
-
-			/*
-			 * Restore ctx->idx here. This is safe as the length
-			 * of the JITed sequence remains unchanged.
-			 */
-			ctx->idx = tmp_idx;
-		}
-	}
-
-	return 0;
-}
-
-struct powerpc64_jit_data {
-	struct bpf_binary_header *header;
-	u32 *addrs;
-	u8 *image;
-	u32 proglen;
-	struct codegen_context ctx;
-};
-
-bool bpf_jit_needs_zext(void)
-{
-	return true;
-}
-
-struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
-{
-	u32 proglen;
-	u32 alloclen;
-	u8 *image = NULL;
-	u32 *code_base;
-	u32 *addrs;
-	struct powerpc64_jit_data *jit_data;
-	struct codegen_context cgctx;
-	int pass;
-	int flen;
-	struct bpf_binary_header *bpf_hdr;
-	struct bpf_prog *org_fp = fp;
-	struct bpf_prog *tmp_fp;
-	bool bpf_blinded = false;
-	bool extra_pass = false;
-
-	if (!fp->jit_requested)
-		return org_fp;
-
-	tmp_fp = bpf_jit_blind_constants(org_fp);
-	if (IS_ERR(tmp_fp))
-		return org_fp;
-
-	if (tmp_fp != org_fp) {
-		bpf_blinded = true;
-		fp = tmp_fp;
-	}
-
-	jit_data = fp->aux->jit_data;
-	if (!jit_data) {
-		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
-		if (!jit_data) {
-			fp = org_fp;
-			goto out;
-		}
-		fp->aux->jit_data = jit_data;
-	}
-
-	flen = fp->len;
-	addrs = jit_data->addrs;
-	if (addrs) {
-		cgctx = jit_data->ctx;
-		image = jit_data->image;
-		bpf_hdr = jit_data->header;
-		proglen = jit_data->proglen;
-		alloclen = proglen + FUNCTION_DESCR_SIZE;
-		extra_pass = true;
-		goto skip_init_ctx;
-	}
-
-	addrs = kcalloc(flen + 1, sizeof(*addrs), GFP_KERNEL);
-	if (addrs == NULL) {
-		fp = org_fp;
-		goto out_addrs;
-	}
-
-	memset(&cgctx, 0, sizeof(struct codegen_context));
-
-	/* Make sure that the stack is quadword aligned. */
-	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
-
-	/* Scouting faux-generate pass 0 */
-	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
-		/* We hit something illegal or unsupported. */
-		fp = org_fp;
-		goto out_addrs;
-	}
-
-	/*
-	 * If we have seen a tail call, we need a second pass.
-	 * This is because bpf_jit_emit_common_epilogue() is called
-	 * from bpf_jit_emit_tail_call() with a not yet stable ctx->seen.
-	 */
-	if (cgctx.seen & SEEN_TAILCALL) {
-		cgctx.idx = 0;
-		if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
-			fp = org_fp;
-			goto out_addrs;
-		}
-	}
-
-	/*
-	 * Pretend to build prologue, given the features we've seen.  This will
-	 * update ctgtx.idx as it pretends to output instructions, then we can
-	 * calculate total size from idx.
-	 */
-	bpf_jit_build_prologue(0, &cgctx);
-	bpf_jit_build_epilogue(0, &cgctx);
-
-	proglen = cgctx.idx * 4;
-	alloclen = proglen + FUNCTION_DESCR_SIZE;
-
-	bpf_hdr = bpf_jit_binary_alloc(alloclen, &image, 4,
-			bpf_jit_fill_ill_insns);
-	if (!bpf_hdr) {
-		fp = org_fp;
-		goto out_addrs;
-	}
-
-skip_init_ctx:
-	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
-
-	if (extra_pass) {
-		/*
-		 * Do not touch the prologue and epilogue as they will remain
-		 * unchanged. Only fix the branch target address for subprog
-		 * calls in the body.
-		 *
-		 * This does not change the offsets and lengths of the subprog
-		 * call instruction sequences and hence, the size of the JITed
-		 * image as well.
-		 */
-		bpf_jit_fixup_subprog_calls(fp, code_base, &cgctx, addrs);
-
-		/* There is no need to perform the usual passes. */
-		goto skip_codegen_passes;
-	}
-
-	/* Code generation passes 1-2 */
-	for (pass = 1; pass < 3; pass++) {
-		/* Now build the prologue, body code & epilogue for real. */
-		cgctx.idx = 0;
-		bpf_jit_build_prologue(code_base, &cgctx);
-		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
-		bpf_jit_build_epilogue(code_base, &cgctx);
-
-		if (bpf_jit_enable > 1)
-			pr_info("Pass %d: shrink = %d, seen = 0x%x\n", pass,
-				proglen - (cgctx.idx * 4), cgctx.seen);
-	}
-
-skip_codegen_passes:
-	if (bpf_jit_enable > 1)
-		/*
-		 * Note that we output the base address of the code_base
-		 * rather than image, since opcodes are in code_base.
-		 */
-		bpf_jit_dump(flen, proglen, pass, code_base);
-
-#ifdef PPC64_ELF_ABI_v1
-	/* Function descriptor nastiness: Address + TOC */
-	((u64 *)image)[0] = (u64)code_base;
-	((u64 *)image)[1] = local_paca->kernel_toc;
-#endif
-
-	fp->bpf_func = (void *)image;
-	fp->jited = 1;
-	fp->jited_len = alloclen;
-
-	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
-	if (!fp->is_func || extra_pass) {
-		bpf_prog_fill_jited_linfo(fp, addrs);
-out_addrs:
-		kfree(addrs);
-		kfree(jit_data);
-		fp->aux->jit_data = NULL;
-	} else {
-		jit_data->addrs = addrs;
-		jit_data->ctx = cgctx;
-		jit_data->proglen = proglen;
-		jit_data->image = image;
-		jit_data->header = bpf_hdr;
-	}
-
-out:
-	if (bpf_blinded)
-		bpf_jit_prog_release_other(fp, fp == org_fp ? tmp_fp : org_fp);
-
-	return fp;
-}
-
-/* Overriding bpf_jit_free() as we don't set images read-only. */
-void bpf_jit_free(struct bpf_prog *fp)
-{
-	unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK;
-	struct bpf_binary_header *bpf_hdr = (void *)addr;
-
-	if (fp->jited)
-		bpf_jit_binary_free(bpf_hdr);
-
-	bpf_prog_unlock_free(fp);
-}
-- 
2.25.0


^ permalink raw reply related

* [RFC PATCH v1 2/7] powerpc/bpf: Change register numbering for bpf_set/is_seen_register()
From: Christophe Leroy @ 2020-12-16 10:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, naveen.n.rao, sandipan
  Cc: netdev, bpf, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1608112796.git.christophe.leroy@csgroup.eu>

Instead of using BPF register number as input in functions
bpf_set_seen_register() and bpf_is_seen_register(), use
CPU register number directly.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/net/bpf_jit_comp64.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 022103c6a201..26a836a904f5 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -31,12 +31,12 @@ static inline void bpf_flush_icache(void *start, void *end)
 
 static inline bool bpf_is_seen_register(struct codegen_context *ctx, int i)
 {
-	return (ctx->seen & (1 << (31 - b2p[i])));
+	return ctx->seen & (1 << (31 - i));
 }
 
 static inline void bpf_set_seen_register(struct codegen_context *ctx, int i)
 {
-	ctx->seen |= (1 << (31 - b2p[i]));
+	ctx->seen |= 1 << (31 - i);
 }
 
 static inline bool bpf_has_stack_frame(struct codegen_context *ctx)
@@ -47,7 +47,7 @@ static inline bool bpf_has_stack_frame(struct codegen_context *ctx)
 	 * - the bpf program uses its stack area
 	 * The latter condition is deduced from the usage of BPF_REG_FP
 	 */
-	return ctx->seen & SEEN_FUNC || bpf_is_seen_register(ctx, BPF_REG_FP);
+	return ctx->seen & SEEN_FUNC || bpf_is_seen_register(ctx, b2p[BPF_REG_FP]);
 }
 
 /*
@@ -124,11 +124,11 @@ static void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
 	 * in the protected zone below the previous stack frame
 	 */
 	for (i = BPF_REG_6; i <= BPF_REG_10; i++)
-		if (bpf_is_seen_register(ctx, i))
+		if (bpf_is_seen_register(ctx, b2p[i]))
 			PPC_BPF_STL(b2p[i], 1, bpf_jit_stack_offsetof(ctx, b2p[i]));
 
 	/* Setup frame pointer to point to the bpf stack area */
-	if (bpf_is_seen_register(ctx, BPF_REG_FP))
+	if (bpf_is_seen_register(ctx, b2p[BPF_REG_FP]))
 		EMIT(PPC_RAW_ADDI(b2p[BPF_REG_FP], 1,
 				STACK_FRAME_MIN_SIZE + ctx->stack_size));
 }
@@ -139,7 +139,7 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
 
 	/* Restore NVRs */
 	for (i = BPF_REG_6; i <= BPF_REG_10; i++)
-		if (bpf_is_seen_register(ctx, i))
+		if (bpf_is_seen_register(ctx, b2p[i]))
 			PPC_BPF_LL(b2p[i], 1, bpf_jit_stack_offsetof(ctx, b2p[i]));
 
 	/* Tear down our stack frame */
@@ -330,9 +330,9 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 		 * any issues.
 		 */
 		if (dst_reg >= BPF_PPC_NVR_MIN && dst_reg < 32)
-			bpf_set_seen_register(ctx, insn[i].dst_reg);
+			bpf_set_seen_register(ctx, dst_reg);
 		if (src_reg >= BPF_PPC_NVR_MIN && src_reg < 32)
-			bpf_set_seen_register(ctx, insn[i].src_reg);
+			bpf_set_seen_register(ctx, src_reg);
 
 		switch (code) {
 		/*
-- 
2.25.0


^ permalink raw reply related

* [RFC PATCH v1 6/7] powerpc/asm: Add some opcodes in asm/ppc-opcode.h for PPC32 eBPF
From: Christophe Leroy @ 2020-12-16 10:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, naveen.n.rao, sandipan
  Cc: netdev, bpf, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1608112796.git.christophe.leroy@csgroup.eu>

The following opcodes will be needed for the implementation
of eBPF for PPC32. Add them in asm/ppc-opcode.h

PPC_RAW_ADDE
PPC_RAW_ADDZE
PPC_RAW_ADDME
PPC_RAW_MFLR
PPC_RAW_ADDIC
PPC_RAW_ADDIC_DOT
PPC_RAW_SUBFC
PPC_RAW_SUBFE
PPC_RAW_SUBFIC
PPC_RAW_SUBFZE
PPC_RAW_ANDIS
PPC_RAW_NOR

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/ppc-opcode.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index ed161ef2b3ca..5b60020dc1f4 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -437,6 +437,9 @@
 #define PPC_RAW_STFDX(s, a, b)		(0x7c0005ae | ___PPC_RS(s) | ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_LVX(t, a, b)		(0x7c0000ce | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_STVX(s, a, b)		(0x7c0001ce | ___PPC_RS(s) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_ADDE(t, a, b)		(0x7c000114 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_ADDZE(t, a)		(0x7c000194 | ___PPC_RT(t) | ___PPC_RA(a))
+#define PPC_RAW_ADDME(t, a)		(0x7c0001d4 | ___PPC_RT(t) | ___PPC_RA(a))
 #define PPC_RAW_ADD(t, a, b)		(PPC_INST_ADD | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_ADD_DOT(t, a, b)	(PPC_INST_ADD | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b) | 0x1)
 #define PPC_RAW_ADDC(t, a, b)		(0x7c000014 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b))
@@ -445,11 +448,14 @@
 #define PPC_RAW_BLR()			(PPC_INST_BLR)
 #define PPC_RAW_BLRL()			(0x4e800021)
 #define PPC_RAW_MTLR(r)			(0x7c0803a6 | ___PPC_RT(r))
+#define PPC_RAW_MFLR(t)			(PPC_INST_MFLR | ___PPC_RT(t))
 #define PPC_RAW_BCTR()			(PPC_INST_BCTR)
 #define PPC_RAW_MTCTR(r)		(PPC_INST_MTCTR | ___PPC_RT(r))
 #define PPC_RAW_ADDI(d, a, i)		(PPC_INST_ADDI | ___PPC_RT(d) | ___PPC_RA(a) | IMM_L(i))
 #define PPC_RAW_LI(r, i)		PPC_RAW_ADDI(r, 0, i)
 #define PPC_RAW_ADDIS(d, a, i)		(PPC_INST_ADDIS | ___PPC_RT(d) | ___PPC_RA(a) | IMM_L(i))
+#define PPC_RAW_ADDIC(d, a, i)		(0x30000000 | ___PPC_RT(d) | ___PPC_RA(a) | IMM_L(i))
+#define PPC_RAW_ADDIC_DOT(d, a, i)	(0x34000000 | ___PPC_RT(d) | ___PPC_RA(a) | IMM_L(i))
 #define PPC_RAW_LIS(r, i)		PPC_RAW_ADDIS(r, 0, i)
 #define PPC_RAW_STDX(r, base, b)	(0x7c00012a | ___PPC_RS(r) | ___PPC_RA(base) | ___PPC_RB(b))
 #define PPC_RAW_STDU(r, base, i)	(0xf8000001 | ___PPC_RS(r) | ___PPC_RA(base) | ((i) & 0xfffc))
@@ -472,6 +478,10 @@
 #define PPC_RAW_CMPLW(a, b)		(0x7c000040 | ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_CMPLD(a, b)		(0x7c200040 | ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_SUB(d, a, b)		(0x7c000050 | ___PPC_RT(d) | ___PPC_RB(a) | ___PPC_RA(b))
+#define PPC_RAW_SUBFC(d, a, b)		(0x7c000010 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_SUBFE(d, a, b)		(0x7c000110 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_SUBFIC(d, a, i)		(0x20000000 | ___PPC_RT(d) | ___PPC_RA(a) | IMM_L(i))
+#define PPC_RAW_SUBFZE(d, a)		(0x7c000190 | ___PPC_RT(d) | ___PPC_RA(a))
 #define PPC_RAW_MULD(d, a, b)		(0x7c0001d2 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_MULW(d, a, b)		(0x7c0001d6 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
 #define PPC_RAW_MULHWU(d, a, b)		(0x7c000016 | ___PPC_RT(d) | ___PPC_RA(a) | ___PPC_RB(b))
@@ -484,11 +494,13 @@
 #define PPC_RAW_DIVDEU_DOT(t, a, b)	(0x7c000312 | ___PPC_RT(t) | ___PPC_RA(a) | ___PPC_RB(b) | 0x1)
 #define PPC_RAW_AND(d, a, b)		(0x7c000038 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
 #define PPC_RAW_ANDI(d, a, i)		(0x70000000 | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
+#define PPC_RAW_ANDIS(d, a, i)		(0x74000000 | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
 #define PPC_RAW_AND_DOT(d, a, b)	(0x7c000039 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
 #define PPC_RAW_OR(d, a, b)		(0x7c000378 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
 #define PPC_RAW_MR(d, a)		PPC_RAW_OR(d, a, a)
 #define PPC_RAW_ORI(d, a, i)		(PPC_INST_ORI | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
 #define PPC_RAW_ORIS(d, a, i)		(PPC_INST_ORIS | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
+#define PPC_RAW_NOR(d, a, b)		(0x7c0000f8 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
 #define PPC_RAW_XOR(d, a, b)		(0x7c000278 | ___PPC_RA(d) | ___PPC_RS(a) | ___PPC_RB(b))
 #define PPC_RAW_XORI(d, a, i)		(0x68000000 | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
 #define PPC_RAW_XORIS(d, a, i)		(0x6c000000 | ___PPC_RA(d) | ___PPC_RS(a) | IMM_L(i))
-- 
2.25.0


^ permalink raw reply related

* [RFC PATCH v1 5/7] powerpc/bpf: Change values of SEEN_ flags
From: Christophe Leroy @ 2020-12-16 10:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, naveen.n.rao, sandipan
  Cc: netdev, bpf, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1608112796.git.christophe.leroy@csgroup.eu>

Because PPC32 will use more non volatile registers,
move SEEN_ flags to positions 0-2 which corresponds to special
registers.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/net/bpf_jit.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index b34abfce15a6..fb4656986fb9 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -108,18 +108,18 @@ static inline bool is_nearbranch(int offset)
 #define COND_LT		(CR0_LT | COND_CMP_TRUE)
 #define COND_LE		(CR0_GT | COND_CMP_FALSE)
 
-#define SEEN_FUNC	0x1000 /* might call external helpers */
-#define SEEN_STACK	0x2000 /* uses BPF stack */
-#define SEEN_TAILCALL	0x4000 /* uses tail calls */
+#define SEEN_FUNC	0x20000000 /* might call external helpers */
+#define SEEN_STACK	0x40000000 /* uses BPF stack */
+#define SEEN_TAILCALL	0x80000000 /* uses tail calls */
 
 struct codegen_context {
 	/*
 	 * This is used to track register usage as well
 	 * as calls to external helpers.
 	 * - register usage is tracked with corresponding
-	 *   bits (r3-r10 and r27-r31)
+	 *   bits (r3-r31)
 	 * - rest of the bits can be used to track other
-	 *   things -- for now, we use bits 16 to 23
+	 *   things -- for now, we use bits 0 to 2
 	 *   encoded in SEEN_* macros above
 	 */
 	unsigned int seen;
-- 
2.25.0


^ permalink raw reply related

* [RFC PATCH v1 7/7] powerpc/bpf: Implement extended BPF on PPC32
From: Christophe Leroy @ 2020-12-16 10:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, ast,
	daniel, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, naveen.n.rao, sandipan
  Cc: netdev, bpf, linuxppc-dev, linux-kernel
In-Reply-To: <cover.1608112796.git.christophe.leroy@csgroup.eu>

Implement Extended Berkeley Packet Filter on Powerpc 32

Test result with test_bpf module:

	test_bpf: Summary: 378 PASSED, 0 FAILED, [354/366 JIT'ed]

Registers mapping:

	[BPF_REG_0] = r11-r12
	/* function arguments */
	[BPF_REG_1] = r3-r4
	[BPF_REG_2] = r5-r6
	[BPF_REG_3] = r7-r8
	[BPF_REG_4] = r9-r10
	[BPF_REG_5] = r21-r22 (Args 9 and 10 come in via the stack)
	/* non volatile registers */
	[BPF_REG_6] = r23-r24
	[BPF_REG_7] = r25-r26
	[BPF_REG_8] = r27-r28
	[BPF_REG_9] = r29-r30
	/* frame pointer aka BPF_REG_10 */
	[BPF_REG_FP] = r31
	/* eBPF jit internal registers */
	[BPF_REG_AX] = r19-r20
	[TMP_REG] = r18

As PPC32 doesn't have a redzone in the stack,
use r17 as tail call counter.

r0 is used as temporary register as much as possible. It is referenced
directly in the code in order to avoid misuse of it, because some
instructions interpret it as value 0 instead of register r0
(ex: addi, addis, stw, lwz, ...)

The following operations are not implemented:

		case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */
		case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */
		case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */

The following operations are only implemented for power of two constants:

		case BPF_ALU64 | BPF_MOD | BPF_K: /* dst %= imm */
		case BPF_ALU64 | BPF_DIV | BPF_K: /* dst /= imm */

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/Kconfig              |    2 +-
 arch/powerpc/net/Makefile         |    2 +-
 arch/powerpc/net/bpf_jit.h        |    4 +
 arch/powerpc/net/bpf_jit32.h      |   58 ++
 arch/powerpc/net/bpf_jit_comp32.c | 1020 +++++++++++++++++++++++++++++
 5 files changed, 1084 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/net/bpf_jit32.h
 create mode 100644 arch/powerpc/net/bpf_jit_comp32.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 6d1454d31a53..e09d0bfed843 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -199,7 +199,7 @@ config PPC
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_DYNAMIC_FTRACE
 	select HAVE_DYNAMIC_FTRACE_WITH_REGS	if MPROFILE_KERNEL
-	select HAVE_EBPF_JIT			if PPC64
+	select HAVE_EBPF_JIT
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS	if !(CPU_LITTLE_ENDIAN && POWER7_CPU)
 	select HAVE_FAST_GUP
 	select HAVE_FTRACE_MCOUNT_RECORD
diff --git a/arch/powerpc/net/Makefile b/arch/powerpc/net/Makefile
index 969cde177880..8e60af32e51e 100644
--- a/arch/powerpc/net/Makefile
+++ b/arch/powerpc/net/Makefile
@@ -2,4 +2,4 @@
 #
 # Arch-specific network modules
 #
-obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o bpf_jit_comp64.o
+obj-$(CONFIG_BPF_JIT) += bpf_jit_comp.o bpf_jit_comp$(BITS).o
diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h
index fb4656986fb9..a45b8266355d 100644
--- a/arch/powerpc/net/bpf_jit.h
+++ b/arch/powerpc/net/bpf_jit.h
@@ -42,6 +42,10 @@
 				EMIT(PPC_RAW_ORI(d, d, IMM_L(i)));	      \
 		} } while(0)
 
+#ifdef CONFIG_PPC32
+#define PPC_EX32(r, i)		EMIT(PPC_RAW_LI((r), (i) < 0 ? -1 : 0))
+#endif
+
 #define PPC_LI64(d, i)		do {					      \
 		if ((long)(i) >= -2147483648 &&				      \
 				(long)(i) < 2147483648)			      \
diff --git a/arch/powerpc/net/bpf_jit32.h b/arch/powerpc/net/bpf_jit32.h
new file mode 100644
index 000000000000..3e8149f45368
--- /dev/null
+++ b/arch/powerpc/net/bpf_jit32.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * BPF JIT compiler for PPC32
+ *
+ */
+#ifndef _BPF_JIT32_H
+#define _BPF_JIT32_H
+
+#include "bpf_jit.h"
+
+/*
+ * Stack layout:
+ *
+ *		[	prev sp		] <-------------
+ *		[   nv gpr save area	] 16 * 4	|
+ * fp (r31) -->	[   ebpf stack space	] upto 512	|
+ *		[     frame header	] 16		|
+ * sp (r1) --->	[    stack pointer	] --------------
+ */
+
+/* for gpr non volatile registers r18 to r31 (14) + r17 for tail call + alignment */
+#define BPF_PPC_STACK_SAVE	(14 * 4 + 4 + 4)
+/* stack frame, ensure this is quadword aligned */
+#define BPF_PPC_STACKFRAME(ctx)	(STACK_FRAME_MIN_SIZE + BPF_PPC_STACK_SAVE + (ctx)->stack_size)
+
+#ifndef __ASSEMBLY__
+
+/* BPF register usage */
+#define TMP_REG	(MAX_BPF_JIT_REG + 0)
+
+/* BPF to ppc register mappings */
+static const int b2p[] = {
+	/* function return value */
+	[BPF_REG_0] = 12,
+	/* function arguments */
+	[BPF_REG_1] = 4,
+	[BPF_REG_2] = 6,
+	[BPF_REG_3] = 8,
+	[BPF_REG_4] = 10,
+	[BPF_REG_5] = 22,
+	/* non volatile registers */
+	[BPF_REG_6] = 24,
+	[BPF_REG_7] = 26,
+	[BPF_REG_8] = 28,
+	[BPF_REG_9] = 30,
+	/* frame pointer aka BPF_REG_10 */
+	[BPF_REG_FP] = 31,
+	/* eBPF jit internal registers */
+	[BPF_REG_AX] = 20,
+	[TMP_REG] = 18,
+};
+
+/* PPC NVR range -- update this if we ever use NVRs below r18 */
+#define BPF_PPC_NVR_MIN		18
+
+#endif /* !__ASSEMBLY__ */
+
+#endif
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
new file mode 100644
index 000000000000..72eef501899b
--- /dev/null
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -0,0 +1,1020 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * eBPF JIT compiler for PPC32
+ *
+ * Copyright 2020 Christophe Leroy <christophe.leroy@csgroup.eu>
+ *		  CS GROUP France
+ *
+ * Based on PPC64 eBPF JIT compiler by Naveen N. Rao
+ */
+#include <linux/moduleloader.h>
+#include <asm/cacheflush.h>
+#include <asm/asm-compat.h>
+#include <linux/netdevice.h>
+#include <linux/filter.h>
+#include <linux/if_vlan.h>
+#include <asm/kprobes.h>
+#include <linux/bpf.h>
+
+#include "bpf_jit32.h"
+
+static int bpf_jit_stack_offsetof(struct codegen_context *ctx, int reg)
+{
+	if ((reg >= BPF_PPC_NVR_MIN && reg < 32) || reg == __REG_R17)
+		return BPF_PPC_STACKFRAME(ctx) - 4 * (32 - reg);
+
+	WARN(true, "BPF JIT is asking about unknown registers");
+	/* Use the hole we have left for alignment */
+	return BPF_PPC_STACKFRAME(ctx) - 4 * (32 - __REG_R16);
+}
+
+void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
+{
+	int i;
+
+	/* First arg comes in as a 32 bits pointer. */
+	EMIT(PPC_RAW_MR(b2p[BPF_REG_1], __REG_R3));
+	EMIT(PPC_RAW_LI(b2p[BPF_REG_1] - 1, 0));
+
+	/*
+	 * Initialize tail_call_cnt in r17 if we do tail calls.
+	 * Otherwise, put in NOPs so that it can be skipped when we are
+	 * invoked through a tail call.
+	 */
+	if (ctx->seen & SEEN_TAILCALL) {
+		EMIT(PPC_RAW_STWU(__REG_R1, __REG_R1, -BPF_PPC_STACKFRAME(ctx)));
+		EMIT(PPC_RAW_STW(__REG_R17, __REG_R1, bpf_jit_stack_offsetof(ctx, __REG_R17)));
+		EMIT(PPC_RAW_LI(__REG_R17, 0));
+		/* Skip already done stackframe setup */
+		PPC_JMP((ctx->idx + 2) * 4);
+	} else {
+		EMIT(PPC_RAW_NOP());
+		EMIT(PPC_RAW_NOP());
+		EMIT(PPC_RAW_NOP());
+		EMIT(PPC_RAW_NOP());
+	}
+
+#define BPF_TAILCALL_PROLOGUE_SIZE	24
+
+	EMIT(PPC_RAW_STWU(__REG_R1, __REG_R1, -BPF_PPC_STACKFRAME(ctx)));
+
+	/*
+	 * We need a stack frame, but we don't necessarily need to
+	 * save/restore LR unless we call other functions
+	 */
+	if (ctx->seen & SEEN_FUNC) {
+		EMIT(PPC_RAW_MFLR(__REG_R0));
+		EMIT(PPC_RAW_STW(__REG_R0, __REG_R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
+	}
+
+	/*
+	 * Back up non-volatile regs -- registers r18-r31
+	 */
+	for (i = BPF_PPC_NVR_MIN; i <= 31; i++)
+		if (bpf_is_seen_register(ctx, i))
+			EMIT(PPC_RAW_STW(i, __REG_R1, bpf_jit_stack_offsetof(ctx, i)));
+
+	/* If needed retrieve arguments 9 and 10, ie 5th 64 bits arg.*/
+	if (bpf_is_seen_register(ctx, b2p[BPF_REG_5])) {
+		EMIT(PPC_RAW_LWZ(b2p[BPF_REG_5] - 1, __REG_R1, BPF_PPC_STACKFRAME(ctx)) + 8);
+		EMIT(PPC_RAW_LWZ(b2p[BPF_REG_5], __REG_R1, BPF_PPC_STACKFRAME(ctx)) + 12);
+	}
+
+	/* Setup frame pointer to point to the bpf stack area */
+	if (bpf_is_seen_register(ctx, b2p[BPF_REG_FP]))
+		EMIT(PPC_RAW_ADDI(b2p[BPF_REG_FP], __REG_R1,
+				  STACK_FRAME_MIN_SIZE + ctx->stack_size));
+}
+
+static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx)
+{
+	int i;
+
+	/* Restore NVRs */
+	for (i = BPF_PPC_NVR_MIN; i <= 31; i++)
+		if (bpf_is_seen_register(ctx, i))
+			EMIT(PPC_RAW_LWZ(i, __REG_R1, bpf_jit_stack_offsetof(ctx, i)));
+
+	/* Tear down our stack frame */
+	EMIT(PPC_RAW_ADDI(__REG_R1, __REG_R1, BPF_PPC_STACKFRAME(ctx)));
+	if (ctx->seen & SEEN_FUNC) {
+		EMIT(PPC_RAW_LWZ(__REG_R0, __REG_R1, PPC_LR_STKOFF));
+		EMIT(PPC_RAW_MTLR(__REG_R0));
+	}
+}
+
+void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
+{
+	EMIT(PPC_RAW_MR(__REG_R3, b2p[BPF_REG_0]));
+
+	if (ctx->seen & SEEN_TAILCALL)
+		EMIT(PPC_RAW_LWZ(__REG_R17, __REG_R1, bpf_jit_stack_offsetof(ctx, __REG_R17)));
+
+	bpf_jit_emit_common_epilogue(image, ctx);
+
+	EMIT(PPC_RAW_BLR());
+}
+
+void bpf_jit_emit_func_call_rel(u32 *image, struct codegen_context *ctx, u64 func)
+{
+	/* Load function address into r0 */
+	EMIT(PPC_RAW_LIS(__REG_R0, IMM_H(func)));
+	EMIT(PPC_RAW_ORI(__REG_R0, __REG_R0, IMM_L(func)));
+	EMIT(PPC_RAW_MTLR(__REG_R0));
+	EMIT(PPC_RAW_BLRL());
+}
+
+static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 out)
+{
+	/*
+	 * By now, the eBPF program has already setup parameters in r3-r6
+	 * r3-r4/BPF_REG_1 - pointer to ctx -- passed as is to the next bpf program
+	 * r5-r6/BPF_REG_2 - pointer to bpf_array
+	 * r7-r8/BPF_REG_3 - index in bpf_array
+	 */
+	int b2p_bpf_array = b2p[BPF_REG_2];
+	int b2p_index = b2p[BPF_REG_3];
+
+	/*
+	 * if (index >= array->map.max_entries)
+	 *   goto out;
+	 */
+	EMIT(PPC_RAW_LWZ(__REG_R0, b2p_bpf_array, offsetof(struct bpf_array, map.max_entries)));
+	EMIT(PPC_RAW_CMPLW(b2p_index, __REG_R0));
+	PPC_BCC(COND_GE, out);
+
+	/*
+	 * if (tail_call_cnt > MAX_TAIL_CALL_CNT)
+	 *   goto out;
+	 */
+	EMIT(PPC_RAW_CMPLWI(__REG_R17, MAX_TAIL_CALL_CNT));
+	PPC_BCC(COND_GT, out);
+
+	/*
+	 * tail_call_cnt++;
+	 */
+	EMIT(PPC_RAW_ADDI(__REG_R17, __REG_R17, 1));
+
+	/* prog = array->ptrs[index]; */
+	EMIT(PPC_RAW_RLWINM(__REG_R3, b2p_index, 2, 0, 29));
+	EMIT(PPC_RAW_ADD(__REG_R3, __REG_R3, b2p_bpf_array));
+	EMIT(PPC_RAW_LWZ(__REG_R3, __REG_R3, offsetof(struct bpf_array, ptrs)));
+
+	/*
+	 * if (prog == NULL)
+	 *   goto out;
+	 */
+	EMIT(PPC_RAW_CMPLWI(__REG_R3, 0));
+	PPC_BCC(COND_EQ, out);
+
+	/* goto *(prog->bpf_func + prologue_size); */
+	EMIT(PPC_RAW_LWZ(__REG_R3, __REG_R3, offsetof(struct bpf_prog, bpf_func)));
+	EMIT(PPC_RAW_ADDIC(__REG_R3, __REG_R3, BPF_TAILCALL_PROLOGUE_SIZE));
+	EMIT(PPC_RAW_MTCTR(__REG_R3));
+
+	EMIT(PPC_RAW_MR(__REG_R3, b2p[BPF_REG_1]));
+
+	/* tear down stack, restore NVRs, ... */
+	bpf_jit_emit_common_epilogue(image, ctx);
+
+	EMIT(PPC_RAW_BCTR());
+	/* out: */
+}
+
+/* Assemble the body code between the prologue & epilogue */
+int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *ctx,
+		       u32 *addrs, bool extra_pass)
+{
+	const struct bpf_insn *insn = fp->insnsi;
+	int flen = fp->len;
+	int i, ret;
+
+	/* Start of epilogue code - will only be valid 2nd pass onwards */
+	u32 exit_addr = addrs[flen];
+
+	for (i = 0; i < flen; i++) {
+		u32 code = insn[i].code;
+		u32 dst_reg = b2p[insn[i].dst_reg];
+		u32 dst_reg_h = dst_reg - 1;
+		u32 src_reg = b2p[insn[i].src_reg];
+		u32 src_reg_h = src_reg - 1;
+		u32 tmp_reg = b2p[TMP_REG];
+		s16 off = insn[i].off;
+		s32 imm = insn[i].imm;
+		bool func_addr_fixed;
+		u64 func_addr;
+		u32 true_cond;
+
+		/*
+		 * addrs[] maps a BPF bytecode address into a real offset from
+		 * the start of the body code.
+		 */
+		addrs[i] = ctx->idx * 4;
+
+		/*
+		 * As an optimization, we note down which registers
+		 * are used so that we can only save/restore those in our
+		 * prologue and epilogue. We do this here regardless of whether
+		 * the actual BPF instruction uses src/dst registers or not
+		 * (for instance, BPF_CALL does not use them). The expectation
+		 * is that those instructions will have src_reg/dst_reg set to
+		 * 0. Even otherwise, we just lose some prologue/epilogue
+		 * optimization but everything else should work without
+		 * any issues.
+		 */
+		if (dst_reg >= 3 && dst_reg < 32) {
+			bpf_set_seen_register(ctx, dst_reg);
+			bpf_set_seen_register(ctx, dst_reg_h);
+		}
+
+		if (src_reg >= 3 && src_reg < 32) {
+			bpf_set_seen_register(ctx, src_reg);
+			bpf_set_seen_register(ctx, src_reg_h);
+		}
+
+		switch (code) {
+		/*
+		 * Arithmetic operations: ADD/SUB/MUL/DIV/MOD/NEG
+		 */
+		case BPF_ALU | BPF_ADD | BPF_X: /* (u32) dst += (u32) src */
+			EMIT(PPC_RAW_ADD(dst_reg, dst_reg, src_reg));
+			break;
+		case BPF_ALU64 | BPF_ADD | BPF_X: /* dst += src */
+			EMIT(PPC_RAW_ADDC(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_ADDE(dst_reg_h, dst_reg_h, src_reg_h));
+			break;
+		case BPF_ALU | BPF_SUB | BPF_X: /* (u32) dst -= (u32) src */
+			EMIT(PPC_RAW_SUB(dst_reg, dst_reg, src_reg));
+			break;
+		case BPF_ALU64 | BPF_SUB | BPF_X: /* dst -= src */
+			EMIT(PPC_RAW_SUBFC(dst_reg, src_reg, dst_reg));
+			EMIT(PPC_RAW_SUBFE(dst_reg_h, src_reg_h, dst_reg_h));
+			break;
+		case BPF_ALU | BPF_SUB | BPF_K: /* (u32) dst -= (u32) imm */
+			imm = -imm;
+			fallthrough;
+		case BPF_ALU | BPF_ADD | BPF_K: /* (u32) dst += (u32) imm */
+			if (IMM_HA(imm) & 0xffff)
+				EMIT(PPC_RAW_ADDIS(dst_reg, dst_reg, IMM_HA(imm)));
+			if (IMM_L(imm))
+				EMIT(PPC_RAW_ADDI(dst_reg, dst_reg, IMM_L(imm)));
+			break;
+		case BPF_ALU64 | BPF_SUB | BPF_K: /* dst -= imm */
+			imm = -imm;
+			fallthrough;
+		case BPF_ALU64 | BPF_ADD | BPF_K: /* dst += imm */
+			if (!imm)
+				break;
+
+			if (imm >= -32768 && imm < 32768) {
+				EMIT(PPC_RAW_ADDIC(dst_reg, dst_reg, imm));
+			} else {
+				PPC_LI32(__REG_R0, imm);
+				EMIT(PPC_RAW_ADDC(dst_reg, dst_reg, __REG_R0));
+			}
+			if (imm >= 0)
+				EMIT(PPC_RAW_ADDZE(dst_reg_h, dst_reg_h));
+			else
+				EMIT(PPC_RAW_ADDME(dst_reg_h, dst_reg_h));
+			break;
+		case BPF_ALU64 | BPF_MUL | BPF_X: /* dst *= src */
+			bpf_set_seen_register(ctx, tmp_reg);
+			EMIT(PPC_RAW_MULW(__REG_R0, dst_reg, src_reg_h));
+			EMIT(PPC_RAW_MULW(dst_reg_h, dst_reg_h, src_reg));
+			EMIT(PPC_RAW_MULHWU(tmp_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_MULW(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_ADD(dst_reg_h, dst_reg_h, __REG_R0));
+			EMIT(PPC_RAW_ADD(dst_reg_h, dst_reg_h, tmp_reg));
+			break;
+		case BPF_ALU | BPF_MUL | BPF_X: /* (u32) dst *= (u32) src */
+			EMIT(PPC_RAW_MULW(dst_reg, dst_reg, src_reg));
+			break;
+		case BPF_ALU | BPF_MUL | BPF_K: /* (u32) dst *= (u32) imm */
+			if (imm >= -32768 && imm < 32768) {
+				EMIT(PPC_RAW_MULI(dst_reg, dst_reg, imm));
+			} else {
+				PPC_LI32(__REG_R0, imm);
+				EMIT(PPC_RAW_MULW(dst_reg, dst_reg, __REG_R0));
+			}
+			break;
+		case BPF_ALU64 | BPF_MUL | BPF_K: /* dst *= imm */
+			if (!imm) {
+				PPC_LI32(dst_reg, 0);
+				PPC_LI32(dst_reg_h, 0);
+				break;
+			}
+			if (imm == 1)
+				break;
+			if (imm == -1) {
+				EMIT(PPC_RAW_SUBFIC(dst_reg, dst_reg, 0));
+				EMIT(PPC_RAW_SUBFZE(dst_reg_h, dst_reg_h));
+				break;
+			}
+			bpf_set_seen_register(ctx, tmp_reg);
+			PPC_LI32(tmp_reg, imm);
+			EMIT(PPC_RAW_MULW(dst_reg_h, dst_reg_h, tmp_reg));
+			if (imm < 0)
+				EMIT(PPC_RAW_SUB(dst_reg_h, dst_reg_h, dst_reg));
+			EMIT(PPC_RAW_MULHWU(__REG_R0, dst_reg, tmp_reg));
+			EMIT(PPC_RAW_MULW(dst_reg, dst_reg, tmp_reg));
+			EMIT(PPC_RAW_ADD(dst_reg_h, dst_reg_h, __REG_R0));
+			break;
+		case BPF_ALU | BPF_DIV | BPF_X: /* (u32) dst /= (u32) src */
+			EMIT(PPC_RAW_DIVWU(dst_reg, dst_reg, src_reg));
+			break;
+		case BPF_ALU | BPF_MOD | BPF_X: /* (u32) dst %= (u32) src */
+			EMIT(PPC_RAW_DIVWU(__REG_R0, dst_reg, src_reg));
+			EMIT(PPC_RAW_MULW(__REG_R0, src_reg, __REG_R0));
+			EMIT(PPC_RAW_SUB(dst_reg, dst_reg, __REG_R0));
+			break;
+		case BPF_ALU64 | BPF_DIV | BPF_X: /* dst /= src */
+			return -EOPNOTSUPP;
+		case BPF_ALU64 | BPF_MOD | BPF_X: /* dst %= src */
+			return -EOPNOTSUPP;
+		case BPF_ALU | BPF_DIV | BPF_K: /* (u32) dst /= (u32) imm */
+			if (!imm)
+				return -EINVAL;
+			if (imm == 1)
+				break;
+
+			PPC_LI32(__REG_R0, imm);
+			EMIT(PPC_RAW_DIVWU(dst_reg, dst_reg, __REG_R0));
+			break;
+		case BPF_ALU | BPF_MOD | BPF_K: /* (u32) dst %= (u32) imm */
+			if (!imm)
+				return -EINVAL;
+
+			if (!is_power_of_2((u32)imm)) {
+				bpf_set_seen_register(ctx, tmp_reg);
+				PPC_LI32(tmp_reg, imm);
+				EMIT(PPC_RAW_DIVWU(__REG_R0, dst_reg, tmp_reg));
+				EMIT(PPC_RAW_MULW(__REG_R0, tmp_reg, __REG_R0));
+				EMIT(PPC_RAW_SUB(dst_reg, dst_reg, __REG_R0));
+				break;
+			}
+			if (imm == 1)
+				EMIT(PPC_RAW_LI(dst_reg, 0));
+			else
+				EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, 0, 32 - ilog2((u32)imm), 31));
+
+			break;
+		case BPF_ALU64 | BPF_MOD | BPF_K: /* dst %= imm */
+			if (!imm)
+				return -EINVAL;
+			if (imm < 0)
+				imm = -imm;
+			if (!is_power_of_2(imm))
+				return -EOPNOTSUPP;
+			if (imm == 1)
+				EMIT(PPC_RAW_LI(dst_reg, 0));
+			else
+				EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, 0, 32 - ilog2(imm), 31));
+			EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			break;
+		case BPF_ALU64 | BPF_DIV | BPF_K: /* dst /= imm */
+			if (!imm)
+				return -EINVAL;
+			if (!is_power_of_2(abs(imm)))
+				return -EOPNOTSUPP;
+
+			if (imm < 0) {
+				EMIT(PPC_RAW_SUBFIC(dst_reg, dst_reg, 0));
+				EMIT(PPC_RAW_SUBFZE(dst_reg_h, dst_reg_h));
+				imm = -imm;
+			}
+			if (imm == 1)
+				break;
+			imm = ilog2(imm);
+			EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, 32 - imm, imm, 31));
+			EMIT(PPC_RAW_RLWIMI(dst_reg, dst_reg_h, 32 - imm, 0, imm - 1));
+			EMIT(PPC_RAW_SRAWI(dst_reg_h, dst_reg_h, imm));
+			break;
+		case BPF_ALU | BPF_NEG: /* (u32) dst = -dst */
+			EMIT(PPC_RAW_NEG(dst_reg, dst_reg));
+			break;
+		case BPF_ALU64 | BPF_NEG: /* dst = -dst */
+			EMIT(PPC_RAW_SUBFIC(dst_reg, dst_reg, 0));
+			EMIT(PPC_RAW_SUBFZE(dst_reg_h, dst_reg_h));
+			break;
+
+		/*
+		 * Logical operations: AND/OR/XOR/[A]LSH/[A]RSH
+		 */
+		case BPF_ALU64 | BPF_AND | BPF_X: /* dst = dst & src */
+			EMIT(PPC_RAW_AND(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_AND(dst_reg_h, dst_reg_h, src_reg_h));
+			break;
+		case BPF_ALU | BPF_AND | BPF_X: /* (u32) dst = dst & src */
+			EMIT(PPC_RAW_AND(dst_reg, dst_reg, src_reg));
+			break;
+		case BPF_ALU64 | BPF_AND | BPF_K: /* dst = dst & imm */
+			if (imm >= 0)
+				EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			fallthrough;
+		case BPF_ALU | BPF_AND | BPF_K: /* (u32) dst = dst & imm */
+			if (!IMM_H(imm)) {
+				EMIT(PPC_RAW_ANDI(dst_reg, dst_reg, IMM_L(imm)));
+			} else if (!IMM_L(imm)) {
+				EMIT(PPC_RAW_ANDIS(dst_reg, dst_reg, IMM_H(imm)));
+			} else if (imm == (((1 << fls(imm)) - 1) ^ ((1 << (ffs(i) - 1)) - 1))) {
+				EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, 0,
+						    32 - fls(imm), 32 - ffs(imm)));
+			} else {
+				PPC_LI32(__REG_R0, imm);
+				EMIT(PPC_RAW_AND(dst_reg, dst_reg, __REG_R0));
+			}
+			break;
+		case BPF_ALU64 | BPF_OR | BPF_X: /* dst = dst | src */
+			EMIT(PPC_RAW_OR(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_OR(dst_reg_h, dst_reg_h, src_reg_h));
+			break;
+		case BPF_ALU | BPF_OR | BPF_X: /* dst = (u32) dst | (u32) src */
+			EMIT(PPC_RAW_OR(dst_reg, dst_reg, src_reg));
+			break;
+		case BPF_ALU64 | BPF_OR | BPF_K:/* dst = dst | imm */
+			/* Sign-extended */
+			if (imm < 0)
+				EMIT(PPC_RAW_LI(dst_reg_h, -1));
+			fallthrough;
+		case BPF_ALU | BPF_OR | BPF_K:/* dst = (u32) dst | (u32) imm */
+			if (IMM_L(imm))
+				EMIT(PPC_RAW_ORI(dst_reg, dst_reg, IMM_L(imm)));
+			if (IMM_H(imm))
+				EMIT(PPC_RAW_ORIS(dst_reg, dst_reg, IMM_H(imm)));
+			break;
+		case BPF_ALU64 | BPF_XOR | BPF_X: /* dst ^= src */
+			if (dst_reg == src_reg) {
+				EMIT(PPC_RAW_LI(dst_reg, 0));
+				EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			} else {
+				EMIT(PPC_RAW_XOR(dst_reg, dst_reg, src_reg));
+				EMIT(PPC_RAW_XOR(dst_reg_h, dst_reg_h, src_reg_h));
+			}
+			break;
+		case BPF_ALU | BPF_XOR | BPF_X: /* (u32) dst ^= src */
+			if (dst_reg == src_reg)
+				EMIT(PPC_RAW_LI(dst_reg, 0));
+			else
+				EMIT(PPC_RAW_XOR(dst_reg, dst_reg, src_reg));
+			break;
+		case BPF_ALU64 | BPF_XOR | BPF_K: /* dst ^= imm */
+			if (imm < 0)
+				EMIT(PPC_RAW_NOR(dst_reg_h, dst_reg_h, dst_reg_h));
+			fallthrough;
+		case BPF_ALU | BPF_XOR | BPF_K: /* (u32) dst ^= (u32) imm */
+			if (IMM_L(imm))
+				EMIT(PPC_RAW_XORI(dst_reg, dst_reg, IMM_L(imm)));
+			if (IMM_H(imm))
+				EMIT(PPC_RAW_XORIS(dst_reg, dst_reg, IMM_H(imm)));
+			break;
+		case BPF_ALU | BPF_LSH | BPF_X: /* (u32) dst <<= (u32) src */
+			EMIT(PPC_RAW_SLW(dst_reg, dst_reg, src_reg));
+			break;
+		case BPF_ALU64 | BPF_LSH | BPF_X: /* dst <<= src; */
+			EMIT(PPC_RAW_ADDIC_DOT(__REG_R0, src_reg, -32));
+			PPC_BCC_SHORT(COND_LT, (ctx->idx + 4) * 4);
+			EMIT(PPC_RAW_SLW(dst_reg_h, dst_reg, __REG_R0));
+			EMIT(PPC_RAW_LI(dst_reg, 0));
+			PPC_JMP((ctx->idx + 6) * 4);
+			EMIT(PPC_RAW_SUBFIC(__REG_R0, src_reg, 32));
+			EMIT(PPC_RAW_SLW(dst_reg_h, dst_reg_h, src_reg));
+			EMIT(PPC_RAW_SRW(__REG_R0, dst_reg, __REG_R0));
+			EMIT(PPC_RAW_SLW(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_OR(dst_reg_h, dst_reg_h, __REG_R0));
+			break;
+		case BPF_ALU | BPF_LSH | BPF_K: /* (u32) dst <<== (u32) imm */
+			if (!imm)
+				break;
+			EMIT(PPC_RAW_SLWI(dst_reg, dst_reg, imm));
+			break;
+		case BPF_ALU64 | BPF_LSH | BPF_K: /* dst <<== imm */
+			if (imm < 0)
+				return -EINVAL;
+			if (!imm)
+				break;
+			if (imm < 32) {
+				EMIT(PPC_RAW_RLWINM(dst_reg_h, dst_reg_h, imm, 0, 31 - imm));
+				EMIT(PPC_RAW_RLWIMI(dst_reg_h, dst_reg, imm, 32 - imm, 31));
+				EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, imm, 0, 31 - imm));
+				break;
+			}
+			if (imm < 64)
+				EMIT(PPC_RAW_RLWINM(dst_reg_h, dst_reg, imm, 0, 31 - imm));
+			else
+				EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			EMIT(PPC_RAW_LI(dst_reg, 0));
+			break;
+		case BPF_ALU | BPF_RSH | BPF_X: /* (u32) dst >>= (u32) src */
+			EMIT(PPC_RAW_SRW(dst_reg, dst_reg, src_reg));
+			break;
+		case BPF_ALU64 | BPF_RSH | BPF_X: /* dst >>= src */
+			EMIT(PPC_RAW_ADDIC_DOT(__REG_R0, src_reg, -32));
+			PPC_BCC_SHORT(COND_LT, (ctx->idx + 4) * 4);
+			EMIT(PPC_RAW_SRW(dst_reg, dst_reg_h, __REG_R0));
+			EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			PPC_JMP((ctx->idx + 6) * 4);
+			EMIT(PPC_RAW_SUBFIC(0, src_reg, 32));
+			EMIT(PPC_RAW_SRW(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_SLW(__REG_R0, dst_reg_h, __REG_R0));
+			EMIT(PPC_RAW_SRW(dst_reg_h, dst_reg_h, src_reg));
+			EMIT(PPC_RAW_OR(dst_reg, dst_reg, __REG_R0));
+			break;
+		case BPF_ALU | BPF_RSH | BPF_K: /* (u32) dst >>= (u32) imm */
+			if (!imm)
+				break;
+			EMIT(PPC_RAW_SRWI(dst_reg, dst_reg, imm));
+			break;
+		case BPF_ALU64 | BPF_RSH | BPF_K: /* dst >>= imm */
+			if (imm < 0)
+				return -EINVAL;
+			if (!imm)
+				break;
+			if (imm < 32) {
+				EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, 32 - imm, imm, 31));
+				EMIT(PPC_RAW_RLWIMI(dst_reg, dst_reg_h, 32 - imm, 0, imm - 1));
+				EMIT(PPC_RAW_RLWINM(dst_reg_h, dst_reg_h, 32 - imm, imm, 31));
+				break;
+			}
+			if (imm < 64)
+				EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg_h, 64 - imm, imm - 32, 31));
+			else
+				EMIT(PPC_RAW_LI(dst_reg, 0));
+			EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			break;
+		case BPF_ALU | BPF_ARSH | BPF_X: /* (s32) dst >>= src */
+			EMIT(PPC_RAW_SRAW(dst_reg_h, dst_reg, src_reg));
+			break;
+		case BPF_ALU64 | BPF_ARSH | BPF_X: /* (s64) dst >>= src */
+			EMIT(PPC_RAW_ADDIC_DOT(__REG_R0, src_reg, -32));
+			PPC_BCC_SHORT(COND_LT, (ctx->idx + 4) * 4);
+			EMIT(PPC_RAW_SRAW(dst_reg, dst_reg_h, __REG_R0));
+			EMIT(PPC_RAW_SRAWI(dst_reg_h, dst_reg_h, 31));
+			PPC_JMP((ctx->idx + 6) * 4);
+			EMIT(PPC_RAW_SUBFIC(0, src_reg, 32));
+			EMIT(PPC_RAW_SRW(dst_reg, dst_reg, src_reg));
+			EMIT(PPC_RAW_SLW(__REG_R0, dst_reg_h, __REG_R0));
+			EMIT(PPC_RAW_SRAW(dst_reg_h, dst_reg_h, src_reg));
+			EMIT(PPC_RAW_OR(dst_reg, dst_reg, __REG_R0));
+			break;
+		case BPF_ALU | BPF_ARSH | BPF_K: /* (s32) dst >>= imm */
+			if (!imm)
+				break;
+			EMIT(PPC_RAW_SRAWI(dst_reg, dst_reg, imm));
+			break;
+		case BPF_ALU64 | BPF_ARSH | BPF_K: /* (s64) dst >>= imm */
+			if (imm < 0)
+				return -EINVAL;
+			if (!imm)
+				break;
+			if (imm < 32) {
+				EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, 32 - imm, imm, 31));
+				EMIT(PPC_RAW_RLWIMI(dst_reg, dst_reg_h, 32 - imm, 0, imm - 1));
+				EMIT(PPC_RAW_SRAWI(dst_reg_h, dst_reg_h, imm));
+				break;
+			}
+			if (imm < 64)
+				EMIT(PPC_RAW_SRAWI(dst_reg, dst_reg_h, imm - 32));
+			else
+				EMIT(PPC_RAW_SRAWI(dst_reg, dst_reg_h, 31));
+			EMIT(PPC_RAW_SRAWI(dst_reg_h, dst_reg_h, 31));
+			break;
+
+		/*
+		 * MOV
+		 */
+		case BPF_ALU64 | BPF_MOV | BPF_X: /* dst = src */
+			if (dst_reg == src_reg)
+				break;
+			EMIT(PPC_RAW_MR(dst_reg, src_reg));
+			EMIT(PPC_RAW_MR(dst_reg_h, src_reg_h));
+			break;
+		case BPF_ALU | BPF_MOV | BPF_X: /* (u32) dst = src */
+			/* special mov32 for zext */
+			if (imm == 1)
+				EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			else if (dst_reg != src_reg)
+				EMIT(PPC_RAW_MR(dst_reg, src_reg));
+			break;
+		case BPF_ALU64 | BPF_MOV | BPF_K: /* dst = (s64) imm */
+			PPC_LI32(dst_reg, imm);
+			PPC_EX32(dst_reg_h, imm);
+			break;
+		case BPF_ALU | BPF_MOV | BPF_K: /* (u32) dst = imm */
+			PPC_LI32(dst_reg, imm);
+			break;
+
+		/*
+		 * BPF_FROM_BE/LE
+		 */
+		case BPF_ALU | BPF_END | BPF_FROM_LE:
+			switch (imm) {
+			case 16:
+				/* Rotate 8 bits left & mask with 0x0000ff00 */
+				EMIT(PPC_RAW_RLWINM(__REG_R0, dst_reg, 8, 16, 23));
+				/* Rotate 8 bits right & insert LSB to reg */
+				EMIT(PPC_RAW_RLWIMI(__REG_R0, dst_reg, 24, 24, 31));
+				/* Move result back to dst_reg_h */
+				EMIT(PPC_RAW_MR(dst_reg, __REG_R0));
+				break;
+			case 32:
+				/*
+				 * Rotate word left by 8 bits:
+				 * 2 bytes are already in their final position
+				 * -- byte 2 and 4 (of bytes 1, 2, 3 and 4)
+				 */
+				EMIT(PPC_RAW_RLWINM(__REG_R0, dst_reg, 8, 0, 31));
+				/* Rotate 24 bits and insert byte 1 */
+				EMIT(PPC_RAW_RLWIMI(__REG_R0, dst_reg, 24, 0, 7));
+				/* Rotate 24 bits and insert byte 3 */
+				EMIT(PPC_RAW_RLWIMI(__REG_R0, dst_reg, 24, 16, 23));
+				EMIT(PPC_RAW_MR(dst_reg, __REG_R0));
+				break;
+			case 64:
+				bpf_set_seen_register(ctx, tmp_reg);
+				EMIT(PPC_RAW_RLWINM(tmp_reg, dst_reg, 8, 0, 31));
+				EMIT(PPC_RAW_RLWINM(__REG_R0, dst_reg_h, 8, 0, 31));
+				/* Rotate 24 bits and insert byte 1 */
+				EMIT(PPC_RAW_RLWIMI(tmp_reg, dst_reg, 24, 0, 7));
+				EMIT(PPC_RAW_RLWIMI(__REG_R0, dst_reg_h, 24, 0, 7));
+				/* Rotate 24 bits and insert byte 3 */
+				EMIT(PPC_RAW_RLWIMI(tmp_reg, dst_reg, 24, 16, 23));
+				EMIT(PPC_RAW_RLWIMI(__REG_R0, dst_reg_h, 24, 16, 23));
+				EMIT(PPC_RAW_MR(dst_reg, __REG_R0));
+				EMIT(PPC_RAW_MR(dst_reg_h, tmp_reg));
+				break;
+			}
+			break;
+		case BPF_ALU | BPF_END | BPF_FROM_BE:
+			switch (imm) {
+			case 16:
+				/* zero-extend 16 bits into 32 bits */
+				EMIT(PPC_RAW_RLWINM(dst_reg, dst_reg, 0, 16, 31));
+				break;
+			case 32:
+			case 64:
+				/* nop */
+				break;
+			}
+			break;
+
+		/*
+		 * BPF_ST(X)
+		 */
+		case BPF_STX | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = src */
+			EMIT(PPC_RAW_STB(src_reg, dst_reg, off));
+			break;
+		case BPF_ST | BPF_MEM | BPF_B: /* *(u8 *)(dst + off) = imm */
+			PPC_LI32(__REG_R0, imm);
+			EMIT(PPC_RAW_STB(__REG_R0, dst_reg, off));
+			break;
+		case BPF_STX | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = src */
+			EMIT(PPC_RAW_STH(src_reg, dst_reg, off));
+			break;
+		case BPF_ST | BPF_MEM | BPF_H: /* (u16 *)(dst + off) = imm */
+			PPC_LI32(__REG_R0, imm);
+			EMIT(PPC_RAW_STH(__REG_R0, dst_reg, off));
+			break;
+		case BPF_STX | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = src */
+			EMIT(PPC_RAW_STW(src_reg, dst_reg, off));
+			break;
+		case BPF_ST | BPF_MEM | BPF_W: /* *(u32 *)(dst + off) = imm */
+			PPC_LI32(__REG_R0, imm);
+			EMIT(PPC_RAW_STW(__REG_R0, dst_reg, off));
+			break;
+		case BPF_STX | BPF_MEM | BPF_DW: /* (u64 *)(dst + off) = src */
+			EMIT(PPC_RAW_STW(src_reg_h, dst_reg, off));
+			EMIT(PPC_RAW_STW(src_reg, dst_reg, off + 4));
+			break;
+		case BPF_ST | BPF_MEM | BPF_DW: /* *(u64 *)(dst + off) = imm */
+			PPC_LI32(__REG_R0, imm);
+			EMIT(PPC_RAW_STW(__REG_R0, dst_reg, off + 4));
+			PPC_EX32(__REG_R0, imm);
+			EMIT(PPC_RAW_STW(__REG_R0, dst_reg, off));
+			break;
+
+		/*
+		 * BPF_STX XADD (atomic_add)
+		 */
+		case BPF_STX | BPF_XADD | BPF_W: /* *(u32 *)(dst + off) += src */
+			bpf_set_seen_register(ctx, tmp_reg);
+			/* Get offset into TMP_REG */
+			EMIT(PPC_RAW_LI(tmp_reg, off));
+			/* load value from memory into r0 */
+			EMIT(PPC_RAW_LWARX(__REG_R0, tmp_reg, dst_reg, 0));
+			/* add value from src_reg into this */
+			EMIT(PPC_RAW_ADD(__REG_R0, __REG_R0, src_reg));
+			/* store result back */
+			EMIT(PPC_RAW_STWCX(__REG_R0, tmp_reg, dst_reg));
+			/* we're done if this succeeded */
+			PPC_BCC_SHORT(COND_NE, (ctx->idx - 3) * 4);
+			break;
+
+		case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */
+			return -EOPNOTSUPP;
+
+		/*
+		 * BPF_LDX
+		 */
+		case BPF_LDX | BPF_MEM | BPF_B: /* dst = *(u8 *)(ul) (src + off) */
+			EMIT(PPC_RAW_LBZ(dst_reg, src_reg, off));
+			if (!fp->aux->verifier_zext)
+				EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			break;
+		case BPF_LDX | BPF_MEM | BPF_H: /* dst = *(u16 *)(ul) (src + off) */
+			EMIT(PPC_RAW_LHZ(dst_reg, src_reg, off));
+			if (!fp->aux->verifier_zext)
+				EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			break;
+		case BPF_LDX | BPF_MEM | BPF_W: /* dst = *(u32 *)(ul) (src + off) */
+			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off));
+			if (!fp->aux->verifier_zext)
+				EMIT(PPC_RAW_LI(dst_reg_h, 0));
+			break;
+		case BPF_LDX | BPF_MEM | BPF_DW: /* dst = *(u64 *)(ul) (src + off) */
+			EMIT(PPC_RAW_LWZ(dst_reg_h, src_reg, off));
+			EMIT(PPC_RAW_LWZ(dst_reg, src_reg, off + 4));
+			break;
+
+		/*
+		 * Doubleword load
+		 * 16 byte instruction that uses two 'struct bpf_insn'
+		 */
+		case BPF_LD | BPF_IMM | BPF_DW: /* dst = (u64) imm */
+			PPC_LI32(dst_reg_h, (u32)insn[i + 1].imm);
+			PPC_LI32(dst_reg, (u32)insn[i].imm);
+			/* Adjust for two bpf instructions */
+			addrs[++i] = ctx->idx * 4;
+			break;
+
+		/*
+		 * Return/Exit
+		 */
+		case BPF_JMP | BPF_EXIT:
+			/*
+			 * If this isn't the very last instruction, branch to
+			 * the epilogue. If we _are_ the last instruction,
+			 * we'll just fall through to the epilogue.
+			 */
+			if (i != flen - 1)
+				PPC_JMP(exit_addr);
+			/* else fall through to the epilogue */
+			break;
+
+		/*
+		 * Call kernel helper or bpf function
+		 */
+		case BPF_JMP | BPF_CALL:
+			ctx->seen |= SEEN_FUNC;
+
+			ret = bpf_jit_get_func_addr(fp, &insn[i], extra_pass,
+						    &func_addr, &func_addr_fixed);
+			if (ret < 0)
+				return ret;
+
+			if (bpf_is_seen_register(ctx, b2p[BPF_REG_5])) {
+				EMIT(PPC_RAW_STW(b2p[BPF_REG_5] - 1, __REG_R1, 8));
+				EMIT(PPC_RAW_STW(b2p[BPF_REG_5], __REG_R1, 12));
+			}
+
+			bpf_jit_emit_func_call_rel(image, ctx, func_addr);
+
+			EMIT(PPC_RAW_MR(b2p[BPF_REG_0] - 1, __REG_R3));
+			EMIT(PPC_RAW_MR(b2p[BPF_REG_0], __REG_R4));
+			break;
+
+		/*
+		 * Jumps and branches
+		 */
+		case BPF_JMP | BPF_JA:
+			PPC_JMP(addrs[i + 1 + off]);
+			break;
+
+		case BPF_JMP | BPF_JGT | BPF_K:
+		case BPF_JMP | BPF_JGT | BPF_X:
+		case BPF_JMP | BPF_JSGT | BPF_K:
+		case BPF_JMP | BPF_JSGT | BPF_X:
+		case BPF_JMP32 | BPF_JGT | BPF_K:
+		case BPF_JMP32 | BPF_JGT | BPF_X:
+		case BPF_JMP32 | BPF_JSGT | BPF_K:
+		case BPF_JMP32 | BPF_JSGT | BPF_X:
+			true_cond = COND_GT;
+			goto cond_branch;
+		case BPF_JMP | BPF_JLT | BPF_K:
+		case BPF_JMP | BPF_JLT | BPF_X:
+		case BPF_JMP | BPF_JSLT | BPF_K:
+		case BPF_JMP | BPF_JSLT | BPF_X:
+		case BPF_JMP32 | BPF_JLT | BPF_K:
+		case BPF_JMP32 | BPF_JLT | BPF_X:
+		case BPF_JMP32 | BPF_JSLT | BPF_K:
+		case BPF_JMP32 | BPF_JSLT | BPF_X:
+			true_cond = COND_LT;
+			goto cond_branch;
+		case BPF_JMP | BPF_JGE | BPF_K:
+		case BPF_JMP | BPF_JGE | BPF_X:
+		case BPF_JMP | BPF_JSGE | BPF_K:
+		case BPF_JMP | BPF_JSGE | BPF_X:
+		case BPF_JMP32 | BPF_JGE | BPF_K:
+		case BPF_JMP32 | BPF_JGE | BPF_X:
+		case BPF_JMP32 | BPF_JSGE | BPF_K:
+		case BPF_JMP32 | BPF_JSGE | BPF_X:
+			true_cond = COND_GE;
+			goto cond_branch;
+		case BPF_JMP | BPF_JLE | BPF_K:
+		case BPF_JMP | BPF_JLE | BPF_X:
+		case BPF_JMP | BPF_JSLE | BPF_K:
+		case BPF_JMP | BPF_JSLE | BPF_X:
+		case BPF_JMP32 | BPF_JLE | BPF_K:
+		case BPF_JMP32 | BPF_JLE | BPF_X:
+		case BPF_JMP32 | BPF_JSLE | BPF_K:
+		case BPF_JMP32 | BPF_JSLE | BPF_X:
+			true_cond = COND_LE;
+			goto cond_branch;
+		case BPF_JMP | BPF_JEQ | BPF_K:
+		case BPF_JMP | BPF_JEQ | BPF_X:
+		case BPF_JMP32 | BPF_JEQ | BPF_K:
+		case BPF_JMP32 | BPF_JEQ | BPF_X:
+			true_cond = COND_EQ;
+			goto cond_branch;
+		case BPF_JMP | BPF_JNE | BPF_K:
+		case BPF_JMP | BPF_JNE | BPF_X:
+		case BPF_JMP32 | BPF_JNE | BPF_K:
+		case BPF_JMP32 | BPF_JNE | BPF_X:
+			true_cond = COND_NE;
+			goto cond_branch;
+		case BPF_JMP | BPF_JSET | BPF_K:
+		case BPF_JMP | BPF_JSET | BPF_X:
+		case BPF_JMP32 | BPF_JSET | BPF_K:
+		case BPF_JMP32 | BPF_JSET | BPF_X:
+			true_cond = COND_NE;
+			/* fallthrough; */
+
+cond_branch:
+			switch (code) {
+			case BPF_JMP | BPF_JGT | BPF_X:
+			case BPF_JMP | BPF_JLT | BPF_X:
+			case BPF_JMP | BPF_JGE | BPF_X:
+			case BPF_JMP | BPF_JLE | BPF_X:
+			case BPF_JMP | BPF_JEQ | BPF_X:
+			case BPF_JMP | BPF_JNE | BPF_X:
+				/* unsigned comparison */
+				EMIT(PPC_RAW_CMPLW(dst_reg_h, src_reg_h));
+				PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+				EMIT(PPC_RAW_CMPLW(dst_reg, src_reg));
+				break;
+			case BPF_JMP32 | BPF_JGT | BPF_X:
+			case BPF_JMP32 | BPF_JLT | BPF_X:
+			case BPF_JMP32 | BPF_JGE | BPF_X:
+			case BPF_JMP32 | BPF_JLE | BPF_X:
+			case BPF_JMP32 | BPF_JEQ | BPF_X:
+			case BPF_JMP32 | BPF_JNE | BPF_X:
+				/* unsigned comparison */
+				EMIT(PPC_RAW_CMPLW(dst_reg, src_reg));
+				break;
+			case BPF_JMP | BPF_JSGT | BPF_X:
+			case BPF_JMP | BPF_JSLT | BPF_X:
+			case BPF_JMP | BPF_JSGE | BPF_X:
+			case BPF_JMP | BPF_JSLE | BPF_X:
+				/* signed comparison */
+				EMIT(PPC_RAW_CMPW(dst_reg_h, src_reg_h));
+				PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+				EMIT(PPC_RAW_CMPLW(dst_reg, src_reg));
+				break;
+			case BPF_JMP32 | BPF_JSGT | BPF_X:
+			case BPF_JMP32 | BPF_JSLT | BPF_X:
+			case BPF_JMP32 | BPF_JSGE | BPF_X:
+			case BPF_JMP32 | BPF_JSLE | BPF_X:
+				/* signed comparison */
+				EMIT(PPC_RAW_CMPW(dst_reg, src_reg));
+				break;
+			case BPF_JMP | BPF_JSET | BPF_X:
+				EMIT(PPC_RAW_AND_DOT(__REG_R0, dst_reg_h, src_reg_h));
+				PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+				EMIT(PPC_RAW_AND_DOT(__REG_R0, dst_reg, src_reg));
+				break;
+			case BPF_JMP32 | BPF_JSET | BPF_X: {
+				EMIT(PPC_RAW_AND_DOT(__REG_R0, dst_reg, src_reg));
+				break;
+			case BPF_JMP | BPF_JNE | BPF_K:
+			case BPF_JMP | BPF_JEQ | BPF_K:
+			case BPF_JMP | BPF_JGT | BPF_K:
+			case BPF_JMP | BPF_JLT | BPF_K:
+			case BPF_JMP | BPF_JGE | BPF_K:
+			case BPF_JMP | BPF_JLE | BPF_K:
+				/*
+				 * Need sign-extended load, so only positive
+				 * values can be used as imm in cmplwi
+				 */
+				if (imm >= 0 && imm < 32768) {
+					EMIT(PPC_RAW_CMPLWI(dst_reg_h, 0));
+					PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+					EMIT(PPC_RAW_CMPLWI(dst_reg, imm));
+				} else {
+					/* sign-extending load ... but unsigned comparison */
+					PPC_EX32(__REG_R0, imm);
+					EMIT(PPC_RAW_CMPLW(dst_reg_h, __REG_R0));
+					PPC_LI32(__REG_R0, imm);
+					PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+					EMIT(PPC_RAW_CMPLW(dst_reg, __REG_R0));
+				}
+				break;
+			case BPF_JMP32 | BPF_JNE | BPF_K:
+			case BPF_JMP32 | BPF_JEQ | BPF_K:
+			case BPF_JMP32 | BPF_JGT | BPF_K:
+			case BPF_JMP32 | BPF_JLT | BPF_K:
+			case BPF_JMP32 | BPF_JGE | BPF_K:
+			case BPF_JMP32 | BPF_JLE | BPF_K:
+				if (imm >= 0 && imm < 65536) {
+					EMIT(PPC_RAW_CMPLWI(dst_reg, imm));
+				} else {
+					PPC_LI32(__REG_R0, imm);
+					EMIT(PPC_RAW_CMPLW(dst_reg, __REG_R0));
+				}
+				break;
+			}
+			case BPF_JMP | BPF_JSGT | BPF_K:
+			case BPF_JMP | BPF_JSLT | BPF_K:
+			case BPF_JMP | BPF_JSGE | BPF_K:
+			case BPF_JMP | BPF_JSLE | BPF_K:
+				if (imm >= 0 && imm < 65536) {
+					EMIT(PPC_RAW_CMPWI(dst_reg_h, imm < 0 ? -1 : 0));
+					PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+					EMIT(PPC_RAW_CMPLWI(dst_reg, imm));
+				} else {
+					/* sign-extending load */
+					EMIT(PPC_RAW_CMPWI(dst_reg_h, imm < 0 ? -1 : 0));
+					PPC_LI32(__REG_R0, imm);
+					PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+					EMIT(PPC_RAW_CMPLW(dst_reg, __REG_R0));
+				}
+				break;
+			case BPF_JMP32 | BPF_JSGT | BPF_K:
+			case BPF_JMP32 | BPF_JSLT | BPF_K:
+			case BPF_JMP32 | BPF_JSGE | BPF_K:
+			case BPF_JMP32 | BPF_JSLE | BPF_K:
+				/*
+				 * signed comparison, so any 16-bit value
+				 * can be used in cmpwi
+				 */
+				if (imm >= -32768 && imm < 32768) {
+					EMIT(PPC_RAW_CMPWI(dst_reg, imm));
+				} else {
+					/* sign-extending load */
+					PPC_LI32(__REG_R0, imm);
+					EMIT(PPC_RAW_CMPW(dst_reg, __REG_R0));
+				}
+				break;
+			case BPF_JMP | BPF_JSET | BPF_K:
+				/* andi does not sign-extend the immediate */
+				if (imm >= 0 && imm < 32768) {
+					/* PPC_ANDI is _only/always_ dot-form */
+					EMIT(PPC_RAW_ANDI(__REG_R0, dst_reg, imm));
+				} else {
+					PPC_LI32(__REG_R0, imm);
+					if (imm < 0) {
+						EMIT(PPC_RAW_CMPWI(dst_reg_h, 0));
+						PPC_BCC_SHORT(COND_NE, (ctx->idx + 2) * 4);
+					}
+					EMIT(PPC_RAW_AND_DOT(__REG_R0, dst_reg, __REG_R0));
+				}
+				break;
+			case BPF_JMP32 | BPF_JSET | BPF_K:
+				/* andi does not sign-extend the immediate */
+				if (imm >= -32768 && imm < 32768) {
+					/* PPC_ANDI is _only/always_ dot-form */
+					EMIT(PPC_RAW_ANDI(__REG_R0, dst_reg, imm));
+				} else {
+					PPC_LI32(__REG_R0, imm);
+					EMIT(PPC_RAW_AND_DOT(__REG_R0, dst_reg, __REG_R0));
+				}
+				break;
+			}
+			PPC_BCC(true_cond, addrs[i + 1 + off]);
+			break;
+
+		/*
+		 * Tail call
+		 */
+		case BPF_JMP | BPF_TAIL_CALL:
+			ctx->seen |= SEEN_TAILCALL;
+			bpf_jit_emit_tail_call(image, ctx, addrs[i + 1]);
+			break;
+
+		default:
+			/*
+			 * The filter contains something cruel & unusual.
+			 * We don't handle it, but also there shouldn't be
+			 * anything missing from our list.
+			 */
+			pr_err_ratelimited("eBPF filter opcode %04x (@%d) unsupported\n", code, i);
+			return -EOPNOTSUPP;
+		}
+		if (BPF_CLASS(code) == BPF_ALU && !fp->aux->verifier_zext &&
+		    !insn_is_zext(&insn[i + 1]))
+			EMIT(PPC_RAW_LI(dst_reg_h, 0));
+	}
+
+	/* Set end-of-body-code address for exit. */
+	addrs[i] = ctx->idx * 4;
+
+	return 0;
+}
-- 
2.25.0




^ permalink raw reply related

* Re: [PATCH v3 04/19] powerpc/perf: move perf irq/nmi handling details into traps.c
From: Athira Rajeev @ 2020-12-16  7:17 UTC (permalink / raw)
  To: Nicholas Piggin; +Cc: linuxppc-dev
In-Reply-To: <20201128144114.944000-5-npiggin@gmail.com>

[-- Attachment #1: Type: text/html, Size: 12253 bytes --]

^ permalink raw reply

* [PATCH v3 3/4] KVM: PPC: Add infrastructure to support 2nd DAWR
From: Ravi Bangoria @ 2020-12-16 10:42 UTC (permalink / raw)
  To: mpe, paulus
  Cc: christophe.leroy, ravi.bangoria, mikey, kvm, leobras.c, jniethe5,
	linux-kernel, npiggin, kvm-ppc, pbonzini, linuxppc-dev
In-Reply-To: <20201216104219.458713-1-ravi.bangoria@linux.ibm.com>

kvm code assumes single DAWR everywhere. Add code to support 2nd DAWR.
DAWR is a hypervisor resource and thus H_SET_MODE hcall is used to set/
unset it. Introduce new case H_SET_MODE_RESOURCE_SET_DAWR1 for 2nd DAWR.
Also, kvm will support 2nd DAWR only if CPU_FTR_DAWR1 is set.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst            |  2 ++
 arch/powerpc/include/asm/hvcall.h         |  8 ++++-
 arch/powerpc/include/asm/kvm_host.h       |  3 ++
 arch/powerpc/include/uapi/asm/kvm.h       |  2 ++
 arch/powerpc/kernel/asm-offsets.c         |  2 ++
 arch/powerpc/kvm/book3s_hv.c              | 43 +++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_nested.c       |  7 ++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 23 ++++++++++++
 tools/arch/powerpc/include/uapi/asm/kvm.h |  2 ++
 9 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 36d5f1f3c6dd..abb24575bdf9 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -2249,6 +2249,8 @@ registers, find a list below:
   PPC     KVM_REG_PPC_PSSCR               64
   PPC     KVM_REG_PPC_DEC_EXPIRY          64
   PPC     KVM_REG_PPC_PTCR                64
+  PPC     KVM_REG_PPC_DAWR1               64
+  PPC     KVM_REG_PPC_DAWRX1              64
   PPC     KVM_REG_PPC_TM_GPR0             64
   ...
   PPC     KVM_REG_PPC_TM_GPR31            64
diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index ca6840239f90..98afa58b619a 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -560,16 +560,22 @@ struct hv_guest_state {
 	u64 pidr;
 	u64 cfar;
 	u64 ppr;
+	/* Version 1 ends here */
+	u64 dawr1;
+	u64 dawrx1;
+	/* Version 2 ends here */
 };
 
 /* Latest version of hv_guest_state structure */
-#define HV_GUEST_STATE_VERSION	1
+#define HV_GUEST_STATE_VERSION	2
 
 static inline int hv_guest_state_size(unsigned int version)
 {
 	switch (version) {
 	case 1:
 		return offsetofend(struct hv_guest_state, ppr);
+	case 2:
+		return offsetofend(struct hv_guest_state, dawrx1);
 	default:
 		return -1;
 	}
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 62cadf1a596e..a93cfb672421 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -307,6 +307,7 @@ struct kvm_arch {
 	u8 svm_enabled;
 	bool threads_indep;
 	bool nested_enable;
+	bool dawr1_enabled;
 	pgd_t *pgtable;
 	u64 process_table;
 	struct dentry *debugfs_dir;
@@ -586,6 +587,8 @@ struct kvm_vcpu_arch {
 	ulong dabr;
 	ulong dawr0;
 	ulong dawrx0;
+	ulong dawr1;
+	ulong dawrx1;
 	ulong ciabr;
 	ulong cfar;
 	ulong ppr;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index c3af3f324c5a..9f18fa090f1f 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -644,6 +644,8 @@ struct kvm_ppc_cpu_char {
 #define KVM_REG_PPC_MMCR3	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1)
 #define KVM_REG_PPC_SIER2	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2)
 #define KVM_REG_PPC_SIER3	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3)
+#define KVM_REG_PPC_DAWR1	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4)
+#define KVM_REG_PPC_DAWRX1	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 5a77aac516ba..a35ea4e19360 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -550,6 +550,8 @@ int main(void)
 	OFFSET(VCPU_DABRX, kvm_vcpu, arch.dabrx);
 	OFFSET(VCPU_DAWR0, kvm_vcpu, arch.dawr0);
 	OFFSET(VCPU_DAWRX0, kvm_vcpu, arch.dawrx0);
+	OFFSET(VCPU_DAWR1, kvm_vcpu, arch.dawr1);
+	OFFSET(VCPU_DAWRX1, kvm_vcpu, arch.dawrx1);
 	OFFSET(VCPU_CIABR, kvm_vcpu, arch.ciabr);
 	OFFSET(VCPU_HFLAGS, kvm_vcpu, arch.hflags);
 	OFFSET(VCPU_DEC, kvm_vcpu, arch.dec);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bcbad8daa974..b7a30c0692a7 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -785,6 +785,22 @@ static int kvmppc_h_set_mode(struct kvm_vcpu *vcpu, unsigned long mflags,
 		vcpu->arch.dawr0  = value1;
 		vcpu->arch.dawrx0 = value2;
 		return H_SUCCESS;
+	case H_SET_MODE_RESOURCE_SET_DAWR1:
+		if (!kvmppc_power8_compatible(vcpu))
+			return H_P2;
+		if (!ppc_breakpoint_available())
+			return H_P2;
+		if (!cpu_has_feature(CPU_FTR_DAWR1))
+			return H_P2;
+		if (!vcpu->kvm->arch.dawr1_enabled)
+			return H_FUNCTION;
+		if (mflags)
+			return H_UNSUPPORTED_FLAG_START;
+		if (value2 & DABRX_HYP)
+			return H_P4;
+		vcpu->arch.dawr1  = value1;
+		vcpu->arch.dawrx1 = value2;
+		return H_SUCCESS;
 	case H_SET_MODE_RESOURCE_ADDR_TRANS_MODE:
 		/* KVM does not support mflags=2 (AIL=2) */
 		if (mflags != 0 && mflags != 3)
@@ -1752,6 +1768,12 @@ static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_DAWRX:
 		*val = get_reg_val(id, vcpu->arch.dawrx0);
 		break;
+	case KVM_REG_PPC_DAWR1:
+		*val = get_reg_val(id, vcpu->arch.dawr1);
+		break;
+	case KVM_REG_PPC_DAWRX1:
+		*val = get_reg_val(id, vcpu->arch.dawrx1);
+		break;
 	case KVM_REG_PPC_CIABR:
 		*val = get_reg_val(id, vcpu->arch.ciabr);
 		break;
@@ -1984,6 +2006,12 @@ static int kvmppc_set_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
 	case KVM_REG_PPC_DAWRX:
 		vcpu->arch.dawrx0 = set_reg_val(id, *val) & ~DAWRX_HYP;
 		break;
+	case KVM_REG_PPC_DAWR1:
+		vcpu->arch.dawr1 = set_reg_val(id, *val);
+		break;
+	case KVM_REG_PPC_DAWRX1:
+		vcpu->arch.dawrx1 = set_reg_val(id, *val) & ~DAWRX_HYP;
+		break;
 	case KVM_REG_PPC_CIABR:
 		vcpu->arch.ciabr = set_reg_val(id, *val);
 		/* Don't allow setting breakpoints in hypervisor code */
@@ -3441,6 +3469,13 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	unsigned long host_dawrx0 = mfspr(SPRN_DAWRX0);
 	unsigned long host_psscr = mfspr(SPRN_PSSCR);
 	unsigned long host_pidr = mfspr(SPRN_PID);
+	unsigned long host_dawr1 = 0;
+	unsigned long host_dawrx1 = 0;
+
+	if (cpu_has_feature(CPU_FTR_DAWR1)) {
+		host_dawr1 = mfspr(SPRN_DAWR1);
+		host_dawrx1 = mfspr(SPRN_DAWRX1);
+	}
 
 	/*
 	 * P8 and P9 suppress the HDEC exception when LPCR[HDICE] = 0,
@@ -3479,6 +3514,10 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	if (dawr_enabled()) {
 		mtspr(SPRN_DAWR0, vcpu->arch.dawr0);
 		mtspr(SPRN_DAWRX0, vcpu->arch.dawrx0);
+		if (cpu_has_feature(CPU_FTR_DAWR1)) {
+			mtspr(SPRN_DAWR1, vcpu->arch.dawr1);
+			mtspr(SPRN_DAWRX1, vcpu->arch.dawrx1);
+		}
 	}
 	mtspr(SPRN_CIABR, vcpu->arch.ciabr);
 	mtspr(SPRN_IC, vcpu->arch.ic);
@@ -3532,6 +3571,10 @@ static int kvmhv_load_hv_regs_and_go(struct kvm_vcpu *vcpu, u64 time_limit,
 	mtspr(SPRN_CIABR, host_ciabr);
 	mtspr(SPRN_DAWR0, host_dawr0);
 	mtspr(SPRN_DAWRX0, host_dawrx0);
+	if (cpu_has_feature(CPU_FTR_DAWR1)) {
+		mtspr(SPRN_DAWR1, host_dawr1);
+		mtspr(SPRN_DAWRX1, host_dawrx1);
+	}
 	mtspr(SPRN_PID, host_pidr);
 
 	/*
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 8c608f4d912c..907e68d20486 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -49,6 +49,8 @@ void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
 	hr->pidr = vcpu->arch.pid;
 	hr->cfar = vcpu->arch.cfar;
 	hr->ppr = vcpu->arch.ppr;
+	hr->dawr1 = vcpu->arch.dawr1;
+	hr->dawrx1 = vcpu->arch.dawrx1;
 }
 
 static void byteswap_pt_regs(struct pt_regs *regs)
@@ -91,6 +93,8 @@ static void byteswap_hv_regs(struct hv_guest_state *hr)
 	hr->pidr = swab64(hr->pidr);
 	hr->cfar = swab64(hr->cfar);
 	hr->ppr = swab64(hr->ppr);
+	hr->dawr1 = swab64(hr->dawr1);
+	hr->dawrx1 = swab64(hr->dawrx1);
 }
 
 static void save_hv_return_state(struct kvm_vcpu *vcpu, int trap,
@@ -138,6 +142,7 @@ static void sanitise_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
 
 	/* Don't let data address watchpoint match in hypervisor state */
 	hr->dawrx0 &= ~DAWRX_HYP;
+	hr->dawrx1 &= ~DAWRX_HYP;
 
 	/* Don't let completed instruction address breakpt match in HV state */
 	if ((hr->ciabr & CIABR_PRIV) == CIABR_PRIV_HYPER)
@@ -167,6 +172,8 @@ static void restore_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr)
 	vcpu->arch.pid = hr->pidr;
 	vcpu->arch.cfar = hr->cfar;
 	vcpu->arch.ppr = hr->ppr;
+	vcpu->arch.dawr1 = hr->dawr1;
+	vcpu->arch.dawrx1 = hr->dawrx1;
 }
 
 void kvmhv_restore_hv_return_state(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 75804062f2c5..65161c062dc5 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -57,6 +57,8 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 #define STACK_SLOT_HFSCR	(SFS-72)
 #define STACK_SLOT_AMR		(SFS-80)
 #define STACK_SLOT_UAMOR	(SFS-88)
+#define STACK_SLOT_DAWR1	(SFS-96)
+#define STACK_SLOT_DAWRX1	(SFS-104)
 /* the following is used by the P9 short path */
 #define STACK_SLOT_NVGPRS	(SFS-152)	/* 18 gprs */
 
@@ -715,6 +717,12 @@ BEGIN_FTR_SECTION
 	std	r7, STACK_SLOT_DAWRX0(r1)
 	std	r8, STACK_SLOT_IAMR(r1)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+BEGIN_FTR_SECTION
+	mfspr	r6, SPRN_DAWR1
+	mfspr	r7, SPRN_DAWRX1
+	std	r6, STACK_SLOT_DAWR1(r1)
+	std	r7, STACK_SLOT_DAWRX1(r1)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S | CPU_FTR_DAWR1)
 
 	mfspr	r5, SPRN_AMR
 	std	r5, STACK_SLOT_AMR(r1)
@@ -805,6 +813,12 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	ld	r6, VCPU_DAWRX0(r4)
 	mtspr	SPRN_DAWR0, r5
 	mtspr	SPRN_DAWRX0, r6
+BEGIN_FTR_SECTION
+	ld	r5, VCPU_DAWR1(r4)
+	ld	r6, VCPU_DAWRX1(r4)
+	mtspr	SPRN_DAWR1, r5
+	mtspr	SPRN_DAWRX1, r6
+END_FTR_SECTION_IFSET(CPU_FTR_DAWR1)
 1:
 	ld	r7, VCPU_CIABR(r4)
 	ld	r8, VCPU_TAR(r4)
@@ -1769,6 +1783,12 @@ BEGIN_FTR_SECTION
 	mtspr	SPRN_DAWR0, r6
 	mtspr	SPRN_DAWRX0, r7
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
+BEGIN_FTR_SECTION
+	ld	r6, STACK_SLOT_DAWR1(r1)
+	ld	r7, STACK_SLOT_DAWRX1(r1)
+	mtspr	SPRN_DAWR1, r6
+	mtspr	SPRN_DAWRX1, r7
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S | CPU_FTR_DAWR1)
 BEGIN_FTR_SECTION
 	ld	r5, STACK_SLOT_TID(r1)
 	ld	r6, STACK_SLOT_PSSCR(r1)
@@ -3343,6 +3363,9 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
 	mtspr	SPRN_IAMR, r0
 	mtspr	SPRN_CIABR, r0
 	mtspr	SPRN_DAWRX0, r0
+BEGIN_FTR_SECTION
+	mtspr	SPRN_DAWRX1, r0
+END_FTR_SECTION_IFSET(CPU_FTR_DAWR1)
 
 BEGIN_MMU_FTR_SECTION
 	b	4f
diff --git a/tools/arch/powerpc/include/uapi/asm/kvm.h b/tools/arch/powerpc/include/uapi/asm/kvm.h
index c3af3f324c5a..9f18fa090f1f 100644
--- a/tools/arch/powerpc/include/uapi/asm/kvm.h
+++ b/tools/arch/powerpc/include/uapi/asm/kvm.h
@@ -644,6 +644,8 @@ struct kvm_ppc_cpu_char {
 #define KVM_REG_PPC_MMCR3	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc1)
 #define KVM_REG_PPC_SIER2	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc2)
 #define KVM_REG_PPC_SIER3	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc3)
+#define KVM_REG_PPC_DAWR1	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc4)
+#define KVM_REG_PPC_DAWRX1	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xc5)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
-- 
2.26.2


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox