Issues with kexec on arm64

public inbox for linux-arm-kernel@lists.infradead.org

* Issues with kexec on arm64
@ 2024-12-24 10:37 Itai Handler
  0 siblings, 0 replies; 7+ messages in thread
From: Itai Handler @ 2024-12-24 10:37 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel

Hello,

I'm encountering kernel panics / system hangs when attempting to
kexec a vmlinux file on arm64 architecture.

It happens both on qemu and on real hardware.

These issues occur on all kernels from v4.19 to the latest mainline.
A sample panic output looks as follows:
  kernel BUG at arch/arm64/mm/mmu.c:217!
  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
  CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
  Hardware name: linux,dummy-virt (DT)
  pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : __create_pgd_mapping+0xe8/0x3b0
  lr : __create_pgd_mapping+0x44/0x3b0
  sp : fffffe00804d3c20
  x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
  x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
  x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
  x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
  x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
  x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
  x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
  x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
  Call trace:
   __create_pgd_mapping+0xe8/0x3b0
   map_kernel_segment+0x74/0xb0
   paging_init+0xec/0x4f8
   setup_arch+0x234/0x52c
   start_kernel+0x64/0x500
   __primary_switched+0xb4/0xbc
  Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Oops - BUG: Fatal exception

I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
("arm64/mm: move runtime pgds to rodata"), which was added on v4.19.

I also reconstructed the full call trace (by adding "noinline" to the
relevant functions):
  alloc_init_cont_pte+0x6c/0x1e0
  init_pmd+0x154/0x1c8
  alloc_init_cont_pmd+0x11c/0x174
  alloc_init_pud+0xc4/0x148
  __create_pgd_mapping+0xa8/0x130
  map_kernel_segment+0xc8/0x168
  map_kernel+0x98/0x1a8
  paging_init+0x7c/0x418
  setup_arch+0x224/0x570
  start_kernel+0x5c/0x4f0

My understanding is that the panic occurs inside alloc_init_cont_pte,
at the BUG_ON(pmd_bad(..)) line.

To run the qemu VM I use the following script:
  APPEND="earlycon console=ttyAMA0 loglevel=8"
  qemu-system-aarch64 \
    -M virt \
    -cpu cortex-a53 \
    -smp 4 \
    -m 4096 \
    -kernel ~/vmshare/Image \
    -initrd ~/vmshare/rootfs.cpio \
    -nographic \
    -append "${APPEND}" \
    -fsdev local,id=vmshare,path=$HOME/vmshare,security_model=mapped,multidevs=remap \
    -device virtio-9p-pci,fsdev=vmshare,mount_tag=vmshare

I built the root filesystem using buildroot 2024.08.2, using the following
defconfig:
  BR2_aarch64=y
  BR2_ARM64_PAGE_SIZE_64K=y
  BR2_KERNEL_HEADERS_4_19=y
  BR2_PACKAGE_HOST_GDB=y
  BR2_GDB_VERSION_15=y
  BR2_PACKAGE_KEXEC=y
  BR2_PACKAGE_KEXEC_ZLIB=y
  BR2_TARGET_ROOTFS_CPIO=y
  BR2_PACKAGE_HOST_KMOD=y

To kexec the file I use the following command:
  kexec -d -c -l /media/vmshare/vmlinux \
    --initrd=/media/vmshare/rootfs.cpio \
    --reuse-cmdline \
    && kexec -d -e

Thanks,
Itai Handler



* Issues with kexec on arm64
@ 2024-12-24 11:36 Itai Handler
  2025-01-03 16:16 ` Will Deacon
  2025-01-06 14:02 ` Mark Rutland
  0 siblings, 2 replies; 7+ messages in thread
From: Itai Handler @ 2024-12-24 11:36 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel

[Sorry about my previous e-mail on this subject. It got corrupted.
Please ignore it.]

Hello,

I'm encountering kernel panics / system hangs when attempting to
kexec a vmlinux file on arm64 architecture.

It happens both on qemu and on real hardware.

These issues occur on all kernels from v4.19 to the latest mainline.
A sample panic output looks as follows:
  kernel BUG at arch/arm64/mm/mmu.c:217!
  Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
  CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
  Hardware name: linux,dummy-virt (DT)
  pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : __create_pgd_mapping+0xe8/0x3b0
  lr : __create_pgd_mapping+0x44/0x3b0
  sp : fffffe00804d3c20
  x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
  x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
  x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
  x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
  x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
  x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
  x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
  x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
  x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
  x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
  Call trace:
   __create_pgd_mapping+0xe8/0x3b0
   map_kernel_segment+0x74/0xb0
   paging_init+0xec/0x4f8
   setup_arch+0x234/0x52c
   start_kernel+0x64/0x500
   __primary_switched+0xb4/0xbc
  Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Oops - BUG: Fatal exception

I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
("arm64/mm: move runtime pgds to rodata"), which was added in v4.19.

I also reconstructed the full call trace (by adding "noinline" to the
relevant functions):
  alloc_init_cont_pte+0x6c/0x1e0
  init_pmd+0x154/0x1c8
  alloc_init_cont_pmd+0x11c/0x174
  alloc_init_pud+0xc4/0x148
  __create_pgd_mapping+0xa8/0x130
  map_kernel_segment+0xc8/0x168
  map_kernel+0x98/0x1a8
  paging_init+0x7c/0x418
  setup_arch+0x224/0x570
  start_kernel+0x5c/0x4f0

My understanding is that the panic occurs inside alloc_init_cont_pte,
at the BUG_ON(pmd_bad(..)) line.

kexec-tools version: 2.0.29
qemu version: 8.1.94 (v8.2.0-rc4)

The .config files are created using the following script:
  make tinyconfig
  # --- Options to enable ---
  scripts/config --enable CONFIG_EXPERT
  scripts/config --enable CONFIG_TTY
  scripts/config --enable CONFIG_PRINTK
  scripts/config --enable CONFIG_BUG
  scripts/config --enable CONFIG_STACKTRACE
  scripts/config --enable CONFIG_BINFMT_ELF
  scripts/config --enable CONFIG_BINFMT_SCRIPT
  scripts/config --enable CONFIG_PROC_FS
  scripts/config --enable CONFIG_BLOCK
  scripts/config --enable CONFIG_BLK_DEV
  scripts/config --enable CONFIG_BLK_DEV_NULL_BLK
  scripts/config --enable CONFIG_BLK_DEV_INITRD
  scripts/config --enable CONFIG_PANIC_ON_OOPS
  scripts/config --enable CONFIG_DEVTMPFS
  scripts/config --enable CONFIG_DEVTMPFS_MOUNT
  scripts/config --enable CONFIG_NET
  scripts/config --enable CONFIG_PCI
  scripts/config --enable CONFIG_PCI_HOST_GENERIC
  scripts/config --enable CONFIG_VIRTIO_MENU
  scripts/config --enable CONFIG_VIRTIO_BLK
  scripts/config --enable CONFIG_VIRTIO_PCI
  scripts/config --enable CONFIG_NET_9P
  scripts/config --enable CONFIG_NET_9P_VIRTIO
  scripts/config --enable CONFIG_9P_FS
  scripts/config --enable CONFIG_CONFIGFS_FS
  scripts/config --enable CONFIG_SUSPEND
  scripts/config --enable CONFIG_PROC_KCORE
  scripts/config --enable CONFIG_KEXEC
  scripts/config --enable CONFIG_SERIAL_AMBA_PL011
  scripts/config --enable CONFIG_SERIAL_AMBA_PL011_CONSOLE
  scripts/config --enable CONFIG_POSIX_TIMERS
  scripts/config --enable CONFIG_KALLSYMS
  scripts/config --enable CONFIG_ARM64_64K_PAGES
  # --- Options to disable ---
  scripts/config --disable CONFIG_IPV6
  scripts/config --disable CONFIG_WIRELESS
  scripts/config --disable CONFIG_SWAP
  make olddefconfig

To run the qemu VM I use the following script:
  APPEND="earlycon console=ttyAMA0 loglevel=8"
  qemu-system-aarch64 \
    -M virt \
    -cpu cortex-a53 \
    -smp 4 \
    -m 4096 \
    -kernel ~/vmshare/Image \
    -initrd ~/vmshare/rootfs.cpio \
    -nographic \
    -append "${APPEND}" \
    -fsdev local,id=vmshare,path=$HOME/vmshare,security_model=mapped,multidevs=remap \
    -device virtio-9p-pci,fsdev=vmshare,mount_tag=vmshare

I built the root filesystem using buildroot 2024.08.2, using the following
defconfig:
  BR2_aarch64=y
  BR2_ARM64_PAGE_SIZE_64K=y
  BR2_KERNEL_HEADERS_4_19=y
  BR2_PACKAGE_HOST_GDB=y
  BR2_GDB_VERSION_15=y
  BR2_PACKAGE_KEXEC=y
  BR2_PACKAGE_KEXEC_ZLIB=y
  BR2_TARGET_ROOTFS_CPIO=y
  BR2_PACKAGE_HOST_KMOD=y

To kexec the file I use the following command:
  kexec -d -c -l /media/vmshare/vmlinux \
    --initrd=/media/vmshare/rootfs.cpio \
    --reuse-cmdline \
    && kexec -d -e

Thanks,
Itai Handler



* Re: Issues with kexec on arm64
  2024-12-24 11:36 Issues with kexec on arm64 Itai Handler
@ 2025-01-03 16:16 ` Will Deacon
  2025-01-05 14:46   ` Itai Handler
  2025-01-06 14:02 ` Mark Rutland
  1 sibling, 1 reply; 7+ messages in thread
From: Will Deacon @ 2025-01-03 16:16 UTC (permalink / raw)
  To: Itai Handler
  Cc: linux-kernel, linux-arm-kernel, mark.rutland, ardb, usamaarif642

On Tue, Dec 24, 2024 at 01:36:41PM +0200, Itai Handler wrote:
> [Sorry about my previous e-mail on this subject. It got corrupted.
> Please ignore it.]
> 
> Hello,
> 
> I'm encountering kernel panics / system hangs when attempting to
> kexec a vmlinux file on arm64 architecture.
> 
> It happens both on qemu and on real hardware.
> 
> These issues occur on all kernels from v4.19 to the latest mainline.

I think other folks have been using kexec on arm64, so something smells
fishy here. Is the issue intermittent?

> A sample panic output looks as follows:
>   kernel BUG at arch/arm64/mm/mmu.c:217!
>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>   CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
>   Hardware name: linux,dummy-virt (DT)
>   pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : __create_pgd_mapping+0xe8/0x3b0
>   lr : __create_pgd_mapping+0x44/0x3b0
>   sp : fffffe00804d3c20
>   x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
>   x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
>   x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
>   x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
>   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>   x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
>   x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
>   x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
>   x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
>   x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
>   Call trace:
>    __create_pgd_mapping+0xe8/0x3b0
>    map_kernel_segment+0x74/0xb0
>    paging_init+0xec/0x4f8
>    setup_arch+0x234/0x52c
>    start_kernel+0x64/0x500
>    __primary_switched+0xb4/0xbc
>   Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
>   ---[ end trace 0000000000000000 ]---
>   Kernel panic - not syncing: Oops - BUG: Fatal exception

So this explodes because we find a page-table entry at the pmd level
that we don't like the look of:

  - It's not a block entry
  - It's not all zeroes
  - It's also not a table

Sadly, the actual value is clobbered by the time we take the BUG():

   0:	f9400300	ldr	x0, [x24]
   4:	92400400	and	x0, x0, #0x3
   8:	f1000c1f	cmp	x0, #0x3
   c:	54000060	b.eq	0x18  // b.none
  10:*	d4210000	brk	#0x800		<-- trapping instruction

Maybe dumping 'pmd_val(pmd)' before we crash would be instructive? Maybe
it's a pointer...

> I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
> ("arm64/mm: move runtime pgds to rodata"), which was added on v4.19.

Hmm. I wonder if the rodata section isn't being loaded properly? Can you
add some traces to check that, please?

Will



* Re: Issues with kexec on arm64
  2025-01-03 16:16 ` Will Deacon
@ 2025-01-05 14:46   ` Itai Handler
  2025-02-11 18:36     ` Will Deacon
  0 siblings, 1 reply; 7+ messages in thread
From: Itai Handler @ 2025-01-05 14:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-kernel, linux-arm-kernel, mark.rutland, ardb, usamaarif642

On Fri, Jan 3, 2025 at 6:16 PM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Dec 24, 2024 at 01:36:41PM +0200, Itai Handler wrote:
> > [Sorry about my previous e-mail on this subject. It got corrupted.
> > Please ignore it.]
> >
> > Hello,
> >
> > I'm encountering kernel panics / system hangs when attempting to
> > kexec a vmlinux file on arm64 architecture.
> >
> > It happens both on qemu and on real hardware.
> >
> > These issues occur on all kernels from v4.19 to the latest mainline.
>
> I think other folks have been using kexec on arm64, so something smells
> fishy here. Is the issue intermittent?

No, it isn't intermittent. It's very easy to reproduce the panics/hangs.
At most we need to perform two recursive kexec attempts of the vmlinux file.
In v6.6, using the configuration I supplied (config.sh), a single kexec
attempt is sufficient to demonstrate the issue. In that case a panic occurs
on the first kexec attempt. In newer versions I mostly see hangs but sometimes
panics as well.

Please note that the configuration I supplied sets CONFIG_ARM64_64K_PAGES=y.
I also saw issues with 4K pages, but only when enabling some debug options
(KASAN, SCHED_DEBUG, KCSAN).

Also please note that kexec with the Image file (instead of the vmlinux
file) seems to work properly, without any issue.

>
> > A sample panic output looks as follows:
> >   kernel BUG at arch/arm64/mm/mmu.c:217!
> >   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> >   CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
> >   Hardware name: linux,dummy-virt (DT)
> >   pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >   pc : __create_pgd_mapping+0xe8/0x3b0
> >   lr : __create_pgd_mapping+0x44/0x3b0
> >   sp : fffffe00804d3c20
> >   x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
> >   x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
> >   x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
> >   x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
> >   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> >   x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
> >   x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
> >   x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
> >   x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
> >   x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
> >   Call trace:
> >    __create_pgd_mapping+0xe8/0x3b0
> >    map_kernel_segment+0x74/0xb0
> >    paging_init+0xec/0x4f8
> >    setup_arch+0x234/0x52c
> >    start_kernel+0x64/0x500
> >    __primary_switched+0xb4/0xbc
> >   Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
> >   ---[ end trace 0000000000000000 ]---
> >   Kernel panic - not syncing: Oops - BUG: Fatal exception
>
> So this explodes because we find a page-table entry at the pmd level
> that we don't like the look of:
>
>   - It's not a block entry
>   - It's not all zeroes
>   - It's also not a table
>
> Sadly, the actual value is clobbered by the time we take the BUG():
>
>    0:   f9400300        ldr     x0, [x24]
>    4:   92400400        and     x0, x0, #0x3
>    8:   f1000c1f        cmp     x0, #0x3
>    c:   54000060        b.eq    0x18  // b.none
>   10:*  d4210000        brk     #0x800          <-- trapping instruction
>
> Maybe dumping 'pmd_val(pmd)' before we crash would be instructive? Maybe
> it's a pointer...

I dumped the bad pmd (on v6.6).
It's always the same value: 128000017901ca60.
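
For reference, a quick way to decode the low bits of that value. This is only
a sketch in shell arithmetic, not kernel code: pmd_type is a made-up helper
name, and it assumes the usual arm64 descriptor encoding at this level
(bits [1:0] == 0b11 for a table, 0b01 for a block, bit 0 clear for invalid).

```shell
#!/bin/sh
# Classify a 64-bit page-table descriptor by its two low bits.
# Assumption: arm64 encoding where 0b11 = table, 0b01 = block,
# and bit 0 clear = invalid. pmd_type is a hypothetical helper.
pmd_type() {
    val=$(( $1 ))
    if [ "$val" -eq 0 ]; then
        echo none
        return
    fi
    case $(( val & 3 )) in
        3) echo table ;;
        1) echo block ;;
        *) echo invalid ;;
    esac
}

# The value dumped on v6.6 in this thread.
pmd_type 0x128000017901ca60
```

With the value above the low two bits come out as 0b00, so the entry is
non-zero but neither a table nor a block, which matches the three conditions
Will listed.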

>
> > I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
> > ("arm64/mm: move runtime pgds to rodata"), which was added on v4.19.
>
> Hmm. I wonder if the rodata section isn't being loaded properly? Can you
> add some traces to check that, please?

Could you advise which traces are needed and how to add them?

Thanks,
Itai Handler



* Re: Issues with kexec on arm64
  2024-12-24 11:36 Issues with kexec on arm64 Itai Handler
  2025-01-03 16:16 ` Will Deacon
@ 2025-01-06 14:02 ` Mark Rutland
  2025-01-07  9:46   ` Itai Handler
  1 sibling, 1 reply; 7+ messages in thread
From: Mark Rutland @ 2025-01-06 14:02 UTC (permalink / raw)
  To: Itai Handler; +Cc: linux-kernel, linux-arm-kernel

On Tue, Dec 24, 2024 at 01:36:41PM +0200, Itai Handler wrote:
> [Sorry about my previous e-mail on this subject. It got corrupted.
> Please ignore it.]
> 
> Hello,

Hi,

> 
> I'm encountering kernel panics / system hangs when attempting to
> kexec a vmlinux file on arm64 architecture.
> 
> It happens both on qemu and on real hardware.
>
> These issues occur on all kernels from v4.19 to the latest mainline.
> A sample panic output looks as follows:
>   kernel BUG at arch/arm64/mm/mmu.c:217!
>   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>   CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
>   Hardware name: linux,dummy-virt (DT)
>   pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>   pc : __create_pgd_mapping+0xe8/0x3b0
>   lr : __create_pgd_mapping+0x44/0x3b0
>   sp : fffffe00804d3c20
>   x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
>   x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
>   x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
>   x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
>   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
>   x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
>   x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
>   x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
>   x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
>   x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
>   Call trace:
>    __create_pgd_mapping+0xe8/0x3b0
>    map_kernel_segment+0x74/0xb0
>    paging_init+0xec/0x4f8
>    setup_arch+0x234/0x52c
>    start_kernel+0x64/0x500
>    __primary_switched+0xb4/0xbc
>   Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
>   ---[ end trace 0000000000000000 ]---
>   Kernel panic - not syncing: Oops - BUG: Fatal exception
> 
> I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
> ("arm64/mm: move runtime pgds to rodata"), which was added on v4.19.
> 
> I also reconstructed the full call trace (by adding "noinline" to the
> relevant functions):
>   alloc_init_cont_pte+0x6c/0x1e0
>   init_pmd+0x154/0x1c8
>   alloc_init_cont_pmd+0x11c/0x174
>   alloc_init_pud+0xc4/0x148
>   __create_pgd_mapping+0xa8/0x130
>   map_kernel_segment+0xc8/0x168
>   map_kernel+0x98/0x1a8
>   paging_init+0x7c/0x418
>   setup_arch+0x224/0x570
>   start_kernel+0x5c/0x4f0
> 

Does your system have GICv3 and an ITS? If so, and assuming you're not
using EFI to boot in the first place, what *might* be happening here is
that the GIC is still using property/pending tables allocated by the
first kernel, and after that memory gets reallocated, the GIC writes
back and corrupts that memory. That would be very sensitive to memory
layout, which could explain why the bisect leads to something that
changes that.

We have a solution for that with EFI (where we can use a configuration
table to indicate that the memory is in use), but we don't currently
have a solution in the absence of EFI, and we should probably forbid
kexec in that case...
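
One rough way to see which GIC the running kernel drives is sketched below.
It assumes the irqchip column of /proc/interrupts carries a name such as
"GICv2" or "GICv3"; gic_versions is a hypothetical helper, and the sketch
does not try to detect an ITS.

```shell
#!/bin/sh
# Print the distinct GIC versions named in /proc/interrupts-style input
# read from stdin.
# Assumption: the irqchip column contains "GICv2", "GICv3", etc.
gic_versions() {
    grep -o 'GICv[0-9]' | sort -u
}

# Only query the live system when the file is readable.
if [ -r /proc/interrupts ]; then
    gic_versions < /proc/interrupts || true
fi
```

If nothing is printed, the interrupt controller is either not a GIC or is
named differently on that kernel, and the device tree is the next place to
look.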

Mark.

> My understanding is that the panic occurs inside alloc_init_cont_pte,
> at the BUG_ON(pmd_bad(..)) line.
> 
> kexec-tools version: 2.0.29
> qemu version: 8.1.94 (v8.2.0-rc4)
> 
> The .config files are created using the following script:
>   make tinyconfig
>   # --- Options to enable ---
>   scripts/config --enable CONFIG_EXPERT
>   scripts/config --enable CONFIG_TTY
>   scripts/config --enable CONFIG_PRINTK
>   scripts/config --enable CONFIG_BUG
>   scripts/config --enable CONFIG_STACKTRACE
>   scripts/config --enable CONFIG_BINFMT_ELF
>   scripts/config --enable CONFIG_BINFMT_SCRIPT
>   scripts/config --enable CONFIG_PROC_FS
>   scripts/config --enable CONFIG_BLOCK
>   scripts/config --enable CONFIG_BLK_DEV
>   scripts/config --enable CONFIG_BLK_DEV_NULL_BLK
>   scripts/config --enable CONFIG_BLK_DEV_INITRD
>   scripts/config --enable CONFIG_PANIC_ON_OOPS
>   scripts/config --enable CONFIG_DEVTMPFS
>   scripts/config --enable CONFIG_DEVTMPFS_MOUNT
>   scripts/config --enable CONFIG_NET
>   scripts/config --enable CONFIG_PCI
>   scripts/config --enable CONFIG_PCI_HOST_GENERIC
>   scripts/config --enable CONFIG_VIRTIO_MENU
>   scripts/config --enable CONFIG_VIRTIO_BLK
>   scripts/config --enable CONFIG_VIRTIO_PCI
>   scripts/config --enable CONFIG_NET_9P
>   scripts/config --enable CONFIG_NET_9P_VIRTIO
>   scripts/config --enable CONFIG_9P_FS
>   scripts/config --enable CONFIG_CONFIGFS_FS
>   scripts/config --enable CONFIG_SUSPEND
>   scripts/config --enable CONFIG_PROC_KCORE
>   scripts/config --enable CONFIG_KEXEC
>   scripts/config --enable CONFIG_SERIAL_AMBA_PL011
>   scripts/config --enable CONFIG_SERIAL_AMBA_PL011_CONSOLE
>   scripts/config --enable CONFIG_POSIX_TIMERS
>   scripts/config --enable CONFIG_KALLSYMS
>   scripts/config --enable CONFIG_ARM64_64K_PAGES
>   # --- Options to disable ---
>   scripts/config --disable CONFIG_IPV6
>   scripts/config --disable CONFIG_WIRELESS
>   scripts/config --disable CONFIG_SWAP
>   make olddefconfig
> 
> To run the qemu VM I use the following script:
>   APPEND="earlycon console=ttyAMA0 loglevel=8"
>   qemu-system-aarch64 \
>     -M virt \
>     -cpu cortex-a53 \
>     -smp 4 \
>     -m 4096 \
>     -kernel ~/vmshare/Image \
>     -initrd ~/vmshare/rootfs.cpio \
>     -nographic \
>     -append "${APPEND}" \
>     -fsdev local,id=vmshare,path=$HOME/vmshare,security_model=mapped,multidevs=remap \
>     -device virtio-9p-pci,fsdev=vmshare,mount_tag=vmshare
> 
> I built the root filesystem using buildroot 2024.08.2, using the following
> defconfig:
>   BR2_aarch64=y
>   BR2_ARM64_PAGE_SIZE_64K=y
>   BR2_KERNEL_HEADERS_4_19=y
>   BR2_PACKAGE_HOST_GDB=y
>   BR2_GDB_VERSION_15=y
>   BR2_PACKAGE_KEXEC=y
>   BR2_PACKAGE_KEXEC_ZLIB=y
>   BR2_TARGET_ROOTFS_CPIO=y
>   BR2_PACKAGE_HOST_KMOD=y
> 
> To kexec the file I use the following command:
>   kexec -d -c -l /media/vmshare/vmlinux \
>     --initrd=/media/vmshare/rootfs.cpio \
>     --reuse-cmdline \
>     && kexec -d -e
> 
> Thanks,
> Itai Handler
> 



* Re: Issues with kexec on arm64
  2025-01-06 14:02 ` Mark Rutland
@ 2025-01-07  9:46   ` Itai Handler
  0 siblings, 0 replies; 7+ messages in thread
From: Itai Handler @ 2025-01-07  9:46 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, linux-arm-kernel, Will Deacon, ardb

On Mon, Jan 6, 2025 at 4:02 PM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Tue, Dec 24, 2024 at 01:36:41PM +0200, Itai Handler wrote:
> > [Sorry about my previous e-mail on this subject. It got corrupted.
> > Please ignore it.]
> >
> > Hello,
>
> Hi,
>
> >
> > I'm encountering kernel panics / system hangs when attempting to
> > kexec a vmlinux file on arm64 architecture.
> >
> > It happens both on qemu and on real hardware.
> >
> > These issues occur on all kernels from v4.19 to the latest mainline.
> > A sample panic output looks as follows:
> >   kernel BUG at arch/arm64/mm/mmu.c:217!
> >   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> >   CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
> >   Hardware name: linux,dummy-virt (DT)
> >   pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >   pc : __create_pgd_mapping+0xe8/0x3b0
> >   lr : __create_pgd_mapping+0x44/0x3b0
> >   sp : fffffe00804d3c20
> >   x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
> >   x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
> >   x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
> >   x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
> >   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> >   x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
> >   x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
> >   x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
> >   x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
> >   x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
> >   Call trace:
> >    __create_pgd_mapping+0xe8/0x3b0
> >    map_kernel_segment+0x74/0xb0
> >    paging_init+0xec/0x4f8
> >    setup_arch+0x234/0x52c
> >    start_kernel+0x64/0x500
> >    __primary_switched+0xb4/0xbc
> >   Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
> >   ---[ end trace 0000000000000000 ]---
> >   Kernel panic - not syncing: Oops - BUG: Fatal exception
> >
> > I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
> > ("arm64/mm: move runtime pgds to rodata"), which was added on v4.19.
> >
> > I also reconstructed the full call trace (by adding "noinline" to the
> > relevant functions):
> >   alloc_init_cont_pte+0x6c/0x1e0
> >   init_pmd+0x154/0x1c8
> >   alloc_init_cont_pmd+0x11c/0x174
> >   alloc_init_pud+0xc4/0x148
> >   __create_pgd_mapping+0xa8/0x130
> >   map_kernel_segment+0xc8/0x168
> >   map_kernel+0x98/0x1a8
> >   paging_init+0x7c/0x418
> >   setup_arch+0x224/0x570
> >   start_kernel+0x5c/0x4f0
> >
>
> Does your system have GICv3 and an ITS? If so, and assuming you're not
> using EFI to boot in the first place, what *might* be happening here is
> that the GIC is still using property/pending tables allocated by the
> first kernel, and after that memory gets reallocated, the GIC writes
> back and corrupts that memory. That would be very sensitive to memory
> layout, which could explain why the bisect leads to something that
> changes that.
>
> We have a solution for that with EFI (where we can use a configuration
> table to indicate that the memory is in use), but we don't currently
> have a solution in the absence of EFI, and we should probably forbid
> kexec in that case...
>
> Mark.
>

Hi Mark,

No, neither the real hardware nor the qemu VM has GICv3.
They both have GICv2.
Is kexec supported with GICv2 (assuming I'm not using EFI to boot)?

Thanks,
Itai Handler



* Re: Issues with kexec on arm64
  2025-01-05 14:46   ` Itai Handler
@ 2025-02-11 18:36     ` Will Deacon
  0 siblings, 0 replies; 7+ messages in thread
From: Will Deacon @ 2025-02-11 18:36 UTC (permalink / raw)
  To: Itai Handler
  Cc: linux-kernel, linux-arm-kernel, mark.rutland, ardb, usamaarif642

On Sun, Jan 05, 2025 at 04:46:42PM +0200, Itai Handler wrote:
> On Fri, Jan 3, 2025 at 6:16 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Tue, Dec 24, 2024 at 01:36:41PM +0200, Itai Handler wrote:
> > > [Sorry about my previous e-mail on this subject. It got corrupted.
> > > Please ignore it.]
> > >
> > > Hello,
> > >
> > > I'm encountering kernel panics / system hangs when attempting to
> > > kexec a vmlinux file on arm64 architecture.
> > >
> > > It happens both on qemu and on real hardware.
> > >
> > > These issues occur on all kernels from v4.19 to the latest mainline.
> >
> > I think other folks have been using kexec on arm64, so something smells
> > fishy here. Is the issue intermittent?
> 
> No, it isn't intermittent. It's very easy to reproduce the panics/hangs.
> At most we need to perform two recursive kexec attempts of the vmlinux file.
> In v6.6, using the configuration I supplied (config.sh), a single kexec
> attempt is sufficient to demonstrate the issue. In that case a panic occurs
> on the first kexec attempt. In newer versions I mostly see hangs but sometimes
> panics as well.
> Please note that the configuration I supplied sets CONFIG_ARM64_64K_PAGES=y.
> But I saw issues also with 4K pages, but in that case only when enabling some
> debug options (KASAN, SCHED_DEBUG, KCSAN). Also please note that kexec with the
> Image file (instead of the vmlinux file) seems to work properly, without any
> issue.
> 
> >
> > > A sample panic output looks as follows:
> > >   kernel BUG at arch/arm64/mm/mmu.c:217!
> > >   Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> > >   CPU: 0 PID: 0 Comm: swapper Not tainted 6.6.0 #292
> > >   Hardware name: linux,dummy-virt (DT)
> > >   pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > >   pc : __create_pgd_mapping+0xe8/0x3b0
> > >   lr : __create_pgd_mapping+0x44/0x3b0
> > >   sp : fffffe00804d3c20
> > >   x29: fffffe00804d3c20 x28: fffffe0080620000 x27: fffffffefdbc0000
> > >   x26: fffffe0080300000 x25: 0000000040010000 x24: fffffffefdbc8020
> > >   x23: fffffe0080010000 x22: 0000000000000040 x21: fffffe0080010000
> > >   x20: fffffe0080300000 x19: 0040000000000783 x18: 0000000000000000
> > >   x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
> > >   x14: fffffffefdde0000 x13: fffffe00804d3c78 x12: 0000000000001d68
> > >   x11: 0000000000001d64 x10: fffffe00804d3c2c x9 : fffffffefdde0000
> > >   x8 : 0000000040420000 x7 : 0000000000001d68 x6 : 0000000000000000
> > >   x5 : fffffe00a0010000 x4 : 0000000000001004 x3 : fffffe0480010000
> > >   x2 : fffffe00804f7ec0 x1 : 0000000000000000 x0 : 0000000000000000
> > >   Call trace:
> > >    __create_pgd_mapping+0xe8/0x3b0
> > >    map_kernel_segment+0x74/0xb0
> > >    paging_init+0xec/0x4f8
> > >    setup_arch+0x234/0x52c
> > >    start_kernel+0x64/0x500
> > >    __primary_switched+0xb4/0xbc
> > >   Code: f9400300 92400400 f1000c1f 54000060 (d4210000)
> > >   ---[ end trace 0000000000000000 ]---
> > >   Kernel panic - not syncing: Oops - BUG: Fatal exception
> >
> > So this explodes because we find a page-table entry at the pmd level
> > that we don't like the look of:
> >
> >   - It's not a block entry
> >   - It's not all zeroes
> >   - It's also not a table
> >
> > Sadly, the actual value is clobbered by the time we take the BUG():
> >
> >    0:   f9400300        ldr     x0, [x24]
> >    4:   92400400        and     x0, x0, #0x3
> >    8:   f1000c1f        cmp     x0, #0x3
> >    c:   54000060        b.eq    0x18  // b.none
> >   10:*  d4210000        brk     #0x800          <-- trapping instruction
> >
> > Maybe dumping 'pmd_val(pmd)' before we crash would be instructive? Maybe
> > it's a pointer...
> 
> I dumped the bad pmd (on v6.6).
> It's always the same value: 128000017901ca60.

Hmm, I can't make anything useful out of that but it certainly looks
bogus.

> > > I bisected those panics to 8eb7e28d4c642c310f25c18f80a44dd4b01c694e
> > > ("arm64/mm: move runtime pgds to rodata"), which was added on v4.19.
> >
> > Hmm. I wonder if the rodata section isn't being loaded properly? Can you
> > add some traces to check that, please?
> 
> Could you advise which traces are needed and how to add them?

If you can find where the .rodata section lives in the kernel binary
that you're trying to kexec, then you could instrument the kexec code to
check that it does indeed load that into memory, for example? You'll
need to use your imagination as you're the one lucky enough to hit the
bug...
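
As a starting point, something along these lines would show where .rodata
sits in the vmlinux ELF. It is only a sketch: it assumes binutils readelf is
available on the build host, rodata_range is a made-up helper name, and the
vmlinux path is a placeholder.

```shell
#!/bin/sh
# Extract "<address> <file offset> <size>" (hex) for .rodata from the
# output of `readelf -S -W` supplied on stdin.
rodata_range() {
    # Drop the "[Nr]" index column first; it may contain a space ("[ 2]").
    sed 's/^ *\[ *[0-9]*\] *//' | awk '$1 == ".rodata" { print $3, $4, $5 }'
}

VMLINUX=${VMLINUX:-vmlinux}   # placeholder path, adjust as needed
if [ -f "$VMLINUX" ]; then
    readelf -S -W "$VMLINUX" | rodata_range
    # The PT_LOAD program headers describe what a loader is expected to copy.
    readelf -l -W "$VMLINUX" | grep LOAD || true
fi
```

The range printed for .rodata can then be compared against whatever the
kexec debug output (-d) shows while loading the file.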

Will

