Re: -next boot failures during KVM setup

From: Marc Zyngier <maz@kernel.org>
To: Mark Brown <broonie@kernel.org>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Ard Biesheuvel <ardb@kernel.org>
Cc: Oliver Upton <oupton@kernel.org>,
	Aishwarya.TCV@arm.com, linux-arm-kernel@lists.infradead.org
Subject: Re: -next boot failures during KVM setup
Date: Mon, 08 Jun 2026 21:18:29 +0100
Message-ID: <87tsrc946i.wl-maz@kernel.org>
In-Reply-To: <aicVyebkEMs6w6UV@sirena.co.uk>

[+ Will, Catalin, Ard]

On Mon, 08 Jun 2026 20:19:37 +0100,
Mark Brown <broonie@kernel.org> wrote:
> 
> I'm seeing boot failures on a range of physical arm64 platforms in
> today's -next.  Turning on earlycon it looks like we're getting bad
> pointer dereferences during KVM initialisation:
> 
> [    0.728923] kvm [1]: nv: 570 coarse grained trap handlers
> [    0.735138] kvm [1]: nv: 710 fine grained trap handlers
> [    0.741326] kvm [1]: IPA Size Limit: 40 bits
> [    0.748840] Unable to handle kernel paging request at virtual address ffff00000478e000

That really doesn't look like a duff pointer.

> [    0.757027] Mem abort info:
> [    0.759917]   ESR = 0x0000000096000147

Translation fault, level 3. My take is that something is getting
unmapped.
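
For anyone decoding the syndrome by hand, here is a minimal sketch of how
ESR = 0x96000147 breaks down into a level 3 translation fault. The helper
names are made up for illustration and are not kernel APIs; the bit
positions are those of ESR_ELx for a data abort, and the level decode
assumes the classic 0b0001LL translation fault codes only.

```c
#include <stdint.h>

/* Hypothetical helpers, not kernel APIs: ESR_ELx fields for a data abort. */
static inline unsigned int esr_ec(uint64_t esr)   { return (esr >> 26) & 0x3f; } /* exception class */
static inline unsigned int esr_il(uint64_t esr)   { return (esr >> 25) & 0x1; }  /* instruction length */
static inline unsigned int esr_cm(uint64_t esr)   { return (esr >> 8) & 0x1; }   /* cache maintenance */
static inline unsigned int esr_wnr(uint64_t esr)  { return (esr >> 6) & 0x1; }   /* write, not read */
static inline unsigned int esr_dfsc(uint64_t esr) { return esr & 0x3f; }         /* fault status code */

/*
 * Translation faults at levels 0..3 are encoded as DFSC 0b0001LL.
 * Returns the level, or -1 if the DFSC is not one of those codes.
 */
static inline int esr_translation_fault_level(uint64_t esr)
{
	unsigned int dfsc = esr_dfsc(esr);

	return (dfsc & 0x3c) == 0x04 ? (int)(dfsc & 0x3) : -1;
}
```

Fed 0x96000147, these give EC = 0x25, DFSC = 0x07 and a fault level of 3,
with CM = 1 and WnR = 1, i.e. the abort was taken on a cache maintenance
instruction rather than on an ordinary load or store.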

> [    0.763772]   EC = 0x25: DABT (current EL), IL = 32 bits
> [    0.851526] pc : dcache_clean_inval_poc+0x24/0x48
> [    0.856367] lr : kvm_arm_init+0xbb0/0x13f0
> ...
> 
> [    0.937120] Call trace:
> [    0.939628]  dcache_clean_inval_poc+0x24/0x48 (P)
> [    0.944457]  do_one_initcall+0x60/0x1d4
> [    0.948393]  kernel_init_freeable+0x250/0x2d8
> 
>    https://lava.sirena.org.uk/scheduler/job/2849583#L848
> 
> (with other platforms I've got earlycon logs showing basically the same
> thing).  I have some bisects but they seem to have been confused by
> earlier driver core issues, I've tweaked to try to avoid that and am
> retrying.  FVP and qemu seem unaffected:
> 
>   https://lava.sirena.org.uk/scheduler/job/2848374#L888
>   https://lava.sirena.org.uk/scheduler/job/2848966#L447
> 
> The affected platforms thus far are all SMP Cortex A53/5 systems, but
> that's the vast majority of my lab.  They have both GICv3 and GICv2.

I've reproduced this with -next on an A72 platform, but it doesn't
happen with kvmarm/next on its own. So it is likely something coming
from another tree that messes with cache maintenance operations (CMOs).

The stack trace here is slightly better:

[    0.099138] Unable to handle kernel paging request at virtual address ffff0023d9ead000
[    0.099141] Mem abort info:
[    0.099142]   ESR = 0x0000000096000147
[    0.099144]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.099146]   SET = 0, FnV = 0
[    0.099148]   EA = 0, S1PTW = 0
[    0.099150]   FSC = 0x07: level 3 translation fault
[    0.099151] Data abort info:
[    0.099153]   ISV = 0, ISS = 0x00000147, ISS2 = 0x00000000
[    0.099155]   CM = 1, WnR = 1, TnD = 0, TagAccess = 0
[    0.099157]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    0.099159] swapper pgtable: 4k pages, 48-bit VAs, pgdp=000000245983b000
[    0.099162] [ffff0023d9ead000] pgd=18000027fffff403, p4d=18000027fffff403, pud=18000027ffffe403, pmd=18000027fffed403, pte=00e8002459eadf06
[    0.099173] Internal error: Oops: 0000000096000147 [#1]  SMP
[    0.582137] Freeing initrd memory: 29068K
[    2.025400] Modules linked in:
[    2.028447] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.1.0-rc7-next-20260608 #6265 PREEMPT 
[    2.037482] Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II May 30 2024
[    2.045559] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    2.052510] pc : dcache_clean_inval_poc+0x24/0x48
[    2.057210] lr : kvm_hyp_init_symbols+0x370/0x388
[    2.061904] sp : ffff80008009bd00
[    2.065206] x29: ffff80008009bd00 x28: 0000000000000000 x27: 0000002022116000
[    2.072332] x26: ffff0020232967f0 x25: 00000020a2116000 x24: 00000000000038b0
[    2.079458] x23: 0000000000000030 x22: ffffc7dc575880c8 x21: ffffc7dc57948fb0
[    2.086584] x20: 0000000000000001 x19: 0000000001002222 x18: 00000000ffffffff
[    2.093709] x17: 000000007a3345b6 x16: 0000000073a611dd x15: 0000000000000000
[    2.100835] x14: 0000000000000000 x13: 0000000000000000 x12: fffffffffffff800
[    2.107960] x11: 00000000000007ff x10: 0000000000000000 x9 : fffffffffffff800
[    2.115086] x8 : 00000000000007ff x7 : 0000000000000000 x6 : ffffc7dc5740af58
[    2.122211] x5 : 0000000080000000 x4 : ffffc7b87de00000 x3 : 000000000000003f
[    2.129336] x2 : 0000000000000040 x1 : ffff0023d9eaf000 x0 : ffff0023d9ead000
[    2.136462] Call trace:
[    2.138896]  dcache_clean_inval_poc+0x24/0x48 (P)
[    2.143592]  init_hyp_mode+0x644/0x960
[    2.147333]  kvm_arm_init+0x128/0x280
[    2.150987]  do_one_initcall+0x4c/0x458
[    2.154813]  kernel_init_freeable+0x1f4/0x2a0
[    2.159161]  kernel_init+0x2c/0x150
[    2.162642]  ret_from_fork+0x10/0x20
[    2.166210] Code: 9ac32042 d1000443 8a230000 d503201f (d50b7e20) 
[    2.172292] ---[ end trace 0000000000000000 ]---
[    2.176958] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[    2.184608] SMP: stopping secondary CPUs
[    2.188523] Kernel Offset: 0x47dbd5dc0000 from 0xffff800080000000
[    2.194604] PHYS_OFFSET: 0x80000000
[    2.198080] CPU features: 0x04000000,804b0008,00040001,0400421b
[    2.203988] Memory Limit: none
[    2.207031] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
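
An aside on the pte= value in the page table dump above, as a rough
sketch rather than anything authoritative. It assumes a 4k granule with
48-bit addresses (as the "swapper pgtable" line reports), that the
faulting address lives in the linear map starting at 0xffff000000000000,
and that the PHYS_OFFSET printed in the panic output applies. The macro
and function names are invented for the example and are not the kernel's
own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants, assuming 4k pages and 48-bit VAs/PAs. */
#define SKETCH_PTE_VALID	(1ULL << 0)		/* descriptor valid bit */
#define SKETCH_PTE_ADDR_MASK	0x0000fffffffff000ULL	/* output address bits [47:12] */
#define SKETCH_PAGE_OFFSET	0xffff000000000000ULL	/* assumed start of the linear map */

/* A descriptor with bit 0 clear is invalid and yields a translation fault. */
static inline bool sketch_pte_valid(uint64_t pte)
{
	return (pte & SKETCH_PTE_VALID) != 0;
}

/* Physical page the descriptor would point at if it were valid. */
static inline uint64_t sketch_pte_output_addr(uint64_t pte)
{
	return pte & SKETCH_PTE_ADDR_MASK;
}

/* Linear map translation: offset into the map plus the base of RAM. */
static inline uint64_t sketch_linear_va_to_pa(uint64_t va, uint64_t phys_offset)
{
	return (va - SKETCH_PAGE_OFFSET) + phys_offset;
}
```

For pte=00e8002459eadf06 the valid bit is clear, while the output address
0x2459ead000 is exactly what the linear map translation of
ffff0023d9ead000 gives with PHYS_OFFSET 0x80000000. Under the assumptions
above, that looks like a page that was deliberately made invalid, not a
wild pointer.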

The link register (lr) points to the following code in kvm_hyp_init_symbols():

<quote>
	/*
	 * Flush entire BSS since part of its data containing init symbols is read
	 * while the MMU is off.
	 */
	kvm_flush_dcache_to_poc(kvm_ksym_ref(__hyp_bss_start),
				kvm_ksym_ref(__hyp_bss_end) - kvm_ksym_ref(__hyp_bss_start));

</quote>

which I suspect is related to some of the new BSS-related code in
arm64/for-next/mm.
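
To relate the register dump to that flush, here is a small user-space
model of a clean+invalidate by VA loop, offered as a sketch of the
technique and not as the kernel's implementation. It assumes x0/x1 carry
the current and end addresses of the range and x2/x3 the cache line size
and its mask, which fits the values in the dump (x2 = 0x40, x3 = 0x3f)
and the faulting instruction d50b7e20 (dc civac, x0). The function and
callback type are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for issuing "dc civac" on one cache line. */
typedef void (*sketch_line_op_t)(uint64_t line_addr, void *ctx);

/*
 * Walk [start, end) one cache line at a time, the way a by-VA
 * maintenance loop does: align start down to a line boundary, operate
 * on the line, advance, and stop once the end address is reached.
 * line_size must be a power of two. Returns the number of lines touched.
 */
static size_t sketch_for_each_cache_line(uint64_t start, uint64_t end,
					 uint64_t line_size,
					 sketch_line_op_t op, void *ctx)
{
	uint64_t addr = start & ~(line_size - 1);	/* round down to a line */
	size_t lines = 0;

	do {
		if (op)
			op(addr, ctx);	/* a fault here reports addr as the VA */
		lines++;
		addr += line_size;
	} while (addr < end);

	return lines;
}
```

With x0 = ffff0023d9ead000 and x1 = ffff0023d9eaf000 there are 0x2000
bytes left to flush, or 128 lines of 64 bytes, and the very first
operation on x0 already hits the address reported in the abort.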

Ard, does this ring a bell?

Thanks,

	M.

-- 
Jazz isn't dead. It just smells funny.
