Date: Tue, 12 Aug 2025 14:14:25 +0100
Message-ID: <86ldno8yxa.wl-maz@kernel.org>
From: Marc Zyngier
To: Coiby Xu
Cc: Thomas Gleixner, linux-arm-kernel@lists.infradead.org,
	linux-pci@vger.kernel.org, kexec@lists.infradead.org
Subject: Re: [Regression] kdump fails to get DHCP address unless booting with pci=nomsi or without nr_cpus=1
References: <87bjom8106.ffs@tglx> <878qjq80yu.ffs@tglx>
	<86v7mt9ai3.wl-maz@kernel.org> <86ms84974v.wl-maz@kernel.org>

On Tue, 12 Aug 2025 12:07:56 +0100,
Coiby Xu wrote:
> 
> On Tue, Aug 12, 2025 at 11:17:04AM +0100, Marc Zyngier wrote:
> > On Tue, 12 Aug 2025 11:09:12 +0100,
> > Coiby Xu wrote:
> >> 
> >> On Mon, Aug 11, 2025 at 03:52:04PM +0100, Marc Zyngier wrote:
> >> > On Mon, 11 Aug 2025 14:03:21 +0100,
> >> > Thomas Gleixner wrote:
> >> >> 
> >> >> On Mon, Aug 11 2025 at 15:02, Thomas Gleixner wrote:
> >> >> 
> >> >> CC+ Marc
> >> >> 
> >> >> > On Mon, Aug 11 2025 at 11:23, Coiby Xu wrote:
> >> >> >> Recently I met an issue that on certain virtual machines, the kdump
> >> >> >> kernel fails to get DHCP IP address most of times starting from
> >> >> >> 6.11-rc2. git bisection shows commit b5712bf89b4b ("irqchip/gic-v3-its:
> >> >> >> Provide MSI parent for PCI/MSI[-X]") is the 1st bad commit,
> >> >> >> 
> >> >> >> # good: [7d189c77106ed6df09829f7a419e35ada67b2bd0] PCI/MSI: Provide
> >> >> >> # MSI_FLAG_PCI_MSI_MASK_PARENT
> >> >> >> git bisect good 7d189c77106ed6df09829f7a419e35ada67b2bd0
> >> >> >> # good: [48f71d56e2b87839052d2a2ec32fc97a79c3e264] irqchip/gic-v3-its:
> >> >> >> # Provide MSI parent infrastructure
> >> >> >> git bisect good 48f71d56e2b87839052d2a2ec32fc97a79c3e264
> >> >> >> # good: [8c41ccec839c622b2d1be769a95405e4e9a4cb20] irqchip/irq-msi-lib:
> >> >> >> # Prepare for PCI MSI/MSIX
> >> >> >> git bisect good 8c41ccec839c622b2d1be769a95405e4e9a4cb20
> >> >> >> # first bad commit: [b5712bf89b4bbc5bcc9ebde8753ad222f1f68296]
> >> >> >> # irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]
> >> >> > 
> >> >> > There were follow up fixes on this, so isolating this one is not really
> >> >> > conclusive.
> >> >> > 
> >> >> > Is the problem still there on v6.16 and v6.17-rc1?
> >> > 
> >> > Yeah, there are way too many things that have been addressed since.
> >> > kdump is also a particularly nasty case, as it tends to rely on the
> >> > redistributor tables programmed by the previous kernel.
> >> 
> >> Thanks for providing a clue. This may also explain explain why I fails
> >> to reproduce this issue against 1st kernel even with the same cmdline of
> >> the kdump kernel.
> > 
> > I'm not sure that's a clue. It's only an indication that things are
> > not necessarily easy to spot.
> > 
> > Has it ever been reproduced on bare metal? Have you tried v6.16 as
> > instructed?
> 
> Thanks for replying so quickly!
> 
> No, I haven't reproduced it on a bare metal machine and our QE engineers
> haven't noticed this issue on any bare metal machine either.
> And I can confirm this issue still happens to 6.16.0-200.fc42.aarch64
> and 6.17.0-0.rc1.17.fc43.aarch64 on the type of KVM VMS (QEMU PnP device
> PNP0c02) where the issue was found.

What is that device? Is that the emulated PCI bridge?

> >> > Also, this says "virtual machines". What's the hypervisor?
> >> 
> >> I'll contact the lab administrator. What kinds of info I should collect
> >> to help you narrow down the issue?
> > 
> > Surely you know what hypervisor you're running on, right?
> 
> Yes, the hypervisor is KVM. Sorry, I thought merely providing the
> hypervisor info isn't sufficient and also misunderstood your request as
> providing more details on the host machine.

Well, knowing that it is KVM is definitely relevant, given that this
is my own turf.

> >> > How hard is it to reproduce?
> >> 
> >> It can be reproduced reliably on certain machines. But as of writing I
> >> haven't reproduced it on other KVM virtual machines on three different
> >> host machines.
> > 
> > Which machines? I'm sorry, but if you want help on this, you'll have
> > to provide actual information.
> 
> Sorry, I didn't mean to be vague. I thought you question is on how
> reproducible this issue is and there is no need to provide the details
> on the machines where I can't reproduce this issue. Since you explicitly
> request it, I'll be glad to share the details.
> 
> I just grabbed three arbitrary bare metal machines having Fedora-42
> installed and launched some KVM VMs to see if this issue can be
> reproduced easily. Two host machines are as follows (sorry I can't find
> the info of the 3rd one)
> - GIGABYTE PnP device PNP0c02, ARMv8 (M128-30)
> - LTHPCSR112 (01234567890123456789AB), ARMv8 (Q80-30)

Are these both Ampere Altra boxes?

> The virtual machine image is downloaded from
> https://download.fedoraproject.org/pub/fedora/linux/releases/42/Cloud/aarch64/images/Fedora-Cloud-Base-Generic-42-1.1.aarch64.qcow2.
> I tried different vCPUs (2, 4), different RAM (4G, 35G) and also two
> different UEFI firmware (the default one and one from edk2-experimental
> package) but haven't reproduced this issue so far.

Hold on. Above, you say that you have reproduced it with
6.16.0-200.fc42.aarch64. So have you, or have you not reproduced it?

Can you at the very least share:

- the boot log of the guest on its first kernel
- the boot log of the guest running kdump
- the content of /sys/kernel/debug/kvm/$PID-xx/vgic*state* when
  running both kernels
- the QEMU command-line to get to run the whole thing

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.