From: Marc Zyngier <maz@kernel.org>
To: Coiby Xu <coxu@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-arm-kernel@lists.infradead.org, linux-pci@vger.kernel.org,
	kexec@lists.infradead.org
Subject: Re: [Regression] kdump fails to get DHCP address unless booting with pci=nomsi or without nr_cpus=1
Date: Tue, 12 Aug 2025 14:14:25 +0100
Message-ID: <86ldno8yxa.wl-maz@kernel.org>
In-Reply-To: <gec2yl5wx6vvt67smx5emhcoifvtp4orw6sub24b2nrqwryhp2@i4h7qbtwjo3r>

On Tue, 12 Aug 2025 12:07:56 +0100,
Coiby Xu <coxu@redhat.com> wrote:
> 
> On Tue, Aug 12, 2025 at 11:17:04AM +0100, Marc Zyngier wrote:
> > On Tue, 12 Aug 2025 11:09:12 +0100,
> > Coiby Xu <coxu@redhat.com> wrote:
> >> 
> >> On Mon, Aug 11, 2025 at 03:52:04PM +0100, Marc Zyngier wrote:
> >> > On Mon, 11 Aug 2025 14:03:21 +0100,
> >> > Thomas Gleixner <tglx@linutronix.de> wrote:
> >> >>
> >> >> On Mon, Aug 11 2025 at 15:02, Thomas Gleixner wrote:
> >> >>
> >> >> CC+ Marc
> >> >>
> >> >> > On Mon, Aug 11 2025 at 11:23, Coiby Xu wrote:
> >> >> >> Recently I met an issue that on certain virtual machines, the kdump
> >> >> >> kernel fails to get a DHCP IP address most of the time, starting from
> >> >> >> 6.11-rc2. git bisection shows commit b5712bf89b4b ("irqchip/gic-v3-its:
> >> >> >> Provide MSI parent for PCI/MSI[-X]") is the 1st bad commit,
> >> >> >>
> >> >> >>      # good: [7d189c77106ed6df09829f7a419e35ada67b2bd0] PCI/MSI: Provide
> >> >> >>      # MSI_FLAG_PCI_MSI_MASK_PARENT
> >> >> >>      git bisect good 7d189c77106ed6df09829f7a419e35ada67b2bd0
> >> >> >>      # good: [48f71d56e2b87839052d2a2ec32fc97a79c3e264] irqchip/gic-v3-its:
> >> >> >>      # Provide MSI parent infrastructure
> >> >> >>      git bisect good 48f71d56e2b87839052d2a2ec32fc97a79c3e264
> >> >> >>      # good: [8c41ccec839c622b2d1be769a95405e4e9a4cb20] irqchip/irq-msi-lib:
> >> >> >>      # Prepare for PCI MSI/MSIX
> >> >> >>      git bisect good 8c41ccec839c622b2d1be769a95405e4e9a4cb20
> >> >> >>      # first bad commit: [b5712bf89b4bbc5bcc9ebde8753ad222f1f68296]
> >> >> >>      # irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]
> >> >> >
> >> >> > There were follow up fixes on this, so isolating this one is not really
> >> >> > conclusive.
> >> >> >
> >> >> > Is the problem still there on v6.16 and v6.17-rc1?
> >> >
> >> > Yeah, there are way too many things that have been addressed since.
> >> > kdump is also a particularly nasty case, as it tends to rely on the
> >> > redistributor tables programmed by the previous kernel.
> >> 
> >> Thanks for providing a clue. This may also explain why I failed
> >> to reproduce this issue on the 1st kernel even with the same cmdline as
> >> the kdump kernel.
> > 
> > I'm not sure that's a clue. It's only an indication that things are
> > not necessarily easy to spot.
> > 
> > Has it ever been reproduced on bare metal? Have you tried v6.16 as
> > instructed?
> 
> Thanks for replying so quickly!
> 
> No, I haven't reproduced it on a bare metal machine and our QE engineers
> haven't noticed this issue on any bare metal machine either.
> And I can confirm this issue still happens with 6.16.0-200.fc42.aarch64
> and 6.17.0-0.rc1.17.fc43.aarch64 on the type of KVM VMs (QEMU PnP device
> PNP0c02) where the issue was found.

What is that device? Is that the emulated PCI bridge?

> >> > Also, this says "virtual machines". What's the hypervisor?
> >> 
> >> I'll contact the lab administrator. What kinds of info I should collect
> >> to help you narrow down the issue?
> > 
> > Surely you know what hypervisor you're running on, right?
> 
> Yes, the hypervisor is KVM. Sorry, I thought merely providing the
> hypervisor info wasn't sufficient, and I also misunderstood your request
> as asking for more details on the host machine.

Well, knowing that it is KVM is definitely relevant, given that this
is my own turf.

> >> > How hard is it to reproduce?
> >> 
> >> It can be reproduced reliably on certain machines. But as of writing I
> >> haven't reproduced it on other KVM virtual machines on three different
> >> host machines.
> > 
> > Which machines? I'm sorry, but if you want help on this, you'll have
> > to provide actual information.
> 
> Sorry, I didn't mean to be vague. I thought your question was about how
> reproducible this issue is, and that there was no need to provide details
> on the machines where I can't reproduce it. Since you explicitly
> requested it, I'll be glad to share the details.
> 
> I just grabbed three arbitrary bare metal machines with Fedora 42
> installed and launched some KVM VMs to see if this issue can be
> reproduced easily. Two of the host machines are as follows (sorry, I
> can't find the info for the 3rd one):
> - GIGABYTE PnP device PNP0c02, ARMv8 (M128-30)
> - LTHPCSR112 (01234567890123456789AB), ARMv8 (Q80-30)

Are these both Ampere Altra boxes?

> The virtual machine image is downloaded from
> https://download.fedoraproject.org/pub/fedora/linux/releases/42/Cloud/aarch64/images/Fedora-Cloud-Base-Generic-42-1.1.aarch64.qcow2.
> I tried different vCPUs (2, 4), different RAM (4G, 35G) and also two
> different UEFI firmware (the default one and one from edk2-experimental
> package) but haven't reproduced this issue so far.

Hold on. Above, you say that you have reproduced it with
6.16.0-200.fc42.aarch64. So have you, or have you not reproduced it?

Can you at the very least share:

- the boot log of the guest on its first kernel

- the boot log of the guest running kdump

- the content of /sys/kernel/debug/kvm/$PID-xx/vgic*state* when
  running both kernels

- the QEMU command-line to get to run the whole thing
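  The vgic state files requested above can be snapshotted from the host
  with a small script. This is a minimal sketch, not something posted in
  this thread: it assumes debugfs is mounted at /sys/kernel/debug, that it
  runs as root on the KVM host, and that the QEMU process PID is known. The
  helper name `snapshot_vgic_state` and the output layout are hypothetical.

```python
import glob
import os


def snapshot_vgic_state(pid, label, out_dir, debugfs_root="/sys/kernel/debug/kvm"):
    """Copy every vgic*state* debugfs file of the VM(s) created by `pid`.

    Assumes the KVM debugfs layout <debugfs_root>/<pid>-<fd>/vgic*state*
    (hypothetical helper; run as root on the host). Files are written to
    out_dir/label/, and the list of written paths is returned.
    """
    pattern = os.path.join(debugfs_root, f"{pid}-*", "vgic*state*")
    dest = os.path.join(out_dir, label)
    os.makedirs(dest, exist_ok=True)

    written = []
    for src in sorted(glob.glob(pattern)):
        # Prefix with the per-VM directory name (<pid>-<fd>) so that files
        # from several VMs created by the same process do not collide.
        vm_dir = os.path.basename(os.path.dirname(src))
        target = os.path.join(dest, f"{vm_dir}_{os.path.basename(src)}")
        # debugfs files typically report a size of 0, so simply read
        # each one to EOF and write the text back out.
        with open(src, "r") as fin:
            data = fin.read()
        with open(target, "w") as fout:
            fout.write(data)
        written.append(target)
    return written
```

  For example, call it with label "first-kernel" while the guest runs its
  first kernel, again with label "kdump" once the crash kernel has booted,
  and diff the two output directories.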

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


Thread overview: 13+ messages
2025-08-11  3:23 [Regression] kdump fails to get DHCP address unless booting with pci=nomsi or without nr_cpus=1 Coiby Xu
2025-08-11 13:02 ` Thomas Gleixner
2025-08-11 13:03   ` Thomas Gleixner
2025-08-11 14:52     ` Marc Zyngier
2025-08-12 10:09       ` Coiby Xu
2025-08-12 10:17         ` Marc Zyngier
2025-08-12 11:07           ` Coiby Xu
2025-08-12 13:14             ` Marc Zyngier [this message]
     [not found]               ` <yweverlt7onyse3rbm7phxzwrwfk4pq2dipzdjenrx4onrak6r@dsm4ra3x3gv6>
2025-08-19 23:30                 ` Coiby Xu
2025-08-20  8:56                   ` Marc Zyngier
2025-08-23  3:00                     ` Coiby Xu
2025-08-27  8:17                       ` Coiby Xu
2025-08-12  3:29   ` Coiby Xu
