Linux PCI subsystem development
From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Huacai Chen <chenhuacai@kernel.org>
Cc: Tianrui Zhao <zhaotianrui@loongson.cn>,
	Bibo Mao <maobibo@loongson.cn>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Jan Kiszka <jan.kiszka@siemens.com>,
	linux-s390 <linux-s390@vger.kernel.org>,
	loongarch@lists.linux.dev, Farhan Ali <alifm@linux.ibm.com>,
	Matthew Rosato <mjrosato@linux.ibm.com>,
	Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
	Heiko Carstens <hca@linux.ibm.com>,
	Vasily Gorbik <gor@linux.ibm.com>,
	Alexander Gordeev <agordeev@linux.ibm.com>,
	Sven Schnelle <svens@linux.ibm.com>,
	Christian Borntraeger <borntraeger@linux.ibm.com>,
	Gerd Bayer <gbayer@linux.ibm.com>,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v5 1/2] PCI: Fix isolated PCI function probing with ARI and SR-IOV
Date: Thu, 18 Dec 2025 13:02:09 +0100
Message-ID: <39e1536d956bfe061a4da7446c41a1b21eac0b37.camel@linux.ibm.com>
In-Reply-To: <CAAhV-H6GE3q4qaPo9OvNkYOzatR-8BMYeGQ8hdmvKVZXbQF2rw@mail.gmail.com>

On Wed, 2025-12-17 at 14:55 +0800, Huacai Chen wrote:
> On Thu, Dec 4, 2025 at 5:45 AM Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> > 
> > On Mon, 2025-12-01 at 22:45 +0800, Huacai Chen wrote:
> > > 
> > --- snip ---
> > > You said that "it feels like this is just a hack to probe an odd
> > > topology". Yes, to some extent you are right.
> > > 
> > > 1. One of our SoCs (LS2K3000) has a special device which has func1 but
> > > no func0. To let the PCI core scan func1, our only option is to make
> > > hypervisor_isolated_pci_functions() return true.
> > > 2. In the above case, PCI_SCAN_ALL_PCIE_DEVS doesn't help.
> > > 3. Though we changed hypervisor_isolated_pci_functions() to resolve the
> > > above problem, it also lets us pass isolated PCI functions to a guest
> > > OS instance.
> > > 
> > > In summary, for real machines commit a02fd05661d73a850 is a hack to
> > > probe an odd device; for virtual machines it allows passing isolated
> > > PCI functions.
> > 
> > Ok, thanks for the answer. So let's see how we can debug this and get
> > to a solution that works for both of us. Looking around a bit I see
> > that your pci_loongson_map_bus() has some special handling for trying
> > not to access non-existent devices added by your commit 2410e3301fcc
> > ("PCI: loongson: Don't access non-existent devices"). I wonder if with
> > this patch applied we're running into this same issue but with a devfn
> > that was previously not tried and is not covered by your checks? And
> > maybe since your root complex doesn't return 0xff for these non-
> > existent devices we could end up trying to probe AHCI on such an empty
> > slot misinterpreting whatever it returns as matching device/vendor?
> Commit 2410e3301fcc seems unrelated to the current problem.

I'm not so sure. The only thing this patch should change is which
devfns get enumerated, and thus whose config space gets accessed when
looking for a device. That commit talks about accesses to non-existent
devices causing a system hang, so in principle it does seem to fit.

> > 
--- snip ---
> > Could you try redoing the test with the AHCI hang but add a print of
> > the affected bus/device/function that AHCI thinks it is probing? Then
> > if the above theory applies we should see it trying to probe on a
> > device that is missing in the correctly booted case and got past your
> > existing checks.
> By redoing this test we found there is only one AHCI detected, and the
> BDF is: bus=0, device=8, func=0.
> 
> With or without this patch, there is only one AHCI. But without this
> patch, the AHCI initialization doesn't hang.
> 


This is all very odd. Just so there is no chance of misunderstanding:
did you check the BDF that the ahci driver is trying to probe directly?
That is, something like what I added as the top commit here:
https://git.kernel.org/pub/scm/linux/kernel/git/niks/linux.git/log/?h=loongarch_debug

I ask because, with the AHCI controller at devfn 08.0 and likely
without dev->multifunction set, this patch does make a difference: it
would try to enumerate 08.1 and so on, while without this patch these
would be skipped because of the dev && !dev->multifunction condition,
even though isolated function probing should look at all functions. So
I was thinking this might cause us to end up trying to probe an AHCI
controller where there is none.

Another thing I could imagine, especially with commit 2410e3301fcc
("PCI: loongson: Don't access non-existent devices") in mind, is that
accessing the device/vendor config space of some non-existent devices
leaves your PCIe controller in a bad state, and the MMIO accesses that
enable AHCI then get lost or something similar. Maybe you could add
debug code in the relevant parts of drivers/pci/controller/pci-loongson.c
to check which devices get accessed with this patch vs without it? Would
it help if I provided a debug patch for that? Though I really don't know
which part is relevant for the system you're seeing the problem on.

Thanks,
Niklas
