From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Huacai Chen <chenhuacai@kernel.org>,
Tianrui Zhao <zhaotianrui@loongson.cn>,
Bibo Mao <maobibo@loongson.cn>,
Bjorn Helgaas <bhelgaas@google.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>,
linux-s390 <linux-s390@vger.kernel.org>,
loongarch@lists.linux.dev, Farhan Ali <alifm@linux.ibm.com>,
Matthew Rosato <mjrosato@linux.ibm.com>,
Gerald Schaefer <gerald.schaefer@linux.ibm.com>,
Heiko Carstens <hca@linux.ibm.com>,
Vasily Gorbik <gor@linux.ibm.com>,
Alexander Gordeev <agordeev@linux.ibm.com>,
Sven Schnelle <svens@linux.ibm.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Gerd Bayer <gbayer@linux.ibm.com>,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v5 1/2] PCI: Fix isolated PCI function probing with ARI and SR-IOV
Date: Fri, 28 Nov 2025 14:30:40 +0100
Message-ID: <298aaf6b2815e59d1a94efffdd0e3b002c000cea.camel@linux.ibm.com>
In-Reply-To: <7385677843a7790e01158644f63ae4dbb3353bfe.camel@linux.ibm.com>
On Mon, 2025-11-10 at 14:08 +0100, Niklas Schnelle wrote:
> On Fri, 2025-11-07 at 15:19 +0800, Huacai Chen wrote:
> > On Wed, Nov 5, 2025 at 5:46 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> > >
> > > On Wed, 2025-11-05 at 09:01 +0800, Huacai Chen wrote:
> > > > On Mon, Nov 3, 2025 at 7:23 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote:
> > > > >
> > > > > On Mon, 2025-11-03 at 17:50 +0800, Huacai Chen wrote:
> > > > > > Hi, Niklas,
> > > > > >
> > > > > > On Wed, Oct 29, 2025 at 5:42 PM Niklas Schnelle <schnelle@linux.ibm.com> wrote:
--- snip ---
> > > > > > >
> > > > > > > Still especially the first issue prevents correct detection of ARI and
> > > > > > > the second might be a problem for other users of isolated function
> > > > > > > probing. Fix both issues by keeping things as simple as possible. If
> > > > > > > isolated function probing is enabled simply scan every possible devfn.
> > > > > > I'm very sorry, but applying this patch on top of commit a02fd05661d7
> > > > > > ("PCI: Extend isolated function probing to LoongArch") we fail to
> > > > > > boot.
> > > > > >
> > > > > > Boot log:
> > > > > >
--- snip ---
> > > > >
> > > > >
> > > > > This looks like a warning telling us that AHCI enable failed / timed
> > > > > out. Do you have Panic on Warn on that this directly causes a boot
> > > > > failure? The only relation I can see with my patch is that maybe this
> > > > > AHCI device wasn't probed before and somehow isn't working?
> > > > The rootfs is on the AHCI controller, so AHCI failure causes the boot
> > > > failure, without this patch no boot problems.
> > > >
> > > > Huacai
> > > >
> > >
> > > Ok, I'm going to need more details to make sense of this. Can you tell
> > > me if ARI is enabled for that bus? Did you test with both patches or
> > > just this one? Could you provide lspci -vv from a good boot and can you
> > > tell which AHCI device the root device is on? Also could you clarify
> > > why you set hypervisor_isolated_pci_functions() in particular this
> > > seems like a bare metal boot, right? When running in KVM do you pass-
> > > through individual PCI functions with the guest seeing a devfn other
> > > than 0 alone, i.e. a missing devfn 0? Or do you need this for bare
> > > metal for some reason? If you don't need it for bare metal, does the
> > > boot work if you return 0 from hypervisor_isolated_pci_functions() with
> > > this patch?
> > 1. ARI isn't enabled.
> > 2. Only test the first patch.
> > 3. This is a bare metal boot.
> > 4. If hypervisor_isolated_pci_functions() return 0 then boot is OK
> > 5. PCI information please see the attachment.
> >
> > Huacai
>
> Thanks for the input. As far as I can see the lspci from a good boot
> shows no holes in your devfn space so this particular system doesn't
> seem to need the isolated function probing at all. But even then using
> it should only try out all devfns and thus never skip one that is found
> without isolated function probing.
>
> To sanity check this, I just booted my personal AMD Ryzen 3900X system
> with this series plus a two-liner to unconditionally enable isolated
> function probing also on x86_64 and it came up fine including AMD
> graphics and my Intel NIC with enabled SR-IOV.
>
> So I'm really perplexed and coming back to the thought that a device on
> your system is misbehaving when probing is attempted and maybe due to a
> similar issue as what I saw with SR-IOV it wasn't probed so far but
> really should be probed if isolated function probing is enabled. I also
> still don't understand your use-case. If it is for VMs then maybe you
> could limit it to those? Otherwise it feels like this is just a hack to
> probe an odd topology and I wonder if you should rather set
> PCI_SCAN_ALL_PCIE_DEVS to find those?
>
> Thanks,
> Niklas

Hi LoongArch Maintainers, Hi Bjorn,

Sorry for the ping but I'd really like to somehow get this unstuck and
I haven't heard back on my previous questions. From my testing on s390
this patch fixes a real logic error which prevents the scanning of some
devfns which I believe should be scanned if isolated functions are
possible. And in all my testing, including on x86 as stated in the
previous mail, the code does exactly what I think it is supposed to do.
So to me it really looks like something goes wrong with your use of
hypervisor_isolated_pci_functions() on your specific hardware.
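
To make the expected behavior concrete, here is a minimal user-space
sketch, not the kernel code and not taken from the patch. It models a
bus as a presence table and compares a simplified conventional scan
with a scan that tries every devfn. The names toy_bus, scan_classic and
scan_isolated are hypothetical, and ARI and SR-IOV are ignored for
brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_COUNT  32
#define DEVFN_COUNT 256 /* 32 slots x 8 functions */
#define DEVFN(slot, fn) ((uint8_t)(((slot) << 3) | (fn)))

/* Hypothetical toy bus: present[devfn] says whether a function responds,
 * multifunction[slot] stands in for the multi-function bit of function 0. */
struct toy_bus {
	bool present[DEVFN_COUNT];
	bool multifunction[SLOT_COUNT];
};

/* Simplified conventional scan: functions 1..7 of a slot are only tried
 * if function 0 responds and advertises multi-function.
 * Stores the devfns found and returns how many there are. */
size_t scan_classic(const struct toy_bus *bus, uint8_t *found)
{
	size_t n = 0;

	for (unsigned int slot = 0; slot < SLOT_COUNT; slot++) {
		if (!bus->present[DEVFN(slot, 0)])
			continue;
		found[n++] = DEVFN(slot, 0);
		if (!bus->multifunction[slot])
			continue;
		for (unsigned int fn = 1; fn < 8; fn++)
			if (bus->present[DEVFN(slot, fn)])
				found[n++] = DEVFN(slot, fn);
	}
	return n;
}

/* Isolated-function scan as described in this thread: try every devfn,
 * regardless of whether function 0 of the slot responds. */
size_t scan_isolated(const struct toy_bus *bus, uint8_t *found)
{
	size_t n = 0;

	for (unsigned int devfn = 0; devfn < DEVFN_COUNT; devfn++)
		if (bus->present[devfn])
			found[n++] = (uint8_t)devfn;
	return n;
}
```

In this model the isolated scan only ever finds more functions than the
conventional one, which is why a function that shows up without
isolated function probing should not go missing with it.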

One idea I had is if maybe you need to somehow exclude known empty
slots in your config space accessors?
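
As a rough illustration of that idea, the following user-space sketch
shows an accessor that answers for known empty slots without touching
the hardware. Everything in it is an assumption made for illustration:
toy_host, empty_slot_mask, toy_hw_read and the TOY_* return codes are
hypothetical and do not correspond to the LoongArch accessors. The
all-ones value mirrors what a config read of a non-existent function
conventionally yields.

```c
#include <stdint.h>

#define TOY_SUCCESSFUL       0
#define TOY_DEVICE_NOT_FOUND 1 /* stand-in for a "no device" error code */

/* Hypothetical platform knowledge: bit N set means slot N is known empty. */
struct toy_host {
	uint32_t empty_slot_mask;
	unsigned int hw_reads; /* counts accesses that reach the "hardware" */
};

/* Placeholder for the real config space access. */
uint32_t toy_hw_read(struct toy_host *host, uint8_t devfn, int where)
{
	(void)devfn;
	(void)where;
	host->hw_reads++;
	return 0x12345678; /* arbitrary dummy value */
}

/* Config read that short-circuits known empty slots: report all-ones
 * and an error instead of issuing an access the hardware may not handle. */
int toy_config_read(struct toy_host *host, uint8_t devfn, int where,
		    uint32_t *val)
{
	unsigned int slot = devfn >> 3;

	if (host->empty_slot_mask & (UINT32_C(1) << slot)) {
		*val = UINT32_MAX;
		return TOY_DEVICE_NOT_FOUND;
	}
	*val = toy_hw_read(host, devfn, where);
	return TOY_SUCCESSFUL;
}
```

With such a filter in place, scanning every devfn would skip the
filtered slots at the accessor level rather than in the generic scan
code.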

And just in general I'd really like to better understand your use-case
for the isolated PCI functions. And speaking of that, I'm sorry for
having been so blunt in my last mail saying that it felt like a hack.
I'm just worried that we've run into incompatible interpretations or
uses of this feature that now prevent us from fixing actual bugs.

Thanks in advance,
Niklas Schnelle

Thread overview: 12+ messages
2025-10-29 9:41 [PATCH v5 0/2] PCI: Fix isolated function probing and enable ARI for s390 Niklas Schnelle
2025-10-29 9:41 ` [PATCH v5 1/2] PCI: Fix isolated PCI function probing with ARI and SR-IOV Niklas Schnelle
2025-11-03 9:50 ` Huacai Chen
2025-11-03 11:23 ` Niklas Schnelle
2025-11-05 1:01 ` Huacai Chen
2025-11-05 9:46 ` Niklas Schnelle
2025-11-07 7:19 ` Huacai Chen
2025-11-10 13:08 ` Niklas Schnelle
2025-11-28 13:30 ` Niklas Schnelle [this message]
2025-12-01 14:45 ` Huacai Chen
2025-12-03 21:45 ` Niklas Schnelle
2025-10-29 9:41 ` [PATCH v5 2/2] PCI: s390: Handle ARI on bus without associated struct pci_dev Niklas Schnelle