From: Bjorn Helgaas <helgaas@kernel.org>
To: Erin_Tsao@wistron.com
Cc: linux-pci@vger.kernel.org, mj@ucw.cz
Subject: Re: Issue about PCI physical slot fetch incorrect number
Date: Thu, 29 Aug 2024 11:35:18 -0500
Message-ID: <20240829163518.GA39705@bhelgaas>
In-Reply-To: <dcd9ff2b16c44efca61189850fb0fe02@wistron.com>
On Mon, Aug 26, 2024 at 08:27:09AM +0000, Erin_Tsao@wistron.com wrote:
> Hi Bjorn,
> Sorry for the late response. And thanks for responding to my question.
> There are a few things I would like to clarify with you.
> 1. Is the physical slot number associated with the configuration of
> the device itself or with the configuration of the device's parent?
A PCIe device doesn't know its own slot number. The bridge leading to
a slot (either a Root Port or a Switch Downstream Port) has the Slot
Capability/Status/Control registers that manage the slot. The Slot
Capabilities register contains a "Physical Slot Number". This is
HwInit, which means it's set by hardware or firmware, and it's
supposed to be a number that's unique within the chassis.
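
To make the register layout concrete, here is a minimal sketch of pulling
the Physical Slot Number out of a raw Slot Capabilities value. It assumes
the field sits in bits 31:19 (mask 0xfff80000, which Linux calls
PCI_EXP_SLTCAP_PSN); the function name and the sample value are made up
for illustration.

```python
# Physical Slot Number occupies bits 31:19 of the 32-bit Slot Capabilities
# register (mask 0xfff80000, PCI_EXP_SLTCAP_PSN in Linux's pci_regs.h).
PSN_MASK = 0xFFF80000
PSN_SHIFT = 19


def physical_slot_number(slot_cap: int) -> int:
    """Extract the Physical Slot Number field from a Slot Capabilities value."""
    return (slot_cap & PSN_MASK) >> PSN_SHIFT


# Hypothetical register value whose PSN field encodes slot 308.
example = 308 << PSN_SHIFT
print(hex(example), physical_slot_number(example))
```

The field is 13 bits wide, so slot numbers from 0 to 8191 fit.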
The "Physical Slot" reported by lspci for Endpoints comes from sysfs,
not from the device itself. See
https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/tree/lib/sysfs.c?id=v3.13.0#n277
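
Roughly, the lookup works like the sketch below. This is not the
pciutils code, just an approximation in Python: it assumes each directory
under /sys/bus/pci/slots has an "address" file holding "dddd:bb:dd"
(domain:bus:device) and matches that against the device's address with
the function number dropped. The helper names are hypothetical, and
addresses without a device part are not handled.

```python
import os


def slot_names_by_address(slots_dir="/sys/bus/pci/slots"):
    """Map each slot's address string (e.g. "0000:23:00") to its sysfs name."""
    mapping = {}
    if not os.path.isdir(slots_dir):
        return mapping
    for name in sorted(os.listdir(slots_dir)):
        addr_file = os.path.join(slots_dir, name, "address")
        try:
            with open(addr_file) as f:
                # If two slots share an address, the later name wins here.
                mapping[f.read().strip()] = name
        except OSError:
            continue  # slot disappeared or is unreadable; skip it
    return mapping


def physical_slot_for(bdf, slots_dir="/sys/bus/pci/slots"):
    """Return the slot name for a device like "0000:23:00.0", or None."""
    dev_addr = bdf.rsplit(".", 1)[0]  # drop the function number
    return slot_names_by_address(slots_dir).get(dev_addr)
```

For example, physical_slot_for("0000:23:00.0") would return the sysfs
slot name for that device if the kernel created one, and None otherwise.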
From team1 lspci-vvv:
  20:01.1 Root Port to [bus 21-34] Slot #0
    21:00.0 Broadcom Switch Upstream Port to [bus 22-34]
      22:00.0 Broadcom Switch Downstream Port to [bus 23] Slot #308
        23:00.0 Broadcom Endpoint 02b2 "Physical Slot: 308"
      22:01.0 Broadcom Switch Downstream Port to [bus 24] Slot #306
        24:00.0 Broadcom Endpoint 02b2 "Physical Slot: 306"
      22:02.0 Broadcom Switch Downstream Port to [bus 25-2a] Slot #213
        25:00.0 Mellanox Endpoint MT2910 "Physical Slot: 213"
      22:03.0 Broadcom Switch Downstream Port to [bus 2b-30] Slot #203
        2b:00.0 Broadcom Endpoint 02b2 "Physical Slot: 203"
      22:04.0 Broadcom Switch Downstream Port to [bus 31-33] Slot #101
        31:00.0 AMD Switch Upstream Port to [bus 32-33] "Physical Slot: 101"
          32:00.0 AMD Switch Downstream Port to [bus 33] Slot #0
            33:00.0 AMD Endpoint 74a1 "Physical Slot: 0-6"
I don't know off the top of my head why lspci doesn't report a
"Physical Slot:" for 21:00.0. I suppose the kernel didn't provide
something in /sys for it.
All the other "Physical Slot:" reports from lspci match the "Physical
Slot Number" from the PCIe Capability of the bridge leading to the
slot, *except* for 33:00.0. In that case, the "Physical Slot Number"
from the bridge PCIe Capability is not unique. Both 20:01.1 and
32:00.0 advertise Slot #0 there, so the kernel makes the sysfs slot
name unique by appending a suffix, e.g., "0-6".
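
As a rough illustration of that renaming, the sketch below appends "-N"
with the first N >= 1 that is not already taken. This is a simplified
model of my understanding of the kernel's behavior, not its actual code;
make_unique_slot_name and the set of registered names are hypothetical.

```python
def make_unique_slot_name(requested, existing):
    """Return `requested`, or `requested-N` for the first N >= 1 not in `existing`.

    The chosen name is added to `existing` so later calls see it as taken.
    """
    candidate = requested
    dup = 1
    while candidate in existing:
        candidate = f"{requested}-{dup}"
        dup += 1
    existing.add(candidate)
    return candidate


names = set()
# Seven ports that all advertise Physical Slot Number 0.
print([make_unique_slot_name("0", names) for _ in range(7)])
# prints ['0', '0-1', '0-2', '0-3', '0-4', '0-5', '0-6']
```

In this model a name like "0-6" is simply the seventh slot registered
with the base name "0"; which port gets which suffix depends on the
order in which the slots are registered.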
From lspci_vvv_team2.txt:
  39:00.0 Broadcom Switch Downstream Port to [bus 3a] Slot #166
    3a:00.0 Samsung Endpoint NVMe "Physical Slot: 166"
  39:01.0 Broadcom Switch Downstream Port to [bus 3b-3d] Slot #24 "Physical Slot: 24"
    3b:00.0 AMD Switch Upstream Port to [bus 3c-3d]
      3c:00.0 AMD Switch Downstream Port to [bus 3d] Slot #0
        3d:00.0 AMD Endpoint MI300X
  39:02.0 Broadcom Switch Downstream Port to [bus 3e] Slot #39 "Physical Slot: 39"
    3e:00.0 Mellanox Endpoint
This seems strange to me. For 39:01.0, lspci reports "Physical Slot:
24", but 39:01.0 is a Downstream Port that *leads* to a slot; it's not
a slot itself. 3b:00.0 is the device in that slot, and I think it
should have a slot number, but it doesn't.
Similarly, lspci reports "Physical Slot: 39" for 39:02.0, when it
should show 3e:00.0 being in slot 39.
I guess this team2 situation is what you're trying to understand?
Can you collect the complete dmesg log and output of "grep -r .
/sys/bus/pci/slots" for both team1 and team2? We should be able to
puzzle out what's going on. The dmesg logging will show which hotplug
drivers are in use and should have hints about slot numbering, and if
it doesn't, we may need to add some.
> 2. As I understand it, another team is also using the AMD MI300 GPU,
> and I have found that the output of lspci -xxx differs between our
> team (team 1) and theirs (team 2). Our dump only lists config space
> up to 0xff, while theirs goes up to 0xfff, which means that they
> have additional content from 0x100 to 0xfff.
> -> Is there any OS setting that we can enable in order to see the
> whole content?
I think "lspci -xxx" will only show you 0-0xff unless lspci is run as
root.
> -> Is this additional content related to the physical slot number,
> or does it have any impact on showing the physical slot number?
I don't think so. The Slot Capability/Status/Control registers are in
the PCIe Capability, which should be below 0xff.
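
If you want to check that on a raw dump, something like the sketch below
walks the capability list in the first 256 bytes of config space, finds
the PCIe Capability (ID 0x10), and reads the Slot Capabilities register
at offset 0x14 within it. The offsets are the standard ones as I
remember them; the function names are made up, and the sketch does not
check the "Slot Implemented" bit, so the value is only meaningful for a
Root Port or Switch Downstream Port that actually implements a slot.

```python
import struct

PCI_STATUS = 0x06            # Status register
PCI_STATUS_CAP_LIST = 0x10   # "Capabilities List" bit in Status
PCI_CAPABILITY_LIST = 0x34   # pointer to the first capability
PCI_CAP_ID_EXP = 0x10        # PCI Express Capability ID
PCI_EXP_SLTCAP = 0x14        # Slot Capabilities, relative to the capability


def find_pcie_cap(cfg: bytes):
    """Return the offset of the PCIe Capability in a config dump, or None."""
    if len(cfg) < 0x40:
        return None
    status = struct.unpack_from("<H", cfg, PCI_STATUS)[0]
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guard against a looping capability list
    while pos >= 0x40 and pos + 2 <= len(cfg) and pos not in seen:
        seen.add(pos)
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC  # follow the "next capability" pointer
    return None


def slot_capabilities(cfg: bytes):
    """Return the raw 32-bit Slot Capabilities value, or None if unavailable."""
    pos = find_pcie_cap(cfg)
    if pos is None or pos + PCI_EXP_SLTCAP + 4 > len(cfg):
        return None
    return struct.unpack_from("<I", cfg, pos + PCI_EXP_SLTCAP)[0]
```

Both functions return None when the dump is too short to contain the
capability, which is what you would get from a truncated read.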
Bjorn