From: Alex Williamson <alex.williamson@redhat.com>
To: Mario Limonciello <mario.limonciello@amd.com>
Cc: Ashutosh Sharma <ashutosh.dandora4@gmail.com>,
Lukas Wunner <lukas@wunner.de>,
linux-pci@vger.kernel.org, helgaas@kernel.org,
dwmw2@infradead.org, yi.l.liu@intel.com, majosaheb@gmail.com,
cohuck@redhat.com, zhenzhong.duan@gmail.com,
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
Yazen Ghannam <yazen.ghannam@amd.com>
Subject: Re: PCI device hot insert is not detected
Date: Tue, 12 Dec 2023 12:07:33 -0700
Message-ID: <20231212120733.7b62f92c.alex.williamson@redhat.com>
In-Reply-To: <5d880d78-ee3b-4c3d-a0bb-4e278c3d7b29@amd.com>

On Tue, 12 Dec 2023 12:29:13 -0600
Mario Limonciello <mario.limonciello@amd.com> wrote:
> On 12/12/2023 05:32, Ashutosh Sharma wrote:
> >> This doesn't work, try "echo 1 | sudo tee power" instead.
> >
> > This was not a permission issue, I already gave it read/write permission.
> >
> > admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
> > -bash: power: Permission denied
> > admin@node-4:/sys/bus/pci/slots/14$ sudo chmod 0666 power
> > admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
> > echo: write error: Operation not permitted
> > admin@node-4:/sys/bus/pci/slots/14$
> >
> >> This is from a "Link up" situation (DLActive+), it would be more
> >> interesting to get lspci output of the port in a "No link" situation.
> >
> > Unfortunately, I did not collect that output before system reboot.
> >
> > On Tue, 12 Dec 2023 at 16:29, Lukas Wunner <lukas@wunner.de> wrote:
> >>
> >> On Tue, Dec 12, 2023 at 04:04:41PM +0530, Ashutosh Sharma wrote:
> >>> Removed one NVMe drive (pci address 0000:83:00.0), it got unbound
> >>> successfully from "vfio-pci" driver but saw below error in the syslog.
> >>>
> >>> can't change power state from D0 to D3hot (config space inaccessible)
> >>
> >> This is normal, the drive's config space is inaccessible after removal.
> >>
>
> Was the removal a "surprise" removal? Or you mean it was by using
> 'remove' sysfs file?
>
> IIRC surprise removal will need platform firmware support to handle it
> properly.

The vfio-pci driver also makes zero claims about supporting surprise
removal, so you'll likely end up in an inconsistent state. Thanks,

Alex

> >>> Then after 2:30 min approx, re-inserted the same drive to the same PCI
> >>> slot. But the drive was not detected.
> >>>
> >>> Dec 11 23:54:39 node-4 kernel: [183672.630191] pcieport 0000:80:03.2: pciehp: Slot(14): Attention button pressed
> >>> Dec 11 23:54:39 node-4 kernel: [183672.630195] pcieport 0000:80:03.2: pciehp: Slot(14) Powering on due to button press
> >>> Dec 11 23:54:44 node-4 kernel: [183677.671931] pcieport 0000:80:03.2: pciehp: Slot(14): Card present
> >>> Dec 11 23:54:46 node-4 kernel: [183679.783922] pcieport 0000:80:03.2: pciehp: Slot(14): No link
> >>
> >> The link doesn't come up, so the kernel gives up on the slot.
> >>
> >> I don't know what the reason is, could be a hardware issue or
> >> protocol incompatibility. This doesn't look like a kernel issue.
> >>
> >>
> >>> | +-03.0 Advanced Micro Devices, Inc. [AMD]
> >>> Starship/Matisse PCIe Dummy Host Bridge
> >>> | +-03.1-[82]----00.0 Samsung Electronics Co Ltd NVMe SSD
> >>> Controller PM9A1/PM9A3/980PRO
> >>> | +-03.2-[83]--
> >>
> >> Adding Mario, Smita, Yazen from AMD to cc, maybe they have an idea
> >> what the issue is or how to get diagnostics on this Epyc platform.
> >>
> >> Start of thread:
> >> https://lore.kernel.org/linux-pci/CADOvten7jG7KjW6W1MRd7i8_E18L0xCCaCzmZOY_vvgJhdfOSw@mail.gmail.com/
> >>
> >>
> >>> admin@node-4:/sys/bus/pci/slots/14$ sudo echo 1 > power
> >>> echo: write error: Operation not permitted
> >>
> >> This doesn't work, try "echo 1 | sudo tee power" instead.
> >>
> >>
> >>> lspci output of the pci port:
> >>> 80:03.2 PCI bridge: Advanced Micro Devices, Inc. [AMD]
> >>> Starship/Matisse GPP Bridge (prog-if 00 [Normal decode])
> >> [...]
> >>> LnkSta: Speed 16GT/s (ok), Width x4 (ok)
> >>> TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
> >>
> >> This is from a "Link up" situation (DLActive+), it would be more
> >> interesting to get lspci output of the port in a "No link" situation.
> >>
> >> Thanks,
> >>
> >> Lukas
>
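
For the lspci output Lukas asks for above, a minimal sketch of a helper
that classifies the port's link state from the LnkSta flags might look
like this. The function name link_state is hypothetical, and it only
inspects the DLActive flag as printed by "lspci -vv"; the port address
0000:80:03.2 is taken from the log quoted above.

```shell
#!/bin/sh
# Sketch only: classify a port's Data Link Layer state from "lspci -vv"
# text read on stdin. "link_state" is a hypothetical helper name.
#   DLActive+  -> prints "up"
#   DLActive-  -> prints "down"
#   flag absent -> prints "unknown"
link_state() {
    awk '
        /DLActive[+]/ { print "up";   found = 1; exit }
        /DLActive-/   { print "down"; found = 1; exit }
        END { if (!found) print "unknown" }
    '
}

# Intended use on the affected machine, right after pciehp logs
# "No link" (not executed here; requires root and the real hardware):
#   sudo lspci -vv -s 0000:80:03.2 | tee port-nolink.txt | link_state
```

Saving the full output with tee keeps the complete LnkSta/SltSta dump
available for the list, while the helper just gives a quick up/down
answer.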