public inbox for kvm@vger.kernel.org
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Patel, Nirmal" <nirmal.patel@linux.intel.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
	linux-pci@vger.kernel.org, kvm@vger.kernel.org,
	linux-acpi@vger.kernel.org, Bagas Sanjaya <bagasdotme@gmail.com>
Subject: Re: FW: [Bug 217472] New: ACPI _OSC features have different values in Host OS and Guest OS
Date: Tue, 6 Jun 2023 18:41:24 -0500
Message-ID: <20230606234124.GA1147990@bhelgaas>
In-Reply-To: <c49ed344-f521-b4b9-8a7a-a70600002358@linux.intel.com>

[+cc Bagas]

On Thu, May 25, 2023 at 01:33:27PM -0700, Patel, Nirmal wrote:
> On 5/25/2023 1:19 PM, Patel, Nirmal wrote:
> >> On Tue, 23 May 2023 12:21:25 -0500
> >> Bjorn Helgaas <helgaas@kernel.org> wrote:
> >>> On Mon, May 22, 2023 at 04:32:03PM +0000, bugzilla-daemon@kernel.org wrote:
> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=217472
> >>>> ...  
> >>>> Created attachment 304301
> >>>>   --> https://bugzilla.kernel.org/attachment.cgi?id=304301&action=edit
> >>>> Rhel9.1_Guest_dmesg
> >>>>
> >>>> Issue:
> >>>> NVMe drives are still present after performing hotplug in the guest
> >>>> OS.  We have tested with different combinations of OSes, drives,
> >>>> and hypervisors.  The issue is present across all the OSes.
> >>>
> >>> Maybe attach the specific commands to reproduce the problem in one of 
> >>> these scenarios to the bugzilla?  I'm a virtualization noob, so I 
> >>> can't visualize all the usual pieces.
> >>>
> >>>> The following patch was added to honor the ACPI _OSC values set by
> >>>> the BIOS, and it exposed the issue in the VM/guest OS.
> >>>>
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/drivers/pci/controller/vmd.c?id=04b12ef163d10e348db664900ae7f611b83c7a0e
> >>>>
> >>>>
> >>>> I also compared the values of the parameters in the patch in
> >>>> Host and Guest OS.  The parameters with different values in
> >>>> Host and Guest OS are:
> >>>>
> >>>> native_pcie_hotplug
> >>>> native_shpc_hotplug
> >>>> native_aer
> >>>> native_ltr
> >>>>
> >>>> i.e.
> >>>> value of native_pcie_hotplug in Host OS is 1.
> >>>> value of native_pcie_hotplug in Guest OS is 0.
> >>>>
> >>>> I am not sure why "native_pcie_hotplug" is changed to 0 in the guest.
> >>>> Isn't it an _OSC-managed parameter? If that is the case, it should
> >>>> have the same value in Host and Guest OS.
> >>>
> >>> From your dmesg:
> >>>  
> >>>   DMI: Red Hat KVM/RHEL, BIOS 1.16.0-4.el9 04/01/2014
> >>>   _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
> >>>   _OSC: platform does not support [PCIeHotplug LTR DPC]
> >>>   _OSC: OS now controls [SHPCHotplug PME AER PCIeCapability]
> >>>   acpiphp: Slot [0] registered
> >>>   virtio_blk virtio3: [vda] 62914560 512-byte logical blocks (32.2 GB/30.0 GiB)
> >>>
> >>> So the DMI ("KVM/RHEL ...") is the BIOS seen by the guest.  Doesn't 
> >>> mean anything to me, but the KVM folks would know about it.  In any 
> >>> event, the guest BIOS is different from the host BIOS, so I'm not 
> >>> surprised that _OSC is different.
> >>
> >> Right, the premise of the issue, that guest and host should have
> >> the same _OSC features, is flawed.  The guest is a virtual machine
> >> that can present an entirely different feature set from the host.
> >> A software hotplug on the guest can occur without any bearing on
> >> the slot status on the host.
> >>
> >>> That guest BIOS _OSC declined to grant control of PCIe native
> >>> hotplug to the guest OS, so the guest will use acpiphp (not
> >>> pciehp, which would be used if native_pcie_hotplug were set).
> >>>
> >>> The dmesg doesn't mention the nvme driver.  Are you using
> >>> something like virtio_blk with qemu pointed at an NVMe drive?
> >>> And you hot-remove the NVMe device, but the guest OS thinks it's
> >>> still present?
> >>>
> >>> Since the guest is using acpiphp, I would think a hot-remove of
> >>> a host NVMe device should be noticed by qemu and turned into an
> >>> ACPI notification that the guest OS would consume.  But I don't
> >>> know how those connections work.
> >>
> >> If vfio-pci is involved, a cooperative hot-unplug will attempt to
> >> unbind the host driver, which triggers a device request through
> >> vfio, which is ultimately seen as a hotplug eject operation by
> >> the guest.  Surprise hotplugs of assigned devices are not
> >> supported.  There's not enough info in the bz to speculate how
> >> this VM is wired or what actions are taken.  Thanks,
> 
> Thanks Bjorn and Alex for the quick response.
> I agree with the analysis about the guest BIOS not giving control of
> PCIe native hotplug to the guest OS.
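
For reference, the _OSC lines quoted from the guest dmesg above can be
read mechanically.  The following is a minimal sketch, not kernel code:
the helper names parse_osc() and hotplug_driver() are hypothetical, and
the only rule it encodes is the one stated in this thread (pciehp if
the OS was granted PCIeHotplug, acpiphp otherwise).  The kernel's real
decision involves more state than these three log lines.

```python
import re

# Matches the three _OSC summary lines as they appear in the quoted dmesg,
# e.g. "_OSC: OS now controls [SHPCHotplug PME AER PCIeCapability]".
OSC_LINE = re.compile(
    r"_OSC: (?P<kind>OS supports|platform does not support|OS now controls)"
    r" \[(?P<features>[^\]]*)\]"
)


def parse_osc(dmesg_text):
    """Collect the feature names from each kind of _OSC line (hypothetical helper)."""
    result = {
        "OS supports": set(),
        "platform does not support": set(),
        "OS now controls": set(),
    }
    for match in OSC_LINE.finditer(dmesg_text):
        result[match.group("kind")].update(match.group("features").split())
    return result


def hotplug_driver(osc):
    """Simplified rule from the thread: pciehp only if the OS controls PCIeHotplug."""
    return "pciehp" if "PCIeHotplug" in osc["OS now controls"] else "acpiphp"


# The _OSC lines quoted from the RHEL 9.1 guest dmesg in this thread.
GUEST_DMESG = """\
_OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI EDR HPX-Type3]
_OSC: platform does not support [PCIeHotplug LTR DPC]
_OSC: OS now controls [SHPCHotplug PME AER PCIeCapability]
"""

osc = parse_osc(GUEST_DMESG)
print(hotplug_driver(osc))  # prints "acpiphp"
```

Running the same sketch over a host dmesg would make it easy to see
which features each firmware granted, without assuming the two sets
should match.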

Can I back up and try to understand the problem better?  I'm sure I'm
asking dumb questions, so please correct me:

  - Can you add more details in the bz about what you're doing and
    what is failing?

  - I have the impression that the hotplug worked before 04b12ef163d1
    ("PCI: vmd: Honor ACPI _OSC on PCIe features") but fails after?

  - Can you attach dmesg logs from before and after 04b12ef163d1?

  - What sort of virtualized guest is this?  qemu?

  - How is the NVMe drive passed to the guest?  vfio-pci?

  - Apparently the problem is with a hot-remove in the guest?  How are
    you doing this?  Sysfs "remove" file?  qemu "device_del"?

  - I assume this hot-remove is only from the *guest* and there's no
    hotplug event for the *host*?

> Adding some background about the patch f611b83c7a0e PCI: vmd: Honor
> ACPI _OSC on PCIe features.

Tangent, "f611b83c7a0e" is not a valid SHA1, so I was lost for a minute :)
I guess you're referring to 04b12ef163d10e348db664900ae7f611b83c7a0e,
where f611b83c7a0e is at the *end* of the SHA1.  You can abbreviate
it, but you have to quote the *beginning*, not the end.  E.g., the
conventional style would be 04b12ef163d1 ("PCI: vmd: Honor ACPI _OSC
on PCIe features").
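
To illustrate the point about abbreviations, here is a small sketch
using the two strings from this thread.  The function names
is_valid_abbrev() and cite() are hypothetical; the 12-character length
follows the usual kernel citation convention.  It only checks the
prefix property and does not consult a git repository, so it says
nothing about whether an abbreviation is unique.

```python
import re

FULL_SHA1 = "04b12ef163d10e348db664900ae7f611b83c7a0e"
SUBJECT = "PCI: vmd: Honor ACPI _OSC on PCIe features"


def is_valid_abbrev(abbrev, full_sha1):
    """An abbreviation must be lowercase hex and match the *start* of the full SHA1."""
    return bool(re.fullmatch(r"[0-9a-f]+", abbrev)) and full_sha1.startswith(abbrev)


def cite(full_sha1, subject, length=12):
    """Format a commit in the conventional style: abbreviated SHA1 plus quoted subject."""
    return f'{full_sha1[:length]} ("{subject}")'


print(is_valid_abbrev("f611b83c7a0e", FULL_SHA1))  # False: taken from the end
print(is_valid_abbrev("04b12ef163d1", FULL_SHA1))  # True: taken from the beginning
print(cite(FULL_SHA1, SUBJECT))
```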

Bjorn
