qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Leonardo Bras <leobras@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Juan Quintela <quintela@redhat.com>,
	qemu-devel@nongnu.org
Subject: Re: [PATCH v2 1/1] pcie: Add hotplug detect state register to cmask
Date: Thu, 6 Jul 2023 10:35:50 -0400	[thread overview]
Message-ID: <ZKbRRt8ESGsMz+o7@x1n> (raw)
In-Reply-To: <20230706045546.593605-3-leobras@redhat.com>

On Thu, Jul 06, 2023 at 01:55:47AM -0300, Leonardo Bras wrote:
> When trying to migrate a machine type pc-q35-6.0 or lower, with this
> cmdline options,
> 
> -device driver=pcie-root-port,port=18,chassis=19,id=pcie-root-port18,bus=pcie.0,addr=0x12 \
> -device driver=nec-usb-xhci,p2=4,p3=4,id=nex-usb-xhci0,bus=pcie-root-port18,addr=0x12.0x1
> 
> the following bug happens after all ram pages were sent:
> 
> qemu-kvm: get_pci_config_device: Bad config data: i=0x6e read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
> qemu-kvm: Failed to load PCIDevice:config
> qemu-kvm: Failed to load pcie-root-port:parent_obj.parent_obj.parent_obj
> qemu-kvm: error while loading state for instance 0x0 of device '0000:00:12.0/pcie-root-port'
> qemu-kvm: load of migration failed: Invalid argument
> 
> This happens on pc-q35-6.0 or lower because of:
> { "ICH9-LPC", ACPI_PM_PROP_ACPI_PCIHP_BRIDGE, "off" }
> 
> In this scenario, hotplug_handler_plug() calls pcie_cap_slot_plug_cb(),
> which sets dev->config byte 0x6e with bit PCI_EXP_SLTSTA_PDS to signal PCI
> hotplug for the guest. After a while the guest will deal with this hotplug
> and qemu will clear the above bit.

Do you mean that the bit will be cleared after this point for the whole
lifecycle of the VM, as long as the pcie topology doesn't change again?

"This bit indicates the presence of an adapter in the slot"

IIUC the adapter in the slot is there, why it's cleared rather than set?

> 
> Then, during migration, get_pci_config_device() will compare the
> configs of both the freshly created device and the one that is being
> received via migration, which will differ due to the PCI_EXP_SLTSTA_PDS bit
> and cause the bug to reproduce.
> 
> To avoid this fake incompatibility, there are tree fields in PCIDevice that
> can help:
> 
> - wmask: Used to implement R/W bytes, and
> - w1cmask: Used to implement RW1C(Write 1 to Clear) bytes
> - cmask: Used to enable config checks on load.
> 
> According to PCI Express® Base Specification Revision 5.0 Version 1.0,
> table 7-27 (Slot Status Register) bit 6, the "Presence Detect State" is
> listed as RO (read-only), so it only makes sense to make use of the cmask
> field.
> 
> So, clear PCI_EXP_SLTSTA_PDS bit on cmask, so the fake incompatibility on
> get_pci_config_device() does not abort the migration.

Yes, using cmask makes more sense to me, but we'd need some pci developer
to ack it at last I guess, anyway.

> 
> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2215819
> Signed-off-by: Leonardo Bras <leobras@redhat.com>

I asked the same question, and I still keep confused: whether there's a
first bad commit?  Starting from when it fails?

For example, is this broken on 6.0 binaries too with pc-q35-6.0?

Thanks,

> ---
>  hw/pci/pcie.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index b8c24cf45f..cae56bf1c8 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -659,6 +659,10 @@ void pcie_cap_slot_init(PCIDevice *dev, PCIESlot *s)
>      pci_word_test_and_set_mask(dev->w1cmask + pos + PCI_EXP_SLTSTA,
>                                 PCI_EXP_HP_EV_SUPPORTED);
>  
> +    /* Avoid migration abortion when this device hot-removed by guest */
> +    pci_word_test_and_clear_mask(dev->cmask + pos + PCI_EXP_SLTSTA,
> +                                 PCI_EXP_SLTSTA_PDS);
> +
>      dev->exp.hpev_notified = false;
>  
>      qbus_set_hotplug_handler(BUS(pci_bridge_get_sec_bus(PCI_BRIDGE(dev))),
> -- 
> 2.41.0
> 

-- 
Peter Xu



  parent reply	other threads:[~2023-07-06 14:37 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-06  4:55 [PATCH v2 1/1] pcie: Add hotplug detect state register to cmask Leonardo Bras
2023-07-06  7:37 ` Juan Quintela
2023-07-06 17:58   ` Leonardo Bras Soares Passos
2023-07-06 14:35 ` Peter Xu [this message]
2023-07-06 18:07   ` Leonardo Bras Soares Passos
2023-07-06 18:14     ` Peter Xu
2023-07-06 18:37       ` Leonardo Bras Soares Passos
2023-07-06 18:50       ` Michael S. Tsirkin
2023-07-06 19:02         ` Peter Xu
2023-07-06 20:00           ` Michael S. Tsirkin
2023-07-06 20:17             ` Peter Xu
2023-07-10 17:49             ` Leonardo Bras Soares Passos
2023-07-10 18:16               ` Michael S. Tsirkin
2023-07-10 21:48                 ` Leonardo Bras Soares Passos
2023-07-06 21:47       ` Leonardo Bras Soares Passos

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZKbRRt8ESGsMz+o7@x1n \
    --to=peterx@redhat.com \
    --cc=leobras@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).