qemu-devel.nongnu.org archive mirror
From: Igor Mammedov <imammedo@redhat.com>
To: Yu Zhang <yu.zhang@ionos.com>
Cc: Laurent Vivier <lvivier@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Jinpu Wang <jinpu.wang@ionos.com>,
	Elmar Gerdes <elmar.gerdes@ionos.com>
Subject: Re: an issue for device hot-unplug
Date: Tue, 4 Apr 2023 14:17:19 +0200
Message-ID: <20230404141719.1bc087c8@imammedo.users.ipa.redhat.com>
In-Reply-To: <CAHEcVy5SV34jaubY5F-q=H+smvMVOzKbb=rTaNJDNXyGdFaLZg@mail.gmail.com>

On Mon, 3 Apr 2023 15:24:43 +0200
Yu Zhang <yu.zhang@ionos.com> wrote:

> Dear Laurent,
> 
> recently we ran into an issue with the following error:
> 
> command '{ "execute": "device_del", "arguments": { "id": "virtio-diskX" }
> }' for VM "id" failed ({ "return": {"class": "GenericError", "desc":
> "Device virtio-diskX is already in the process of unplug"} }).
> 
> The issue is reproducible. With a delay of a few seconds before the
> hot-unplug, the hot-unplug works fine.
> 
> After some digging, we found that commit 9323f892b39 may have introduced
> the issue.
> ------------------
>     failover: fix unplug pending detection
> 
>     Failover needs to detect the end of the PCI unplug to start migration
>     after the VFIO card has been unplugged.
> 
>     To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and
>     reset in pcie_unplug_device().
> 
>     But since
>         17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
>     we have switched to ACPI unplug and these functions are not called
>     anymore and the flag not set. So failover migration is not able to
>     detect if card is really unplugged and acts as it's done as soon as
>     it's started. So it doesn't wait the end of the unplug to start the
>     migration. We don't see any problem when we test that because ACPI
>     unplug is faster than PCIe native hotplug and when the migration
>     really starts the unplug operation is already done.
> 
>     See c000a9bd06ea ("pci: mark device having guest unplug request pending")
>         a99c4da9fc2a ("pci: mark devices partially unplugged")
> 
>     Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>     Reviewed-by: Ani Sinha <ani@anisinha.ca>
>     Message-Id: <20211118133225.324937-4-lvivier@redhat.com>
>     Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>     Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> ------------------
> The purpose is to detect the end of the PCI device hot-unplug. However,

Unplug is an asynchronous process. Issuing multiple unplug requests and
waiting for a 'not found' error is hardly a sane way to detect that the
device has been unplugged.
Instead of swamping the guest with unplug requests (each of which leads to
a hardware interrupt), you should wait for the DEVICE_DELETED QMP event.
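
A minimal sketch of that approach in Python follows: send device_del once,
then read the QMP stream until the matching DEVICE_DELETED event shows up.
It is an illustration under stated assumptions, not code from QEMU or
libvirt: the function names, the UNIX socket path argument and the timeout
are hypothetical, and it assumes the monitor emits one JSON object per line
(the default, non-pretty QMP output).

```python
import json
import socket


def wait_for_device_deleted(qmp_lines, device_id):
    """Read QMP messages (one JSON object per line) until a DEVICE_DELETED
    event for device_id arrives, and return that event."""
    for line in qmp_lines:
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)
        if "error" in msg:
            # A command (e.g. device_del) was rejected by the monitor.
            raise RuntimeError(msg["error"].get("desc", "QMP error"))
        if (msg.get("event") == "DEVICE_DELETED"
                and msg.get("data", {}).get("device") == device_id):
            return msg
        # Anything else ({"return": {}}, unrelated events) is skipped.
    raise ConnectionError("QMP stream ended before DEVICE_DELETED arrived")


def unplug_and_wait(sock_path, device_id, timeout=30.0):
    """Issue device_del exactly once, then block until DEVICE_DELETED.

    sock_path is assumed to be a QMP UNIX socket; a socket timeout
    (socket.timeout) is raised if the guest does not react in time.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(sock_path)
        stream = sock.makefile("rw", encoding="utf-8", newline="\n")
        json.loads(stream.readline())  # consume the QMP greeting
        for cmd in ({"execute": "qmp_capabilities"},
                    {"execute": "device_del",
                     "arguments": {"id": device_id}}):
            stream.write(json.dumps(cmd) + "\n")
            stream.flush()
        return wait_for_device_deleted(stream, device_id)
```

Hypothetical usage: unplug_and_wait("/path/to/qmp.sock", "virtio-diskX").
Matching on the "device" field skips DEVICE_DELETED events for other
devices as well as events that carry only a "path".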

> we find the error confusing. How is it possible that a disk "is already in
> the process of unplug" during the first hot-unplug attempt? As far as I
> know, the issue was also encountered by libvirt, but they simply ignored it:
> 
>     https://bugzilla.redhat.com/show_bug.cgi?id=1878659
> 
> Hence, a question is: should we have the line below in
> acpi_pcihp_device_unplug_request_cb()?
> 
>    pdev->qdev.pending_deleted_event = true;

Comment 15 in the above BZ describes how we could get rid of this line,
but also see comment 17.
(In a nutshell: you get the error because the device hasn't been removed yet.)
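
To make the sequence concrete, here is a toy Python model of the behaviour
described in this thread. It is not QEMU code: the ToyMachine and ToyDevice
classes and the guest_ejects() method are hypothetical. Only the
pending_deleted_event flag name and the error text are taken from the
messages quoted above.

```python
class ToyDevice:
    def __init__(self, dev_id):
        self.dev_id = dev_id
        # Set when an unplug request has been forwarded to the guest.
        self.pending_deleted_event = False


class ToyMachine:
    """Toy model: device_del only *requests* the unplug; the device stays
    present until the guest has actually ejected it."""

    def __init__(self):
        self.devices = {}

    def plug(self, dev_id):
        self.devices[dev_id] = ToyDevice(dev_id)

    def device_del(self, dev_id):
        dev = self.devices.get(dev_id)
        if dev is None:
            raise LookupError(f"Device '{dev_id}' not found")
        if dev.pending_deleted_event:
            # An earlier request is still in flight: the guest has not
            # finished removing the device yet.
            raise RuntimeError(
                f"Device {dev_id} is already in the process of unplug")
        dev.pending_deleted_event = True

    def guest_ejects(self, dev_id):
        """The guest completes the unplug; this is the point at which a
        DEVICE_DELETED event would be emitted."""
        del self.devices[dev_id]
        return {"event": "DEVICE_DELETED", "data": {"device": dev_id}}
```

In this model a second device_del issued before guest_ejects() fails with
the "already in the process of unplug" error, and one issued afterwards
fails with "not found", which is why polling with repeated device_del
calls only reports the in-flight state rather than completion.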
 
> 
> It would be great if you as the author could give us a few hints.
> 
> Thank you very much for your reply!
> 
> Sincerely,
> 
> Yu Zhang @ Compute Platform IONOS
> 03.04.2023



