From: "Michael S. Tsirkin" <mst@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Cc: virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device
Date: Sat, 17 Jul 2021 16:46:39 -0400 [thread overview]
Message-ID: <20210717164152-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20210717074258.1463313-5-parav@nvidia.com>
On Sat, Jul 17, 2021 at 10:42:58AM +0300, Parav Pandit wrote:
> When a virtio pci device undergo surprise removal (aka async removaln in
typo
> PCIe spec), mark the device is broken so that any upper layer drivers can
> abort any outstanding operation.
>
> When a virtio net pci device undergo surprise removal which is used by a
> NetworkManager, a below call trace was observed.
>
> kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
> watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
> CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S W I L 5.13.0-hotplug+ #8
> Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
> Workqueue: events linkwatch_event
> RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
> Call Trace:
> virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
> ? __hw_addr_create_ex+0x85/0xc0
> __dev_mc_add+0x72/0x80
> igmp6_group_added+0xa7/0xd0
> ipv6_mc_up+0x3c/0x60
> ipv6_find_idev+0x36/0x80
> addrconf_add_dev+0x1e/0xa0
> addrconf_dev_config+0x71/0x130
> addrconf_notify+0x1f5/0xb40
> ? rtnl_is_locked+0x11/0x20
> ? __switch_to_asm+0x42/0x70
> ? finish_task_switch+0xaf/0x2c0
> ? raw_notifier_call_chain+0x3e/0x50
> raw_notifier_call_chain+0x3e/0x50
> netdev_state_change+0x67/0x90
> linkwatch_do_dev+0x3c/0x50
> __linkwatch_run_queue+0xd2/0x220
> linkwatch_event+0x21/0x30
> process_one_work+0x1c8/0x370
> worker_thread+0x30/0x380
> ? process_one_work+0x370/0x370
> kthread+0x118/0x140
> ? set_kthread_struct+0x40/0x40
> ret_from_fork+0x1f/0x30
>
> Hence, add the ability to abort the command on surprise removal
> which prevents infinite loop and system lockup.
>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
OK that's nice, but I suspect this is not enough.
For example we need to also fix up all config space reads
to handle all-ones patterns specially.
E.g.
/* After writing 0 to device_status, the driver MUST wait for a read of
* device_status to return 0 before reinitializing the device.
* This will flush out the status write, and flush in device writes,
* including MSI-X interrupts, if any.
*/
while (vp_modern_get_status(mdev))
msleep(1);
lots of code in drivers needs to be fixed too.
I guess we need to annotate drivers somehow to mark up
whether they support surprise removal? And maybe find a
way to let host know?
> ---
> drivers/virtio/virtio_pci_common.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 222d630c41fc..b35bb2d57f62 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -576,6 +576,13 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> struct device *dev = get_device(&vp_dev->vdev.dev);
>
> + /*
> + * Device is marked broken on surprise removal so that virtio upper
> + * layers can abort any ongoing operation.
> + */
> + if (!pci_device_is_present(pci_dev))
> + virtio_break_device(&vp_dev->vdev);
> +
> pci_disable_sriov(pci_dev);
>
> unregister_virtio_device(&vp_dev->vdev);
> --
> 2.27.0
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
next prev parent reply other threads:[~2021-07-17 20:46 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-17 7:42 [PATCH 0/4] virtio short improvements Parav Pandit via Virtualization
2021-07-17 7:42 ` [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization Parav Pandit via Virtualization
2021-07-17 20:38 ` Michael S. Tsirkin
2021-07-19 5:26 ` Parav Pandit via Virtualization
2021-07-19 12:04 ` Michael S. Tsirkin
2021-07-19 14:20 ` Parav Pandit via Virtualization
2021-07-17 7:42 ` [PATCH 2/4] virtio: Keep vring_del_virtqueue() mirror of VQ create Parav Pandit via Virtualization
2021-07-17 7:42 ` [PATCH 3/4] virtio: Protect vqs list access Parav Pandit via Virtualization
2021-07-17 20:41 ` Michael S. Tsirkin
2021-07-19 5:37 ` Parav Pandit via Virtualization
2021-07-17 7:42 ` [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device Parav Pandit via Virtualization
2021-07-17 20:46 ` Michael S. Tsirkin [this message]
2021-07-19 5:44 ` Parav Pandit via Virtualization
2021-07-19 12:07 ` Michael S. Tsirkin
2021-07-19 14:15 ` Parav Pandit via Virtualization
2021-07-19 9:40 ` Cornelia Huck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210717164152-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=parav@nvidia.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.