From: Cornelia Huck <cohuck@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>, Parav Pandit <parav@nvidia.com>
Cc: virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device
Date: Mon, 19 Jul 2021 11:40:08 +0200 [thread overview]
Message-ID: <878s22bq13.fsf@redhat.com> (raw)
In-Reply-To: <20210717164152-mutt-send-email-mst@kernel.org>
On Sat, Jul 17 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Sat, Jul 17, 2021 at 10:42:58AM +0300, Parav Pandit wrote:
>> When a virtio pci device undergo surprise removal (aka async removaln in
>
> typo
>
>> PCIe spec), mark the device is broken so that any upper layer drivers can
>> abort any outstanding operation.
>>
>> When a virtio net pci device undergo surprise removal which is used by a
>> NetworkManager, a below call trace was observed.
>>
>> kernel:watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/1:1:27059]
>> watchdog: BUG: soft lockup - CPU#1 stuck for 52s! [kworker/1:1:27059]
>> CPU: 1 PID: 27059 Comm: kworker/1:1 Tainted: G S W I L 5.13.0-hotplug+ #8
>> Hardware name: Dell Inc. PowerEdge R640/0H28RR, BIOS 2.9.4 11/06/2020
>> Workqueue: events linkwatch_event
>> RIP: 0010:virtnet_send_command+0xfc/0x150 [virtio_net]
>> Call Trace:
>> virtnet_set_rx_mode+0xcf/0x2a7 [virtio_net]
>> ? __hw_addr_create_ex+0x85/0xc0
>> __dev_mc_add+0x72/0x80
>> igmp6_group_added+0xa7/0xd0
>> ipv6_mc_up+0x3c/0x60
>> ipv6_find_idev+0x36/0x80
>> addrconf_add_dev+0x1e/0xa0
>> addrconf_dev_config+0x71/0x130
>> addrconf_notify+0x1f5/0xb40
>> ? rtnl_is_locked+0x11/0x20
>> ? __switch_to_asm+0x42/0x70
>> ? finish_task_switch+0xaf/0x2c0
>> ? raw_notifier_call_chain+0x3e/0x50
>> raw_notifier_call_chain+0x3e/0x50
>> netdev_state_change+0x67/0x90
>> linkwatch_do_dev+0x3c/0x50
>> __linkwatch_run_queue+0xd2/0x220
>> linkwatch_event+0x21/0x30
>> process_one_work+0x1c8/0x370
>> worker_thread+0x30/0x380
>> ? process_one_work+0x370/0x370
>> kthread+0x118/0x140
>> ? set_kthread_struct+0x40/0x40
>> ret_from_fork+0x1f/0x30
>>
>> Hence, add the ability to abort the command on surprise removal
>> which prevents infinite loop and system lockup.
>>
>> Signed-off-by: Parav Pandit <parav@nvidia.com>
>
> OK that's nice, but I suspect this is not enough.
> For example we need to also fix up all config space reads
> to handle all-ones patterns specially.
>
> E.g.
>
> /* After writing 0 to device_status, the driver MUST wait for a read of
> * device_status to return 0 before reinitializing the device.
> * This will flush out the status write, and flush in device writes,
> * including MSI-X interrupts, if any.
> */
> while (vp_modern_get_status(mdev))
> msleep(1);
>
> lots of code in drivers needs to be fixed too.
>
> I guess we need to annotate drivers somehow to mark up
> whether they support surprise removal? And maybe find a
> way to let host know?
I'm wondering whether virtio-pci surprise removal would need more
support in drivers than virtio-ccw surprise removal; given that
virtio-ccw *only* supports surprise removal and I don't remember any
problem reports, the situation is probably not that bad.
Is surprise removal of block devices still a big problem? We have some
support for (non-virtio) ccw devices (e.g. dasd) via a 'disconnected'
state that was designed to mitigate problems with block devices that are
suddenly gone.
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
prev parent reply other threads:[~2021-07-19 9:40 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-07-17 7:42 [PATCH 0/4] virtio short improvements Parav Pandit via Virtualization
2021-07-17 7:42 ` [PATCH 1/4] virtio: Improve vq->broken access to avoid any compiler optimization Parav Pandit via Virtualization
2021-07-17 20:38 ` Michael S. Tsirkin
2021-07-19 5:26 ` Parav Pandit via Virtualization
2021-07-19 12:04 ` Michael S. Tsirkin
2021-07-19 14:20 ` Parav Pandit via Virtualization
2021-07-17 7:42 ` [PATCH 2/4] virtio: Keep vring_del_virtqueue() mirror of VQ create Parav Pandit via Virtualization
2021-07-17 7:42 ` [PATCH 3/4] virtio: Protect vqs list access Parav Pandit via Virtualization
2021-07-17 20:41 ` Michael S. Tsirkin
2021-07-19 5:37 ` Parav Pandit via Virtualization
2021-07-17 7:42 ` [PATCH 4/4] virtio_pci: Support surprise removal of virtio pci device Parav Pandit via Virtualization
2021-07-17 20:46 ` Michael S. Tsirkin
2021-07-19 5:44 ` Parav Pandit via Virtualization
2021-07-19 12:07 ` Michael S. Tsirkin
2021-07-19 14:15 ` Parav Pandit via Virtualization
2021-07-19 9:40 ` Cornelia Huck [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=878s22bq13.fsf@redhat.com \
--to=cohuck@redhat.com \
--cc=mst@redhat.com \
--cc=parav@nvidia.com \
--cc=virtualization@lists.linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).