Re: [PATCH] [RFC] virtio: Limit the retries on a virtio device reset

From: Pierre Morel <pmorel@linux.vnet.ibm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: cohuck@redhat.com, virtualization@lists.linux-foundation.org
Subject: Re: [PATCH] [RFC] virtio: Limit the retries on a virtio device reset
Date: Fri, 25 Aug 2017 10:33:57 +0200
Message-ID: <d271e548-efd2-5315-c406-a32fad838e87@linux.vnet.ibm.com>
In-Reply-To: <20170825001922-mutt-send-email-mst@kernel.org>

On 24/08/2017 23:23, Michael S. Tsirkin wrote:
> On Thu, Aug 24, 2017 at 07:42:07PM +0200, Pierre Morel wrote:
>> On 24/08/2017 16:19, Michael S. Tsirkin wrote:
>>> On Wed, Aug 23, 2017 at 06:33:02PM +0200, Pierre Morel wrote:
>>>> Resetting a device can sometimes fail, even a virtual device.
>>>> If the device has not been reset after a while, the driver should
>>>> abandon the retries.
>>>> This is the change proposed for the modern virtio_pci.
>>>>
>>>> More generally, when this happens, the virtio driver can set the
>>>> VIRTIO_CONFIG_S_FAILED status flag to notify the caller.
>>>>
>>>> The virtio core can test if the reset was successful by testing
>>>> this flag after a reset.
>>>>
>>>> This behavior is backward compatible with existing drivers.
>>>> This behavior seems to me compatible with the Virtio-1.0 specification,
>>>> chapter 2.1 Device Status Field.
>>>> Here I definitely need your opinion: is it right?
>>>>
>>>> This patch also leads to another question:
>>>> do we care if a device provided by the hypervisor is buggy?
>>>>
>>>> Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
>>>
>>> So I think this is not the best place to start to add error recovery.
>>
>> I agree, there cannot be any error recovery there.
>> If the reset does not work we can drop the device until the next reset of the
>> hypervisor.
> 
> On probe, yes. But failures are more likely to trigger at other times.

OK, what about:
- On probe: if the reset fails, the probe fails.

- On freeze and remove: we cannot free resources that are shared
	with the device, at least the queues.
	... we can only signal the error and give up on the device.
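
To make the probe case concrete, here is a rough userspace sketch, not the
actual patch. Everything named fake_* is hypothetical and only models a
status register that reads non-zero for a number of polls after a reset
request. The reset returns from inside the loop as soon as the status reads
0, because testing the counter after a "while (status && retry_count--)"
loop would also report a failure when the device clears its status on the
very last poll.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the device_status register: after a reset
 * request it reads non-zero for `busy_reads` reads, then 0.  A negative
 * value models a device that never completes the reset. */
struct fake_dev {
	uint8_t status;
	int busy_reads;
};

static uint8_t fake_read_status(struct fake_dev *d)
{
	if (d->busy_reads == 0)
		d->status = 0;		/* reset completed */
	else if (d->busy_reads > 0)
		d->busy_reads--;	/* still busy, count down */
	return d->status;
}

/* Request a reset and poll the status at most max_polls times.
 * Returns 0 on success, -ETIMEDOUT if the status never read back as 0. */
static int fake_reset(struct fake_dev *d, int max_polls)
{
	d->status = 0xff;	/* pretend the device is busy resetting */
	for (int i = 0; i < max_polls; i++) {
		if (fake_read_status(d) == 0)
			return 0;
		/* a real driver would sleep here, e.g. msleep(1) */
	}
	return -ETIMEDOUT;
}

/* Probe path: if the reset fails, the probe fails. */
static int fake_probe(struct fake_dev *d)
{
	int err = fake_reset(d, 10);

	if (err)
		return err;
	d->status |= 0x01;	/* VIRTIO_CONFIG_S_ACKNOWLEDGE */
	return 0;
}
```

With this shape the caller gets an error code directly and does not need to
read a status flag back from a device that may no longer answer.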

> 
>>> It should be much more common to have a situation where device gets
>>> broken while it's being used.  Spec has a NEEDS_RESET flag for this.
>>
>> Yes, the device side can set this flag, but that is another problem; it
>> assumes that:
>> - the transport, device side, still works.
>> - it is able to detect that the device needs a reset
>> - a reset is effective
> 
> Right. OTOH in this case there's more we can do.

Yes. I did not find a single test of this flag (NEEDS_RESET),
even though QEMU sets it quite often (through virtio_error()).

The decision to reset the device must come from the driver.
The protocol to reset the device is device/driver specific... a lot of work.

Shouldn't it be separate from the "reset failed" problem?
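
As a rough illustration of keeping the two cases apart, a driver-side test
of the status byte could look like the sketch below. The status bit values
are the ones from the virtio specification; the enum and classify_status()
are hypothetical names used only for this example, and the real recovery
would still be device/driver specific.

```c
#include <stdint.h>

/* Device status bits as defined by the virtio specification. */
#define VIRTIO_CONFIG_S_DRIVER_OK	0x04
#define VIRTIO_CONFIG_S_NEEDS_RESET	0x40
#define VIRTIO_CONFIG_S_FAILED		0x80

/* Hypothetical classification of what a driver could do next. */
enum dev_action {
	DEV_CONTINUE,		/* nothing wrong reported, keep going */
	DEV_DRIVER_RESET,	/* device asked for a reset: driver-specific recovery */
	DEV_GIVE_UP,		/* FAILED is set, e.g. after a reset that did not complete */
};

static enum dev_action classify_status(uint8_t status)
{
	/* FAILED wins: there is nothing left to recover. */
	if (status & VIRTIO_CONFIG_S_FAILED)
		return DEV_GIVE_UP;
	if (status & VIRTIO_CONFIG_S_NEEDS_RESET)
		return DEV_DRIVER_RESET;
	return DEV_CONTINUE;
}
```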

Regards,

Pierre

> 
> 
>>>
>>> I think we should start by coding up that support in all virtio drivers.
>>>
>>> As a next step, we can add more code to detect unexpected behaviour by
>>> the host and mark device as broken. Then we can do more things by
>>> looking at the broken flag.
>>
>> It seems difficult to me.
>> But maybe I jumped too quickly to the conclusion that there is nothing to do.
>> I am still thinking about it.
>>
>> Best regards
>>
>> Pierre
>>
>>>
>>>
>>>> ---
>>>>    drivers/virtio/virtio.c            |  4 ++++
>>>>    drivers/virtio/virtio_pci_modern.c | 11 ++++++++++-
>>>>    2 files changed, 14 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
>>>> index 48230a5..6255dc4 100644
>>>> --- a/drivers/virtio/virtio.c
>>>> +++ b/drivers/virtio/virtio.c
>>>> @@ -324,6 +324,8 @@ int register_virtio_device(struct virtio_device *dev)
>>>>    	/* We always start by resetting the device, in case a previous
>>>>    	 * driver messed it up.  This also tests that code path a little. */
>>>>    	dev->config->reset(dev);
>>>> +	if (dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED)
>>>> +		return -EIO;
>>>>    	/* Acknowledge that we've seen the device. */
>>>>    	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>>>> @@ -373,6 +375,8 @@ int virtio_device_restore(struct virtio_device *dev)
>>>>    	/* We always start by resetting the device, in case a previous
>>>>    	 * driver messed it up. */
>>>>    	dev->config->reset(dev);
>>>> +	if (dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED)
>>>> +		return -EIO;
>>>>    	/* Acknowledge that we've seen the device. */
>>>>    	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>>>> diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c
>>>> index 2555d80..bfc5fc1 100644
>>>> --- a/drivers/virtio/virtio_pci_modern.c
>>>> +++ b/drivers/virtio/virtio_pci_modern.c
>>>> @@ -270,6 +270,7 @@ static void vp_set_status(struct virtio_device *vdev, u8 status)
>>>>    static void vp_reset(struct virtio_device *vdev)
>>>>    {
>>>>    	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
>>>> +	int retry_count = 10;
>>>>    	/* 0 status means a reset. */
>>>>    	vp_iowrite8(0, &vp_dev->common->device_status);
>>>>    	/* After writing 0 to device_status, the driver MUST wait for a read of
>>>> @@ -277,8 +278,16 @@ static void vp_reset(struct virtio_device *vdev)
>>>>    	 * This will flush out the status write, and flush in device writes,
>>>>    	 * including MSI-X interrupts, if any.
>>>>    	 */
>>>> -	while (vp_ioread8(&vp_dev->common->device_status))
>>>> +	while (vp_ioread8(&vp_dev->common->device_status) && retry_count--)
>>>>    		msleep(1);
>>>> +	/* If the read did not return 0 before the timeout consider that
>>>> +	 * the device failed.
>>>> +	 */
>>>> +	if (retry_count <= 0) {
>>>> +		virtio_add_status(vdev, VIRTIO_CONFIG_S_FAILED);
>>>> +		return;
>>>> +	}
>>>> +	virtio_add_status(vdev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>>>>    	/* Flush pending VQ/configuration callbacks. */
>>>>    	vp_synchronize_vectors(vdev);
>>>>    }
>>>> -- 
>>>> 2.3.0
>>>
>>
>>
>> -- 
>> Pierre Morel
>> Linux/KVM/QEMU in Böblingen - Germany
> 

-- 
Pierre Morel
Linux/KVM/QEMU in Böblingen - Germany
