From: "Michael S. Tsirkin" <mst@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
"virtio-dev@lists.oasis-open.org"
<virtio-dev@lists.oasis-open.org>,
Max Gurtovoy <mgurtovoy@nvidia.com>,
Shahaf Shuler <shahafs@nvidia.com>, Oren Duer <oren@nvidia.com>
Subject: Re: [PATCH v2] Add device reset timeout field
Date: Fri, 8 Oct 2021 19:09:27 -0400
Message-ID: <20211008185926-mutt-send-email-mst@kernel.org>
In-Reply-To: <PH0PR12MB54816AD1011E3806A4A26B27DCB29@PH0PR12MB5481.namprd12.prod.outlook.com>
On Fri, Oct 08, 2021 at 01:23:52PM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Friday, October 8, 2021 6:27 PM
>
> > > 2. An SR-IOV VF virtio device in our case takes a lot less than this, but may
> > > take anywhere between 10 msec and 250 msec.
> > > This can happen on firmware where the user enabled 500 SR-IOV VFs.
> > > The PCI spec indicates that all VFs are to initialize within 100 msec. This
> > > translates to 0.2 msec for one VF.
> > > In some scenarios it can be hard to initialize a VF in 0.2 msec, depending
> > > on what else the firmware is doing at that time.
> >
> > That's separate from virtio reset though. virtio reset is much lighter weight
> > than a VF reset, all it needs to do is return config space to original values and
> > stop DMA.
> Again you took the valid example of stopping the DMA of an already initialized device, while the above case is for the first init. :-)
> A virtio device goes through its first reset during initialization. It should be able to tell how long to wait.
> Device firmware may take more than 0.2 msec to finish the initialization needed to serve a virtio device.
> Today's infinite wait works here.
Looks like it's as Cornelia said - nothing to do with reset. E.g. it's
likely the device cannot even serve PCI config accesses before the init
is complete.
> The question was about the driver's wild guess of 100 msec vs 10 msec vs 0.2 msec.
> Is that enough?
So some guidance in the spec on how long it should take will address
this, I think.
> >
> > > 3. A system has one or more virtio boot devices.
> > > One of them happens to be faulty after a firmware upgrade.
> > > The pre-boot environment waits forever. Michael suggests disabling such a PCI
> > > slot by means of an abstract Ctrl+C.
> > > If the PCI slot is disabled, that device must be physically taken out for recovery.
> > > Alternatively, if the device advertised a finite timeout and did not boot, the
> > > system gives up after that timeout, and the server picks the second boot option
> > > and boots.
> > > Now a system admin can repair the faulty device without physically taking it
> > > out.
> > > Will an infinite timeout help here? Or is a device that advertises a finite
> > > timeout, letting the system recover, more useful?
> > >
> > > 4. A device was hotplugged into the system and, before it is fully probed, a
> > > hot unplug is triggered.
> >
> >
> > I don't get this one. Are you talking about surprise removal here?
> Yes.
> > The way to handle that is surely not a timeout, we should be able to test for
> > device presence.
> Yes, it should be possible to update the presence state of a device under probe when it is surprise removed.
> I will look into it more.
> However, this is not the only place timeout is used.
As in this example, I'd be worried people will rely on a timeout instead
of addressing things properly.
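To illustrate, a rough sketch of a reset wait that checks for presence
rather than relying on a timeout alone. It is Python only for brevity;
wait_for_reset(), device_present() and the two accessor callbacks are
made-up names, not an existing driver API. The one thing it relies on is
that config reads from a missing PCI function return all ones, so the
Vendor ID reads back as 0xFFFF.

```python
from typing import Callable

PCI_VENDOR_ID = 0x00       # offset of the 16-bit Vendor ID in PCI config space
VENDOR_ID_ABSENT = 0xFFFF  # reads from a missing PCI function return all ones


def device_present(read_config16: Callable[[int], int]) -> bool:
    """Presence test: a removed device reads back an all-ones Vendor ID."""
    return read_config16(PCI_VENDOR_ID) != VENDOR_ID_ABSENT


def wait_for_reset(read_status: Callable[[], int],
                   read_config16: Callable[[int], int],
                   max_polls: int = 1000) -> str:
    """Poll device_status until it reads 0, bailing out early on removal.

    read_status and read_config16 are hypothetical accessors supplied by
    the caller; max_polls only bounds this illustration.
    """
    for _ in range(max_polls):
        if not device_present(read_config16):
            return "removed"   # surprise removal: stop waiting right away
        if read_status() == 0:
            return "done"      # reset completed
    return "pending"           # still resetting when the sketch gave up
```

With this shape the surprise-removal case returns on the first poll, no
matter how long a reset the device would otherwise be allowed.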
> >
> > > The device cannot respond to reset, because it's hot unplugged.
> > > The OS waits infinitely for the reset to complete.
> > > And a system component is stuck just because of one device.
> > > Would a finite timeout help to abort this operation? Yes.
> >
> > Except if it takes minutes it is not agile enough for many workloads.
> >
> > >
> > > So is a wild guess of 10 msec for all devices, or an infinite wait, the most
> > > efficient way to handle the above scenarios?
> >
> > Dunno, but as I hope you begin to see, as we start digging into actual
> > requirements, neither does a huge reset promise by the device.
>
> A finite reset timeout helps in making virtio devices more predictable to use.
>
> > How about some "keepalive" signal then? E.g. a register where each read
> > needs to respond with a different value, if it's the same then device is stuck ...
>
> A device should be out of reset, with the keepalive feature negotiated, to respond to keepalive requests from the host driver.
> Keepalive is useful after the reset + init stage.
> (Keepalive is also used by NVMe devices, similar to the device ready TIMEOUT with a granularity of 500 msec, similar to the virtio device reset timeout.)
Just to clarify, what I call keepalive here is a counter
providing a different value on each read.
This can conceivably work even before feature negotiation.
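A minimal sketch of the check I have in mind, again in Python for brevity.
Nothing like this register exists in the spec today; device_is_alive(),
read_keepalive and the two toy device classes are all made up for
illustration.

```python
from typing import Callable


def device_is_alive(read_keepalive: Callable[[], int], reads: int = 3) -> bool:
    """Return True if every read of the keepalive register differs from the
    previous one, False as soon as two consecutive reads match (stuck)."""
    previous = read_keepalive()
    for _ in range(reads):
        current = read_keepalive()
        if current == previous:
            return False
        previous = current
    return True


class HealthyDevice:
    """Toy device: the counter advances on every read, wrapping at 32 bits."""

    def __init__(self) -> None:
        self._counter = 0

    def read_keepalive(self) -> int:
        self._counter = (self._counter + 1) & 0xFFFFFFFF
        return self._counter


class StuckDevice:
    """Toy device: the register is frozen at one value."""

    def read_keepalive(self) -> int:
        return 0x5A
```

The driver only compares consecutive reads, so no feature bits are
involved and the check does not depend on how long a reset takes.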
--
MST