From: "Michael S. Tsirkin" <mst@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
"virtio-dev@lists.oasis-open.org"
<virtio-dev@lists.oasis-open.org>,
Max Gurtovoy <mgurtovoy@nvidia.com>,
Shahaf Shuler <shahafs@nvidia.com>, Oren Duer <oren@nvidia.com>
Subject: Re: [PATCH v2] Add device reset timeout field
Date: Fri, 8 Oct 2021 19:09:27 -0400
Message-ID: <20211008185926-mutt-send-email-mst@kernel.org>
In-Reply-To: <PH0PR12MB54816AD1011E3806A4A26B27DCB29@PH0PR12MB5481.namprd12.prod.outlook.com>
On Fri, Oct 08, 2021 at 01:23:52PM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Friday, October 8, 2021 6:27 PM
>
> > > 2. An SR-IOV VF virtio device in our case takes far less than this, but may take anywhere between 10 msec and 250 msec.
> > > This can happen on firmware where the user enabled 500 SR-IOV VFs.
> > > The PCI spec indicates that all VFs are to initialize within 100 msec. This translates to 0.2 msec for one VF.
> > > In some scenarios it can be hard to initialize a VF in 0.2 msec, depending on what else the firmware is doing at that time.
> >
> > That's separate from virtio reset though. virtio reset is much lighter weight
> > than a VF reset, all it needs to do is return config space to original values and
> > stop DMA.
> Again, you took the valid example of stopping the DMA of an already initialized device, while the above case is for the first init. :-)
> A virtio device goes through its first reset during initialization. It should be able to tell how long to wait.
> Device firmware may take more than 0.2 msec to finish the initialization needed to serve a virtio device.
> The infinite wait of today works here.
Looks like it's as Cornelia said - nothing to do with reset. E.g. it's
likely the device cannot even serve PCI config accesses before the init
is complete.
> The question was about a wild guess by the driver: 100 msec vs 10 msec vs 0.2 msec.
> Is that enough?
So I think some guidance in the spec on how long it should take will
address this.
> >
> > > 3. A system has one or more virtio boot devices.
> > > One of them happens to be faulty after a firmware upgrade.
> > > The pre-boot environment waits infinitely. Michael suggests disabling such a PCI slot by means of an abstract Ctrl+C.
> > > If the PCI slot is disabled, that device must be physically taken out for recovery.
> > > As an alternative, if the device advertised a finite timeout and did not boot, the system gave up after the finite timeout, and the server picked the second boot option and booted.
> > > Now a system admin can repair the faulty device without physically taking it out.
> > > Will an infinite timeout help here? Or is a device advertising a finite timeout and recovering the system more useful?
> > >
> > > 4. A device was hotplugged into the system, and before it is fully probed, a hot unplug is triggered.
> >
> >
> > I don't get this one. Are you talking about surprise removal here?
> Yes.
> > The way to handle that is surely not a timeout, we should be able to test for
> > device presence.
> Yes, it should be possible to update the presence state of a device under probe when it is surprise removed.
> I will look into it more.
> However, this is not the only place timeout is used.
As in this example, I'd be worried people will rely on timeout instead
of addressing things properly.
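
To illustrate what testing for device presence during a reset wait could look like, here is a minimal sketch in C. It assumes that a read from a removed PCI device returns all ones, and that the driver waits for device_status to read back as 0 after writing 0 to it. The names poll_reset, status_read_fn and fake_status are hypothetical and come from neither the spec nor any driver; a real driver would more likely confirm removal through a vendor ID read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome codes for one bounded round of reset polling. */
enum reset_result { RESET_DONE, RESET_DEVICE_GONE, RESET_STILL_BUSY };

/* Hypothetical accessor returning the current device_status byte. */
typedef uint8_t (*status_read_fn)(void *ctx);

/*
 * Poll device_status after the driver has written 0 to it.
 * Assumption: an all-ones read means the device is no longer present,
 * so the caller can stop waiting without relying on a timeout.
 */
static enum reset_result poll_reset(status_read_fn read_status, void *ctx,
                                    unsigned max_polls)
{
    assert(read_status != NULL);
    for (unsigned i = 0; i < max_polls; i++) {
        uint8_t status = read_status(ctx);
        if (status == 0x00)
            return RESET_DONE;        /* reset completed */
        if (status == 0xFF)
            return RESET_DEVICE_GONE; /* surprise removal detected */
    }
    return RESET_STILL_BUSY;          /* caller decides whether to keep polling */
}

/* Simulated status register that replays a fixed sequence of values. */
struct fake_status {
    const uint8_t *seq;
    unsigned len;
    unsigned pos;
};

static uint8_t fake_status_read(void *ctx)
{
    struct fake_status *f = ctx;
    uint8_t value = f->seq[f->pos];
    if (f->pos + 1 < f->len)
        f->pos++;                     /* stay on the last value once reached */
    return value;
}
```

With this shape the removal case returns immediately, and the policy for a device that is present but still busy stays a separate decision.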
> >
> > > The device cannot respond to reset, because it's hot unplugged.
> > > The OS waits infinitely for reset to complete.
> > > And system component is stuck just because of one device.
> > > Would a finite timeout help to abort this operation? Yes.
> >
> > Except if it takes minutes it is not agile enough for many workloads.
> >
> > >
> > > So is a wild guess of 10 msec for all devices, or an infinite wait, the most efficient way to handle the above scenarios?
> >
> > Dunno, but as I hope you begin to see, as we start digging into actual
> > requirements, neither does a huge reset promise by the device.
>
> A finite reset timeout helps in making virtio devices more predictable to use.
>
> > How about some "keepalive" signal then? E.g. a register where each read
> > needs to respond with a different value, if it's the same then device is stuck ...
>
> A device must be out of reset, and the keepalive feature negotiated, before it can respond to keepalive requests from the host driver.
> Keepalive is useful after the reset + init stage.
> (Keepalive is also used by NVMe devices; NVMe also has a device ready timeout with a granularity of 500 msec, similar to the virtio device reset timeout.)
Just to clarify, what I call keepalive here is a counter
providing a different value on each read.
This could conceivably work even before feature negotiation.
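
A rough sketch of such a counter check, in C, under the assumption that the device exposes a register whose value changes on every read. The register itself, the accessor type keepalive_read_fn and the helper device_is_alive are hypothetical names used only for illustration; nothing like this exists in the spec today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessor returning the current keepalive register value. */
typedef uint32_t (*keepalive_read_fn)(void *ctx);

/*
 * Read the register repeatedly; the device counts as alive as soon as
 * two successive reads differ. If every read returns the same value,
 * the device is treated as stuck.
 */
static bool device_is_alive(keepalive_read_fn read_reg, void *ctx,
                            unsigned attempts)
{
    assert(read_reg != NULL);
    uint32_t prev = read_reg(ctx);
    for (unsigned i = 0; i < attempts; i++) {
        uint32_t cur = read_reg(ctx);
        if (cur != prev)
            return true;  /* value changed between reads: device responds */
        prev = cur;
    }
    return false;         /* same value on every read: device looks stuck */
}

/* Simulated device: the counter advances on each read unless stuck. */
struct fake_dev {
    uint32_t counter;
    bool stuck;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_dev *dev = ctx;
    if (!dev->stuck)
        dev->counter++;
    return dev->counter;
}
```

A removed PCI device would typically return all ones on every read, so under this check it would also show up as stuck.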
--
MST