Discussion of the implementations of VIRTIO specification
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	"virtio-dev@lists.oasis-open.org"
	<virtio-dev@lists.oasis-open.org>,
	Max Gurtovoy <mgurtovoy@nvidia.com>,
	Shahaf Shuler <shahafs@nvidia.com>, Oren Duer <oren@nvidia.com>
Subject: Re: [PATCH v2] Add device reset timeout field
Date: Fri, 8 Oct 2021 08:57:26 -0400	[thread overview]
Message-ID: <20211008085134-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <PH0PR12MB5481270F843DEC30A49972F3DCB29@PH0PR12MB5481.namprd12.prod.outlook.com>

On Fri, Oct 08, 2021 at 12:12:35PM +0000, Parav Pandit wrote:
> 
> 
> > From: Cornelia Huck <cohuck@redhat.com>
> > Sent: Friday, October 8, 2021 5:18 PM
> > 
> > On Fri, Oct 08 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Fri, Oct 08, 2021 at 10:59:02AM +0000, Parav Pandit wrote:
> > >> It may be even a pre-boot environment where 100msec or 10msec may be
> > too short interval as other extreme of VM boot time example.
> > >
> > > I don't really know what this means. We are talking about how long it
> > > takes device to calm down and stop poking at the host after it's told
> > > to reset. 10ms worst case not enough for this?
> > 
> > To me, that sounded more like a physical device that needs to do something
> > like boot its firmware before it can perform an actual virtio operation (and
> > reset simply happens to be the first one.)
> > 
> > So, I'm getting more confused about the scope of this timeout. If it's more a
> > "device might not be ready yet" issue, I don't think we need a timeout for reset
> > specifically. Same for races with hotplugging. If it's about "reset may take some
> > time, because it will take some time before all operations have quiesced", I
> > don't see how the device can come up with a value that isn't anything other
> > than a wild guess, and the driver could do wild guessing equally well.
> 
> Device implementation has good knowledge of how a given virtio device is implemented to not do wild guess.
> I will take real world examples.
> 1. A physical virtio device backed by a firmware will take more than 10msec of boot time to respond to the reset operation.
> 
> 2. A sriov VF virtio device for our case takes a lot lesser than this, but may take anywhere between 10 msec to 250msec.
> This can happen on a firmware where user enabled 500 SR-IOV VFs.
> Pci spec indicates that all VFs to initialize within 100msec. This translates to 0.2msec for one VF.
> In some scenario this can be a hard to initialize a VF in 0.2 msec depending on what else a firmware is doing at that time.

That's separate from virtio reset though. virtio reset is
much lighter weight than a VF reset, all it needs to do is
return config space to original values and stop DMA.

> 3. A system has one or more virtio boot devices.
> One of them happens to be faulty after a firmware upgrade.
> Pre-boot env is infinitely waiting. Michael suggest to do disable such PCI slot by means of abstract Ctrl+C.
> If PCI slot is disabled, that device must be physically taken out for recovery.
> In an alternative, if device advertised a finite timeout, that device didn't boot, system gave up after finite timeout and server picked second boot option, and booted.
> Now a system admin can repair the faulty device without physically taking it out.
> Will infinite timeout help here? Or a device advertising finite timeout and recovering the system more useful?
> 
> 4. device was hotplug in system and before it is fully probed, a hot unplug is triggered.


I don't get this one. Are you talking about surprise removal here?
The way to handle that is surely not a timeout, we should be able to
test for device presence.

> Device cannot respond to reset, because its hot unplugged.
> OS waits infinitely for reset to complete.
> And system component is stuck just because of one device.
> Would a finite timeout help to abort this operation? Yes.

Except if it takes minutes it is not agile enough for many workloads.

> 
> So is wild guess of 10msec for all devices or an infinite time most efficient way to handle above scenarios?

Donnu, but as I hope you begin to see, as we start digging into
actual requirements, neither does a huge reset promise by the device.
How about some "keepalive" signal then? E.g. a register where each read
needs to respond with a different value, if it's the same then
device is stuck ...

-- 
MST


  reply	other threads:[~2021-10-08 12:57 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-06 14:10 [PATCH v2] Add device reset timeout field Parav Pandit
2021-10-06 15:22 ` Michael S. Tsirkin
2021-10-06 16:11   ` Parav Pandit
2021-10-06 20:53     ` Michael S. Tsirkin
2021-10-07  3:42       ` Parav Pandit
2021-10-07 16:10         ` [virtio-dev] " Cornelia Huck
2021-10-07 17:58           ` Parav Pandit
2021-10-08 10:00             ` [virtio-dev] " Cornelia Huck
2021-10-08 10:19               ` Parav Pandit
2021-10-08 10:12             ` Michael S. Tsirkin
2021-10-08 10:51               ` Parav Pandit
2021-10-08 11:18                 ` [virtio-dev] " Michael S. Tsirkin
2021-10-08 12:55                   ` Parav Pandit
2021-10-08 10:44         ` Michael S. Tsirkin
2021-10-08 10:59           ` Parav Pandit
2021-10-08 11:21             ` Michael S. Tsirkin
2021-10-08 11:45               ` Parav Pandit
2021-10-08 11:47               ` [virtio-dev] " Cornelia Huck
2021-10-08 12:12                 ` Parav Pandit
2021-10-08 12:57                   ` Michael S. Tsirkin [this message]
2021-10-08 13:23                     ` Parav Pandit
2021-10-08 23:09                       ` Michael S. Tsirkin
2021-10-11 14:29                         ` Parav Pandit
2021-10-11 14:59                           ` [virtio-dev] " Cornelia Huck
2021-10-11 15:44                             ` Parav Pandit
2021-10-11 16:00                               ` Michael S. Tsirkin
2021-10-12  8:51                                 ` Parav Pandit
2021-10-12  9:01                                   ` Michael S. Tsirkin
2021-10-12  9:12                                     ` Parav Pandit
2021-10-14 17:35                                       ` Parav Pandit
2021-10-14 22:28                                         ` Michael S. Tsirkin
2021-10-15  4:36                                           ` Parav Pandit
2021-10-15  5:15                                             ` [virtio-dev] " Jason Wang
2021-10-15  5:20                                               ` Parav Pandit
2021-10-15  6:40                                                 ` Jason Wang
2021-10-15  6:42                                                 ` Jason Wang
2021-10-15  6:48                                                   ` Parav Pandit
2021-10-15  7:02                                                     ` Jason Wang
2021-10-15  8:21                                                       ` Parav Pandit
2021-10-15  8:42                                                         ` Jason Wang
2021-10-22  7:20                                                           ` Parav Pandit
2021-10-25  5:41                                                             ` Jason Wang
2021-10-25  6:11                                                               ` Parav Pandit
2021-10-26  4:03                                                                 ` Jason Wang
2021-10-27  8:04                                                                   ` Parav Pandit
2021-10-27  8:26                                                                     ` Michael S. Tsirkin
2021-10-28  4:01                                                                       ` Parav Pandit
2021-10-28  5:50                                                                         ` Michael S. Tsirkin
2021-10-28  6:06                                                                           ` Parav Pandit
2021-10-15  6:51                                             ` Cornelia Huck
2021-10-15  8:09                                               ` Parav Pandit
2021-10-15  9:25                                                 ` [virtio-dev] " Cornelia Huck
2021-10-22  6:29                                                   ` Parav Pandit
2021-10-11 16:22                               ` [virtio-dev] " Cornelia Huck
2021-10-12 10:35                                 ` Parav Pandit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211008085134-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=mgurtovoy@nvidia.com \
    --cc=oren@nvidia.com \
    --cc=parav@nvidia.com \
    --cc=shahafs@nvidia.com \
    --cc=virtio-dev@lists.oasis-open.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox