All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Cc: Cornelia Huck <cohuck@redhat.com>,
	"virtio-dev@lists.oasis-open.org"
	<virtio-dev@lists.oasis-open.org>,
	Max Gurtovoy <mgurtovoy@nvidia.com>,
	Shahaf Shuler <shahafs@nvidia.com>, Oren Duer <oren@nvidia.com>
Subject: Re: [PATCH v2] Add device reset timeout field
Date: Fri, 8 Oct 2021 08:57:26 -0400	[thread overview]
Message-ID: <20211008085134-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <PH0PR12MB5481270F843DEC30A49972F3DCB29@PH0PR12MB5481.namprd12.prod.outlook.com>

On Fri, Oct 08, 2021 at 12:12:35PM +0000, Parav Pandit wrote:
> 
> 
> > From: Cornelia Huck <cohuck@redhat.com>
> > Sent: Friday, October 8, 2021 5:18 PM
> > 
> > On Fri, Oct 08 2021, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> > > On Fri, Oct 08, 2021 at 10:59:02AM +0000, Parav Pandit wrote:
> > >> It may be even a pre-boot environment where 100msec or 10msec may be
> > too short interval as other extreme of VM boot time example.
> > >
> > > I don't really know what this means. We are talking about how long it
> > > takes device to calm down and stop poking at the host after it's told
> > > to reset. 10ms worst case not enough for this?
> > 
> > To me, that sounded more like a physical device that needs to do something
> > like boot its firmware before it can perform an actual virtio operation (and
> > reset simply happens to be the first one.)
> > 
> > So, I'm getting more confused about the scope of this timeout. If it's more a
> > "device might not be ready yet" issue, I don't think we need a timeout for reset
> > specifically. Same for races with hotplugging. If it's about "reset may take some
> > time, because it will take some time before all operations have quiesced", I
> > don't see how the device can come up with a value that isn't anything other
> > than a wild guess, and the driver could do wild guessing equally well.
> 
> Device implementation has good knowledge of how a given virtio device is implemented to not do wild guess.
> I will take real world examples.
> 1. A physical virtio device backed by a firmware will take more than 10msec of boot time to respond to the reset operation.
> 
> 2. A sriov VF virtio device for our case takes a lot lesser than this, but may take anywhere between 10 msec to 250msec.
> This can happen on a firmware where user enabled 500 SR-IOV VFs.
> Pci spec indicates that all VFs to initialize within 100msec. This translates to 0.2msec for one VF.
> In some scenario this can be a hard to initialize a VF in 0.2 msec depending on what else a firmware is doing at that time.

That's separate from virtio reset though. virtio reset is
much lighter weight than a VF reset, all it needs to do is
return config space to original values and stop DMA.

> 3. A system has one or more virtio boot devices.
> One of them happens to be faulty after a firmware upgrade.
> Pre-boot env is infinitely waiting. Michael suggest to do disable such PCI slot by means of abstract Ctrl+C.
> If PCI slot is disabled, that device must be physically taken out for recovery.
> In an alternative, if device advertised a finite timeout, that device didn't boot, system gave up after finite timeout and server picked second boot option, and booted.
> Now a system admin can repair the faulty device without physically taking it out.
> Will infinite timeout help here? Or a device advertising finite timeout and recovering the system more useful?
> 
> 4. device was hotplug in system and before it is fully probed, a hot unplug is triggered.


I don't get this one. Are you talking about surprise removal here?
The way to handle that is surely not a timeout, we should be able to
test for device presence.

> Device cannot respond to reset, because its hot unplugged.
> OS waits infinitely for reset to complete.
> And system component is stuck just because of one device.
> Would a finite timeout help to abort this operation? Yes.

Except if it takes minutes it is not agile enough for many workloads.

> 
> So is wild guess of 10msec for all devices or an infinite time most efficient way to handle above scenarios?

Donnu, but as I hope you begin to see, as we start digging into
actual requirements, neither does a huge reset promise by the device.
How about some "keepalive" signal then? E.g. a register where each read
needs to respond with a different value, if it's the same then
device is stuck ...

-- 
MST


  reply	other threads:[~2021-10-08 12:57 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-06 14:10 [PATCH v2] Add device reset timeout field Parav Pandit
2021-10-06 15:22 ` Michael S. Tsirkin
2021-10-06 16:11   ` Parav Pandit
2021-10-06 20:53     ` Michael S. Tsirkin
2021-10-07  3:42       ` Parav Pandit
2021-10-07 16:10         ` [virtio-dev] " Cornelia Huck
2021-10-07 17:58           ` Parav Pandit
2021-10-08 10:00             ` [virtio-dev] " Cornelia Huck
2021-10-08 10:19               ` Parav Pandit
2021-10-08 10:12             ` Michael S. Tsirkin
2021-10-08 10:51               ` Parav Pandit
2021-10-08 11:18                 ` [virtio-dev] " Michael S. Tsirkin
2021-10-08 12:55                   ` Parav Pandit
2021-10-08 10:44         ` Michael S. Tsirkin
2021-10-08 10:59           ` Parav Pandit
2021-10-08 11:21             ` Michael S. Tsirkin
2021-10-08 11:45               ` Parav Pandit
2021-10-08 11:47               ` [virtio-dev] " Cornelia Huck
2021-10-08 12:12                 ` Parav Pandit
2021-10-08 12:57                   ` Michael S. Tsirkin [this message]
2021-10-08 13:23                     ` Parav Pandit
2021-10-08 23:09                       ` Michael S. Tsirkin
2021-10-11 14:29                         ` Parav Pandit
2021-10-11 14:59                           ` [virtio-dev] " Cornelia Huck
2021-10-11 15:44                             ` Parav Pandit
2021-10-11 16:00                               ` Michael S. Tsirkin
2021-10-12  8:51                                 ` Parav Pandit
2021-10-12  9:01                                   ` Michael S. Tsirkin
2021-10-12  9:12                                     ` Parav Pandit
2021-10-14 17:35                                       ` Parav Pandit
2021-10-14 22:28                                         ` Michael S. Tsirkin
2021-10-15  4:36                                           ` Parav Pandit
2021-10-15  5:15                                             ` [virtio-dev] " Jason Wang
2021-10-15  5:20                                               ` Parav Pandit
2021-10-15  6:40                                                 ` Jason Wang
2021-10-15  6:42                                                 ` Jason Wang
2021-10-15  6:48                                                   ` Parav Pandit
2021-10-15  7:02                                                     ` Jason Wang
2021-10-15  8:21                                                       ` Parav Pandit
2021-10-15  8:42                                                         ` Jason Wang
2021-10-22  7:20                                                           ` Parav Pandit
2021-10-25  5:41                                                             ` Jason Wang
2021-10-25  6:11                                                               ` Parav Pandit
2021-10-26  4:03                                                                 ` Jason Wang
2021-10-27  8:04                                                                   ` Parav Pandit
2021-10-27  8:26                                                                     ` Michael S. Tsirkin
2021-10-28  4:01                                                                       ` Parav Pandit
2021-10-28  5:50                                                                         ` Michael S. Tsirkin
2021-10-28  6:06                                                                           ` Parav Pandit
2021-10-15  6:51                                             ` Cornelia Huck
2021-10-15  8:09                                               ` Parav Pandit
2021-10-15  9:25                                                 ` [virtio-dev] " Cornelia Huck
2021-10-22  6:29                                                   ` Parav Pandit
2021-10-11 16:22                               ` [virtio-dev] " Cornelia Huck
2021-10-12 10:35                                 ` Parav Pandit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211008085134-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=mgurtovoy@nvidia.com \
    --cc=oren@nvidia.com \
    --cc=parav@nvidia.com \
    --cc=shahafs@nvidia.com \
    --cc=virtio-dev@lists.oasis-open.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.