From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
izumi.taku@jp.fujitsu.com
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Wed, 14 Dec 2016 03:58:17 +0200
Message-ID: <20161214035457-mutt-send-email-mst@kernel.org>
In-Reply-To: <20161213092759.50fbd7df@t450s.home>

On Tue, Dec 13, 2016 at 09:27:59AM -0700, Alex Williamson wrote:
> On Tue, 13 Dec 2016 18:12:34 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Mon, Dec 12, 2016 at 08:39:48PM -0700, Alex Williamson wrote:
> > > On Tue, 13 Dec 2016 05:15:13 +0200
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > > On Mon, Dec 12, 2016 at 03:43:13PM -0700, Alex Williamson wrote:
> > > > > > So just don't do it then. Topology must match between host and guest,
> > > > > > except maybe for the case of devices with host driver (e.g. PF)
> > > > > > which we might be able to synchronize against.
> > > > >
> > > > > We're talking about host kernel level handling here. The host kernel
> > > > > cannot defer the link reset to the user under the assumption that the
> > > > > user is handling the devices in a very specific way. The moment we do
> > > > > that, we've lost.
> > > >
> > > > The way is same as baremetal though, so why not?
> > >
> > > How do we know this? What if the user is dpdk? The kernel is
> > > responsible for maintaining the integrity of the system and devices,
> > > not the user.
> > >
> > > > And if user doesn't do what's expected, we can
> > > > do the full link reset on close.
> > >
> > > That's exactly my point, if we're talking about multiple devices,
> > > there's no guarantee that the close() for each is simultaneous. If one
> > > function is released before the other we cannot do a bus reset. If
> > > that device is then opened by another user before its sibling is
> > > released, then we once again cannot perform a link reset. I don't
> > > think it would be reasonable to mark the released device quarantined
> > > until the sibling is released, that would be a terrible user experience.
> >
> > Not sure why you find it so terrible, and I don't think there's another way.
>
> If we can't do it without regressing the support we currently have,
> let's not do it at all.

Why would we regress? As long as there are no unrecoverable errors,
there's no need to change behaviour at all.

Alex, do you have a picture in your mind of how error recovery can work?
Your answers seem to imply that you do, and that these patches don't
implement it correctly. I'm not sure about others, but I for one am
unable to piece it together from the comments you provide. If you do,
could you maybe do a short writeup of an architecture you would be
comfortable with?

Thanks,
--
MST