From: Yan Zhao <yan.y.zhao@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "cjia@nvidia.com" <cjia@nvidia.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"aik@ozlabs.ru" <aik@ozlabs.ru>,
"Zhengxiao.zx@alibaba-inc.com" <Zhengxiao.zx@alibaba-inc.com>,
"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
"eauger@redhat.com" <eauger@redhat.com>,
"Liu, Yi L" <yi.l.liu@intel.com>,
Erik Skultety <eskultet@redhat.com>,
"Yang, Ziye" <ziye.yang@intel.com>,
"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
"libvir-list@redhat.com" <libvir-list@redhat.com>,
"arei.gonglei@huawei.com" <arei.gonglei@huawei.com>,
"felipe@nutanix.com" <felipe@nutanix.com>,
"Ken.Xue@amd.com" <Ken.Xue@amd.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"zhenyuw@linux.intel.com" <zhenyuw@linux.intel.com>,
"dinechin@redhat.com" <dinechin@redhat.com>,
"intel-gvt-dev@lists.freedesktop.org"
<intel-gvt-dev@lists.freedesktop.org>,
"Liu, Changpeng" <changpeng.liu@intel.com>,
"berrange@redhat.com" <berrange@redhat.com>,
Cornelia Huck <cohuck@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Wang, Zhi A" <zhi.a.wang@intel.com>,
"jonathan.davies@nutanix.com" <jonathan.davies@nutanix.com>,
"He, Shaopeng" <shaopeng.he@intel.com>
Subject: Re: [Qemu-devel] [PATCH v2 1/2] vfio/mdev: add version attribute for mdev device
Date: Wed, 15 May 2019 21:00:46 -0400
Message-ID: <20190516010046.GA5535@joy-OptiPlex-7040>
In-Reply-To: <20190514090142.441a8a8c@x1.home>
On Tue, May 14, 2019 at 11:01:42PM +0800, Alex Williamson wrote:
> On Tue, 14 May 2019 09:43:44 +0200
> Erik Skultety <eskultet@redhat.com> wrote:
>
> > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
> > > > On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:
> > > > > On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:
> > > > > > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> > > > > > > On Fri, 10 May 2019 10:36:09 +0100
> > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > >
> > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > On Sun, 5 May 2019 21:49:04 -0400
> > > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > > + Errno:
> > > > > > > > > > > > > > > + If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > > > > > + devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > > > > > + a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > > > > > + a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > > > > > + -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > > > > > + If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > > > > > + incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > > > > > + return -EINVAL;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > > > > > migration versions.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > > > > > things (e.g. trying with different device pairs).
> > > > > > > > > > > >
> > > > > > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > > > > > get much information that way.
> > > > > > > > > > >
> > > > > > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > > > > > are compatible?
> > > > > > > > > > >
> > > > > > > > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > > > > > > > >
> > > > > > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > > > > > the write or what.
> > > > > > > > >
> > > > > > > > > Hm, so I basically see two ways of doing that:
> > > > > > > > > - standardize on some error codes... problem: error codes can be hard
> > > > > > > > > to fit to reasons
> > > > > > > > > - make the error available in some attribute that can be read
> > > > > > > > >
> > > > > > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > > > > > though (this looks inherently racy).
> > > > > > > > >
> > > > > > > > > How important is detailed error reporting here?
> > > > > > > >
> > > > > > > > I think we need something, otherwise we're just going to get vague
> > > > > > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > > > > > good enough to point most users to something they can understand
> > > > > > > > (e.g. wrong card family/too old a driver etc).
> > > > > > >
> > > > > > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > > > > > how to achieve that, though... we could also log a more verbose error
> > > > > > > message to the kernel log, but that's not necessarily where a user will
> > > > > > > look first.
> > > > > >
> > > > > > In case of libvirt checking the compatibility, it won't matter how good the
> > > > > > error message in the kernel log is and regardless of how many error states you
> > > > > > want to handle, libvirt's only limited to errno here, since we're going to do
> > > > > > plain read/write, so our internal error message returned to the user is only
> > > > > > going to contain what the errno says - okay, of course we can (and we DO)
> > > > > > provide libvirt specific string, further specifying the error but like I
> > > > > > mentioned, depending on how many error cases we want to distinguish this may be
> > > > > > hard for anyone to figure out solely on the error code, as apps will most
> > > > > > > probably not parse the logs.
> > > > > >
> > > > > > Regards,
> > > > > > Erik
> > > > > > Hi Erik,
> > > > > > Do you mean that you agree with defining common errors and returning only an errno?
> > > >
> > > > In a sense, yes. While it is highly desirable to have logs with descriptive
> > > > messages which will help in troubleshooting tremendously, I wanted to point out
> > > > that spending time with error logs may not be that worthwhile especially since
> > > > most apps (like libvirt) will solely rely on using read(3)/write(3) to sysfs.
> > > > That means that we're limited by the errnos available, so apart from
> > > > > reporting the generic system message we can't do any more magic in terms of the
> > > > error messages, so the driver needs to assure that a proper message is
> > > > propagated to the journal and at best libvirt can direct the user (consumer) to
> > > > look through the system logs for more info. I also agree with the point
> > > > mentioned above that defining a specific errno is IMO not the way to go, as
> > > > these would be just too specific for the read(3)/write(3) use case.
> > > >
> > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > in one of the threads):
> > > > a) read error indicating that an mdev type doesn't support migration
> > > > - I assume if one type doesn't support migration, none of the other
> > > > types exposed on the parent device do, is that a fair assumption?
>
> I'd prefer not to make this assumption. Let's leave open the
> possibility that (for whatever reason) a vendor may choose to support
> migration on some types, but not others.
>
> > > > b) write error indicating that the mdev types are incompatible for
> > > > migration
> > > >
> > > > Regards,
> > > > Erik
> > > > Thanks for this explanation.
> > > > So, can we agree on the following?
> > >
> > > 1. "not to define the specific errno returned for a specific situation,
> > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > read indicates the device does not support migration version comparison and
> > > that an errno on write indicates the devices are incompatible or the target
> > > > doesn't support migration versions."
> > > > 2. The vendor driver should log detailed error reasons in the kernel log.
> >
> > > That would be my take on this, yes, but I'm open to hearing any other suggestions and
> > ideas I couldn't think of as well.
>
> Kernel logging tends to be rather ineffective, it's surprisingly
> difficult to get users to look in dmesg and it's not really a good
> choice for scraping diagnostic information either. I'd probably leave
> this to vendor driver's discretion at this point. Thanks,
>
> Alex
Got it.
Thank you all!
I'll follow this when preparing the next revision.
Thanks
Yan
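
For illustration, the read/write semantics discussed in this thread could be
consumed from userspace roughly as follows. This is only a sketch and not part
of the patch: the function names and the attribute paths passed in are
hypothetical, and it assumes only that the version attribute is accessed
through plain file reads and writes, with any errno on read meaning "no
migration version support" and any errno on write meaning "incompatible, or
the target does not support migration versions".

```python
import os


def read_version(src_attr):
    """Return the version string read from src_attr, or None on any errno.

    src_attr is a hypothetical path to the source device's version attribute.
    The specific errno is deliberately ignored, since it is left to the
    vendor driver.
    """
    try:
        with open(src_attr, "r") as f:
            return f.read().strip()
    except OSError:
        return None


def check_compatible(src_attr, dst_attr):
    """Classify a source/target pair using only read and write results.

    Returns "unsupported" if the source version cannot be read,
    "incompatible" if writing it to the target attribute fails
    (which also covers a target without migration version support),
    and "compatible" otherwise.
    """
    version = read_version(src_attr)
    if version is None:
        return "unsupported"
    try:
        # No O_CREAT: a missing target attribute must surface as an error.
        fd = os.open(dst_attr, os.O_WRONLY)
        try:
            os.write(fd, version.encode())
        finally:
            os.close(fd)
    except OSError:
        return "incompatible"
    return "compatible"
```

A caller that wants to try several device pairs can treat "unsupported" as a
reason to skip the source entirely and "incompatible" as a reason to try a
different target; neither result depends on which errno the vendor driver
chose.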