All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Cao jin <caoj.fnst@cn.fujitsu.com>,
	linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	izumi.taku@jp.fujitsu.com
Subject: Re: [PATCH] vfio/pci: Support error recovery
Date: Thu, 1 Dec 2016 06:51:31 +0200	[thread overview]
Message-ID: <20161201064708-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20161130210413.5161aab1@t450s.home>

On Wed, Nov 30, 2016 at 09:04:13PM -0700, Alex Williamson wrote:
> On Sun, 27 Nov 2016 19:34:17 +0800
> Cao jin <caoj.fnst@cn.fujitsu.com> wrote:
> 
> > It is user space driver's or device-specific driver's(in guest) responsbility
> > to do a serious recovery when error happened. Link-reset is one part of
> > recovery, when pci device is assigned to VM via vfio, link-reset will do
> > twice in host & guest separately, which will cause many trouble for a
> > successful recovery, so, disable the vfio-pci's link-reset in aer driver
> > in host, this is a keypoint for guest to do error recovery successfully.
> > 
> > CC: alex.williamson@redhat.com
> > CC: mst@redhat.com
> > Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
> > ---
> > This is actually a RFC version(has debug lines left), and has minor changes in
> > aer driver, so I think maybe it is better not to CC pci guys in this round.
> > Later will do.
> > 
> >  drivers/pci/pcie/aer/aerdrv_core.c  | 12 ++++++-
> >  drivers/vfio/pci/vfio_pci.c         | 63 +++++++++++++++++++++++++++++++++++--
> >  drivers/vfio/pci/vfio_pci_private.h |  2 ++
> >  3 files changed, 74 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> > index 521e39c..289fb8e 100644
> > --- a/drivers/pci/pcie/aer/aerdrv_core.c
> > +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> > @@ -496,7 +496,17 @@ static void do_recovery(struct pci_dev *dev, int severity)
> >  			"error_detected",
> >  			report_error_detected);
> >  
> > -	if (severity == AER_FATAL) {
> > +	/* vfio-pci as a general meta driver, it actually couldn't do any real
> > +	 * recovery for device. It is user space driver, or device-specific
> > +	 * driver in guest who should take care of the serious error recovery,
> > +	 * link reset actually is one part of whole recovery. Doing reset_link
> > +	 * in aer driver of host kernel for vfio-pci devices will cause many
> > +	 * trouble for user space driver or guest's device-specific driver,
> > +	 * for example: the serious recovery often need to read register in
> > +	 * config space, but if register reading happens during link-resetting,
> > +	 * it is quite possible to return invalid value like all F's, which
> > +	 * will result in unpredictable error. */
> > +	if (severity == AER_FATAL && strcmp(dev->driver->name, "vfio-pci")) {
> >  		result = reset_link(dev);
> >  		if (result != PCI_ERS_RESULT_RECOVERED)
> >  			goto failed;
> 
> This is not acceptable.  If we want to make a path through AER recovery
> that does not reset the link, there should be a way for the driver to
> request it.  Testing the driver name is a horrible hack.

Right, just set a flag or something.

> The other
> question is what guarantees does vfio make that the device does get
> reset?  If an AER fault occurs and the user doesn't do a reset, what
> happens when that device is released and a host driver tries to make
> use of it?  The user makes no commitment to do a reset and there are
> only limited configurations where we even allow the user to perform a
> reset.

Let's try to work out what can be done a bit more constructively.
Doesn't vfio already reset device on release?
How about detecting there was an unhandled AER event
and doing link reset at that point?



> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index 712a849..aefd751 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -535,6 +535,15 @@ static long vfio_pci_ioctl(void *device_data,
> >  	struct vfio_pci_device *vdev = device_data;
> >  	unsigned long minsz;
> >  
> > +	if (vdev->aer_recovering && (cmd == VFIO_DEVICE_SET_IRQS ||
> > +	    cmd == VFIO_DEVICE_RESET || cmd == VFIO_DEVICE_PCI_HOT_RESET)) {
> > +		int ret;
> > +		ret = wait_for_completion_interruptible(
> > +			&vdev->aer_error_completion);
> > +		if (ret)
> > +			return ret;
> > +	}
> > +
> >  	if (cmd == VFIO_DEVICE_GET_INFO) {
> >  		struct vfio_device_info info;
> >  
> > @@ -1117,6 +1126,7 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >  	vdev->irq_type = VFIO_PCI_NUM_IRQS;
> >  	mutex_init(&vdev->igate);
> >  	spin_lock_init(&vdev->irqlock);
> > +	init_completion(&vdev->aer_error_completion);
> >  
> >  	ret = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
> >  	if (ret) {
> > @@ -1176,6 +1186,9 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
> >  {
> >  	struct vfio_pci_device *vdev;
> >  	struct vfio_device *device;
> > +	u32 uncor_status = 0;
> > +	unsigned int aer_cap_offset = 0;
> 
> Unnecessary initialization.
> 
> > +	int ret;
> >  
> >  	device = vfio_device_get_from_dev(&pdev->dev);
> >  	if (device == NULL)
> > @@ -1187,10 +1200,30 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
> >  		return PCI_ERS_RESULT_DISCONNECT;
> >  	}
> >  
> > +	/* get device's uncorrectable error status as soon as possible,
> > +	 * and signal it to user space. The later we read it, the possibility
> > +	 * the register value is mangled grows. */
> 
> Bad comment style (throughout).
> 
> > +	aer_cap_offset = pci_find_ext_capability(vdev->pdev, PCI_EXT_CAP_ID_ERR);
> > +	ret = pci_read_config_dword(vdev->pdev, aer_cap_offset + 
> > +                                    PCI_ERR_UNCOR_STATUS, &uncor_status);
> > +        if (ret)
> > +                return PCI_ERS_RESULT_DISCONNECT;
> > +
> > +	pr_err("device %d got AER detect notification. uncorrectable error status = 0x%x\n", pdev->devfn, uncor_status);//to be removed
> >  	mutex_lock(&vdev->igate);
> > +    
> > +	vdev->aer_recovering = true;
> > +	reinit_completion(&vdev->aer_error_completion);
> > +
> > +	/* suspend config space access from user space,
> > +	 * when vfio-pci's error recovery process is on */
> > +	pci_cfg_access_trylock(vdev->pdev);
> >  
> > -	if (vdev->err_trigger)
> > -		eventfd_signal(vdev->err_trigger, 1);
> > +	if (vdev->err_trigger && uncor_status) {
> > +		pr_err("device %d signal uncor status to user space", pdev->devfn);//may be removed
> > +		/* signal uncorrectable error status to user space */
> > +		eventfd_signal(vdev->err_trigger, uncor_status);

What does this mean that we trigger with uncor_status as opposed to 1?

> > +        }
> 
> } else... what?  By bypassing the AER link reset we've assumed
> responsibility for resetting the device.  Even if we signal the user,
> what guarantee do we have that the device is recovered?

So set flag and recover ?

> >  
> >  	mutex_unlock(&vdev->igate);
> >  
> > @@ -1199,8 +1232,34 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev,
> >  	return PCI_ERS_RESULT_CAN_RECOVER;
> >  }
> >  
> > +static void vfio_pci_aer_resume(struct pci_dev *pdev)
> > +{
> > +	struct vfio_pci_device *vdev;
> > +	struct vfio_device *device;
> > +
> > +	device = vfio_device_get_from_dev(&pdev->dev);
> > +	if (device == NULL)
> > +		return;
> > +
> > +	vdev = vfio_device_data(device);
> > +	if (vdev == NULL) {
> > +		vfio_device_put(device);
> > +		return;
> > +	}
> > +
> > +	/* vfio-pci's error recovery is done, time to
> > +	 * resume pci config space's accesses */
> > +	pci_cfg_access_unlock(vdev->pdev);
> > +
> > +	vdev->aer_recovering = false;
> > +	complete_all(&vdev->aer_error_completion);
> > +
> > +	vfio_device_put(device);
> > +}
> > +
> >  static const struct pci_error_handlers vfio_err_handlers = {
> >  	.error_detected = vfio_pci_aer_err_detected,
> > +	.resume         = vfio_pci_aer_resume,
> >  };
> >  
> >  static struct pci_driver vfio_pci_driver = {
> > diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> > index 8a7d546..ebf1041 100644
> > --- a/drivers/vfio/pci/vfio_pci_private.h
> > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > @@ -83,6 +83,8 @@ struct vfio_pci_device {
> >  	bool			bardirty;
> >  	bool			has_vga;
> >  	bool			needs_reset;
> > +	bool			aer_recovering;
> > +	struct completion	aer_error_completion;
> >  	struct pci_saved_state	*pci_saved_state;
> >  	int			refcnt;
> >  	struct eventfd_ctx	*err_trigger;

  reply	other threads:[~2016-12-01  4:51 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-11-27 11:34 [PATCH] vfio/pci: Support error recovery Cao jin
2016-11-28  3:00 ` Michael S. Tsirkin
2016-11-28  9:32   ` Cao jin
2016-11-30  1:46     ` Michael S. Tsirkin
2016-12-01 13:38       ` Cao jin
2016-12-01  4:04 ` Alex Williamson
2016-12-01  4:51   ` Michael S. Tsirkin [this message]
2016-12-01 13:40     ` Cao jin
2016-12-06  3:46       ` Michael S. Tsirkin
2016-12-06  6:47         ` Cao jin
2016-12-01 13:40   ` Cao jin
2016-12-01 14:55     ` Alex Williamson
2016-12-04 12:16       ` Cao jin
2016-12-04 15:30         ` Alex Williamson
2016-12-05  5:52           ` Cao jin
2016-12-05 16:17             ` Alex Williamson
2016-12-06  3:55               ` Michael S. Tsirkin
2016-12-06  4:59                 ` Alex Williamson
2016-12-06 10:46                   ` Cao jin
2016-12-06 15:35                     ` Alex Williamson
2016-12-07  2:49                       ` Cao jin
2016-12-08 14:46                       ` Cao jin
2016-12-08 16:30                         ` Michael S. Tsirkin
2016-12-09  3:40                           ` Cao jin
2016-12-09  3:40                         ` Cao jin
2016-12-06  6:11               ` Cao jin
2016-12-06 15:25                 ` Alex Williamson
2016-12-07  2:58                   ` Cao jin
2016-12-12 13:49 ` Cao jin
2016-12-12 19:12   ` Alex Williamson
2016-12-12 22:29     ` Michael S. Tsirkin
2016-12-12 22:43       ` Alex Williamson
2016-12-13  3:15         ` Michael S. Tsirkin
2016-12-13  3:39           ` Alex Williamson
2016-12-13 16:12             ` Michael S. Tsirkin
2016-12-13 16:27               ` Alex Williamson
2016-12-14  1:58                 ` Michael S. Tsirkin
2016-12-14  3:00                   ` Alex Williamson
2016-12-14 22:20                     ` Michael S. Tsirkin
2016-12-14 22:47                       ` Alex Williamson
2016-12-14 23:00                         ` Michael S. Tsirkin
2016-12-14 23:32                           ` Alex Williamson
2016-12-14 10:24     ` Cao jin
2016-12-14 22:16       ` Alex Williamson
2016-12-14 22:25         ` Michael S. Tsirkin
2016-12-14 22:49           ` Alex Williamson
2016-12-15 13:56         ` Cao jin
2016-12-15 14:50           ` Michael S. Tsirkin
2016-12-15 22:01             ` Alex Williamson
2016-12-16 10:15               ` Cao jin
2016-12-16 10:15             ` Cao jin
2016-12-15 17:02           ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20161201064708-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=caoj.fnst@cn.fujitsu.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.