From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759356AbcLANe7 (ORCPT ); Thu, 1 Dec 2016 08:34:59 -0500 Received: from cn.fujitsu.com ([59.151.112.132]:4635 "EHLO heian.cn.fujitsu.com" rhost-flags-OK-FAIL-OK-FAIL) by vger.kernel.org with ESMTP id S1751510AbcLANe5 (ORCPT ); Thu, 1 Dec 2016 08:34:57 -0500 X-IronPort-AV: E=Sophos;i="5.22,518,1449504000"; d="scan'208";a="13511467" Subject: Re: [PATCH] vfio/pci: Support error recovery To: "Michael S. Tsirkin" References: <1480246457-10368-1-git-send-email-caoj.fnst@cn.fujitsu.com> <20161128045506-mutt-send-email-mst@kernel.org> <583BF99F.9030106@cn.fujitsu.com> <20161130033533-mutt-send-email-mst@kernel.org> CC: , , , From: Cao jin Message-ID: <584027D7.7080802@cn.fujitsu.com> Date: Thu, 1 Dec 2016 21:38:31 +0800 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0 MIME-Version: 1.0 In-Reply-To: <20161130033533-mutt-send-email-mst@kernel.org> Content-Type: text/plain; charset="windows-1252" Content-Transfer-Encoding: 7bit X-Originating-IP: [10.167.226.69] X-yoursite-MailScanner-ID: 8FD5140056FC.A90E6 X-yoursite-MailScanner: Found to be clean X-yoursite-MailScanner-From: caoj.fnst@cn.fujitsu.com Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/30/2016 09:46 AM, Michael S. Tsirkin wrote: > On Mon, Nov 28, 2016 at 05:32:15PM +0800, Cao jin wrote: >> >> >>> >>>> + if (severity == AER_FATAL && strcmp(dev->driver->name, "vfio-pci")) { >>> >>> You really want some flag in the device, or something similar. >>> Also, how do we know driver is not going away at this point? >>> >> >> I didn't think of this condition, and I don't quite follow how would driver >> go away?(device has error happened, then is removed?) > > Yes - hotplug request detected. Does something prevent this? > I wasn't realized there would be possible to have hotplug happened during error recovery. But, it seems is the aer driver's thing to consider it? >>>> result = reset_link(dev); >>>> if (result != PCI_ERS_RESULT_RECOVERED) >>>> goto failed; >> >>>> @@ -1187,10 +1200,30 @@ static pci_ers_result_t vfio_pci_aer_err_detected(struct pci_dev *pdev, >>>> return PCI_ERS_RESULT_DISCONNECT; >>>> } >>>> >>>> + /* get device's uncorrectable error status as soon as possible, >>>> + * and signal it to user space. The later we read it, the possibility >>>> + * the register value is mangled grows. */ >>>> + aer_cap_offset = pci_find_ext_capability(vdev->pdev, PCI_EXT_CAP_ID_ERR); >>>> + ret = pci_read_config_dword(vdev->pdev, aer_cap_offset + >>>> + PCI_ERR_UNCOR_STATUS, &uncor_status); >>>> + if (ret) >>>> + return PCI_ERS_RESULT_DISCONNECT; >>>> + >>>> + pr_err("device %d got AER detect notification. uncorrectable error status = 0x%x\n", pdev->devfn, uncor_status);//to be removed >>>> mutex_lock(&vdev->igate); >>>> + >>>> + vdev->aer_recovering = true; >>>> + reinit_completion(&vdev->aer_error_completion); >>>> + >>>> + /* suspend config space access from user space, >>>> + * when vfio-pci's error recovery process is on */ >>> >>> what about access to memory etc? Do you need to suspend this as well? >>> >> >> Yes, this question came into my mind a little bit, but I didn't see some >> existing APIs like pci_cfg_access_xxx which can help to do this.(I am still >> not familiar with kernel) > > This isn't easy to do at all. > I guess we can use completion in vfio_pci_rw to block all kinds of access during vfio's error recovery. But from another perspective, now that we have disabled reset_link in host, I think the access to device could be performed normally. To me, the benefit of using pci_cfg_access is: we won't bother to do "wait 3s to do guest's link reset" > >>>> + pci_cfg_access_trylock(vdev->pdev); >>> >>> If you trylock, you need to handle failure. >> >> try lock returns 0 if access is already locked, 1 otherwise. Is it necessary >> to check its return value? > > Locked by whom? You blissfully access as if it's locked by you. > >> >> -- >> Sincerely, >> Cao jin >> > > > . > -- Sincerely, Cao jin