Date: Mon, 20 May 2013 16:48:24 -0600
From: Bjorn Helgaas
To: "Zhang, LongX"
Cc: "linasvepstas@gmail.com", "linux-pci@vger.kernel.org", "linux-kernel@vger.kernel.org", "yanmin_zhang@linux.intel.com", "Joseph.Liu@Emulex.Com"
Subject: Re: Subject : [ PATCH ] pci-reset-error_state-to-pci_channel_io_normal-at-report_slot_reset
Message-ID: <20130520224824.GA31740@google.com>

On Fri, Apr 26, 2013 at 06:28:59AM +0000, Zhang, LongX wrote:
> From: Zhang Long
>
> Specific pci device drivers might have many functions to call
> pci_channel_offline to check device states. When slot_reset happens,
> drivers' slot_reset callback might call such functions and eventually
> abort the reset.
>
> The patch resets pdev->error_state to pci_channel_io_normal at
> the beginning of report_slot_reset.
>
> Thank Liu Joseph for pointing it out.
>
> Signed-off-by: Zhang Yanmin
> Signed-off-by: Zhang Long
> ---
>  drivers/pci/pcie/aer/aerdrv_core.c |    1 +
>  drivers/pci/pcie/portdrv_pci.c     |   12 +++++-------
>  2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
> index 564d97f..c61fd44 100644
> --- a/drivers/pci/pcie/aer/aerdrv_core.c
> +++ b/drivers/pci/pcie/aer/aerdrv_core.c
> @@ -286,6 +286,7 @@ static int report_slot_reset(struct pci_dev *dev, void *data)
>  	result_data = (struct aer_broadcast_data *) data;
>
>  	device_lock(&dev->dev);
> +	dev->error_state = pci_channel_io_normal;
>  	if (!dev->driver ||
>  		!dev->driver->err_handler ||
>  		!dev->driver->err_handler->slot_reset)
> diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
> index ed4d094..7abefd9 100644
> --- a/drivers/pci/pcie/portdrv_pci.c
> +++ b/drivers/pci/pcie/portdrv_pci.c
> @@ -332,13 +332,11 @@ static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
>  	pci_ers_result_t status = PCI_ERS_RESULT_RECOVERED;
>  	int retval;
>
> -	/* If fatal, restore cfg space for possible link reset at upstream */
> -	if (dev->error_state == pci_channel_io_frozen) {
> -		dev->state_saved = true;
> -		pci_restore_state(dev);
> -		pcie_portdrv_restore_config(dev);
> -		pci_enable_pcie_error_reporting(dev);
> -	}
> +	/* restore cfg space for possible link reset at upstream */
> +	dev->state_saved = true;
> +	pci_restore_state(dev);
> +	pcie_portdrv_restore_config(dev);
> +	pci_enable_pcie_error_reporting(dev);
>
>  	/* get true return value from &status */
>  	retval = device_for_each_child(&dev->dev, &status, slot_reset_iter);

I think this patch changes the behavior in the case of a non-fatal error
where one of the .error_detected() methods returned
PCI_ERS_RESULT_NEED_RESET. In that case, pcie_portdrv_slot_reset()
previously did not restore config space, but after your patch, it *will*
restore it. We need an explanation of why this is safe.
I think you should split this into two patches: the first would remove
the "if (dev->error_state == pci_channel_io_frozen)" test from
portdrv_pci.c and explain the reason, and the second would make the
aerdrv_core.c change.

I'm also concerned that, in that same case (a non-fatal error where one
of the .error_detected() methods returned PCI_ERS_RESULT_NEED_RESET), we
don't actually *do* any kind of device reset. This isn't related to your
patch, of course, so if you resolve the config space restore question,
we can deal with the reset question later.

Bjorn