From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	ashok.raj@intel.com, keith.busch@intel.com
Subject: Re: [PATCH v11 1/8] PCI/ERR: Update error status after reset_link()
Date: Wed, 8 Jan 2020 16:14:09 -0800
Message-ID: <7f7fdfec-5060-bcaa-38c4-6b973149e5cc@linux.intel.com>
In-Reply-To: <20200104025414.GA85401@google.com>

Hi Bjorn,

Thanks for the comments.

On 1/3/20 6:54 PM, Bjorn Helgaas wrote:
> On Fri, Jan 03, 2020 at 05:03:03PM -0800, Kuppuswamy Sathyanarayanan wrote:
>> On 1/3/20 4:34 PM, Bjorn Helgaas wrote:
>>> On Thu, Dec 26, 2019 at 04:39:07PM -0800, sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>>>> From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>>
>>>> Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
>>>> reset_link() to recover from fatal errors. But, if the reset is
>>>> successful there is no need to continue the rest of the error recovery
>>>> checks. Also, during fatal error recovery, if the initial value of error
>>>> status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER then
>>>> even after successful recovery (using reset_link()) pcie_do_recovery()
>>>> will report the recovery result as failure. So update the status of
>>>> error after reset_link().
>>> I like the part about updating "status" with the result of
>>> reset_link(), and I split that into its own patch because it
>>> seems like a fix that *can* be separated.
>>>
>>> But I'm not convinced that we should skip the ->slot_reset()
>>> callbacks if the reset_link() was successful.
>> If reset_link() call is successful then the result value will be
>> "PCI_ERS_RESULT_RECOVERED". So even if you proceed with
>> rest of the code, slot_reset() will never get called right ?
> The current code:
>
>          if (state == pci_channel_io_frozen &&
>              reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
>                  goto failed;
>          ...
>          if (status == PCI_ERS_RESULT_NEED_RESET) {
>                  status = PCI_ERS_RESULT_RECOVERED;
>                  pci_walk_bus(bus, report_slot_reset, &status);
>
> doesn't save the result of reset_link(), so if status was
> PCI_ERS_RESULT_NEED_RESET and the reset succeeds, we will call
> ->slot_reset().
>
> After your patch, if "state == pci_channel_io_frozen", we *never* call
> ->slot_reset().
>
> Do you think that matches pci-error-recovery.rst?  It doesn't seem
> like it to me, but perhaps I haven't read it closely enough.
The documentation does not clearly say what to do with the return
value of reset_link() (step 3). But IMO, if step 3 recovers the device and
returns PCI_ERS_RESULT_RECOVERED, then there is no need to proceed
to slot reset (step 4). Maybe we should update the documentation?
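
To make the difference concrete, here is a rough user-space sketch of
the two control flows under discussion. It is not the kernel code: the
enum, struct flow, and both function names are made up for illustration,
and the mmio_enabled/slot_reset callbacks are simply assumed to succeed.

```c
#include <stdbool.h>

/* Illustrative stand-ins for pci_ers_result_t values (hypothetical names). */
enum ers_result {
	ERS_CAN_RECOVER,
	ERS_NEED_RESET,
	ERS_DISCONNECT,
	ERS_RECOVERED,
};

/* What the simulated recovery ended up doing. */
struct flow {
	enum ers_result status;
	bool slot_reset_called;
	bool resume_called;
};

/* Steps shared by both variants after the frozen-state handling. */
static struct flow finish_recovery(struct flow f)
{
	if (f.status == ERS_CAN_RECOVER)
		f.status = ERS_RECOVERED;	/* assume mmio_enabled succeeds */

	if (f.status == ERS_NEED_RESET) {
		f.status = ERS_RECOVERED;	/* assume slot_reset succeeds */
		f.slot_reset_called = true;
	}

	if (f.status == ERS_RECOVERED)
		f.resume_called = true;		/* otherwise: "goto failed" */
	return f;
}

/* Current code: the reset_link() result is checked but not saved. */
static struct flow recover_current(bool frozen, enum ers_result detected,
				   enum ers_result link_reset)
{
	struct flow f = { .status = detected };

	if (frozen && link_reset != ERS_RECOVERED)
		return f;			/* goto failed */
	return finish_recovery(f);
}

/* Patched code: the reset_link() result replaces status, then "goto done". */
static struct flow recover_patched(bool frozen, enum ers_result detected,
				   enum ers_result link_reset)
{
	struct flow f = { .status = detected };

	if (frozen) {
		f.status = link_reset;
		if (f.status == ERS_RECOVERED)
			f.resume_called = true;	/* goto done, slot_reset skipped */
		return f;
	}
	return finish_recovery(f);
}
```

With frozen = true, detected = ERS_NEED_RESET and a successful link
reset, recover_current() reaches the slot_reset step while
recover_patched() skips it. With detected = ERS_DISCONNECT, only
recover_patched() ends in ERS_RECOVERED.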

Keith,
Do you have any comments?
>
>>> According to
>>> Documentation/PCI/pci-error-recovery.rst, we should call
>>> ->slot_reset() after completion of the reset.
>>>
>>> For example, rsxx_err_handler implements ->slot_reset(), but
>>> not ->resume().  If we reset the device, we'll claim success and
>>> return, but we won't call rsxx_slot_reset(), which does a bunch
>>> of important-looking recovery stuff.
>>>
>>> If pci-error-recovery.rst is wrong, we should fix that (after
>>> auditing all the drivers to make sure they match).
>>>
>>>> Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
>>>> Cc: Ashok Raj <ashok.raj@intel.com>
>>>> Cc: Keith Busch <keith.busch@intel.com>
>>>> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>>>> Acked-by: Keith Busch <keith.busch@intel.com>
>>>> ---
>>>>    drivers/pci/pcie/err.c | 10 +++++++---
>>>>    1 file changed, 7 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>>>> index b0e6048a9208..53cd9200ec2c 100644
>>>> --- a/drivers/pci/pcie/err.c
>>>> +++ b/drivers/pci/pcie/err.c
>>>> @@ -204,9 +204,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>>>>    	else
>>>>    		pci_walk_bus(bus, report_normal_detected, &status);
>>>> -	if (state == pci_channel_io_frozen &&
>>>> -	    reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
>>>> -		goto failed;
>>>> +	if (state == pci_channel_io_frozen) {
>>>> +		status = reset_link(dev, service);
>>>> +		if (status != PCI_ERS_RESULT_RECOVERED)
>>>> +			goto failed;
>>>> +		goto done;
>>>> +	}
>>>>    	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>>>    		status = PCI_ERS_RESULT_RECOVERED;
>>>> @@ -228,6 +231,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum pci_channel_state state,
>>>>    	if (status != PCI_ERS_RESULT_RECOVERED)
>>>>    		goto failed;
>>>> +done:
>>>>    	pci_dbg(dev, "broadcast resume message\n");
>>>>    	pci_walk_bus(bus, report_resume, &status);
>>>> -- 
>>>> 2.21.0
>>>>
>> -- 
>> Sathyanarayanan Kuppuswamy
>> Linux kernel developer
>>
-- 
Sathyanarayanan Kuppuswamy
Linux kernel developer

