public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Sinan Kaya <okaya@codeaurora.org>
To: Keith Busch <keith.busch@intel.com>, Oza Pawandeep <poza@codeaurora.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Philippe Ombredanne <pombredanne@nexb.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Kate Stewart <kstewart@linuxfoundation.org>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	Dongdong Liu <liudongdong3@huawei.com>,
	Gabriele Paoloni <gabriele.paoloni@huawei.com>,
	Wei Zhang <wzhang@fb.com>, Timur Tabi <timur@codeaurora.org>
Subject: Re: [PATCH v2 2/4] PCI/DPC/AER: Address Concurrency between AER and DPC
Date: Tue, 2 Jan 2018 08:25:08 -0500	[thread overview]
Message-ID: <5e9ffecf-2da7-0014-9a62-e2ae10323ce3@codeaurora.org> (raw)
In-Reply-To: <20171229172324.GF16407@localhost.localdomain>

Hi Keith,

On 12/29/2017 12:23 PM, Keith Busch wrote:
> On Fri, Dec 29, 2017 at 12:54:17PM +0530, Oza Pawandeep wrote:
>> This patch addresses the race condition between AER and DPC for recovery.
>>
>> Current DPC driver does not do recovery, e.g. calling end-point's driver's
>> callbacks, which sanitize the device.
>> DPC driver implements link_reset callback, and calls pci_do_recovery.
> 
> I'm not sure I see why any of this is necessary for two reasons:
> 
> 1. A downstream port containment event disables the link. How can a driver
> sanitize an end device when all the end devices below the containment are
> physically inaccessible? Any attempt to access such devices will just
> end with either CA or UR (depending on DPC control settings). Since we
> already know the failed outcome from attempting to access such devices,
> why do you want the drivers to do anything?

The reset callback to the endpoint driver has a status field indicating
whether the IO is frozen or not. If IO is not frozen, an endpoint driver
can potentially recover from the error by reissuing the failed request. 

If IO is frozen, then the endpoint driver needs to clean up outstanding
resources. It is not safe to just shutdown the driver while there are
transactions in flight. This is the reason for the status field and a
chance for driver to clean up any state machines and resources. 

Also note that the error callback has a result return value. An endpoint
driver indicates whether it was successful on recovering or not.


> 
> 2. A DPC event suppresses the error message required for the Linux
> AER driver to run. How can AER and DPC run concurrently?
> 

As we briefly discussed in previous email exchanges, I think you are
looking at a use case with a switch that supports DPC functionality. 

Oza and I are looking at a root port functionality with DPC feature. 

As you already know, AER errors are logged to AER capability register
independent of the DPC driver presence.

A root port is also allowed to share the MSI interrupts across DPC and
AER. 

Therefore, when a DPC interrupt fires; both AER driver and DPC driver
starts recovery work. This is the issue we are trying to deal with. 

In the end, the driver needs to work for both root port and switches.
I think you verified it against a switch. We are doing the same for a
root port and submitting the plumbing code. 

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

  parent reply	other threads:[~2018-01-02 13:25 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-12-29  7:24 [PATCH v2 0/4] Address error and recovery for AER and DPC Oza Pawandeep
2017-12-29  7:24 ` [PATCH v2 1/4] PCI/AER: factor out error reporting from AER Oza Pawandeep
2017-12-29  7:24 ` [PATCH v2 2/4] PCI/DPC/AER: Address Concurrency between AER and DPC Oza Pawandeep
2017-12-29 17:23   ` Keith Busch
2017-12-29 18:00     ` poza
2017-12-29 18:13       ` Keith Busch
2017-12-30  3:57         ` poza
2018-01-02 13:25     ` Sinan Kaya [this message]
2018-01-02 17:12       ` Keith Busch
2018-01-02 18:34         ` Sinan Kaya
2017-12-29  7:24 ` [PATCH v2 3/4] PCI/ERR: Do not do recovery if DPC service is active Oza Pawandeep
2017-12-29  7:24 ` [PATCH v2 4/4] PCI/DPC: Enumerate the devices after DPC trigger event Oza Pawandeep
2018-01-02 19:02 ` [PATCH v2 0/4] Address error and recovery for AER and DPC Bjorn Helgaas
2018-01-02 19:09   ` Sinan Kaya
2018-01-02 19:12   ` Keith Busch
2018-01-03  6:14   ` poza

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5e9ffecf-2da7-0014-9a62-e2ae10323ce3@codeaurora.org \
    --to=okaya@codeaurora.org \
    --cc=bhelgaas@google.com \
    --cc=gabriele.paoloni@huawei.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=keith.busch@intel.com \
    --cc=kstewart@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=liudongdong3@huawei.com \
    --cc=pombredanne@nexb.com \
    --cc=poza@codeaurora.org \
    --cc=tglx@linutronix.de \
    --cc=timur@codeaurora.org \
    --cc=wzhang@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox