From: Sinan Kaya <okaya@codeaurora.org>
To: David Laight <David.Laight@ACULAB.COM>,
	Bjorn Helgaas <helgaas@kernel.org>,
	Oza Pawandeep <poza@codeaurora.org>
Cc: "linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"timur@codeaurora.org" <timur@codeaurora.org>,
	Gabriele Paoloni <gabriele.paoloni@huawei.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Dongdong Liu <liudongdong3@huawei.com>,
	"linux-arm-msm@vger.kernel.org" <linux-arm-msm@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v2 4/4] PCI/AER: Dont do recovery when DPC is enabled
Date: Tue, 21 Nov 2017 11:43:32 -0500
Message-ID: <13cf28da-9ab3-33a5-ea91-c62c186c6170@codeaurora.org>
In-Reply-To: <de047a7ce7334daab4e53ea4e8444421@AcuMS.aculab.com>

On 11/21/2017 11:25 AM, David Laight wrote:
>> The DPC on the other hand stops the drivers immediately since HW took care of
>> link disable. (Endpoint register reads return ~0 at this point.)
> What happens if the 'user' driver doesn't define the error reporting callbacks?
> It might be hardened against the ~0u returns from reads - so not OOPS.
> It might be appropriate to call the remove() function instead.

This is what the DPC driver does in its interrupt handler. 

http://elixir.free-electrons.com/linux/latest/ident/interrupt_event_handler

My understanding is that this will eventually call the remove() function on the
endpoint driver.

Bjorn was concerned that we do not call the error handler even when one is
registered, and that calling the remove() callback while the driver is in the
middle of something could be bad.

He was also concerned that remove() could leave something in a bad state, so
that recovery would not work at all and the kernel would eventually crash due
to data corruption.

Oza and I are looking for a way to plumb DPC's error handling into the AER
driver so that the PCI framework has a single place to look for error handling.

For DPC:
1. If an error handler is registered, call it for all child devices
2. Remove all child devices from the bus
3. Recover the link with DPC
4. Rescan the entire bus and install the drivers again
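
To make the ordering of the four steps above concrete, here is a minimal
userspace sketch. It is only a simulation under stated assumptions: struct
sim_dev, struct sim_port and sim_dpc_recover() are hypothetical names made up
for illustration, not kernel types or APIs, and "removing" a device is modelled
as clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not kernel types or APIs. */
struct sim_dev {
	bool has_err_handler;	/* driver registered an error callback */
	bool bound;		/* a driver is currently attached */
	int err_notified;	/* how many times the error callback ran */
};

struct sim_port {
	struct sim_dev *children;
	size_t nr_children;
	bool link_up;		/* false once DPC has disabled the link */
};

/* Walk the four steps in order; return the number of children rebound. */
static size_t sim_dpc_recover(struct sim_port *port)
{
	size_t i, rebound = 0;

	assert(port != NULL);

	/* 1. Call the error handler of every child that registered one. */
	for (i = 0; i < port->nr_children; i++) {
		struct sim_dev *dev = &port->children[i];

		if (dev->bound && dev->has_err_handler)
			dev->err_notified++;
	}

	/* 2. Remove all children from the bus (modelled as unbinding). */
	for (i = 0; i < port->nr_children; i++)
		port->children[i].bound = false;

	/* 3. Recover the link. */
	port->link_up = true;

	/* 4. Rescan the bus and attach the drivers again. */
	for (i = 0; i < port->nr_children; i++) {
		port->children[i].bound = true;
		rebound++;
	}

	return rebound;
}
```

The point of the ordering is that a driver with an error handler hears about
the failure before its device is removed, while a driver without one is simply
removed and rebound.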

> 
>> DPC driver clears
>> the interrupt from the DPC capability and brings the link up at the end. Full
>> enumeration/rescan follows this procedure to go back to functioning state.
> That might not be a good idea, very likely it will fail again immediately.

We can add a policy parameter and not bring up the link if you want to do
troubleshooting at the point of failure, or if you want a way to define how
the system should respond.
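
A rough userspace sketch of what such a policy switch could look like follows.
The enum values, struct sim_link and sim_dpc_finish() are hypothetical names
chosen for illustration only; nothing like this exists in the DPC driver today,
and how the policy would be exposed (module parameter, sysfs, ...) is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical policy values; not an existing kernel interface. */
enum sim_dpc_policy {
	SIM_DPC_RECOVER,	/* bring the link back up after containment */
	SIM_DPC_HOLD_DOWN,	/* leave the link down for troubleshooting */
};

struct sim_link {
	bool up;
};

/* Apply the policy; return true if the link was brought back up. */
static bool sim_dpc_finish(struct sim_link *link, enum sim_dpc_policy policy)
{
	assert(link != NULL);

	if (policy == SIM_DPC_HOLD_DOWN)
		return false;	/* link state is left untouched */

	link->up = true;
	return true;
}
```

With SIM_DPC_HOLD_DOWN the failure state is preserved for inspection; the
default would presumably stay SIM_DPC_RECOVER so existing behaviour does not
change.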

DPC causes a hot reset on the bus. The endpoint should go to its reset state,
and we should be able to bring up the link without any problems under normal
circumstances.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
