linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sinan Kaya <okaya@codeaurora.org>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Oza Pawandeep <poza@codeaurora.org>,
	linux-pci@vger.kernel.org, timur@codeaurora.org,
	Gabriele Paoloni <gabriele.paoloni@huawei.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Dongdong Liu <liudongdong3@huawei.com>,
	linux-arm-msm@vger.kernel.org,
	Bjorn Helgaas <bhelgaas@google.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v2 4/4] PCI/AER: Dont do recovery when DPC is enabled
Date: Thu, 16 Nov 2017 15:52:47 -0500	[thread overview]
Message-ID: <16b9ac67-042d-1010-0c8c-205b2fa40a7f@codeaurora.org> (raw)
In-Reply-To: <20171116201745.GB7266@bhelgaas-glaptop.roam.corp.google.com>

>>
>> Whether the AER driver reads ~0 or not really depends on timing. The
>> link may come up from the DPC driver by the time AER driver reaches
>> here as an example.
>>
>> Bad things do happen. We have seen this with e1000e driver.
> 
> I don't doubt that bad things happen.  I'm just trying to understand
> exactly *what* bad things happen and how, so we can fix them cleanly.
> 

This was random crashes in the e1000e drivers accompanied with stack
traces coming from WARN and msi allocation routines.

> I don't know exactly what you mean by "DPC stops the drivers
> immediately".  Since the DPC hardware disables the Link, I *think*
> you probably mean that driver accesses to the device start failing
> (whether the driver notices this is a whole different question).

Sorry for not being clear.

I was talking about the pci_stop_and_remove_bus_device() call in DPC's
interrupt_event_handler() function as the "stop command".

> 
> When the DPC hardware disables the Link, it causes a hot reset for
> downstream components.  The DPC interrupt_event_handler() doesn't do
> much except remove the device (which detaches the driver) and clear
> the DPC Trigger Status bit (which allows hardware to try to retrain
> the Link).
> 

That's true.

> So the "stop" and "recover" commands you mention must be related to
> AER.  

I was talking about pci_stop_and_remove_bus_device() vs. error_detected()
as "stop" and "recover"

> I guess these would be some of the driver callbacks
> (error_detected(), mmio_enabled(), slot_reset(), reset_prepare(),
> reset_done(), resume())?
> 
> In any case, I agree that it probably doesn't make sense to call any
> of these callbacks if the DPC driver has already detached the driver
> and re-attached it.  The device state is gone because of the hot reset
> and the driver state is gone because of the detach/re-attach.
> 
> However, I'm not so sure about the period *before* the DPC driver
> detaches the driver.  The description of error_detected() says it
> cannot assume the device is accessible, so I think there might be an
> argument that AER *should* call this for DPC events so the driver has
> a chance to clean up before being unceremoniously detached.

Makes sense. Let us work on this.

> 
> I suspect this all probably requires tighter integration between DPC
> and AER, and I'm totally fine with that.  I think the current
> separation as separate "drivers" is pretty artificial anyway.

Got it. We will try to plumb DPC error handling into AER driver's error
handling mechanism.

Another question:

What do you think about the rescan following link up? The only entity
that does rescan today is hotplug after DPC recovery. There could be
a platform with DPC support but no hotplug support. 

How should we handle it?

We can call rescan from DPC all the time.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.

  reply	other threads:[~2017-11-16 20:52 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-11-15  4:56 [PATCH v2 0/4] PCI: query active service list Oza Pawandeep
2017-11-15  4:56 ` [PATCH v2 1/4] PCI: Add port service list node for pci_dev Oza Pawandeep
2017-11-15  4:56 ` [PATCH v2 2/4] PCI/portdrv: Add/Remove port services to the list Oza Pawandeep
2017-11-15  4:56 ` [PATCH v2 3/4] PCI/portdrv: Implement interface to query the registered service Oza Pawandeep
2017-11-15  4:56 ` [PATCH v2 4/4] PCI/AER: Dont do recovery when DPC is enabled Oza Pawandeep
2017-11-15 21:14   ` Bjorn Helgaas
2017-11-16 14:03     ` Sinan Kaya
2017-11-16 20:17       ` Bjorn Helgaas
2017-11-16 20:52         ` Sinan Kaya [this message]
2017-11-18  0:02           ` Bjorn Helgaas
2017-11-19 16:41             ` Sinan Kaya
2017-11-21 16:25       ` David Laight
2017-11-21 16:43         ` Sinan Kaya

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=16b9ac67-042d-1010-0c8c-205b2fa40a7f@codeaurora.org \
    --to=okaya@codeaurora.org \
    --cc=bhelgaas@google.com \
    --cc=gabriele.paoloni@huawei.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=helgaas@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-arm-msm@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=liudongdong3@huawei.com \
    --cc=poza@codeaurora.org \
    --cc=tglx@linutronix.de \
    --cc=timur@codeaurora.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).