From: poza@codeaurora.org
To: Keith Busch <keith.busch@intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
Philippe Ombredanne <pombredanne@nexb.com>,
Thomas Gleixner <tglx@linutronix.de>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Kate Stewart <kstewart@linuxfoundation.org>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
Dongdong Liu <liudongdong3@huawei.com>,
Gabriele Paoloni <gabriele.paoloni@huawei.com>,
Wei Zhang <wzhang@fb.com>, Sinan Kaya <okaya@codeaurora.org>,
Timur Tabi <timur@codeaurora.org>
Subject: Re: [PATCH v2 2/4] PCI/DPC/AER: Address Concurrency between AER and DPC
Date: Fri, 29 Dec 2017 23:30:02 +0530 [thread overview]
Message-ID: <481784bb38707839493fc04e6a2b4068@codeaurora.org> (raw)
In-Reply-To: <20171229172324.GF16407@localhost.localdomain>
On 2017-12-29 22:53, Keith Busch wrote:
> On Fri, Dec 29, 2017 at 12:54:17PM +0530, Oza Pawandeep wrote:
>> This patch addresses the race condition between AER and DPC for
>> recovery.
>>
>> Current DPC driver does not do recovery, e.g. calling end-point's
>> driver's
>> callbacks, which sanitize the device.
>> DPC driver implements link_reset callback, and calls pci_do_recovery.
>
> I'm not sure I see why any of this is necessary for two reasons:
>
> 1. A downstream port containment event disables the link. How can a
> driver
> sanitize an end device when all the end devices below the containment
> are
> physically inaccessible? Any attempt to access such devices will just
> end with either CA or UR (depending on DPC control settings). Since we
> already know the failed outcome from attempting to access such devices,
> why do you want the drivers to do anything?
>
Ok I think my statement was misleading, not device sanitation, but the
device driver making
SW sanitize.
for e.g. have a look at e1000_io_error_detected which is called say in
case of AER ERR_FATAL msg.
which sanitizes sw stack, interrupts management (synchronize_irq),
delete timers etc..
yes, DPC would have made the link state disabled, and HW would have
reset the internal logic with
quiescence activities so yes, any transaction on will end with CA or UR.
well but device driver
has to handle rest of the possible things as I mentioned (error
callbacks)
> 2. A DPC event suppresses the error message required for the Linux
> AER driver to run. How can AER and DPC run concurrently?
I afraid I could not grasp the first line completely.
but they way it is triggering AER and DPC on our platform concurrently
is, we have same MSIx registered
for both AER and DPC, and linux calls the shared handlers to handle both
the triggers anyway.
otherwise also if ERR_FATAL msg occurs, the Root port should trigger
both AER and DPC
(assuming both are enabled, and no FW first for AER/DPC)
the problem with the current framework of AER and DPC in Linux is:
both try to act independently, while we know that (for e.g. ERR_FATAL
msg) is responsible for triggering
both AER and DPC depending on the configuration. (currently DPC is
configured for both FATAL and NONFATAL in linux anyway)
It does not make sense that AER goes ahead and attempts to sanitize with
the device driver's callbacks as I mentioned.
and DPC being unaware, asynchronously disables the link (although this
is all HW)
but DPC service driver should adapt to some kind of error handling and
error resume which AER has adapted.
Hence this whole design changes proposed with respect to error handling.
Let me give you another problem statement on the same line:
when DPC is active, AER does not need to act at all...because it doesnt
make sense for AER to act independently.,
without knowing what DPC service driver is upto!
which is handled in one of the patches.
the point I am trying to make is: DPC should not rely on AER to call
error callbacks, and AER should not be doing it without knowing that
DPC is active and it is also going to some course of action (be in HW or
SW)
Regards,
Oza.
next prev parent reply other threads:[~2017-12-29 18:00 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-29 7:24 [PATCH v2 0/4] Address error and recovery for AER and DPC Oza Pawandeep
2017-12-29 7:24 ` [PATCH v2 1/4] PCI/AER: factor out error reporting from AER Oza Pawandeep
2017-12-29 7:24 ` [PATCH v2 2/4] PCI/DPC/AER: Address Concurrency between AER and DPC Oza Pawandeep
2017-12-29 17:23 ` Keith Busch
2017-12-29 18:00 ` poza [this message]
2017-12-29 18:13 ` Keith Busch
2017-12-30 3:57 ` poza
2018-01-02 13:25 ` Sinan Kaya
2018-01-02 17:12 ` Keith Busch
2018-01-02 18:34 ` Sinan Kaya
2017-12-29 7:24 ` [PATCH v2 3/4] PCI/ERR: Do not do recovery if DPC service is active Oza Pawandeep
2017-12-29 7:24 ` [PATCH v2 4/4] PCI/DPC: Enumerate the devices after DPC trigger event Oza Pawandeep
2018-01-02 19:02 ` [PATCH v2 0/4] Address error and recovery for AER and DPC Bjorn Helgaas
2018-01-02 19:09 ` Sinan Kaya
2018-01-02 19:12 ` Keith Busch
2018-01-03 6:14 ` poza
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=481784bb38707839493fc04e6a2b4068@codeaurora.org \
--to=poza@codeaurora.org \
--cc=bhelgaas@google.com \
--cc=gabriele.paoloni@huawei.com \
--cc=gregkh@linuxfoundation.org \
--cc=keith.busch@intel.com \
--cc=kstewart@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=liudongdong3@huawei.com \
--cc=okaya@codeaurora.org \
--cc=pombredanne@nexb.com \
--cc=tglx@linutronix.de \
--cc=timur@codeaurora.org \
--cc=wzhang@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).