From: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
To: Lucas Stach <l.stach@pengutronix.de>
Cc: Bharat Kumar Gogada <bharatku@xilinx.com>,
Bjorn Helgaas <helgaas@kernel.org>,
Jingoo Han <jingoohan1@gmail.com>,
"joao.pinto@synopsys.com" <joao.pinto@synopsys.com>,
Ley Foon Tan <lftan@altera.com>,
Shawn Lin <shawn.lin@rock-chips.com>,
Michal Simek <michal.simek@xilinx.com>,
Jim Quinlan <jim2101024@gmail.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"rfi@lists.rocketboards.org" <rfi@lists.rocketboards.org>,
"linux-rockchip@lists.infradead.org"
<linux-rockchip@lists.infradead.org>
Subject: Re: Why do we check for "link-up" in *_pcie_valid_device()?
Date: Mon, 8 Jan 2018 11:24:59 +0000 [thread overview]
Message-ID: <20180108112458.GB32027@red-moon> (raw)
In-Reply-To: <1515409414.12538.12.camel@pengutronix.de>
On Mon, Jan 08, 2018 at 12:03:34PM +0100, Lucas Stach wrote:
> Am Freitag, den 05.01.2018, 15:43 +0000 schrieb Lorenzo Pieralisi:
> > On Fri, Jan 05, 2018 at 02:26:34PM +0000, Bharat Kumar Gogada wrote:
> > > On Fri, Dec 22, 2017 at 01:02:28PM +0000, Bharat Kumar Gogada
> > > wrote:
> > > > Bjorn wrote:
> > > > > In the PCI config access path, the *_pcie_valid_device()
> > > > > functions in
> > > > > the dwc, altera, rockchip, and xilinx drivers all check whether
> > > > > the
> > > > > link is up.
> > > > >
> > > > > I think this is racy because the link may go down after we
> > > > > check but
> > > > > before we perform the config access.
> > > > >
> > > > > What would blow up if we removed the *_pcie_link_up() checks?
> > > > >
> > > > > I'd like to either remove the checks or add comments about why
> > > > > the
> > > > > race is acceptable. If we've covered this before, I apologize.
> > > > > Adding a comment will keep me from pestering you about this
> > > > > again in
> > > > > the future.
> > > > In both Xilinx driver cases when link is down, hardware responds
> > > > by
> > > > AXI DECERR/SLVERR status which causes an exception, synchronous
> > > > external abort to CPU. This causes system to hang, so we need
> > > > this
> > > > check for both of our drivers. We will add comments.
> > >
> > > This is a problem, and checking whether the link is up is a
> > > workaround but not a real solution. That means your system may
> > > hang if the link happens to go down at the wrong time.
> > >
> > > A real solution would be to handle the synchronous external abort
> > > so it doesn't cause a system hang.
> > >
> > > Yes, I agree that this is workaround. For pcie-xilinx.c for arm32,
> > > we can have fault handling similar to "imx6q_pcie_abort_handler" in
> > > drivers/pci/dwc/pci-imx6.c.
> > > Since this driver is same for Microblaze architecture also, it
> > > requires separate handling.
> > >
> > > For pcie-xilinx-nwl.c ARM64 as per link [1], linux kernel will hang
> > > for the above AXI responses.
> > > As of now arm64 RAS is still work in progress [2].
> > >
> > > [1] https://www.spinics.net/lists/arm-kernel/msg624203.html
> > >
> > > [2] https://patchwork.kernel.org/patch/9973967/
> > >
> > > The check can be removed, if above issues were addressed.
> >
> > I do not see why the above "issues" should be addressed in order to
> > remove that check - as it was pointed out in this thread it just does
> > not solve anything, so what's the reason for keeping it ?
>
> I solves the issue that you hang the system on PCIe enumeration in 100%
> of the cases when the link is down and you don't have the abort handler
> in place.
There is a mechanism to detect if the link is up before starting
enumeration or am I wrong ?
Probably what we should be discussing here is what are the causes
for the link to go down *unexpectedly* - in other words - what
makes that racy check "likely" to work - which was the initial
question, by the way.
> It doesn't solve the race issue, but that is a lot less likely to be
> hit in the real world. I guess it's not a good idea to remove something
> that covers 98% of the problem just because it doesn't cover the
> remaining 2%, right?
See above - I want to understand what your 98% and 2% actually represent.
Thanks,
Lorenzo
next prev parent reply other threads:[~2018-01-08 11:24 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-12-14 22:58 Why do we check for "link-up" in *_pcie_valid_device()? Bjorn Helgaas
2017-12-15 18:39 ` Jingoo Han
2017-12-15 18:39 ` Jingoo Han
2017-12-15 19:04 ` Bjorn Helgaas
2017-12-15 20:11 ` Bjorn Helgaas
2017-12-22 13:02 ` Bharat Kumar Gogada
2017-12-22 13:02 ` Bharat Kumar Gogada
2017-12-22 17:28 ` Bjorn Helgaas
2018-01-02 11:37 ` Lorenzo Pieralisi
2018-01-05 14:26 ` Bharat Kumar Gogada
2018-01-05 14:26 ` Bharat Kumar Gogada
2018-01-05 15:43 ` Lorenzo Pieralisi
2018-01-08 11:03 ` Lucas Stach
2018-01-08 11:24 ` Lorenzo Pieralisi [this message]
2018-01-02 12:24 ` Shawn Lin
2018-01-02 12:28 ` Shawn Lin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180108112458.GB32027@red-moon \
--to=lorenzo.pieralisi@arm.com \
--cc=bharatku@xilinx.com \
--cc=helgaas@kernel.org \
--cc=jim2101024@gmail.com \
--cc=jingoohan1@gmail.com \
--cc=joao.pinto@synopsys.com \
--cc=l.stach@pengutronix.de \
--cc=lftan@altera.com \
--cc=linux-pci@vger.kernel.org \
--cc=linux-rockchip@lists.infradead.org \
--cc=michal.simek@xilinx.com \
--cc=rfi@lists.rocketboards.org \
--cc=shawn.lin@rock-chips.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.