linuxppc-dev.lists.ozlabs.org archive mirror
From: Jason Gunthorpe <jgg@ziepe.ca>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Leon Romanovsky <leon@kernel.org>,
	linux-rdma@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, sbest@redhat.com,
	saeedm@mellanox.com, alex.williamson@redhat.com,
	paulus@samba.org, linux-pci@vger.kernel.org, bhelgaas@google.com,
	ogerlitz@mellanox.com, David Gibson <david@gibson.dropbear.id.au>,
	linuxppc-dev@lists.ozlabs.org, davem@davemloft.net,
	tariqt@mellanox.com
Subject: Re: [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45]
Date: Mon, 7 Jan 2019 21:01:29 -0700
Message-ID: <20190108040129.GE5336@ziepe.ca>
In-Reply-To: <c1296ee9120a6a04dc75d0fdb2a641c722cb65d6.camel@kernel.crashing.org>

On Sun, Jan 06, 2019 at 09:43:46AM +1100, Benjamin Herrenschmidt wrote:
> On Sat, 2019-01-05 at 10:51 -0700, Jason Gunthorpe wrote:
> > 
> > > Interesting.  I've investigated this further, though I don't have as
> > > many new clues as I'd like.  The problem occurs reliably, at least on
> > > one particular type of machine (a POWER8 "Garrison" with ConnectX-4).
> > > I don't yet know if it occurs with other machines, I'm having trouble
> > > getting access to other machines with a suitable card.  I didn't
> > > manage to reproduce it on a different POWER8 machine with a
> > > ConnectX-5, but I don't know if it's the difference in machine or
> > > difference in card revision that's important.
> > 
> > Making sure the card has the latest firmware is always good advice.
> > 
> > > So possibilities that occur to me:
> > >   * It's something specific about how the vfio-pci driver uses D3
> > >     state - have you tried rebinding your device to vfio-pci?
> > >   * It's something specific about POWER, either the kernel or the PCI
> > >     bridge hardware
> > >   * It's something specific about this particular type of machine
> > 
> > Does the EEH indicate what happened to actually trigger it?
> 
> In a very cryptic way that requires manual parsing using non-public
> docs sadly but yes. From the look of it, it's a completion timeout.
> 
> Looks to me like we don't get a response to a config space access
> during the change of D state. I don't know if it's the write of the D3
> state itself or the read back though (it's probably detected on the
> read back or a subsequent read, but that doesn't tell me which specific
> one failed).
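
To illustrate why the write and the read back are hard to tell apart, here
is a minimal sketch of the usual D-state sequence: read PMCSR, write the
new power state into bits 1:0, then read PMCSR back. FakeDevice and
set_power_state are hypothetical names made up for this example, and the
sketch assumes a failed config read returns all ones, as is typical. It
does not touch real hardware.

```python
PCI_PM_CTRL_STATE_MASK = 0x0003  # PMCSR bits 1:0 hold the power state
PCI_D0, PCI_D3HOT = 0, 3
ALL_ONES = 0xFFFF  # assumed result of a failed 16-bit config read


class FakeDevice:
    """Hypothetical model of a function's PMCSR; not real hardware access."""

    def __init__(self, dies_on_d3=False):
        self.pmcsr = 0x0000
        self.dies_on_d3 = dies_on_d3
        self.alive = True

    def read_pmcsr(self):
        return self.pmcsr if self.alive else ALL_ONES

    def write_pmcsr(self, value):
        if not self.alive:
            return
        self.pmcsr = value
        if self.dies_on_d3 and (value & PCI_PM_CTRL_STATE_MASK) == PCI_D3HOT:
            self.alive = False  # device stops answering config cycles


def set_power_state(dev, state):
    """Write the power state and report where a failure becomes visible."""
    old = dev.read_pmcsr()
    if old == ALL_ONES:
        return "no response before the write"
    dev.write_pmcsr((old & ~PCI_PM_CTRL_STATE_MASK & 0xFFFF) | state)
    new = dev.read_pmcsr()
    if new == ALL_ONES:
        return "no response on read back"
    if (new & PCI_PM_CTRL_STATE_MASK) != state:
        return "state did not stick"
    return "ok"
```

In this model the device stops responding as a result of the write, yet
software only notices on the following read, which matches the ambiguity
described above.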

If it is just one card doing it (again, check that you have the latest
firmware), I wonder if a sketchy PCI-E electrical link is causing a
long re-training cycle? Can you tell whether the PCI-E link is
permanently gone, or does it eventually return?

Does the card work in Gen 3 when it starts? Is there any indication of
PCI-E link errors?
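
As a rough aid for the Gen 3 question, the sketch below decodes a raw PCI
Express Link Status register value into speed, width and training state.
The bit masks follow the standard register layout; decode_lnksta is a
hypothetical helper name, and how the raw value is obtained is left out
(lspci -vv prints the same fields already decoded on its LnkSta line).
The dll_active bit is only meaningful if the port reports that
capability.

```python
PCI_EXP_LNKSTA_CLS = 0x000F    # current link speed
PCI_EXP_LNKSTA_NLW = 0x03F0    # negotiated link width
PCI_EXP_LNKSTA_LT = 0x0800     # link training in progress
PCI_EXP_LNKSTA_DLLLA = 0x2000  # data link layer link active

# Speed codes 1..4 correspond to Gen 1..Gen 4.
SPEEDS = {1: "2.5 GT/s", 2: "5 GT/s", 3: "8 GT/s", 4: "16 GT/s"}


def decode_lnksta(value):
    """Split a 16-bit Link Status value into its main fields."""
    return {
        "speed": SPEEDS.get(value & PCI_EXP_LNKSTA_CLS, "unknown"),
        "width": (value & PCI_EXP_LNKSTA_NLW) >> 4,
        "training": bool(value & PCI_EXP_LNKSTA_LT),
        "dll_active": bool(value & PCI_EXP_LNKSTA_DLLLA),
    }


if __name__ == "__main__":
    # 0x2083: Gen 3 (8 GT/s), x8, not training, link active
    print(decode_lnksta(0x2083))
```

A link that keeps reporting a lower speed or width than the card and slot
support, or that stays in training, would point toward the electrical
problem suggested above.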

Does it fail every time, or only sometimes?

Is the POWER8 firmware good? If the link does eventually come back, is
the POWER8's D3 resumption timeout long enough?

If this doesn't lead to an obvious conclusion, you'll probably need to
contact IBM's Mellanox support team to get more information from
the card side.

Jason

Thread overview: 20+ messages
2018-12-06  4:19 [PATCH] PCI: Add no-D3 quirk for Mellanox ConnectX-[45] David Gibson
2018-12-06  6:45 ` Leon Romanovsky
2018-12-11  2:31   ` David Gibson
2019-01-04  3:44   ` David Gibson
2019-01-05 17:51     ` Jason Gunthorpe
2019-01-05 22:43       ` Benjamin Herrenschmidt
2019-01-08  4:01         ` Jason Gunthorpe [this message]
2019-01-08  6:07           ` Leon Romanovsky
2019-01-09  5:09           ` Benjamin Herrenschmidt
2019-01-09  5:30             ` David Gibson
2019-01-09  6:32               ` Alexey Kardashevskiy
2019-01-09  7:25                 ` Benjamin Herrenschmidt
2019-01-09  8:14                   ` Alexey Kardashevskiy
2019-01-09 15:27             ` Jason Gunthorpe
2019-01-09  4:53         ` Alexey Kardashevskiy
2019-01-09  7:24           ` Benjamin Herrenschmidt
2019-01-09  8:20             ` Alexey Kardashevskiy
2018-12-11 14:01 ` Bjorn Helgaas
2018-12-12  0:22   ` David Gibson
2018-12-12  3:04     ` Bjorn Helgaas
