From: Bjorn Helgaas <helgaas@kernel.org>
To: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>,
"Arumugam, Kamenee" <kamenee.arumugam@intel.com>,
"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"Ruhl, Michael J" <michael.j.ruhl@intel.com>,
"dledford@redhat.com" <dledford@redhat.com>
Subject: Re: [PATCH for-next 2/2] IB/hfi1: Make Unsupported Request error non-fatal
Date: Mon, 15 Apr 2019 16:46:51 -0500
Message-ID: <20190415214651.GM126710@google.com>
In-Reply-To: <ef74c759-a5ac-c497-0e82-345ddf3c9255@intel.com>

On Mon, Apr 15, 2019 at 02:47:01PM -0400, Dennis Dalessandro wrote:
> On 4/12/2019 9:55 AM, Jason Gunthorpe wrote:
> > On Thu, Apr 11, 2019 at 08:37:53PM +0000, Arumugam, Kamenee wrote:
> > > On Thu, Apr 11, 2019 at 06:22:45PM +0000, Arumugam, Kamenee wrote:
> > >
> > > > This is a device bug then.
> > >
> > > > A RDMA device must accept and respond to all TLPs that the CPU
> > > > could create for the user accessible BAR pages.
> > >
> > > > A user process must not be able to crash the CPU or make the
> > > > device malfunction by accessing the exposed BAR page. This
> > > > > includes a broad range of topics, like mis-aligned accesses,
> > > > > SSE instructions, atomics, etc.
> > >
> > > > Is blocking AER even enough here? If the device isn't
> > > > generating a reasonable reply I have a bad feeling worse will
> > > > happen.
> > >
> > > After blocking unsupported request error, we don't see any other
> > > issue including no system hang.
> >
> > Are you specifically testing all the special TLPs the CPU can
> > produce?
>
> All the special TLPs should have been tested. This however seems to
> be a missed test case. Not that surprising though given differences
> in BIOS and things of that nature that something falls through the
> cracks and is extra hard to find.

Is there a published erratum for this?  I don't have warm fuzzies yet
that we actually know the root cause here.

Kamenee said the problem case was:

  user-level application is making spurious read accesses (invalid
  width access) to this memory mapping causing the device to report an
  unsupported request error through AER.

So I guess that means the application performed a read and got invalid
data back?  I think the Root Complex had to supply *some* data to
complete the CPU's read, and since the HFI responded with UR instead
of data, the RC probably fabricated something.  Many RCs fabricate ~0,
but I don't think that's actually required by the spec, so I'm
doubtful that the application can reliably detect this.
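
To make the detection problem concrete, here is a minimal C sketch of
the usual "all ones" heuristic.  It is an illustration only, not code
from the hfi1 driver or from the spec, and the helper name
read_may_have_failed() is hypothetical.  It assumes a 32-bit read and
an RC that fabricates ~0 for a read completed with UR, which, as noted
above, is common but not guaranteed.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Heuristic only (hypothetical helper, not a kernel or driver API):
 * report whether a 32-bit MMIO read result *might* be data fabricated
 * by the Root Complex after the device answered with Unsupported
 * Request.
 *
 * Assumption: the RC fabricates all ones (~0) for a failed read.
 * An RC that fabricates some other value defeats this check.
 */
static inline bool read_may_have_failed(uint32_t val)
{
	return val == UINT32_MAX;
}
```

The check is ambiguous in both directions: a register that can
legitimately contain 0xffffffff yields a false positive, and an RC
that fabricates anything other than ~0 yields a false negative.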

I'd be really surprised that something as obvious as an invalid width
wasn't tested, especially if this is intended for direct mapping into
user applications.

Bjorn