From: Bjorn Helgaas <helgaas@kernel.org>
To: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: jgg@ziepe.ca, linux-rdma@vger.kernel.org,
	linux-pci@vger.kernel.org,
	"Michael J. Ruhl" <michael.j.ruhl@intel.com>,
	dledford@redhat.com,
	Kamenee Arumugam <kamenee.arumugam@intel.com>
Subject: Re: [PATCH for-next 2/2] IB/hfi1: Make Unsupported Request error non-fatal
Date: Wed, 10 Apr 2019 14:29:48 -0500
Message-ID: <20190410192948.GG256045@google.com>
In-Reply-To: <20190410123455.26818.49424.stgit@scvm10.sc.intel.com>

Hi Dennis,

On Wed, Apr 10, 2019 at 05:35:01AM -0700, Dennis Dalessandro wrote:
> From: Kamenee Arumugam <kamenee.arumugam@intel.com>
> 
> For hfi1, the unsupported request error is not considered a fatal
> error. When the PCIe advanced error reporting capability (AER) is
> configured to report unsupported requests as fatal, the system will
> hang on this error.

I know there are a few drivers that fiddle with AER bits, but that
makes me a little bit nervous because error handling is more than just
a driver issue.  It involves the PCI core and the platform firmware as
well.

Anyway, let's figure out more about this particular case.  Unsupported
Request is a PCIe protocol-level issue.  You're masking it in the HFI
adapter, which I guess means you want to prevent it from reporting UR.
So the HFI is receiving a TLP that it doesn't support?

What exactly is causing the UR?  Is it something the driver could
potentially avoid, e.g., an AtomicOp that HFI doesn't support?  I have
a vague notion that InfiniBand allows some sort of direct user-space
access to hardware; is there something there that can cause a UR?

The system hang sounds like a separate problem that should also be
fixed.  Even if HFI signals a UR error, I would not expect a system
hang.

Bjorn

> Set Unsupported Request Error bit in Uncorrectable Error Mask
> register to disable error reporting to the PCIe root complex.
> 
> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
> Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
> ---
>  drivers/infiniband/hw/hfi1/pcie.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
> index c96d193..a033e28 100644
> --- a/drivers/infiniband/hw/hfi1/pcie.c
> +++ b/drivers/infiniband/hw/hfi1/pcie.c
> @@ -114,6 +114,7 @@ int hfi1_pcie_init(struct hfi1_devdata *dd)
>  	}
>  
>  	pci_set_master(pdev);
> +	pcie_aer_set_dword(pdev, PCI_ERR_UNCOR_MASK, PCI_ERR_UNC_UNSUP);
>  	(void)pci_enable_pcie_error_reporting(pdev);
>  	return 0;
>  
> 
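
For reference, the effect of the one-line change quoted above can be
sketched in user space. This is only an illustration, not the driver's
or the PCI core's code: pcie_aer_set_dword() is not defined in this
message, so the sketch assumes it does a read-modify-write that ORs the
given bits into the named AER register. The fake_aer_* names and the
byte array standing in for the AER capability are hypothetical. The
register offsets and the PCI_ERR_UNC_UNSUP bit value are the ones in
include/uapi/linux/pci_regs.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offsets within the AER extended capability and the UR bit,
 * as defined in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_UNCOR_STATUS 0x04       /* Uncorrectable Error Status */
#define PCI_ERR_UNCOR_MASK   0x08       /* Uncorrectable Error Mask */
#define PCI_ERR_UNCOR_SEVER  0x0c       /* Uncorrectable Error Severity */
#define PCI_ERR_UNC_UNSUP    0x00100000 /* Unsupported Request */

/* Hypothetical stand-in for one device's AER capability registers.
 * Host byte order is used throughout; real config space is
 * little-endian and is accessed through the PCI core. */
struct fake_aer_cap {
	uint8_t regs[0x40];
};

static uint32_t fake_aer_read(const struct fake_aer_cap *cap, unsigned int off)
{
	uint32_t val;

	assert(off % 4 == 0 && off + sizeof(val) <= sizeof(cap->regs));
	memcpy(&val, cap->regs + off, sizeof(val));
	return val;
}

static void fake_aer_write(struct fake_aer_cap *cap, unsigned int off,
			   uint32_t val)
{
	assert(off % 4 == 0 && off + sizeof(val) <= sizeof(cap->regs));
	memcpy(cap->regs + off, &val, sizeof(val));
}

/* Assumed semantics of the helper used in the patch: set bits in an
 * AER register without disturbing the bits already set there. */
static void fake_aer_set_dword(struct fake_aer_cap *cap, unsigned int off,
			       uint32_t bits)
{
	fake_aer_write(cap, off, fake_aer_read(cap, off) | bits);
}

/* An uncorrectable error is signaled upstream only while its bit in
 * the Uncorrectable Error Mask register is clear. */
static int fake_aer_is_reported(const struct fake_aer_cap *cap,
				uint32_t err_bit)
{
	return (fake_aer_read(cap, PCI_ERR_UNCOR_MASK) & err_bit) == 0;
}

/* Whether an unmasked error would be signaled as fatal is decided by
 * the same bit position in the Uncorrectable Error Severity register. */
static int fake_aer_is_fatal(const struct fake_aer_cap *cap, uint32_t err_bit)
{
	return (fake_aer_read(cap, PCI_ERR_UNCOR_SEVER) & err_bit) != 0;
}
```

Calling fake_aer_set_dword(&cap, PCI_ERR_UNCOR_MASK, PCI_ERR_UNC_UNSUP)
on a zeroed struct makes fake_aer_is_reported(&cap, PCI_ERR_UNC_UNSUP)
return 0 while leaving the severity register untouched. That is the
distinction at issue here: the patch masks UR at the device rather than
changing whether UR is treated as fatal.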
