All of lore.kernel.org
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: Dave Jiang <dave.jiang@intel.com>
Cc: linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	dan.j.williams@intel.com, ira.weiny@intel.com,
	vishal.l.verma@intel.com, alison.schofield@intel.com,
	Jonathan.Cameron@huawei.com, rostedt@goodmis.org,
	terry.bowman@amd.com, bhelgaas@google.com,
	sathyanarayanan.kuppuswamy@linux.intel.com,
	shiju.jose@huawei.com
Subject: Re: [PATCH v4 10/11] PCI/AER: Add optional logging callback for correctable error
Date: Wed, 30 Nov 2022 13:45:21 -0600	[thread overview]
Message-ID: <20221130194521.GA829038@bhelgaas> (raw)
In-Reply-To: <166974414546.1608150.4142682712102935008.stgit@djiang5-desk3.ch.intel.com>

On Tue, Nov 29, 2022 at 10:49:05AM -0700, Dave Jiang wrote:
> Some new devices such as CXL devices may want to record additional error
> information on a corrected error. Add a callback to allow the PCI device
> driver to do additional logging such as providing additional stats for user
> space RAS monitoring.
> 
> For CXL device, this is actually a need due to CXL needing to write to the
> device AER status register in order to clear the unmasked CEs.

s/CE/correctable error/ since it's the first use and not common in
PCI-land.

"device AER status register" sounds like the PCIe AER Correctable
Error Status Register (PCIe r6.0, sec 7.8.4.5), but I think you mean
something else, maybe a CXL-specific register?

The PCIe core needs to own the AER one (PCI_ERR_COR_STATUS) so it can
coordinate ownership between firmware and Linux.

> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  Documentation/PCI/pci-error-recovery.rst |    7 +++++++
>  drivers/pci/pcie/aer.c                   |    8 +++++++-
>  include/linux/pci.h                      |    3 +++
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/PCI/pci-error-recovery.rst b/Documentation/PCI/pci-error-recovery.rst
> index 187f43a03200..690220255d5e 100644
> --- a/Documentation/PCI/pci-error-recovery.rst
> +++ b/Documentation/PCI/pci-error-recovery.rst
> @@ -83,6 +83,7 @@ This structure has the form::
>  		int (*mmio_enabled)(struct pci_dev *dev);
>  		int (*slot_reset)(struct pci_dev *dev);
>  		void (*resume)(struct pci_dev *dev);
> +		void (*cor_error_log)(struct pci_dev *dev);

I think I would remove "log" from the name because it suggests this
hook should *only* log, and you need to actually clear some status.
Maybe "cor_error_detected()" to be analogous to error_detected()?

>  	};
>  
>  The possible channel states are::
> @@ -422,5 +423,11 @@ That is, the recovery API only requires that:
>     - drivers/net/cxgb3
>     - drivers/net/s2io.c
>  
> +   The cor_error_log() callback is invoked in handle_error_source() when
> +   the error severity is "correctable". The callback is optional and allows
> +   additional logging to be done if desired. See example:
> +
> +   - drivers/cxl/pci.c
> +
>  The End
>  -------
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index e2d8a74f83c3..af1b5eecbb11 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -961,8 +961,14 @@ static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
>  		if (aer)
>  			pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS,
>  					info->status);
> -		if (pcie_aer_is_native(dev))
> +		if (pcie_aer_is_native(dev)) {
> +			struct pci_driver *pdrv = dev->driver;
> +
> +			if (pdrv && pdrv->err_handler &&
> +			    pdrv->err_handler->cor_error_log)
> +				pdrv->err_handler->cor_error_log(dev);
>  			pcie_clear_device_status(dev);
> +		}
>  	} else if (info->severity == AER_NONFATAL)
>  		pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset);
>  	else if (info->severity == AER_FATAL)
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 575849a100a3..54939b3426a9 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -844,6 +844,9 @@ struct pci_error_handlers {
>  
>  	/* Device driver may resume normal operations */
>  	void (*resume)(struct pci_dev *dev);
> +
> +	/* Allow device driver to record more details of a correctable error */
> +	void (*cor_error_log)(struct pci_dev *dev);
>  };
>  
>  
> 
> 

  reply	other threads:[~2022-11-30 19:45 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-29 17:48 [PATCH v4 00/11] cxl/pci: Add fundamental error handling Dave Jiang
2022-11-29 17:48 ` [PATCH v4 01/11] cxl/pci: Cleanup repeated code in cxl_probe_regs() helpers Dave Jiang
2022-11-29 17:48 ` [PATCH v4 02/11] cxl/pci: Cleanup cxl_map_device_regs() Dave Jiang
2022-11-29 17:48 ` [PATCH v4 03/11] cxl/pci: Kill cxl_map_regs() Dave Jiang
2022-11-29 17:48 ` [PATCH v4 04/11] cxl/core/regs: Make cxl_map_{component, device}_regs() device generic Dave Jiang
2022-11-29 17:48 ` [PATCH v4 05/11] cxl/port: Limit the port driver to just the HDM Decoder Capability Dave Jiang
2022-11-29 17:48 ` [PATCH v4 06/11] cxl/pci: Prepare for mapping RAS Capability Structure Dave Jiang
2022-11-29 17:48 ` [PATCH v4 07/11] cxl/pci: Find and map the " Dave Jiang
2022-11-29 17:48 ` [PATCH v4 08/11] cxl/pci: add tracepoint events for CXL RAS Dave Jiang
2022-11-29 19:45   ` Steven Rostedt
2022-11-29 17:48 ` [PATCH v4 09/11] cxl/pci: Add (hopeful) error handling support Dave Jiang
2023-01-06 16:05   ` Jonathan Cameron
2023-01-06 16:12     ` Dave Jiang
2022-11-29 17:49 ` [PATCH v4 10/11] PCI/AER: Add optional logging callback for correctable error Dave Jiang
2022-11-30 19:45   ` Bjorn Helgaas [this message]
2022-11-30 21:37     ` Dave Jiang
2022-11-30 22:11     ` [v5 10/11 PATCH] " Dave Jiang
2022-11-30 22:13     ` [v5 11/11 PATCH] cxl/pci: Add callback to log AER " Dave Jiang
2022-11-30 22:47       ` Bjorn Helgaas
2022-12-01  0:02         ` [v6 " Dave Jiang
2022-12-07 20:04           ` Terry Bowman
2022-12-07 20:29             ` Bjorn Helgaas
2022-12-07 20:54               ` Terry Bowman
2022-11-29 17:49 ` [PATCH v4 11/11] " Dave Jiang
2022-12-13 15:17 ` [PATCH v4 00/11] cxl/pci: Add fundamental error handling Jonathan Cameron

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221130194521.GA829038@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=shiju.jose@huawei.com \
    --cc=terry.bowman@amd.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.