All of lore.kernel.org
 help / color / mirror / Atom feed
From: PJ Waskiewicz <ppwaskie@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Lukas Wunner <lukas@wunner.de>,
	linux-cxl@vger.kernel.org, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup failure
Date: Mon, 8 Apr 2024 14:32:59 -0700	[thread overview]
Message-ID: <ZhRii1BAg86gccWM@snoopy> (raw)
In-Reply-To: <6614575a1c15c_2583ad29476@dwillia2-xfh.jf.intel.com.notmuch>

On 24/04/08 01:45PM, Dan Williams wrote:
> PJ Waskiewicz wrote:
> [..]
> > > Other than that seems reasonable to hint it is probably a bios
> > > bug - however I wonder how many other cases we should do this for and
> > > whether it is worth the effort of marking them all?
> > 
> > I can confirm this was definitely a BIOS bug in this particular case.
> > The vendor spun a quick test BIOS for us to test on an EMR and SPR host,
> > and the _UID's were finally correct.  I could successfully walk the CEDT
> > and get to the CAPS structs I was after (link speed, bus width, etc.).
> 
> Oh, in that case I think there's no need to worry about a Linux quirk.

Copy that.
 
> I do think cxl_acpi has multiple overlapping failure cases when what you
> really want to know is whether:
> 
>    "CXL host bridge can be found in CEDT.CHBS"

Yes.
 
> Turns out that straightforward message is aleady a driver message, but
> it gets skipped in this case. So, I am thinking of cleanup /
> clarification along the following lines:
> 
> 1/ Lean on the existing cxl_get_chbs() validation paths to report on
> errors
> 
> 2/ Include the device-name rather than the UID since if UID is
> unreliable it does not help you communicate with your BIOS vendor. I.e.
> give a breadcrumb for the BIOS engineer to match the AML device name
> with the CEDT content.
> 
> 3/ Do not fail driver load on a single host-bridge parsing failure
> 
> 4/ These are all cxl_acpi driver init events, so consistently use the
> ACPI0017 device, and the cxl_acpi driver, as the originator of the error
> message.
> 
> Would this clarification have saved you time with the debug?

Inline below, but yes, this would have helped sped up the debug quite a
bit.  Since the initial debug was happening on SPR, and I couldn't use
the host drivers, I was pulling the logic from the CXL host drivers into
my drivers to skip the need for ACPI0017.
 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 32091379a97b..5a70d7312c64 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -511,29 +511,26 @@ static int cxl_get_chbs_iter(union acpi_subtable_headers *header, void *arg,
>  	return 0;
>  }
>  
> -static int cxl_get_chbs(struct device *dev, struct acpi_device *hb,
> -			struct cxl_chbs_context *ctx)
> +static void cxl_get_chbs(struct device *dev, struct acpi_device *hb,
> +			 struct cxl_chbs_context *ctx)
>  {
> -	unsigned long long uid;
>  	int rc;
>  
> -	rc = acpi_evaluate_integer(hb->handle, METHOD_NAME__UID, NULL, &uid);
> -	if (rc != AE_OK) {
> -		dev_err(dev, "unable to retrieve _UID\n");
> -		return -ENOENT;
> -	}
> -
> -	dev_dbg(dev, "UID found: %lld\n", uid);
> -	*ctx = (struct cxl_chbs_context) {
> +	*ctx = (struct cxl_chbs_context){
>  		.dev = dev,
> -		.uid = uid,
>  		.base = CXL_RESOURCE_NONE,
>  		.cxl_version = UINT_MAX,
>  	};
>  
> -	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, cxl_get_chbs_iter, ctx);
> +	rc = acpi_evaluate_integer(hb->handle, METHOD_NAME__UID, NULL,
> +				   &ctx->uid);
> +	if (rc != AE_OK) {
> +		dev_dbg(dev, "unable to retrieve _UID\n");
> +		return;
> +	}

Before I read the changes below, I didn't see why this was still
useful...but I do see it now in full context.

>  
> -	return 0;
> +	dev_dbg(dev, "UID found: %lld\n", ctx->uid);
> +	acpi_table_parse_cedt(ACPI_CEDT_TYPE_CHBS, cxl_get_chbs_iter, ctx);
>  }
>  
>  static int get_genport_coordinates(struct device *dev, struct cxl_dport *dport)
> @@ -561,7 +558,6 @@ static int get_genport_coordinates(struct device *dev, struct cxl_dport *dport)
>  static int add_host_bridge_dport(struct device *match, void *arg)
>  {
>  	int ret;
> -	acpi_status rc;
>  	struct device *bridge;
>  	struct cxl_dport *dport;
>  	struct cxl_chbs_context ctx;
> @@ -573,19 +569,16 @@ static int add_host_bridge_dport(struct device *match, void *arg)
>  	if (!hb)
>  		return 0;
>  
> -	rc = cxl_get_chbs(match, hb, &ctx);
> -	if (rc)
> -		return rc;
> -
> +	cxl_get_chbs(match, hb, &ctx);
>  	if (ctx.cxl_version == UINT_MAX) {
> -		dev_warn(match, "No CHBS found for Host Bridge (UID %lld)\n",
> -			 ctx.uid);
> +		dev_err(host, FW_BUG "No CHBS found for Host Bridge (%s)\n",
> +			dev_name(match));
>  		return 0;
>  	}

This would have been more obvious that the lookup failed for "reasons"
instead of AE_BUFFER_OVERFLOW (which led to the fun Alice-style LXR
spelunking).

>  
>  	if (ctx.base == CXL_RESOURCE_NONE) {
> -		dev_warn(match, "CHBS invalid for Host Bridge (UID %lld)\n",
> -			 ctx.uid);
> +		dev_err(host, FW_BUG "CHBS invalid for Host Bridge (%s)\n",
> +			dev_name(match));

I think this is a good catch-all too in case the lookup is good, but the
vendor borked filling it in properly.

>  		return 0;
>  	}
>  
> @@ -650,13 +643,11 @@ static int add_host_bridge_uport(struct device *match, void *arg)
>  		return 0;
>  	}
>  
> -	rc = cxl_get_chbs(match, hb, &ctx);
> -	if (rc)
> -		return rc;
> -
> +	cxl_get_chbs(match, hb, &ctx);
>  	if (ctx.cxl_version == ACPI_CEDT_CHBS_VERSION_CXL11) {
> -		dev_warn(bridge,
> -			 "CXL CHBS version mismatch, skip port registration\n");
> +		dev_err(host,
> +			FW_BUG "CXL CHBS version mismatch, skip port registration for %s\n",
> +			dev_name(match));
>  		return 0;
>  	}

I like your version here much better than mine.  If you want to merge
that, then:

Reviewed-by: PJ Waskiewicz <ppwaskie@kernel.org>

  reply	other threads:[~2024-04-08 21:33 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-04-07 21:05 [PATCH 1/1] cxl/acpi.c: Add buggy BIOS hint for CXL ACPI lookup failure ppwaskie
2024-04-07 21:28 ` Lukas Wunner
2024-04-08  2:03   ` PJ Waskiewicz
2024-04-08  8:34     ` Jonathan Cameron
2024-04-08 19:29       ` PJ Waskiewicz
2024-04-08 20:45         ` Dan Williams
2024-04-08 21:32           ` PJ Waskiewicz [this message]
2024-04-09  4:22             ` PJ Waskiewicz
2024-04-08 16:54 ` Dan Williams
2024-04-08 19:25   ` PJ Waskiewicz
2024-04-09 13:22 ` Bjorn Helgaas
2024-04-29  5:57   ` PJ Waskiewicz
2024-04-29 15:31     ` Bjorn Helgaas
2024-04-29 18:35       ` Dan Williams
2024-05-01 15:28         ` PJ Waskiewicz
2024-05-01 15:47           ` Dan Williams
2024-05-02 17:34             ` PJ Waskiewicz
2024-05-02 18:29               ` Dan Williams
2024-05-01 17:54           ` Bjorn Helgaas
2024-05-02 17:30             ` PJ Waskiewicz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZhRii1BAg86gccWM@snoopy \
    --to=ppwaskie@kernel.org \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=dan.j.williams@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.