Linux CXL
From: Dave Jiang <dave.jiang@intel.com>
To: Robert Richter <rrichter@amd.com>
Cc: linux-cxl@vger.kernel.org, dave@stgolabs.net,
	jonathan.cameron@huawei.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	dan.j.williams@intel.com
Subject: Re: [PATCH] cxl: Fix cxl_endpoint_get_perf_coordinate() support for RCH
Date: Mon, 29 Apr 2024 08:56:48 -0700
Message-ID: <929b7b3c-4676-468c-8ba2-8028bb05479c@intel.com>
In-Reply-To: <Zi-LzgrCVHDlIBpB@rric.localdomain>



On 4/29/24 5:00 AM, Robert Richter wrote:
> On 26.04.24 15:47:56, Dave Jiang wrote:
>> Robert reported the following when booting a CXL host with Restricted CXL
>> Host (RCH) topology:
>>  [   39.815379] cxl_acpi ACPI0017:00: not a cxl_port device
>>  [   39.827123] WARNING: CPU: 46 PID: 1754 at drivers/cxl/core/port.c:592 to_cxl_port+0x56/0x70 [cxl_core]
>>
>> ... plus some related subsequent NULL pointer dereference:
>>
>>  [   40.718708] BUG: kernel NULL pointer dereference, address: 00000000000002d8
>>
>> The iterator to walk the PCIe path did not account for RCH topology.
>> However RCH does not support hotplug and the memory exported by the
>> Restricted CXL Device (RCD) should be covered by HMAT and therefore no
>> access_coordinate is needed. Add check to see if the endpoint device is
>> RCD and skip calculation.
>>
>> Also add a call to cxl_endpoint_get_perf_coordinates() in cxl_test in order
>> to exercise the topology iterator. The dev_is_pci() check added is to help
>> with this test and should be harmless for normal operation.
>>
>> Reported-by: Robert Richter <rrichter@amd.com>
>> Closes: https://lore.kernel.org/all/Ziv8GfSMSbvlBB0h@rric.localdomain/
>> Fixes: 592780b8391f ("cxl: Fix retrieving of access_coordinates in PCIe path")
>> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
> 
> This patch fixes the issue.
> 
> Tested-by: Robert Richter <rrichter@amd.com>
> Reviewed-by: Robert Richter <rrichter@amd.com>

Thank you Robert! I'll get this queued for rc7. 
> 
> But see below for a question...
> 
>> ---
>>
>> Hi Robert,
>> Can you please try this patch and see if it addresses the issue you saw
>> on your RCH platform? Thanks!
>>
>>  drivers/cxl/core/port.c      | 15 ++++++++++++++-
>>  tools/testing/cxl/test/cxl.c |  3 +++
>>  2 files changed, 17 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
>> index 762783bb091a..887ed6e358fb 100644
>> --- a/drivers/cxl/core/port.c
>> +++ b/drivers/cxl/core/port.c
>> @@ -2184,6 +2184,7 @@ static bool parent_port_is_cxl_root(struct cxl_port *port)
>>  int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>>  				      struct access_coordinate *coord)
>>  {
>> +	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
>>  	struct access_coordinate c[] = {
>>  		{
>>  			.read_bandwidth = UINT_MAX,
>> @@ -2197,12 +2198,20 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>>  	struct cxl_port *iter = port;
>>  	struct cxl_dport *dport;
>>  	struct pci_dev *pdev;
>> +	struct device *dev;
>>  	unsigned int bw;
>>  	bool is_cxl_root;
>>  
>>  	if (!is_cxl_endpoint(port))
>>  		return -EINVAL;
>>  
>> +	/*
>> +	 * Skip calculation for RCD. Expectation is HMAT already covers RCD case
>> +	 * since RCH does not support hotplug.
>> +	 */
>> +	if (cxlmd->cxlds->rcd)
>> +		return 0;
>> +
>>  	/*
>>  	 * Exit the loop when the parent port of the current iter port is cxl
>>  	 * root. The iterative loop starts at the endpoint and gathers the
>> @@ -2232,8 +2241,12 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_port *port,
>>  		return -EINVAL;
>>  	cxl_coordinates_combine(c, c, dport->coord);
>>  
>> +	dev = port->uport_dev->parent;
>> +	if (!dev_is_pci(dev))
>> +		return -ENODEV;
>> +
>>  	/* Get the calculated PCI paths bandwidth */
>> -	pdev = to_pci_dev(port->uport_dev->parent);
>> +	pdev = to_pci_dev(dev);
>>  	bw = pcie_bandwidth_available(pdev, NULL, NULL, NULL);
>>  	if (bw == 0)
>>  		return -ENXIO;
>> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
>> index 61c69297e797..72e2ce58e1dc 100644
>> --- a/tools/testing/cxl/test/cxl.c
>> +++ b/tools/testing/cxl/test/cxl.c
>> @@ -1001,6 +1001,7 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
>>  	struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
>>  	struct cxl_dev_state *cxlds = cxlmd->cxlds;
>>  	struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlds);
>> +	struct access_coordinate ep_c[ACCESS_COORDINATE_MAX];
>>  	struct range pmem_range = {
>>  		.start = cxlds->pmem_res.start,
>>  		.end = cxlds->pmem_res.end,
>> @@ -1020,6 +1021,8 @@ static void mock_cxl_endpoint_parse_cdat(struct cxl_port *port)
>>  		dpa_perf_setup(port, &pmem_range, &mds->pmem_perf);
>>  
>>  	cxl_memdev_update_perf(cxlmd);
>> +
>> +	cxl_endpoint_get_perf_coordinates(port, ep_c);
> 
> I don't see what this is for as ep_c is unused later? The only reason
> is for error checking to see if that throws some kernel message in the
> logs but return code is unused.

Right, the results are thrown away. The call is there specifically to
exercise the topology iterator and make sure it does not crash or fail;
it has no other purpose. I looked into plumbing this function for use in
cxl_test, but found that it is not possible to override the PCI API
function that retrieves the bandwidth (pcie_bandwidth_available()). So
for now the call only tests the topology iterator. I'll add a comment.
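
For illustration only, and not part of the patch: a minimal userspace
sketch of the control flow discussed here. The mock_* types, fields, and
function names are hypothetical stand-ins for the kernel structures; the
topology walk itself is left out. It shows the RCD early return, the
non-PCI bail-out that cxl_test hits, and a caller that deliberately
ignores the result.

```c
/*
 * Userspace model only. Every mock_* name is hypothetical and stands in
 * for kernel state; this is not the kernel implementation.
 */
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_endpoint {
	bool is_endpoint;	/* models is_cxl_endpoint(port) */
	bool rcd;		/* models cxlmd->cxlds->rcd */
	bool parent_is_pci;	/* models dev_is_pci(port->uport_dev->parent) */
	unsigned int link_bw;	/* models the pcie_bandwidth_available() result */
};

/* Returns 0 on success or when the calculation is skipped, else -errno. */
static int mock_get_perf_coordinates(const struct mock_endpoint *ep,
				     unsigned int *bw_out)
{
	if (!ep->is_endpoint)
		return -EINVAL;

	/* RCD: HMAT is expected to cover it, so skip; *bw_out is untouched. */
	if (ep->rcd)
		return 0;

	/* (the walk from the endpoint up to the CXL root is elided here) */

	/* Mock test devices have no PCI parent: stop before any PCI call. */
	if (!ep->parent_is_pci)
		return -ENODEV;

	if (ep->link_bw == 0)
		return -ENXIO;

	*bw_out = ep->link_bw;
	return 0;
}

/* Caller in the style of the test hook: the result is discarded on purpose. */
static void mock_parse_cdat(const struct mock_endpoint *ep)
{
	unsigned int bw = 0;

	/* Only exercises the code path; return value and bw are ignored. */
	(void)mock_get_perf_coordinates(ep, &bw);
}
```

In this model a device with no PCI parent returns -ENODEV before the
bandwidth lookup is reached, which is why the test call can only cover
the steps that come before it.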
> 
> Thanks,
> 
> -Robert
> 
>>  }
>>  
>>  static struct cxl_mock_ops cxl_mock_ops = {
>> -- 
>> 2.44.0
>>
