public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
From: Ben Cheatham <benjamin.cheatham@amd.com>
To: Dan Williams <dan.j.williams@intel.com>, Tony Luck <tony.luck@intel.com>
Cc: jonathan.cameron@huawei.com, rafael@kernel.org,
	james.morse@arm.com, bp@alien8.de, dave@stogolabs.net,
	dave.jiang@intel.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org
Subject: Re: [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types
Date: Thu, 15 Feb 2024 09:01:49 -0600	[thread overview]
Message-ID: <7b00a4ec-7fdd-46cb-b5a1-c06780fa8bdb@amd.com> (raw)
In-Reply-To: <65cd7cac71377_5c76294ad@dwillia2-mobl3.amr.corp.intel.com.notmuch>



On 2/14/24 8:53 PM, Dan Williams wrote:
> Tony Luck wrote:
>> On Wed, Feb 14, 2024 at 02:07:06PM -0600, Ben Cheatham wrote:
>>> v12 Changes:
>>> 	- Rebase onto v6.8-rc4
>>> 	- Squash Kconfig patch into patch 2/3 (Jonathan)
>>> 	- Change CONFIG_CXL_EINJ from "depends on ACPI_APEI_EINJ >= CXL_BUS"
>>> 	  to "depends on ACPI_APEI_EINJ = CXL_BUS"
>>> 	- Drop "ACPI, APEI" part of commit message title and use just EINJ
>>> 	instead (Dan)
>>> 	- Add protocol error types to "einj_types" documentation (Jonathan)
>>> 	- Change 0xffff... constants to use GENMASK()
>>> 	- Drop param* variables and use constants instead in cxl error
>>> 	  inject functions (Jonathan)
>>> 	- Add is_cxl_error_type() helper function in einj.c (Jonathan)
>>> 	- Remove a stray function declaration in einj-cxl.h (Jonathan)
>>> 	- Comment #else/#endifs with corresponding #if/#ifdef in
>>> 	einj-cxl.h (Jonathan)
>>>
>>> v11 Changes:
>>> 	- Drop patch 2/6 (Add CXL protocol error defines) and put the
>>> 	  defines in patch 4/6 instead (Dan)
>>> 	- Add Dan's reviewed-by
>>>
>>> The new CXL error types will use the Memory Address field in the
>>> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
>>> compliant memory-mapped downstream port. The value of the memory address
>>> will be in the port's MMIO range, and it will not represent physical
>>> (normal or persistent) memory.
>>>
>>> Add the functionality for injecting CXL 1.1 errors to the EINJ module,
>>> but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
>>> Instead, make the error types available under /sys/kernel/debug/cxl.
>>> This allows for validating the MMIO address for a CXL 1.1 error type
>>> while also not making the user responsible for finding it.
>>
>> I tried this series on an Intel Icelake (which as far as I know
>> doesn't support CXL).
> 
> Thanks Tony!
> 
>> Couple of oddities:
>>
>> 1) I built as a module (CONFIG_ACPI_APEI_EINJ=m) like I normally do.
>>    But this was autoloaded and EINJ initialized during boot:
>>
>> [   33.909111] EINJ: Error INJection is initialized.
> 
> In the current code it should only load if cxl_core.ko is also loaded.
> 
> Can you share the output of lsmod to maybe see which module loaded that
> dependency?
> 
>> I'm wondering if that might be a problem for anyone that likes to
>> leave the einj module not loaded until they have some need to
>> inject errors.
> 
> That is a behavior change of this approach. Is it a problem?
> 
> If it is I would say that we need to break out a new cxl_einj.ko module
> that when it loads walks the CXL topology and creates the debugfs files.
> Otherwise my assumption is that CONFIG_CXL_EINJ=y means that cxl_core.ko
> loads einj.ko unconditionally.
> 
> I would save that work for a clear description of why einj.ko should not
> be resident.
> 
>> 2) Even though my system doesn't have any CXL support, I found this:
>>
>> # cat /sys/kernel/debug/cxl/einj_types
>> 0x00001000      CXL.cache Protocol Correctable
>>
>> What does this mean?
> 
> Strange, does:
> 
> /sys/kernel/debug/einj/available_error_type
> 
> ...show the same even before these patches? I.e. maybe this pre-CXL BIOS was
> using the 0x1000 encoding when it should not?
> 

Dan's already alluded to this, but to elaborate: This series doesn't do anything
different than before when getting available error types, it just puts the CXL types
into the "einj_types" file instead. So it's possible that your platform doesn't
have CXL support, but it is reporting a CXL error type for EINJ. This could be
a BIOS error, or it could be that the BIOS is pre ACPIv6.5 and was using 0x1000
for a different error type and the kernel is interpreting it as a CXL error type.
If you think something else is happening, I'd love to hear about it!

Thanks,
Ben

>> Using ras-tools I injected some DDR memory errors. So legacy
>> functionality still works OK.
>>
>> -Tony
> 
> 
> 

  reply	other threads:[~2024-02-15 15:01 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-14 20:07 [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types Ben Cheatham
2024-02-14 20:07 ` [PATCH v12 1/3] EINJ: Migrate to a platform driver Ben Cheatham
2024-02-16  2:03   ` kernel test robot
2024-02-16 13:43   ` kernel test robot
2024-02-14 20:07 ` [PATCH v12 2/3] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions Ben Cheatham
2024-02-15  2:25   ` Dan Williams
2024-02-15 15:01     ` Ben Cheatham
2024-02-16  0:15       ` Dan Williams
2024-02-15  9:33   ` Jonathan Cameron
2024-02-15 15:01     ` Ben Cheatham
2024-02-14 20:07 ` [PATCH v12 3/3] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
2024-02-20 19:02   ` Davidlohr Bueso
2024-02-20 19:59     ` Ben Cheatham
2024-02-15  1:11 ` [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types Tony Luck
2024-02-15  2:53   ` Dan Williams
2024-02-15 15:01     ` Ben Cheatham [this message]
2024-02-15 17:23     ` Luck, Tony
2024-02-15 18:15       ` Luck, Tony
2024-02-16  0:11         ` Dan Williams
2024-02-16  0:09       ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7b00a4ec-7fdd-46cb-b5a1-c06780fa8bdb@amd.com \
    --to=benjamin.cheatham@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stogolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox