All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ben Cheatham <benjamin.cheatham@amd.com>
To: Dan Williams <dan.j.williams@intel.com>, Tony Luck <tony.luck@intel.com>
Cc: jonathan.cameron@huawei.com, rafael@kernel.org,
	james.morse@arm.com, bp@alien8.de, dave@stogolabs.net,
	dave.jiang@intel.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, ira.weiny@intel.com,
	linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org
Subject: Re: [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types
Date: Thu, 15 Feb 2024 09:01:49 -0600	[thread overview]
Message-ID: <7b00a4ec-7fdd-46cb-b5a1-c06780fa8bdb@amd.com> (raw)
In-Reply-To: <65cd7cac71377_5c76294ad@dwillia2-mobl3.amr.corp.intel.com.notmuch>



On 2/14/24 8:53 PM, Dan Williams wrote:
> Tony Luck wrote:
>> On Wed, Feb 14, 2024 at 02:07:06PM -0600, Ben Cheatham wrote:
>>> v12 Changes:
>>> 	- Rebase onto v6.8-rc4
>>> 	- Squash Kconfig patch into patch 2/3 (Jonathan)
>>> 	- Change CONFIG_CXL_EINJ from "depends on ACPI_APEI_EINJ >= CXL_BUS"
>>> 	  to "depends on ACPI_APEI_EINJ = CXL_BUS"
>>> 	- Drop "ACPI, APEI" part of commit message title and use just EINJ
>>> 	instead (Dan)
>>> 	- Add protocol error types to "einj_types" documentation (Jonathan)
>>> 	- Change 0xffff... constants to use GENMASK()
>>> 	- Drop param* variables and use constants instead in cxl error
>>> 	  inject functions (Jonathan)
>>> 	- Add is_cxl_error_type() helper function in einj.c (Jonathan)
>>> 	- Remove a stray function declaration in einj-cxl.h (Jonathan)
>>> 	- Comment #else/#endifs with corresponding #if/#ifdef in
>>> 	einj-cxl.h (Jonathan)
>>>
>>> v11 Changes:
>>> 	- Drop patch 2/6 (Add CXL protocol error defines) and put the
>>> 	  defines in patch 4/6 instead (Dan)
>>> 	- Add Dan's reviewed-by
>>>
>>> The new CXL error types will use the Memory Address field in the
>>> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
>>> compliant memory-mapped downstream port. The value of the memory address
>>> will be in the port's MMIO range, and it will not represent physical
>>> (normal or persistent) memory.
>>>
>>> Add the functionality for injecting CXL 1.1 errors to the EINJ module,
>>> but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
>>> Instead, make the error types available under /sys/kernel/debug/cxl.
>>> This allows for validating the MMIO address for a CXL 1.1 error type
>>> while also not making the user responsible for finding it.
>>
>> I tried this series on an Intel Icelake (which as far as I know
>> doesn't support CXL).
> 
> Thanks Tony!
> 
>> Couple of oddities:
>>
>> 1) I built as a module (CONFIG_ACPI_APEI_EINJ=m) like I normally do.
>>    But this was autoloaded and EINJ initialized during boot:
>>
>> [   33.909111] EINJ: Error INJection is initialized.
> 
> In the current code it should only load if cxl_core.ko is also loaded.
> 
> Can you share the output of lsmod to maybe see which module loaded that
> dependency?
> 
>> I'm wondering if that might be a problem for anyone that likes to
>> leave the einj module not loaded until they have some need to
>> inject errors.
> 
> That is a behavior change of this approach. Is it a problem?
> 
> If it is I would say that we need to break out a new cxl_einj.ko module
> that when it loads walks the CXL topology and creates the debugfs files.
> Otherwise my assumption is that CONFIG_CXL_EINJ=y means that cxl_core.ko
> loads einj.ko unconditionally.
> 
> I would save that work for a clear description of why einj.ko should not
> be resident.
> 
>> 2) Even though my system doesn't have any CXL support, I found this:
>>
>> # cat /sys/kernel/debug/cxl/einj_types
>> 0x00001000      CXL.cache Protocol Correctable
>>
>> What does this mean?
> 
> Strange, does:
> 
> /sys/kernel/debug/einj/available_error_type
> 
> ...show the same even before these patches? I.e. maybe this pre-CXL BIOS was
> using the 0x1000 encoding when it should not?
> 

Dan's already alluded to this, but to elaborate: This series doesn't do anything
different than before when getting available error types, it just puts the CXL types
into the "einj_types" file instead. So it's possible that your platform doesn't
have CXL support, but it is reporting a CXL error type for EINJ. This could be
a BIOS error, or it could be that the BIOS is pre ACPIv6.5 and was using 0x1000
for a different error type and the kernel is interpreting it as a CXL error type.
If you think something else is happening, I'd love to hear about it!

Thanks,
Ben

>> Using ras-tools I injected some DDR memory errors. So legacy
>> functionality still works OK.
>>
>> -Tony
> 
> 
> 

  reply	other threads:[~2024-02-15 15:01 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-02-14 20:07 [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types Ben Cheatham
2024-02-14 20:07 ` [PATCH v12 1/3] EINJ: Migrate to a platform driver Ben Cheatham
2024-02-16  2:03   ` kernel test robot
2024-02-16 13:43   ` kernel test robot
2024-02-14 20:07 ` [PATCH v12 2/3] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions Ben Cheatham
2024-02-15  2:25   ` Dan Williams
2024-02-15 15:01     ` Ben Cheatham
2024-02-16  0:15       ` Dan Williams
2024-02-15  9:33   ` Jonathan Cameron
2024-02-15 15:01     ` Ben Cheatham
2024-02-14 20:07 ` [PATCH v12 3/3] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
2024-02-20 19:02   ` Davidlohr Bueso
2024-02-20 19:59     ` Ben Cheatham
2024-02-15  1:11 ` [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types Tony Luck
2024-02-15  2:53   ` Dan Williams
2024-02-15 15:01     ` Ben Cheatham [this message]
2024-02-15 17:23     ` Luck, Tony
2024-02-15 18:15       ` Luck, Tony
2024-02-16  0:11         ` Dan Williams
2024-02-16  0:09       ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7b00a4ec-7fdd-46cb-b5a1-c06780fa8bdb@amd.com \
    --to=benjamin.cheatham@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stogolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=james.morse@arm.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.