From: Ben Cheatham <benjamin.cheatham@amd.com>
To: Dan Williams <dan.j.williams@intel.com>, Tony Luck <tony.luck@intel.com>
Cc: jonathan.cameron@huawei.com, rafael@kernel.org,
james.morse@arm.com, bp@alien8.de, dave@stogolabs.net,
dave.jiang@intel.com, alison.schofield@intel.com,
vishal.l.verma@intel.com, ira.weiny@intel.com,
linux-cxl@vger.kernel.org, linux-acpi@vger.kernel.org
Subject: Re: [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types
Date: Thu, 15 Feb 2024 09:01:49 -0600 [thread overview]
Message-ID: <7b00a4ec-7fdd-46cb-b5a1-c06780fa8bdb@amd.com> (raw)
In-Reply-To: <65cd7cac71377_5c76294ad@dwillia2-mobl3.amr.corp.intel.com.notmuch>
On 2/14/24 8:53 PM, Dan Williams wrote:
> Tony Luck wrote:
>> On Wed, Feb 14, 2024 at 02:07:06PM -0600, Ben Cheatham wrote:
>>> v12 Changes:
>>> - Rebase onto v6.8-rc4
>>> - Squash Kconfig patch into patch 2/3 (Jonathan)
>>> - Change CONFIG_CXL_EINJ from "depends on ACPI_APEI_EINJ >= CXL_BUS"
>>> to "depends on ACPI_APEI_EINJ = CXL_BUS"
>>> - Drop "ACPI, APEI" part of commit message title and use just EINJ
>>> instead (Dan)
>>> - Add protocol error types to "einj_types" documentation (Jonathan)
>>> - Change 0xffff... constants to use GENMASK()
>>> - Drop param* variables and use constants instead in cxl error
>>> inject functions (Jonathan)
>>> - Add is_cxl_error_type() helper function in einj.c (Jonathan)
>>> - Remove a stray function declaration in einj-cxl.h (Jonathan)
>>> - Comment #else/#endifs with corresponding #if/#ifdef in
>>> einj-cxl.h (Jonathan)
>>>
>>> v11 Changes:
>>> - Drop patch 2/6 (Add CXL protocol error defines) and put the
>>> defines in patch 4/6 instead (Dan)
>>> - Add Dan's reviewed-by
>>>
>>> The new CXL error types will use the Memory Address field in the
>>> SET_ERROR_TYPE_WITH_ADDRESS structure in order to target a CXL 1.1
>>> compliant memory-mapped downstream port. The value of the memory address
>>> will be in the port's MMIO range, and it will not represent physical
>>> (normal or persistent) memory.
>>>
>>> Add the functionality for injecting CXL 1.1 errors to the EINJ module,
>>> but not through the EINJ legacy interface under /sys/kernel/debug/apei/einj.
>>> Instead, make the error types available under /sys/kernel/debug/cxl.
>>> This allows for validating the MMIO address for a CXL 1.1 error type
>>> while also not making the user responsible for finding it.
>>
>> I tried this series on an Intel Icelake (which as far as I know
>> doesn't support CXL).
>
> Thanks Tony!
>
>> Couple of oddities:
>>
>> 1) I built as a module (CONFIG_ACPI_APEI_EINJ=m) like I normally do.
>> But this was autoloaded and EINJ initialized during boot:
>>
>> [ 33.909111] EINJ: Error INJection is initialized.
>
> In the current code it should only load if cxl_core.ko is also loaded.
>
> Can you share the output of lsmod to maybe see which module loaded that
> dependency?
>
>> I'm wondering if that might be a problem for anyone that likes to
>> leave the einj module not loaded until they have some need to
>> inject errors.
>
> That is a behavior change of this approach. Is it a problem?
>
> If it is I would say that we need to break out a new cxl_einj.ko module
> that when it loads walks the CXL topology and creates the debugfs files.
> Otherwise my assumption is that CONFIG_CXL_EINJ=y means that cxl_core.ko
> loads einj.ko unconditionally.
>
> I would save that work for a clear description of why einj.ko should not
> be resident.
>
>> 2) Even though my system doesn't have any CXL support, I found this:
>>
>> # cat /sys/kernel/debug/cxl/einj_types
>> 0x00001000 CXL.cache Protocol Correctable
>>
>> What does this mean?
>
> Strange, does:
>
> /sys/kernel/debug/einj/available_error_type
>
> ...show the same even before these patches? I.e. maybe this pre-CXL BIOS was
> using the 0x1000 encoding when it should not?
>
Dan's already alluded to this, but to elaborate: This series doesn't do anything
different than before when getting available error types, it just puts the CXL types
into the "einj_types" file instead. So it's possible that your platform doesn't
have CXL support, but it is reporting a CXL error type for EINJ. This could be
a BIOS error, or it could be that the BIOS is pre ACPIv6.5 and was using 0x1000
for a different error type and the kernel is interpreting it as a CXL error type.
If you think something else is happening, I'd love to hear about it!
Thanks,
Ben
>> Using ras-tools I injected some DDR memory errors. So legacy
>> functionality still works OK.
>>
>> -Tony
>
>
>
next prev parent reply other threads:[~2024-02-15 15:01 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-14 20:07 [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types Ben Cheatham
2024-02-14 20:07 ` [PATCH v12 1/3] EINJ: Migrate to a platform driver Ben Cheatham
2024-02-16 2:03 ` kernel test robot
2024-02-16 13:43 ` kernel test robot
2024-02-14 20:07 ` [PATCH v12 2/3] cxl/core, EINJ: Add EINJ CXL debugfs files and EINJ helper functions Ben Cheatham
2024-02-15 2:25 ` Dan Williams
2024-02-15 15:01 ` Ben Cheatham
2024-02-16 0:15 ` Dan Williams
2024-02-15 9:33 ` Jonathan Cameron
2024-02-15 15:01 ` Ben Cheatham
2024-02-14 20:07 ` [PATCH v12 3/3] EINJ, Documentation: Update EINJ kernel doc Ben Cheatham
2024-02-20 19:02 ` Davidlohr Bueso
2024-02-20 19:59 ` Ben Cheatham
2024-02-15 1:11 ` [PATCH v12 0/3] cxl, EINJ: Update EINJ for CXL error types Tony Luck
2024-02-15 2:53 ` Dan Williams
2024-02-15 15:01 ` Ben Cheatham [this message]
2024-02-15 17:23 ` Luck, Tony
2024-02-15 18:15 ` Luck, Tony
2024-02-16 0:11 ` Dan Williams
2024-02-16 0:09 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7b00a4ec-7fdd-46cb-b5a1-c06780fa8bdb@amd.com \
--to=benjamin.cheatham@amd.com \
--cc=alison.schofield@intel.com \
--cc=bp@alien8.de \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stogolabs.net \
--cc=ira.weiny@intel.com \
--cc=james.morse@arm.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-acpi@vger.kernel.org \
--cc=linux-cxl@vger.kernel.org \
--cc=rafael@kernel.org \
--cc=tony.luck@intel.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox