public inbox for linux-kernel@vger.kernel.org
* [PATCH 0/3] EDAC/igen6: Add polling support and allow setting edac_op_state
@ 2024-11-06 11:35 Orange Kao
  2024-11-06 11:35 ` [PATCH 1/3] EDAC/igen6: Initialize edac_op_state according to the configuration data Orange Kao
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Orange Kao @ 2024-11-06 11:35 UTC
  To: tony.luck, qiuxu.zhuo
  Cc: bp, james.morse, orange, linux-edac, linux-kernel, mchehab, rric

Thank you Qiuxu and Boris.

Here is the updated patch series. I would like to propose that we keep
edac_op_state as a module parameter, because it would allow users (regardless
of CPU SKU) to test different options on their machine without compiling their
own kernel. I hope this lowers the entry barrier and makes it easier for them
to test IBECC.

Patch 1: Initialize edac_op_state according to the configuration data
Patch 2: Add polling support
Patch 3: Allow setting edac_op_state

Thanks. Please let me know if there is anything I should improve or if anything
does not make sense.



* [PATCH 1/3] EDAC/igen6: Initialize edac_op_state according to the configuration data
  2024-11-06 11:35 [PATCH 0/3] EDAC/igen6: Add polling support and allow setting edac_op_state Orange Kao
@ 2024-11-06 11:35 ` Orange Kao
  2024-11-06 11:35 ` [PATCH 2/3] EDAC/igen6: Add polling support Orange Kao
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Orange Kao @ 2024-11-06 11:35 UTC
  To: tony.luck, qiuxu.zhuo
  Cc: bp, james.morse, orange, linux-edac, linux-kernel, mchehab, rric

From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

Currently, igen6_edac unconditionally sets edac_op_state to
EDAC_OPSTATE_NMI, even though the driver also supports memory errors
reported via Machine Check. Initialize edac_op_state to the correct value
according to the configuration data that the driver probed.

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 drivers/edac/igen6_edac.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index 07dacf8c10be..fa488ba15059 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1350,6 +1350,15 @@ static void unregister_err_handler(void)
 	unregister_nmi_handler(NMI_SERR, IGEN6_NMI_NAME);
 }
 
+static void opstate_set(struct res_config *cfg)
+{
+	/* Set the mode according to the configuration data. */
+	if (cfg->machine_check)
+		edac_op_state = EDAC_OPSTATE_INT;
+	else
+		edac_op_state = EDAC_OPSTATE_NMI;
+}
+
 static int igen6_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
 	u64 mchbar;
@@ -1367,6 +1376,8 @@ static int igen6_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (rc)
 		goto fail;
 
+	opstate_set(res_cfg);
+
 	for (i = 0; i < res_cfg->num_imc; i++) {
 		rc = igen6_register_mci(i, mchbar, pdev);
 		if (rc)
@@ -1450,8 +1461,6 @@ static int __init igen6_init(void)
 	if (owner && strncmp(owner, EDAC_MOD_STR, sizeof(EDAC_MOD_STR)))
 		return -EBUSY;
 
-	edac_op_state = EDAC_OPSTATE_NMI;
-
 	rc = pci_register_driver(&igen6_driver);
 	if (rc)
 		return rc;
-- 
2.47.0
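The mode selection that this patch moves into igen6_probe() can be modelled in plain user-space C. This is a sketch, not driver code: the EDAC_OPSTATE_* values are assumed to match the kernel's (POLL=0, NMI=1, INT=2, consistent with the module parameter description in patch 3), and `struct res_config_model` is a hypothetical stand-in that keeps only the `machine_check` flag. Unlike opstate_set(), which writes the global edac_op_state, the model returns the chosen mode so that it is easy to test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to mirror the kernel's EDAC_OPSTATE_* values. */
#define EDAC_OPSTATE_POLL 0
#define EDAC_OPSTATE_NMI  1
#define EDAC_OPSTATE_INT  2

/* Hypothetical, trimmed-down stand-in for the driver's struct res_config. */
struct res_config_model {
	bool machine_check; /* true: the SoC reports memory errors via Machine Check */
};

/* Models opstate_set() from patch 1: pick the mode from the probed config data. */
int opstate_from_config(const struct res_config_model *cfg)
{
	assert(cfg != NULL);
	if (cfg->machine_check)
		return EDAC_OPSTATE_INT; /* driver hooks the MCE decode chain */
	return EDAC_OPSTATE_NMI;         /* driver registers an NMI_SERR handler */
}
```

Before this patch, igen6_init() set EDAC_OPSTATE_NMI unconditionally, which in this model would amount to ignoring `machine_check` altogether.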


* [PATCH 2/3] EDAC/igen6: Add polling support
  2024-11-06 11:35 [PATCH 0/3] EDAC/igen6: Add polling support and allow setting edac_op_state Orange Kao
  2024-11-06 11:35 ` [PATCH 1/3] EDAC/igen6: Initialize edac_op_state according to the configuration data Orange Kao
@ 2024-11-06 11:35 ` Orange Kao
  2024-11-07 11:53   ` Zhuo, Qiuxu
  2024-11-06 11:35 ` [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state Orange Kao
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 9+ messages in thread
From: Orange Kao @ 2024-11-06 11:35 UTC
  To: tony.luck, qiuxu.zhuo
  Cc: bp, james.morse, orange, linux-edac, linux-kernel, mchehab, rric,
	Orange Kao

Some PCs with Intel N100 (with PCI device 8086:461c, DID_ADL_N_SKU4)
experienced issues with error interrupts not working, even with the
following configuration in the BIOS.

    In-Band ECC Support: Enabled
    In-Band ECC Operation Mode: 2 (make all requests protected and
                                   ignore range checks)
    IBECC Error Injection Control: Inject Correctable Error on insertion
                                   counter
    Error Injection Insertion Count: 251658240 (0xf000000)

Add polling mode support for these machines to ensure that memory error
events are handled.

Signed-off-by: Orange Kao <orange@aiven.io>
---
 drivers/edac/igen6_edac.c | 30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index fa488ba15059..dd62aa1ea9c3 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1170,6 +1170,20 @@ static int igen6_pci_setup(struct pci_dev *pdev, u64 *mchbar)
 	return -ENODEV;
 }
 
+static void igen6_check(struct mem_ctl_info *mci)
+{
+	struct igen6_imc *imc = mci->pvt_info;
+	u64 ecclog;
+
+	/* errsts_clear() isn't NMI-safe. Delay it in the IRQ context */
+	ecclog = ecclog_read_and_clear(imc);
+	if (!ecclog)
+		return;
+
+	if (!ecclog_gen_pool_add(imc->mc, ecclog))
+		irq_work_queue(&ecclog_irq_work);
+}
+
 static int igen6_register_mci(int mc, u64 mchbar, struct pci_dev *pdev)
 {
 	struct edac_mc_layer layers[2];
@@ -1211,6 +1225,8 @@ static int igen6_register_mci(int mc, u64 mchbar, struct pci_dev *pdev)
 	mci->edac_cap = EDAC_FLAG_SECDED;
 	mci->mod_name = EDAC_MOD_STR;
 	mci->dev_name = pci_name(pdev);
+	if (edac_op_state == EDAC_OPSTATE_POLL)
+		mci->edac_check = igen6_check;
 	mci->pvt_info = &igen6_pvt->imc[mc];
 
 	imc = mci->pvt_info;
@@ -1350,8 +1366,18 @@ static void unregister_err_handler(void)
 	unregister_nmi_handler(NMI_SERR, IGEN6_NMI_NAME);
 }
 
-static void opstate_set(struct res_config *cfg)
+static void opstate_set(struct res_config *cfg, const struct pci_device_id *ent)
 {
+	/*
+	 * Quirk: Certain SoCs' error reporting interrupts don't work.
+	 *        Force polling mode for them to ensure that memory error
+	 *        events can be handled.
+	 */
+	if (ent->device == DID_ADL_N_SKU4) {
+		edac_op_state = EDAC_OPSTATE_POLL;
+		return;
+	}
+
 	/* Set the mode according to the configuration data. */
 	if (cfg->machine_check)
 		edac_op_state = EDAC_OPSTATE_INT;
@@ -1376,7 +1402,7 @@ static int igen6_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	if (rc)
 		goto fail;
 
-	opstate_set(res_cfg);
+	opstate_set(res_cfg, ent);
 
 	for (i = 0; i < res_cfg->num_imc; i++) {
 		rc = igen6_register_mci(i, mchbar, pdev);
-- 
2.47.0
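The two parts of this patch, the DID_ADL_N_SKU4 quirk and the igen6_check() poll callback, can be sketched in user-space C as below. Assumptions: the EDAC_OPSTATE_* values mirror the kernel's, `struct imc_model` is a hypothetical stand-in for one memory controller's ECC error log, and its `reported` counter replaces the driver's deferred reporting path (ecclog_gen_pool_add() followed by irq_work_queue()). In the driver, the EDAC core calls mci->edac_check periodically when it is set, and igen6_register_mci() sets it only in polling mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's EDAC_OPSTATE_* values. */
#define EDAC_OPSTATE_POLL 0
#define EDAC_OPSTATE_NMI  1
#define EDAC_OPSTATE_INT  2

/* PCI device ID of the affected N100 SoC, per the commit message (8086:461c). */
#define DID_ADL_N_SKU4 0x461c

/* Patch 2 quirk: force polling for DID_ADL_N_SKU4, else use the config data. */
int opstate_select(uint16_t device_id, bool machine_check)
{
	if (device_id == DID_ADL_N_SKU4)
		return EDAC_OPSTATE_POLL;
	return machine_check ? EDAC_OPSTATE_INT : EDAC_OPSTATE_NMI;
}

/* Hypothetical model of one memory controller's ECC error log. */
struct imc_model {
	uint64_t ecclog;        /* non-zero once an error has been logged */
	unsigned int reported;  /* errors handed to the (modelled) reporting path */
};

/* Return the log and clear it, loosely like ecclog_read_and_clear(). */
static uint64_t read_and_clear(struct imc_model *imc)
{
	uint64_t v = imc->ecclog;

	imc->ecclog = 0;
	return v;
}

/* Models igen6_check(): an empty log means there is nothing to report. */
void check_model(struct imc_model *imc)
{
	uint64_t ecclog;

	assert(imc != NULL);
	ecclog = read_and_clear(imc);
	if (!ecclog)
		return;
	imc->reported++;
}
```

Because the quirk is checked before the configuration data, a machine with device 8086:461c ends up in polling mode regardless of whether its configuration data would otherwise select NMI or Machine Check. Clearing the log on every read means each logged error is reported once, even though the poll keeps running.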


* [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state
  2024-11-06 11:35 [PATCH 0/3] EDAC/igen6: Add polling support and allow setting edac_op_state Orange Kao
  2024-11-06 11:35 ` [PATCH 1/3] EDAC/igen6: Initialize edac_op_state according to the configuration data Orange Kao
  2024-11-06 11:35 ` [PATCH 2/3] EDAC/igen6: Add polling support Orange Kao
@ 2024-11-06 11:35 ` Orange Kao
  2024-11-06 13:04   ` Zhuo, Qiuxu
  2024-11-06 12:05 ` [PATCH 0/3] EDAC/igen6: Add polling support and allow " Borislav Petkov
  2024-11-08 21:44 ` Tony Luck
  4 siblings, 1 reply; 9+ messages in thread
From: Orange Kao @ 2024-11-06 11:35 UTC
  To: tony.luck, qiuxu.zhuo
  Cc: bp, james.morse, orange, linux-edac, linux-kernel, mchehab, rric,
	Orange Kao

The current implementation does not allow users to set edac_op_state. As a
result, a user who wants to test a different edac_op_state value has to
compile their own kernel.

Add an edac_op_state module parameter, which makes it easier for users to
test IBECC on their hardware.

Signed-off-by: Orange Kao <orange@aiven.io>
---
 drivers/edac/igen6_edac.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index dd62aa1ea9c3..025f994f7bf0 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -1341,16 +1341,18 @@ static int register_err_handler(void)
 {
 	int rc;
 
-	if (res_cfg->machine_check) {
+	if (edac_op_state == EDAC_OPSTATE_INT) {
 		mce_register_decode_chain(&ecclog_mce_dec);
 		return 0;
 	}
 
-	rc = register_nmi_handler(NMI_SERR, ecclog_nmi_handler,
-				  0, IGEN6_NMI_NAME);
-	if (rc) {
-		igen6_printk(KERN_ERR, "Failed to register NMI handler\n");
-		return rc;
+	if (edac_op_state == EDAC_OPSTATE_NMI) {
+		rc = register_nmi_handler(NMI_SERR, ecclog_nmi_handler,
+					  0, IGEN6_NMI_NAME);
+		if (rc) {
+			igen6_printk(KERN_ERR, "Failed to register NMI handler\n");
+			return rc;
+		}
 	}
 
 	return 0;
@@ -1358,16 +1360,29 @@ static int register_err_handler(void)
 
 static void unregister_err_handler(void)
 {
-	if (res_cfg->machine_check) {
+	if (edac_op_state == EDAC_OPSTATE_INT) {
 		mce_unregister_decode_chain(&ecclog_mce_dec);
 		return;
 	}
 
-	unregister_nmi_handler(NMI_SERR, IGEN6_NMI_NAME);
+	if (edac_op_state == EDAC_OPSTATE_NMI)
+		unregister_nmi_handler(NMI_SERR, IGEN6_NMI_NAME);
 }
 
 static void opstate_set(struct res_config *cfg, const struct pci_device_id *ent)
 {
+	switch (edac_op_state) {
+	case EDAC_OPSTATE_POLL:
+	case EDAC_OPSTATE_NMI:
+	case EDAC_OPSTATE_INT:
+		return;
+	case EDAC_OPSTATE_INVAL:
+		break;
+	default:
+		edac_op_state = EDAC_OPSTATE_INVAL;
+		break;
+	}
+
 	/*
 	 * Quirk: Certain SoCs' error reporting interrupts don't work.
 	 *        Force polling mode for them to ensure that memory error
@@ -1509,3 +1524,6 @@ module_exit(igen6_exit);
 MODULE_LICENSE("GPL v2");
 MODULE_AUTHOR("Qiuxu Zhuo");
 MODULE_DESCRIPTION("MC Driver for Intel client SoC using In-Band ECC");
+
+module_param(edac_op_state, int, 0444);
+MODULE_PARM_DESC(edac_op_state, "EDAC Error Reporting state: 0=Poll, 1=NMI, 2=Machine Check, Default=Auto detect");
-- 
2.47.0
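Patch 3 was not applied (see Tony's reply at the end of the thread), but the validation it proposed is easy to model. In this sketch `param` stands for the value a user would pass as edac_op_state and `auto_detected` for the result of the quirk and configuration logic from patches 1 and 2; both names and the `handler_model` enum are hypothetical, and EDAC_OPSTATE_INVAL is assumed to be -1 as in the kernel. A valid POLL, NMI or INT value wins, even over the DID_ADL_N_SKU4 quirk, while anything else falls back to automatic detection.

```c
#include <assert.h>

/* Assumed to mirror the kernel's EDAC_OPSTATE_* values. */
#define EDAC_OPSTATE_INVAL (-1)
#define EDAC_OPSTATE_POLL  0
#define EDAC_OPSTATE_NMI   1
#define EDAC_OPSTATE_INT   2

/* Which error source register_err_handler() would hook up (model only). */
enum handler_model {
	HANDLER_NONE,      /* polling: only mci->edac_check is used */
	HANDLER_MCE_CHAIN, /* mce_register_decode_chain() */
	HANDLER_NMI,       /* register_nmi_handler(NMI_SERR, ...) */
};

/* Models patch 3's opstate_set(): keep a valid user value, else auto-detect. */
int opstate_resolve(int param, int auto_detected)
{
	switch (param) {
	case EDAC_OPSTATE_POLL:
	case EDAC_OPSTATE_NMI:
	case EDAC_OPSTATE_INT:
		return param;
	default:
		/* EDAC_OPSTATE_INVAL or an out-of-range value */
		return auto_detected;
	}
}

/* Models the dispatch in patch 3's register_err_handler(). */
enum handler_model handler_for(int op_state)
{
	/* Assumed: the mode is resolved before handlers are registered. */
	assert(op_state != EDAC_OPSTATE_INVAL);
	switch (op_state) {
	case EDAC_OPSTATE_INT:
		return HANDLER_MCE_CHAIN;
	case EDAC_OPSTATE_NMI:
		return HANDLER_NMI;
	default:
		return HANDLER_NONE;
	}
}
```

The model also makes the reviewers' objection concrete: with the parameter, a user could force an NMI-reporting SoC into Machine Check mode, even though the reporting type of a given SoC is fixed.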


* Re: [PATCH 0/3] EDAC/igen6: Add polling support and allow setting edac_op_state
  2024-11-06 11:35 [PATCH 0/3] EDAC/igen6: Add polling support and allow setting edac_op_state Orange Kao
                   ` (2 preceding siblings ...)
  2024-11-06 11:35 ` [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state Orange Kao
@ 2024-11-06 12:05 ` Borislav Petkov
  2024-11-08 21:44 ` Tony Luck
  4 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2024-11-06 12:05 UTC
  To: Orange Kao
  Cc: tony.luck, qiuxu.zhuo, james.morse, orange, linux-edac,
	linux-kernel, mchehab, rric

On Wed, Nov 06, 2024 at 11:35:44AM +0000, Orange Kao wrote:
> I would like to propose that we keep the edac_op_state as a module
> parameter. Because it would allow users (regardless of CPU SKU) to test
> different options on their machine without compiling their own

Are you talking about an actual use case where "users" really will do that
because there actually really is such a use case out there (If so, please do
tell because *no one* is setting that parameter and I'd prefer to remove it
everywhere in favor of automatic detection.)

or

are you talking about a potential,
it-would-be-good-to-but-I-don't-know-yet-whether-it-would-really-get-used
thing?

If latter, that third patch can remain out-of-tree until an actual use case
materializes and justifies it.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* RE: [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state
  2024-11-06 11:35 ` [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state Orange Kao
@ 2024-11-06 13:04   ` Zhuo, Qiuxu
  2024-11-06 21:23     ` Orange Kao
  0 siblings, 1 reply; 9+ messages in thread
From: Zhuo, Qiuxu @ 2024-11-06 13:04 UTC
  To: Orange Kao, Luck, Tony
  Cc: bp@alien8.de, james.morse@arm.com, orange@kaosy.org,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	mchehab@kernel.org, rric@kernel.org

> From: Orange Kao <orange@aiven.io>
> [...]
> Subject: [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state
> 
> Current implementation does not allow users to set edac_op_state. As a
> result, if a user needs to test different edac_op_state, they need to compile
> the kernel.
> 
> This commit accepts module parameter edac_op_state which makes it easier
> for users to test IBECC on their hardware.

The memory error reporting type of an SoC with the IBECC feature is
predetermined. Switching from NMI to Machine Check or vice versa for a given
SoC is pointless in the real world.

Additionally, the interrupt mode is preferred over the polling mode unless
the interrupt cannot work, as in the case you reported.

[ Sometimes, no choice is the best choice :-). ]

-Qiuxu

* Re: [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state
  2024-11-06 13:04   ` Zhuo, Qiuxu
@ 2024-11-06 21:23     ` Orange Kao
  0 siblings, 0 replies; 9+ messages in thread
From: Orange Kao @ 2024-11-06 21:23 UTC
  To: Zhuo, Qiuxu, Luck, Tony
  Cc: bp@alien8.de, james.morse@arm.com, orange@kaosy.org,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	mchehab@kernel.org, rric@kernel.org

On 6/11/24 13:04, Zhuo, Qiuxu wrote:
>> From: Orange Kao <orange@aiven.io>
>> [...]
>> Subject: [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state
>>
>> Current implementation does not allow users to set edac_op_state. As a
>> result, if a user needs to test different edac_op_state, they need to compile
>> the kernel.
>>
>> This commit accepts module parameter edac_op_state which makes it easier
>> for users to test IBECC on their hardware.
> 
> An SoC's (with the IBECC feature) memory error reporting type is determined.
> Switching from NMI to Machine Check or vice versa for a given SoC is pointless
> in the real world.
> 
> Additionally, the interrupt mode is preferred over the polling mode unless
> the interrupt cannot work, as in the case you reported.
> 
> [ Sometimes, no choice is the best choice :-). ]
> 
> -Qiuxu

Thank you Qiuxu and Boris. Good to know. I don't have any "actual" use 
case so please exclude patch 3.

Thanks

* RE: [PATCH 2/3] EDAC/igen6: Add polling support
  2024-11-06 11:35 ` [PATCH 2/3] EDAC/igen6: Add polling support Orange Kao
@ 2024-11-07 11:53   ` Zhuo, Qiuxu
  0 siblings, 0 replies; 9+ messages in thread
From: Zhuo, Qiuxu @ 2024-11-07 11:53 UTC
  To: Orange Kao, Luck, Tony
  Cc: bp@alien8.de, james.morse@arm.com, orange@kaosy.org,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	mchehab@kernel.org, rric@kernel.org

> From: Orange Kao <orange@aiven.io>
> Sent: Wednesday, November 6, 2024 7:36 PM
> To: Luck, Tony <tony.luck@intel.com>; Zhuo, Qiuxu <qiuxu.zhuo@intel.com>
> Cc: bp@alien8.de; james.morse@arm.com; orange@kaosy.org; linux-
> edac@vger.kernel.org; linux-kernel@vger.kernel.org; mchehab@kernel.org;
> rric@kernel.org; Orange Kao <orange@aiven.io>
> Subject: [PATCH 2/3] EDAC/igen6: Add polling support
> 
> Some PCs with Intel N100 (with PCI device 8086:461c, DID_ADL_N_SKU4)
> experienced issues with error interrupts not working, even with the following
> configuration in the BIOS.
> 
>     In-Band ECC Support: Enabled
>     In-Band ECC Operation Mode: 2 (make all requests protected and
>                                    ignore range checks)
>     IBECC Error Injection Control: Inject Correctable Error on insertion
>                                    counter
>     Error Injection Insertion Count: 251658240 (0xf000000)
> 
> Add polling mode support for these machines to ensure that memory error
> events are handled.
> 
> Signed-off-by: Orange Kao <orange@aiven.io>

LGTM. Thanks!

    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

* Re: [PATCH 0/3] EDAC/igen6: Add polling support and allow setting edac_op_state
  2024-11-06 11:35 [PATCH 0/3] EDAC/igen6: Add polling support and allow setting edac_op_state Orange Kao
                   ` (3 preceding siblings ...)
  2024-11-06 12:05 ` [PATCH 0/3] EDAC/igen6: Add polling support and allow " Borislav Petkov
@ 2024-11-08 21:44 ` Tony Luck
  4 siblings, 0 replies; 9+ messages in thread
From: Tony Luck @ 2024-11-08 21:44 UTC
  To: Orange Kao
  Cc: qiuxu.zhuo, bp, james.morse, orange, linux-edac, linux-kernel,
	mchehab, rric

On Wed, Nov 06, 2024 at 11:35:44AM +0000, Orange Kao wrote:
> Thank you Qiuxu and Boris.
> 
> Here is the updated patch. I would like to propose that we keep the 
> edac_op_state as a module parameter. Because it would allow users (regardless of
> CPU SKU) to test different options on their machine without compiling their own
> kernel. I hope this could lower the entry barrier and make it easier for them to
> test IBECC.
> 
> Patch 1: Initialize edac_op_state according to the configuration data
> Patch 2: Add polling support

Applied patches 1 & 2 to RAS tree. Thanks

> Patch 3: Allow setting edac_op_state

As discussed on mailing list, not taking this one as there
is no real use case.

-Tony

end of thread

Thread overview: 9+ messages
2024-11-06 11:35 [PATCH 0/3] EDAC/igen6: Add polling support and allow setting edac_op_state Orange Kao
2024-11-06 11:35 ` [PATCH 1/3] EDAC/igen6: Initialize edac_op_state according to the configuration data Orange Kao
2024-11-06 11:35 ` [PATCH 2/3] EDAC/igen6: Add polling support Orange Kao
2024-11-07 11:53   ` Zhuo, Qiuxu
2024-11-06 11:35 ` [PATCH 3/3] EDAC/igen6: Allow setting edac_op_state Orange Kao
2024-11-06 13:04   ` Zhuo, Qiuxu
2024-11-06 21:23     ` Orange Kao
2024-11-06 12:05 ` [PATCH 0/3] EDAC/igen6: Add polling support and allow " Borislav Petkov
2024-11-08 21:44 ` Tony Luck
