* [PATCH v8 0/5] EDAC/Versal NET: Add support for error notification
@ 2025-08-26 5:29 Shubhrajyoti Datta
2025-08-26 5:29 ` [PATCH v8 1/5] cdx: add the headers to include/linux Shubhrajyoti Datta
` (4 more replies)
0 siblings, 5 replies; 9+ messages in thread
From: Shubhrajyoti Datta @ 2025-08-26 5:29 UTC
To: devicetree, linux-kernel, linux-edac
Cc: git, ptsm, srivatsa, shubhrajyoti.datta, Shubhrajyoti Datta,
Krzysztof Kozlowski, Rob Herring, Conor Dooley, Borislav Petkov,
Tony Luck, James Morse, Mauro Carvalho Chehab, Robert Richter,
Nipun Gupta, Nikhil Agarwal
Add support for error notification to the Versal NET EDAC driver.
The driver receives error events via RPMsg instead of accessing hardware
registers directly. The NMC (Network Management Controller), which has
secure access to the DDRMC registers, gathers the necessary information and
transmits it through RPMsg.
During probe, the driver registers with RPMsg and retrieves the DDR
configuration from the NMC by scheduling a work item.
Once this completes, it registers the EDAC controller.
When an error occurs, the NMC sends an RPMsg to notify the driver.
The EDAC driver handles error reporting for all events.
The EDAC device is registered once and reports errors for all events,
including those from the 8 DDRMC controllers, so registration uses the
particulars of the first controller.
Currently, 20 errors have been tested.
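The ordering described above (RPMsg registration, then DDR configuration, then EDAC registration, then error notifications) can be sketched as a small user-space state machine. This is only an illustration and not code from the series: the state names, `struct drv` and the `drv_*` helpers are hypothetical, and the rule that a notification arriving before EDAC registration is not reported is an assumption made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical states, named for this sketch only. */
enum drv_state {
	DRV_INIT,		/* nothing done yet */
	DRV_RPMSG_READY,	/* registered with RPMsg */
	DRV_CONFIG_READY,	/* DDR configuration received from the NMC */
	DRV_EDAC_READY,		/* EDAC controller registered */
};

struct drv {
	enum drv_state state;
	unsigned int errors_reported;
};

/* A step succeeds only if the previous step has completed. */
static bool step(struct drv *d, enum drv_state from, enum drv_state to)
{
	if (d->state != from)
		return false;
	d->state = to;
	return true;
}

static bool drv_register_rpmsg(struct drv *d)
{
	return step(d, DRV_INIT, DRV_RPMSG_READY);
}

static bool drv_get_ddr_config(struct drv *d)
{
	return step(d, DRV_RPMSG_READY, DRV_CONFIG_READY);
}

static bool drv_register_edac(struct drv *d)
{
	return step(d, DRV_CONFIG_READY, DRV_EDAC_READY);
}

/* Assumption: a notification is reported only once EDAC is registered. */
static bool drv_handle_notification(struct drv *d)
{
	if (d->state != DRV_EDAC_READY)
		return false;
	d->errors_reported++;
	return true;
}
```

Calling the helpers out of order returns false, which mirrors the dependency of EDAC registration on the DDR configuration having been received first.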
Changes in v8:
- Split `mcdi.h` into `mcdi.h` and `mcdid.h`
- Removed common code from CDX headers
- Used refactored versions from shared location
- Remove "EDAC" from macros and shortened them
- Removed redundant parentheses
- Improved the description of the @i field in union ecc_error_info
- Improved logging for memory_failure()
- Merged init_csrows() into mc_init()
- Remove AMD-specific naming for static functions
- Add MAINTAINERS file
- Register all the controllers
- Replace AMD_ERR with a function that uses snprintf
Changes in v7:
- add a minimal header instead of moving them
- Add the kernel doc description
- Add the prototype from first patch to here
- Add the reviewed by tag
- Update the header paths
- merge edac_cdx_pcol.h
Changes in v6:
- Patch added
- Update commit description
- Update the commit message.
- update to the chip name as xlnx,versal-net
- Correct indentation
- Update to xlnx,versal-net-ddrmc5
- Update the kconfig message
- Make the messages uniform
- Add some more supported events
- rename regval to reglo
- combine/ reformat functions
- remove trailing comments
- Remove unneeded comments
- make the amd_mcdi function void
- rename versalnet_rpmsg_edac to versalnet_edac
- Remove the column bit and use them directly
- Update the comments
- Update the mod_name to versalnet_edac
- remove the global priv col and rows
- rename edac_priv to mc_priv
- Update the comment description for dwidth
- Remove error_id enum
- rename the variable par to parity
- make get_ddr_config void
- Fix memory leak of the mcdi structure
- Update the spelling
- Remove the workqueue
Changes in v5:
- Update the binding
- Update the compatible
- Update the handle_error documentation
Changes in v4:
- Update the compatible
- align the example
- Enhance the description for rproc
- Update the compatible
Changes in v3:
- make remove void
Changes in v2:
- Export the symbols for module compilation
- New patch addition
- rename EDAC to memory controller
- update the compatible name
- Add remote proc handle
- Read the data width from the registers
- Remove the dwidth, rank and channel number; the same is
read from the RPMsg.
- remove reset
- Add the remote proc requests
- remove probe_once
- reorder the rpmsg registration
- the data width, rank and number of channels are read from the message.
Shubhrajyoti Datta (5):
cdx: add the headers to include/linux
cdx: Export Symbols for MCDI RPC and Initialization
ras: Export log_non_standard_event for External Usage
dt-bindings: memory-controllers: Add support for Versal NET EDAC
EDAC/VersalNET: Add support for error notification
.../xlnx,versal-net-ddrmc5.yaml | 41 +
MAINTAINERS | 7 +
drivers/cdx/controller/cdx_controller.c | 2 +-
drivers/cdx/controller/cdx_rpmsg.c | 2 +-
drivers/cdx/controller/mcdi.c | 34 +-
drivers/cdx/controller/mcdi_functions.c | 1 -
drivers/cdx/controller/mcdi_functions.h | 3 +-
drivers/cdx/controller/mcdid.h | 65 +
drivers/edac/Kconfig | 11 +
drivers/edac/Makefile | 1 +
drivers/edac/versalnet_edac.c | 1077 +++++++++++++++++
drivers/ras/ras.c | 1 +
.../linux/cdx}/bitfield.h | 0
include/linux/cdx/edac_cdx_pcol.h | 28 +
.../controller => include/linux/cdx}/mcdi.h | 46 +-
15 files changed, 1268 insertions(+), 51 deletions(-)
create mode 100644 Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml
create mode 100644 drivers/cdx/controller/mcdid.h
create mode 100644 drivers/edac/versalnet_edac.c
rename {drivers/cdx/controller => include/linux/cdx}/bitfield.h (100%)
create mode 100644 include/linux/cdx/edac_cdx_pcol.h
rename {drivers/cdx/controller => include/linux/cdx}/mcdi.h (78%)
--
2.34.1
* [PATCH v8 1/5] cdx: add the headers to include/linux
2025-08-26 5:29 [PATCH v8 0/5] EDAC/Versal NET: Add support for error notification Shubhrajyoti Datta
@ 2025-08-26 5:29 ` Shubhrajyoti Datta
2025-08-27 8:40 ` Borislav Petkov
2025-08-26 5:29 ` [PATCH v8 2/5] cdx: Export Symbols for MCDI RPC and Initialization Shubhrajyoti Datta
` (3 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Shubhrajyoti Datta @ 2025-08-26 5:29 UTC
To: devicetree, linux-kernel, linux-edac
Cc: git, ptsm, srivatsa, shubhrajyoti.datta, Shubhrajyoti Datta,
Krzysztof Kozlowski, Rob Herring, Conor Dooley, Borislav Petkov,
Tony Luck, James Morse, Mauro Carvalho Chehab, Robert Richter,
Nipun Gupta, Nikhil Agarwal
Move `bitfield.h` and `mcdi.h` from the CDX controller directory to
`include/linux/cdx` to make them accessible to other drivers.
As part of this refactoring, `mcdi.h` has been split into two headers:
- `mcdi.h`: retains interface-level declarations
- `mcdid.h`: contains internal definitions and macros
This is in preparation for the VersalNET EDAC
driver, which relies on them.
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
---
Changes in v8:
- Split `mcdi.h` into `mcdi.h` and `mcdid.h`
- Removed common code from CDX headers
- Used refactored versions from shared location
Changes in v7:
- add a minimal header instead of moving them
Changes in v6:
- Patch added
drivers/cdx/controller/cdx_controller.c | 2 +-
drivers/cdx/controller/cdx_rpmsg.c | 2 +-
drivers/cdx/controller/mcdi.c | 5 +-
drivers/cdx/controller/mcdi_functions.c | 1 -
drivers/cdx/controller/mcdi_functions.h | 3 +-
drivers/cdx/controller/mcdid.h | 65 +++++++++++++++++++
.../linux/cdx}/bitfield.h | 0
.../controller => include/linux/cdx}/mcdi.h | 52 +--------------
8 files changed, 73 insertions(+), 57 deletions(-)
create mode 100644 drivers/cdx/controller/mcdid.h
rename {drivers/cdx/controller => include/linux/cdx}/bitfield.h (100%)
rename {drivers/cdx/controller => include/linux/cdx}/mcdi.h (74%)
diff --git a/drivers/cdx/controller/cdx_controller.c b/drivers/cdx/controller/cdx_controller.c
index d623f9c7517a..e943cec09fab 100644
--- a/drivers/cdx/controller/cdx_controller.c
+++ b/drivers/cdx/controller/cdx_controller.c
@@ -14,7 +14,7 @@
#include "cdx_controller.h"
#include "../cdx.h"
#include "mcdi_functions.h"
-#include "mcdi.h"
+#include "mcdid.h"
static unsigned int cdx_mcdi_rpc_timeout(struct cdx_mcdi *cdx, unsigned int cmd)
{
diff --git a/drivers/cdx/controller/cdx_rpmsg.c b/drivers/cdx/controller/cdx_rpmsg.c
index 04b578a0be17..d4f763323aac 100644
--- a/drivers/cdx/controller/cdx_rpmsg.c
+++ b/drivers/cdx/controller/cdx_rpmsg.c
@@ -15,7 +15,7 @@
#include "../cdx.h"
#include "cdx_controller.h"
#include "mcdi_functions.h"
-#include "mcdi.h"
+#include "mcdid.h"
static struct rpmsg_device_id cdx_rpmsg_id_table[] = {
{ .name = "mcdi_ipc" },
diff --git a/drivers/cdx/controller/mcdi.c b/drivers/cdx/controller/mcdi.c
index e760f8d347cc..90bf9f7c257b 100644
--- a/drivers/cdx/controller/mcdi.c
+++ b/drivers/cdx/controller/mcdi.c
@@ -23,9 +23,10 @@
#include <linux/log2.h>
#include <linux/net_tstamp.h>
#include <linux/wait.h>
+#include <linux/cdx/bitfield.h>
-#include "bitfield.h"
-#include "mcdi.h"
+#include <linux/cdx/mcdi.h>
+#include "mcdid.h"
static void cdx_mcdi_cancel_cmd(struct cdx_mcdi *cdx, struct cdx_mcdi_cmd *cmd);
static void cdx_mcdi_wait_for_cleanup(struct cdx_mcdi *cdx);
diff --git a/drivers/cdx/controller/mcdi_functions.c b/drivers/cdx/controller/mcdi_functions.c
index 885c69e6ebe5..8ae2d99be81e 100644
--- a/drivers/cdx/controller/mcdi_functions.c
+++ b/drivers/cdx/controller/mcdi_functions.c
@@ -5,7 +5,6 @@
#include <linux/module.h>
-#include "mcdi.h"
#include "mcdi_functions.h"
int cdx_mcdi_get_num_buses(struct cdx_mcdi *cdx)
diff --git a/drivers/cdx/controller/mcdi_functions.h b/drivers/cdx/controller/mcdi_functions.h
index b9942affdc6b..57fd1bae706b 100644
--- a/drivers/cdx/controller/mcdi_functions.h
+++ b/drivers/cdx/controller/mcdi_functions.h
@@ -8,7 +8,8 @@
#ifndef CDX_MCDI_FUNCTIONS_H
#define CDX_MCDI_FUNCTIONS_H
-#include "mcdi.h"
+#include <linux/cdx/mcdi.h>
+#include "mcdid.h"
#include "../cdx.h"
/**
diff --git a/drivers/cdx/controller/mcdid.h b/drivers/cdx/controller/mcdid.h
new file mode 100644
index 000000000000..5014b04ed710
--- /dev/null
+++ b/drivers/cdx/controller/mcdid.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright 2008-2013 Solarflare Communications Inc.
+ * Copyright (C) 2022-2023, Advanced Micro Devices, Inc.
+ */
+
+#ifndef CDX_MCDID_H
+#define CDX_MCDID_H
+
+#include <linux/mutex.h>
+#include <linux/kref.h>
+#include <linux/rpmsg.h>
+
+#include "mc_cdx_pcol.h"
+
+#ifdef DEBUG
+#define CDX_WARN_ON_ONCE_PARANOID(x) WARN_ON_ONCE(x)
+#define CDX_WARN_ON_PARANOID(x) WARN_ON(x)
+#else
+#define CDX_WARN_ON_ONCE_PARANOID(x) do {} while (0)
+#define CDX_WARN_ON_PARANOID(x) do {} while (0)
+#endif
+
+#define MCDI_BUF_LEN (8 + MCDI_CTL_SDU_LEN_MAX)
+
+static inline struct cdx_mcdi_iface *cdx_mcdi_if(struct cdx_mcdi *cdx)
+{
+ return cdx->mcdi ? &cdx->mcdi->iface : NULL;
+}
+
+void cdx_mcdi_finish(struct cdx_mcdi *cdx);
+
+int cdx_mcdi_rpc_async(struct cdx_mcdi *cdx, unsigned int cmd,
+ const struct cdx_dword *inbuf, size_t inlen,
+ cdx_mcdi_async_completer *complete,
+ unsigned long cookie);
+int cdx_mcdi_wait_for_quiescence(struct cdx_mcdi *cdx,
+ unsigned int timeout_jiffies);
+
+/*
+ * We expect that 16- and 32-bit fields in MCDI requests and responses
+ * are appropriately aligned, but 64-bit fields are only
+ * 32-bit-aligned.
+ */
+#define MCDI_BYTE(_buf, _field) \
+ ((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 1), \
+ *MCDI_PTR(_buf, _field))
+#define MCDI_WORD(_buf, _field) \
+ ((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 2), \
+ le16_to_cpu(*(__force const __le16 *)MCDI_PTR(_buf, _field)))
+#define MCDI_POPULATE_DWORD_1(_buf, _field, _name1, _value1) \
+ CDX_POPULATE_DWORD_1(*_MCDI_DWORD(_buf, _field), \
+ MC_CMD_ ## _name1, _value1)
+#define MCDI_SET_QWORD(_buf, _field, _value) \
+ do { \
+ CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[0], \
+ CDX_DWORD, (u32)(_value)); \
+ CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[1], \
+ CDX_DWORD, (u64)(_value) >> 32); \
+ } while (0)
+#define MCDI_QWORD(_buf, _field) \
+ (CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[0], CDX_DWORD) | \
+ (u64)CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[1], CDX_DWORD) << 32)
+
+#endif /* CDX_MCDID_H */
diff --git a/drivers/cdx/controller/bitfield.h b/include/linux/cdx/bitfield.h
similarity index 100%
rename from drivers/cdx/controller/bitfield.h
rename to include/linux/cdx/bitfield.h
diff --git a/drivers/cdx/controller/mcdi.h b/include/linux/cdx/mcdi.h
similarity index 74%
rename from drivers/cdx/controller/mcdi.h
rename to include/linux/cdx/mcdi.h
index 54a65e9760ae..46e3f63b062a 100644
--- a/drivers/cdx/controller/mcdi.h
+++ b/include/linux/cdx/mcdi.h
@@ -11,16 +11,7 @@
#include <linux/kref.h>
#include <linux/rpmsg.h>
-#include "bitfield.h"
-#include "mc_cdx_pcol.h"
-
-#ifdef DEBUG
-#define CDX_WARN_ON_ONCE_PARANOID(x) WARN_ON_ONCE(x)
-#define CDX_WARN_ON_PARANOID(x) WARN_ON(x)
-#else
-#define CDX_WARN_ON_ONCE_PARANOID(x) do {} while (0)
-#define CDX_WARN_ON_PARANOID(x) do {} while (0)
-#endif
+#include <linux/cdx/bitfield.h>
/**
* enum cdx_mcdi_mode - MCDI transaction mode
@@ -36,8 +27,6 @@ enum cdx_mcdi_mode {
#define MCDI_RPC_LONG_TIMEOU (60 * HZ)
#define MCDI_RPC_POST_RST_TIME (10 * HZ)
-#define MCDI_BUF_LEN (8 + MCDI_CTL_SDU_LEN_MAX)
-
/**
* enum cdx_mcdi_cmd_state - State for an individual MCDI command
* @MCDI_STATE_QUEUED: Command not started and is waiting to run.
@@ -180,25 +169,6 @@ struct cdx_mcdi_data {
u32 fn_flags;
};
-static inline struct cdx_mcdi_iface *cdx_mcdi_if(struct cdx_mcdi *cdx)
-{
- return cdx->mcdi ? &cdx->mcdi->iface : NULL;
-}
-
-int cdx_mcdi_init(struct cdx_mcdi *cdx);
-void cdx_mcdi_finish(struct cdx_mcdi *cdx);
-
-void cdx_mcdi_process_cmd(struct cdx_mcdi *cdx, struct cdx_dword *outbuf, int len);
-int cdx_mcdi_rpc(struct cdx_mcdi *cdx, unsigned int cmd,
- const struct cdx_dword *inbuf, size_t inlen,
- struct cdx_dword *outbuf, size_t outlen, size_t *outlen_actual);
-int cdx_mcdi_rpc_async(struct cdx_mcdi *cdx, unsigned int cmd,
- const struct cdx_dword *inbuf, size_t inlen,
- cdx_mcdi_async_completer *complete,
- unsigned long cookie);
-int cdx_mcdi_wait_for_quiescence(struct cdx_mcdi *cdx,
- unsigned int timeout_jiffies);
-
/*
* We expect that 16- and 32-bit fields in MCDI requests and responses
* are appropriately aligned, but 64-bit fields are only
@@ -215,28 +185,8 @@ int cdx_mcdi_wait_for_quiescence(struct cdx_mcdi *cdx,
#define _MCDI_DWORD(_buf, _field) \
((_buf) + (_MCDI_CHECK_ALIGN(MC_CMD_ ## _field ## _OFST, 4) >> 2))
-#define MCDI_BYTE(_buf, _field) \
- ((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 1), \
- *MCDI_PTR(_buf, _field))
-#define MCDI_WORD(_buf, _field) \
- ((void)BUILD_BUG_ON_ZERO(MC_CMD_ ## _field ## _LEN != 2), \
- le16_to_cpu(*(__force const __le16 *)MCDI_PTR(_buf, _field)))
#define MCDI_SET_DWORD(_buf, _field, _value) \
CDX_POPULATE_DWORD_1(*_MCDI_DWORD(_buf, _field), CDX_DWORD, _value)
#define MCDI_DWORD(_buf, _field) \
CDX_DWORD_FIELD(*_MCDI_DWORD(_buf, _field), CDX_DWORD)
-#define MCDI_POPULATE_DWORD_1(_buf, _field, _name1, _value1) \
- CDX_POPULATE_DWORD_1(*_MCDI_DWORD(_buf, _field), \
- MC_CMD_ ## _name1, _value1)
-#define MCDI_SET_QWORD(_buf, _field, _value) \
- do { \
- CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[0], \
- CDX_DWORD, (u32)(_value)); \
- CDX_POPULATE_DWORD_1(_MCDI_DWORD(_buf, _field)[1], \
- CDX_DWORD, (u64)(_value) >> 32); \
- } while (0)
-#define MCDI_QWORD(_buf, _field) \
- (CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[0], CDX_DWORD) | \
- (u64)CDX_DWORD_FIELD(_MCDI_DWORD(_buf, _field)[1], CDX_DWORD) << 32)
-
#endif /* CDX_MCDI_H */
--
2.34.1
* [PATCH v8 2/5] cdx: Export Symbols for MCDI RPC and Initialization
2025-08-26 5:29 [PATCH v8 0/5] EDAC/Versal NET: Add support for error notification Shubhrajyoti Datta
2025-08-26 5:29 ` [PATCH v8 1/5] cdx: add the headers to include/linux Shubhrajyoti Datta
@ 2025-08-26 5:29 ` Shubhrajyoti Datta
2025-08-29 12:03 ` Borislav Petkov
2025-08-26 5:29 ` [PATCH v8 3/5] ras: Export log_non_standard_event for External Usage Shubhrajyoti Datta
` (2 subsequent siblings)
4 siblings, 1 reply; 9+ messages in thread
From: Shubhrajyoti Datta @ 2025-08-26 5:29 UTC
To: devicetree, linux-kernel, linux-edac
Cc: git, ptsm, srivatsa, shubhrajyoti.datta, Shubhrajyoti Datta,
Krzysztof Kozlowski, Rob Herring, Conor Dooley, Borislav Petkov,
Tony Luck, James Morse, Mauro Carvalho Chehab, Robert Richter,
Nipun Gupta, Nikhil Agarwal
The cdx_mcdi_init, cdx_mcdi_process_cmd, and cdx_mcdi_rpc functions are
needed by VersalNET EDAC modules that interact with the MCDI (Management
Controller Direct Interface) framework. These functions facilitate
communication between different hardware components by enabling command
execution and status management.
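As a rough illustration of the calling convention an external user of cdx_mcdi_rpc() deals with (a request buffer and length in, a response buffer, its capacity and the actual response length out), here is a user-space mock. It is a sketch under stated assumptions, not the kernel implementation: `struct mock_dword`, `mock_rpc()` and the two-dword reply are hypothetical, and treating the lengths as byte counts with the response truncated to the caller's capacity is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's struct cdx_dword: one 32-bit word. */
struct mock_dword {
	uint32_t word;
};

/*
 * Hypothetical firmware: replies to any command with two dwords,
 * the command number and the request length in bytes.
 */
static int mock_rpc(unsigned int cmd,
		    const struct mock_dword *inbuf, size_t inlen,
		    struct mock_dword *outbuf, size_t outlen,
		    size_t *outlen_actual)
{
	struct mock_dword resp[2] = { { cmd }, { (uint32_t)inlen } };
	/* Copy no more than the caller's buffer can hold. */
	size_t n = sizeof(resp) < outlen ? sizeof(resp) : outlen;

	(void)inbuf;	/* the request payload is not inspected in this mock */

	if (outlen && !outbuf)
		return -1;
	if (n)
		memcpy(outbuf, resp, n);
	if (outlen_actual)
		*outlen_actual = n;
	return 0;
}
```

A caller would size the output buffer for the largest expected reply and use the reported actual length to decide how many fields are valid.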
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
---
(no changes since v7)
Changes in v7:
- Add the kernel doc description
- Add the prototype from first patch to here
Changes in v6:
- Update commit description
Changes in v2:
- Export the symbols for module compilation
drivers/cdx/controller/mcdi.c | 29 +++++++++++++++++++++++++++++
include/linux/cdx/mcdi.h | 6 ++++++
2 files changed, 35 insertions(+)
diff --git a/drivers/cdx/controller/mcdi.c b/drivers/cdx/controller/mcdi.c
index 90bf9f7c257b..6f52d8dac907 100644
--- a/drivers/cdx/controller/mcdi.c
+++ b/drivers/cdx/controller/mcdi.c
@@ -100,6 +100,19 @@ static unsigned long cdx_mcdi_rpc_timeout(struct cdx_mcdi *cdx, unsigned int cmd
return cdx->mcdi_ops->mcdi_rpc_timeout(cdx, cmd);
}
+/**
+ * cdx_mcdi_init - Initialize MCDI (Management Controller Driver Interface) state
+ * @cdx: NIC through which to issue the command
+ *
+ * This function allocates and initializes internal MCDI structures and resources
+ * for the CDX device, including the workqueue, locking primitives, and command
+ * tracking mechanisms. It sets the initial operating mode and prepares the device
+ * for MCDI operations.
+ *
+ * Return:
+ * * 0 - on success
+ * * -ENOMEM - if memory allocation or workqueue creation fails
+ */
int cdx_mcdi_init(struct cdx_mcdi *cdx)
{
struct cdx_mcdi_iface *mcdi;
@@ -129,6 +142,7 @@ int cdx_mcdi_init(struct cdx_mcdi *cdx)
fail:
return rc;
}
+EXPORT_SYMBOL_GPL(cdx_mcdi_init);
void cdx_mcdi_finish(struct cdx_mcdi *cdx)
{
@@ -554,6 +568,19 @@ static void cdx_mcdi_start_or_queue(struct cdx_mcdi_iface *mcdi,
cdx_mcdi_cmd_start_or_queue(mcdi, cmd);
}
+/**
+ * cdx_mcdi_process_cmd - Process an incoming MCDI response
+ * @cdx: NIC through which to issue the command
+ * @outbuf: Pointer to the response buffer received from the management controller
+ * @len: Length of the response buffer in bytes
+ *
+ * This function handles a response from the management controller. It locates the
+ * corresponding command using the sequence number embedded in the header,
+ * completes the command if it is still pending, and initiates any necessary cleanup.
+ *
+ * The function assumes that the response buffer is well-formed and at least one
+ * dword in size.
+ */
void cdx_mcdi_process_cmd(struct cdx_mcdi *cdx, struct cdx_dword *outbuf, int len)
{
struct cdx_mcdi_iface *mcdi;
@@ -591,6 +618,7 @@ void cdx_mcdi_process_cmd(struct cdx_mcdi *cdx, struct cdx_dword *outbuf, int le
cdx_mcdi_process_cleanup_list(mcdi->cdx, &cleanup_list);
}
+EXPORT_SYMBOL_GPL(cdx_mcdi_process_cmd);
static void cdx_mcdi_cmd_work(struct work_struct *context)
{
@@ -758,6 +786,7 @@ int cdx_mcdi_rpc(struct cdx_mcdi *cdx, unsigned int cmd,
return cdx_mcdi_rpc_sync(cdx, cmd, inbuf, inlen, outbuf, outlen,
outlen_actual, false);
}
+EXPORT_SYMBOL_GPL(cdx_mcdi_rpc);
/**
* cdx_mcdi_rpc_async - Schedule an MCDI command to run asynchronously
diff --git a/include/linux/cdx/mcdi.h b/include/linux/cdx/mcdi.h
index 46e3f63b062a..1344119e9a2c 100644
--- a/include/linux/cdx/mcdi.h
+++ b/include/linux/cdx/mcdi.h
@@ -169,6 +169,12 @@ struct cdx_mcdi_data {
u32 fn_flags;
};
+int cdx_mcdi_init(struct cdx_mcdi *cdx);
+void cdx_mcdi_process_cmd(struct cdx_mcdi *cdx, struct cdx_dword *outbuf, int len);
+int cdx_mcdi_rpc(struct cdx_mcdi *cdx, unsigned int cmd,
+ const struct cdx_dword *inbuf, size_t inlen,
+ struct cdx_dword *outbuf, size_t outlen, size_t *outlen_actual);
+
/*
* We expect that 16- and 32-bit fields in MCDI requests and responses
* are appropriately aligned, but 64-bit fields are only
--
2.34.1
* [PATCH v8 3/5] ras: Export log_non_standard_event for External Usage
2025-08-26 5:29 [PATCH v8 0/5] EDAC/Versal NET: Add support for error notification Shubhrajyoti Datta
2025-08-26 5:29 ` [PATCH v8 1/5] cdx: add the headers to include/linux Shubhrajyoti Datta
2025-08-26 5:29 ` [PATCH v8 2/5] cdx: Export Symbols for MCDI RPC and Initialization Shubhrajyoti Datta
@ 2025-08-26 5:29 ` Shubhrajyoti Datta
2025-09-01 15:16 ` Borislav Petkov
2025-08-26 5:29 ` [PATCH v8 4/5] dt-bindings: memory-controllers: Add support for Versal NET EDAC Shubhrajyoti Datta
2025-08-26 5:29 ` [PATCH v8 5/5] EDAC/VersalNET: Add support for error notification Shubhrajyoti Datta
4 siblings, 1 reply; 9+ messages in thread
From: Shubhrajyoti Datta @ 2025-08-26 5:29 UTC
To: devicetree, linux-kernel, linux-edac
Cc: git, ptsm, srivatsa, shubhrajyoti.datta, Shubhrajyoti Datta,
Krzysztof Kozlowski, Rob Herring, Conor Dooley, Borislav Petkov,
Tony Luck, James Morse, Mauro Carvalho Chehab, Robert Richter,
Nipun Gupta, Nikhil Agarwal
The function log_non_standard_event is responsible for logging
platform-specific or vendor-defined RAS (Reliability, Availability,
and Serviceability) events. Currently, this function is only available
within the RAS subsystem, preventing external modules from
leveraging its capabilities.
Export log_non_standard_event so that external drivers such as the
VersalNET EDAC driver can log non-standard RAS events.
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
---
(no changes since v6)
Changes in v6:
- Update the commit message.
Changes in v2:
- New patch addition
drivers/ras/ras.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index a6e4792a1b2e..ac0e132ccc3e 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -51,6 +51,7 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
{
trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
}
+EXPORT_SYMBOL_GPL(log_non_standard_event);
void log_arm_hw_error(struct cper_sec_proc_arm *err)
{
--
2.34.1
* [PATCH v8 4/5] dt-bindings: memory-controllers: Add support for Versal NET EDAC
2025-08-26 5:29 [PATCH v8 0/5] EDAC/Versal NET: Add support for error notification Shubhrajyoti Datta
` (2 preceding siblings ...)
2025-08-26 5:29 ` [PATCH v8 3/5] ras: Export log_non_standard_event for External Usage Shubhrajyoti Datta
@ 2025-08-26 5:29 ` Shubhrajyoti Datta
2025-08-26 5:29 ` [PATCH v8 5/5] EDAC/VersalNET: Add support for error notification Shubhrajyoti Datta
4 siblings, 0 replies; 9+ messages in thread
From: Shubhrajyoti Datta @ 2025-08-26 5:29 UTC
To: devicetree, linux-kernel, linux-edac
Cc: git, ptsm, srivatsa, shubhrajyoti.datta, Shubhrajyoti Datta,
Krzysztof Kozlowski, Rob Herring, Conor Dooley, Borislav Petkov,
Tony Luck, James Morse, Mauro Carvalho Chehab, Robert Richter,
Nipun Gupta, Nikhil Agarwal
Add device tree bindings for the AMD Versal NET EDAC for the DDR controller.
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
---
(no changes since v7)
Changes in v7:
- Add the reviewed by tag
Changes in v6:
- update to the chip name as xlnx,versal-net
- Correct indentation
Changes in v5:
- Update the binding
Changes in v4:
- Update the compatible
- align the example
- Enhance the description for rproc
Changes in v2:
- rename EDAC to memory controller
- update the compatible name
- Add remote proc handle
- Read the data width from the registers
- Remove the dwidth, rank and channel number; the same is
read from the RPMsg.
.../xlnx,versal-net-ddrmc5.yaml | 41 +++++++++++++++++++
1 file changed, 41 insertions(+)
create mode 100644 Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml
diff --git a/Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml b/Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml
new file mode 100644
index 000000000000..479288567d0b
--- /dev/null
+++ b/Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml
@@ -0,0 +1,41 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/memory-controllers/xlnx,versal-net-ddrmc5.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Xilinx Versal NET Memory Controller
+
+maintainers:
+ - Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
+
+description:
+ The integrated DDR Memory Controllers (DDRMCs) support both DDR5 and LPDDR5
+ compact and extended memory interfaces. The Versal NET DDR memory controller
+ has optional ECC support, which corrects single-bit ECC errors and detects
+ double-bit ECC errors. It also supports reporting other errors, such as
+ MMCM (Mixed-Mode Clock Manager) errors and general software errors.
+
+properties:
+ compatible:
+ const: xlnx,versal-net-ddrmc5
+
+ amd,rproc:
+ $ref: /schemas/types.yaml#/definitions/phandle
+ description:
+ phandle to the remoteproc_r5 rproc node through which the APU interacts
+ with the remote processor. The APU primarily communicates with the RPU to
+ access the DDRMC address space and to receive error notifications.
+
+required:
+ - compatible
+ - amd,rproc
+
+additionalProperties: false
+
+examples:
+ - |
+ memory-controller {
+ compatible = "xlnx,versal-net-ddrmc5";
+ amd,rproc = <&remoteproc_r5>;
+ };
--
2.34.1
* [PATCH v8 5/5] EDAC/VersalNET: Add support for error notification
2025-08-26 5:29 [PATCH v8 0/5] EDAC/Versal NET: Add support for error notification Shubhrajyoti Datta
` (3 preceding siblings ...)
2025-08-26 5:29 ` [PATCH v8 4/5] dt-bindings: memory-controllers: Add support for Versal NET EDAC Shubhrajyoti Datta
@ 2025-08-26 5:29 ` Shubhrajyoti Datta
4 siblings, 0 replies; 9+ messages in thread
From: Shubhrajyoti Datta @ 2025-08-26 5:29 UTC
To: devicetree, linux-kernel, linux-edac
Cc: git, ptsm, srivatsa, shubhrajyoti.datta, Shubhrajyoti Datta,
Krzysztof Kozlowski, Rob Herring, Conor Dooley, Borislav Petkov,
Tony Luck, James Morse, Mauro Carvalho Chehab, Robert Richter,
Nipun Gupta, Nikhil Agarwal
Add support for single-bit error correction and double-bit error detection
on the AMD Versal NET DDR memory controller, and for reporting other system
errors from various IP subsystems (e.g., RPU, NOCs, HNICX, PL).
The Versal NET EDAC driver listens for notifications from the NMC (Network
Management Controller) over RPMsg (Remote Processor Messaging).
The RPMsg channel used for this communication is named "error_edac".
Upon receiving a notification, the Versal NET EDAC driver
emits a RAS (Reliability, Availability, and Serviceability) event
trace. This helps user space applications decide on the
corrective action.
To report events, the driver registers with the RAS framework.
Specifically:
- Memory errors are reported through Memory Controller (MC) events.
- Non-memory errors are reported using non-standard RAS events.
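The split between the two reporting paths can be sketched in user-space C as follows. This is only an illustration of the routing rule stated above: the enum names and `route_event()` are hypothetical, and the way a notification is classified as a DDRMC error versus an error from another IP is an assumption of the sketch, not taken from the driver.

```c
#include <stdbool.h>

/* Where an event ends up; names are hypothetical. */
enum ras_sink {
	SINK_MC_EVENT,		/* Memory Controller (MC) event */
	SINK_NON_STANDARD,	/* non-standard RAS event */
};

/* Hypothetical classification carried by a notification. */
enum err_source {
	SRC_DDRMC_CE,	/* single-bit error, corrected */
	SRC_DDRMC_UE,	/* double-bit error, detected only */
	SRC_OTHER_IP,	/* e.g. RPU, NOC, HNICX, PL */
};

static bool is_memory_error(enum err_source src)
{
	return src == SRC_DDRMC_CE || src == SRC_DDRMC_UE;
}

/* Memory errors go to MC events, everything else to non-standard events. */
static enum ras_sink route_event(enum err_source src)
{
	return is_memory_error(src) ? SINK_MC_EVENT : SINK_NON_STANDARD;
}
```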
Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
---
Changes in v8:
- Remove "EDAC" from macros and shortened them
- Removed redundant parentheses
- Improved the description of the @i field in union ecc_error_info
- Improved logging for memory_failure()
- Merged init_csrows() into mc_init()
- Remove AMD-specific naming for static functions
- Add MAINTAINERS file
- Register all the controllers
- Replace AMD_ERR with the versalnet_edac_snprintf() function
Changes in v7:
- Update the header paths
- merge edac_cdx_pcol.h
Changes in v6:
- Update to xlnx,versal-net-ddrmc5
- Update the kconfig message
- Make the messages uniform
- Add some more supported events
- rename regval to reglo
- combine/ reformat functions
- remove trailing comments
- Remove unneeded comments
- make the amd_mcdi function void
- rename versalnet_rpmsg_edac to versalnet_edac
- Remove the column bit and use them directly
- Update the comments
- Update the mod_name to versalnet_edac
- remove the global priv col and rows
- rename edac_priv to mc_priv
- Update the comment description for dwidth
- Remove error_id enum
- rename the variable par to parity
- make get_ddr_config void
- Fix memory leak of the mcdi structure
- Update the spelling
- Remove the workqueue
Changes in v5:
- Update the compatible
- Update the handle_error documentation
Changes in v4:
- Update the compatible
Changes in v3:
- make remove void
Changes in v2:
- remove reset
- Add the remote proc requests
- remove probe_once
- reorder the rpmsg registration
- the data width, rank and number of channels are read from the message.
MAINTAINERS | 7 +
drivers/edac/Kconfig | 11 +
drivers/edac/Makefile | 1 +
drivers/edac/versalnet_edac.c | 1077 +++++++++++++++++++++++++++++
include/linux/cdx/edac_cdx_pcol.h | 28 +
5 files changed, 1124 insertions(+)
create mode 100644 drivers/edac/versalnet_edac.c
create mode 100644 include/linux/cdx/edac_cdx_pcol.h
diff --git a/MAINTAINERS b/MAINTAINERS
index fa1e04e87d1d..53e2349ce577 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -26564,6 +26564,13 @@ S: Maintained
F: Documentation/devicetree/bindings/memory-controllers/xlnx,versal-ddrmc-edac.yaml
F: drivers/edac/versal_edac.c
+XILINX VERSALNET EDAC DRIVER
+M: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
+S: Maintained
+F: Documentation/devicetree/bindings/memory-controllers/xlnx,versal-net-ddrmc5.yaml
+F: drivers/edac/versalnet_edac.c
+F: include/linux/cdx/edac_cdx_pcol.h
+
XILINX WATCHDOG DRIVER
M: Srinivas Neeli <srinivas.neeli@amd.com>
R: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
index 19ad3c3b675d..081bccd3405b 100644
--- a/drivers/edac/Kconfig
+++ b/drivers/edac/Kconfig
@@ -576,4 +576,15 @@ config EDAC_LOONGSON
errors (CE) only. Loongson-3A5000/3C5000/3D5000/3A6000/3C6000
are compatible.
+config EDAC_VERSALNET
+ tristate "AMD Versal NET EDAC"
+ depends on CDX_CONTROLLER && ARCH_ZYNQMP
+ help
+ Support for single bit error correction, double bit error detection on
+ the AMD Versal NET DDR memory controller and other system errors
+ from various IP subsystems (e.g., RPU, NOCs, HNICX, PL).
+
+ Report single bit errors (CE), double bit errors (UE) and
+ errors from other IP subsystems like RPU, APU, NOC, HNICX and PL.
+
endif # EDAC
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
index a8f2d8f6c894..8eca81f04160 100644
--- a/drivers/edac/Makefile
+++ b/drivers/edac/Makefile
@@ -88,3 +88,4 @@ obj-$(CONFIG_EDAC_NPCM) += npcm_edac.o
obj-$(CONFIG_EDAC_ZYNQMP) += zynqmp_edac.o
obj-$(CONFIG_EDAC_VERSAL) += versal_edac.o
obj-$(CONFIG_EDAC_LOONGSON) += loongson_edac.o
+obj-$(CONFIG_EDAC_VERSALNET) += versalnet_edac.o
diff --git a/drivers/edac/versalnet_edac.c b/drivers/edac/versalnet_edac.c
new file mode 100644
index 000000000000..d9d3394a03fa
--- /dev/null
+++ b/drivers/edac/versalnet_edac.c
@@ -0,0 +1,1077 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * AMD Versal NET memory controller driver
+ * Copyright (C) 2025 Advanced Micro Devices, Inc.
+ */
+
+#include <linux/cdx/edac_cdx_pcol.h>
+#include <linux/edac.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/ras.h>
+#include <linux/remoteproc.h>
+#include <linux/rpmsg.h>
+#include <linux/sizes.h>
+#include <ras/ras_event.h>
+
+#include "edac_module.h"
+
+/* Granularity of reported error in bytes */
+#define MC5_ERR_GRAIN 1
+#define MC_GET_DDR_CONFIG_IN_LEN 4
+
+#define MC5_MSG_SIZE 256
+
+#define MC5_IRQ_CE_MASK GENMASK(18, 15)
+#define MC5_IRQ_UE_MASK GENMASK(14, 11)
+
+#define MC5_RANK_1_MASK GENMASK(11, 6)
+#define MASK_24 GENMASK(29, 24)
+#define MASK_0 GENMASK(5, 0)
+
+#define MC5_LRANK_1_MASK GENMASK(11, 6)
+#define MC5_LRANK_2_MASK GENMASK(17, 12)
+#define MC5_BANK1_MASK GENMASK(11, 6)
+#define MC5_GRP_0_MASK GENMASK(17, 12)
+#define MC5_GRP_1_MASK GENMASK(23, 18)
+
+#define ECCR_UE_CE_ADDR_HI_ROW_MASK GENMASK(10, 0)
+
+#define MC5_MAX_ROW_CNT 18
+#define MC5_MAX_COL_CNT 11
+#define MC5_MAX_RANK_CNT 2
+#define MC5_MAX_LRANK_CNT 4
+#define MC5_MAX_BANK_CNT 2
+#define MC5_MAX_GRP_CNT 3
+
+#define MC5_REGHI_ROW 7
+#define MC5_EACHBIT 1
+#define MC5_ERR_TYPE_CE 0
+#define MC5_ERR_TYPE_UE 1
+#define MC5_HIGH_MEM_EN BIT(20)
+#define MC5_MEM_MASK GENMASK(19, 0)
+#define MC5_X16_BASE 256
+#define MC5_X16_ECC 32
+#define MC5_X16_SIZE (MC5_X16_BASE + MC5_X16_ECC)
+#define MC5_X32_SIZE 576
+#define MC5_HIMEM_BASE (256 * SZ_1M)
+#define MC5_ILC_HIMEM_EN BIT(28)
+#define MC5_ILC_MEM GENMASK(27, 0)
+#define MC5_INTERLEAVE_SEL GENMASK(3, 0)
+#define MC5_BUS_WIDTH_MASK GENMASK(19, 18)
+#define MC5_NUM_CHANS_MASK BIT(17)
+#define MC5_RANK_MASK GENMASK(15, 14)
+#define MC5_DWIDTH_MASK GENMASK(5, 4)
+
+#define AMD_MIN_BUF_LEN 0x28
+#define AMD_ERROR_LEVEL 2
+#define AMD_ERRORID 3
+#define TOTAL_ERR_LENGTH 5
+#define AMD_MSG_ERR_OFFSET 8
+#define AMD_MSG_ERR_LENGTH 9
+#define AMD_ERR_DATA 10
+#define MCDI_RESPONSE 0xFF
+
+#define ERR_NOTIFICATION_MAX 96
+#define REG_MAX 152
+#define ADEC_MAX 152
+#define NUM_CONTROLLERS 8
+#define REGS_PER_CONTROLLER 19
+#define ADEC_NUM 19
+#define MC_CMD_EDAC_GET_OVERALL_DDR_CONFIG 2
+#define BUFFER_SZ 80
+
+#define XDDR5_BUS_WIDTH_64 0
+#define XDDR5_BUS_WIDTH_32 1
+#define XDDR5_BUS_WIDTH_16 2
+
+static inline void versalnet_edac_snprintf(char *buf, size_t size, const char *msg, int err_id)
+{
+ snprintf(buf, size, "[VERSAL_EDAC_ERR_ID: %d] Error type: %s", err_id, msg);
+}
+
+/**
+ * struct ecc_error_info - ECC error log information.
+ * @burstpos: Burst position.
+ * @lrank: Logical Rank number.
+ * @rank: Rank number.
+ * @group: Group number.
+ * @bank: Bank number.
+ * @col: Column number.
+ * @row: Row number.
+ * @rowhi: Row number higher bits.
+ * @i: Combined ECC error vector containing encoded values of burst position,
+ * rank, bank, column, and row information.
+ *
+ */
+union ecc_error_info {
+ struct {
+ u32 burstpos:3;
+ u32 lrank:4;
+ u32 rank:2;
+ u32 group:3;
+ u32 bank:2;
+ u32 col:11;
+ u32 row:7;
+ u32 rowhi;
+ };
+ u64 i;
+} __packed;
+
+/*
+ * Row and column bit positions in ADEC (address decoder) registers.
+ */
+union row_col_mapping {
+ struct {
+ u32 row0:6;
+ u32 row1:6;
+ u32 row2:6;
+ u32 row3:6;
+ u32 row4:6;
+ u32 reserved:2;
+ };
+ struct {
+ u32 col1:6;
+ u32 col2:6;
+ u32 col3:6;
+ u32 col4:6;
+ u32 col5:6;
+ u32 reservedcol:2;
+ };
+ u32 i;
+} __packed;
+
+/**
+ * struct ecc_status - ECC status information to report.
+ * @ceinfo: Correctable error log information.
+ * @ueinfo: Uncorrectable error log information.
+ * @channel: Channel number.
+ * @error_type: Error type information.
+ */
+struct ecc_status {
+ union ecc_error_info ceinfo[2];
+ union ecc_error_info ueinfo[2];
+ u8 channel;
+ u8 error_type;
+};
+
+/**
+ * struct mc_priv - DDR memory controller private instance data.
+ * @message: Buffer for framing the event specific info.
+ * @stat: ECC status information.
+ * @error_id: The error id.
+ * @error_level: The error level.
+ * @dwidth: Width of data bus excluding ECC bits.
+ * @part_len: The support of the message received.
+ * @regs: The registers sent on the rpmsg.
+ * @adec: Address decode registers.
+ * @mci: Memory controller interface.
+ * @ept: rpmsg endpoint.
+ * @mcdi: The mcdi handle.
+ */
+struct mc_priv {
+ char message[MC5_MSG_SIZE];
+ struct ecc_status stat;
+ u32 error_id;
+ u32 error_level;
+ u32 dwidth;
+ u32 part_len;
+ u32 regs[REG_MAX];
+ u32 adec[ADEC_MAX];
+ struct mem_ctl_info *mci[NUM_CONTROLLERS];
+ struct rpmsg_endpoint *ept;
+ struct cdx_mcdi *mcdi;
+};
+
+/*
+ * Address decoder (ADEC) register indices, listed in the order in which the
+ * register information is received from firmware.
+ */
+enum adec_info {
+ CONF = 0,
+ ADEC0,
+ ADEC1,
+ ADEC2,
+ ADEC3,
+ ADEC4,
+ ADEC5,
+ ADEC6,
+ ADEC7,
+ ADEC8,
+ ADEC9,
+ ADEC10,
+ ADEC11,
+ ADEC12,
+ ADEC13,
+ ADEC14,
+ ADEC15,
+ ADEC16,
+ ADECILC,
+};
+
+enum reg_info {
+ ISR = 0,
+ IMR,
+ ECCR0_ERR_STATUS,
+ ECCR0_ADDR_LO,
+ ECCR0_ADDR_HI,
+ ECCR0_DATA_LO,
+ ECCR0_DATA_HI,
+ ECCR0_PAR,
+ ECCR1_ERR_STATUS,
+ ECCR1_ADDR_LO,
+ ECCR1_ADDR_HI,
+ ECCR1_DATA_LO,
+ ECCR1_DATA_HI,
+ ECCR1_PAR,
+ XMPU_ERR,
+ XMPU_ERR_ADDR_LO,
+ XMPU_ERR_ADDR_HI,
+ XMPU_ERR_AXI_ID,
+ ADEC_CHK_ERR_LOG,
+};
+
+static bool get_ddr_info(u32 *error_data, struct mc_priv *priv)
+{
+ u32 reglo, reghi, parity, eccr0_val, eccr1_val, isr;
+ struct ecc_status *p;
+
+ p = &priv->stat;
+
+ isr = error_data[ISR];
+
+ if (!(isr & (MC5_IRQ_UE_MASK | MC5_IRQ_CE_MASK)))
+ return false;
+
+ eccr0_val = error_data[ECCR0_ERR_STATUS];
+ eccr1_val = error_data[ECCR1_ERR_STATUS];
+
+ if (!eccr0_val && !eccr1_val)
+ return false;
+
+ if (!eccr0_val)
+ p->channel = 1;
+ else
+ p->channel = 0;
+
+ reglo = error_data[ECCR0_ADDR_LO];
+ reghi = error_data[ECCR0_ADDR_HI];
+ if (isr & MC5_IRQ_CE_MASK)
+ p->ceinfo[0].i = reglo | (u64)reghi << 32;
+ else if (isr & MC5_IRQ_UE_MASK)
+ p->ueinfo[0].i = reglo | (u64)reghi << 32;
+
+ parity = error_data[ECCR0_PAR];
+ edac_dbg(2, "ERR DATA: 0x%08X%08X PARITY: 0x%08X\n",
+ reghi, reglo, parity);
+
+ reglo = error_data[ECCR1_ADDR_LO];
+ reghi = error_data[ECCR1_ADDR_HI];
+ if (isr & MC5_IRQ_CE_MASK)
+ p->ceinfo[1].i = reglo | (u64)reghi << 32;
+ else if (isr & MC5_IRQ_UE_MASK)
+ p->ueinfo[1].i = reglo | (u64)reghi << 32;
+
+ parity = error_data[ECCR1_PAR];
+ edac_dbg(2, "ERR DATA: 0x%08X%08X PARITY: 0x%08X\n",
+ reghi, reglo, parity);
+
+ return true;
+}
+
+/**
+ * convert_to_physical - Convert to physical address.
+ * @priv: DDR memory controller private instance data.
+ * @pinf: ECC error info structure.
+ * @controller: Controller number of the MC5
+ * @error_data: the DDRMC5 ADEC address decoder register data
+ *
+ * Return: Physical address of the DDR memory.
+ */
+static unsigned long convert_to_physical(struct mc_priv *priv,
+ union ecc_error_info pinf,
+ int controller, int *error_data)
+{
+ u32 row, blk, rsh_req_addr, interleave, ilc_base_ctrl_add, ilc_himem_en, reg, offset;
+ u64 high_mem_base, high_mem_offset, low_mem_offset, ilcmem_base;
+ unsigned long err_addr = 0, addr;
+ union row_col_mapping cols;
+ union row_col_mapping rows;
+ u32 col_bit_0;
+
+ row = pinf.rowhi << MC5_REGHI_ROW | pinf.row;
+ offset = controller * ADEC_NUM;
+
+ reg = error_data[ADEC6];
+ rows.i = reg;
+ err_addr |= (row & BIT(0)) << rows.row0;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row1;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row2;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row3;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row4;
+ row >>= MC5_EACHBIT;
+
+ reg = error_data[ADEC7];
+ rows.i = reg;
+ err_addr |= (row & BIT(0)) << rows.row0;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row1;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row2;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row3;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row4;
+ row >>= MC5_EACHBIT;
+
+ reg = error_data[ADEC8];
+ rows.i = reg;
+ err_addr |= (row & BIT(0)) << rows.row0;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row1;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row2;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row3;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row4;
+
+ reg = error_data[ADEC9];
+ rows.i = reg;
+
+ err_addr |= (row & BIT(0)) << rows.row0;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row1;
+ row >>= MC5_EACHBIT;
+ err_addr |= (row & BIT(0)) << rows.row2;
+ row >>= MC5_EACHBIT;
+
+ col_bit_0 = FIELD_GET(MASK_24, error_data[ADEC9]);
+ pinf.col >>= 1;
+ err_addr |= (pinf.col & 1) << col_bit_0;
+
+ cols.i = error_data[ADEC10];
+ err_addr |= (pinf.col & 1) << cols.col1;
+ pinf.col >>= 1;
+ err_addr |= (pinf.col & 1) << cols.col2;
+ pinf.col >>= 1;
+ err_addr |= (pinf.col & 1) << cols.col3;
+ pinf.col >>= 1;
+ err_addr |= (pinf.col & 1) << cols.col4;
+ pinf.col >>= 1;
+ err_addr |= (pinf.col & 1) << cols.col5;
+ pinf.col >>= 1;
+
+ cols.i = error_data[ADEC11];
+ err_addr |= (pinf.col & 1) << cols.col1;
+ pinf.col >>= 1;
+ err_addr |= (pinf.col & 1) << cols.col2;
+ pinf.col >>= 1;
+ err_addr |= (pinf.col & 1) << cols.col3;
+ pinf.col >>= 1;
+ err_addr |= (pinf.col & 1) << cols.col4;
+ pinf.col >>= 1;
+ err_addr |= (pinf.col & 1) << cols.col5;
+ pinf.col >>= 1;
+
+ reg = error_data[ADEC12];
+ err_addr |= (pinf.bank & BIT(0)) << (reg & MASK_0);
+ pinf.bank >>= MC5_EACHBIT;
+ err_addr |= (pinf.bank & BIT(0)) << FIELD_GET(MC5_BANK1_MASK, reg);
+ pinf.bank >>= MC5_EACHBIT;
+
+ err_addr |= (pinf.group & BIT(0)) << FIELD_GET(MC5_GRP_0_MASK, reg);
+ pinf.group >>= MC5_EACHBIT;
+ err_addr |= (pinf.group & BIT(0)) << FIELD_GET(MC5_GRP_1_MASK, reg);
+ pinf.group >>= MC5_EACHBIT;
+ err_addr |= (pinf.group & BIT(0)) << FIELD_GET(MASK_24, reg);
+ pinf.group >>= MC5_EACHBIT;
+
+ reg = error_data[ADEC4];
+ err_addr |= (pinf.rank & BIT(0)) << (reg & MASK_0);
+ pinf.rank >>= MC5_EACHBIT;
+ err_addr |= (pinf.rank & BIT(0)) << FIELD_GET(MC5_RANK_1_MASK, reg);
+ pinf.rank >>= MC5_EACHBIT;
+
+ reg = error_data[ADEC5];
+ err_addr |= (pinf.lrank & BIT(0)) << (reg & MASK_0);
+ pinf.lrank >>= MC5_EACHBIT;
+ err_addr |= (pinf.lrank & BIT(0)) << FIELD_GET(MC5_LRANK_1_MASK, reg);
+ pinf.lrank >>= MC5_EACHBIT;
+ err_addr |= (pinf.lrank & BIT(0)) << FIELD_GET(MC5_LRANK_2_MASK, reg);
+ pinf.lrank >>= MC5_EACHBIT;
+ err_addr |= (pinf.lrank & BIT(0)) << FIELD_GET(MASK_24, reg);
+ pinf.lrank >>= MC5_EACHBIT;
+
+ high_mem_base = (priv->adec[ADEC2 + offset] & MC5_MEM_MASK) * MC5_HIMEM_BASE;
+ interleave = priv->adec[ADEC13 + offset] & MC5_INTERLEAVE_SEL;
+
+ high_mem_offset = priv->adec[ADEC3 + offset] & MC5_MEM_MASK;
+ low_mem_offset = priv->adec[ADEC1 + offset] & MC5_MEM_MASK;
+ reg = priv->adec[ADEC14 + offset];
+ ilc_himem_en = !!(reg & MC5_ILC_HIMEM_EN);
+ ilcmem_base = (reg & MC5_ILC_MEM) * SZ_1M;
+ if (ilc_himem_en)
+ ilc_base_ctrl_add = ilcmem_base - high_mem_offset;
+ else
+ ilc_base_ctrl_add = ilcmem_base - low_mem_offset;
+
+ if (priv->dwidth == DEV_X16) {
+ blk = err_addr / MC5_X16_SIZE;
+ rsh_req_addr = (blk << 8) + ilc_base_ctrl_add;
+ err_addr = rsh_req_addr * interleave * 2;
+ } else {
+ blk = err_addr / MC5_X32_SIZE;
+ rsh_req_addr = (blk << 9) + ilc_base_ctrl_add;
+ err_addr = rsh_req_addr * interleave * 2;
+ }
+
+ if ((priv->adec[ADEC2 + offset] & MC5_HIGH_MEM_EN) && err_addr >= high_mem_base)
+ addr = err_addr - high_mem_offset;
+ else
+ addr = err_addr - low_mem_offset;
+
+ return addr;
+}
+
+/**
+ * handle_error - Handle Correctable and Uncorrectable errors.
+ * @priv: DDR memory controller private instance data.
+ * @stat: ECC status structure.
+ * @controller: Controller number of the MC5
+ * @error_data: the MC5 ADEC address decoder register data
+ *
+ * Handles ECC correctable and uncorrectable errors.
+ */
+static void handle_error(struct mc_priv *priv, struct ecc_status *stat,
+ int controller, int *error_data)
+{
+ struct mem_ctl_info *mci = priv->mci[controller];
+ union ecc_error_info pinf;
+ unsigned long pa;
+ phys_addr_t pfn;
+ int err;
+
+ if (stat->error_type == MC5_ERR_TYPE_CE) {
+ pinf = stat->ceinfo[stat->channel];
+ snprintf(priv->message, MC5_MSG_SIZE,
+ "Error type:%s Controller %d Addr at %lx\n",
+ "CE", controller, convert_to_physical(priv, pinf, controller, error_data));
+
+ edac_mc_handle_error(HW_EVENT_ERR_CORRECTED, mci,
+ 1, 0, 0, 0, 0, 0, -1,
+ priv->message, "");
+ }
+
+ if (stat->error_type == MC5_ERR_TYPE_UE) {
+ pinf = stat->ueinfo[stat->channel];
+ snprintf(priv->message, MC5_MSG_SIZE,
+ "Error type:%s Controller %d Addr at %lx\n",
+ "UE", controller, convert_to_physical(priv, pinf, controller, error_data));
+
+ edac_mc_handle_error(HW_EVENT_ERR_UNCORRECTED, mci,
+ 1, 0, 0, 0, 0, 0, -1,
+ priv->message, "");
+ pa = convert_to_physical(priv, pinf, controller, error_data);
+ pfn = PHYS_PFN(pa);
+
+ if (IS_ENABLED(CONFIG_MEMORY_FAILURE)) {
+ err = memory_failure(pfn, MF_ACTION_REQUIRED);
+ if (err)
+ edac_dbg(2, "memory_failure() error: %d\n", err);
+ else
+ edac_dbg(2, "Page at PA 0x%lx is hardware poisoned\n", pa);
+ }
+ }
+}
+
+static void mc_init(struct mem_ctl_info *mci, struct device *dev)
+{
+ struct mc_priv *priv = mci->pvt_info;
+ struct csrow_info *csi;
+ struct dimm_info *dimm;
+ u32 row;
+ int ch;
+
+ /* Initialize controller capabilities and configuration */
+ mci->mtype_cap = MEM_FLAG_DDR5;
+ mci->edac_ctl_cap = EDAC_FLAG_NONE | EDAC_FLAG_SECDED;
+ mci->scrub_cap = SCRUB_HW_SRC;
+ mci->scrub_mode = SCRUB_NONE;
+
+ mci->edac_cap = EDAC_FLAG_SECDED;
+ mci->ctl_name = "VersalNET DDR5 controller";
+ mci->dev_name = dev_name(dev);
+ mci->mod_name = "versalnet_edac";
+
+ edac_op_state = EDAC_OPSTATE_INT;
+
+ for (row = 0; row < mci->nr_csrows; row++) {
+ csi = mci->csrows[row];
+ for (ch = 0; ch < csi->nr_channels; ch++) {
+ dimm = csi->channels[ch]->dimm;
+ dimm->edac_mode = EDAC_SECDED;
+ dimm->mtype = MEM_DDR5;
+ dimm->grain = MC5_ERR_GRAIN;
+ dimm->dtype = priv->dwidth;
+ }
+ }
+}
+
+#define to_mci(k) container_of(k, struct mem_ctl_info, dev)
+
+static unsigned int mcdi_rpc_timeout(struct cdx_mcdi *cdx, unsigned int cmd)
+{
+ return MCDI_RPC_TIMEOUT;
+}
+
+static void mcdi_request(struct cdx_mcdi *cdx,
+ const struct cdx_dword *hdr, size_t hdr_len,
+ const struct cdx_dword *sdu, size_t sdu_len)
+{
+ unsigned char *send_buf;
+ int ret;
+
+ send_buf = kzalloc(hdr_len + sdu_len, GFP_KERNEL);
+ if (!send_buf)
+ return;
+
+ memcpy(send_buf, hdr, hdr_len);
+ memcpy(send_buf + hdr_len, sdu, sdu_len);
+
+ ret = rpmsg_send(cdx->ept, send_buf, hdr_len + sdu_len);
+ if (ret)
+ dev_err(&cdx->rpdev->dev, "Failed to send rpmsg data\n");
+
+ kfree(send_buf);
+}
+
+static const struct cdx_mcdi_ops mcdi_ops = {
+ .mcdi_rpc_timeout = mcdi_rpc_timeout,
+ .mcdi_request = mcdi_request,
+};
+
+static void get_ddr_config(u32 index, u32 *buffer, struct cdx_mcdi *amd_mcdi)
+{
+ size_t outlen;
+ int ret;
+
+ MCDI_DECLARE_BUF(inbuf, MC_GET_DDR_CONFIG_IN_LEN);
+ MCDI_DECLARE_BUF(outbuf, BUFFER_SZ);
+
+ MCDI_SET_DWORD(inbuf, EDAC_GET_DDR_CONFIG_IN_CONTROLLER_INDEX, index);
+
+ ret = cdx_mcdi_rpc(amd_mcdi, MC_CMD_EDAC_GET_DDR_CONFIG, inbuf, sizeof(inbuf),
+ outbuf, sizeof(outbuf), &outlen);
+ if (!ret)
+ memcpy(buffer, MCDI_PTR(outbuf, GET_DDR_CONFIG),
+ (ADEC_NUM * 4));
+}
+
+static int setup_mcdi(struct mc_priv *mc_priv)
+{
+ struct cdx_mcdi *amd_mcdi;
+ int ret, i;
+
+ amd_mcdi = kzalloc(sizeof(*amd_mcdi), GFP_KERNEL);
+ if (!amd_mcdi)
+ return -ENOMEM;
+
+ amd_mcdi->mcdi_ops = &mcdi_ops;
+ ret = cdx_mcdi_init(amd_mcdi);
+ if (ret) {
+ kfree(amd_mcdi);
+ return ret;
+ }
+
+ amd_mcdi->ept = mc_priv->ept;
+ mc_priv->mcdi = amd_mcdi;
+
+ for (i = 0; i < NUM_CONTROLLERS; i++)
+ get_ddr_config(i, &mc_priv->adec[ADEC_NUM * i], amd_mcdi);
+
+ return 0;
+}
+
+static const guid_t amd_versalnet_guid = GUID_INIT(0x82678888, 0xa556, 0x44f2,
+ 0xb8, 0xb4, 0x45, 0x56, 0x2e,
+ 0x8c, 0x5b, 0xec);
+
+static int rpmsg_cb(struct rpmsg_device *rpdev, void *data,
+ int len, void *priv, u32 src)
+{
+ struct mc_priv *mc_priv = dev_get_drvdata(&rpdev->dev);
+ const guid_t *sec_type = &guid_null;
+ u32 length, offset, error_id;
+ u32 *result = (u32 *)data;
+ struct ecc_status *p;
+ int i, j, k, sec_sev;
+ u32 *adec_data;
+
+ if (*(u8 *)data == MCDI_RESPONSE) {
+ cdx_mcdi_process_cmd(mc_priv->mcdi, (struct cdx_dword *)data, len);
+ return 0;
+ }
+
+ sec_sev = result[AMD_ERROR_LEVEL];
+ error_id = result[AMD_ERRORID];
+ length = result[AMD_MSG_ERR_LENGTH];
+ offset = result[AMD_MSG_ERR_OFFSET];
+
+ if (result[TOTAL_ERR_LENGTH] > length) {
+ if (!mc_priv->part_len)
+ mc_priv->part_len = length;
+ else
+ mc_priv->part_len += length;
+ /*
+ * The data can arrive in two parts. Reconstruct the registers
+ * from both messages; the offset indicates where in regs[] the
+ * data of this part belongs.
+ */
+ for (i = 0; i < length; i++) {
+ k = offset + i;
+ j = AMD_ERR_DATA + i;
+ mc_priv->regs[k] = result[j];
+ }
+ if (mc_priv->part_len < result[TOTAL_ERR_LENGTH])
+ return 0;
+ mc_priv->part_len = 0;
+ }
+
+ mc_priv->error_id = error_id;
+ mc_priv->error_level = result[AMD_ERROR_LEVEL];
+
+ switch (error_id) {
+ case 5:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "General Software Non-Correctable error",
+ error_id);
+ break;
+ case 6:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "CFU error", error_id);
+ break;
+ case 7:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "CFRAME error", error_id);
+ break;
+ case 10:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "DDRMC Microblaze Correctable ECC error", error_id);
+ break;
+ case 11:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "DDRMC Microblaze Non-Correctable ECC error",
+ error_id);
+ break;
+ case 15:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "MMCM error", error_id);
+ break;
+ case 16:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "HNICX Correctable error", error_id);
+ break;
+ case 17:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "HNICX Non-Correctable error", error_id);
+ break;
+ case 18:
+ p = &mc_priv->stat;
+ memset(p, 0, sizeof(struct ecc_status));
+ p->error_type = MC5_ERR_TYPE_CE;
+ for (i = 0; i < NUM_CONTROLLERS; i++) {
+ if (get_ddr_info(&mc_priv->regs[i * REGS_PER_CONTROLLER], mc_priv)) {
+ adec_data = mc_priv->adec + ADEC_NUM * i;
+ handle_error(mc_priv, &mc_priv->stat, i, adec_data);
+ }
+ }
+ return 0;
+ case 19:
+ p = &mc_priv->stat;
+ memset(p, 0, sizeof(struct ecc_status));
+ p->error_type = MC5_ERR_TYPE_UE;
+ for (i = 0; i < NUM_CONTROLLERS; i++) {
+ if (get_ddr_info(&mc_priv->regs[i * REGS_PER_CONTROLLER], mc_priv)) {
+ adec_data = mc_priv->adec + ADEC_NUM * i;
+ handle_error(mc_priv, &mc_priv->stat, i, adec_data);
+ }
+ }
+ return 0;
+ case 21:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "GT Non-Correctable error", error_id);
+ break;
+ case 22:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "PL Sysmon Correctable error", error_id);
+ break;
+ case 23:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "PL Sysmon Non-Correctable error", error_id);
+ break;
+ case 111:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "LPX unexpected dfx activation error", error_id);
+ break;
+ case 114:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "INT_LPD Non-Correctable error", error_id);
+ break;
+ case 116:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "INT_OCM Non-Correctable error", error_id);
+ break;
+ case 117:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "INT_FPD Correctable error", error_id);
+ break;
+ case 118:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "INT_FPD Non-Correctable error", error_id);
+ break;
+ case 120:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "INT_IOU Non-Correctable error", error_id);
+ break;
+ case 123:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "err_int_irq from APU GIC Distributor", error_id);
+ break;
+ case 124:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "fault_int_irq from APU GIC Distributor", error_id);
+ break;
+ case 132 ... 139:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "FPX SPLITTER error", error_id);
+ break;
+ case 140:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "APU Cluster 0 error", error_id);
+ break;
+ case 141:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "APU Cluster 1 error", error_id);
+ break;
+ case 142:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "APU Cluster 2 error", error_id);
+ break;
+ case 143:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "APU Cluster 3 error", error_id);
+ break;
+ case 145:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "WWDT1 LPX error", error_id);
+ break;
+ case 147:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "IPI error", error_id);
+ break;
+ case 152 ... 153:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "AFIFS error", error_id);
+ break;
+ case 154 ... 155:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "LPX glitch error", error_id);
+ break;
+ case 185 ... 186:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "FPX AFIFS error", error_id);
+ break;
+ case 195 ... 199:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "AFIFM error", error_id);
+ break;
+ case 108:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "PSM Correctable error", error_id);
+ break;
+ case 59:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "PMC correctable error", error_id);
+ break;
+ case 60:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "PMC Un-correctable error", error_id);
+ break;
+ case 43 ... 47:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "PMC Sysmon error", error_id);
+ break;
+ case 163 ... 184:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "RPU error", error_id);
+ break;
+ case 148:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "OCM0 correctable error", error_id);
+ break;
+ case 149:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "OCM1 correctable error", error_id);
+ break;
+ case 150:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "OCM0 Un-correctable error", error_id);
+ break;
+ case 151:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "OCM1 Un-correctable error", error_id);
+ break;
+ case 189:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "PSX_CMN_3 PD block consolidated error", error_id);
+ break;
+ case 191:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "FPD_INT_WRAP PD block consolidated error", error_id);
+ break;
+ case 232:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "CRAM Un-Correctable error", error_id);
+ break;
+ default:
+ versalnet_edac_snprintf(mc_priv->message, MC5_MSG_SIZE,
+ "Unknown error", error_id);
+ break;
+ }
+
+ /* Convert to bytes */
+ length = result[TOTAL_ERR_LENGTH] * 4;
+ log_non_standard_event(sec_type, &amd_versalnet_guid, mc_priv->message,
+ sec_sev, (void *)&result[AMD_ERR_DATA], length);
+
+ return 0;
+}
+
+static struct rpmsg_device_id amd_rpmsg_id_table[] = {
+ { .name = "error_ipc" },
+ { },
+};
+MODULE_DEVICE_TABLE(rpmsg, amd_rpmsg_id_table);
+
+static int rpmsg_probe(struct rpmsg_device *rpdev)
+{
+ struct rpmsg_channel_info chinfo = {0};
+ struct mc_priv *pg;
+
+ pg = (struct mc_priv *)amd_rpmsg_id_table[0].driver_data;
+ chinfo.src = RPMSG_ADDR_ANY;
+ chinfo.dst = rpdev->dst;
+ strscpy(chinfo.name, amd_rpmsg_id_table[0].name,
+ sizeof(chinfo.name));
+
+ pg->ept = rpmsg_create_ept(rpdev, rpmsg_cb, NULL, chinfo);
+ if (!pg->ept)
+ return dev_err_probe(&rpdev->dev, -ENXIO,
+ "Failed to create ept for channel %s\n",
+ chinfo.name);
+
+ dev_set_drvdata(&rpdev->dev, pg);
+ return 0;
+}
+
+static void rpmsg_remove(struct rpmsg_device *rpdev)
+{
+ struct mc_priv *mc_priv = dev_get_drvdata(&rpdev->dev);
+
+ rpmsg_destroy_ept(mc_priv->ept);
+ dev_set_drvdata(&rpdev->dev, NULL);
+}
+
+static struct rpmsg_driver amd_rpmsg_driver = {
+ .drv.name = KBUILD_MODNAME,
+ .probe = rpmsg_probe,
+ .remove = rpmsg_remove,
+ .callback = rpmsg_cb,
+ .id_table = amd_rpmsg_id_table,
+};
+
+static void versal_edac_release(struct device *dev)
+{
+ kfree(dev);
+}
+
+static int init_versalnet(struct mc_priv *priv, struct platform_device *pdev)
+{
+ u32 num_chans, rank, dwidth, config;
+ struct mem_ctl_info *mci = NULL;
+ struct edac_mc_layer layers[2];
+ struct device *dev;
+ enum dev_type dt;
+ char *name;
+ int rc, i;
+
+ for (i = 0; i < NUM_CONTROLLERS; i++) {
+ config = priv->adec[CONF + i * ADEC_NUM];
+ num_chans = FIELD_GET(MC5_NUM_CHANS_MASK, config);
+ rank = 1 << FIELD_GET(MC5_RANK_MASK, config);
+ dwidth = FIELD_GET(MC5_BUS_WIDTH_MASK, config);
+
+ switch (dwidth) {
+ case XDDR5_BUS_WIDTH_16:
+ dt = DEV_X16;
+ break;
+ case XDDR5_BUS_WIDTH_32:
+ dt = DEV_X32;
+ break;
+ case XDDR5_BUS_WIDTH_64:
+ dt = DEV_X64;
+ break;
+ default:
+ dt = DEV_UNKNOWN;
+ }
+
+ if (dt == DEV_UNKNOWN)
+ continue;
+
+ /* Describe the layers of this enabled controller and register it. */
+ layers[0].type = EDAC_MC_LAYER_CHIP_SELECT;
+ layers[0].size = rank;
+ layers[0].is_virt_csrow = true;
+ layers[1].type = EDAC_MC_LAYER_CHANNEL;
+ layers[1].size = num_chans;
+ layers[1].is_virt_csrow = false;
+
+ mci = edac_mc_alloc(i, ARRAY_SIZE(layers), layers,
+ sizeof(struct mc_priv));
+ if (!mci) {
+ edac_printk(KERN_ERR, EDAC_MC,
+ "Failed memory allocation for mc instance\n");
+ return -ENOMEM;
+ }
+
+ priv->mci[i] = mci;
+ priv->dwidth = dt;
+
+ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ dev->release = versal_edac_release;
+ name = kmalloc(32, GFP_KERNEL);
+ sprintf(name, "versal-net-ddrmc5-edac-%d", i);
+ dev->init_name = name;
+ rc = device_register(dev);
+ if (rc) {
+ put_device(dev);
+ return rc;
+ }
+ mci->pdev = dev;
+
+ platform_set_drvdata(pdev, priv);
+
+ mc_init(mci, dev);
+ rc = edac_mc_add_mc(mci);
+ if (rc) {
+ edac_printk(KERN_ERR, EDAC_MC,
+ "Failed to register with EDAC core\n");
+ edac_mc_free(mci);
+ return rc;
+ }
+ }
+ return 0;
+}
+
+static void remove_versalnet(struct mc_priv *priv)
+{
+ struct mem_ctl_info *mci;
+ int i;
+
+ for (i = 0; i < NUM_CONTROLLERS; i++) {
+ device_unregister(priv->mci[i]->pdev);
+ mci = edac_mc_del_mc(priv->mci[i]->pdev);
+ if (!mci)
+ return;
+
+ edac_mc_free(mci);
+ }
+}
+
+static int mc_probe(struct platform_device *pdev)
+{
+ struct device_node *r5_core_node;
+ struct mc_priv *priv;
+ struct rproc *rp;
+ int rc;
+
+ r5_core_node = of_parse_phandle(pdev->dev.of_node, "amd,rproc", 0);
+ if (!r5_core_node) {
+ dev_err(&pdev->dev, "amd,rproc: invalid phandle\n");
+ return -EINVAL;
+ }
+
+ rp = rproc_get_by_phandle(r5_core_node->phandle);
+ if (!rp)
+ return -EPROBE_DEFER;
+
+ rc = rproc_boot(rp);
+ if (rc) {
+ dev_err(&pdev->dev, "Failed to attach to remote processor\n");
+ rproc_put(rp);
+ return rc;
+ }
+
+ priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+ amd_rpmsg_id_table[0].driver_data = (kernel_ulong_t)priv;
+ rc = register_rpmsg_driver(&amd_rpmsg_driver);
+ if (rc) {
+ edac_printk(KERN_ERR, EDAC_MC,
+ "Failed to register RPMsg driver: %d\n", rc);
+ goto free_rproc;
+ }
+
+ rc = setup_mcdi(priv);
+ if (rc)
+ goto free_rpmsg;
+
+ priv->mcdi->r5_rproc = rp;
+ rc = init_versalnet(priv, pdev);
+ if (rc)
+ goto free_rpmsg;
+
+ return 0;
+
+free_rpmsg:
+ kfree(priv->mcdi);
+ unregister_rpmsg_driver(&amd_rpmsg_driver);
+free_rproc:
+ rproc_shutdown(rp);
+ return rc;
+}
+
+static void mc_remove(struct platform_device *pdev)
+{
+ struct mc_priv *priv = platform_get_drvdata(pdev);
+
+ unregister_rpmsg_driver(&amd_rpmsg_driver);
+ remove_versalnet(priv);
+ rproc_shutdown(priv->mcdi->r5_rproc);
+ kfree(priv->mcdi);
+}
+
+static const struct of_device_id amd_edac_match[] = {
+ { .compatible = "xlnx,versal-net-ddrmc5", },
+ {}
+};
+MODULE_DEVICE_TABLE(of, amd_edac_match);
+
+static struct platform_driver amd_ddr_edac_mc_driver = {
+ .driver = {
+ .name = "versal-net-edac",
+ .of_match_table = amd_edac_match,
+ },
+ .probe = mc_probe,
+ .remove = mc_remove,
+};
+
+module_platform_driver(amd_ddr_edac_mc_driver);
+
+MODULE_AUTHOR("AMD Inc");
+MODULE_DESCRIPTION("Versal NET EDAC driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/cdx/edac_cdx_pcol.h b/include/linux/cdx/edac_cdx_pcol.h
new file mode 100644
index 000000000000..749db33bb482
--- /dev/null
+++ b/include/linux/cdx/edac_cdx_pcol.h
@@ -0,0 +1,28 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Driver for AMD network controllers and boards
+ *
+ * Copyright (C) 2021, Xilinx, Inc.
+ * Copyright (C) 2022-2023, Advanced Micro Devices, Inc.
+ */
+
+#ifndef MC_CDX_PCOL_H
+#define MC_CDX_PCOL_H
+#include <linux/cdx/mcdi.h>
+
+#define MC_CMD_EDAC_GET_DDR_CONFIG_OUT_WORD_LENGTH_LEN 4
+/* Number of registers for the DDR controller */
+#define MC_CMD_GET_DDR_CONFIG_OFST 4
+#define MC_CMD_GET_DDR_CONFIG_LEN 4
+
+/***********************************/
+/* MC_CMD_EDAC_GET_DDR_CONFIG
+ * Provides detailed configuration for the DDR controller of the given index.
+ */
+#define MC_CMD_EDAC_GET_DDR_CONFIG 0x3
+
+/* MC_CMD_EDAC_GET_DDR_CONFIG_IN msgrequest */
+#define MC_CMD_EDAC_GET_DDR_CONFIG_IN_CONTROLLER_INDEX_OFST 0
+#define MC_CMD_EDAC_GET_DDR_CONFIG_IN_CONTROLLER_INDEX_LEN 4
+
+#endif /* MC_CDX_PCOL_H */
--
2.34.1
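
Editorial sketch of the bit scattering done in convert_to_physical() above. It is a userspace illustration, not the driver's code: the helper names adec_field(), pack_positions() and scatter_bits() are hypothetical, and the loop is a generalization that assumes five 6-bit position fields per 32-bit ADEC register, as laid out in union row_col_mapping. The driver unrolls this by hand and handles some registers differently.

```c
#include <stdint.h>

#define FIELD_BITS	6	/* each position field is 6 bits wide */
#define FIELDS_PER_REG	5	/* five position fields per 32-bit register */

/* Extract the idx-th 6-bit position field from an ADEC-style register. */
static unsigned int adec_field(uint32_t reg, unsigned int idx)
{
	return (reg >> (idx * FIELD_BITS)) & 0x3f;
}

/* Hypothetical helper: pack five bit positions into one register image. */
static uint32_t pack_positions(const unsigned int pos[FIELDS_PER_REG])
{
	uint32_t reg = 0;
	unsigned int i;

	for (i = 0; i < FIELDS_PER_REG; i++)
		reg |= (uint32_t)(pos[i] & 0x3f) << (i * FIELD_BITS);

	return reg;
}

/*
 * Scatter the low nbits bits of value into an address. Bit n of value is
 * placed at the position stored in field (n % 5) of regs[n / 5].
 */
static uint64_t scatter_bits(uint32_t value, const uint32_t *regs,
			     unsigned int nbits)
{
	uint64_t addr = 0;
	unsigned int n;

	for (n = 0; n < nbits; n++) {
		unsigned int pos = adec_field(regs[n / FIELDS_PER_REG],
					      n % FIELDS_PER_REG);

		addr |= (uint64_t)((value >> n) & 1) << pos;
	}

	return addr;
}
```

With two registers that map row bits 0..9 to address bits 10..19, calling scatter_bits(0x205, regs, 10) sets address bits 10, 12 and 19. Writing the decode as a loop over (register, field) pairs makes it easier to spot a missing shift between fields than in the unrolled form.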
* Re: [PATCH v8 1/5] cdx: add the headers to include/linux
2025-08-26 5:29 ` [PATCH v8 1/5] cdx: add the headers to include/linux Shubhrajyoti Datta
@ 2025-08-27 8:40 ` Borislav Petkov
0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2025-08-27 8:40 UTC (permalink / raw)
To: Shubhrajyoti Datta
Cc: devicetree, linux-kernel, linux-edac, git, ptsm, srivatsa,
shubhrajyoti.datta, Krzysztof Kozlowski, Rob Herring,
Conor Dooley, Tony Luck, James Morse, Mauro Carvalho Chehab,
Robert Richter, Nipun Gupta, Nikhil Agarwal
On Tue, Aug 26, 2025 at 10:59:10AM +0530, Shubhrajyoti Datta wrote:
> Subject: Re: [PATCH v8 1/5] cdx: add the headers to include/linux
Make that title more specific:
"cdx: Split mcdi.h and reorganize headers"
or so.
> Move `bitfield.h` from the CDX controller directory to
> `include/linux/cdx` to make them accessible to other drivers.
>
> As part of this refactoring, `mcdi.h` has been split into two headers:
> - `mcdi.h`: retains interface-level declarations
> - `mcdid.h`: contains internal definitions and macros
>
> This is in preparation for VersalNET EDAC
> driver that relies on it.
>
> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
> ---
>
> Changes in v8:
> - Split `mcdi.h` into `mcdi.h` and `mcdid.h`
> - Removed common code from CDX headers
> - Used refactored versions from shared location
>
> Changes in v7:
> - add a minimal header instead moving them
>
> Changes in v6:
> - Patch added
>
> drivers/cdx/controller/cdx_controller.c | 2 +-
> drivers/cdx/controller/cdx_rpmsg.c | 2 +-
> drivers/cdx/controller/mcdi.c | 5 +-
> drivers/cdx/controller/mcdi_functions.c | 1 -
> drivers/cdx/controller/mcdi_functions.h | 3 +-
> drivers/cdx/controller/mcdid.h | 65 +++++++++++++++++++
> .../linux/cdx}/bitfield.h | 0
> .../controller => include/linux/cdx}/mcdi.h | 52 +--------------
> 8 files changed, 73 insertions(+), 57 deletions(-)
> create mode 100644 drivers/cdx/controller/mcdid.h
> rename {drivers/cdx/controller => include/linux/cdx}/bitfield.h (100%)
> rename {drivers/cdx/controller => include/linux/cdx}/mcdi.h (74%)
I'd need an Ack from these gents:
Nipun Gupta <nipun.gupta@amd.com> (maintainer:AMD CDX BUS DRIVER)
Nikhil Agarwal <nikhil.agarwal@amd.com> (maintainer:AMD CDX BUS DRIVER,commit_signer:4/6=67%)
if this is going to go through the EDAC tree.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
* Re: [PATCH v8 2/5] cdx: Export Symbols for MCDI RPC and Initialization
2025-08-26 5:29 ` [PATCH v8 2/5] cdx: Export Symbols for MCDI RPC and Initialization Shubhrajyoti Datta
@ 2025-08-29 12:03 ` Borislav Petkov
0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2025-08-29 12:03 UTC (permalink / raw)
To: Shubhrajyoti Datta
Cc: devicetree, linux-kernel, linux-edac, git, ptsm, srivatsa,
shubhrajyoti.datta, Krzysztof Kozlowski, Rob Herring,
Conor Dooley, Tony Luck, James Morse, Mauro Carvalho Chehab,
Robert Richter, Nipun Gupta, Nikhil Agarwal
On Tue, Aug 26, 2025 at 10:59:11AM +0530, Shubhrajyoti Datta wrote:
> diff --git a/drivers/cdx/controller/mcdi.c b/drivers/cdx/controller/mcdi.c
> index 90bf9f7c257b..6f52d8dac907 100644
> --- a/drivers/cdx/controller/mcdi.c
> +++ b/drivers/cdx/controller/mcdi.c
> @@ -100,6 +100,19 @@ static unsigned long cdx_mcdi_rpc_timeout(struct cdx_mcdi *cdx, unsigned int cmd
> return cdx->mcdi_ops->mcdi_rpc_timeout(cdx, cmd);
> }
>
> +/**
> + * cdx_mcdi_init - Initialize MCDI (Management Controller Driver Interface) state
> + * @cdx: NIC through which to issue the command
NIC?
/**
* struct cdx_mcdi - CDX MCDI Firmware interface, to interact
* with CDX controller.
Apparently there's a NIC behind this thing.
> + *
> + * This function allocates and initializes internal MCDI structures and resources
s/This function allocates/Allocate/
> + * for the CDX device, including the workqueue, locking primitives, and command
> + * tracking mechanisms. It sets the initial operating mode and prepares the device
> + * for MCDI operations.
> + *
> + * Return:
> + * * 0 - on success
> + * * -ENOMEM - if memory allocation or workqueue creation fails
> + */
> int cdx_mcdi_init(struct cdx_mcdi *cdx)
> {
> struct cdx_mcdi_iface *mcdi;
> @@ -129,6 +142,7 @@ int cdx_mcdi_init(struct cdx_mcdi *cdx)
> fail:
> return rc;
> }
> +EXPORT_SYMBOL_GPL(cdx_mcdi_init);
>
> void cdx_mcdi_finish(struct cdx_mcdi *cdx)
> {
> @@ -554,6 +568,19 @@ static void cdx_mcdi_start_or_queue(struct cdx_mcdi_iface *mcdi,
> cdx_mcdi_cmd_start_or_queue(mcdi, cmd);
> }
>
> +/**
> + * cdx_mcdi_process_cmd - Process an incoming MCDI response
> + * @cdx: NIC through which to issue the command
Ditto: NIC? Also, the tabbing is off.
> + * @outbuf: Pointer to the response buffer received from the management controller
> + * @len: Length of the response buffer in bytes
> + *
> + * This function handles a response from the management controller. It locates the
s/This function handles/Handle/
> + * corresponding command using the sequence number embedded in the header,
> + * completes the command if it is still pending, and initiates any necessary cleanup.
> + *
> + * The function assumes that the response buffer is well-formed and at least one
> + * dword in size.
> + */
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
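With the comments above applied, the kernel-doc for cdx_mcdi_init() might read roughly as sketched below. This is only an illustration: the @cdx wording is adapted from the struct cdx_mcdi description quoted in the review, and the two struct definitions and the function body are hypothetical user-space stand-ins so that the snippet compiles on its own. They are not the driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; the real structures live in the CDX headers. */
struct cdx_mcdi_iface {
	int mode;
};

struct cdx_mcdi {
	struct cdx_mcdi_iface *mcdi;
};

/**
 * cdx_mcdi_init - Initialize MCDI (Management Controller Driver Interface) state
 * @cdx:	CDX MCDI firmware interface to initialize
 *
 * Allocate and initialize internal MCDI structures and resources for the CDX
 * device, including the workqueue, locking primitives, and command tracking
 * mechanisms. Set the initial operating mode and prepare the device for MCDI
 * operations.
 *
 * Return:
 * * 0 - on success
 * * -ENOMEM - if memory allocation or workqueue creation fails
 */
int cdx_mcdi_init(struct cdx_mcdi *cdx)
{
	assert(cdx);	/* stand-in precondition check, not in the original */

	/* Stand-in body: only the allocation step is modelled here. */
	cdx->mcdi = calloc(1, sizeof(*cdx->mcdi));
	if (!cdx->mcdi)
		return -ENOMEM;

	return 0;
}
```

The description now uses the imperative mood ("Allocate", "Set") and the parameter no longer mentions a NIC. The same treatment would apply to the cdx_mcdi_process_cmd() comment.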
* Re: [PATCH v8 3/5] ras: Export log_non_standard_event for External Usage
2025-08-26 5:29 ` [PATCH v8 3/5] ras: Export log_non_standard_event for External Usage Shubhrajyoti Datta
@ 2025-09-01 15:16 ` Borislav Petkov
0 siblings, 0 replies; 9+ messages in thread
From: Borislav Petkov @ 2025-09-01 15:16 UTC (permalink / raw)
To: Shubhrajyoti Datta
Cc: devicetree, linux-kernel, linux-edac, git, ptsm, srivatsa,
shubhrajyoti.datta, Krzysztof Kozlowski, Rob Herring,
Conor Dooley, Tony Luck, James Morse, Mauro Carvalho Chehab,
Robert Richter, Nipun Gupta, Nikhil Agarwal
On Tue, Aug 26, 2025 at 10:59:12AM +0530, Shubhrajyoti Datta wrote:
> The function log_non_standard_event is responsible for logging
> platform-specific or vendor-defined RAS (Reliability, Availability,
> and Serviceability) events. Currently, this function is only available
> within the RAS subsystem, preventing external modules from
> leveraging its capabilities.
>
> log_non_standard_event is exported so that external drivers like VersalNet
"Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
to do frotz", as if you are giving orders to the codebase to change
its behaviour."
> EDAC can log non-standard RAS events.
>
> Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@amd.com>
> ---
>
> (no changes since v6)
>
> Changes in v6:
> - Update the commit message.
>
> Changes in v2:
> - New patch addition
>
> drivers/ras/ras.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
> index a6e4792a1b2e..ac0e132ccc3e 100644
> --- a/drivers/ras/ras.c
> +++ b/drivers/ras/ras.c
> @@ -51,6 +51,7 @@ void log_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
> {
> trace_non_standard_event(sec_type, fru_id, fru_text, sev, err, len);
> }
> +EXPORT_SYMBOL_GPL(log_non_standard_event);
In a pre-patch, pls delete this silly wrapper log_non_standard_event() and use
the tracepoint trace_non_standard_event() at the callsites instead.
Then you can use the same in your driver.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
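The suggestion above, calling the tracepoint directly at the callsite instead of going through an exported wrapper, can be pictured with a small user-space mock. Everything here is an assumption made for illustration: guid_t is a stand-in type, trace_non_standard_event() is a plain function rather than a tracepoint generated by the kernel's tracing macros, the parameter types are guessed from the quoted call, and report_error() with its "ddrmc0" FRU text is a hypothetical driver callsite.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the kernel's guid_t (assumption for this sketch). */
typedef struct {
	uint8_t b[16];
} guid_t;

/* Counts how often the mock tracepoint fires. */
unsigned int trace_hits;

/*
 * Mock of the tracepoint. In the kernel this symbol is generated by the
 * tracing infrastructure; here it only prints and counts.
 */
void trace_non_standard_event(const guid_t *sec_type, const guid_t *fru_id,
			      const char *fru_text, uint8_t sev,
			      const uint8_t *err, uint32_t len)
{
	(void)sec_type;
	(void)fru_id;
	(void)err;

	trace_hits++;
	printf("non_standard_event: fru=%s sev=%u len=%u\n",
	       fru_text, (unsigned int)sev, (unsigned int)len);
}

/*
 * Hypothetical driver callsite: no log_non_standard_event() wrapper in
 * between, the tracepoint is invoked directly.
 */
void report_error(const uint8_t *payload, uint32_t len)
{
	static const guid_t sec_type = { { 0x01 } };	/* made-up section type */
	static const guid_t fru_id = { { 0x02 } };	/* made-up FRU id */

	trace_non_standard_event(&sec_type, &fru_id, "ddrmc0", 2, payload, len);
}
```

In the kernel, a modular user would presumably still need the tracepoint itself to be exported (for example with EXPORT_TRACEPOINT_SYMBOL_GPL()); that step is not shown here.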
end of thread (newest: 2025-09-01 15:17 UTC)
Thread overview: 9+ messages
2025-08-26 5:29 [PATCH v8 0/5] EDAC/Versal NET: Add support for error notification Shubhrajyoti Datta
2025-08-26 5:29 ` [PATCH v8 1/5] cdx: add the headers to include/linux Shubhrajyoti Datta
2025-08-27 8:40 ` Borislav Petkov
2025-08-26 5:29 ` [PATCH v8 2/5] cdx: Export Symbols for MCDI RPC and Initialization Shubhrajyoti Datta
2025-08-29 12:03 ` Borislav Petkov
2025-08-26 5:29 ` [PATCH v8 3/5] ras: Export log_non_standard_event for External Usage Shubhrajyoti Datta
2025-09-01 15:16 ` Borislav Petkov
2025-08-26 5:29 ` [PATCH v8 4/5] dt-bindings: memory-controllers: Add support for Versal NET EDAC Shubhrajyoti Datta
2025-08-26 5:29 ` [PATCH v8 5/5] EDAC/VersalNET: Add support for error notification Shubhrajyoti Datta