Hi,
I would appreciate it if you could kindly let me know your thoughts.
Thanks,

++ dri-devel

On 28-03-2025 15:57, Aravind Iddamsetty wrote:
Hi,

Based on the discussions around using Netlink for RAS purposes, as summarized by Dave Airlie in this blog post [1], I had proposed a series regarding RAS infrastructure in DRM [2].

I came across your work, which appears to address related areas, and I'm particularly interested in understanding how it aligns with, or could be adapted to, the ongoing discussions around leveraging Netlink for RAS.

Could you share your perspective on the potential integration of your efforts with Netlink? Do you foresee any challenges or opportunities in aligning with the approach discussed in the above-mentioned blog post and series?

Looking forward to your insights and any additional thoughts you may have on this topic.

[1] https://airlied.blogspot.com/2022/09/accelerators-bof-outcomes-summary.html
[2] https://lore.kernel.org/all/20231020155835.1295524-1-aravind.iddamsetty@linux.intel.com/

Thanks,
Aravind.

On 14-02-2025 13:37, Xiang Liu wrote:
This patch series generates RAS CPER records for UE/DE/CE/BP (uncorrectable error, deferred error, correctable error, bad page) threshold-exceeded events.

SMU_TYPE_CE banks are combined into one CPER entry; they may contain CEs, DEs, or both. UEs and BPs are encoded into separate CPER entries.

RAS CPER records for CEs are generated only after the CE count has been queried.

All records are committed to a pure software ring of limited size; when the ring overflows, new records overwrite the oldest ones. Users can access the records by reading a debugfs node, which is read-only.
Hawking Zhang (5):
  drm/amd/include: Add amd cper header
  drm/amdgpu: Introduce funcs for populating CPER
  drm/amdgpu: Include ACA error type in aca bank
  drm/amdgpu: Introduce funcs for generating cper record
  drm/amdgpu: Generate cper records

Tao Zhou (4):
  drm/amdgpu: add RAS CPER ring buffer
  drm/amdgpu: read CPER ring via debugfs
  drm/amdgpu: add data write function for CPER ring
  drm/amdgpu: add mutex lock for cper ring

Xiang Liu (3):
  drm/amdgpu: Get timestamp from system time
  drm/amdgpu: Commit CPER entry
  drm/amdgpu: Generate bad page threshold cper records

 drivers/gpu/drm/amd/amdgpu/Makefile        |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c    |  46 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h    |  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c   | 559 +++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h   | 104 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  91 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c |   3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c    |   2 +
 drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c    |   2 +
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/umc_v12_0.c     |   1 +
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c    |   2 +
 drivers/gpu/drm/amd/include/amd_cper.h     | 269 ++++++++++
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c        |   3 +
 19 files changed, 1075 insertions(+), 40 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
 create mode 100644 drivers/gpu/drm/amd/include/amd_cper.h