* [Intel-xe] [RFC i-g-t v2 0/1] A tool to demonstrate use of netlink sockets to read RAS error counters

From: Aravind Iddamsetty @ 2023-08-25 12:02 UTC
To: intel-xe
Cc: Joonas Lahtinen, igt-dev, Daniel Vetter, Alex Deucher, David Airlie

This tool demonstrates the use of netlink sockets to read RAS error
counters, as proposed in the series "[RFC v2 0/5] Proposal to use netlink
for RAS and Telemetry across drm subsystem".

The tool supports the following commands: READ_ONE, READ_ALL,
WAIT_ON_EVENT and LIST_ERRORS.

Read a single error counter:

$ ./drm_ras READ_ONE --device=drm:/dev/dri/card1 --error_id=0x0000000000000005
counter value 0

Read all error counters:

$ ./drm_ras READ_ALL --device=drm:/dev/dri/card1
name                                   config-id           counter
error-gt0-correctable-guc              0x0000000000000001  0
error-gt0-correctable-slm              0x0000000000000003  0
error-gt0-correctable-eu-ic            0x0000000000000004  0
error-gt0-correctable-eu-grf           0x0000000000000005  0
error-gt0-fatal-guc                    0x0000000000000009  0
error-gt0-fatal-slm                    0x000000000000000d  0
error-gt0-fatal-eu-grf                 0x000000000000000f  0
error-gt0-fatal-fpu                    0x0000000000000010  0
error-gt0-fatal-tlb                    0x0000000000000011  0
error-gt0-fatal-l3-fabric              0x0000000000000012  0
error-gt0-correctable-subslice         0x0000000000000013  0
error-gt0-correctable-l3bank           0x0000000000000014  0
error-gt0-fatal-subslice               0x0000000000000015  0
error-gt0-fatal-l3bank                 0x0000000000000016  0
error-gt0-sgunit-correctable           0x0000000000000017  0
error-gt0-sgunit-nonfatal              0x0000000000000018  0
error-gt0-sgunit-fatal                 0x0000000000000019  0
error-gt0-soc-fatal-psf-csc-0          0x000000000000001a  0
error-gt0-soc-fatal-psf-csc-1          0x000000000000001b  0
error-gt0-soc-fatal-psf-csc-2          0x000000000000001c  0
error-gt0-soc-fatal-punit              0x000000000000001d  0
error-gt0-soc-fatal-psf-0              0x000000000000001e  0
error-gt0-soc-fatal-psf-1              0x000000000000001f  0
error-gt0-soc-fatal-psf-2              0x0000000000000020  0
error-gt0-soc-fatal-cd0                0x0000000000000021  0
error-gt0-soc-fatal-cd0-mdfi           0x0000000000000022  0
error-gt0-soc-fatal-mdfi-east          0x0000000000000023  0
error-gt0-soc-fatal-mdfi-south         0x0000000000000024  0
error-gt0-soc-fatal-hbm-ss0-0          0x0000000000000025  0
error-gt0-soc-fatal-hbm-ss0-1          0x0000000000000026  0
error-gt0-soc-fatal-hbm-ss0-2          0x0000000000000027  0
error-gt0-soc-fatal-hbm-ss0-3          0x0000000000000028  0
error-gt0-soc-fatal-hbm-ss0-4          0x0000000000000029  0
error-gt0-soc-fatal-hbm-ss0-5          0x000000000000002a  0
error-gt0-soc-fatal-hbm-ss0-6          0x000000000000002b  0
error-gt0-soc-fatal-hbm-ss0-7          0x000000000000002c  0
error-gt0-soc-fatal-hbm-ss1-0          0x000000000000002d  0
error-gt0-soc-fatal-hbm-ss1-1          0x000000000000002e  0
error-gt0-soc-fatal-hbm-ss1-2          0x000000000000002f  0
error-gt0-soc-fatal-hbm-ss1-3          0x0000000000000030  0
error-gt0-soc-fatal-hbm-ss1-4          0x0000000000000031  0
error-gt0-soc-fatal-hbm-ss1-5          0x0000000000000032  0
error-gt0-soc-fatal-hbm-ss1-6          0x0000000000000033  0
error-gt0-soc-fatal-hbm-ss1-7          0x0000000000000034  0
error-gt0-soc-fatal-hbm-ss2-0          0x0000000000000035  0
error-gt0-soc-fatal-hbm-ss2-1          0x0000000000000036  0
error-gt0-soc-fatal-hbm-ss2-2          0x0000000000000037  0
error-gt0-soc-fatal-hbm-ss2-3          0x0000000000000038  0
error-gt0-soc-fatal-hbm-ss2-4          0x0000000000000039  0
error-gt0-soc-fatal-hbm-ss2-5          0x000000000000003a  0
error-gt0-soc-fatal-hbm-ss2-6          0x000000000000003b  0
error-gt0-soc-fatal-hbm-ss2-7          0x000000000000003c  0
error-gt0-soc-fatal-hbm-ss3-0          0x000000000000003d  0
error-gt0-soc-fatal-hbm-ss3-1          0x000000000000003e  0
error-gt0-soc-fatal-hbm-ss3-2          0x000000000000003f  0
error-gt0-soc-fatal-hbm-ss3-3          0x0000000000000040  0
error-gt0-soc-fatal-hbm-ss3-4          0x0000000000000041  0
error-gt0-soc-fatal-hbm-ss3-5          0x0000000000000042  0
error-gt0-soc-fatal-hbm-ss3-6          0x0000000000000043  0
error-gt0-soc-fatal-hbm-ss3-7          0x0000000000000044  0
error-gt0-gsc-correctable-sram-ecc     0x0000000000000045  0
error-gt0-gsc-nonfatal-mia-shutdown    0x0000000000000046  0
error-gt0-gsc-nonfatal-mia-int         0x0000000000000047  0
error-gt0-gsc-nonfatal-sram-ecc        0x0000000000000048  0
error-gt0-gsc-nonfatal-wdg-timeout     0x0000000000000049  0
error-gt0-gsc-nonfatal-rom-parity      0x000000000000004a  0
error-gt0-gsc-nonfatal-ucode-parity    0x000000000000004b  0
error-gt0-gsc-nonfatal-glitch-det      0x000000000000004c  0
error-gt0-gsc-nonfatal-fuse-pull       0x000000000000004d  0
error-gt0-gsc-nonfatal-fuse-crc-check  0x000000000000004e  0
error-gt0-gsc-nonfatal-selfmbist       0x000000000000004f  0
error-gt0-gsc-nonfatal-aon-parity      0x0000000000000050  0
error-gt1-correctable-guc              0x1000000000000001  0
error-gt1-correctable-slm              0x1000000000000003  0
error-gt1-correctable-eu-ic            0x1000000000000004  0
error-gt1-correctable-eu-grf           0x1000000000000005  0
error-gt1-fatal-guc                    0x1000000000000009  0
error-gt1-fatal-slm                    0x100000000000000d  0
error-gt1-fatal-eu-grf                 0x100000000000000f  0
error-gt1-fatal-fpu                    0x1000000000000010  0
error-gt1-fatal-tlb                    0x1000000000000011  0
error-gt1-fatal-l3-fabric              0x1000000000000012  0
error-gt1-correctable-subslice         0x1000000000000013  0
error-gt1-correctable-l3bank           0x1000000000000014  0
error-gt1-fatal-subslice               0x1000000000000015  0
error-gt1-fatal-l3bank                 0x1000000000000016  0
error-gt1-sgunit-correctable           0x1000000000000017  0
error-gt1-sgunit-nonfatal              0x1000000000000018  0
error-gt1-sgunit-fatal                 0x1000000000000019  0
error-gt1-soc-fatal-psf-csc-0          0x100000000000001a  0
error-gt1-soc-fatal-psf-csc-1          0x100000000000001b  0
error-gt1-soc-fatal-psf-csc-2          0x100000000000001c  0
error-gt1-soc-fatal-punit              0x100000000000001d  0
error-gt1-soc-fatal-psf-0              0x100000000000001e  0
error-gt1-soc-fatal-psf-1              0x100000000000001f  0
error-gt1-soc-fatal-psf-2              0x1000000000000020  0
error-gt1-soc-fatal-cd0                0x1000000000000021  0
error-gt1-soc-fatal-cd0-mdfi           0x1000000000000022  0
error-gt1-soc-fatal-mdfi-east          0x1000000000000023  0
error-gt1-soc-fatal-mdfi-south         0x1000000000000024  0
error-gt1-soc-fatal-hbm-ss0-0          0x1000000000000025  0
error-gt1-soc-fatal-hbm-ss0-1          0x1000000000000026  0
error-gt1-soc-fatal-hbm-ss0-2          0x1000000000000027  0
error-gt1-soc-fatal-hbm-ss0-3          0x1000000000000028  0
error-gt1-soc-fatal-hbm-ss0-4          0x1000000000000029  0
error-gt1-soc-fatal-hbm-ss0-5          0x100000000000002a  0
error-gt1-soc-fatal-hbm-ss0-6          0x100000000000002b  0
error-gt1-soc-fatal-hbm-ss0-7          0x100000000000002c  0
error-gt1-soc-fatal-hbm-ss1-0          0x100000000000002d  0
error-gt1-soc-fatal-hbm-ss1-1          0x100000000000002e  0
error-gt1-soc-fatal-hbm-ss1-2          0x100000000000002f  0
error-gt1-soc-fatal-hbm-ss1-3          0x1000000000000030  0
error-gt1-soc-fatal-hbm-ss1-4          0x1000000000000031  0
error-gt1-soc-fatal-hbm-ss1-5          0x1000000000000032  0
error-gt1-soc-fatal-hbm-ss1-6          0x1000000000000033  0
error-gt1-soc-fatal-hbm-ss1-7          0x1000000000000034  0
error-gt1-soc-fatal-hbm-ss2-0          0x1000000000000035  0
error-gt1-soc-fatal-hbm-ss2-1          0x1000000000000036  0
error-gt1-soc-fatal-hbm-ss2-2          0x1000000000000037  0
error-gt1-soc-fatal-hbm-ss2-3          0x1000000000000038  0
error-gt1-soc-fatal-hbm-ss2-4          0x1000000000000039  0
error-gt1-soc-fatal-hbm-ss2-5          0x100000000000003a  0
error-gt1-soc-fatal-hbm-ss2-6          0x100000000000003b  0
error-gt1-soc-fatal-hbm-ss2-7          0x100000000000003c  0
error-gt1-soc-fatal-hbm-ss3-0          0x100000000000003d  0
error-gt1-soc-fatal-hbm-ss3-1          0x100000000000003e  0
error-gt1-soc-fatal-hbm-ss3-2          0x100000000000003f  0
error-gt1-soc-fatal-hbm-ss3-3          0x1000000000000040  0
error-gt1-soc-fatal-hbm-ss3-4          0x1000000000000041  0
error-gt1-soc-fatal-hbm-ss3-5          0x1000000000000042  0
error-gt1-soc-fatal-hbm-ss3-6          0x1000000000000043  0
error-gt1-soc-fatal-hbm-ss3-7          0x1000000000000044  0

Wait for an error event:

$ ./drm_ras WAIT_ON_EVENT --device=drm:/dev/dri/card1
waiting for error event
error event received
counter value 0

List all errors:

$ ./drm_ras LIST_ERRORS --device=drm:/dev/dri/card1
name                                   config-id
error-gt0-correctable-guc              0x0000000000000001
error-gt0-correctable-slm              0x0000000000000003
error-gt0-correctable-eu-ic            0x0000000000000004
error-gt0-correctable-eu-grf           0x0000000000000005
error-gt0-fatal-guc                    0x0000000000000009
error-gt0-fatal-slm                    0x000000000000000d
error-gt0-fatal-eu-grf                 0x000000000000000f
error-gt0-fatal-fpu                    0x0000000000000010
error-gt0-fatal-tlb                    0x0000000000000011
error-gt0-fatal-l3-fabric              0x0000000000000012
error-gt0-correctable-subslice         0x0000000000000013
error-gt0-correctable-l3bank           0x0000000000000014
error-gt0-fatal-subslice               0x0000000000000015
error-gt0-fatal-l3bank                 0x0000000000000016
error-gt0-sgunit-correctable           0x0000000000000017
error-gt0-sgunit-nonfatal              0x0000000000000018
error-gt0-sgunit-fatal                 0x0000000000000019
error-gt0-soc-fatal-psf-csc-0          0x000000000000001a
error-gt0-soc-fatal-psf-csc-1          0x000000000000001b
error-gt0-soc-fatal-psf-csc-2          0x000000000000001c
error-gt0-soc-fatal-punit              0x000000000000001d
error-gt0-soc-fatal-psf-0              0x000000000000001e
error-gt0-soc-fatal-psf-1              0x000000000000001f
error-gt0-soc-fatal-psf-2              0x0000000000000020
error-gt0-soc-fatal-cd0                0x0000000000000021
error-gt0-soc-fatal-cd0-mdfi           0x0000000000000022
error-gt0-soc-fatal-mdfi-east          0x0000000000000023
error-gt0-soc-fatal-mdfi-south         0x0000000000000024
error-gt0-soc-fatal-hbm-ss0-0          0x0000000000000025
error-gt0-soc-fatal-hbm-ss0-1          0x0000000000000026
error-gt0-soc-fatal-hbm-ss0-2          0x0000000000000027
error-gt0-soc-fatal-hbm-ss0-3          0x0000000000000028
error-gt0-soc-fatal-hbm-ss0-4          0x0000000000000029
error-gt0-soc-fatal-hbm-ss0-5          0x000000000000002a
error-gt0-soc-fatal-hbm-ss0-6          0x000000000000002b
error-gt0-soc-fatal-hbm-ss0-7          0x000000000000002c
error-gt0-soc-fatal-hbm-ss1-0          0x000000000000002d
error-gt0-soc-fatal-hbm-ss1-1          0x000000000000002e
error-gt0-soc-fatal-hbm-ss1-2          0x000000000000002f
error-gt0-soc-fatal-hbm-ss1-3          0x0000000000000030
error-gt0-soc-fatal-hbm-ss1-4          0x0000000000000031
error-gt0-soc-fatal-hbm-ss1-5          0x0000000000000032
error-gt0-soc-fatal-hbm-ss1-6          0x0000000000000033
error-gt0-soc-fatal-hbm-ss1-7          0x0000000000000034
error-gt0-soc-fatal-hbm-ss2-0          0x0000000000000035
error-gt0-soc-fatal-hbm-ss2-1          0x0000000000000036
error-gt0-soc-fatal-hbm-ss2-2          0x0000000000000037
error-gt0-soc-fatal-hbm-ss2-3          0x0000000000000038
error-gt0-soc-fatal-hbm-ss2-4          0x0000000000000039
error-gt0-soc-fatal-hbm-ss2-5          0x000000000000003a
error-gt0-soc-fatal-hbm-ss2-6          0x000000000000003b
error-gt0-soc-fatal-hbm-ss2-7          0x000000000000003c
error-gt0-soc-fatal-hbm-ss3-0          0x000000000000003d
error-gt0-soc-fatal-hbm-ss3-1          0x000000000000003e
error-gt0-soc-fatal-hbm-ss3-2          0x000000000000003f
error-gt0-soc-fatal-hbm-ss3-3          0x0000000000000040
error-gt0-soc-fatal-hbm-ss3-4          0x0000000000000041
error-gt0-soc-fatal-hbm-ss3-5          0x0000000000000042
error-gt0-soc-fatal-hbm-ss3-6          0x0000000000000043
error-gt0-soc-fatal-hbm-ss3-7          0x0000000000000044
error-gt0-gsc-correctable-sram-ecc     0x0000000000000045
error-gt0-gsc-nonfatal-mia-shutdown    0x0000000000000046
error-gt0-gsc-nonfatal-mia-int         0x0000000000000047
error-gt0-gsc-nonfatal-sram-ecc        0x0000000000000048
error-gt0-gsc-nonfatal-wdg-timeout     0x0000000000000049
error-gt0-gsc-nonfatal-rom-parity      0x000000000000004a
error-gt0-gsc-nonfatal-ucode-parity    0x000000000000004b
error-gt0-gsc-nonfatal-glitch-det      0x000000000000004c
error-gt0-gsc-nonfatal-fuse-pull       0x000000000000004d
error-gt0-gsc-nonfatal-fuse-crc-check  0x000000000000004e
error-gt0-gsc-nonfatal-selfmbist       0x000000000000004f
error-gt0-gsc-nonfatal-aon-parity      0x0000000000000050
error-gt1-correctable-guc              0x1000000000000001
error-gt1-correctable-slm              0x1000000000000003
error-gt1-correctable-eu-ic            0x1000000000000004
error-gt1-correctable-eu-grf           0x1000000000000005
error-gt1-fatal-guc                    0x1000000000000009
error-gt1-fatal-slm                    0x100000000000000d
error-gt1-fatal-eu-grf                 0x100000000000000f
error-gt1-fatal-fpu                    0x1000000000000010
error-gt1-fatal-tlb                    0x1000000000000011
error-gt1-fatal-l3-fabric              0x1000000000000012
error-gt1-correctable-subslice         0x1000000000000013
error-gt1-correctable-l3bank           0x1000000000000014
error-gt1-fatal-subslice               0x1000000000000015
error-gt1-fatal-l3bank                 0x1000000000000016
error-gt1-sgunit-correctable           0x1000000000000017
error-gt1-sgunit-nonfatal              0x1000000000000018
error-gt1-sgunit-fatal                 0x1000000000000019
error-gt1-soc-fatal-psf-csc-0          0x100000000000001a
error-gt1-soc-fatal-psf-csc-1          0x100000000000001b
error-gt1-soc-fatal-psf-csc-2          0x100000000000001c
error-gt1-soc-fatal-punit              0x100000000000001d
error-gt1-soc-fatal-psf-0              0x100000000000001e
error-gt1-soc-fatal-psf-1              0x100000000000001f
error-gt1-soc-fatal-psf-2              0x1000000000000020
error-gt1-soc-fatal-cd0                0x1000000000000021
error-gt1-soc-fatal-cd0-mdfi           0x1000000000000022
error-gt1-soc-fatal-mdfi-east          0x1000000000000023
error-gt1-soc-fatal-mdfi-south         0x1000000000000024
error-gt1-soc-fatal-hbm-ss0-0          0x1000000000000025
error-gt1-soc-fatal-hbm-ss0-1          0x1000000000000026
error-gt1-soc-fatal-hbm-ss0-2          0x1000000000000027
error-gt1-soc-fatal-hbm-ss0-3          0x1000000000000028
error-gt1-soc-fatal-hbm-ss0-4          0x1000000000000029
error-gt1-soc-fatal-hbm-ss0-5          0x100000000000002a
error-gt1-soc-fatal-hbm-ss0-6          0x100000000000002b
error-gt1-soc-fatal-hbm-ss0-7          0x100000000000002c
error-gt1-soc-fatal-hbm-ss1-0          0x100000000000002d
error-gt1-soc-fatal-hbm-ss1-1          0x100000000000002e
error-gt1-soc-fatal-hbm-ss1-2          0x100000000000002f
error-gt1-soc-fatal-hbm-ss1-3          0x1000000000000030
error-gt1-soc-fatal-hbm-ss1-4          0x1000000000000031
error-gt1-soc-fatal-hbm-ss1-5          0x1000000000000032
error-gt1-soc-fatal-hbm-ss1-6          0x1000000000000033
error-gt1-soc-fatal-hbm-ss1-7          0x1000000000000034
error-gt1-soc-fatal-hbm-ss2-0          0x1000000000000035
error-gt1-soc-fatal-hbm-ss2-1          0x1000000000000036
error-gt1-soc-fatal-hbm-ss2-2          0x1000000000000037
error-gt1-soc-fatal-hbm-ss2-3          0x1000000000000038
error-gt1-soc-fatal-hbm-ss2-4          0x1000000000000039
error-gt1-soc-fatal-hbm-ss2-5          0x100000000000003a
error-gt1-soc-fatal-hbm-ss2-6          0x100000000000003b
error-gt1-soc-fatal-hbm-ss2-7          0x100000000000003c
error-gt1-soc-fatal-hbm-ss3-0          0x100000000000003d
error-gt1-soc-fatal-hbm-ss3-1          0x100000000000003e
error-gt1-soc-fatal-hbm-ss3-2          0x100000000000003f
error-gt1-soc-fatal-hbm-ss3-3          0x1000000000000040
error-gt1-soc-fatal-hbm-ss3-4          0x1000000000000041
error-gt1-soc-fatal-hbm-ss3-5          0x1000000000000042
error-gt1-soc-fatal-hbm-ss3-6          0x1000000000000043
error-gt1-soc-fatal-hbm-ss3-7          0x1000000000000044

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Oded Gabbay <ogabbay@kernel.org>
Cc: Tomer Tayar <ttayar@habana.ai>

Aravind Iddamsetty (1):
  tools/RAS: A tool to read error counters

 include/drm-uapi/drm_netlink.h |  66 ++++++
 meson.build                    |   4 +
 tools/drm_ras.c                | 403 +++++++++++++++++++++++++++++++++
 tools/meson.build              |   5 +
 4 files changed, 478 insertions(+)
 create mode 100644 include/drm-uapi/drm_netlink.h
 create mode 100644 tools/drm_ras.c

-- 
2.25.1
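The config-ids in the listings above follow a visible pattern: each gt1 id is the matching gt0 id with 0x1 in the most significant hex digit (error-gt0-correctable-guc is 0x0000000000000001, error-gt1-correctable-guc is 0x1000000000000001). The sketch below decodes and builds ids under the assumption, inferred only from this sample output, that bits 63:60 hold the GT index and bits 59:0 hold the error number. Neither the cover letter nor drm_netlink.h defines this layout, and the ras_cfg_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * ASSUMPTION: layout inferred from the sample LIST_ERRORS output only;
 * the uapi header does not define it. Helper names are hypothetical.
 */
#define RAS_CFG_GT_SHIFT 60
#define RAS_CFG_ERR_MASK ((UINT64_C(1) << RAS_CFG_GT_SHIFT) - 1)

/* GT index, assumed to sit in bits 63:60 of the config-id. */
static unsigned int ras_cfg_gt(uint64_t config_id)
{
	return (unsigned int)(config_id >> RAS_CFG_GT_SHIFT);
}

/* Per-GT error number, assumed to sit in bits 59:0. */
static uint64_t ras_cfg_error(uint64_t config_id)
{
	return config_id & RAS_CFG_ERR_MASK;
}

/* Inverse of the two helpers above, e.g. to build an --error_id value. */
static uint64_t ras_cfg_make(unsigned int gt, uint64_t error)
{
	assert(gt < 16 && (error & ~RAS_CFG_ERR_MASK) == 0);
	return ((uint64_t)gt << RAS_CFG_GT_SHIFT) | error;
}
```

For example, ras_cfg_make(1, 0x5) gives 0x1000000000000005, which is the id listed for error-gt1-correctable-eu-grf. If the kernel side ever documents a different encoding, only the two macros would need to change.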
* [Intel-xe] [RFC i-g-t v2 1/1] tools/RAS: A tool to read error counters

From: Aravind Iddamsetty @ 2023-08-25 12:02 UTC
To: intel-xe
Cc: igt-dev

This tool demonstrates the use of netlink sockets to query and read the
error counters of the hardware. It provides the LIST_ERRORS, READ_ONE and
READ_ALL commands to read counters, and the WAIT_ON_EVENT command to wait
for an error event. WAIT_ON_EVENT is presently hardcoded to wait for a
correctable error event and then read a single error counter.

v2: update uapi header.

Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
---
 include/drm-uapi/drm_netlink.h |  66 ++++++
 meson.build                    |   4 +
 tools/drm_ras.c                | 403 +++++++++++++++++++++++++++++++++
 tools/meson.build              |   5 +
 4 files changed, 478 insertions(+)
 create mode 100644 include/drm-uapi/drm_netlink.h
 create mode 100644 tools/drm_ras.c

diff --git a/include/drm-uapi/drm_netlink.h b/include/drm-uapi/drm_netlink.h
new file mode 100644
index 000000000..b37f95295
--- /dev/null
+++ b/include/drm-uapi/drm_netlink.h
@@ -0,0 +1,66 @@
+/*
+ * Copyright 2023 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef _DRM_NETLINK_H_
+#define _DRM_NETLINK_H_
+
+#define DRM_GENL_VERSION 1
+#define DRM_GENL_MCAST_GROUP_NAME_CORR_ERR "drm_corr_err"
+#define DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR "drm_uncorr_err"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+enum error_cmds {
+	DRM_CMD_UNSPEC,
+	/* command to list all error names with their config-ids */
+	DRM_RAS_CMD_QUERY,
+	/* command to get a counter for a specific error */
+	DRM_RAS_CMD_READ_ONE,
+	/* command to get counters of all errors */
+	DRM_RAS_CMD_READ_ALL,
+	DRM_RAS_CMD_ERROR_EVENT,
+
+	__DRM_CMD_MAX,
+	DRM_CMD_MAX = __DRM_CMD_MAX - 1,
+};
+
+enum error_attr {
+	DRM_ATTR_UNSPEC,
+	DRM_ATTR_PAD = DRM_ATTR_UNSPEC,
+	DRM_RAS_ATTR_REQUEST, /* NLA_U8 */
+	DRM_RAS_ATTR_QUERY_REPLY, /* NLA_NESTED */
+	DRM_RAS_ATTR_ERROR_NAME, /* NLA_NUL_STRING */
+	DRM_RAS_ATTR_ERROR_ID, /* NLA_U64 */
+	DRM_RAS_ATTR_ERROR_VALUE, /* NLA_U64 */
+
+	__DRM_ATTR_MAX,
+	DRM_ATTR_MAX = __DRM_ATTR_MAX - 1,
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif
diff --git a/meson.build b/meson.build
index 7360634fe..269a9310f 100644
--- a/meson.build
+++ b/meson.build
@@ -141,6 +141,10 @@ cairo = dependency('cairo', version : '>1.12.0', required : true)
 libudev = dependency('libudev', required : true)
 glib = dependency('glib-2.0', required : true)
 
+libnl = dependency('libnl-3.0', required: false)
+libnl_genl = dependency('libnl-genl-3.0', required: false)
+libnl_cli = dependency('libnl-cli-3.0', required:false)
+
 xmlrpc = dependency('xmlrpc', required : false)
 xmlrpc_util = dependency('xmlrpc_util', required : false)
 xmlrpc_client = dependency('xmlrpc_client', required : false)
diff --git a/tools/drm_ras.c b/tools/drm_ras.c
new file mode 100644
index 000000000..739cf009e
--- /dev/null
+++ b/tools/drm_ras.c
@@ -0,0 +1,403 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+#include <stdio.h>
+#include <sys/types.h>
+#include <unistd.h>
+#include <getopt.h>
+#include <linux/genetlink.h>
+#include <netlink/cli/utils.h>
+
+#include "drm_netlink.h"
+#include "igt_device_scan.h"
+
+#define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0]))
+
+struct nl_sock *sock, *mcsock;
+int family_id;
+
+enum opt_val {
+	OPT_UNKNOWN = '?',
+	OPT_END = -1,
+	OPT_DEVICE,
+	OPT_CONFIG,
+	OPT_HELP,
+};
+
+enum cmd_ids {
+	INVALID_CMD = -1,
+	LIST_ERRORS = 0,
+	READ_ONE,
+	READ_ALL,
+	WAIT_ON_EVENT,
+
+	__MAX_CMDS,
+};
+
+static const char * const cmd_names[] = {
+	"LIST_ERRORS",
+	"READ_ONE",
+	"READ_ALL",
+	"WAIT_ON_EVENT",
+};
+
+static void help(char **argv)
+{
+	int i;
+
+	printf("Usage: %s command [<command options>]\n", argv[0]);
+	printf("commands:\n");
+
+	for (i = 0; i < __MAX_CMDS; i++) {
+		switch (i) {
+		case LIST_ERRORS:
+		case READ_ALL:
+		case WAIT_ON_EVENT:
+			printf("%s %s --device=<device filter>\n", argv[0], cmd_names[i]);
+			break;
+		case READ_ONE:
+			printf("%s %s --device=<device filter> --error_id=<id returned from query>\n", argv[0], cmd_names[i]);
+			break;
+		}
+	}
+
+	igt_device_print_filter_types();
+}
+
+static int list_errors(struct nl_cache_ops *ops, struct genl_cmd *cmd,
+		       struct genl_info *info, void *arg)
+{
+	const struct nlmsghdr *nlh = info->nlh;
+	struct nlattr *nla;
+	int len, remain;
+
+	len = GENL_HDRLEN; /* attributes start right after the genl header */
+
+	nlmsg_for_each_attr(nla, nlh, len, remain) {
+		if ((nla_type(nla) == DRM_RAS_ATTR_QUERY_REPLY) && nla_is_nested(nla)) {
+			struct nlattr *cur;
+			int rem;
+
+			if (cmd->c_id == DRM_RAS_CMD_READ_ALL)
+				printf("%-50s\t%-18s\t%s\n", "name", "config-id", "counter");
+			else
+				printf("%-50s\t%-18s\n", "name", "config-id");
+
+			nla_for_each_nested(cur, nla, rem) {
+				switch (nla_type(cur)) {
+				case DRM_RAS_ATTR_ERROR_NAME:
+					printf("\n%-50s", nla_get_string(cur));
+					break;
+				case DRM_RAS_ATTR_ERROR_ID:
+					printf("\t0x%016llx", (unsigned long long)nla_get_u64(cur));
+					break;
+				case DRM_RAS_ATTR_ERROR_VALUE:
+					printf("\t%llu", (unsigned long long)nla_get_u64(cur));
+					break;
+				default:
+					break;
+				}
+			}
+			printf("\n");
+		}
+	}
+
+	return NL_OK;
+}
+
+static int read_single(struct nl_cache_ops *ops, struct genl_cmd *cmd,
+		       struct genl_info *info, void *arg)
+{
+	if (!info->attrs[DRM_RAS_ATTR_ERROR_VALUE])
+		nl_cli_fatal(NLE_FAILURE, "DRM_RAS_ATTR_ERROR_VALUE attribute is missing");
+
+	printf("counter value %llu\n", (unsigned long long)nla_get_u64(info->attrs[DRM_RAS_ATTR_ERROR_VALUE]));
+
+	return NL_OK;
+}
+
+static int mcast_event_handler(struct nl_cache_ops *ops, struct genl_cmd *cmd,
+			       struct genl_info *info, void *arg)
+{
+	struct nl_msg *msg;
+	uint64_t config = 0x0000000000000005; /* error-gt0-correctable-eu-grf */
+	void *msg_head;
+	int ret;
+
+	printf("error event received\n");
+
+	msg = nlmsg_alloc();
+	if (!msg)
+		nl_cli_fatal(NLE_NOMEM, "nlmsg_alloc failed");
+
+	msg_head = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family_id, 0, 0, DRM_RAS_CMD_READ_ONE, 1);
+	if (!msg_head)
+		nl_cli_fatal(NLE_NOMEM, "genlmsg_put failed");
+
+	nla_put_u64(msg, DRM_RAS_ATTR_ERROR_ID, config);
+
+	ret = nl_send_auto(sock, msg);
+	if (ret < 0)
+		nl_cli_fatal(ret, "Unable to send message: %s", nl_geterror(ret));
+
+	ret = nl_recvmsgs_default(sock);
+	if (ret < 0)
+		nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret));
+
+	nlmsg_free(msg);
+
+	return NL_OK;
+}
+
+static struct nla_policy drm_genl_policy[DRM_ATTR_MAX + 1] = {
+	[DRM_RAS_ATTR_REQUEST] = { .type = NLA_U8 },
+	[DRM_RAS_ATTR_QUERY_REPLY] = { .type = NLA_NESTED },
+	[DRM_RAS_ATTR_ERROR_NAME] = { .type = NLA_NUL_STRING },
+	[DRM_RAS_ATTR_ERROR_ID] = { .type = NLA_U64 },
+	[DRM_RAS_ATTR_ERROR_VALUE] = { .type = NLA_U64 },
+};
+
+static struct genl_cmd drm_genl_cmds[] = {
+	{
+		.c_id = DRM_RAS_CMD_QUERY,
+		.c_name = "QUERY",
+		.c_maxattr = DRM_ATTR_MAX,
+		.c_attr_policy = drm_genl_policy,
+		.c_msg_parser = list_errors,
+	},
+	{
+		.c_id = DRM_RAS_CMD_READ_ONE,
+		.c_name = "READ_1",
+		.c_maxattr = DRM_ATTR_MAX,
+		.c_attr_policy = drm_genl_policy,
+		.c_msg_parser = read_single,
+	},
+	{
+		.c_id = DRM_RAS_CMD_READ_ALL,
+		.c_name = "READ_ALL",
+		.c_maxattr = DRM_ATTR_MAX,
+		.c_attr_policy = drm_genl_policy,
+		.c_msg_parser = list_errors,
+	},
+	{
+		.c_id = DRM_RAS_CMD_ERROR_EVENT,
+		.c_name = "ERROR_EVENT",
+		.c_maxattr = DRM_ATTR_MAX,
+		.c_attr_policy = drm_genl_policy,
+		.c_msg_parser = mcast_event_handler,
+	},
+};
+
+static struct genl_ops drm_genl_ops = {
+	.o_hdrsize = 0,
+	.o_cmds = drm_genl_cmds,
+	.o_ncmds = ARRAY_SIZE(drm_genl_cmds),
+};
+
+static void send_cmd(int cmd, uint64_t config)
+{
+	struct nl_msg *msg;
+	void *msg_head;
+	int ret;
+
+	msg = nlmsg_alloc();
+	if (!msg)
+		nl_cli_fatal(NLE_NOMEM, "nlmsg_alloc failed");
+
+	msg_head = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family_id, 0, 0, cmd, 1);
+	if (!msg_head)
+		nl_cli_fatal(NLE_NOMEM, "genlmsg_put failed");
+	switch (cmd) {
+	case DRM_RAS_CMD_QUERY:
+		nla_put_u8(msg, DRM_RAS_ATTR_REQUEST, 1);
+		break;
+	case DRM_RAS_CMD_READ_ONE:
+		nla_put_u64(msg, DRM_RAS_ATTR_ERROR_ID, config);
+		break;
+	case DRM_RAS_CMD_READ_ALL:
+		nla_put_u8(msg, DRM_RAS_ATTR_REQUEST, 1);
+		break;
+	default:
+		break;
+	}
+
+	ret = nl_send_auto(sock, msg);
+	if (ret < 0)
+		nl_cli_fatal(ret, "Unable to send message: %s", nl_geterror(ret));
+
+	ret = nl_recvmsgs_default(sock);
+	if (ret < 0)
+		nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret));
+
+	nlmsg_free(msg);
+}
+
+static int get_cmd(char *cmd_name)
+{
+	int i;
+
+	if (!cmd_name)
+		return -1;
+
+	for (i = 0; i < __MAX_CMDS; i++) {
+		if (strcasecmp(cmd_name, cmd_names[i]) == 0)
+			return i;
+	}
+
+	return -1;
+}
+
+int main(int argc, char **argv)
+{
+	char *endptr;
+	enum opt_val val;
+	enum cmd_ids cmd;
+	char *device = NULL;
+	uint64_t error_config_id = 0; /* avoid sending garbage if --error_id is missing */
+	int ret, mcgrp, index;
+	struct igt_device_card card;
+	char *dev_name;
+
+	static struct option options[] = {
+		{"device", required_argument, NULL, OPT_DEVICE},
+		{"error_id", required_argument, NULL, OPT_CONFIG},
+		{"help", no_argument, NULL, OPT_HELP},
+		{ 0 }
+	};
+
+	cmd = get_cmd(argv[1]);
+	if (cmd < 0) {
+		fprintf(stderr, "invalid command\n");
+		help(argv);
+		exit(EXIT_FAILURE);
+	}
+
+	for (val = 0; val != OPT_END; ) {
+		val = getopt_long(argc, argv, "", options, &index);
+
+		switch (val) {
+		case OPT_DEVICE:
+			device = strdup(optarg);
+			break;
+		case OPT_CONFIG:
+			error_config_id = strtoull(optarg, &endptr, 16);
+			if (*endptr) {
+				fprintf(stderr, "invalid config id %s\n", optarg);
+				exit(EXIT_FAILURE);
+			}
+			break;
+		case OPT_HELP:
+			help(argv);
+			exit(EXIT_SUCCESS);
+		case OPT_END:
+			break;
+		case OPT_UNKNOWN:
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	if (!device) {
+		fprintf(stderr, "missing device option\n");
+		help(argv);
+		exit(EXIT_FAILURE);
+	} else {
+		ret = igt_device_card_match_pci(device, &card);
+		if (!ret) {
+			fprintf(stderr, "device %s not found!\n", device);
+			exit(EXIT_FAILURE);
+		}
+		free(device);
+	}
+
+	/* get card name, i.e. the last path component of card.card */
+	dev_name = strrchr(card.card, '/');
+	if (dev_name)
+		dev_name++;
+	else
+		dev_name = card.card;
+
+	drm_genl_ops.o_name = strdup(dev_name);
+
+	sock = nl_cli_alloc_socket();
+	if (!sock)
+		nl_cli_fatal(NLE_NOMEM, "Cannot allocate nl_sock");
+
+	ret = nl_cli_connect(sock, NETLINK_GENERIC);
+	if (ret < 0)
+		nl_cli_fatal(ret, "Cannot connect handle");
+
+	ret = genl_register_family(&drm_genl_ops);
+	if (ret < 0)
+		nl_cli_fatal(ret, "Cannot register genl family");
+
+	ret = genl_ops_resolve(sock, &drm_genl_ops);
+	if (ret < 0)
+		nl_cli_fatal(ret, "Unable to resolve family name");
+
+	family_id = genl_ctrl_resolve(sock, drm_genl_ops.o_name);
+	if (family_id < 0)
+		nl_cli_fatal(NLE_INVAL, "Resolving of \"%s\" failed", drm_genl_ops.o_name);
+
+	ret = nl_socket_modify_cb(sock, NL_CB_VALID, NL_CB_CUSTOM, genl_handle_msg, NULL);
+	if (ret < 0)
+		nl_cli_fatal(ret, "Unable to modify valid message callback");
+
+	switch (cmd) {
+	case LIST_ERRORS:
+		send_cmd(DRM_RAS_CMD_QUERY, 0);
+		break;
+	case READ_ONE:
+		send_cmd(DRM_RAS_CMD_READ_ONE, error_config_id);
+		break;
+	case READ_ALL:
+		send_cmd(DRM_RAS_CMD_READ_ALL, 0);
+		break;
+	case WAIT_ON_EVENT:
+		mcsock = nl_cli_alloc_socket();
+		if (!mcsock)
+			nl_cli_fatal(NLE_NOMEM, "Cannot allocate nl_sock");
+
+		ret = nl_cli_connect(mcsock, NETLINK_GENERIC);
+		if (ret < 0)
+			nl_cli_fatal(ret, "Cannot connect handle");
+
+		ret = genl_ops_resolve(mcsock, &drm_genl_ops);
+		if (ret < 0)
+			nl_cli_fatal(ret, "Unable to resolve family name");
+
+		nl_socket_disable_seq_check(mcsock);
+
+		mcgrp = genl_ctrl_resolve_grp(mcsock, drm_genl_ops.o_name,
+					      DRM_GENL_MCAST_GROUP_NAME_CORR_ERR);
+		if (mcgrp < 0)
+			nl_cli_fatal(mcgrp, "failed to resolve generic netlink multicast group");
+
+		/* Join the multicast group. */
+		ret = nl_socket_add_membership(mcsock, mcgrp);
+		if (ret < 0)
+			nl_cli_fatal(ret, "failed to join multicast group");
+
+		ret = nl_socket_modify_cb(mcsock, NL_CB_VALID, NL_CB_CUSTOM, genl_handle_msg, NULL);
+		if (ret < 0)
+			nl_cli_fatal(ret, "Unable to modify valid message callback");
+
+		printf("waiting for error event\n");
+		ret = nl_recvmsgs_default(mcsock);
+		if (ret < 0)
+			nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret));
+
+		nl_close(mcsock);
+		nl_socket_free(mcsock);
+		break;
+	default:
+		break;
+	}
+
+	nl_close(sock);
+	nl_socket_free(sock);
+
+	return 0;
+}
+
diff --git a/tools/meson.build b/tools/meson.build
index 4c45f16b9..a53d3917f 100644
--- a/tools/meson.build
+++ b/tools/meson.build
@@ -107,5 +107,10 @@ if libudev.found()
 		   install : true)
 endif
 
+executable('drm_ras', 'drm_ras.c',
+	   dependencies : [tool_deps, libnl, libnl_cli, libnl_genl],
+	   install_rpath : bindir_rpathdir,
+	   install : true)
+
 subdir('i915-perf')
 subdir('null_state_gen')
-- 
2.25.1
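list_errors() relies on libnl's nlmsg_for_each_attr() and nla_for_each_nested() to walk the DRM_RAS_ATTR_QUERY_REPLY nest. On the wire, each attribute is a struct nlattr header (a 16-bit nla_len covering header plus payload, and a 16-bit nla_type), followed by the payload padded to a 4-byte boundary. The sketch below re-implements that encoding and walk over a plain byte buffer using only <linux/netlink.h>, to show what the reply looks like. The attribute numbers are copied from enum error_attr in drm_netlink.h. put_attr() and walk_attrs() are hypothetical helpers, not libnl API. Unlike libnl's nla_ok(), which silently ends the loop on a malformed attribute, walk_attrs() reports one as an error.

```c
#include <assert.h>
#include <linux/netlink.h>	/* struct nlattr, NLA_ALIGN, NLA_HDRLEN, NLA_TYPE_MASK */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from enum error_attr in drm_netlink.h. */
enum {
	RAS_ATTR_ERROR_NAME = 3,
	RAS_ATTR_ERROR_ID = 4,
	RAS_ATTR_ERROR_VALUE = 5,
};

/*
 * Hypothetical helper: append one attribute (header, payload, zeroed
 * padding) at *off. Returns 0, or -1 if it does not fit in cap bytes.
 */
static int put_attr(unsigned char *buf, size_t cap, size_t *off,
		    uint16_t type, const void *data, uint16_t len)
{
	struct nlattr nla;
	size_t total;

	assert(len <= UINT16_MAX - NLA_HDRLEN);	/* nla_len is only 16 bits */
	nla.nla_len = (uint16_t)(NLA_HDRLEN + len);
	nla.nla_type = type;
	total = NLA_ALIGN(nla.nla_len);
	if (*off > cap || total > cap - *off)
		return -1;

	memset(buf + *off, 0, total);
	memcpy(buf + *off, &nla, sizeof(nla));
	memcpy(buf + *off + NLA_HDRLEN, data, len);
	*off += total;
	return 0;
}

/*
 * Hypothetical helper: walk a run of attributes (e.g. the payload of the
 * DRM_RAS_ATTR_QUERY_REPLY nest) and copy each ERROR_ID payload into ids[].
 * Returns the number of ERROR_IDs seen, or -1 on a malformed attribute.
 */
static int walk_attrs(const unsigned char *buf, size_t len,
		      uint64_t *ids, size_t max_ids)
{
	size_t off = 0;
	int n = 0;

	while (len - off >= NLA_HDRLEN) {
		struct nlattr nla;
		size_t step;

		memcpy(&nla, buf + off, sizeof(nla));	/* avoid unaligned access */
		if (nla.nla_len < NLA_HDRLEN || nla.nla_len > len - off)
			return -1;

		/* Mask off NLA_F_NESTED / NLA_F_NET_BYTEORDER, like nla_type(). */
		if ((nla.nla_type & NLA_TYPE_MASK) == RAS_ATTR_ERROR_ID) {
			if ((size_t)(nla.nla_len - NLA_HDRLEN) != sizeof(uint64_t))
				return -1;	/* ERROR_ID must carry a u64 */
			if ((size_t)n < max_ids)
				memcpy(&ids[n], buf + off + NLA_HDRLEN, sizeof(uint64_t));
			n++;
		}

		step = NLA_ALIGN(nla.nla_len);
		if (step > len - off)
			break;	/* last attribute without trailing padding */
		off += step;
	}
	return n;
}
```

A NAME attribute for "error-gt0-correctable-guc" (25 characters plus NUL) occupies 4 + 26 = 30 bytes, padded to 32. Each u64 attribute occupies 12 bytes. This is why the payloads are copied out with memcpy() rather than dereferenced as uint64_t pointers.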
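The OPT_CONFIG case in main() parses --error_id with strtoull(optarg, &endptr, 16) and only checks that nothing follows the number. As a result, an empty value is accepted as id 0, an out-of-range value silently saturates to ULLONG_MAX (strtoull() sets errno to ERANGE, which is not checked), and a leading '-' or leading whitespace is accepted. A stricter parser might look like the sketch below. parse_error_id() is a hypothetical name and is not part of the patch. Like the original, it accepts an optional 0x prefix, which strtoull() allows for base 16.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stricter replacement for the OPT_CONFIG parsing.
 * Returns 0 and stores the id in *out, or -1 if s is not a plain
 * (optionally 0x-prefixed) hexadecimal number that fits in 64 bits.
 */
static int parse_error_id(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	assert(out != NULL);
	/* strtoull() skips leading whitespace and accepts a sign; reject both. */
	if (!s || *s == '\0' || isspace((unsigned char)*s) || *s == '-' || *s == '+')
		return -1;

	errno = 0;
	v = strtoull(s, &end, 16);
	if (errno == ERANGE || end == s || *end != '\0')
		return -1;	/* overflow, no digits, or trailing garbage */

	*out = (uint64_t)v;
	return 0;
}
```

In main(), the OPT_CONFIG case could then call parse_error_id(optarg, &error_config_id) and print "invalid config id" on a non-zero return, keeping the existing error path.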
* [Intel-xe] ✗ CI.Patch_applied: failure for A tool to demonstrate use of netlink sockets to read RAS error counters (rev2)

From: Patchwork @ 2023-08-25 12:08 UTC
To: Iddamsetty, Aravind
Cc: intel-xe

== Series Details ==

Series: A tool to demonstrate use of netlink sockets to read RAS error counters (rev2)
URL   : https://patchwork.freedesktop.org/series/118437/
State : failure

== Summary ==

=== Applying kernel patches on branch 'drm-xe-next' with base: ===
Base commit: b9c9020fc drm/xe/pvc: Use fast copy engines as migrate engine on PVC
=== git am output follows ===
error: meson.build: does not exist in index
.git/rebase-apply/patch:505: new blank line at EOF.
+
error: tools/meson.build: does not exist in index
hint: Use 'git am --show-current-patch' to see the failed patch
Applying: tools/RAS: A tool to read error counters
Patch failed at 0001 tools/RAS: A tool to read error counters
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".