* [igt-dev] [RFC i-g-t 0/1] A tool to demonstrate use of netlink sockets to read RAS error counters @ 2023-05-26 16:30 Aravind Iddamsetty 2023-05-26 16:30 ` [igt-dev] [RFC i-g-t 1/1] tools/RAS: A tool to read " Aravind Iddamsetty 2023-05-26 17:49 ` [igt-dev] ✗ Fi.CI.BUILD: failure for A tool to demonstrate use of netlink sockets to read RAS " Patchwork 0 siblings, 2 replies; 5+ messages in thread From: Aravind Iddamsetty @ 2023-05-26 16:30 UTC (permalink / raw) To: intel-xe, igt-dev; +Cc: alexander.deucher, ogabbay, airlied, daniel This tool is to demonstrate the use of netlink sockets to read RAS error counters, which is being proposed via series "[RFC 0/5] Proposal to use netlink for RAS and Telemetry across drm subsystem". The tool supports the following commands: READ_ONE, READ_ALL, WAIT_ON_EVENT, LIST_ERRORS read single error counter: $ ./drm_ras READ_ONE --device=drm:/dev/dri/card1 --error_id=0x0000000000000005 counter value 0 read all error counters: $ ./drm_ras READ_ALL --device=drm:/dev/dri/card1 name config-id counter error-gt0-correctable-guc 0x0000000000000001 0 error-gt0-correctable-slm 0x0000000000000003 0 error-gt0-correctable-eu-ic 0x0000000000000004 0 error-gt0-correctable-eu-grf 0x0000000000000005 0 error-gt0-fatal-guc 0x0000000000000009 0 error-gt0-fatal-slm 0x000000000000000d 0 error-gt0-fatal-eu-grf 0x000000000000000f 0 error-gt0-fatal-fpu 0x0000000000000010 0 error-gt0-fatal-tlb 0x0000000000000011 0 error-gt0-fatal-l3-fabric 0x0000000000000012 0 error-gt0-correctable-subslice 0x0000000000000013 0 error-gt0-correctable-l3bank 0x0000000000000014 0 error-gt0-fatal-subslice 0x0000000000000015 0 error-gt0-fatal-l3bank 0x0000000000000016 0 error-gt0-sgunit-correctable 0x0000000000000017 0 error-gt0-sgunit-nonfatal 0x0000000000000018 0 error-gt0-sgunit-fatal 0x0000000000000019 0 error-gt0-soc-fatal-psf-csc-0 0x000000000000001a 0 error-gt0-soc-fatal-psf-csc-1 0x000000000000001b 0 error-gt0-soc-fatal-psf-csc-2 0x000000000000001c 0 
error-gt0-soc-fatal-punit 0x000000000000001d 0 error-gt0-soc-fatal-psf-0 0x000000000000001e 0 error-gt0-soc-fatal-psf-1 0x000000000000001f 0 error-gt0-soc-fatal-psf-2 0x0000000000000020 0 error-gt0-soc-fatal-cd0 0x0000000000000021 0 error-gt0-soc-fatal-cd0-mdfi 0x0000000000000022 0 error-gt0-soc-fatal-mdfi-east 0x0000000000000023 0 error-gt0-soc-fatal-mdfi-south 0x0000000000000024 0 error-gt0-soc-fatal-hbm-ss0-0 0x0000000000000025 0 error-gt0-soc-fatal-hbm-ss0-1 0x0000000000000026 0 error-gt0-soc-fatal-hbm-ss0-2 0x0000000000000027 0 error-gt0-soc-fatal-hbm-ss0-3 0x0000000000000028 0 error-gt0-soc-fatal-hbm-ss0-4 0x0000000000000029 0 error-gt0-soc-fatal-hbm-ss0-5 0x000000000000002a 0 error-gt0-soc-fatal-hbm-ss0-6 0x000000000000002b 0 error-gt0-soc-fatal-hbm-ss0-7 0x000000000000002c 0 error-gt0-soc-fatal-hbm-ss1-0 0x000000000000002d 0 error-gt0-soc-fatal-hbm-ss1-1 0x000000000000002e 0 error-gt0-soc-fatal-hbm-ss1-2 0x000000000000002f 0 error-gt0-soc-fatal-hbm-ss1-3 0x0000000000000030 0 error-gt0-soc-fatal-hbm-ss1-4 0x0000000000000031 0 error-gt0-soc-fatal-hbm-ss1-5 0x0000000000000032 0 error-gt0-soc-fatal-hbm-ss1-6 0x0000000000000033 0 error-gt0-soc-fatal-hbm-ss1-7 0x0000000000000034 0 error-gt0-soc-fatal-hbm-ss2-0 0x0000000000000035 0 error-gt0-soc-fatal-hbm-ss2-1 0x0000000000000036 0 error-gt0-soc-fatal-hbm-ss2-2 0x0000000000000037 0 error-gt0-soc-fatal-hbm-ss2-3 0x0000000000000038 0 error-gt0-soc-fatal-hbm-ss2-4 0x0000000000000039 0 error-gt0-soc-fatal-hbm-ss2-5 0x000000000000003a 0 error-gt0-soc-fatal-hbm-ss2-6 0x000000000000003b 0 error-gt0-soc-fatal-hbm-ss2-7 0x000000000000003c 0 error-gt0-soc-fatal-hbm-ss3-0 0x000000000000003d 0 error-gt0-soc-fatal-hbm-ss3-1 0x000000000000003e 0 error-gt0-soc-fatal-hbm-ss3-2 0x000000000000003f 0 error-gt0-soc-fatal-hbm-ss3-3 0x0000000000000040 0 error-gt0-soc-fatal-hbm-ss3-4 0x0000000000000041 0 error-gt0-soc-fatal-hbm-ss3-5 0x0000000000000042 0 error-gt0-soc-fatal-hbm-ss3-6 0x0000000000000043 0 error-gt0-soc-fatal-hbm-ss3-7 
0x0000000000000044 0 error-gt0-gsc-correctable-sram-ecc 0x0000000000000045 0 error-gt0-gsc-nonfatal-mia-shutdown 0x0000000000000046 0 error-gt0-gsc-nonfatal-mia-int 0x0000000000000047 0 error-gt0-gsc-nonfatal-sram-ecc 0x0000000000000048 0 error-gt0-gsc-nonfatal-wdg-timeout 0x0000000000000049 0 error-gt0-gsc-nonfatal-rom-parity 0x000000000000004a 0 error-gt0-gsc-nonfatal-ucode-parity 0x000000000000004b 0 error-gt0-gsc-nonfatal-glitch-det 0x000000000000004c 0 error-gt0-gsc-nonfatal-fuse-pull 0x000000000000004d 0 error-gt0-gsc-nonfatal-fuse-crc-check 0x000000000000004e 0 error-gt0-gsc-nonfatal-selfmbist 0x000000000000004f 0 error-gt0-gsc-nonfatal-aon-parity 0x0000000000000050 0 error-gt1-correctable-guc 0x1000000000000001 0 error-gt1-correctable-slm 0x1000000000000003 0 error-gt1-correctable-eu-ic 0x1000000000000004 0 error-gt1-correctable-eu-grf 0x1000000000000005 0 error-gt1-fatal-guc 0x1000000000000009 0 error-gt1-fatal-slm 0x100000000000000d 0 error-gt1-fatal-eu-grf 0x100000000000000f 0 error-gt1-fatal-fpu 0x1000000000000010 0 error-gt1-fatal-tlb 0x1000000000000011 0 error-gt1-fatal-l3-fabric 0x1000000000000012 0 error-gt1-correctable-subslice 0x1000000000000013 0 error-gt1-correctable-l3bank 0x1000000000000014 0 error-gt1-fatal-subslice 0x1000000000000015 0 error-gt1-fatal-l3bank 0x1000000000000016 0 error-gt1-sgunit-correctable 0x1000000000000017 0 error-gt1-sgunit-nonfatal 0x1000000000000018 0 error-gt1-sgunit-fatal 0x1000000000000019 0 error-gt1-soc-fatal-psf-csc-0 0x100000000000001a 0 error-gt1-soc-fatal-psf-csc-1 0x100000000000001b 0 error-gt1-soc-fatal-psf-csc-2 0x100000000000001c 0 error-gt1-soc-fatal-punit 0x100000000000001d 0 error-gt1-soc-fatal-psf-0 0x100000000000001e 0 error-gt1-soc-fatal-psf-1 0x100000000000001f 0 error-gt1-soc-fatal-psf-2 0x1000000000000020 0 error-gt1-soc-fatal-cd0 0x1000000000000021 0 error-gt1-soc-fatal-cd0-mdfi 0x1000000000000022 0 error-gt1-soc-fatal-mdfi-east 0x1000000000000023 0 error-gt1-soc-fatal-mdfi-south 
0x1000000000000024 0 error-gt1-soc-fatal-hbm-ss0-0 0x1000000000000025 0 error-gt1-soc-fatal-hbm-ss0-1 0x1000000000000026 0 error-gt1-soc-fatal-hbm-ss0-2 0x1000000000000027 0 error-gt1-soc-fatal-hbm-ss0-3 0x1000000000000028 0 error-gt1-soc-fatal-hbm-ss0-4 0x1000000000000029 0 error-gt1-soc-fatal-hbm-ss0-5 0x100000000000002a 0 error-gt1-soc-fatal-hbm-ss0-6 0x100000000000002b 0 error-gt1-soc-fatal-hbm-ss0-7 0x100000000000002c 0 error-gt1-soc-fatal-hbm-ss1-0 0x100000000000002d 0 error-gt1-soc-fatal-hbm-ss1-1 0x100000000000002e 0 error-gt1-soc-fatal-hbm-ss1-2 0x100000000000002f 0 error-gt1-soc-fatal-hbm-ss1-3 0x1000000000000030 0 error-gt1-soc-fatal-hbm-ss1-4 0x1000000000000031 0 error-gt1-soc-fatal-hbm-ss1-5 0x1000000000000032 0 error-gt1-soc-fatal-hbm-ss1-6 0x1000000000000033 0 error-gt1-soc-fatal-hbm-ss1-7 0x1000000000000034 0 error-gt1-soc-fatal-hbm-ss2-0 0x1000000000000035 0 error-gt1-soc-fatal-hbm-ss2-1 0x1000000000000036 0 error-gt1-soc-fatal-hbm-ss2-2 0x1000000000000037 0 error-gt1-soc-fatal-hbm-ss2-3 0x1000000000000038 0 error-gt1-soc-fatal-hbm-ss2-4 0x1000000000000039 0 error-gt1-soc-fatal-hbm-ss2-5 0x100000000000003a 0 error-gt1-soc-fatal-hbm-ss2-6 0x100000000000003b 0 error-gt1-soc-fatal-hbm-ss2-7 0x100000000000003c 0 error-gt1-soc-fatal-hbm-ss3-0 0x100000000000003d 0 error-gt1-soc-fatal-hbm-ss3-1 0x100000000000003e 0 error-gt1-soc-fatal-hbm-ss3-2 0x100000000000003f 0 error-gt1-soc-fatal-hbm-ss3-3 0x1000000000000040 0 error-gt1-soc-fatal-hbm-ss3-4 0x1000000000000041 0 error-gt1-soc-fatal-hbm-ss3-5 0x1000000000000042 0 error-gt1-soc-fatal-hbm-ss3-6 0x1000000000000043 0 error-gt1-soc-fatal-hbm-ss3-7 0x1000000000000044 0 wait on a error event: $ ./drm_ras WAIT_ON_EVENT --device=drm:/dev/dri/card1 waiting for error event error event received counter value 0 list all errors: $ ./drm_ras LIST_ERRORS --device=drm:/dev/dri/card1 name config-id error-gt0-correctable-guc 0x0000000000000001 error-gt0-correctable-slm 0x0000000000000003 error-gt0-correctable-eu-ic 
0x0000000000000004 error-gt0-correctable-eu-grf 0x0000000000000005 error-gt0-fatal-guc 0x0000000000000009 error-gt0-fatal-slm 0x000000000000000d error-gt0-fatal-eu-grf 0x000000000000000f error-gt0-fatal-fpu 0x0000000000000010 error-gt0-fatal-tlb 0x0000000000000011 error-gt0-fatal-l3-fabric 0x0000000000000012 error-gt0-correctable-subslice 0x0000000000000013 error-gt0-correctable-l3bank 0x0000000000000014 error-gt0-fatal-subslice 0x0000000000000015 error-gt0-fatal-l3bank 0x0000000000000016 error-gt0-sgunit-correctable 0x0000000000000017 error-gt0-sgunit-nonfatal 0x0000000000000018 error-gt0-sgunit-fatal 0x0000000000000019 error-gt0-soc-fatal-psf-csc-0 0x000000000000001a error-gt0-soc-fatal-psf-csc-1 0x000000000000001b error-gt0-soc-fatal-psf-csc-2 0x000000000000001c error-gt0-soc-fatal-punit 0x000000000000001d error-gt0-soc-fatal-psf-0 0x000000000000001e error-gt0-soc-fatal-psf-1 0x000000000000001f error-gt0-soc-fatal-psf-2 0x0000000000000020 error-gt0-soc-fatal-cd0 0x0000000000000021 error-gt0-soc-fatal-cd0-mdfi 0x0000000000000022 error-gt0-soc-fatal-mdfi-east 0x0000000000000023 error-gt0-soc-fatal-mdfi-south 0x0000000000000024 error-gt0-soc-fatal-hbm-ss0-0 0x0000000000000025 error-gt0-soc-fatal-hbm-ss0-1 0x0000000000000026 error-gt0-soc-fatal-hbm-ss0-2 0x0000000000000027 error-gt0-soc-fatal-hbm-ss0-3 0x0000000000000028 error-gt0-soc-fatal-hbm-ss0-4 0x0000000000000029 error-gt0-soc-fatal-hbm-ss0-5 0x000000000000002a error-gt0-soc-fatal-hbm-ss0-6 0x000000000000002b error-gt0-soc-fatal-hbm-ss0-7 0x000000000000002c error-gt0-soc-fatal-hbm-ss1-0 0x000000000000002d error-gt0-soc-fatal-hbm-ss1-1 0x000000000000002e error-gt0-soc-fatal-hbm-ss1-2 0x000000000000002f error-gt0-soc-fatal-hbm-ss1-3 0x0000000000000030 error-gt0-soc-fatal-hbm-ss1-4 0x0000000000000031 error-gt0-soc-fatal-hbm-ss1-5 0x0000000000000032 error-gt0-soc-fatal-hbm-ss1-6 0x0000000000000033 error-gt0-soc-fatal-hbm-ss1-7 0x0000000000000034 error-gt0-soc-fatal-hbm-ss2-0 0x0000000000000035 
error-gt0-soc-fatal-hbm-ss2-1 0x0000000000000036 error-gt0-soc-fatal-hbm-ss2-2 0x0000000000000037 error-gt0-soc-fatal-hbm-ss2-3 0x0000000000000038 error-gt0-soc-fatal-hbm-ss2-4 0x0000000000000039 error-gt0-soc-fatal-hbm-ss2-5 0x000000000000003a error-gt0-soc-fatal-hbm-ss2-6 0x000000000000003b error-gt0-soc-fatal-hbm-ss2-7 0x000000000000003c error-gt0-soc-fatal-hbm-ss3-0 0x000000000000003d error-gt0-soc-fatal-hbm-ss3-1 0x000000000000003e error-gt0-soc-fatal-hbm-ss3-2 0x000000000000003f error-gt0-soc-fatal-hbm-ss3-3 0x0000000000000040 error-gt0-soc-fatal-hbm-ss3-4 0x0000000000000041 error-gt0-soc-fatal-hbm-ss3-5 0x0000000000000042 error-gt0-soc-fatal-hbm-ss3-6 0x0000000000000043 error-gt0-soc-fatal-hbm-ss3-7 0x0000000000000044 error-gt0-gsc-correctable-sram-ecc 0x0000000000000045 error-gt0-gsc-nonfatal-mia-shutdown 0x0000000000000046 error-gt0-gsc-nonfatal-mia-int 0x0000000000000047 error-gt0-gsc-nonfatal-sram-ecc 0x0000000000000048 error-gt0-gsc-nonfatal-wdg-timeout 0x0000000000000049 error-gt0-gsc-nonfatal-rom-parity 0x000000000000004a error-gt0-gsc-nonfatal-ucode-parity 0x000000000000004b error-gt0-gsc-nonfatal-glitch-det 0x000000000000004c error-gt0-gsc-nonfatal-fuse-pull 0x000000000000004d error-gt0-gsc-nonfatal-fuse-crc-check 0x000000000000004e error-gt0-gsc-nonfatal-selfmbist 0x000000000000004f error-gt0-gsc-nonfatal-aon-parity 0x0000000000000050 error-gt1-correctable-guc 0x1000000000000001 error-gt1-correctable-slm 0x1000000000000003 error-gt1-correctable-eu-ic 0x1000000000000004 error-gt1-correctable-eu-grf 0x1000000000000005 error-gt1-fatal-guc 0x1000000000000009 error-gt1-fatal-slm 0x100000000000000d error-gt1-fatal-eu-grf 0x100000000000000f error-gt1-fatal-fpu 0x1000000000000010 error-gt1-fatal-tlb 0x1000000000000011 error-gt1-fatal-l3-fabric 0x1000000000000012 error-gt1-correctable-subslice 0x1000000000000013 error-gt1-correctable-l3bank 0x1000000000000014 error-gt1-fatal-subslice 0x1000000000000015 error-gt1-fatal-l3bank 0x1000000000000016 
error-gt1-sgunit-correctable 0x1000000000000017 error-gt1-sgunit-nonfatal 0x1000000000000018 error-gt1-sgunit-fatal 0x1000000000000019 error-gt1-soc-fatal-psf-csc-0 0x100000000000001a error-gt1-soc-fatal-psf-csc-1 0x100000000000001b error-gt1-soc-fatal-psf-csc-2 0x100000000000001c error-gt1-soc-fatal-punit 0x100000000000001d error-gt1-soc-fatal-psf-0 0x100000000000001e error-gt1-soc-fatal-psf-1 0x100000000000001f error-gt1-soc-fatal-psf-2 0x1000000000000020 error-gt1-soc-fatal-cd0 0x1000000000000021 error-gt1-soc-fatal-cd0-mdfi 0x1000000000000022 error-gt1-soc-fatal-mdfi-east 0x1000000000000023 error-gt1-soc-fatal-mdfi-south 0x1000000000000024 error-gt1-soc-fatal-hbm-ss0-0 0x1000000000000025 error-gt1-soc-fatal-hbm-ss0-1 0x1000000000000026 error-gt1-soc-fatal-hbm-ss0-2 0x1000000000000027 error-gt1-soc-fatal-hbm-ss0-3 0x1000000000000028 error-gt1-soc-fatal-hbm-ss0-4 0x1000000000000029 error-gt1-soc-fatal-hbm-ss0-5 0x100000000000002a error-gt1-soc-fatal-hbm-ss0-6 0x100000000000002b error-gt1-soc-fatal-hbm-ss0-7 0x100000000000002c error-gt1-soc-fatal-hbm-ss1-0 0x100000000000002d error-gt1-soc-fatal-hbm-ss1-1 0x100000000000002e error-gt1-soc-fatal-hbm-ss1-2 0x100000000000002f error-gt1-soc-fatal-hbm-ss1-3 0x1000000000000030 error-gt1-soc-fatal-hbm-ss1-4 0x1000000000000031 error-gt1-soc-fatal-hbm-ss1-5 0x1000000000000032 error-gt1-soc-fatal-hbm-ss1-6 0x1000000000000033 error-gt1-soc-fatal-hbm-ss1-7 0x1000000000000034 error-gt1-soc-fatal-hbm-ss2-0 0x1000000000000035 error-gt1-soc-fatal-hbm-ss2-1 0x1000000000000036 error-gt1-soc-fatal-hbm-ss2-2 0x1000000000000037 error-gt1-soc-fatal-hbm-ss2-3 0x1000000000000038 error-gt1-soc-fatal-hbm-ss2-4 0x1000000000000039 error-gt1-soc-fatal-hbm-ss2-5 0x100000000000003a error-gt1-soc-fatal-hbm-ss2-6 0x100000000000003b error-gt1-soc-fatal-hbm-ss2-7 0x100000000000003c error-gt1-soc-fatal-hbm-ss3-0 0x100000000000003d error-gt1-soc-fatal-hbm-ss3-1 0x100000000000003e error-gt1-soc-fatal-hbm-ss3-2 0x100000000000003f 
error-gt1-soc-fatal-hbm-ss3-3 0x1000000000000040 error-gt1-soc-fatal-hbm-ss3-4 0x1000000000000041 error-gt1-soc-fatal-hbm-ss3-5 0x1000000000000042 error-gt1-soc-fatal-hbm-ss3-6 0x1000000000000043 error-gt1-soc-fatal-hbm-ss3-7 0x1000000000000044 Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Oded Gabbay <ogabbay@kernel.org> Aravind Iddamsetty (1): tools/RAS: A tool to read error counters include/drm-uapi/drm_netlink.h | 58 +++++ meson.build | 4 + tools/drm_ras.c | 403 +++++++++++++++++++++++++++++++++ tools/meson.build | 5 + 4 files changed, 470 insertions(+) create mode 100644 include/drm-uapi/drm_netlink.h create mode 100644 tools/drm_ras.c -- 2.25.1 ^ permalink raw reply [flat|nested] 5+ messages in thread
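A note on the config-id values in the listings above: every gt1 id equals the matching gt0 id with the top hex digit set to 1 (for example 0x0000000000000005 and 0x1000000000000005 for correctable-eu-grf). The sketch below splits an id on that basis. The layout (GT index in bits 60..63, error index in the remaining bits) is an assumption inferred only from this sample output; the kernel series defines the real encoding. The example_* names are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/* Assumption: the GT index occupies the top 4 bits (bits 60..63). */
#define EXAMPLE_GT_SHIFT 60
/* Assumption: everything below the GT field identifies the error. */
#define EXAMPLE_ERR_MASK ((UINT64_C(1) << EXAMPLE_GT_SHIFT) - 1)

/* Return the GT index encoded in a config-id (hypothetical helper). */
static unsigned int example_config_gt(uint64_t config_id)
{
	return (unsigned int)(config_id >> EXAMPLE_GT_SHIFT);
}

/* Return the per-GT error index encoded in a config-id (hypothetical helper). */
static uint64_t example_config_error(uint64_t config_id)
{
	return config_id & EXAMPLE_ERR_MASK;
}
```

If the assumption holds, an id taken from LIST_ERRORS could be split this way before being passed to READ_ONE via --error_id, so that the same error can be read on each GT in turn.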
* [igt-dev] [RFC i-g-t 1/1] tools/RAS: A tool to read error counters 2023-05-26 16:30 [igt-dev] [RFC i-g-t 0/1] A tool to demonstrate use of netlink sockets to read RAS error counters Aravind Iddamsetty @ 2023-05-26 16:30 ` Aravind Iddamsetty 2023-06-04 17:09 ` [igt-dev] [Intel-xe] " Tomer Tayar 2023-05-26 17:49 ` [igt-dev] ✗ Fi.CI.BUILD: failure for A tool to demonstrate use of netlink sockets to read RAS " Patchwork 1 sibling, 1 reply; 5+ messages in thread From: Aravind Iddamsetty @ 2023-05-26 16:30 UTC (permalink / raw) To: intel-xe, igt-dev; +Cc: alexander.deucher, ogabbay, airlied, daniel This tool demonstrates the use of netlink sockets to query and read the error counters on a hardware. It provides following commands LIST_ERRORS, READ_ONE, READ_ALL to read counters and WAIT_ON_EVENT to wait for occurrence on a particular event, presently hardcoded to wait on occurrence of correctable error event and read a error counter. Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> --- include/drm-uapi/drm_netlink.h | 58 +++++ meson.build | 4 + tools/drm_ras.c | 403 +++++++++++++++++++++++++++++++++ tools/meson.build | 5 + 4 files changed, 470 insertions(+) create mode 100644 include/drm-uapi/drm_netlink.h create mode 100644 tools/drm_ras.c diff --git a/include/drm-uapi/drm_netlink.h b/include/drm-uapi/drm_netlink.h new file mode 100644 index 000000000..a41d658c1 --- /dev/null +++ b/include/drm-uapi/drm_netlink.h @@ -0,0 +1,58 @@ +/* + * Copyright 2023 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright 
notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + */ + +#ifndef _DRM_NETLINK_H_ +#define _DRM_NETLINK_H_ + +#define DRM_GENL_VERSION 1 +#define DRM_GENL_MCAST_GROUP_NAME_CORR_ERR "drm_corr_err" +#define DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR "drm_uncorr_err" + +enum error_cmds { + DRM_CMD_UNSPEC, + /* command to list all errors names with config-id */ + DRM_CMD_QUERY, + /* command to get a counter for a specific error */ + DRM_CMD_READ_ONE, + /* command to get counters of all errors */ + DRM_CMD_READ_ALL, + DRM_CMD_ERROR_EVENT, + + __DRM_CMD_MAX, + DRM_CMD_MAX = __DRM_CMD_MAX - 1, +}; + +enum error_attr { + DRM_ATTR_UNSPEC, + DRM_ATTR_PAD = DRM_ATTR_UNSPEC, + DRM_ATTR_REQUEST, /* NLA_U8 */ + DRM_ATTR_QUERY_REPLY, /*NLA_NESTED*/ + DRM_ATTR_ERROR_NAME, /* NLA_NUL_STRING */ + DRM_ATTR_ERROR_ID, /* NLA_U64 */ + DRM_ATTR_ERROR_VALUE, /* NLA_U64 */ + + __DRM_ATTR_MAX, + DRM_ATTR_MAX = __DRM_ATTR_MAX - 1, +}; + +#endif diff --git a/meson.build b/meson.build index 7360634fe..269a9310f 100644 --- a/meson.build +++ b/meson.build @@ -141,6 +141,10 @@ cairo = dependency('cairo', version : '>1.12.0', required : true) libudev = dependency('libudev', required : true) glib = dependency('glib-2.0', required : true) +libnl = dependency('libnl-3.0', required: false) +libnl_genl = dependency('libnl-genl-3.0', required: false) +libnl_cli = dependency('libnl-cli-3.0', required:false) + xmlrpc = 
dependency('xmlrpc', required : false) xmlrpc_util = dependency('xmlrpc_util', required : false) xmlrpc_client = dependency('xmlrpc_client', required : false) diff --git a/tools/drm_ras.c b/tools/drm_ras.c new file mode 100644 index 000000000..f0ac99c79 --- /dev/null +++ b/tools/drm_ras.c @@ -0,0 +1,403 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2021 Intel Corporation + */ + +#include <stdio.h> +#include <sys/types.h> +#include <unistd.h> +#include <getopt.h> +#include <linux/genetlink.h> +#include <netlink/cli/utils.h> + +#include "drm_netlink.h" +#include "igt_device_scan.h" + +#define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0])) + +struct nl_sock *sock, *mcsock; +int family_id; + +enum opt_val { + OPT_UNKNOWN = '?', + OPT_END = -1, + OPT_DEVICE, + OPT_CONFIG, + OPT_HELP, +}; + +enum cmd_ids { + INVALID_CMD = -1, + LIST_ERRORS = 0, + READ_ONE, + READ_ALL, + WAIT_ON_EVENT, + + __MAX_CMDS, +}; + +static const char * const cmd_names[] = { + "LIST_ERRORS", + "READ_ONE", + "READ_ALL", + "WAIT_ON_EVENT", +}; + +static void help(char **argv) +{ + int i; + + printf("Usage: %s command [<command options>]\n", argv[0]); + printf("commands:\n"); + + for (i = 0; i < __MAX_CMDS; i++) { + switch (i) { + case LIST_ERRORS: + case READ_ALL: + case WAIT_ON_EVENT: + printf("%s %s --device=<device filter>\n", argv[0], cmd_names[i]); + break; + case READ_ONE: + printf("%s %s --device=<device filter> --error_id=<id returned from query>\n", argv[0], cmd_names[i]); + break; + } + } + + igt_device_print_filter_types(); +} + +static int list_errors(struct nl_cache_ops *ops, struct genl_cmd *cmd, + struct genl_info *info, void *arg) +{ + const struct nlmsghdr *nlh = info->nlh; + struct nlattr *nla; + int len, remain; + + len = GENL_HDRLEN; + + nlmsg_for_each_attr(nla, nlh, len, remain) { + if ((nla_type(nla) == DRM_ATTR_QUERY_REPLY) && nla_is_nested(nla)) { + struct nlattr *cur; + int rem; + + if (cmd->c_id == DRM_CMD_READ_ALL) + printf("%-50s\t%-18s\t%s\n", "name", 
"config-id", "counter"); + else + printf("%-50s\t%-18s\n", "name", "config-id"); + + nla_for_each_nested(cur, nla, rem) { + switch (nla_type(cur)) { + case DRM_ATTR_ERROR_NAME: + printf("\n%-50s", nla_get_string(cur)); + break; + case DRM_ATTR_ERROR_ID: + printf("\t0x%016lx", nla_get_u64(cur)); + break; + case DRM_ATTR_ERROR_VALUE: + printf("\t%lu", nla_get_u64(cur)); + break; + default: + break; + } + } + printf("\n"); + } + } + + return NL_OK; +} + +static int read_single(struct nl_cache_ops *ops, struct genl_cmd *cmd, + struct genl_info *info, void *arg) +{ + if (!info->attrs[DRM_ATTR_ERROR_VALUE]) + nl_cli_fatal(NLE_FAILURE, "DRM_ATTR_ERROR_VALUE attribute is missing"); + + printf("counter value %lu\n", nla_get_u64(info->attrs[DRM_ATTR_ERROR_VALUE])); + + return NL_OK; +} + +static int mcast_event_handler(struct nl_cache_ops *ops, struct genl_cmd *cmd, + struct genl_info *info, void *arg) +{ + struct nl_msg *msg; + uint64_t config = 0x0000000000000005; /* error-gt0-correctable-eu-grf */ + void *msg_head; + int ret; + + printf("error event received\n"); + + msg = nlmsg_alloc(); + if (!msg) + nl_cli_fatal(NLE_INVAL, "nlmsg_alloc failed\n"); + + msg_head = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family_id, 0, 0, DRM_CMD_READ_ONE, 1); + if (!msg_head) + nl_cli_fatal(ENOMEM, "genlmsg_put failed\n"); + + nla_put_u64(msg, DRM_ATTR_ERROR_ID, config); + + ret = nl_send_auto(sock, msg); + if (ret < 0) + nl_cli_fatal(ret, "Unable to send message: %s", nl_geterror(ret)); + + ret = nl_recvmsgs_default(sock); + if (ret < 0) + nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret)); + + nlmsg_free(msg); + + return NL_OK; +} + +static struct nla_policy drm_genl_policy[DRM_ATTR_MAX + 1] = { + [DRM_ATTR_REQUEST] = { .type = NLA_U8 }, + [DRM_ATTR_QUERY_REPLY] = { .type = NLA_NESTED }, + [DRM_ATTR_ERROR_NAME] = { .type = NLA_NUL_STRING }, + [DRM_ATTR_ERROR_ID] = { .type = NLA_U64 }, + [DRM_ATTR_ERROR_VALUE] = { .type = NLA_U64 }, +}; + +static struct genl_cmd 
drm_genl_cmds[] = { + { + .c_id = DRM_CMD_QUERY, + .c_name = "QUERY", + .c_maxattr = DRM_ATTR_MAX, + .c_attr_policy = drm_genl_policy, + .c_msg_parser = list_errors, + }, + { + .c_id = DRM_CMD_READ_ONE, + .c_name = "READ_1", + .c_maxattr = DRM_ATTR_MAX, + .c_attr_policy = drm_genl_policy, + .c_msg_parser = read_single, + }, + { + .c_id = DRM_CMD_READ_ALL, + .c_name = "READ_ALL", + .c_maxattr = DRM_ATTR_MAX, + .c_attr_policy = drm_genl_policy, + .c_msg_parser = list_errors, + }, + { + .c_id = DRM_CMD_ERROR_EVENT, + .c_name = "ERROR_EVENT", + .c_maxattr = DRM_ATTR_MAX, + .c_attr_policy = drm_genl_policy, + .c_msg_parser = mcast_event_handler, + }, +}; + +static struct genl_ops drm_genl_ops = { + .o_hdrsize = 0, + .o_cmds = drm_genl_cmds, + .o_ncmds = ARRAY_SIZE(drm_genl_cmds), +}; + +static void send_cmd(int cmd, uint64_t config) +{ + struct nl_msg *msg; + void *msg_head; + int ret; + + msg = nlmsg_alloc(); + if (!msg) + nl_cli_fatal(NLE_INVAL, "nlmsg_alloc failed\n"); + + msg_head = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family_id, 0, 0, cmd, 1); + if (!msg_head) + nl_cli_fatal(ENOMEM, "genlmsg_put failed\n"); + switch (cmd) { + case DRM_CMD_QUERY: + nla_put_u8(msg, DRM_ATTR_REQUEST, 1); + break; + case DRM_CMD_READ_ONE: + nla_put_u64(msg, DRM_ATTR_ERROR_ID, config); + break; + case DRM_CMD_READ_ALL: + nla_put_u8(msg, DRM_ATTR_REQUEST, 1); + break; + default: + break; + } + + ret = nl_send_auto(sock, msg); + if (ret < 0) + nl_cli_fatal(ret, "Unable to send message: %s", nl_geterror(ret)); + + ret = nl_recvmsgs_default(sock); + if (ret < 0) + nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret)); + + nlmsg_free(msg); +} + +static int get_cmd(char *cmd_name) +{ + int i; + + if (!cmd_name) + return -1; + + for (i = 0; i < __MAX_CMDS; i++) { + if (strcasecmp(cmd_name, cmd_names[i]) == 0) + return i; + } + + return -1; +} + +int main(int argc, char **argv) +{ + char *endptr; + enum opt_val val; + enum cmd_ids cmd; + char *device = NULL; + uint64_t 
error_config_id; + int ret, mcgrp, index; + struct igt_device_card card; + char *dev_name, *dup; + + static struct option options[] = { + {"device", required_argument, NULL, OPT_DEVICE}, + {"error_id", required_argument, NULL, OPT_CONFIG}, + {"help", no_argument, NULL, OPT_HELP}, + { 0 } + }; + + cmd = get_cmd(argv[1]); + if (cmd < 0) { + fprintf(stderr, "invalid command\n"); + help(argv); + exit(EXIT_FAILURE); + } + + for (val = 0; val != OPT_END; ) { + val = getopt_long(argc, argv, "", options, &index); + + switch (val) { + case OPT_DEVICE: + device = strdup(optarg); + break; + case OPT_CONFIG: + error_config_id = strtoull(optarg, &endptr, 16); + if (*endptr) { + fprintf(stderr, "invalid config id %s\n", optarg); + exit(EXIT_FAILURE); + } + break; + case OPT_HELP: + help(argv); + exit(EXIT_FAILURE); + case OPT_END: + break; + case OPT_UNKNOWN: + exit(EXIT_FAILURE); + } + } + + if (!device) { + fprintf(stderr, "missing device option\n"); + help(argv); + exit(EXIT_FAILURE); + } else { + ret = igt_device_card_match_pci(device, &card); + if (!ret) { + fprintf(stderr, "device %s not found!\n", device); + exit(EXIT_FAILURE); + } + free(device); + } + + /* get card name */ + dup = strdup(card.card); + + while (dup) + dev_name = strsep(&dup, "/"); + free(dup); + + drm_genl_ops.o_name = strdup(dev_name); + + sock = nl_cli_alloc_socket(); + if (!sock) + nl_cli_fatal(NLE_NOMEM, "Cannot allocate nl_sock"); + + ret = nl_cli_connect(sock, NETLINK_GENERIC); + if (ret < 0) + nl_cli_fatal(ret, "Cannot connect handle"); + + ret = genl_register_family(&drm_genl_ops); + if (ret < 0) + nl_cli_fatal(ret, "Cannot register xe family"); + + ret = genl_ops_resolve(sock, &drm_genl_ops); + if (ret < 0) + nl_cli_fatal(ret, "Unable to resolve family name"); + + family_id = genl_ctrl_resolve(sock, drm_genl_ops.o_name); + if (family_id < 0) + nl_cli_fatal(NLE_INVAL, "Resolving of \"%s\" failed", drm_genl_ops.o_name); + + ret = nl_socket_modify_cb(sock, NL_CB_VALID, NL_CB_CUSTOM, 
genl_handle_msg, NULL); + if (ret < 0) + nl_cli_fatal(ret, "Unable to modify valid message callback"); + + switch (cmd) { + case LIST_ERRORS: + send_cmd(DRM_CMD_QUERY, 0); + break; + case READ_ONE: + send_cmd(DRM_CMD_READ_ONE, error_config_id); + break; + case READ_ALL: + send_cmd(DRM_CMD_READ_ALL, 0); + break; + case WAIT_ON_EVENT: + mcsock = nl_cli_alloc_socket(); + if (!mcsock) + nl_cli_fatal(NLE_NOMEM, "Cannot allocate nl_sock"); + + ret = nl_cli_connect(mcsock, NETLINK_GENERIC); + if (ret < 0) + nl_cli_fatal(ret, "Cannot connect handle"); + + ret = genl_ops_resolve(mcsock, &drm_genl_ops); + if (ret < 0) + nl_cli_fatal(ret, "Unable to resolve family name"); + + nl_socket_disable_seq_check(mcsock); + + mcgrp = genl_ctrl_resolve_grp(mcsock, drm_genl_ops.o_name, + DRM_GENL_MCAST_GROUP_NAME_CORR_ERR); + if (mcgrp < 0) + nl_cli_fatal(mcgrp, "failed to resolve generic netlink multicast group"); + + /* Join the multicast group. */ + ret = nl_socket_add_membership(mcsock, mcgrp); + if (ret < 0) + nl_cli_fatal(ret, "failed to join multicast group"); + + ret = nl_socket_modify_cb(mcsock, NL_CB_VALID, NL_CB_CUSTOM, genl_handle_msg, NULL); + if (ret < 0) + nl_cli_fatal(ret, "Unable to modify valid message callback"); + + printf("waiting for error event\n"); + ret = nl_recvmsgs_default(mcsock); + if (ret < 0) + nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret)); + + nl_close(mcsock); + nl_socket_free(mcsock); + break; + default: + break; + } + + nl_close(sock); + nl_socket_free(sock); + + return 0; +} + diff --git a/tools/meson.build b/tools/meson.build index 4c45f16b9..a53d3917f 100644 --- a/tools/meson.build +++ b/tools/meson.build @@ -107,5 +107,10 @@ if libudev.found() install : true) endif +executable('drm_ras', 'drm_ras.c', + dependencies : [tool_deps, libnl, libnl_cli, libnl_genl], + install_rpath : bindir_rpathdir, + install : true) + subdir('i915-perf') subdir('null_state_gen') -- 2.25.1 ^ permalink raw reply related [flat|nested] 5+ messages in 
thread
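Two details in tools/drm_ras.c above may be worth a second look. The strsep() loop that derives the card name leaves dup set to NULL when it finishes, so the following free(dup) releases nothing and the strdup() buffer is never freed. The printf() calls also pass the uint64_t returned by nla_get_u64() to %lu and %016lx, which only matches where unsigned long is 64 bits wide. The sketch below shows one possible alternative for each, assuming the goal is simply the last path component and a portable row format. The example_* helpers are hypothetical and not part of the patch.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Return a pointer to the last component of a path such as
 * "/dev/dri/card1" without allocating or modifying the input.
 */
static const char *example_card_name(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/*
 * Format one "name, config-id, counter" row using the <inttypes.h>
 * macros, so the format matches uint64_t on 32-bit and 64-bit builds.
 */
static int example_format_row(char *buf, size_t len, const char *name,
			      uint64_t id, uint64_t value)
{
	return snprintf(buf, len, "%-50s\t0x%016" PRIx64 "\t%" PRIu64,
			name, id, value);
}
```

The pointer returned by example_card_name() aliases its argument, so a caller that needs the name to outlive the path would still copy it, as the patch does with strdup() for o_name.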
* Re: [igt-dev] [Intel-xe] [RFC i-g-t 1/1] tools/RAS: A tool to read error counters 2023-05-26 16:30 ` [igt-dev] [RFC i-g-t 1/1] tools/RAS: A tool to read " Aravind Iddamsetty @ 2023-06-04 17:09 ` Tomer Tayar 2023-06-05 17:27 ` Iddamsetty, Aravind 0 siblings, 1 reply; 5+ messages in thread From: Tomer Tayar @ 2023-06-04 17:09 UTC (permalink / raw) To: Aravind Iddamsetty, intel-xe@lists.freedesktop.org, igt-dev@lists.freedesktop.org Cc: alexander.deucher@amd.com, Oded Gabbay, airlied@gmail.com, daniel@ffwll.ch On 26/05/2023 19:30, Aravind Iddamsetty wrote: > This tool demonstrates the use of netlink sockets to query and read the > error counters on a hardware. It provides following commands LIST_ERRORS, > READ_ONE, READ_ALL to read counters and WAIT_ON_EVENT to wait for > occurrence on a particular event, presently hardcoded to wait on > occurrence of correctable error event and read a error counter. > > Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> > --- > include/drm-uapi/drm_netlink.h | 58 +++++ > meson.build | 4 + > tools/drm_ras.c | 403 +++++++++++++++++++++++++++++++++ > tools/meson.build | 5 + > 4 files changed, 470 insertions(+) > create mode 100644 include/drm-uapi/drm_netlink.h > create mode 100644 tools/drm_ras.c > > diff --git a/include/drm-uapi/drm_netlink.h b/include/drm-uapi/drm_netlink.h > new file mode 100644 > index 000000000..a41d658c1 > --- /dev/null > +++ b/include/drm-uapi/drm_netlink.h > @@ -0,0 +1,58 @@ > +/* > + * Copyright 2023 Intel Corporation > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + 
* The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR > + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, > + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR > + * OTHER DEALINGS IN THE SOFTWARE. > + */ > + > +#ifndef _DRM_NETLINK_H_ > +#define _DRM_NETLINK_H_ > + > +#define DRM_GENL_VERSION 1 > +#define DRM_GENL_MCAST_GROUP_NAME_CORR_ERR "drm_corr_err" > +#define DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR "drm_uncorr_err" > + > +enum error_cmds { > + DRM_CMD_UNSPEC, > + /* command to list all errors names with config-id */ > + DRM_CMD_QUERY, > + /* command to get a counter for a specific error */ > + DRM_CMD_READ_ONE, > + /* command to get counters of all errors */ > + DRM_CMD_READ_ALL, > + DRM_CMD_ERROR_EVENT, > + > + __DRM_CMD_MAX, > + DRM_CMD_MAX = __DRM_CMD_MAX - 1, > +}; > + > +enum error_attr { > + DRM_ATTR_UNSPEC, > + DRM_ATTR_PAD = DRM_ATTR_UNSPEC, > + DRM_ATTR_REQUEST, /* NLA_U8 */ > + DRM_ATTR_QUERY_REPLY, /*NLA_NESTED*/ > + DRM_ATTR_ERROR_NAME, /* NLA_NUL_STRING */ > + DRM_ATTR_ERROR_ID, /* NLA_U64 */ > + DRM_ATTR_ERROR_VALUE, /* NLA_U64 */ > + > + __DRM_ATTR_MAX, > + DRM_ATTR_MAX = __DRM_ATTR_MAX - 1, > +}; > + > +#endif drm_netlink.h is not identical to the kernel uapi file. Is it intentional? 
Thanks, Tomer > diff --git a/meson.build b/meson.build > index 7360634fe..269a9310f 100644 > --- a/meson.build > +++ b/meson.build > @@ -141,6 +141,10 @@ cairo = dependency('cairo', version : '>1.12.0', required : true) > libudev = dependency('libudev', required : true) > glib = dependency('glib-2.0', required : true) > > +libnl = dependency('libnl-3.0', required: false) > +libnl_genl = dependency('libnl-genl-3.0', required: false) > +libnl_cli = dependency('libnl-cli-3.0', required:false) > + > xmlrpc = dependency('xmlrpc', required : false) > xmlrpc_util = dependency('xmlrpc_util', required : false) > xmlrpc_client = dependency('xmlrpc_client', required : false) > diff --git a/tools/drm_ras.c b/tools/drm_ras.c > new file mode 100644 > index 000000000..f0ac99c79 > --- /dev/null > +++ b/tools/drm_ras.c > @@ -0,0 +1,403 @@ > +// SPDX-License-Identifier: MIT > +/* > + * Copyright © 2021 Intel Corporation > + */ > + > +#include <stdio.h> > +#include <sys/types.h> > +#include <unistd.h> > +#include <getopt.h> > +#include <linux/genetlink.h> > +#include <netlink/cli/utils.h> > + > +#include "drm_netlink.h" > +#include "igt_device_scan.h" > + > +#define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0])) > + > +struct nl_sock *sock, *mcsock; > +int family_id; > + > +enum opt_val { > + OPT_UNKNOWN = '?', > + OPT_END = -1, > + OPT_DEVICE, > + OPT_CONFIG, > + OPT_HELP, > +}; > + > +enum cmd_ids { > + INVALID_CMD = -1, > + LIST_ERRORS = 0, > + READ_ONE, > + READ_ALL, > + WAIT_ON_EVENT, > + > + __MAX_CMDS, > +}; > + > +static const char * const cmd_names[] = { > + "LIST_ERRORS", > + "READ_ONE", > + "READ_ALL", > + "WAIT_ON_EVENT", > +}; > + > +static void help(char **argv) > +{ > + int i; > + > + printf("Usage: %s command [<command options>]\n", argv[0]); > + printf("commands:\n"); > + > + for (i = 0; i < __MAX_CMDS; i++) { > + switch (i) { > + case LIST_ERRORS: > + case READ_ALL: > + case WAIT_ON_EVENT: > + printf("%s %s --device=<device filter>\n", argv[0], 
cmd_names[i]); > + break; > + case READ_ONE: > + printf("%s %s --device=<device filter> --error_id=<id returned from query>\n", argv[0], cmd_names[i]); > + break; > + } > + } > + > + igt_device_print_filter_types(); > +} > + > +static int list_errors(struct nl_cache_ops *ops, struct genl_cmd *cmd, > + struct genl_info *info, void *arg) > +{ > + const struct nlmsghdr *nlh = info->nlh; > + struct nlattr *nla; > + int len, remain; > + > + len = GENL_HDRLEN; > + > + nlmsg_for_each_attr(nla, nlh, len, remain) { > + if ((nla_type(nla) == DRM_ATTR_QUERY_REPLY) && nla_is_nested(nla)) { > + struct nlattr *cur; > + int rem; > + > + if (cmd->c_id == DRM_CMD_READ_ALL) > + printf("%-50s\t%-18s\t%s\n", "name", "config-id", "counter"); > + else > + printf("%-50s\t%-18s\n", "name", "config-id"); > + > + nla_for_each_nested(cur, nla, rem) { > + switch (nla_type(cur)) { > + case DRM_ATTR_ERROR_NAME: > + printf("\n%-50s", nla_get_string(cur)); > + break; > + case DRM_ATTR_ERROR_ID: > + printf("\t0x%016lx", nla_get_u64(cur)); > + break; > + case DRM_ATTR_ERROR_VALUE: > + printf("\t%lu", nla_get_u64(cur)); > + break; > + default: > + break; > + } > + } > + printf("\n"); > + } > + } > + > + return NL_OK; > +} > + > +static int read_single(struct nl_cache_ops *ops, struct genl_cmd *cmd, > + struct genl_info *info, void *arg) > +{ > + if (!info->attrs[DRM_ATTR_ERROR_VALUE]) > + nl_cli_fatal(NLE_FAILURE, "DRM_ATTR_ERROR_VALUE attribute is missing"); > + > + printf("counter value %lu\n", nla_get_u64(info->attrs[DRM_ATTR_ERROR_VALUE])); > + > + return NL_OK; > +} > + > +static int mcast_event_handler(struct nl_cache_ops *ops, struct genl_cmd *cmd, > + struct genl_info *info, void *arg) > +{ > + struct nl_msg *msg; > + uint64_t config = 0x0000000000000005; /* error-gt0-correctable-eu-grf */ > + void *msg_head; > + int ret; > + > + printf("error event received\n"); > + > + msg = nlmsg_alloc(); > + if (!msg) > + nl_cli_fatal(NLE_INVAL, "nlmsg_alloc failed\n"); > + > + msg_head = 
genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family_id, 0, 0, DRM_CMD_READ_ONE, 1); > + if (!msg_head) > + nl_cli_fatal(ENOMEM, "genlmsg_put failed\n"); > + > + nla_put_u64(msg, DRM_ATTR_ERROR_ID, config); > + > + ret = nl_send_auto(sock, msg); > + if (ret < 0) > + nl_cli_fatal(ret, "Unable to send message: %s", nl_geterror(ret)); > + > + ret = nl_recvmsgs_default(sock); > + if (ret < 0) > + nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret)); > + > + nlmsg_free(msg); > + > + return NL_OK; > +} > + > +static struct nla_policy drm_genl_policy[DRM_ATTR_MAX + 1] = { > + [DRM_ATTR_REQUEST] = { .type = NLA_U8 }, > + [DRM_ATTR_QUERY_REPLY] = { .type = NLA_NESTED }, > + [DRM_ATTR_ERROR_NAME] = { .type = NLA_NUL_STRING }, > + [DRM_ATTR_ERROR_ID] = { .type = NLA_U64 }, > + [DRM_ATTR_ERROR_VALUE] = { .type = NLA_U64 }, > +}; > + > +static struct genl_cmd drm_genl_cmds[] = { > + { > + .c_id = DRM_CMD_QUERY, > + .c_name = "QUERY", > + .c_maxattr = DRM_ATTR_MAX, > + .c_attr_policy = drm_genl_policy, > + .c_msg_parser = list_errors, > + }, > + { > + .c_id = DRM_CMD_READ_ONE, > + .c_name = "READ_1", > + .c_maxattr = DRM_ATTR_MAX, > + .c_attr_policy = drm_genl_policy, > + .c_msg_parser = read_single, > + }, > + { > + .c_id = DRM_CMD_READ_ALL, > + .c_name = "READ_ALL", > + .c_maxattr = DRM_ATTR_MAX, > + .c_attr_policy = drm_genl_policy, > + .c_msg_parser = list_errors, > + }, > + { > + .c_id = DRM_CMD_ERROR_EVENT, > + .c_name = "ERROR_EVENT", > + .c_maxattr = DRM_ATTR_MAX, > + .c_attr_policy = drm_genl_policy, > + .c_msg_parser = mcast_event_handler, > + }, > +}; > + > +static struct genl_ops drm_genl_ops = { > + .o_hdrsize = 0, > + .o_cmds = drm_genl_cmds, > + .o_ncmds = ARRAY_SIZE(drm_genl_cmds), > +}; > + > +static void send_cmd(int cmd, uint64_t config) > +{ > + struct nl_msg *msg; > + void *msg_head; > + int ret; > + > + msg = nlmsg_alloc(); > + if (!msg) > + nl_cli_fatal(NLE_INVAL, "nlmsg_alloc failed\n"); > + > + msg_head = genlmsg_put(msg, NL_AUTO_PORT, 
NL_AUTO_SEQ, family_id, 0, 0, cmd, 1); > + if (!msg_head) > + nl_cli_fatal(ENOMEM, "genlmsg_put failed\n"); > + switch (cmd) { > + case DRM_CMD_QUERY: > + nla_put_u8(msg, DRM_ATTR_REQUEST, 1); > + break; > + case DRM_CMD_READ_ONE: > + nla_put_u64(msg, DRM_ATTR_ERROR_ID, config); > + break; > + case DRM_CMD_READ_ALL: > + nla_put_u8(msg, DRM_ATTR_REQUEST, 1); > + break; > + default: > + break; > + } > + > + ret = nl_send_auto(sock, msg); > + if (ret < 0) > + nl_cli_fatal(ret, "Unable to send message: %s", nl_geterror(ret)); > + > + ret = nl_recvmsgs_default(sock); > + if (ret < 0) > + nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret)); > + > + nlmsg_free(msg); > +} > + > +static int get_cmd(char *cmd_name) > +{ > + int i; > + > + if (!cmd_name) > + return -1; > + > + for (i = 0; i < __MAX_CMDS; i++) { > + if (strcasecmp(cmd_name, cmd_names[i]) == 0) > + return i; > + } > + > + return -1; > +} > + > +int main(int argc, char **argv) > +{ > + char *endptr; > + enum opt_val val; > + enum cmd_ids cmd; > + char *device = NULL; > + uint64_t error_config_id; > + int ret, mcgrp, index; > + struct igt_device_card card; > + char *dev_name, *dup; > + > + static struct option options[] = { > + {"device", required_argument, NULL, OPT_DEVICE}, > + {"error_id", required_argument, NULL, OPT_CONFIG}, > + {"help", no_argument, NULL, OPT_HELP}, > + { 0 } > + }; > + > + cmd = get_cmd(argv[1]); > + if (cmd < 0) { > + fprintf(stderr, "invalid command\n"); > + help(argv); > + exit(EXIT_FAILURE); > + } > + > + for (val = 0; val != OPT_END; ) { > + val = getopt_long(argc, argv, "", options, &index); > + > + switch (val) { > + case OPT_DEVICE: > + device = strdup(optarg); > + break; > + case OPT_CONFIG: > + error_config_id = strtoull(optarg, &endptr, 16); > + if (*endptr) { > + fprintf(stderr, "invalid config id %s\n", optarg); > + exit(EXIT_FAILURE); > + } > + break; > + case OPT_HELP: > + help(argv); > + exit(EXIT_FAILURE); > + case OPT_END: > + break; > + case OPT_UNKNOWN: 
> + exit(EXIT_FAILURE); > + } > + } > + > + if (!device) { > + fprintf(stderr, "missing device option\n"); > + help(argv); > + exit(EXIT_FAILURE); > + } else { > + ret = igt_device_card_match_pci(device, &card); > + if (!ret) { > + fprintf(stderr, "device %s not found!\n", device); > + exit(EXIT_FAILURE); > + } > + free(device); > + } > + > + /* get card name */ > + dup = strdup(card.card); > + > + while (dup) > + dev_name = strsep(&dup, "/"); > + free(dup); > + > + drm_genl_ops.o_name = strdup(dev_name); > + > + sock = nl_cli_alloc_socket(); > + if (!sock) > + nl_cli_fatal(NLE_NOMEM, "Cannot allocate nl_sock"); > + > + ret = nl_cli_connect(sock, NETLINK_GENERIC); > + if (ret < 0) > + nl_cli_fatal(ret, "Cannot connect handle"); > + > + ret = genl_register_family(&drm_genl_ops); > + if (ret < 0) > + nl_cli_fatal(ret, "Cannot register xe family"); > + > + ret = genl_ops_resolve(sock, &drm_genl_ops); > + if (ret < 0) > + nl_cli_fatal(ret, "Unable to resolve family name"); > + > + family_id = genl_ctrl_resolve(sock, drm_genl_ops.o_name); > + if (family_id < 0) > + nl_cli_fatal(NLE_INVAL, "Resolving of \"%s\" failed", drm_genl_ops.o_name); > + > + ret = nl_socket_modify_cb(sock, NL_CB_VALID, NL_CB_CUSTOM, genl_handle_msg, NULL); > + if (ret < 0) > + nl_cli_fatal(ret, "Unable to modify valid message callback"); > + > + switch (cmd) { > + case LIST_ERRORS: > + send_cmd(DRM_CMD_QUERY, 0); > + break; > + case READ_ONE: > + send_cmd(DRM_CMD_READ_ONE, error_config_id); > + break; > + case READ_ALL: > + send_cmd(DRM_CMD_READ_ALL, 0); > + break; > + case WAIT_ON_EVENT: > + mcsock = nl_cli_alloc_socket(); > + if (!mcsock) > + nl_cli_fatal(NLE_NOMEM, "Cannot allocate nl_sock"); > + > + ret = nl_cli_connect(mcsock, NETLINK_GENERIC); > + if (ret < 0) > + nl_cli_fatal(ret, "Cannot connect handle"); > + > + ret = genl_ops_resolve(mcsock, &drm_genl_ops); > + if (ret < 0) > + nl_cli_fatal(ret, "Unable to resolve family name"); > + > + nl_socket_disable_seq_check(mcsock); > + > + mcgrp 
= genl_ctrl_resolve_grp(mcsock, drm_genl_ops.o_name, > + DRM_GENL_MCAST_GROUP_NAME_CORR_ERR); > + if (mcgrp < 0) > + nl_cli_fatal(mcgrp, "failed to resolve generic netlink multicast group"); > + > + /* Join the multicast group. */ > + ret = nl_socket_add_membership(mcsock, mcgrp); > + if (ret < 0) > + nl_cli_fatal(ret, "failed to join multicast group"); > + > + ret = nl_socket_modify_cb(mcsock, NL_CB_VALID, NL_CB_CUSTOM, genl_handle_msg, NULL); > + if (ret < 0) > + nl_cli_fatal(ret, "Unable to modify valid message callback"); > + > + printf("waiting for error event\n"); > + ret = nl_recvmsgs_default(mcsock); > + if (ret < 0) > + nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret)); > + > + nl_close(mcsock); > + nl_socket_free(mcsock); > + break; > + default: > + break; > + } > + > + nl_close(sock); > + nl_socket_free(sock); > + > + return 0; > +} > + > diff --git a/tools/meson.build b/tools/meson.build > index 4c45f16b9..a53d3917f 100644 > --- a/tools/meson.build > +++ b/tools/meson.build > @@ -107,5 +107,10 @@ if libudev.found() > install : true) > endif > > +executable('drm_ras', 'drm_ras.c', > + dependencies : [tool_deps, libnl, libnl_cli, libnl_genl], > + install_rpath : bindir_rpathdir, > + install : true) > + > subdir('i915-perf') > subdir('null_state_gen') ^ permalink raw reply [flat|nested] 5+ messages in thread
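A side note on the --error_id handling in the patch above: error_config_id is only assigned when the option is given, so READ_ONE without --error_id passes an uninitialized value to send_cmd(), and the `if (*endptr)` check also accepts an empty argument. A minimal sketch of a stricter parser follows. The helper name parse_error_id is hypothetical and not part of the patch; it only assumes standard strtoull() behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of the patch: parse a hexadecimal error id
 * such as "0x0000000000000005". Returns 0 on success, -1 on invalid input.
 * Like strtoull() itself, it tolerates leading whitespace and a sign.
 */
static int parse_error_id(const char *arg, uint64_t *id)
{
	char *end;
	unsigned long long val;

	assert(id != NULL);

	if (!arg || *arg == '\0')
		return -1;	/* missing or empty argument */

	errno = 0;
	val = strtoull(arg, &end, 16);
	if (errno == ERANGE)
		return -1;	/* value does not fit in unsigned long long */
	if (end == arg || *end != '\0')
		return -1;	/* no digits consumed, or trailing characters */

	*id = (uint64_t)val;
	return 0;
}
```

With a helper like this, main() could keep a flag recording whether --error_id was seen and reject READ_ONE when it was not, instead of sending whatever the variable happens to contain.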
* Re: [igt-dev] [Intel-xe] [RFC i-g-t 1/1] tools/RAS: A tool to read error counters 2023-06-04 17:09 ` [igt-dev] [Intel-xe] " Tomer Tayar @ 2023-06-05 17:27 ` Iddamsetty, Aravind 0 siblings, 0 replies; 5+ messages in thread From: Iddamsetty, Aravind @ 2023-06-05 17:27 UTC (permalink / raw) To: Tomer Tayar, intel-xe@lists.freedesktop.org, igt-dev@lists.freedesktop.org Cc: alexander.deucher@amd.com, Oded Gabbay, airlied@gmail.com, daniel@ffwll.ch On 04-06-2023 22:39, Tomer Tayar wrote: > On 26/05/2023 19:30, Aravind Iddamsetty wrote: >> This tool demonstrates the use of netlink sockets to query and read the >> error counters on a hardware. It provides following commands LIST_ERRORS, >> READ_ONE, READ_ALL to read counters and WAIT_ON_EVENT to wait for >> occurrence on a particular event, presently hardcoded to wait on >> occurrence of correctable error event and read a error counter. >> >> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> >> --- >> include/drm-uapi/drm_netlink.h | 58 +++++ >> meson.build | 4 + >> tools/drm_ras.c | 403 +++++++++++++++++++++++++++++++++ >> tools/meson.build | 5 + >> 4 files changed, 470 insertions(+) >> create mode 100644 include/drm-uapi/drm_netlink.h >> create mode 100644 tools/drm_ras.c >> >> diff --git a/include/drm-uapi/drm_netlink.h b/include/drm-uapi/drm_netlink.h >> new file mode 100644 >> index 000000000..a41d658c1 >> --- /dev/null >> +++ b/include/drm-uapi/drm_netlink.h >> @@ -0,0 +1,58 @@ >> +/* >> + * Copyright 2023 Intel Corporation >> + * >> + * Permission is hereby granted, free of charge, to any person obtaining a >> + * copy of this software and associated documentation files (the "Software"), >> + * to deal in the Software without restriction, including without limitation >> + * the rights to use, copy, modify, merge, publish, distribute, sublicense, >> + * and/or sell copies of the Software, and to permit persons to whom the >> + * Software is furnished to do so, subject to the following conditions: >> 
+ * >> + * The above copyright notice and this permission notice (including the next >> + * paragraph) shall be included in all copies or substantial portions of the >> + * Software. >> + * >> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR >> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, >> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL >> + * VA LINUX SYSTEMS AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR >> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, >> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR >> + * OTHER DEALINGS IN THE SOFTWARE. >> + */ >> + >> +#ifndef _DRM_NETLINK_H_ >> +#define _DRM_NETLINK_H_ >> + >> +#define DRM_GENL_VERSION 1 >> +#define DRM_GENL_MCAST_GROUP_NAME_CORR_ERR "drm_corr_err" >> +#define DRM_GENL_MCAST_GROUP_NAME_UNCORR_ERR "drm_uncorr_err" >> + >> +enum error_cmds { >> + DRM_CMD_UNSPEC, >> + /* command to list all errors names with config-id */ >> + DRM_CMD_QUERY, >> + /* command to get a counter for a specific error */ >> + DRM_CMD_READ_ONE, >> + /* command to get counters of all errors */ >> + DRM_CMD_READ_ALL, >> + DRM_CMD_ERROR_EVENT, >> + >> + __DRM_CMD_MAX, >> + DRM_CMD_MAX = __DRM_CMD_MAX - 1, >> +}; >> + >> +enum error_attr { >> + DRM_ATTR_UNSPEC, >> + DRM_ATTR_PAD = DRM_ATTR_UNSPEC, >> + DRM_ATTR_REQUEST, /* NLA_U8 */ >> + DRM_ATTR_QUERY_REPLY, /*NLA_NESTED*/ >> + DRM_ATTR_ERROR_NAME, /* NLA_NUL_STRING */ >> + DRM_ATTR_ERROR_ID, /* NLA_U64 */ >> + DRM_ATTR_ERROR_VALUE, /* NLA_U64 */ >> + >> + __DRM_ATTR_MAX, >> + DRM_ATTR_MAX = __DRM_ATTR_MAX - 1, >> +}; >> + >> +#endif > > drm_netlink.h is not identical to the kernel uapi file. Is it intentional?

If I split up the kernel header as I mentioned there, it will be the same as the kernel uapi file.

Thanks,
Aravind.
> > Thanks, > Tomer > >> diff --git a/meson.build b/meson.build >> index 7360634fe..269a9310f 100644 >> --- a/meson.build >> +++ b/meson.build >> @@ -141,6 +141,10 @@ cairo = dependency('cairo', version : '>1.12.0', required : true) >> libudev = dependency('libudev', required : true) >> glib = dependency('glib-2.0', required : true) >> >> +libnl = dependency('libnl-3.0', required: false) >> +libnl_genl = dependency('libnl-genl-3.0', required: false) >> +libnl_cli = dependency('libnl-cli-3.0', required:false) >> + >> xmlrpc = dependency('xmlrpc', required : false) >> xmlrpc_util = dependency('xmlrpc_util', required : false) >> xmlrpc_client = dependency('xmlrpc_client', required : false) >> diff --git a/tools/drm_ras.c b/tools/drm_ras.c >> new file mode 100644 >> index 000000000..f0ac99c79 >> --- /dev/null >> +++ b/tools/drm_ras.c >> @@ -0,0 +1,403 @@ >> +// SPDX-License-Identifier: MIT >> +/* >> + * Copyright © 2021 Intel Corporation >> + */ >> + >> +#include <stdio.h> >> +#include <sys/types.h> >> +#include <unistd.h> >> +#include <getopt.h> >> +#include <linux/genetlink.h> >> +#include <netlink/cli/utils.h> >> + >> +#include "drm_netlink.h" >> +#include "igt_device_scan.h" >> + >> +#define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0])) >> + >> +struct nl_sock *sock, *mcsock; >> +int family_id; >> + >> +enum opt_val { >> + OPT_UNKNOWN = '?', >> + OPT_END = -1, >> + OPT_DEVICE, >> + OPT_CONFIG, >> + OPT_HELP, >> +}; >> + >> +enum cmd_ids { >> + INVALID_CMD = -1, >> + LIST_ERRORS = 0, >> + READ_ONE, >> + READ_ALL, >> + WAIT_ON_EVENT, >> + >> + __MAX_CMDS, >> +}; >> + >> +static const char * const cmd_names[] = { >> + "LIST_ERRORS", >> + "READ_ONE", >> + "READ_ALL", >> + "WAIT_ON_EVENT", >> +}; >> + >> +static void help(char **argv) >> +{ >> + int i; >> + >> + printf("Usage: %s command [<command options>]\n", argv[0]); >> + printf("commands:\n"); >> + >> + for (i = 0; i < __MAX_CMDS; i++) { >> + switch (i) { >> + case LIST_ERRORS: >> + case READ_ALL: >> + 
case WAIT_ON_EVENT: >> + printf("%s %s --device=<device filter>\n", argv[0], cmd_names[i]); >> + break; >> + case READ_ONE: >> + printf("%s %s --device=<device filter> --error_id=<id returned from query>\n", argv[0], cmd_names[i]); >> + break; >> + } >> + } >> + >> + igt_device_print_filter_types(); >> +} >> + >> +static int list_errors(struct nl_cache_ops *ops, struct genl_cmd *cmd, >> + struct genl_info *info, void *arg) >> +{ >> + const struct nlmsghdr *nlh = info->nlh; >> + struct nlattr *nla; >> + int len, remain; >> + >> + len = GENL_HDRLEN; >> + >> + nlmsg_for_each_attr(nla, nlh, len, remain) { >> + if ((nla_type(nla) == DRM_ATTR_QUERY_REPLY) && nla_is_nested(nla)) { >> + struct nlattr *cur; >> + int rem; >> + >> + if (cmd->c_id == DRM_CMD_READ_ALL) >> + printf("%-50s\t%-18s\t%s\n", "name", "config-id", "counter"); >> + else >> + printf("%-50s\t%-18s\n", "name", "config-id"); >> + >> + nla_for_each_nested(cur, nla, rem) { >> + switch (nla_type(cur)) { >> + case DRM_ATTR_ERROR_NAME: >> + printf("\n%-50s", nla_get_string(cur)); >> + break; >> + case DRM_ATTR_ERROR_ID: >> + printf("\t0x%016lx", nla_get_u64(cur)); >> + break; >> + case DRM_ATTR_ERROR_VALUE: >> + printf("\t%lu", nla_get_u64(cur)); >> + break; >> + default: >> + break; >> + } >> + } >> + printf("\n"); >> + } >> + } >> + >> + return NL_OK; >> +} >> + >> +static int read_single(struct nl_cache_ops *ops, struct genl_cmd *cmd, >> + struct genl_info *info, void *arg) >> +{ >> + if (!info->attrs[DRM_ATTR_ERROR_VALUE]) >> + nl_cli_fatal(NLE_FAILURE, "DRM_ATTR_ERROR_VALUE attribute is missing"); >> + >> + printf("counter value %lu\n", nla_get_u64(info->attrs[DRM_ATTR_ERROR_VALUE])); >> + >> + return NL_OK; >> +} >> + >> +static int mcast_event_handler(struct nl_cache_ops *ops, struct genl_cmd *cmd, >> + struct genl_info *info, void *arg) >> +{ >> + struct nl_msg *msg; >> + uint64_t config = 0x0000000000000005; /* error-gt0-correctable-eu-grf */ >> + void *msg_head; >> + int ret; >> + >> + printf("error 
event received\n"); >> + >> + msg = nlmsg_alloc(); >> + if (!msg) >> + nl_cli_fatal(NLE_INVAL, "nlmsg_alloc failed\n"); >> + >> + msg_head = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family_id, 0, 0, DRM_CMD_READ_ONE, 1); >> + if (!msg_head) >> + nl_cli_fatal(ENOMEM, "genlmsg_put failed\n"); >> + >> + nla_put_u64(msg, DRM_ATTR_ERROR_ID, config); >> + >> + ret = nl_send_auto(sock, msg); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Unable to send message: %s", nl_geterror(ret)); >> + >> + ret = nl_recvmsgs_default(sock); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret)); >> + >> + nlmsg_free(msg); >> + >> + return NL_OK; >> +} >> + >> +static struct nla_policy drm_genl_policy[DRM_ATTR_MAX + 1] = { >> + [DRM_ATTR_REQUEST] = { .type = NLA_U8 }, >> + [DRM_ATTR_QUERY_REPLY] = { .type = NLA_NESTED }, >> + [DRM_ATTR_ERROR_NAME] = { .type = NLA_NUL_STRING }, >> + [DRM_ATTR_ERROR_ID] = { .type = NLA_U64 }, >> + [DRM_ATTR_ERROR_VALUE] = { .type = NLA_U64 }, >> +}; >> + >> +static struct genl_cmd drm_genl_cmds[] = { >> + { >> + .c_id = DRM_CMD_QUERY, >> + .c_name = "QUERY", >> + .c_maxattr = DRM_ATTR_MAX, >> + .c_attr_policy = drm_genl_policy, >> + .c_msg_parser = list_errors, >> + }, >> + { >> + .c_id = DRM_CMD_READ_ONE, >> + .c_name = "READ_1", >> + .c_maxattr = DRM_ATTR_MAX, >> + .c_attr_policy = drm_genl_policy, >> + .c_msg_parser = read_single, >> + }, >> + { >> + .c_id = DRM_CMD_READ_ALL, >> + .c_name = "READ_ALL", >> + .c_maxattr = DRM_ATTR_MAX, >> + .c_attr_policy = drm_genl_policy, >> + .c_msg_parser = list_errors, >> + }, >> + { >> + .c_id = DRM_CMD_ERROR_EVENT, >> + .c_name = "ERROR_EVENT", >> + .c_maxattr = DRM_ATTR_MAX, >> + .c_attr_policy = drm_genl_policy, >> + .c_msg_parser = mcast_event_handler, >> + }, >> +}; >> + >> +static struct genl_ops drm_genl_ops = { >> + .o_hdrsize = 0, >> + .o_cmds = drm_genl_cmds, >> + .o_ncmds = ARRAY_SIZE(drm_genl_cmds), >> +}; >> + >> +static void send_cmd(int cmd, uint64_t config) >> +{ 
>> + struct nl_msg *msg; >> + void *msg_head; >> + int ret; >> + >> + msg = nlmsg_alloc(); >> + if (!msg) >> + nl_cli_fatal(NLE_INVAL, "nlmsg_alloc failed\n"); >> + >> + msg_head = genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family_id, 0, 0, cmd, 1); >> + if (!msg_head) >> + nl_cli_fatal(ENOMEM, "genlmsg_put failed\n"); >> + switch (cmd) { >> + case DRM_CMD_QUERY: >> + nla_put_u8(msg, DRM_ATTR_REQUEST, 1); >> + break; >> + case DRM_CMD_READ_ONE: >> + nla_put_u64(msg, DRM_ATTR_ERROR_ID, config); >> + break; >> + case DRM_CMD_READ_ALL: >> + nla_put_u8(msg, DRM_ATTR_REQUEST, 1); >> + break; >> + default: >> + break; >> + } >> + >> + ret = nl_send_auto(sock, msg); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Unable to send message: %s", nl_geterror(ret)); >> + >> + ret = nl_recvmsgs_default(sock); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret)); >> + >> + nlmsg_free(msg); >> +} >> + >> +static int get_cmd(char *cmd_name) >> +{ >> + int i; >> + >> + if (!cmd_name) >> + return -1; >> + >> + for (i = 0; i < __MAX_CMDS; i++) { >> + if (strcasecmp(cmd_name, cmd_names[i]) == 0) >> + return i; >> + } >> + >> + return -1; >> +} >> + >> +int main(int argc, char **argv) >> +{ >> + char *endptr; >> + enum opt_val val; >> + enum cmd_ids cmd; >> + char *device = NULL; >> + uint64_t error_config_id; >> + int ret, mcgrp, index; >> + struct igt_device_card card; >> + char *dev_name, *dup; >> + >> + static struct option options[] = { >> + {"device", required_argument, NULL, OPT_DEVICE}, >> + {"error_id", required_argument, NULL, OPT_CONFIG}, >> + {"help", no_argument, NULL, OPT_HELP}, >> + { 0 } >> + }; >> + >> + cmd = get_cmd(argv[1]); >> + if (cmd < 0) { >> + fprintf(stderr, "invalid command\n"); >> + help(argv); >> + exit(EXIT_FAILURE); >> + } >> + >> + for (val = 0; val != OPT_END; ) { >> + val = getopt_long(argc, argv, "", options, &index); >> + >> + switch (val) { >> + case OPT_DEVICE: >> + device = strdup(optarg); >> + break; >> + case 
OPT_CONFIG: >> + error_config_id = strtoull(optarg, &endptr, 16); >> + if (*endptr) { >> + fprintf(stderr, "invalid config id %s\n", optarg); >> + exit(EXIT_FAILURE); >> + } >> + break; >> + case OPT_HELP: >> + help(argv); >> + exit(EXIT_FAILURE); >> + case OPT_END: >> + break; >> + case OPT_UNKNOWN: >> + exit(EXIT_FAILURE); >> + } >> + } >> + >> + if (!device) { >> + fprintf(stderr, "missing device option\n"); >> + help(argv); >> + exit(EXIT_FAILURE); >> + } else { >> + ret = igt_device_card_match_pci(device, &card); >> + if (!ret) { >> + fprintf(stderr, "device %s not found!\n", device); >> + exit(EXIT_FAILURE); >> + } >> + free(device); >> + } >> + >> + /* get card name */ >> + dup = strdup(card.card); >> + >> + while (dup) >> + dev_name = strsep(&dup, "/"); >> + free(dup); >> + >> + drm_genl_ops.o_name = strdup(dev_name); >> + >> + sock = nl_cli_alloc_socket(); >> + if (!sock) >> + nl_cli_fatal(NLE_NOMEM, "Cannot allocate nl_sock"); >> + >> + ret = nl_cli_connect(sock, NETLINK_GENERIC); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Cannot connect handle"); >> + >> + ret = genl_register_family(&drm_genl_ops); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Cannot register xe family"); >> + >> + ret = genl_ops_resolve(sock, &drm_genl_ops); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Unable to resolve family name"); >> + >> + family_id = genl_ctrl_resolve(sock, drm_genl_ops.o_name); >> + if (family_id < 0) >> + nl_cli_fatal(NLE_INVAL, "Resolving of \"%s\" failed", drm_genl_ops.o_name); >> + >> + ret = nl_socket_modify_cb(sock, NL_CB_VALID, NL_CB_CUSTOM, genl_handle_msg, NULL); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Unable to modify valid message callback"); >> + >> + switch (cmd) { >> + case LIST_ERRORS: >> + send_cmd(DRM_CMD_QUERY, 0); >> + break; >> + case READ_ONE: >> + send_cmd(DRM_CMD_READ_ONE, error_config_id); >> + break; >> + case READ_ALL: >> + send_cmd(DRM_CMD_READ_ALL, 0); >> + break; >> + case WAIT_ON_EVENT: >> + mcsock = nl_cli_alloc_socket(); >> + if 
(!mcsock) >> + nl_cli_fatal(NLE_NOMEM, "Cannot allocate nl_sock"); >> + >> + ret = nl_cli_connect(mcsock, NETLINK_GENERIC); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Cannot connect handle"); >> + >> + ret = genl_ops_resolve(mcsock, &drm_genl_ops); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Unable to resolve family name"); >> + >> + nl_socket_disable_seq_check(mcsock); >> + >> + mcgrp = genl_ctrl_resolve_grp(mcsock, drm_genl_ops.o_name, >> + DRM_GENL_MCAST_GROUP_NAME_CORR_ERR); >> + if (mcgrp < 0) >> + nl_cli_fatal(mcgrp, "failed to resolve generic netlink multicast group"); >> + >> + /* Join the multicast group. */ >> + ret = nl_socket_add_membership(mcsock, mcgrp); >> + if (ret < 0) >> + nl_cli_fatal(ret, "failed to join multicast group"); >> + >> + ret = nl_socket_modify_cb(mcsock, NL_CB_VALID, NL_CB_CUSTOM, genl_handle_msg, NULL); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Unable to modify valid message callback"); >> + >> + printf("waiting for error event\n"); >> + ret = nl_recvmsgs_default(mcsock); >> + if (ret < 0) >> + nl_cli_fatal(ret, "Unable to receive message: %s", nl_geterror(ret)); >> + >> + nl_close(mcsock); >> + nl_socket_free(mcsock); >> + break; >> + default: >> + break; >> + } >> + >> + nl_close(sock); >> + nl_socket_free(sock); >> + >> + return 0; >> +} >> + >> diff --git a/tools/meson.build b/tools/meson.build >> index 4c45f16b9..a53d3917f 100644 >> --- a/tools/meson.build >> +++ b/tools/meson.build >> @@ -107,5 +107,10 @@ if libudev.found() >> install : true) >> endif >> >> +executable('drm_ras', 'drm_ras.c', >> + dependencies : [tool_deps, libnl, libnl_cli, libnl_genl], >> + install_rpath : bindir_rpathdir, >> + install : true) >> + >> subdir('i915-perf') >> subdir('null_state_gen') > > ^ permalink raw reply [flat|nested] 5+ messages in thread
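A side note on the "get card name" step in the quoted patch: strsep() sets its argument to NULL once no further delimiter is found, so when the `while (dup)` loop ends, `free(dup)` frees nothing and the buffer returned by strdup(card.card) is never released. A minimal sketch of an alternative that avoids the temporary copy is below. The helper name last_path_component is hypothetical and not part of the patch; the example path is the one used in the cover letter.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch: return a heap-allocated copy
 * of the text after the last '/' in path (e.g. "card1" for "/dev/dri/card1").
 * The caller owns the result and must free() it. Returns NULL if the
 * allocation fails. A path ending in '/' yields an empty string.
 */
static char *last_path_component(const char *path)
{
	const char *slash, *name;
	size_t len;
	char *copy;

	assert(path != NULL);

	slash = strrchr(path, '/');
	name = slash ? slash + 1 : path;

	len = strlen(name) + 1;		/* include the terminating NUL */
	copy = malloc(len);
	if (copy)
		memcpy(copy, name, len);

	return copy;
}
```

The result could be assigned to drm_genl_ops.o_name directly, which would also remove the second strdup() in the patch.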
* [igt-dev] ✗ Fi.CI.BUILD: failure for A tool to demonstrate use of netlink sockets to read RAS error counters 2023-05-26 16:30 [igt-dev] [RFC i-g-t 0/1] A tool to demonstrate use of netlink sockets to read RAS error counters Aravind Iddamsetty 2023-05-26 16:30 ` [igt-dev] [RFC i-g-t 1/1] tools/RAS: A tool to read " Aravind Iddamsetty @ 2023-05-26 17:49 ` Patchwork 1 sibling, 0 replies; 5+ messages in thread From: Patchwork @ 2023-05-26 17:49 UTC (permalink / raw) To: Aravind Iddamsetty; +Cc: igt-dev == Series Details == Series: A tool to demonstrate use of netlink sockets to read RAS error counters URL : https://patchwork.freedesktop.org/series/118438/ State : failure == Summary == IGT patchset build failed on latest successful build f0714273cd896c637759b3790f485308c4c97008 lib/igt_core: add helper for srandom seed initialization [425/555] Linking target tools/cnl_compute_wrpll. [426/555] Linking target tools/hsw_compute_wrpll. [427/555] Linking target tools/skl_compute_wrpll. [428/555] Linking target tools/skl_ddb_allocation. [429/555] Linking target tools/igt_stats. [430/555] Linking target tools/intel_audio_dump. [431/555] Linking target tools/intel_backlight. [432/555] Linking target tools/intel_bios_dumper. [433/555] Linking target tools/intel_display_crc. [434/555] Linking target tools/intel_display_poller. [435/555] Linking target tools/intel_dump_decode. [436/555] Linking target tools/intel_error_decode. [437/555] Linking target tools/intel_forcewaked. [438/555] Linking target tools/intel_gpu_frequency. [439/555] Linking target tools/intel_firmware_decode. [440/555] Linking target tools/intel_framebuffer_dump. [441/555] Linking target tools/intel_gpu_time. [442/555] Linking target tools/intel_gtt. [443/555] Linking target tools/intel_guc_logger. [444/555] Linking target tools/intel_infoframes. [445/555] Linking target tools/intel_lid. [446/555] Linking target tools/intel_panel_fitter. [447/555] Linking target tools/intel_opregion_decode. 
[448/555] Linking target tools/intel_pm_rpm. [449/555] Linking target tools/intel_perf_counters. [450/555] Linking target tools/intel_reg_checker. [451/555] Linking target tools/intel_residency. [452/555] Linking target tools/intel_stepping. [453/555] Linking target tools/intel_vbt_decode. [454/555] Linking target tools/intel_watermark. [455/555] Linking target tools/intel_gvtg_test. [456/555] Linking target tools/intel_gem_info. [457/555] Linking target tools/dpcd_reg. [458/555] Linking target tools/lsgpu. [459/555] Linking target tools/xe_reg. [460/555] Linking target tools/gputop. [461/555] Linking target tools/intel_dp_compliance. [462/555] Linking target tools/intel_l3_parity. [463/555] Linking target tools/intel_gpu_top. [464/555] Linking target tools/intel_reg. [465/555] Linking target tools/i915-perf/i915-perf-configs. [466/555] Linking target tools/amd_hdmi_compliance. [467/555] Compiling C object 'tools/f9d35d4@@drm_ras@exe/drm_ras.c.o'. FAILED: tools/f9d35d4@@drm_ras@exe/drm_ras.c.o cc -Itools/f9d35d4@@drm_ras@exe -Itools -I../../../usr/src/igt-gpu-tools/tools -I../../../usr/src/igt-gpu-tools/include -I../../../usr/src/igt-gpu-tools/include/drm-uapi -I../../../usr/src/igt-gpu-tools/include/linux-uapi -Ilib -I../../../usr/src/igt-gpu-tools/lib -I../../../usr/src/igt-gpu-tools/lib/stubs/syscalls -I. 
-I../../../usr/src/igt-gpu-tools/ -I/usr/include/cairo -I/usr/include/glib-2.0 -I/usr/lib/x86_64-linux-gnu/glib-2.0/include -I/usr/include/pixman-1 -I/usr/include/uuid -I/usr/include/freetype2 -I/usr/include/libpng16 -I/usr/include/libdrm -I/usr/include/x86_64-linux-gnu -I/usr/include/valgrind -I/usr/include -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -std=gnu11 -O2 -g -D_GNU_SOURCE -include config.h -D_FORTIFY_SOURCE=2 -Wbad-function-cast -Wdeclaration-after-statement -Wformat=2 -Wimplicit-fallthrough=0 -Wlogical-op -Wmissing-declarations -Wmissing-format-attribute -Wmissing-noreturn -Wmissing-prototypes -Wnested-externs -Wold-style-definition -Wpointer-arith -Wredundant-decls -Wshadow -Wstrict-prototypes -Wuninitialized -Wunused -Wno-clobbered -Wno-maybe-uninitialized -Wno-missing-field-initializers -Wno-pointer-arith -Wno-address-of-packed-member -Wno-sign-compare -Wno-type-limits -Wno-unused-parameter -Wno-unused-result -Werror=address -Werror=array-bounds -Werror=implicit -Werror=init-self -Werror=int-to-pointer-cast -Werror=main -Werror=missing-braces -Werror=nonnull -Werror=pointer-to-int-cast -Werror=return-type -Werror=sequence-point -Werror=trigraphs -Werror=write-strings -fno-builtin-malloc -fno-builtin-calloc -pthread -MD -MQ 'tools/f9d35d4@@drm_ras@exe/drm_ras.c.o' -MF 'tools/f9d35d4@@drm_ras@exe/drm_ras.c.o.d' -o 'tools/f9d35d4@@drm_ras@exe/drm_ras.c.o' -c ../../../usr/src/igt-gpu-tools/tools/drm_ras.c ../../../usr/src/igt-gpu-tools/tools/drm_ras.c:11:10: fatal error: netlink/cli/utils.h: No such file or directory 11 | #include <netlink/cli/utils.h> | ^~~~~~~~~~~~~~~~~~~~~ compilation terminated. ninja: build stopped: subcommand failed. ^ permalink raw reply [flat|nested] 5+ messages in thread