Intel-XE Archive on lore.kernel.org
* [PATCH v4 0/6] drm/xe/xe_drm_client: Add per drm client reset stats
@ 2025-02-19 20:28 Jonathan Cavitt
  2025-02-19 20:28 ` [PATCH v4 1/6] drm/xe/xe_exec_queue: Add ID param to exec queue struct Jonathan Cavitt
                   ` (12 more replies)
  0 siblings, 13 replies; 17+ messages in thread
From: Jonathan Cavitt @ 2025-02-19 20:28 UTC (permalink / raw)
  To: intel-xe
  Cc: saurabhg.gupta, alex.zuo, jonathan.cavitt, joonas.lahtinen,
	tvrtko.ursulin, lucas.demarchi, matthew.brost, dri-devel,
	simona.vetter

Add additional information to the drm client so it can report the last 50
exec queues banned on it, as well as the last pagefault seen when each of
those exec queues was banned. Since we cannot reasonably associate a
pagefault with a specific exec queue, we instead report the last pagefault
seen on the associated hw engine.
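To make the "last 50 banned exec queues" bookkeeping concrete, here is a minimal userspace sketch of a bounded ban log that keeps the 50 most recent entries and drops the oldest. The names (struct ban_record, struct ban_log, ban_log_add(), MAX_BANS) are hypothetical and not taken from the patches; the series may well use a linked list or another structure, and a kernel version would need locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bound matching the "last 50" described above. */
#define MAX_BANS 50

/* One entry: which exec queue was banned, plus the engine's last pagefault. */
struct ban_record {
	uint32_t exec_queue_id;
	bool has_pagefault;
	uint64_t pagefault_addr;
};

/* Fixed-size ring buffer; once full, each new ban overwrites the oldest. */
struct ban_log {
	struct ban_record entries[MAX_BANS];
	unsigned int head;  /* index of the oldest valid entry */
	unsigned int count; /* number of valid entries, never above MAX_BANS */
};

static void ban_log_init(struct ban_log *log)
{
	memset(log, 0, sizeof(*log));
}

static void ban_log_add(struct ban_log *log, struct ban_record rec)
{
	/* Slot just past the newest entry; equals head when the log is full. */
	unsigned int tail = (log->head + log->count) % MAX_BANS;

	log->entries[tail] = rec;
	if (log->count < MAX_BANS)
		log->count++;
	else
		log->head = (log->head + 1) % MAX_BANS; /* oldest was overwritten */
	assert(log->count <= MAX_BANS);
}

/* Return the i-th oldest record (0 = oldest), or NULL if out of range. */
static const struct ban_record *ban_log_get(const struct ban_log *log,
					    unsigned int i)
{
	if (i >= log->count)
		return NULL;
	return &log->entries[(log->head + i) % MAX_BANS];
}
```

A ring buffer keeps insertion O(1) and needs no allocation on the ban path; the trade-off is a fixed 50-entry footprint per client whether or not any bans occur.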

The last seen pagefault is saved on the hw engine rather than per exec
queue, and is updated during pagefault handling in xe_gt_pagefault. It is
cleared when the engine is reset, because exec queue bans that happen after
the reset were likely not caused by that pagefault.

Also add a tracker that counts the number of times the drm client has
experienced an engine reset.

Finally, add a new query to xe_query that reports these drm client reset
stats back to the user.
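drm/xe device queries generally follow a two-call convention: userspace first passes size == 0 to learn the required size, then calls again with a buffer of exactly that size. Assuming the new reset stats query follows the same convention, a userspace model of the handler could look like this. The payload layout, struct query_desc and query_reset_stats() are hypothetical, and memcpy() stands in for copy_to_user().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-layout payload for the reset stats query. */
struct reset_stats_payload {
	uint64_t reset_count;
	uint32_t num_bans;
	uint32_t pad;
};

/* Hypothetical query descriptor: size in/out plus a destination buffer. */
struct query_desc {
	uint32_t size;
	void *data;
};

/*
 * size == 0: only report the size needed.
 * size == needed: copy the payload out.
 * Anything else: reject.
 */
static int query_reset_stats(struct query_desc *q,
			     const struct reset_stats_payload *src)
{
	const uint32_t needed = sizeof(*src);

	assert(src != NULL);
	if (q->size == 0) {
		q->size = needed;
		return 0;
	}
	if (q->size != needed || !q->data)
		return -EINVAL;
	memcpy(q->data, src, needed);
	return 0;
}
```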

v2: Report the per drm client reset stats as a query, rather than
    co-opting xe_drm_client_fdinfo (Joonas)
v3: Report EOPNOTSUPP during the reset stats query if CONFIG_PROC_FS
    is not set in the kernel config, as it is required to track the
    reset count and exec queue bans.
v4: Fix formatting warnings and kzalloc-while-locked warnings

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
CC: Lucas de Marchi <lucas.demarchi@intel.com>
CC: Matthew Brost <matthew.brost@intel.com>
CC: Simona Vetter <simona.vetter@ffwll.ch>

Jonathan Cavitt (6):
  drm/xe/xe_exec_queue: Add ID param to exec queue struct
  drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
  drm/xe/xe_drm_client: Add per drm client pagefault info
  drm/xe/xe_drm_client: Add per drm client reset stats
  drm/xe/xe_query: Pass drm file to query funcs
  drm/xe/xe_query: Add support for per-drm-client reset stat querying

 drivers/gpu/drm/xe/xe_drm_client.c       |  68 ++++++++++++++
 drivers/gpu/drm/xe/xe_drm_client.h       |  44 +++++++++
 drivers/gpu/drm/xe/xe_exec_queue.c       |   8 ++
 drivers/gpu/drm/xe/xe_exec_queue_types.h |   2 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c     |  44 ++++-----
 drivers/gpu/drm/xe/xe_gt_pagefault.h     |  28 ++++++
 drivers/gpu/drm/xe/xe_guc_submit.c       |  17 ++++
 drivers/gpu/drm/xe/xe_hw_engine.c        |   4 +
 drivers/gpu/drm/xe/xe_hw_engine_types.h  |   8 ++
 drivers/gpu/drm/xe/xe_query.c            | 109 ++++++++++++++++++++---
 include/uapi/drm/xe_drm.h                |  50 +++++++++++
 11 files changed, 343 insertions(+), 39 deletions(-)

-- 
2.43.0


* [PATCH v4 0/6] drm/xe/xe_drm_client: Add per drm client reset stats
@ 2025-02-20 20:38 Jonathan Cavitt
  2025-02-20 20:38 ` [PATCH v4 2/6] drm/xe/xe_gt_pagefault: Migrate pagefault struct to header Jonathan Cavitt
  0 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cavitt @ 2025-02-20 20:38 UTC (permalink / raw)
  To: intel-xe
  Cc: saurabhg.gupta, alex.zuo, jonathan.cavitt, joonas.lahtinen,
	tvrtko.ursulin, lucas.demarchi, matthew.brost, dri-devel,
	simona.vetter

Add additional information to the drm client so it can report the last 50
exec queues banned on it, as well as the last pagefault seen when each of
those exec queues was banned. Since we cannot reasonably associate a
pagefault with a specific exec queue, we instead report the last pagefault
seen on the associated hw engine.

The last seen pagefault is saved on the hw engine rather than per exec
queue, and is updated during pagefault handling in xe_gt_pagefault. It is
cleared when the engine is reset, because exec queue bans that happen after
the reset were likely not caused by that pagefault.

Also add a tracker that counts the number of times the drm client has
experienced an engine reset.

Finally, add a new query to xe_query that reports these drm client reset
stats back to the user.

v2: Report the per drm client reset stats as a query, rather than
    co-opting xe_drm_client_fdinfo (Joonas)
v3: Report EOPNOTSUPP during the reset stats query if CONFIG_PROC_FS
    is not set in the kernel config, as it is required to track the
    reset count and exec queue bans.
v4: Fix formatting warnings and kzalloc-while-locked warnings

Test-with: 20250220203747.130371-1-jonathan.cavitt@intel.com

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
CC: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
CC: Lucas de Marchi <lucas.demarchi@intel.com>
CC: Matthew Brost <matthew.brost@intel.com>
CC: Simona Vetter <simona.vetter@ffwll.ch>

Jonathan Cavitt (6):
  drm/xe/xe_exec_queue: Add ID param to exec queue struct
  drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
  drm/xe/xe_drm_client: Add per drm client pagefault info
  drm/xe/xe_drm_client: Add per drm client reset stats
  drm/xe/xe_query: Pass drm file to query funcs
  drm/xe/xe_query: Add support for per-drm-client reset stat querying

 drivers/gpu/drm/xe/xe_drm_client.c       |  68 ++++++++++++++
 drivers/gpu/drm/xe/xe_drm_client.h       |  44 +++++++++
 drivers/gpu/drm/xe/xe_exec_queue.c       |   8 ++
 drivers/gpu/drm/xe/xe_exec_queue_types.h |   2 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c     |  44 ++++-----
 drivers/gpu/drm/xe/xe_gt_pagefault.h     |  28 ++++++
 drivers/gpu/drm/xe/xe_guc_submit.c       |  17 ++++
 drivers/gpu/drm/xe/xe_hw_engine.c        |   4 +
 drivers/gpu/drm/xe/xe_hw_engine_types.h  |   8 ++
 drivers/gpu/drm/xe/xe_query.c            | 109 ++++++++++++++++++++---
 include/uapi/drm/xe_drm.h                |  50 +++++++++++
 11 files changed, 343 insertions(+), 39 deletions(-)

-- 
2.43.0


* [PATCH v4 0/6] drm/xe/xe_vm: Implement xe_vm_get_property_ioctl
@ 2025-03-03 22:00 Jonathan Cavitt
  2025-03-03 22:00 ` [PATCH v4 2/6] drm/xe/xe_gt_pagefault: Migrate pagefault struct to header Jonathan Cavitt
  0 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cavitt @ 2025-03-03 22:00 UTC (permalink / raw)
  To: intel-xe
  Cc: saurabhg.gupta, alex.zuo, jonathan.cavitt, joonas.lahtinen,
	matthew.brost, jianxun.zhang, dri-devel

Add additional information to each VM so that it can report up to its last
50 pagefaults.  Only failed pagefaults are saved this way, as successful
pagefaults should be recovered and need not be reported to userspace.
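The "failed pagefaults only" policy amounts to filtering at the point where the fault handler knows the outcome. The following hypothetical sketch logs only unresolved faults into a 50-entry array kept in oldest-first order, shifting out the oldest entry when full. struct vm_fault_log and vm_log_fault() are illustrative names, not the fields added to xe_vm_types.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define VM_MAX_FAULTS 50 /* hypothetical, mirrors "up to the last 50" */

struct vm_fault_entry {
	uint64_t addr;
	uint32_t access_type;
};

struct vm_fault_log {
	struct vm_fault_entry faults[VM_MAX_FAULTS]; /* oldest first */
	unsigned int len;
};

/* Record a fault only if the driver failed to resolve it. */
static void vm_log_fault(struct vm_fault_log *log,
			 const struct vm_fault_entry *f, bool resolved)
{
	if (resolved)
		return; /* recovered faults are not reported to userspace */
	if (log->len == VM_MAX_FAULTS) {
		/* Full: shift everything down one slot, discarding the oldest. */
		memmove(&log->faults[0], &log->faults[1],
			(VM_MAX_FAULTS - 1) * sizeof(log->faults[0]));
		log->len--;
	}
	log->faults[log->len++] = *f;
	assert(log->len <= VM_MAX_FAULTS);
}
```

The memmove keeps the array contiguous and oldest-first, so it can be copied to userspace in one piece; with only 50 small entries the shift cost is negligible.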

Additionally, add a new ioctl, xe_vm_get_property_ioctl, that allows
userspace to query these pagefaults.
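Per the v2 and v4 changelog entries in this cover letter, the ioctl separates a data pointer from a scalar return value and reports a maximum buffer size rather than the current one. A hypothetical userspace model of that shape follows; the argument struct, the property IDs and vm_get_property() are invented for illustration and do not reproduce drm_xe_vm_get_property.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_FAULTS 50

struct fault_rec {
	uint64_t addr;
	uint32_t type;
	uint32_t pad;
};

/* Hypothetical args: size/data describe a user buffer, value is a scalar. */
struct vm_get_property_args {
	uint32_t property;
	uint32_t size;
	uint64_t data;  /* user pointer stored as u64, as uAPI structs do */
	uint64_t value;
};

enum { PROP_FAULT_COUNT = 1, PROP_FAULTS = 2 };

static int vm_get_property(struct vm_get_property_args *args,
			   const struct fault_rec *faults, uint32_t nfaults)
{
	const uint32_t max_size = MAX_FAULTS * sizeof(struct fault_rec);

	assert(nfaults <= MAX_FAULTS);
	switch (args->property) {
	case PROP_FAULT_COUNT:
		args->value = nfaults; /* scalar result, no buffer needed */
		return 0;
	case PROP_FAULTS:
		if (args->size == 0) {
			args->size = max_size; /* worst case, not current size */
			return 0;
		}
		if (args->size < max_size || !args->data)
			return -EINVAL;
		memcpy((void *)(uintptr_t)args->data, faults,
		       nfaults * sizeof(*faults));
		args->value = nfaults; /* entries actually written */
		return 0;
	default:
		return -EINVAL;
	}
}
```

Reporting the maximum size means one allocation always suffices even if more faults arrive between the size call and the data call; the cost is a buffer that is usually larger than needed.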

v2: (Matt Brost)
- Break full ban list request into a separate property.
- Reformat drm_xe_vm_get_property struct.
- Remove need for drm_xe_faults helper struct.
- Separate data pointer and scalar return value in ioctl.
- Get address type on pagefault report and save it to the pagefault.
- Correctly reject writes to read-only VMAs.
- Miscellaneous formatting fixes.

v3: (Matt Brost)
- Only allow querying of failed pagefaults.

v4:
- Remove unnecessary size parameter from helper function, as it
  is a property of the arguments. (jcavitt)
- Remove unnecessary copy_from_user (Jianxun)
- Set address_precision to 1 (Jianxun)
- Report max size instead of dynamic size for memory allocation
  purposes.  Total memory usage is reported separately.

Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Suggested-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
CC: Zhang Jianxun <jianxun.zhang@intel.com>

Jonathan Cavitt (6):
  drm/xe/xe_gt_pagefault: Disallow writes to read-only VMAs
  drm/xe/xe_gt_pagefault: Migrate pagefault struct to header
  drm/xe/xe_vm: Add per VM pagefault info
  drm/xe/uapi: Define drm_xe_vm_get_property
  drm/xe/xe_gt_pagefault: Add address_type field to pagefaults
  drm/xe/xe_vm: Implement xe_vm_get_property_ioctl

 drivers/gpu/drm/xe/xe_device.c       |   3 +
 drivers/gpu/drm/xe/xe_gt_pagefault.c |  66 +++++++--------
 drivers/gpu/drm/xe/xe_gt_pagefault.h |  29 +++++++
 drivers/gpu/drm/xe/xe_vm.c           | 122 +++++++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_vm.h           |   8 ++
 drivers/gpu/drm/xe/xe_vm_types.h     |  20 +++++
 include/uapi/drm/xe_drm.h            |  67 +++++++++++++++
 7 files changed, 281 insertions(+), 34 deletions(-)

-- 
2.43.0




Thread overview: 17+ messages
2025-02-19 20:28 [PATCH v4 0/6] drm/xe/xe_drm_client: Add per drm client reset stats Jonathan Cavitt
2025-02-19 20:28 ` [PATCH v4 1/6] drm/xe/xe_exec_queue: Add ID param to exec queue struct Jonathan Cavitt
2025-02-19 20:28 ` [PATCH v4 2/6] drm/xe/xe_gt_pagefault: Migrate pagefault struct to header Jonathan Cavitt
2025-02-19 20:28 ` [PATCH v4 3/6] drm/xe/xe_drm_client: Add per drm client pagefault info Jonathan Cavitt
2025-02-19 20:28 ` [PATCH v4 4/6] drm/xe/xe_drm_client: Add per drm client reset stats Jonathan Cavitt
2025-02-19 20:28 ` [PATCH v4 5/6] drm/xe/xe_query: Pass drm file to query funcs Jonathan Cavitt
2025-02-19 20:28 ` [PATCH v4 6/6] drm/xe/xe_query: Add support for per-drm-client reset stat querying Jonathan Cavitt
2025-02-19 21:22 ` ✓ CI.Patch_applied: success for drm/xe/xe_drm_client: Add per drm client reset stats (rev4) Patchwork
2025-02-19 21:22 ` ✓ CI.checkpatch: " Patchwork
2025-02-19 21:24 ` ✓ CI.KUnit: " Patchwork
2025-02-19 21:40 ` ✓ CI.Build: " Patchwork
2025-02-19 21:42 ` ✓ CI.Hooks: " Patchwork
2025-02-19 21:44 ` ✓ CI.checksparse: " Patchwork
2025-02-20  5:49 ` ✓ Xe.CI.BAT: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2025-02-20 20:38 [PATCH v4 0/6] drm/xe/xe_drm_client: Add per drm client reset stats Jonathan Cavitt
2025-02-20 20:38 ` [PATCH v4 2/6] drm/xe/xe_gt_pagefault: Migrate pagefault struct to header Jonathan Cavitt
2025-02-25 20:48   ` Matthew Brost
2025-03-03 22:00 [PATCH v4 0/6] drm/xe/xe_vm: Implement xe_vm_get_property_ioctl Jonathan Cavitt
2025-03-03 22:00 ` [PATCH v4 2/6] drm/xe/xe_gt_pagefault: Migrate pagefault struct to header Jonathan Cavitt
