intel-xe.lists.freedesktop.org archive mirror
* [RFC 00/34] Kill mem_access v2
@ 2024-01-26 20:30 Rodrigo Vivi
  2024-01-26 20:30 ` [RFC 01/34] Revert "drm/xe/uc: Store firmware binary in system-memory backed BO" Rodrigo Vivi
                   ` (36 more replies)
  0 siblings, 37 replies; 77+ messages in thread
From: Rodrigo Vivi @ 2024-01-26 20:30 UTC (permalink / raw)
  To: intel-xe; +Cc: Rodrigo Vivi

Hi all,

First of all, thank you so much for the good feedback and ideas on the
v1 of this RFC series:

v1: lore.kernel.org/all/20231228021232.2366249-1-rodrigo.vivi@intel.com

This v2 has better organized and more finely split patches. So I'd like
to ask you to start reviewing the simple ones already, so that I can
try to split the series and merge it little by little.

I have 2 pending issues in this series that I haven't been able to solve yet.
Matt Brost is already helping me with these, but any help is welcome.

But as I said, I'd like the review of the simplest ones to start first
anyway, please!

Details of the current issues:

1. Runtime PM usage count underflow on a GPU-hang test when coming from
D3cold. The culprit is the pm_runtime_{get,put} around g2h_outstanding.

[  476.450482] [IGT] xe_exec_threads: starting subtest threads-hang-basic
[  476.507039] xe 0000:03:00.0: [drm] Engine reset: guc_id=78
[snip]
[  476.518814] xe 0000:03:00.0: [drm] Timedout job: seqno=4294967169, guc_id=78, flags=0x8
[  476.524595] xe 0000:03:00.0: [drm] Engine reset: guc_id=78
[  476.532615] xe 0000:03:00.0: [drm] Xe device coredump has been created
[  476.801805] [IGT] xe_exec_threads: finished subtest threads-hang-basic, SUCCESS
[  476.808391] xe 0000:03:00.0: Runtime PM usage count underflow!
[  476.813455] [IGT] xe_exec_threads: exiting, ret=0
[  476.819146] xe 0000:03:00.0: Runtime PM usage count underflow!
[  476.829816] xe 0000:03:00.0: Runtime PM usage count underflow!
[and on, and on]
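
For readers less familiar with this class of bug, here is a minimal
user-space sketch of how an unbalanced put produces this kind of
warning. It is only a model: struct model_dev, model_pm_get(),
model_pm_put() and model_double_release() are hypothetical names, not
driver or PM core code, and the "released twice" scenario is just one
assumed way to unbalance the counter. The actual path in the hang case
has not been identified in this cover letter.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a device's runtime PM usage counter. */
struct model_dev {
	int usage_count;	/* outstanding "get" references */
	int underflows;		/* unbalanced "put" calls that were caught */
};

/* Take a reference: the device must stay awake while the count is > 0. */
static void model_pm_get(struct model_dev *dev)
{
	assert(dev != NULL);
	dev->usage_count++;
}

/*
 * Drop a reference. If the count would go negative, undo the decrement,
 * record the event and report an error instead of leaving it negative.
 */
static int model_pm_put(struct model_dev *dev)
{
	assert(dev != NULL);
	if (--dev->usage_count >= 0)
		return 0;

	dev->usage_count++;
	dev->underflows++;
	fprintf(stderr, "model: usage count underflow!\n");
	return -1;
}

/*
 * Assumed scenario: one get when a request is sent, but the matching
 * release runs twice (for example once on the normal response path and
 * once more on a reset/cleanup path).
 */
static int model_double_release(struct model_dev *dev)
{
	model_pm_get(dev);		/* request sent, response outstanding */
	(void)model_pm_put(dev);	/* response handled: balanced */
	return model_pm_put(dev);	/* second release: underflow */
}
```

In this model every put that has no matching get is reported once, which
would match a log that repeats the warning for each extra release.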

2. Failing rmmod due to an invalidation that happens at xe_pci_removal
(a case that fails only when coming from D3cold with display_enabled,
while on an idle/blank screen)

[  326.857464] xe 0000:03:00.0: [drm] GT0: resumed
[  327.135455] show_signal_msg: 126 callbacks suppressed
[  327.135467] gnome-shell[2488]: segfault at 0 ip 00007fad50e315cc sp 00007ffd6f04a360 error 4 in libmutter-clutter-11.so.0.0.0[7fad50dbd000+97000] likely on CPU 15 (core 28, socket 0)
[  327.157020] Code: e9 6f ff ff ff 66 0f 1f 84 00 00 00 00 00 48 8b 05 49 12 07 00 48 85 c0 74 4c 48 8b 38 e8 fc 49 f9 ff 48 89 c5 e8 14 75 f9 ff <48> 8b 55 00 48 8b 92 d0 00 00 00 48 85 d2 74 07 89 c6 48 89 ef ff
[  328.160905] xe 0000:03:00.0: [drm] Xe device coredump has been deleted.
[  329.099696] pci 0000:03:00.0: [drm] *ERROR* GT0: TLB invalidation time'd out, seqno=1668, recv=1667
[snip]
[  329.312763] ------------[ cut here ]------------
[  329.317417] pci 0000:03:00.0: [drm] Assertion `ct->g2h_outstanding == 0 || state == XE_GUC_CT_STATE_STOPPED` failed!
               platform: 7 subplatform: 4
               graphics: Xe_HPG 12.55 step C0
               media: Xe_HPM 12.55 step C0
               tile: 0 VRAM 8.00 GiB
               GT: 0 type 1

Thanks in advance,
Rodrigo.

Rodrigo Vivi (34):
  Revert "drm/xe/uc: Store firmware binary in system-memory backed BO"
  drm/xe: Document Xe PM component
  drm/xe: Fix display runtime_pm handling
  drm/xe: Create a xe_pm_runtime_resume_and_get variant for display
  drm/xe: Convert xe_pm_runtime_{get,put} to void and protect from
    recursion
  drm/xe: Prepare display for D3Cold
  drm/xe: Convert mem_access assertion towards the runtime_pm state
  drm/xe: Runtime PM wake on every IOCTL
  drm/xe: Convert kunit tests from mem_access to xe_pm_runtime
  drm/xe: Convert scheduler towards direct pm_runtime
  drm/xe: Runtime PM wake on every sysfs call
  drm/xe: Ensure device is awake before removing it
  drm/xe: Remove mem_access from guc_pc calls
  drm/xe: Runtime PM wake on every debugfs call
  drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls
  drm/xe: Removing extra mem_access protection from runtime pm
  drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls
  drm/xe: Move lockdep protection from mem_access to xe_pm_runtime
  drm/xe: Remove pm_runtime lockdep
  drm/xe: Stop checking for power_lost on D3Cold
  drm/xe: Convert GuC CT paths from mem_access to xe_pm_runtime
  drm/xe: Keep D0 for the entire duration of a LR VM
  drm/xe: Ensure D0 on TLB invalidation
  drm/xe: Remove useless mem_access protection for query ioctls
  drm/xe: Convert gsc_work from mem_access to xe_pm_runtime
  drm/xe: VMs don't need the mem_access protection anymore
  drm/xe: Remove useless mem_access during probe
  drm/xe: Remove mem_access from suspend and resume functions
  drm/xe: Convert gt_reset from mem_access to xe_pm_runtime
  drm/xe: Remove useless mem_access on PAT dumps
  drm/xe: Remove inner mem_access protections
  drm/xe: Kill xe_device_mem_access_{get*,put}
  drm/xe: Remove unused runtime pm helper
  drm/xe: Enable D3Cold on 'low' VRAM utilization

 .../gpu/drm/xe/compat-i915-headers/i915_drv.h |   8 +-
 drivers/gpu/drm/xe/display/xe_fb_pin.c        |   7 +-
 drivers/gpu/drm/xe/tests/xe_bo.c              |   8 +-
 drivers/gpu/drm/xe/tests/xe_migrate.c         |   7 +-
 drivers/gpu/drm/xe/tests/xe_mocs.c            |  14 +-
 drivers/gpu/drm/xe/xe_bo.c                    |  10 +-
 drivers/gpu/drm/xe/xe_debugfs.c               |  13 +-
 drivers/gpu/drm/xe/xe_device.c                | 129 ++++------
 drivers/gpu/drm/xe/xe_device.h                |   9 -
 drivers/gpu/drm/xe/xe_device_sysfs.c          |   4 +
 drivers/gpu/drm/xe/xe_device_types.h          |   6 -
 drivers/gpu/drm/xe/xe_display.c               |  22 ++
 drivers/gpu/drm/xe/xe_display.h               |   2 +
 drivers/gpu/drm/xe/xe_dma_buf.c               |   5 +-
 drivers/gpu/drm/xe/xe_exec_queue.c            |  19 --
 drivers/gpu/drm/xe/xe_ggtt.c                  |   6 -
 drivers/gpu/drm/xe/xe_gpu_scheduler.c         |   8 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler.h         |   3 +-
 drivers/gpu/drm/xe/xe_gpu_scheduler_types.h   |   2 +
 drivers/gpu/drm/xe/xe_gsc.c                   |   5 +-
 drivers/gpu/drm/xe/xe_gt.c                    |  21 +-
 drivers/gpu/drm/xe/xe_gt_debugfs.c            |  53 ++++-
 drivers/gpu/drm/xe/xe_gt_freq.c               |  38 ++-
 drivers/gpu/drm/xe/xe_gt_idle.c               |  23 +-
 drivers/gpu/drm/xe/xe_gt_throttle_sysfs.c     |   3 +
 drivers/gpu/drm/xe/xe_guc_ct.c                |  79 +++----
 drivers/gpu/drm/xe/xe_guc_ct_types.h          |   2 +
 drivers/gpu/drm/xe/xe_guc_debugfs.c           |   9 +-
 drivers/gpu/drm/xe/xe_guc_pc.c                |  62 +----
 drivers/gpu/drm/xe/xe_guc_submit.c            |   2 +-
 drivers/gpu/drm/xe/xe_huc_debugfs.c           |   5 +-
 drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.c |  58 ++++-
 drivers/gpu/drm/xe/xe_hw_engine_class_sysfs.h |   7 +
 drivers/gpu/drm/xe/xe_hwmon.c                 |  25 +-
 drivers/gpu/drm/xe/xe_pat.c                   |  10 -
 drivers/gpu/drm/xe/xe_pci.c                   |   2 +-
 drivers/gpu/drm/xe/xe_pm.c                    | 223 +++++++++++++-----
 drivers/gpu/drm/xe/xe_pm.h                    |  16 +-
 drivers/gpu/drm/xe/xe_pt.c                    |   3 +
 drivers/gpu/drm/xe/xe_query.c                 |   4 -
 drivers/gpu/drm/xe/xe_sched_job.c             |  12 +-
 drivers/gpu/drm/xe/xe_tile.c                  |  10 +-
 drivers/gpu/drm/xe/xe_tile_sysfs.c            |   1 +
 drivers/gpu/drm/xe/xe_ttm_sys_mgr.c           |   5 +-
 drivers/gpu/drm/xe/xe_uc_fw.c                 |   4 +-
 drivers/gpu/drm/xe/xe_vm.c                    |  10 +-
 46 files changed, 565 insertions(+), 409 deletions(-)
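
Several patch titles above ("Runtime PM wake on every IOCTL", "... sysfs
call", "... debugfs call", "Remove inner mem_access protections")
suggest one common pattern: hold a runtime PM reference at the outer
entry point and let inner code only check that the device is awake. The
sketch below models that idea in user-space C. All names (struct
outer_dev, model_entry_point(), model_touch_hw()) are hypothetical and
this is not the code from the patches.

```c
/* Hypothetical device model: a usage counter plus an access counter. */
struct outer_dev {
	int usage_count;	/* references held by outer entry points */
	int hw_accesses;	/* successful "hardware" accesses */
};

typedef int (*model_handler)(struct outer_dev *dev);

/*
 * Inner code takes no reference of its own; it only checks that a caller
 * further up the chain is holding one.
 */
static int model_touch_hw(struct outer_dev *dev)
{
	if (dev->usage_count <= 0)
		return -1;	/* bug: access while the device may be asleep */

	dev->hw_accesses++;
	return 0;
}

/*
 * Outer entry point (think IOCTL, sysfs or debugfs handler): hold the
 * reference for the whole call and release it on the single exit path.
 */
static int model_entry_point(struct outer_dev *dev, model_handler handler)
{
	int ret;

	dev->usage_count++;
	ret = handler(dev);
	dev->usage_count--;

	return ret;
}
```

Calling model_touch_hw() directly on an idle device fails in this model,
while the same call routed through model_entry_point() succeeds and
leaves the counter balanced.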

-- 
2.43.0



end of thread, other threads:[~2024-02-28 16:54 UTC | newest]

Thread overview: 77+ messages
2024-01-26 20:30 [RFC 00/34] Kill mem_access v2 Rodrigo Vivi
2024-01-26 20:30 ` [RFC 01/34] Revert "drm/xe/uc: Store firmware binary in system-memory backed BO" Rodrigo Vivi
2024-01-26 20:30 ` [RFC 02/34] drm/xe: Document Xe PM component Rodrigo Vivi
2024-01-29 10:38   ` Francois Dugast
2024-01-26 20:30 ` [RFC 03/34] drm/xe: Fix display runtime_pm handling Rodrigo Vivi
2024-02-05  9:11   ` Matthew Auld
2024-02-14 18:05     ` Rodrigo Vivi
2024-02-15  9:30       ` Matthew Auld
2024-02-15 22:19         ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 04/34] drm/xe: Create a xe_pm_runtime_resume_and_get variant for display Rodrigo Vivi
2024-01-26 20:30 ` [RFC 05/34] drm/xe: Convert xe_pm_runtime_{get,put} to void and protect from recursion Rodrigo Vivi
2024-01-26 20:30 ` [RFC 06/34] drm/xe: Prepare display for D3Cold Rodrigo Vivi
2024-01-26 20:30 ` [RFC 07/34] drm/xe: Convert mem_access assertion towards the runtime_pm state Rodrigo Vivi
2024-02-05  9:55   ` Matthew Auld
2024-02-14 18:15     ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 08/34] drm/xe: Runtime PM wake on every IOCTL Rodrigo Vivi
2024-02-05  9:39   ` Matthew Auld
2024-01-26 20:30 ` [RFC 09/34] drm/xe: Convert kunit tests from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05  9:57   ` Matthew Auld
2024-01-26 20:30 ` [RFC 10/34] drm/xe: Convert scheduler towards direct pm_runtime Rodrigo Vivi
2024-02-05 10:46   ` Matthew Auld
2024-01-26 20:30 ` [RFC 11/34] drm/xe: Runtime PM wake on every sysfs call Rodrigo Vivi
2024-02-05 10:55   ` Matthew Auld
2024-02-14 18:48     ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 12/34] drm/xe: Ensure device is awake before removing it Rodrigo Vivi
2024-02-05 11:05   ` Matthew Auld
2024-02-14 18:51     ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 13/34] drm/xe: Remove mem_access from guc_pc calls Rodrigo Vivi
2024-02-05 11:08   ` Matthew Auld
2024-01-26 20:30 ` [RFC 14/34] drm/xe: Runtime PM wake on every debugfs call Rodrigo Vivi
2024-02-05 11:10   ` Matthew Auld
2024-02-14 18:57     ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 15/34] drm/xe: Replace dma_buf mem_access per direct xe_pm_runtime calls Rodrigo Vivi
2024-02-05 11:15   ` Matthew Auld
2024-01-26 20:30 ` [RFC 16/34] drm/xe: Removing extra mem_access protection from runtime pm Rodrigo Vivi
2024-02-05 11:23   ` Matthew Auld
2024-01-26 20:30 ` [RFC 17/34] drm/xe: Convert hwmon from mem_access to xe_pm_runtime calls Rodrigo Vivi
2024-02-05 11:25   ` Matthew Auld
2024-01-26 20:30 ` [RFC 18/34] drm/xe: Move lockdep protection from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05 11:31   ` Matthew Auld
2024-01-26 20:30 ` [RFC 19/34] drm/xe: Remove pm_runtime lockdep Rodrigo Vivi
2024-02-05 11:54   ` Matthew Auld
2024-02-15 22:47     ` Rodrigo Vivi
2024-02-20 17:48       ` Matthew Auld
2024-02-28 16:53         ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 20/34] drm/xe: Stop checking for power_lost on D3Cold Rodrigo Vivi
2024-01-26 20:30 ` [RFC 21/34] drm/xe: Convert GuC CT paths from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05 12:23   ` Matthew Auld
2024-02-28 16:51     ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 22/34] drm/xe: Keep D0 for the entire duration of a LR VM Rodrigo Vivi
2024-01-26 20:30 ` [RFC 23/34] drm/xe: Ensure D0 on TLB invalidation Rodrigo Vivi
2024-02-05 12:41   ` Matthew Auld
2024-01-26 20:30 ` [RFC 24/34] drm/xe: Remove useless mem_access protection for query ioctls Rodrigo Vivi
2024-02-05 12:43   ` Matthew Auld
2024-01-26 20:30 ` [RFC 25/34] drm/xe: Convert gsc_work from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05 13:11   ` Matthew Auld
2024-01-26 20:30 ` [RFC 26/34] drm/xe: VMs don't need the mem_access protection anymore Rodrigo Vivi
2024-02-05 13:29   ` Matthew Auld
2024-02-15 22:37     ` Rodrigo Vivi
2024-01-26 20:30 ` [RFC 27/34] drm/xe: Remove useless mem_access during probe Rodrigo Vivi
2024-02-05 13:18   ` Matthew Auld
2024-01-26 20:30 ` [RFC 28/34] drm/xe: Remove mem_access from suspend and resume functions Rodrigo Vivi
2024-02-05 13:30   ` Matthew Auld
2024-01-26 20:30 ` [RFC 29/34] drm/xe: Convert gt_reset from mem_access to xe_pm_runtime Rodrigo Vivi
2024-02-05 13:33   ` Matthew Auld
2024-01-26 20:30 ` [RFC 30/34] drm/xe: Remove useless mem_access on PAT dumps Rodrigo Vivi
2024-02-05 13:34   ` Matthew Auld
2024-01-26 20:30 ` [RFC 31/34] drm/xe: Remove inner mem_access protections Rodrigo Vivi
2024-01-26 20:30 ` [RFC 32/34] drm/xe: Kill xe_device_mem_access_{get*,put} Rodrigo Vivi
2024-01-26 20:30 ` [RFC 33/34] drm/xe: Remove unused runtime pm helper Rodrigo Vivi
2024-01-26 20:30 ` [RFC 34/34] drm/xe: Enable D3Cold on 'low' VRAM utilization Rodrigo Vivi
2024-01-29 12:12   ` Matthew Auld
2024-01-29 19:01     ` Vivi, Rodrigo
2024-01-30 15:01       ` Gupta, Anshuman
2024-01-26 20:39 ` ✓ CI.Patch_applied: success for Kill mem_access v2 Patchwork
2024-01-26 20:40 ` ✗ CI.checkpatch: warning " Patchwork
2024-01-26 20:40 ` ✗ CI.KUnit: failure " Patchwork
