Intel-XE Archive on lore.kernel.org
* [Intel-xe] [PATCH v7 0/3] drm/xe/pmu: Enable PMU interface
@ 2023-09-14  6:13 Aravind Iddamsetty
  2023-09-14  6:13 ` [Intel-xe] [PATCH v3 1/3] drm/xe: Get GT clock to nanosecs Aravind Iddamsetty
                   ` (9 more replies)
  0 siblings, 10 replies; 17+ messages in thread
From: Aravind Iddamsetty @ 2023-09-14  6:13 UTC
  To: intel-xe; +Cc: rodrigo.vivi

The HW provides a set of engine group busyness counters which are a
perfect fit to be exposed via PMU perf events.

BSPEC: 46559, 46560, 46722, 46729, 52071, 71028

Events can be listed using:
perf list
  xe_0000_03_00.0/any-engine-group-busy-gt0/         [Kernel PMU event]
  xe_0000_03_00.0/copy-group-busy-gt0/               [Kernel PMU event]
  xe_0000_03_00.0/interrupts/                        [Kernel PMU event]
  xe_0000_03_00.0/media-group-busy-gt0/              [Kernel PMU event]
  xe_0000_03_00.0/render-group-busy-gt0/             [Kernel PMU event]

and can be read using:

perf stat -e "xe_0000_8c_00.0/render-group-busy-gt0/" -I 1000
           time             counts unit events
     1.001139062                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     2.003294678                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     3.005199582                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     4.007076497                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     5.008553068                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     6.010531563              43520 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     7.012468029              44800 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     8.013463515                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     9.015300183                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
    10.017233010                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
    10.971934120                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
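
As a rough illustration of how to interpret the output above: with -I 1000
each value in the counts column is the busy time in nanoseconds accumulated
during that interval, so dividing it by the interval length gives a
utilisation figure. A minimal sketch in C, where busy_ppm is a hypothetical
helper (not part of this series) and the timestamps are the time column
converted to nanoseconds:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of this series: express one perf stat
 * sample as busy parts-per-million of the sampling interval.
 *
 * busy_ns:   value from the "counts" column (unit ns)
 * t_prev_ns: timestamp of the previous sample, in ns
 * t_cur_ns:  timestamp of this sample, in ns
 *
 * Returns 0 if the interval is empty or reversed. The multiplication
 * can overflow for busy_ns above roughly 1.8e13, which is far beyond
 * what a one second interval can produce.
 */
static inline uint64_t busy_ppm(uint64_t busy_ns, uint64_t t_prev_ns,
				uint64_t t_cur_ns)
{
	if (t_cur_ns <= t_prev_ns)
		return 0;

	return busy_ns * 1000000ULL / (t_cur_ns - t_prev_ns);
}
```

For the sample at 6.010531563 above, busy_ppm(43520, 5008553068, 6010531563)
gives 43, i.e. the render group was busy for about 0.004% of that interval.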

The PMU base implementation is taken from i915.
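
The event prefix in the listings above (for example xe_0000_03_00.0) looks
like the device's PCI address with the colons replaced by underscores, which
is what i915 does for its PMU name because perf treats ':' as special. The
sketch below assumes xe follows the same convention with an "xe_" prefix;
xe_pmu_name is a hypothetical userspace helper, not a function from this
series:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical userspace helper: build the perf PMU name for a device
 * from its PCI address, assuming the "xe_" prefix seen in the perf list
 * output and the i915 convention of replacing ':' with '_'.
 *
 * Returns 0 on success, -1 if the result does not fit in buf.
 */
static inline int xe_pmu_name(const char *pci_addr, char *buf, size_t len)
{
	int n = snprintf(buf, len, "xe_%s", pci_addr);

	if (n < 0 || (size_t)n >= len)
		return -1;

	/* perf cannot parse ':' inside a PMU name */
	for (char *p = buf; *p; p++)
		if (*p == ':')
			*p = '_';

	return 0;
}
```

Under that assumption, the PCI address 0000:03:00.0 maps to xe_0000_03_00.0,
which can then be combined with an event name such as render-group-busy-gt0
for perf stat -e.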

v7:
1. update UAPI documentation
2. drop MEDIA_GT specific change for media busyness counter.

v6:
1. drop engine_busyness_sample_type
2. update UAPI documentation

v5:
1. Use spinlock in forcewake instead of mutex
2. take forcewake when accessing the OAG registers

v4: minor nits.

v3:
1. drop init_samples, as storing counters before going to suspend should
be sufficient.
2. ported the "drm/i915/pmu: Make PMU sample array two-dimensional" and
dropped helpers to store and read samples.
3. use xe_device_mem_access_get_if_ongoing to check if device is active
before reading the OA registers.
4. dropped format attr as no longer needed
5. introduce xe_pmu_suspend to call engine_group_busyness_store
6. few other nits.

v2:
Store the last known value while the device is awake, return it while the
GT is suspended, and update the driver copy when read while awake.
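
The v2 behaviour can be sketched roughly as below. This is only an
illustration of the caching pattern, not the driver code: struct busy_sample
and busy_read are hypothetical names, and the awake flag stands in for
whatever check the driver uses to decide that the registers can be read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-counter state. */
struct busy_sample {
	uint64_t last;	/* last value read while the device was awake */
};

/*
 * Return the counter value to report to perf.
 *
 * awake:  whether the device can be accessed right now
 * hw_val: current hardware counter value; only meaningful when awake
 */
static inline uint64_t busy_read(struct busy_sample *s, bool awake,
				 uint64_t hw_val)
{
	if (awake)
		s->last = hw_val;	/* refresh the driver copy */

	return s->last;		/* suspended: report the stored value */
}
```

With this shape a read during suspend never touches the hardware and the
reported value does not go backwards, as long as the hardware counter itself
keeps its value across the suspend.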

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Aravind Iddamsetty (3):
  drm/xe: Get GT clock to nanosecs
  drm/xe: Use spinlock in forcewake instead of mutex
  drm/xe/pmu: Enable PMU interface

 drivers/gpu/drm/xe/Makefile              |   2 +
 drivers/gpu/drm/xe/regs/xe_gt_regs.h     |   5 +
 drivers/gpu/drm/xe/xe_device.c           |   2 +
 drivers/gpu/drm/xe/xe_device_types.h     |   4 +
 drivers/gpu/drm/xe/xe_force_wake.c       |  14 +-
 drivers/gpu/drm/xe/xe_force_wake_types.h |   2 +-
 drivers/gpu/drm/xe/xe_gt.c               |   2 +
 drivers/gpu/drm/xe/xe_gt_clock.c         |   5 +
 drivers/gpu/drm/xe/xe_gt_clock.h         |   4 +-
 drivers/gpu/drm/xe/xe_irq.c              |  18 +
 drivers/gpu/drm/xe/xe_module.c           |   5 +
 drivers/gpu/drm/xe/xe_pmu.c              | 654 +++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_pmu.h              |  25 +
 drivers/gpu/drm/xe/xe_pmu_types.h        |  76 +++
 include/uapi/drm/xe_drm.h                |  38 ++
 15 files changed, 847 insertions(+), 9 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
 create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
 create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h

-- 
2.25.1


* [Intel-xe] [PATCH v6 0/3] drm/xe/pmu: Enable PMU interface
@ 2023-09-01  7:06 Aravind Iddamsetty
  2023-09-01  7:06 ` [Intel-xe] [PATCH 2/3] drm/xe: Use spinlock in forcewake instead of mutex Aravind Iddamsetty
  0 siblings, 1 reply; 17+ messages in thread
From: Aravind Iddamsetty @ 2023-09-01  7:06 UTC
  To: intel-xe; +Cc: rodrigo.vivi

The HW provides a set of engine group busyness counters which are a
perfect fit to be exposed via PMU perf events.

BSPEC: 46559, 46560, 46722, 46729, 52071, 71028

Events can be listed using:
perf list
  xe_0000_03_00.0/any-engine-group-busy-gt0/         [Kernel PMU event]
  xe_0000_03_00.0/copy-group-busy-gt0/               [Kernel PMU event]
  xe_0000_03_00.0/interrupts/                        [Kernel PMU event]
  xe_0000_03_00.0/media-group-busy-gt0/              [Kernel PMU event]
  xe_0000_03_00.0/render-group-busy-gt0/             [Kernel PMU event]

and can be read using:

perf stat -e "xe_0000_8c_00.0/render-group-busy-gt0/" -I 1000
           time             counts unit events
     1.001139062                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     2.003294678                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     3.005199582                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     4.007076497                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     5.008553068                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     6.010531563              43520 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     7.012468029              44800 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     8.013463515                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     9.015300183                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
    10.017233010                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
    10.971934120                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/

The PMU base implementation is taken from i915.

v6:
1. drop engine_busyness_sample_type
2. update UAPI documentation

v5:
1. Use spinlock in forcewake instead of mutex
2. take forcewake when accessing the OAG registers

v4: minor nits.

v3:
1. drop init_samples, as storing counters before going to suspend should
be sufficient.
2. ported the "drm/i915/pmu: Make PMU sample array two-dimensional" and
dropped helpers to store and read samples.
3. use xe_device_mem_access_get_if_ongoing to check if device is active
before reading the OA registers.
4. dropped format attr as no longer needed
5. introduce xe_pmu_suspend to call engine_group_busyness_store
6. few other nits.

v2:
Store the last known value while the device is awake, return it while the
GT is suspended, and update the driver copy when read while awake.

Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>

Aravind Iddamsetty (3):
  drm/xe: Get GT clock to nanosecs
  drm/xe: Use spinlock in forcewake instead of mutex
  drm/xe/pmu: Enable PMU interface

 drivers/gpu/drm/xe/Makefile              |   2 +
 drivers/gpu/drm/xe/regs/xe_gt_regs.h     |   5 +
 drivers/gpu/drm/xe/xe_device.c           |   2 +
 drivers/gpu/drm/xe/xe_device_types.h     |   4 +
 drivers/gpu/drm/xe/xe_force_wake.c       |  14 +-
 drivers/gpu/drm/xe/xe_force_wake_types.h |   2 +-
 drivers/gpu/drm/xe/xe_gt.c               |   2 +
 drivers/gpu/drm/xe/xe_gt_clock.c         |   5 +
 drivers/gpu/drm/xe/xe_gt_clock.h         |   4 +-
 drivers/gpu/drm/xe/xe_irq.c              |  18 +
 drivers/gpu/drm/xe/xe_module.c           |   5 +
 drivers/gpu/drm/xe/xe_pmu.c              | 661 +++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_pmu.h              |  25 +
 drivers/gpu/drm/xe/xe_pmu_types.h        |  76 +++
 include/uapi/drm/xe_drm.h                |  39 ++
 15 files changed, 855 insertions(+), 9 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
 create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
 create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h

-- 
2.25.1


* [Intel-xe] [PATCH v5 0/3] drm/xe/pmu: Enable PMU interface
@ 2023-08-30  5:15 Aravind Iddamsetty
  2023-08-30  5:15 ` [Intel-xe] [PATCH 2/3] drm/xe: Use spinlock in forcewake instead of mutex Aravind Iddamsetty
  0 siblings, 1 reply; 17+ messages in thread
From: Aravind Iddamsetty @ 2023-08-30  5:15 UTC
  To: intel-xe

The HW provides a set of engine group busyness counters which are a
perfect fit to be exposed via PMU perf events.

BSPEC: 46559, 46560, 46722, 46729, 52071, 71028

Events can be listed using:
perf list
  xe_0000_03_00.0/any-engine-group-busy-gt0/         [Kernel PMU event]
  xe_0000_03_00.0/copy-group-busy-gt0/               [Kernel PMU event]
  xe_0000_03_00.0/interrupts/                        [Kernel PMU event]
  xe_0000_03_00.0/media-group-busy-gt0/              [Kernel PMU event]
  xe_0000_03_00.0/render-group-busy-gt0/             [Kernel PMU event]

and can be read using:

perf stat -e "xe_0000_8c_00.0/render-group-busy-gt0/" -I 1000
           time             counts unit events
     1.001139062                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     2.003294678                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     3.005199582                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     4.007076497                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     5.008553068                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     6.010531563              43520 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     7.012468029              44800 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     8.013463515                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
     9.015300183                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
    10.017233010                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
    10.971934120                  0 ns  xe_0000_8c_00.0/render-group-busy-gt0/

The PMU base implementation is taken from i915.

v5:
1. Use spinlock in forcewake instead of mutex
2. take forcewake when accessing the OAG registers

v4: minor nits.

v3:
1. drop init_samples, as storing counters before going to suspend should
be sufficient.
2. ported the "drm/i915/pmu: Make PMU sample array two-dimensional" and
dropped helpers to store and read samples.
3. use xe_device_mem_access_get_if_ongoing to check if device is active
before reading the OA registers.
4. dropped format attr as no longer needed
5. introduce xe_pmu_suspend to call engine_group_busyness_store
6. few other nits.

v2:
Store the last known value while the device is awake, return it while the
GT is suspended, and update the driver copy when read while awake.

Aravind Iddamsetty (3):
  drm/xe: Get GT clock to nanosecs
  drm/xe: Use spinlock in forcewake instead of mutex
  drm/xe/pmu: Enable PMU interface

 drivers/gpu/drm/xe/Makefile              |   2 +
 drivers/gpu/drm/xe/regs/xe_gt_regs.h     |   5 +
 drivers/gpu/drm/xe/xe_device.c           |   2 +
 drivers/gpu/drm/xe/xe_device_types.h     |   4 +
 drivers/gpu/drm/xe/xe_force_wake.c       |  14 +-
 drivers/gpu/drm/xe/xe_force_wake_types.h |   2 +-
 drivers/gpu/drm/xe/xe_gt.c               |   2 +
 drivers/gpu/drm/xe/xe_gt_clock.c         |   5 +
 drivers/gpu/drm/xe/xe_gt_clock.h         |   4 +-
 drivers/gpu/drm/xe/xe_irq.c              |  18 +
 drivers/gpu/drm/xe/xe_module.c           |   5 +
 drivers/gpu/drm/xe/xe_pmu.c              | 679 +++++++++++++++++++++++
 drivers/gpu/drm/xe/xe_pmu.h              |  25 +
 drivers/gpu/drm/xe/xe_pmu_types.h        |  76 +++
 include/uapi/drm/xe_drm.h                |  16 +
 15 files changed, 850 insertions(+), 9 deletions(-)
 create mode 100644 drivers/gpu/drm/xe/xe_pmu.c
 create mode 100644 drivers/gpu/drm/xe/xe_pmu.h
 create mode 100644 drivers/gpu/drm/xe/xe_pmu_types.h

-- 
2.25.1


Thread overview: 17+ messages
2023-09-14  6:13 [Intel-xe] [PATCH v7 0/3] drm/xe/pmu: Enable PMU interface Aravind Iddamsetty
2023-09-14  6:13 ` [Intel-xe] [PATCH v3 1/3] drm/xe: Get GT clock to nanosecs Aravind Iddamsetty
2023-09-14  6:13 ` [Intel-xe] [PATCH 2/3] drm/xe: Use spinlock in forcewake instead of mutex Aravind Iddamsetty
2023-09-14  6:13 ` [Intel-xe] [PATCH v7 3/3] drm/xe/pmu: Enable PMU interface Aravind Iddamsetty
2023-09-14  6:50 ` [Intel-xe] ✓ CI.Patch_applied: success for drm/xe/pmu: Enable PMU interface (rev7) Patchwork
2023-09-14  6:50 ` [Intel-xe] ✗ CI.checkpatch: warning " Patchwork
2023-09-14  6:51 ` [Intel-xe] ✓ CI.KUnit: success " Patchwork
2023-09-14  6:58 ` [Intel-xe] ✓ CI.Build: " Patchwork
2023-09-14  6:58 ` [Intel-xe] ✗ CI.Hooks: failure " Patchwork
2023-09-14  7:00 ` [Intel-xe] ✓ CI.checksparse: success " Patchwork
2023-09-14  7:31 ` [Intel-xe] ✓ CI.BAT: " Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2023-09-01  7:06 [Intel-xe] [PATCH v6 0/3] drm/xe/pmu: Enable PMU interface Aravind Iddamsetty
2023-09-01  7:06 ` [Intel-xe] [PATCH 2/3] drm/xe: Use spinlock in forcewake instead of mutex Aravind Iddamsetty
2023-08-30  5:15 [Intel-xe] [PATCH v5 0/3] drm/xe/pmu: Enable PMU interface Aravind Iddamsetty
2023-08-30  5:15 ` [Intel-xe] [PATCH 2/3] drm/xe: Use spinlock in forcewake instead of mutex Aravind Iddamsetty
2023-08-30  5:33   ` Dixit, Ashutosh
2023-08-30 20:56     ` Rodrigo Vivi
2023-08-30 22:19       ` Dixit, Ashutosh
2023-08-31  4:13         ` Aravind Iddamsetty
