Intel-XE Archive on lore.kernel.org
From: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
To: Lucas De Marchi <lucas.demarchi@intel.com>,
	Riana Tauro <riana.tauro@intel.com>
Cc: intel-xe@lists.freedesktop.org, anshuman.gupta@intel.com,
	ashutosh.dixit@intel.com, aravind.iddamsetty@linux.intel.com,
	rodrigo.vivi@intel.com, umesh.nerlige.ramappa@intel.com,
	krishnaiah.bommu@intel.com
Subject: Re: [PATCH v9 2/2] drm/xe/pmu: Enable PMU interface
Date: Fri, 14 Jun 2024 17:38:39 +0100	[thread overview]
Message-ID: <c3467a3f-8124-469a-9cf1-e133b81f9e9f@igalia.com> (raw)
In-Reply-To: <jldfwuyzm6pxms7qt2yqj57bb75lquvn27socbcxrfcevvt2s2@nllraaafjqpq>


On 14/06/2024 17:15, Lucas De Marchi wrote:
> On Thu, Jun 13, 2024 at 03:34:11PM GMT, Riana Tauro wrote:
>> From: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
>>
>> There is a set of engine group busyness counters provided by HW which
>> are a perfect fit to be exposed via PMU perf events.
>>
>> BSPEC: 46559, 46560, 46722, 46729, 52071, 71028
>>
>> Events can be listed using:
>> perf list
>>  xe_0000_03_00.0/any-engine-group-busy-gt0/         [Kernel PMU event]
>>  xe_0000_03_00.0/copy-group-busy-gt0/               [Kernel PMU event]
>>  xe_0000_03_00.0/media-group-busy-gt0/              [Kernel PMU event]
>>  xe_0000_03_00.0/render-group-busy-gt0/             [Kernel PMU event]
>>
>> and can be read using:
>>
>> perf stat -e "xe_0000_8c_00.0/render-group-busy-gt0/" -I 1000
>>           time        counts unit events
>>     1.001139062            0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>     2.003294678            0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>     3.005199582            0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>     4.007076497            0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>     5.008553068            0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>     6.010531563        43520 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>     7.012468029        44800 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>     8.013463515            0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>     9.015300183            0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>    10.017233010            0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>    10.971934120            0 ns  xe_0000_8c_00.0/render-group-busy-gt0/
>>
>> The PMU base implementation is taken from i915.
>>
>> v2:
>> Store the last known value while the device is awake, return that while
>> the GT is suspended, and update the driver copy when read while awake.
>>
>> v3:
>> 1. drop init_samples, as storing counters before going to suspend should
>> be sufficient.
>> 2. ported the "drm/i915/pmu: Make PMU sample array two-dimensional" and
>> dropped helpers to store and read samples.
>> 3. use xe_device_mem_access_get_if_ongoing to check if device is active
>> before reading the OA registers.
>> 4. dropped format attr as no longer needed
>> 5. introduce xe_pmu_suspend to call engine_group_busyness_store
>> 6. few other nits.
>>
>> v4: minor nits.
>>
>> v5: take forcewake when accessing the OAG registers
>>
>> v6:
>> 1. drop engine_busyness_sample_type
>> 2. update UAPI documentation
>>
>> v7:
>> 1. update UAPI documentation
>> 2. drop MEDIA_GT specific change for media busyness counter.
>>
>> v8:
>> 1. rebase
>> 2. replace mem_access_if_ongoing with xe_pm_runtime_get_if_active
>> 3. remove interrupts pmu event
>>
>> v9: replace drmm_add_action_or_reset with devm (Matthew Auld)
>>
>> Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Co-developed-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
>> Signed-off-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
>> Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
> 
> The first S-o-b should match the author of the patch. According to the
> "From" above, the author is set to Aravind Iddamsetty
> <aravind.iddamsetty@linux.intel.com>
> 
> I will leave some nits about the implementation and focus on the main
> concept being added here.
> 
> ...
> 
>> diff --git a/drivers/gpu/drm/xe/xe_pmu.c b/drivers/gpu/drm/xe/xe_pmu.c
>> new file mode 100644
>> index 000000000000..64960a358af2
>> --- /dev/null
>> +++ b/drivers/gpu/drm/xe/xe_pmu.c
>> @@ -0,0 +1,631 @@
>> +// SPDX-License-Identifier: MIT
>> +/*
>> + * Copyright © 2024 Intel Corporation
>> + */
>> +
>> +#include <drm/drm_drv.h>
>> +#include <drm/drm_managed.h>
>> +#include <drm/xe_drm.h>
>> +
>> +#include "regs/xe_gt_regs.h"
>> +#include "xe_device.h"
>> +#include "xe_force_wake.h"
>> +#include "xe_gt_clock.h"
>> +#include "xe_mmio.h"
>> +#include "xe_macros.h"
>> +#include "xe_pm.h"
>> +
>> +static cpumask_t xe_pmu_cpumask;
>> +static unsigned int xe_pmu_target_cpu = -1;
>> +
>> +static unsigned int config_gt_id(const u64 config)
>> +{
>> +    return config >> __XE_PMU_GT_SHIFT;
>> +}
>> +
>> +static u64 config_counter(const u64 config)
>> +{
>> +    return config & ~(~0ULL << __XE_PMU_GT_SHIFT);
>> +}
>> +
>> +static void xe_pmu_event_destroy(struct perf_event *event)
>> +{
>> +    struct xe_device *xe =
>> +        container_of(event->pmu, typeof(*xe), pmu.base);
>> +
>> +    drm_WARN_ON(&xe->drm, event->parent);
>> +
>> +    drm_dev_put(&xe->drm);
>> +}
>> +
>> +static u64 __engine_group_busyness_read(struct xe_gt *gt, int sample_type)
>> +{
>> +    u64 val;
>> +
>> +    switch (sample_type) {
>> +    case __XE_SAMPLE_RENDER_GROUP_BUSY:
>> +        val = xe_mmio_read32(gt, XE_OAG_RENDER_BUSY_FREE);
>> +        break;
>> +    case __XE_SAMPLE_COPY_GROUP_BUSY:
>> +        val = xe_mmio_read32(gt, XE_OAG_BLT_BUSY_FREE);
>> +        break;
>> +    case __XE_SAMPLE_MEDIA_GROUP_BUSY:
>> +        val = xe_mmio_read32(gt, XE_OAG_ANY_MEDIA_FF_BUSY_FREE);
>> +        break;
>> +    case __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY:
>> +        val = xe_mmio_read32(gt, XE_OAG_RC0_ANY_ENGINE_BUSY_FREE);
>> +        break;
>> +    default:
>> +        drm_warn(&gt->tile->xe->drm, "unknown pmu event\n");
>> +    }
>> +
>> +    return xe_gt_clock_cycles_to_ns(gt, val * 16);
>> +}
>> +
>> +static u64 engine_group_busyness_read(struct xe_gt *gt, u64 config)
>> +{
>> +    int sample_type = config_counter(config);
>> +    const unsigned int gt_id = gt->info.id;
>> +    struct xe_device *xe = gt->tile->xe;
>> +    struct xe_pmu *pmu = &xe->pmu;
>> +    unsigned long flags;
>> +    bool device_awake;
>> +    u64 val;
>> +
>> +    device_awake = xe_pm_runtime_get_if_active(xe);
>> +    if (device_awake) {
>> +        XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT));
>> +        val = __engine_group_busyness_read(gt, sample_type);
>> +        XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT));
>> +        xe_pm_runtime_put(xe);
>> +    }
>> +
>> +    spin_lock_irqsave(&pmu->lock, flags);
>> +
>> +    if (device_awake)
>> +        pmu->sample[gt_id][sample_type] = val;
>> +    else
>> +        val = pmu->sample[gt_id][sample_type];
>> +
>> +    spin_unlock_irqrestore(&pmu->lock, flags);
>> +
>> +    return val;
>> +}
>> +
>> +static void engine_group_busyness_store(struct xe_gt *gt)
>> +{
>> +    struct xe_pmu *pmu = &gt->tile->xe->pmu;
>> +    unsigned int gt_id = gt->info.id;
>> +    unsigned long flags;
>> +    int i;
>> +
>> +    spin_lock_irqsave(&pmu->lock, flags);
>> +
>> +    for (i = __XE_SAMPLE_RENDER_GROUP_BUSY; i <= __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY; i++)
>> +        pmu->sample[gt_id][i] = __engine_group_busyness_read(gt, i);
>> +
>> +    spin_unlock_irqrestore(&pmu->lock, flags);
>> +}
>> +
>> +static int
>> +config_status(struct xe_device *xe, u64 config)
>> +{
>> +    unsigned int gt_id = config_gt_id(config);
>> +    struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
>> +
>> +    if (gt_id >= XE_PMU_MAX_GT)
>> +        return -ENOENT;
>> +
>> +    switch (config_counter(config)) {
>> +    case XE_PMU_RENDER_GROUP_BUSY(0):
>> +    case XE_PMU_COPY_GROUP_BUSY(0):
>> +    case XE_PMU_ANY_ENGINE_GROUP_BUSY(0):
>> +        if (gt->info.type == XE_GT_TYPE_MEDIA)
>> +            return -ENOENT;
>> +        break;
>> +    case XE_PMU_MEDIA_GROUP_BUSY(0):
>> +        if (!(gt->info.engine_mask & (BIT(XE_HW_ENGINE_VCS0) | BIT(XE_HW_ENGINE_VECS0))))
>> +            return -ENOENT;
>> +        break;
>> +    default:
>> +        return -ENOENT;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int xe_pmu_event_init(struct perf_event *event)
>> +{
>> +    struct xe_device *xe =
>> +        container_of(event->pmu, typeof(*xe), pmu.base);
>> +    struct xe_pmu *pmu = &xe->pmu;
>> +    int ret;
>> +
>> +    if (pmu->closed)
>> +        return -ENODEV;
>> +
>> +    if (event->attr.type != event->pmu->type)
>> +        return -ENOENT;
>> +
>> +    /* unsupported modes and filters */
>> +    if (event->attr.sample_period) /* no sampling */
>> +        return -EINVAL;
>> +
>> +    if (has_branch_stack(event))
>> +        return -EOPNOTSUPP;
>> +
>> +    if (event->cpu < 0)
>> +        return -EINVAL;
>> +
>> +    /* only allow running on one cpu at a time */
>> +    if (!cpumask_test_cpu(event->cpu, &xe_pmu_cpumask))
>> +        return -EINVAL;
>> +
>> +    ret = config_status(xe, event->attr.config);
>> +    if (ret)
>> +        return ret;
>> +
>> +    if (!event->parent) {
>> +        drm_dev_get(&xe->drm);
>> +        event->destroy = xe_pmu_event_destroy;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static u64 __xe_pmu_event_read(struct perf_event *event)
>> +{
>> +    struct xe_device *xe =
>> +        container_of(event->pmu, typeof(*xe), pmu.base);
>> +    const unsigned int gt_id = config_gt_id(event->attr.config);
>> +    const u64 config = event->attr.config;
>> +    struct xe_gt *gt = xe_device_get_gt(xe, gt_id);
>> +    u64 val;
>> +
>> +    switch (config_counter(config)) {
>> +    case XE_PMU_RENDER_GROUP_BUSY(0):
>> +    case XE_PMU_COPY_GROUP_BUSY(0):
>> +    case XE_PMU_ANY_ENGINE_GROUP_BUSY(0):
>> +    case XE_PMU_MEDIA_GROUP_BUSY(0):
>> +        val = engine_group_busyness_read(gt, config);
>> +        break;
>> +    default:
>> +        drm_warn(&gt->tile->xe->drm, "unknown pmu event\n");
>> +    }
>> +
>> +    return val;
>> +}
>> +
>> +static void xe_pmu_event_read(struct perf_event *event)
>> +{
>> +    struct xe_device *xe =
>> +        container_of(event->pmu, typeof(*xe), pmu.base);
>> +    struct hw_perf_event *hwc = &event->hw;
>> +    struct xe_pmu *pmu = &xe->pmu;
>> +    u64 prev, new;
>> +
>> +    if (pmu->closed) {
>> +        event->hw.state = PERF_HES_STOPPED;
>> +        return;
>> +    }
>> +again:
>> +    prev = local64_read(&hwc->prev_count);
>> +    new = __xe_pmu_event_read(event);
> 
> 
> so... when we enable a perf counter with the example in the cover
> letter:
> 
>      perf stat -e "xe_0000_8c_00.0/render-group-busy-gt0/" -I 1000
> 
> then we will have the following call chain:
> 
> xe_pmu_event_read()
>    __xe_pmu_event_read()
>      engine_group_busyness_read()
>        __engine_group_busyness_read()
>          xe_mmio_read32()
> 
> This happens at a frequency of up to sysctl kernel.perf_event_max_sample_rate
> (~50 kHz in my distro). The event itself is recorded to the ring buffer if it
> changed.

No, only once a second in this example. I forget the exact terminology... 
sampling is for per-task PMU drivers, which neither i915 nor this one is.
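
For reference, reading such a counter from userspace in counting mode boils
down to something like the sketch below. The PMU type and event config are
placeholders; in reality they would be read from sysfs under
/sys/bus/event_source/devices/<pmu>/, and the PMU's cpumask file tells you
which CPU to open the event on (0 here for simplicity):

/*
 * Minimal sketch: open a counting (non-sampling) event on an uncore-style
 * PMU and read it once.  PMU_TYPE and EVENT_CONFIG are hypothetical values.
 */
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define PMU_TYPE	42	/* hypothetical: dynamic PMU type from sysfs */
#define EVENT_CONFIG	0x1ULL	/* hypothetical: event encoding from sysfs */

int main(void)
{
	struct perf_event_attr attr;
	uint64_t count;
	int fd;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PMU_TYPE;
	attr.config = EVENT_CONFIG;
	/* No sample_period: a plain counter, read on demand. */

	/* Uncore-style PMU: per-cpu, not per-task, so pid = -1 and cpu = 0. */
	fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
	if (fd < 0) {
		perror("perf_event_open");
		return 1;
	}

	sleep(1);
	if (read(fd, &count, sizeof(count)) == (ssize_t)sizeof(count))
		printf("busy: %" PRIu64 " ns\n", count);

	close(fd);
	return 0;
}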

> The HW interface we are using is to simply read XE_OAG_*_BUSY_FREE.
> At the same time we are trying to add a new uapi called xe_perf (which has
> nothing to do with perf, sigh) that exposes OA as streams via ioctl.
> 
> Also, we already do per-client engine utilization, and with the usual very
> few clients, why couldn't userspace just aggregate the values per engine?
> Is it really useful to expose this HW counter to userspace? From a quick
> look, the global (per-engine-class) counter doesn't seem that much more
> accurate. The bad, non-performant part of the per-client side, IMO, is the
> client discovery via /proc.

Once upon a time we were asking why we need these new group busyness 
thingies when we already have per-engine counters, but the semantics are 
different and people (L0) were very insistent :shrug:

IMHO group busyness is weak and next to useless for end users and 
intel_gpu_top-like tools. But it is not a discussion for me to have.

Aggregating fdinfo stats would IMO give better semantics, but it would be 
wasteful and would not show in-kernel work such as clearing and migrations 
via the blitter.
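
Very roughly, "aggregating fdinfo stats" would mean something like the sketch
below, assuming the standard drm-engine-<class> fdinfo keys. A real tool would
also de-duplicate shared DRM files via drm-client-id and filter on the right
device, which is part of why it is wasteful:

/*
 * Rough sketch: walk /proc, read every fdinfo file and sum the render class
 * busyness.  Error handling and de-duplication intentionally omitted.
 */
#include <dirent.h>
#include <stdio.h>

int main(void)
{
	unsigned long long render_ns = 0, v;
	struct dirent *pid_de, *fd_de;
	char path[512], line[256];
	DIR *proc, *fdinfo;
	FILE *f;

	proc = opendir("/proc");
	if (!proc)
		return 1;

	while ((pid_de = readdir(proc))) {
		if (pid_de->d_name[0] < '0' || pid_de->d_name[0] > '9')
			continue;

		snprintf(path, sizeof(path), "/proc/%s/fdinfo", pid_de->d_name);
		fdinfo = opendir(path);
		if (!fdinfo)
			continue;

		while ((fd_de = readdir(fdinfo))) {
			if (fd_de->d_name[0] < '0' || fd_de->d_name[0] > '9')
				continue;

			snprintf(path, sizeof(path), "/proc/%s/fdinfo/%s",
				 pid_de->d_name, fd_de->d_name);
			f = fopen(path, "r");
			if (!f)
				continue;

			/* Sum the render class; other classes work the same. */
			while (fgets(line, sizeof(line), f))
				if (sscanf(line, "drm-engine-render: %llu", &v) == 1)
					render_ns += v;

			fclose(f);
		}
		closedir(fdinfo);
	}
	closedir(proc);

	printf("total render busy: %llu ns\n", render_ns);
	return 0;
}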

> It is also not clear to me why we need the per-cpu part if we are collecting
> GPU counters. It's been more than 10 years since I last played with the
> implementation side of perf counters, so I need a refresher while looking
> at this.

Per CPU... you mean the CPU hotplug handling? Perf core mandates it.
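
For context, the pattern perf core expects from uncore-style PMUs is roughly
the sketch below (modelled on what i915 does; the example_* names are made
up): events run on one designated CPU, and when that CPU goes offline the
perf context is migrated to a survivor.

#include <linux/cpuhotplug.h>
#include <linux/cpumask.h>
#include <linux/list.h>
#include <linux/perf_event.h>

struct example_pmu {
	struct pmu base;
	struct hlist_node cpuhp_node;
	enum cpuhp_state cpuhp_slot;
};

static cpumask_t example_pmu_cpumask;

static int example_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
{
	/* First CPU to come online becomes the designated counting CPU. */
	if (cpumask_empty(&example_pmu_cpumask))
		cpumask_set_cpu(cpu, &example_pmu_cpumask);

	return 0;
}

static int example_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
{
	struct example_pmu *pmu =
		hlist_entry_safe(node, struct example_pmu, cpuhp_node);
	unsigned int target;

	if (!cpumask_test_and_clear_cpu(cpu, &example_pmu_cpumask))
		return 0;

	/* The counting CPU went away: pick a survivor and migrate events. */
	target = cpumask_any_but(cpu_online_mask, cpu);
	if (target < nr_cpu_ids) {
		cpumask_set_cpu(target, &example_pmu_cpumask);
		perf_pmu_migrate_context(&pmu->base, cpu, target);
	}

	return 0;
}

static int example_pmu_register_cpuhp(struct example_pmu *pmu)
{
	int slot;

	slot = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
				       "example/pmu:online",
				       example_pmu_cpu_online,
				       example_pmu_cpu_offline);
	if (slot < 0)
		return slot;

	pmu->cpuhp_slot = slot;
	return cpuhp_state_add_instance(pmu->cpuhp_slot, &pmu->cpuhp_node);
}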

> FYI I started to look at this series because of the problem reported in
> i915 wrt perf counters while unbinding the device:
> 
> https://lore.kernel.org/lkml/20240115170120.662220-1-tvrtko.ursulin@linux.intel.com/T/#me72abfa2771e6fc94b167ce47efdbf391cc313ab
> 
> and
> 
> https://lore.kernel.org/all/20240213180302.47266-1-umesh.nerlige.ramappa@intel.com/
> 
> Tvrtko, did you get any additional feedback from the perf/core side for
> that series?

Good that you found it in the archives, otherwise I would be thinking I had 
sent it to /dev/null. Does that answer your question? :)

IMO the best solution is to fix perf core for hot unbind and keep i915-like 
counters. The other options are all flawed and weak.

Regards,

Tvrtko

Thread overview: 32+ messages
2024-06-13 10:04 [PATCH v9 0/2] drm/xe/pmu: Enable PMU interface Riana Tauro
2024-06-13  9:57 ` ✓ CI.Patch_applied: success for drm/xe/pmu: Enable PMU interface (rev9) Patchwork
2024-06-13  9:57 ` ✗ CI.checkpatch: warning " Patchwork
2024-06-13  9:59 ` ✓ CI.KUnit: success " Patchwork
2024-06-13 10:04 ` [PATCH v9 1/2] drm/xe: Get GT clock to nanosecs Riana Tauro
2024-06-14 14:57   ` Lucas De Marchi
2024-06-13 10:04 ` [PATCH v9 2/2] drm/xe/pmu: Enable PMU interface Riana Tauro
2024-06-14 16:15   ` Lucas De Marchi
2024-06-14 16:38     ` Tvrtko Ursulin [this message]
2024-06-14 20:54   ` Ghimiray, Himal Prasad
2024-06-27  5:21     ` Riana Tauro
2024-06-20 19:52   ` Umesh Nerlige Ramappa
2024-06-27  6:49     ` Aravind Iddamsetty
2024-06-27 16:05       ` Umesh Nerlige Ramappa
2024-06-28  9:41         ` Aravind Iddamsetty
2024-06-28 16:36           ` Umesh Nerlige Ramappa
2024-06-28 15:55     ` Lucas De Marchi
2024-06-28 16:52       ` Umesh Nerlige Ramappa
2024-06-28 18:24         ` Lucas De Marchi
2024-06-28 18:49           ` Umesh Nerlige Ramappa
2024-06-13 10:11 ` ✓ CI.Build: success for drm/xe/pmu: Enable PMU interface (rev9) Patchwork
2024-06-13 10:13 ` ✗ CI.Hooks: failure " Patchwork
2024-06-13 10:14 ` ✓ CI.checksparse: success " Patchwork
2024-06-13 11:48 ` ✓ CI.BAT: " Patchwork
2024-06-13 18:39 ` [PATCH v9 0/2] drm/xe/pmu: Enable PMU interface Dixit, Ashutosh
2024-06-14  5:50   ` Riana Tauro
2024-06-14 15:34     ` Dixit, Ashutosh
2024-06-20 19:15       ` Umesh Nerlige Ramappa
2024-06-21  4:29         ` Aravind Iddamsetty
2024-06-21 18:32           ` Umesh Nerlige Ramappa
2024-06-24 10:27             ` Aravind Iddamsetty
2024-06-24 23:23               ` Umesh Nerlige Ramappa
