From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 71C41CA0FF6 for ; Sat, 2 Sep 2023 05:01:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E90AA10E03E; Sat, 2 Sep 2023 05:01:02 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.93]) by gabe.freedesktop.org (Postfix) with ESMTPS id B125910E038 for ; Sat, 2 Sep 2023 05:01:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1693630860; x=1725166860; h=date:message-id:from:to:cc:subject:in-reply-to: references:mime-version; bh=mAVZpwL4EtaW/nkeqamT1zC+WqCpmGiW04JhPJIiYa4=; b=eM2QgGHxJ+vDJUaRPYc8qA31+Uq4Z91fC23f+6z+CoaSN0+sB3HnUwPb cFZ/Rw0tflsrzXrMq05HPQu2+bvDX2pnL5UD0l6gBXnX4tPD0upiZycxU G3IEJ7ybhoAh1Fl4DVn0hiGa+Psak3o4fuWTkOKFYzYxV7gfMJip7I7nA LtD4Ui+cdQ+Xlf7DB91B8CW7ba92V099sZs9VEBl84zp3D5ejQ2h4PIiY hcJU/K7vhV619ecDuDhbV+rsbBDbSq2Gs0fjZDqn0dpeZ36SzlD6zXKmu chWCdpy03MdEps/i2mQJ45+HJlNWMCJ5wezzC4OC0ccTZsKXpetxeeKrM A==; X-IronPort-AV: E=McAfee;i="6600,9927,10819"; a="373744240" X-IronPort-AV: E=Sophos;i="6.02,221,1688454000"; d="scan'208";a="373744240" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Sep 2023 22:00:59 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10819"; a="733723102" X-IronPort-AV: E=Sophos;i="6.02,221,1688454000"; d="scan'208";a="733723102" Received: from adixit-mobl.amr.corp.intel.com (HELO adixit-arch.intel.com) ([10.209.79.82]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 01 Sep 2023 22:00:59 -0700 Date: Fri, 01 Sep 2023 22:00:25 -0700 Message-ID: <87wmx9yxsm.wl-ashutosh.dixit@intel.com> From: "Dixit, Ashutosh" To: Aravind Iddamsetty In-Reply-To: <20230901070648.1100049-4-aravind.iddamsetty@linux.intel.com> References: <20230901070648.1100049-1-aravind.iddamsetty@linux.intel.com> <20230901070648.1100049-4-aravind.iddamsetty@linux.intel.com> User-Agent: Wanderlust/2.15.9 (Almost Unreal) SEMI-EPG/1.14.7 (Harue) FLIM-LB/1.14.9 (=?ISO-8859-4?Q?Goj=F2?=) APEL-LB/10.8 EasyPG/1.0.0 Emacs/29.1 (x86_64-pc-linux-gnu) MULE/6.0 (HANACHIRUSATO) MIME-Version: 1.0 (generated by SEMI-EPG 1.14.7 - "Harue") Content-Type: text/plain; charset=US-ASCII Subject: Re: [Intel-xe] [PATCH v6 3/3] drm/xe/pmu: Enable PMU interface X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Bommu Krishnaiah , Tvrtko Ursulin , intel-xe@lists.freedesktop.org, rodrigo.vivi@intel.com Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On Fri, 01 Sep 2023 00:06:48 -0700, Aravind Iddamsetty wrote: > Hi Aravind, > +static u64 __engine_group_busyness_read(struct xe_gt *gt, int sample_type) > +{ > + u64 val; > + > + switch (sample_type) { > + case __XE_SAMPLE_RENDER_GROUP_BUSY: > + val = xe_mmio_read32(gt, XE_OAG_RENDER_BUSY_FREE); > + break; > + case __XE_SAMPLE_COPY_GROUP_BUSY: > + val = xe_mmio_read32(gt, XE_OAG_BLT_BUSY_FREE); > + break; > + case __XE_SAMPLE_MEDIA_GROUP_BUSY: > + /* > + * This GSI offset is not replicated in media specific range > + * BSPEC: 67789, 46559 > + */ > + if (gt->info.type == XE_GT_TYPE_MEDIA) > + val = xe_mmio_read32(gt->tile->primary_gt, XE_OAG_ANY_MEDIA_FF_BUSY_FREE); Hmm, this appeared in this version. Is this really needed? Did you test this on MTL media gt? E.g. if this is needed we'd need to do similar stuff in freq_act_show() when reading MTL_MIRROR_TARGET_WP1, but we don't do this there. So we don't see this done anywhere else in XE. I don't understand this multicast steering but appears XE mmio read/write functions might already be taking care of this via: if (reg.addr < gt->mmio.adj_limit) So could you please check this out and take out this code if not needed. > + else > + val = xe_mmio_read32(gt, XE_OAG_ANY_MEDIA_FF_BUSY_FREE); > + break; > + case __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY: > + val = xe_mmio_read32(gt, XE_OAG_RC0_ANY_ENGINE_BUSY_FREE); > + break; > + default: > + drm_warn(>->tile->xe->drm, "unknown pmu event\n"); > + } > + > + return xe_gt_clock_cycles_to_ns(gt, val * 16); > +} > + > +static u64 engine_group_busyness_read(struct xe_gt *gt, u64 config) > +{ > + int sample_type = config_counter(config) - 1; > + const unsigned int gt_id = gt->info.id; > + struct xe_device *xe = gt->tile->xe; > + struct xe_pmu *pmu = &xe->pmu; > + unsigned long flags; > + bool device_awake; > + u64 val; > + > + device_awake = xe_device_mem_access_get_if_ongoing(xe); > + if (device_awake) { > + XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)); > + val = __engine_group_busyness_read(gt, sample_type); > + XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FW_GT)); > + xe_device_mem_access_put(xe); > + } > + > + spin_lock_irqsave(&pmu->lock, flags); > + > + if (device_awake) > + pmu->sample[gt_id][sample_type] = val; > + else > + val = pmu->sample[gt_id][sample_type]; > + > + spin_unlock_irqrestore(&pmu->lock, flags); > + > + return val; > +} > + > +static void engine_group_busyness_store(struct xe_gt *gt) > +{ > + struct xe_pmu *pmu = >->tile->xe->pmu; > + unsigned int gt_id = gt->info.id; > + unsigned long flags; > + int i; > + > + spin_lock_irqsave(&pmu->lock, flags); > + > + for (i = __XE_SAMPLE_RENDER_GROUP_BUSY; i <= __XE_SAMPLE_ANY_ENGINE_GROUP_BUSY; i++) > + pmu->sample[gt_id][i] = __engine_group_busyness_read(gt, i); Just a reminder about what we discussed about this earlier. Here we are saving all these samples unconditionally, whether or not PMU will ever be used. We have deferred the optimization to only save enabled counters to a later patch. So we might still come back and do that. > +static void xe_pmu_event_read(struct perf_event *event) > +{ > + struct xe_device *xe = > + container_of(event->pmu, typeof(*xe), pmu.base); > + struct hw_perf_event *hwc = &event->hw; > + struct xe_pmu *pmu = &xe->pmu; > + u64 prev, new; > + > + if (pmu->closed) { > + event->hw.state = PERF_HES_STOPPED; > + return; > + } > +again: > + prev = local64_read(&hwc->prev_count); > + new = __xe_pmu_event_read(event); > + > + if (local64_cmpxchg(&hwc->prev_count, prev, new) != prev) > + goto again; > + > + local64_add(new - prev, &event->count); Just curious, in case you know, why do we do this here rather than just a: local64_set(new, &event->count); The code is the same as i915 so is likely correct, so no issue, if you don't know just ignore this comment. No changes needed here. > diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h > index 86f16d50e9cc..8a59350b3cea 100644 > --- a/include/uapi/drm/xe_drm.h > +++ b/include/uapi/drm/xe_drm.h > @@ -1056,6 +1056,45 @@ struct drm_xe_vm_madvise { > __u64 reserved[2]; > }; > > +/* PMU event config IDs */ > + > +/* > + * The below config IDs needs to be filled in the struct perf_even_attr field > + * as part of perf_event_open call to read a particular event. > + * > + * check man perf_event_open. > + * > + * example on how to open an event: > + * .. code-block:: C > + * struct perf_event_attr pe; > + * long long count; > + * int cpu = 0; > + * int fd; > + * > + * memset(&pe, 0, sizeof(struct perf_event_attr)); > + * pe.type = type // eg: /sys/bus/event_source/devices/xe_0000_56_00.0/type > + * pe.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED; > + * pe.use_clockid = 1; > + * pe.clockid = CLOCK_MONOTONIC; > + * pe.config = XE_PMU_INTERRUPTS(0); > + * > + * fd = perf_event_open(&pe, -1, cpu, -1, 0); > + */ Pretty good, let me fix up the formatting, style and some typos. But please check. /** * XE PMU event config ID's * * Check 'man perf_event_open' to use these ID's in 'struct * perf_event_attr' as part of perf_event_open syscall to read particular * event counts. * * For example, to open the XE_PMU_INTERRUPTS(0) event: * * .. code-block:: C * * struct perf_event_attr attr = {}; * long long count; * int cpu = 0; * int fd; * * attr.type = type; /* /sys/bus/event_source/devices/xe_0000_56_00.0/type */ * attr.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED; * attr.use_clockid = 1; * attr.clockid = CLOCK_MONOTONIC; * attr.config = XE_PMU_INTERRUPTS(0); * * fd = syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0); */ Apart from the couple of things above to be fixed up, I don't want to hold onto this review any longer, so this is now: Reviewed-by: Ashutosh Dixit