dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor
@ 2024-09-23 23:06 Adrián Larumbe
  2024-09-23 23:06 ` [PATCH v8 1/5] drm/panthor: introduce job cycle and timestamp accounting Adrián Larumbe
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: Adrián Larumbe @ 2024-09-23 23:06 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König
  Cc: kernel, Adrián Larumbe, dri-devel, linux-kernel, linux-media,
	linaro-mm-sig

This patch series enables userspace utilities like gputop and nvtop to query a
render context's fdinfo file and figure out rates of engine and memory utilisation.

Previous discussion can be found at
https://lore.kernel.org/dri-devel/20240920234436.207563-1-adrian.larumbe@collabora.com/

Changelog:
v8:
 - Fixed uninitialised stack variable bug that was triggering an invalid memory reference.
 - Added a few R-b tags to commits
v7:
 - Fixed some kernel test bot-reported documentation and sign mismatch errors.
 - Defined convenience macros for specifying CS instructions according to their profiled status.
 - Explicitly initialised instruction count for structure containing a job's
 instructions when calculating its amount of credits for the scheduler.
 - Some minor cosmetic nits. 
v6:
 - Addressed some nits and style issues.
 - Enforced static assert equality of instruction buffer when calculating job
 credits or copying them into the ringbuffer.
 - Added explanation to the way in which job credits and profiled job size is done.
 - Broke down fdinfo enablement patch into two, one of them dealing with adding
 support for calculating the current and top operating device frequencies
 - Fixed race at the time drm file-wide profiling stats are gathered from groups.
v5:
 - Moved profiling information slots into a per-queue BO and away from syncobjs.
 - Decide on size of profiling slots BO from size of CS for minimal profiled job
 - Turn job and device profiling flag into a bit mask so that individual metrics
 can be enabled separately.
 - Shrunk ringbuffer slot size to that of a cache line.
 - Track profiling slot indeces separately from the job's queue ringbuffer's
 - Emit CS instructions one by one and tag them depending on profiling mask
 - New helper for calculating job credits depending on profiling flags
 - Add Documentation file for sysfs profiling knob
 - fdinfo will only show engines or cycles tags if these are respectively enabled.
v4:
 - Fixed wrong assignment location for frequency values in Panthor's devfreq
 - Removed the last two commits about registering size of internal BO's
 - Rearranged patch series so that sysfs knob is done last and all the previous
 time sampling and fdinfo show dependencies are already in place
v3:
 - Fixed some nits and removed useless bounds check in panthor_sched.c
 - Added support for sysfs profiling knob and optional job accounting
 - Added new patches for calculating size of internal BO's
v2:
 - Split original first patch in two, one for FW CS cycle and timestamp
 calculations and job accounting memory management, and a second one
 that enables fdinfo.
 - Moved NUM_INSTRS_PER_SLOT to the file prelude
 - Removed nelem variable from the group's struct definition.
 - Precompute size of group's syncobj BO to avoid code duplication.
 - Some minor nits.

Adrián Larumbe (5):
  drm/panthor: introduce job cycle and timestamp accounting
  drm/panthor: record current and maximum device clock frequencies
  drm/panthor: add DRM fdinfo support
  drm/panthor: enable fdinfo for memory stats
  drm/panthor: add sysfs knob for enabling job profiling

 .../testing/sysfs-driver-panthor-profiling    |  10 +
 Documentation/gpu/panthor.rst                 |  46 +++
 drivers/gpu/drm/panthor/panthor_devfreq.c     |  18 +-
 drivers/gpu/drm/panthor/panthor_device.h      |  36 ++
 drivers/gpu/drm/panthor/panthor_drv.c         |  73 ++++
 drivers/gpu/drm/panthor/panthor_gem.c         |  12 +
 drivers/gpu/drm/panthor/panthor_sched.c       | 384 +++++++++++++++---
 drivers/gpu/drm/panthor/panthor_sched.h       |   2 +
 8 files changed, 531 insertions(+), 50 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-panthor-profiling
 create mode 100644 Documentation/gpu/panthor.rst

-- 
2.46.0


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v8 1/5] drm/panthor: introduce job cycle and timestamp accounting
  2024-09-23 23:06 [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Adrián Larumbe
@ 2024-09-23 23:06 ` Adrián Larumbe
  2024-10-02  8:38   ` Boris Brezillon
  2024-09-23 23:06 ` [PATCH v8 2/5] drm/panthor: record current and maximum device clock frequencies Adrián Larumbe
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: Adrián Larumbe @ 2024-09-23 23:06 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König
  Cc: kernel, Adrián Larumbe, dri-devel, linux-kernel, linux-media,
	linaro-mm-sig

Enable calculations of job submission times in clock cycles and wall
time. This is done by expanding the boilerplate command stream when running
a job to include instructions that compute said times right before and
after a user CS.

A separate kernel BO is created per queue to store those values. Jobs can
access their sampled data through an index different from that of the
queue's ringbuffer. The reason for this is saving memory on the profiling
information kernel BO, since the amount of simultaneous profiled jobs we
can write into the queue's ringbuffer might be much smaller than for
regular jobs, as the former take more CSF instructions.

This commit is done in preparation for enabling DRM fdinfo support in the
Panthor driver, which depends on the numbers calculated herein.

A profile mode mask has been added that will in a future commit allow UM to
toggle performance metric sampling behaviour, which is disabled by default
to save power. When a ringbuffer CS is constructed, timestamp and cycling
sampling instructions are added depending on the enabled flags in the
profiling mask.

A helper was provided that calculates the number of instructions for a
given set of enablement mask, and these are passed as the number of credits
when initialising a DRM scheduler job.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
---
 drivers/gpu/drm/panthor/panthor_device.h |  22 ++
 drivers/gpu/drm/panthor/panthor_sched.c  | 328 +++++++++++++++++++----
 2 files changed, 301 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index e388c0472ba7..a48e30d0af30 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -66,6 +66,25 @@ struct panthor_irq {
 	atomic_t suspended;
 };
 
+/**
+ * enum panthor_device_profiling_mode - Profiling state
+ */
+enum panthor_device_profiling_flags {
+	/** @PANTHOR_DEVICE_PROFILING_DISABLED: Profiling is disabled. */
+	PANTHOR_DEVICE_PROFILING_DISABLED = 0,
+
+	/** @PANTHOR_DEVICE_PROFILING_CYCLES: Sampling job cycles. */
+	PANTHOR_DEVICE_PROFILING_CYCLES = BIT(0),
+
+	/** @PANTHOR_DEVICE_PROFILING_TIMESTAMP: Sampling job timestamp. */
+	PANTHOR_DEVICE_PROFILING_TIMESTAMP = BIT(1),
+
+	/** @PANTHOR_DEVICE_PROFILING_ALL: Sampling everything. */
+	PANTHOR_DEVICE_PROFILING_ALL =
+	PANTHOR_DEVICE_PROFILING_CYCLES |
+	PANTHOR_DEVICE_PROFILING_TIMESTAMP,
+};
+
 /**
  * struct panthor_device - Panthor device
  */
@@ -162,6 +181,9 @@ struct panthor_device {
 		 */
 		struct page *dummy_latest_flush;
 	} pm;
+
+	/** @profile_mask: User-set profiling flags for job accounting. */
+	u32 profile_mask;
 };
 
 /**
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 42afdf0ddb7e..4fc9946fd2ac 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -93,6 +93,9 @@
 #define MIN_CSGS				3
 #define MAX_CSG_PRIO				0xf
 
+#define NUM_INSTRS_PER_CACHE_LINE		(64 / sizeof(u64))
+#define MAX_INSTRS_PER_JOB			24
+
 struct panthor_group;
 
 /**
@@ -476,6 +479,18 @@ struct panthor_queue {
 		 */
 		struct list_head in_flight_jobs;
 	} fence_ctx;
+
+	/** @profiling: Job profiling data slots and access information. */
+	struct {
+		/** @slots: Kernel BO holding the slots. */
+		struct panthor_kernel_bo *slots;
+
+		/** @slot_count: Number of jobs ringbuffer can hold at once. */
+		u32 slot_count;
+
+		/** @seqno: Index of the next available profiling information slot. */
+		u32 seqno;
+	} profiling;
 };
 
 /**
@@ -661,6 +676,18 @@ struct panthor_group {
 	struct list_head wait_node;
 };
 
+struct panthor_job_profiling_data {
+	struct {
+		u64 before;
+		u64 after;
+	} cycles;
+
+	struct {
+		u64 before;
+		u64 after;
+	} time;
+};
+
 /**
  * group_queue_work() - Queue a group work
  * @group: Group to queue the work for.
@@ -774,6 +801,15 @@ struct panthor_job {
 
 	/** @done_fence: Fence signaled when the job is finished or cancelled. */
 	struct dma_fence *done_fence;
+
+	/** @profiling: Job profiling information. */
+	struct {
+		/** @mask: Current device job profiling enablement bitmask. */
+		u32 mask;
+
+		/** @slot: Job index in the profiling slots BO. */
+		u32 slot;
+	} profiling;
 };
 
 static void
@@ -838,6 +874,7 @@ static void group_free_queue(struct panthor_group *group, struct panthor_queue *
 
 	panthor_kernel_bo_destroy(queue->ringbuf);
 	panthor_kernel_bo_destroy(queue->iface.mem);
+	panthor_kernel_bo_destroy(queue->profiling.slots);
 
 	/* Release the last_fence we were holding, if any. */
 	dma_fence_put(queue->fence_ctx.last_fence);
@@ -1982,8 +2019,6 @@ tick_ctx_init(struct panthor_scheduler *sched,
 	}
 }
 
-#define NUM_INSTRS_PER_SLOT		16
-
 static void
 group_term_post_processing(struct panthor_group *group)
 {
@@ -2815,65 +2850,192 @@ static void group_sync_upd_work(struct work_struct *work)
 	group_put(group);
 }
 
-static struct dma_fence *
-queue_run_job(struct drm_sched_job *sched_job)
+struct panthor_job_ringbuf_instrs {
+	u64 buffer[MAX_INSTRS_PER_JOB];
+	u32 count;
+};
+
+struct panthor_job_instr {
+	u32 profile_mask;
+	u64 instr;
+};
+
+#define JOB_INSTR(__prof, __instr) \
+	{ \
+		.profile_mask = __prof, \
+		.instr = __instr, \
+	}
+
+static void
+copy_instrs_to_ringbuf(struct panthor_queue *queue,
+		       struct panthor_job *job,
+		       struct panthor_job_ringbuf_instrs *instrs)
+{
+	u64 ringbuf_size = panthor_kernel_bo_size(queue->ringbuf);
+	u64 start = job->ringbuf.start & (ringbuf_size - 1);
+	u64 size, written;
+
+	/*
+	 * We need to write a whole slot, including any trailing zeroes
+	 * that may come at the end of it. Also, because instrs.buffer has
+	 * been zero-initialised, there's no need to pad it with 0's
+	 */
+	instrs->count = ALIGN(instrs->count, NUM_INSTRS_PER_CACHE_LINE);
+	size = instrs->count * sizeof(u64);
+	WARN_ON(size > ringbuf_size);
+	written = min(ringbuf_size - start, size);
+
+	memcpy(queue->ringbuf->kmap + start, instrs->buffer, written);
+
+	if (written < size)
+		memcpy(queue->ringbuf->kmap,
+		       &instrs->buffer[written/sizeof(u64)],
+		       size - written);
+}
+
+struct panthor_job_cs_params {
+	u32 profile_mask;
+	u64 addr_reg; u64 val_reg;
+	u64 cycle_reg; u64 time_reg;
+	u64 sync_addr; u64 times_addr;
+	u64 cs_start; u64 cs_size;
+	u32 last_flush; u32 waitall_mask;
+};
+
+static void
+get_job_cs_params(struct panthor_job *job, struct panthor_job_cs_params *params)
 {
-	struct panthor_job *job = container_of(sched_job, struct panthor_job, base);
 	struct panthor_group *group = job->group;
 	struct panthor_queue *queue = group->queues[job->queue_idx];
 	struct panthor_device *ptdev = group->ptdev;
 	struct panthor_scheduler *sched = ptdev->scheduler;
-	u32 ringbuf_size = panthor_kernel_bo_size(queue->ringbuf);
-	u32 ringbuf_insert = queue->iface.input->insert & (ringbuf_size - 1);
-	u64 addr_reg = ptdev->csif_info.cs_reg_count -
-		       ptdev->csif_info.unpreserved_cs_reg_count;
-	u64 val_reg = addr_reg + 2;
-	u64 sync_addr = panthor_kernel_bo_gpuva(group->syncobjs) +
-			job->queue_idx * sizeof(struct panthor_syncobj_64b);
-	u32 waitall_mask = GENMASK(sched->sb_slot_count - 1, 0);
-	struct dma_fence *done_fence;
-	int ret;
 
-	u64 call_instrs[NUM_INSTRS_PER_SLOT] = {
-		/* MOV32 rX+2, cs.latest_flush */
-		(2ull << 56) | (val_reg << 48) | job->call_info.latest_flush,
+	params->addr_reg = ptdev->csif_info.cs_reg_count -
+			   ptdev->csif_info.unpreserved_cs_reg_count;
+	params->val_reg = params->addr_reg + 2;
+	params->cycle_reg = params->addr_reg;
+	params->time_reg = params->val_reg;
 
-		/* FLUSH_CACHE2.clean_inv_all.no_wait.signal(0) rX+2 */
-		(36ull << 56) | (0ull << 48) | (val_reg << 40) | (0 << 16) | 0x233,
+	params->sync_addr = panthor_kernel_bo_gpuva(group->syncobjs) +
+			    job->queue_idx * sizeof(struct panthor_syncobj_64b);
+	params->times_addr = panthor_kernel_bo_gpuva(queue->profiling.slots) +
+			     (job->profiling.slot * sizeof(struct panthor_job_profiling_data));
+	params->waitall_mask = GENMASK(sched->sb_slot_count - 1, 0);
 
-		/* MOV48 rX:rX+1, cs.start */
-		(1ull << 56) | (addr_reg << 48) | job->call_info.start,
+	params->cs_start = job->call_info.start;
+	params->cs_size = job->call_info.size;
+	params->last_flush = job->call_info.latest_flush;
 
-		/* MOV32 rX+2, cs.size */
-		(2ull << 56) | (val_reg << 48) | job->call_info.size,
+	params->profile_mask = job->profiling.mask;
+}
 
-		/* WAIT(0) => waits for FLUSH_CACHE2 instruction */
-		(3ull << 56) | (1 << 16),
+#define JOB_INSTR_ALWAYS(instr) \
+	JOB_INSTR(PANTHOR_DEVICE_PROFILING_DISABLED, (instr))
+#define JOB_INSTR_TIMESTAMP(instr) \
+	JOB_INSTR(PANTHOR_DEVICE_PROFILING_TIMESTAMP, (instr))
+#define JOB_INSTR_CYCLES(instr) \
+	JOB_INSTR(PANTHOR_DEVICE_PROFILING_CYCLES, (instr))
 
+static void
+prepare_job_instrs(const struct panthor_job_cs_params *params,
+		   struct panthor_job_ringbuf_instrs *instrs)
+{
+	const struct panthor_job_instr instr_seq[] = {
+		/* MOV32 rX+2, cs.latest_flush */
+		JOB_INSTR_ALWAYS((2ull << 56) | (params->val_reg << 48) | params->last_flush),
+		/* FLUSH_CACHE2.clean_inv_all.no_wait.signal(0) rX+2 */
+		JOB_INSTR_ALWAYS((36ull << 56) | (0ull << 48) | (params->val_reg << 40) | (0 << 16) | 0x233),
+		/* MOV48 rX:rX+1, cycles_offset */
+		JOB_INSTR_CYCLES((1ull << 56) | (params->cycle_reg << 48) |
+				 (params->times_addr + offsetof(struct panthor_job_profiling_data, cycles.before))),
+		/* STORE_STATE cycles */
+		JOB_INSTR_CYCLES((40ull << 56) | (params->cycle_reg << 40) | (1ll << 32)),
+		/* MOV48 rX:rX+1, time_offset */
+		JOB_INSTR_TIMESTAMP((1ull << 56) | (params->time_reg << 48) | (params->times_addr +
+			   offsetof(struct panthor_job_profiling_data, time.before))),
+		/* STORE_STATE timer */
+		JOB_INSTR_TIMESTAMP((40ull << 56) | (params->time_reg << 40) | (0ll << 32)),
+		/* MOV48 rX:rX+1, cs.start */
+		JOB_INSTR_ALWAYS((1ull << 56) | (params->addr_reg << 48) | params->cs_start),
+		/* MOV32 rX+2, cs.size */
+		JOB_INSTR_ALWAYS((2ull << 56) | (params->val_reg << 48) | params->cs_size),
+		/* WAIT(0) => waits for FLUSH_CACHE2 instruction */
+		JOB_INSTR_ALWAYS((3ull << 56) | (1 << 16)),
 		/* CALL rX:rX+1, rX+2 */
-		(32ull << 56) | (addr_reg << 40) | (val_reg << 32),
-
+		JOB_INSTR_ALWAYS((32ull << 56) | (params->addr_reg << 40) | (params->val_reg << 32)),
+		/* MOV48 rX:rX+1, cycles_offset */
+		JOB_INSTR_CYCLES((1ull << 56) | (params->cycle_reg << 48) |
+				 (params->times_addr + offsetof(struct panthor_job_profiling_data, cycles.after))),
+		/* STORE_STATE cycles */
+		JOB_INSTR_CYCLES((40ull << 56) | (params->cycle_reg << 40) | (1ll << 32)),
+		/* MOV48 rX:rX+1, time_offset */
+		JOB_INSTR_TIMESTAMP((1ull << 56) | (params->time_reg << 48) |
+			  (params->times_addr + offsetof(struct panthor_job_profiling_data, time.after))),
+		/* STORE_STATE timer */
+		JOB_INSTR_TIMESTAMP((40ull << 56) | (params->time_reg << 40) | (0ll << 32)),
 		/* MOV48 rX:rX+1, sync_addr */
-		(1ull << 56) | (addr_reg << 48) | sync_addr,
-
+		JOB_INSTR_ALWAYS((1ull << 56) | (params->addr_reg << 48) | params->sync_addr),
 		/* MOV48 rX+2, #1 */
-		(1ull << 56) | (val_reg << 48) | 1,
-
+		JOB_INSTR_ALWAYS((1ull << 56) | (params->val_reg << 48) | 1),
 		/* WAIT(all) */
-		(3ull << 56) | (waitall_mask << 16),
-
+		JOB_INSTR_ALWAYS((3ull << 56) | (params->waitall_mask << 16)),
 		/* SYNC_ADD64.system_scope.propage_err.nowait rX:rX+1, rX+2*/
-		(51ull << 56) | (0ull << 48) | (addr_reg << 40) | (val_reg << 32) | (0 << 16) | 1,
+		JOB_INSTR_ALWAYS((51ull << 56) | (0ull << 48) | (params->addr_reg << 40) |
+				 (params->val_reg << 32) | (0 << 16) | 1),
+		/* ERROR_BARRIER, so we can recover from faults at job boundaries. */
+		JOB_INSTR_ALWAYS((47ull << 56)),
+	};
+	u32 pad;
 
-		/* ERROR_BARRIER, so we can recover from faults at job
-		 * boundaries.
-		 */
-		(47ull << 56),
+	instrs->count = 0;
+
+	/* NEED to be cacheline aligned to please the prefetcher. */
+	static_assert(sizeof(instrs->buffer) % 64 == 0,
+		      "panthor_job_ringbuf_instrs::buffer is not aligned on a cacheline");
+
+	/* Make sure we have enough storage to store the whole sequence. */
+	static_assert(ALIGN(ARRAY_SIZE(instr_seq), NUM_INSTRS_PER_CACHE_LINE) ==
+		      ARRAY_SIZE(instrs->buffer),
+		      "instr_seq vs panthor_job_ringbuf_instrs::buffer size mismatch");
+
+	for (u32 i = 0; i < ARRAY_SIZE(instr_seq); i++) {
+		/* If the profile mask of this instruction is not enabled, skip it. */
+		if (instr_seq[i].profile_mask &&
+		    !(instr_seq[i].profile_mask & params->profile_mask))
+			continue;
+
+		instrs->buffer[instrs->count++] = instr_seq[i].instr;
+	}
+
+	pad = ALIGN(instrs->count, NUM_INSTRS_PER_CACHE_LINE);
+	memset(&instrs->buffer[instrs->count], 0,
+	       (pad - instrs->count) * sizeof(instrs->buffer[0]));
+	instrs->count = pad;
+}
+
+static u32 calc_job_credits(u32 profile_mask)
+{
+	struct panthor_job_ringbuf_instrs instrs;
+	struct panthor_job_cs_params params = {
+		.profile_mask = profile_mask,
 	};
 
-	/* Need to be cacheline aligned to please the prefetcher. */
-	static_assert(sizeof(call_instrs) % 64 == 0,
-		      "call_instrs is not aligned on a cacheline");
+	prepare_job_instrs(&params, &instrs);
+	return instrs.count;
+}
+
+static struct dma_fence *
+queue_run_job(struct drm_sched_job *sched_job)
+{
+	struct panthor_job *job = container_of(sched_job, struct panthor_job, base);
+	struct panthor_group *group = job->group;
+	struct panthor_queue *queue = group->queues[job->queue_idx];
+	struct panthor_device *ptdev = group->ptdev;
+	struct panthor_scheduler *sched = ptdev->scheduler;
+	struct panthor_job_ringbuf_instrs instrs;
+	struct panthor_job_cs_params cs_params;
+	struct dma_fence *done_fence;
+	int ret;
 
 	/* Stream size is zero, nothing to do except making sure all previously
 	 * submitted jobs are done before we signal the
@@ -2900,17 +3062,23 @@ queue_run_job(struct drm_sched_job *sched_job)
 		       queue->fence_ctx.id,
 		       atomic64_inc_return(&queue->fence_ctx.seqno));
 
-	memcpy(queue->ringbuf->kmap + ringbuf_insert,
-	       call_instrs, sizeof(call_instrs));
+	job->profiling.slot = queue->profiling.seqno++;
+	if (queue->profiling.seqno == queue->profiling.slot_count)
+		queue->profiling.seqno = 0;
+
+	job->ringbuf.start = queue->iface.input->insert;
+
+	get_job_cs_params(job, &cs_params);
+	prepare_job_instrs(&cs_params, &instrs);
+	copy_instrs_to_ringbuf(queue, job, &instrs);
+
+	job->ringbuf.end = job->ringbuf.start + (instrs.count * sizeof(u64));
 
 	panthor_job_get(&job->base);
 	spin_lock(&queue->fence_ctx.lock);
 	list_add_tail(&job->node, &queue->fence_ctx.in_flight_jobs);
 	spin_unlock(&queue->fence_ctx.lock);
 
-	job->ringbuf.start = queue->iface.input->insert;
-	job->ringbuf.end = job->ringbuf.start + sizeof(call_instrs);
-
 	/* Make sure the ring buffer is updated before the INSERT
 	 * register.
 	 */
@@ -3003,6 +3171,34 @@ static const struct drm_sched_backend_ops panthor_queue_sched_ops = {
 	.free_job = queue_free_job,
 };
 
+static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
+				       u32 cs_ringbuf_size)
+{
+	u32 min_profiled_job_instrs = U32_MAX;
+	u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
+
+	/*
+	 * We want to calculate the minimum size of a profiled job's CS,
+	 * because since they need additional instructions for the sampling
+	 * of performance metrics, they might take up further slots in
+	 * the queue's ringbuffer. This means we might not need as many job
+	 * slots for keeping track of their profiling information. What we
+	 * need is the maximum number of slots we should allocate to this end,
+	 * which matches the maximum number of profiled jobs we can place
+	 * simultaneously in the queue's ring buffer.
+	 * That has to be calculated separately for every single job profiling
+	 * flag, but not in the case job profiling is disabled, since unprofiled
+	 * jobs don't need to keep track of this at all.
+	 */
+	for (u32 i = 0; i < last_flag; i++) {
+		if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
+			min_profiled_job_instrs =
+				min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
+	}
+
+	return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
+}
+
 static struct panthor_queue *
 group_create_queue(struct panthor_group *group,
 		   const struct drm_panthor_queue_create *args)
@@ -3056,9 +3252,35 @@ group_create_queue(struct panthor_group *group,
 		goto err_free_queue;
 	}
 
+	queue->profiling.slot_count =
+		calc_profiling_ringbuf_num_slots(group->ptdev, args->ringbuf_size);
+
+	queue->profiling.slots =
+		panthor_kernel_bo_create(group->ptdev, group->vm,
+					 queue->profiling.slot_count *
+					 sizeof(struct panthor_job_profiling_data),
+					 DRM_PANTHOR_BO_NO_MMAP,
+					 DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
+					 DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
+					 PANTHOR_VM_KERNEL_AUTO_VA);
+
+	if (IS_ERR(queue->profiling.slots)) {
+		ret = PTR_ERR(queue->profiling.slots);
+		goto err_free_queue;
+	}
+
+	ret = panthor_kernel_bo_vmap(queue->profiling.slots);
+	if (ret)
+		goto err_free_queue;
+
+	/*
+	 * Credit limit argument tells us the total number of instructions
+	 * across all CS slots in the ringbuffer, with some jobs requiring
+	 * twice as many as others, depending on their profiling status.
+	 */
 	ret = drm_sched_init(&queue->scheduler, &panthor_queue_sched_ops,
 			     group->ptdev->scheduler->wq, 1,
-			     args->ringbuf_size / (NUM_INSTRS_PER_SLOT * sizeof(u64)),
+			     args->ringbuf_size / sizeof(u64),
 			     0, msecs_to_jiffies(JOB_TIMEOUT_MS),
 			     group->ptdev->reset.wq,
 			     NULL, "panthor-queue", group->ptdev->base.dev);
@@ -3354,6 +3576,7 @@ panthor_job_create(struct panthor_file *pfile,
 {
 	struct panthor_group_pool *gpool = pfile->groups;
 	struct panthor_job *job;
+	u32 credits;
 	int ret;
 
 	if (qsubmit->pad)
@@ -3407,9 +3630,16 @@ panthor_job_create(struct panthor_file *pfile,
 		}
 	}
 
+	job->profiling.mask = pfile->ptdev->profile_mask;
+	credits = calc_job_credits(job->profiling.mask);
+	if (credits == 0) {
+		ret = -EINVAL;
+		goto err_put_job;
+	}
+
 	ret = drm_sched_job_init(&job->base,
 				 &job->group->queues[job->queue_idx]->entity,
-				 1, job->group);
+				 credits, job->group);
 	if (ret)
 		goto err_put_job;
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 2/5] drm/panthor: record current and maximum device clock frequencies
  2024-09-23 23:06 [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Adrián Larumbe
  2024-09-23 23:06 ` [PATCH v8 1/5] drm/panthor: introduce job cycle and timestamp accounting Adrián Larumbe
@ 2024-09-23 23:06 ` Adrián Larumbe
  2024-09-23 23:06 ` [PATCH v8 3/5] drm/panthor: add DRM fdinfo support Adrián Larumbe
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 14+ messages in thread
From: Adrián Larumbe @ 2024-09-23 23:06 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König
  Cc: kernel, Adrián Larumbe, dri-devel, linux-kernel, linux-media,
	linaro-mm-sig

In order to support UM in calculating rates of GPU utilisation, the current
operating and maximum GPU clock frequencies must be recorded during device
initialisation, and also during OPP state transitions.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_devfreq.c | 18 +++++++++++++++++-
 drivers/gpu/drm/panthor/panthor_device.h  |  6 ++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/panthor/panthor_devfreq.c b/drivers/gpu/drm/panthor/panthor_devfreq.c
index c6d3c327cc24..9d0f891b9b53 100644
--- a/drivers/gpu/drm/panthor/panthor_devfreq.c
+++ b/drivers/gpu/drm/panthor/panthor_devfreq.c
@@ -62,14 +62,20 @@ static void panthor_devfreq_update_utilization(struct panthor_devfreq *pdevfreq)
 static int panthor_devfreq_target(struct device *dev, unsigned long *freq,
 				  u32 flags)
 {
+	struct panthor_device *ptdev = dev_get_drvdata(dev);
 	struct dev_pm_opp *opp;
+	int err;
 
 	opp = devfreq_recommended_opp(dev, freq, flags);
 	if (IS_ERR(opp))
 		return PTR_ERR(opp);
 	dev_pm_opp_put(opp);
 
-	return dev_pm_opp_set_rate(dev, *freq);
+	err = dev_pm_opp_set_rate(dev, *freq);
+	if (!err)
+		ptdev->current_frequency = *freq;
+
+	return err;
 }
 
 static void panthor_devfreq_reset(struct panthor_devfreq *pdevfreq)
@@ -130,6 +136,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev)
 	struct panthor_devfreq *pdevfreq;
 	struct dev_pm_opp *opp;
 	unsigned long cur_freq;
+	unsigned long freq = ULONG_MAX;
 	int ret;
 
 	pdevfreq = drmm_kzalloc(&ptdev->base, sizeof(*ptdev->devfreq), GFP_KERNEL);
@@ -161,6 +168,7 @@ int panthor_devfreq_init(struct panthor_device *ptdev)
 		return PTR_ERR(opp);
 
 	panthor_devfreq_profile.initial_freq = cur_freq;
+	ptdev->current_frequency = cur_freq;
 
 	/* Regulator coupling only takes care of synchronizing/balancing voltage
 	 * updates, but the coupled regulator needs to be enabled manually.
@@ -204,6 +212,14 @@ int panthor_devfreq_init(struct panthor_device *ptdev)
 
 	dev_pm_opp_put(opp);
 
+	/* Find the fastest defined rate  */
+	opp = dev_pm_opp_find_freq_floor(dev, &freq);
+	if (IS_ERR(opp))
+		return PTR_ERR(opp);
+	ptdev->fast_rate = freq;
+
+	dev_pm_opp_put(opp);
+
 	/*
 	 * Setup default thresholds for the simple_ondemand governor.
 	 * The values are chosen based on experiments.
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index a48e30d0af30..2109905813e8 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -184,6 +184,12 @@ struct panthor_device {
 
 	/** @profile_mask: User-set profiling flags for job accounting. */
 	u32 profile_mask;
+
+	/** @current_frequency: Device clock frequency at present. Set by DVFS*/
+	unsigned long current_frequency;
+
+	/** @fast_rate: Maximum device clock frequency. Set by DVFS */
+	unsigned long fast_rate;
 };
 
 /**
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 3/5] drm/panthor: add DRM fdinfo support
  2024-09-23 23:06 [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Adrián Larumbe
  2024-09-23 23:06 ` [PATCH v8 1/5] drm/panthor: introduce job cycle and timestamp accounting Adrián Larumbe
  2024-09-23 23:06 ` [PATCH v8 2/5] drm/panthor: record current and maximum device clock frequencies Adrián Larumbe
@ 2024-09-23 23:06 ` Adrián Larumbe
  2024-09-29  0:25   ` kernel test robot
  2024-10-02  8:40   ` Boris Brezillon
  2024-09-23 23:06 ` [PATCH v8 4/5] drm/panthor: enable fdinfo for memory stats Adrián Larumbe
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 14+ messages in thread
From: Adrián Larumbe @ 2024-09-23 23:06 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König
  Cc: kernel, Adrián Larumbe, dri-devel, linux-kernel, linux-media,
	linaro-mm-sig

Drawing from the FW-calculated values in a previous commit, we can increase
the numbers for an open file by collecting them from finished jobs when
updating their group synchronisation objects.

Display of fdinfo key-value pairs is governed by a bitmask that is by
default unset in the present commit, and supporting manual toggle of it
will be the matter of a later commit.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_device.h |  8 ++++
 drivers/gpu/drm/panthor/panthor_drv.c    | 34 ++++++++++++++
 drivers/gpu/drm/panthor/panthor_sched.c  | 56 ++++++++++++++++++++++++
 drivers/gpu/drm/panthor/panthor_sched.h  |  2 +
 4 files changed, 100 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 2109905813e8..0e68f5a70d20 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -192,6 +192,11 @@ struct panthor_device {
 	unsigned long fast_rate;
 };
 
+struct panthor_gpu_usage {
+	u64 time;
+	u64 cycles;
+};
+
 /**
  * struct panthor_file - Panthor file
  */
@@ -204,6 +209,9 @@ struct panthor_file {
 
 	/** @groups: Scheduling group pool attached to this file. */
 	struct panthor_group_pool *groups;
+
+	/** @stats: cycle and timestamp measures for job execution. */
+	struct panthor_gpu_usage stats;
 };
 
 int panthor_device_init(struct panthor_device *ptdev);
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 0caf9e9a8c45..233b265c0819 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -13,6 +13,7 @@
 #include <linux/pagemap.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
+#include <linux/time64.h>
 
 #include <drm/drm_auth.h>
 #include <drm/drm_debugfs.h>
@@ -1414,6 +1415,37 @@ static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
 	return ret;
 }
 
+static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev,
+				    struct panthor_file *pfile,
+				    struct drm_printer *p)
+{
+	if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_ALL)
+		panthor_fdinfo_gather_group_samples(pfile);
+
+	if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP) {
+#ifdef CONFIG_ARM_ARCH_TIMER
+		drm_printf(p, "drm-engine-panthor:\t%llu ns\n",
+			   DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
+					    arch_timer_get_cntfrq()));
+#endif
+	}
+	if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_CYCLES)
+		drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles);
+
+	drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate);
+	drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency);
+}
+
+static void panthor_show_fdinfo(struct drm_printer *p, struct drm_file *file)
+{
+	struct drm_device *dev = file->minor->dev;
+	struct panthor_device *ptdev = container_of(dev, struct panthor_device, base);
+
+	panthor_gpu_show_fdinfo(ptdev, file->driver_priv, p);
+
+	drm_show_memory_stats(p, file);
+}
+
 static const struct file_operations panthor_drm_driver_fops = {
 	.open = drm_open,
 	.release = drm_release,
@@ -1423,6 +1455,7 @@ static const struct file_operations panthor_drm_driver_fops = {
 	.read = drm_read,
 	.llseek = noop_llseek,
 	.mmap = panthor_mmap,
+	.show_fdinfo = drm_show_fdinfo,
 };
 
 #ifdef CONFIG_DEBUG_FS
@@ -1442,6 +1475,7 @@ static const struct drm_driver panthor_drm_driver = {
 			   DRIVER_SYNCOBJ_TIMELINE | DRIVER_GEM_GPUVA,
 	.open = panthor_open,
 	.postclose = panthor_postclose,
+	.show_fdinfo = panthor_show_fdinfo,
 	.ioctls = panthor_drm_driver_ioctls,
 	.num_ioctls = ARRAY_SIZE(panthor_drm_driver_ioctls),
 	.fops = &panthor_drm_driver_fops,
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 4fc9946fd2ac..7ea97ede9c96 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -619,6 +619,18 @@ struct panthor_group {
 	 */
 	struct panthor_kernel_bo *syncobjs;
 
+	/** @fdinfo: Per-file total cycle and timestamp values reference. */
+	struct {
+		/** @data: Total sampled values for jobs in queues from this group. */
+		struct panthor_gpu_usage data;
+
+		/**
+		 * @lock: Mutex to govern concurrent access from drm file's fdinfo callback
+		 * and job post-completion processing function
+		 */
+		struct mutex lock;
+	} fdinfo;
+
 	/** @state: Group state. */
 	enum panthor_group_state state;
 
@@ -889,6 +901,8 @@ static void group_release_work(struct work_struct *work)
 						   release_work);
 	u32 i;
 
+	mutex_destroy(&group->fdinfo.lock);
+
 	for (i = 0; i < group->queue_count; i++)
 		group_free_queue(group, group->queues[i]);
 
@@ -2811,6 +2825,44 @@ void panthor_sched_post_reset(struct panthor_device *ptdev, bool reset_failed)
 	}
 }
 
+static void update_fdinfo_stats(struct panthor_job *job)
+{
+	struct panthor_group *group = job->group;
+	struct panthor_queue *queue = group->queues[job->queue_idx];
+	struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
+	struct panthor_job_profiling_data *times;
+
+	times = (struct panthor_job_profiling_data *)
+		((unsigned long) queue->profiling.slots->kmap +
+		 (job->profiling.slot * sizeof(struct panthor_job_profiling_data)));
+
+	mutex_lock(&group->fdinfo.lock);
+	if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
+		fdinfo->cycles += times->cycles.after - times->cycles.before;
+	if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
+		fdinfo->time += times->time.after - times->time.before;
+	mutex_unlock(&group->fdinfo.lock);
+}
+
+void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile)
+{
+	struct panthor_group_pool *gpool = pfile->groups;
+	struct panthor_group *group;
+	unsigned long i;
+
+	if (IS_ERR_OR_NULL(gpool))
+		return;
+
+	xa_for_each(&gpool->xa, i, group) {
+		mutex_lock(&group->fdinfo.lock);
+		pfile->stats.cycles += group->fdinfo.data.cycles;
+		pfile->stats.time += group->fdinfo.data.time;
+		group->fdinfo.data.cycles = 0;
+		group->fdinfo.data.time = 0;
+		mutex_unlock(&group->fdinfo.lock);
+	}
+}
+
 static void group_sync_upd_work(struct work_struct *work)
 {
 	struct panthor_group *group =
@@ -2843,6 +2895,8 @@ static void group_sync_upd_work(struct work_struct *work)
 	dma_fence_end_signalling(cookie);
 
 	list_for_each_entry_safe(job, job_tmp, &done_jobs, node) {
+		if (job->profiling.mask)
+			update_fdinfo_stats(job);
 		list_del_init(&job->node);
 		panthor_job_put(&job->base);
 	}
@@ -3421,6 +3475,8 @@ int panthor_group_create(struct panthor_file *pfile,
 	}
 	mutex_unlock(&sched->reset.lock);
 
+	mutex_init(&group->fdinfo.lock);
+
 	return gid;
 
 err_put_group:
diff --git a/drivers/gpu/drm/panthor/panthor_sched.h b/drivers/gpu/drm/panthor/panthor_sched.h
index 3a30d2328b30..5ae6b4bde7c5 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.h
+++ b/drivers/gpu/drm/panthor/panthor_sched.h
@@ -47,4 +47,6 @@ void panthor_sched_resume(struct panthor_device *ptdev);
 void panthor_sched_report_mmu_fault(struct panthor_device *ptdev);
 void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events);
 
+void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile);
+
 #endif
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 4/5] drm/panthor: enable fdinfo for memory stats
  2024-09-23 23:06 [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Adrián Larumbe
                   ` (2 preceding siblings ...)
  2024-09-23 23:06 ` [PATCH v8 3/5] drm/panthor: add DRM fdinfo support Adrián Larumbe
@ 2024-09-23 23:06 ` Adrián Larumbe
  2024-09-23 23:06 ` [PATCH v8 5/5] drm/panthor: add sysfs knob for enabling job profiling Adrián Larumbe
  2024-10-02  8:57 ` [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Boris Brezillon
  5 siblings, 0 replies; 14+ messages in thread
From: Adrián Larumbe @ 2024-09-23 23:06 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König
  Cc: kernel, Adrián Larumbe, dri-devel, linux-kernel, linux-media,
	linaro-mm-sig

Implement drm object's status callback.

Also, we consider a PRIME imported BO to be resident if its matching
dma_buf has an open attachment, which means its backing storage had already
been allocated.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/gpu/drm/panthor/panthor_gem.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_gem.c b/drivers/gpu/drm/panthor/panthor_gem.c
index 38f560864879..c60b599665d8 100644
--- a/drivers/gpu/drm/panthor/panthor_gem.c
+++ b/drivers/gpu/drm/panthor/panthor_gem.c
@@ -145,6 +145,17 @@ panthor_gem_prime_export(struct drm_gem_object *obj, int flags)
 	return drm_gem_prime_export(obj, flags);
 }
 
+static enum drm_gem_object_status panthor_gem_status(struct drm_gem_object *obj)
+{
+	struct panthor_gem_object *bo = to_panthor_bo(obj);
+	enum drm_gem_object_status res = 0;
+
+	if (bo->base.base.import_attach || bo->base.pages)
+		res |= DRM_GEM_OBJECT_RESIDENT;
+
+	return res;
+}
+
 static const struct drm_gem_object_funcs panthor_gem_funcs = {
 	.free = panthor_gem_free_object,
 	.print_info = drm_gem_shmem_object_print_info,
@@ -154,6 +165,7 @@ static const struct drm_gem_object_funcs panthor_gem_funcs = {
 	.vmap = drm_gem_shmem_object_vmap,
 	.vunmap = drm_gem_shmem_object_vunmap,
 	.mmap = panthor_gem_mmap,
+	.status = panthor_gem_status,
 	.export = panthor_gem_prime_export,
 	.vm_ops = &drm_gem_shmem_vm_ops,
 };
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 5/5] drm/panthor: add sysfs knob for enabling job profiling
  2024-09-23 23:06 [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Adrián Larumbe
                   ` (3 preceding siblings ...)
  2024-09-23 23:06 ` [PATCH v8 4/5] drm/panthor: enable fdinfo for memory stats Adrián Larumbe
@ 2024-09-23 23:06 ` Adrián Larumbe
  2024-10-02  8:57 ` [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Boris Brezillon
  5 siblings, 0 replies; 14+ messages in thread
From: Adrián Larumbe @ 2024-09-23 23:06 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König
  Cc: kernel, Adrián Larumbe, dri-devel, linux-kernel, linux-media,
	linaro-mm-sig

This commit introduces a DRM device sysfs attribute that lets UM control
the job accounting status in the device. The knob variable had been brought
in as part of a previous commit, but now we're able to fix it manually.

As sysfs files are part of a driver's uAPI, describe its legitimate input
values and output format in a documentation file.

Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 .../testing/sysfs-driver-panthor-profiling    | 10 ++++
 Documentation/gpu/panthor.rst                 | 46 +++++++++++++++++++
 drivers/gpu/drm/panthor/panthor_drv.c         | 39 ++++++++++++++++
 3 files changed, 95 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-driver-panthor-profiling
 create mode 100644 Documentation/gpu/panthor.rst

diff --git a/Documentation/ABI/testing/sysfs-driver-panthor-profiling b/Documentation/ABI/testing/sysfs-driver-panthor-profiling
new file mode 100644
index 000000000000..af05fccedc15
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-driver-panthor-profiling
@@ -0,0 +1,10 @@
+What:		/sys/bus/platform/drivers/panthor/.../profiling
+Date:		September 2024
+KernelVersion:	6.11.0
+Contact:	Adrian Larumbe <adrian.larumbe@collabora.com>
+Description:
+		Bitmask to enable drm fdinfo's job profiling measurements.
+		Valid values are:
+		0: Don't enable fdinfo job profiling sources.
+		1: Enable GPU cycle measurements for running jobs.
+		2: Enable GPU timestamp sampling for running jobs.
diff --git a/Documentation/gpu/panthor.rst b/Documentation/gpu/panthor.rst
new file mode 100644
index 000000000000..cbf5c4429a2d
--- /dev/null
+++ b/Documentation/gpu/panthor.rst
@@ -0,0 +1,46 @@
+.. SPDX-License-Identifier: GPL-2.0+
+
+=========================
+ drm/Panthor CSF driver
+=========================
+
+.. _panfrost-usage-stats:
+
+Panthor DRM client usage stats implementation
+==============================================
+
+The drm/Panthor driver implements the DRM client usage stats specification as
+documented in :ref:`drm-client-usage-stats`.
+
+Example of the output showing the implemented key value pairs and entirety of
+the currently possible format options:
+
+::
+     pos:    0
+     flags:  02400002
+     mnt_id: 29
+     ino:    491
+     drm-driver:     panthor
+     drm-client-id:  10
+     drm-engine-panthor:     111110952750 ns
+     drm-cycles-panthor:     94439687187
+     drm-maxfreq-panthor:    1000000000 Hz
+     drm-curfreq-panthor:    1000000000 Hz
+     drm-total-memory:       16480 KiB
+     drm-shared-memory:      0
+     drm-active-memory:      16200 KiB
+     drm-resident-memory:    16480 KiB
+     drm-purgeable-memory:   0
+
+Possible `drm-engine-` key names are: `panthor`.
+`drm-curfreq-` values convey the current operating frequency for that engine.
+
+Users must bear in mind that engine and cycle sampling are disabled by default,
+because of power saving concerns. `fdinfo` users and benchmark applications which
+query the fdinfo file must make sure to toggle the job profiling status of the
+driver by writing into the appropriate sysfs node::
+
+    echo <N> > /sys/bus/platform/drivers/panthor/[a-f0-9]*.gpu/profiling
+
+Where `N` is a bit mask where cycle and timestamp sampling are respectively
+enabled by the first and second bits.
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 233b265c0819..6f47d9d1d86a 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1513,6 +1513,44 @@ static void panthor_remove(struct platform_device *pdev)
 	panthor_device_unplug(ptdev);
 }
 
+static ssize_t profiling_show(struct device *dev,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct panthor_device *ptdev = dev_get_drvdata(dev);
+
+	return sysfs_emit(buf, "%d\n", ptdev->profile_mask);
+}
+
+static ssize_t profiling_store(struct device *dev,
+			       struct device_attribute *attr,
+			       const char *buf, size_t len)
+{
+	struct panthor_device *ptdev = dev_get_drvdata(dev);
+	u32 value;
+	int err;
+
+	err = kstrtou32(buf, 0, &value);
+	if (err)
+		return err;
+
+	if ((value & ~PANTHOR_DEVICE_PROFILING_ALL) != 0)
+		return -EINVAL;
+
+	ptdev->profile_mask = value;
+
+	return len;
+}
+
+static DEVICE_ATTR_RW(profiling);
+
+static struct attribute *panthor_attrs[] = {
+	&dev_attr_profiling.attr,
+	NULL,
+};
+
+ATTRIBUTE_GROUPS(panthor);
+
 static const struct of_device_id dt_match[] = {
 	{ .compatible = "rockchip,rk3588-mali" },
 	{ .compatible = "arm,mali-valhall-csf" },
@@ -1532,6 +1570,7 @@ static struct platform_driver panthor_driver = {
 		.name = "panthor",
 		.pm = pm_ptr(&panthor_pm_ops),
 		.of_match_table = dt_match,
+		.dev_groups = panthor_groups,
 	},
 };
 
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 3/5] drm/panthor: add DRM fdinfo support
  2024-09-23 23:06 ` [PATCH v8 3/5] drm/panthor: add DRM fdinfo support Adrián Larumbe
@ 2024-09-29  0:25   ` kernel test robot
  2024-10-02  8:14     ` Boris Brezillon
  2024-10-02  8:40   ` Boris Brezillon
  1 sibling, 1 reply; 14+ messages in thread
From: kernel test robot @ 2024-09-29  0:25 UTC (permalink / raw)
  To: Adrián Larumbe, Boris Brezillon, Steven Price, Liviu Dudau,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Sumit Semwal, Christian König
  Cc: oe-kbuild-all, kernel, Adrián Larumbe, dri-devel,
	linux-kernel, linux-media, linaro-mm-sig

Hi Adrián,

kernel test robot noticed the following build errors:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.11 next-20240927]
[cannot apply to drm-misc/drm-misc-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-introduce-job-cycle-and-timestamp-accounting/20240924-071018
base:   linus/master
patch link:    https://lore.kernel.org/r/20240923230912.2207320-4-adrian.larumbe%40collabora.com
patch subject: [PATCH v8 3/5] drm/panthor: add DRM fdinfo support
config: arm-randconfig-002-20240929 (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@intel.com/config)
compiler: arm-linux-gnueabi-gcc (GCC) 14.1.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202409291048.zLqDeqpO-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/math64.h:6,
                    from include/linux/time.h:6,
                    from include/linux/stat.h:19,
                    from include/linux/module.h:13,
                    from drivers/gpu/drm/panthor/panthor_drv.c:7:
   drivers/gpu/drm/panthor/panthor_drv.c: In function 'panthor_gpu_show_fdinfo':
>> drivers/gpu/drm/panthor/panthor_drv.c:1389:45: error: implicit declaration of function 'arch_timer_get_cntfrq' [-Wimplicit-function-declaration]
    1389 |                                             arch_timer_get_cntfrq()));
         |                                             ^~~~~~~~~~~~~~~~~~~~~
   include/linux/math.h:40:39: note: in definition of macro 'DIV_ROUND_DOWN_ULL'
      40 |         ({ unsigned long long _tmp = (ll); do_div(_tmp, d); _tmp; })
         |                                       ^~
   drivers/gpu/drm/panthor/panthor_drv.c:1388:28: note: in expansion of macro 'DIV_ROUND_UP_ULL'
    1388 |                            DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
         |                            ^~~~~~~~~~~~~~~~


vim +/arch_timer_get_cntfrq +1389 drivers/gpu/drm/panthor/panthor_drv.c

  1377	
  1378	static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev,
  1379					    struct panthor_file *pfile,
  1380					    struct drm_printer *p)
  1381	{
  1382		if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_ALL)
  1383			panthor_fdinfo_gather_group_samples(pfile);
  1384	
  1385		if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP) {
  1386	#ifdef CONFIG_ARM_ARCH_TIMER
  1387			drm_printf(p, "drm-engine-panthor:\t%llu ns\n",
  1388				   DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
> 1389						    arch_timer_get_cntfrq()));
  1390	#endif
  1391		}
  1392		if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_CYCLES)
  1393			drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles);
  1394	
  1395		drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate);
  1396		drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency);
  1397	}
  1398	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 3/5] drm/panthor: add DRM fdinfo support
  2024-09-29  0:25   ` kernel test robot
@ 2024-10-02  8:14     ` Boris Brezillon
  0 siblings, 0 replies; 14+ messages in thread
From: Boris Brezillon @ 2024-10-02  8:14 UTC (permalink / raw)
  To: kernel test robot
  Cc: Adrián Larumbe, Boris Brezillon, Steven Price, Liviu Dudau,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter, Sumit Semwal, Christian König, oe-kbuild-all,
	kernel, dri-devel, linux-kernel, linux-media, linaro-mm-sig

On Sun, 29 Sep 2024 08:25:38 +0800
kernel test robot <lkp@intel.com> wrote:

> Hi Adrián,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on linus/master]
> [also build test ERROR on v6.11 next-20240927]
> [cannot apply to drm-misc/drm-misc-next]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Adri-n-Larumbe/drm-panthor-introduce-job-cycle-and-timestamp-accounting/20240924-071018
> base:   linus/master
> patch link:    https://lore.kernel.org/r/20240923230912.2207320-4-adrian.larumbe%40collabora.com
> patch subject: [PATCH v8 3/5] drm/panthor: add DRM fdinfo support
> config: arm-randconfig-002-20240929 (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@intel.com/config)
> compiler: arm-linux-gnueabi-gcc (GCC) 14.1.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240929/202409291048.zLqDeqpO-lkp@intel.com/reproduce)

I gave this a try with Adrian's series applied on top of drm-misc-next,
and I couldn't reproduce the issue.

> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202409291048.zLqDeqpO-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>    In file included from include/linux/math64.h:6,
>                     from include/linux/time.h:6,
>                     from include/linux/stat.h:19,
>                     from include/linux/module.h:13,
>                     from drivers/gpu/drm/panthor/panthor_drv.c:7:
>    drivers/gpu/drm/panthor/panthor_drv.c: In function 'panthor_gpu_show_fdinfo':
> >> drivers/gpu/drm/panthor/panthor_drv.c:1389:45: error: implicit declaration of function 'arch_timer_get_cntfrq' [-Wimplicit-function-declaration]  
>     1389 |                                             arch_timer_get_cntfrq()));
>          |                                             ^~~~~~~~~~~~~~~~~~~~~
>    include/linux/math.h:40:39: note: in definition of macro 'DIV_ROUND_DOWN_ULL'
>       40 |         ({ unsigned long long _tmp = (ll); do_div(_tmp, d); _tmp; })
>          |                                       ^~
>    drivers/gpu/drm/panthor/panthor_drv.c:1388:28: note: in expansion of macro 'DIV_ROUND_UP_ULL'
>     1388 |                            DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
>          |                            ^~~~~~~~~~~~~~~~
> 
> 
> vim +/arch_timer_get_cntfrq +1389 drivers/gpu/drm/panthor/panthor_drv.c
> 
>   1377	
>   1378	static void panthor_gpu_show_fdinfo(struct panthor_device *ptdev,
>   1379					    struct panthor_file *pfile,
>   1380					    struct drm_printer *p)
>   1381	{
>   1382		if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_ALL)
>   1383			panthor_fdinfo_gather_group_samples(pfile);
>   1384	
>   1385		if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP) {
>   1386	#ifdef CONFIG_ARM_ARCH_TIMER
>   1387			drm_printf(p, "drm-engine-panthor:\t%llu ns\n",
>   1388				   DIV_ROUND_UP_ULL((pfile->stats.time * NSEC_PER_SEC),
> > 1389						    arch_timer_get_cntfrq()));  
>   1390	#endif
>   1391		}
>   1392		if (ptdev->profile_mask & PANTHOR_DEVICE_PROFILING_CYCLES)
>   1393			drm_printf(p, "drm-cycles-panthor:\t%llu\n", pfile->stats.cycles);
>   1394	
>   1395		drm_printf(p, "drm-maxfreq-panthor:\t%lu Hz\n", ptdev->fast_rate);
>   1396		drm_printf(p, "drm-curfreq-panthor:\t%lu Hz\n", ptdev->current_frequency);
>   1397	}
>   1398	
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 1/5] drm/panthor: introduce job cycle and timestamp accounting
  2024-09-23 23:06 ` [PATCH v8 1/5] drm/panthor: introduce job cycle and timestamp accounting Adrián Larumbe
@ 2024-10-02  8:38   ` Boris Brezillon
  2024-10-02  8:54     ` Steven Price
  0 siblings, 1 reply; 14+ messages in thread
From: Boris Brezillon @ 2024-10-02  8:38 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König, kernel, dri-devel, linux-kernel,
	linux-media, linaro-mm-sig

On Tue, 24 Sep 2024 00:06:21 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
> +				       u32 cs_ringbuf_size)
> +{
> +	u32 min_profiled_job_instrs = U32_MAX;
> +	u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
> +
> +	/*
> +	 * We want to calculate the minimum size of a profiled job's CS,
> +	 * because since they need additional instructions for the sampling
> +	 * of performance metrics, they might take up further slots in
> +	 * the queue's ringbuffer. This means we might not need as many job
> +	 * slots for keeping track of their profiling information. What we
> +	 * need is the maximum number of slots we should allocate to this end,
> +	 * which matches the maximum number of profiled jobs we can place
> +	 * simultaneously in the queue's ring buffer.
> +	 * That has to be calculated separately for every single job profiling
> +	 * flag, but not in the case job profiling is disabled, since unprofiled
> +	 * jobs don't need to keep track of this at all.
> +	 */
> +	for (u32 i = 0; i < last_flag; i++) {
> +		if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)

I'll get rid of this check when applying, as suggested by Steve. Steve,
with this modification do you want me to add your R-b?

BTW, I've also fixed a bunch of checkpatch errors/warnings, so you
might want to run checkpatch --strict next time.

> +			min_profiled_job_instrs =
> +				min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
> +	}
> +
> +	return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
> +}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 3/5] drm/panthor: add DRM fdinfo support
  2024-09-23 23:06 ` [PATCH v8 3/5] drm/panthor: add DRM fdinfo support Adrián Larumbe
  2024-09-29  0:25   ` kernel test robot
@ 2024-10-02  8:40   ` Boris Brezillon
  1 sibling, 0 replies; 14+ messages in thread
From: Boris Brezillon @ 2024-10-02  8:40 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König, kernel, dri-devel, linux-kernel,
	linux-media, linaro-mm-sig

On Tue, 24 Sep 2024 00:06:23 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> +static void update_fdinfo_stats(struct panthor_job *job)
> +{
> +	struct panthor_group *group = job->group;
> +	struct panthor_queue *queue = group->queues[job->queue_idx];
> +	struct panthor_gpu_usage *fdinfo = &group->fdinfo.data;
> +	struct panthor_job_profiling_data *times;
> +
> +	times = (struct panthor_job_profiling_data *)
> +		((unsigned long) queue->profiling.slots->kmap +
> +		 (job->profiling.slot * sizeof(struct panthor_job_profiling_data)));

The casting done here is a bit of a mess, so I replaced it by:

	struct panthor_job_profiling_data *slots = queue->profiling.slots->kmap;
	struct panthor_job_profiling_data *data = &slots[job->profiling.slot];

> +
> +	mutex_lock(&group->fdinfo.lock);
> +	if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_CYCLES)
> +		fdinfo->cycles += times->cycles.after - times->cycles.before;
> +	if (job->profiling.mask & PANTHOR_DEVICE_PROFILING_TIMESTAMP)
> +		fdinfo->time += times->time.after - times->time.before;
> +	mutex_unlock(&group->fdinfo.lock);
> +}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 1/5] drm/panthor: introduce job cycle and timestamp accounting
  2024-10-02  8:38   ` Boris Brezillon
@ 2024-10-02  8:54     ` Steven Price
  0 siblings, 0 replies; 14+ messages in thread
From: Steven Price @ 2024-10-02  8:54 UTC (permalink / raw)
  To: Boris Brezillon, Adrián Larumbe
  Cc: Liviu Dudau, Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	David Airlie, Simona Vetter, Sumit Semwal, Christian König,
	kernel, dri-devel, linux-kernel, linux-media, linaro-mm-sig

On 02/10/2024 09:38, Boris Brezillon wrote:
> On Tue, 24 Sep 2024 00:06:21 +0100
> Adrián Larumbe <adrian.larumbe@collabora.com> wrote:
> 
>> +static u32 calc_profiling_ringbuf_num_slots(struct panthor_device *ptdev,
>> +				       u32 cs_ringbuf_size)
>> +{
>> +	u32 min_profiled_job_instrs = U32_MAX;
>> +	u32 last_flag = fls(PANTHOR_DEVICE_PROFILING_ALL);
>> +
>> +	/*
>> +	 * We want to calculate the minimum size of a profiled job's CS,
>> +	 * because since they need additional instructions for the sampling
>> +	 * of performance metrics, they might take up further slots in
>> +	 * the queue's ringbuffer. This means we might not need as many job
>> +	 * slots for keeping track of their profiling information. What we
>> +	 * need is the maximum number of slots we should allocate to this end,
>> +	 * which matches the maximum number of profiled jobs we can place
>> +	 * simultaneously in the queue's ring buffer.
>> +	 * That has to be calculated separately for every single job profiling
>> +	 * flag, but not in the case job profiling is disabled, since unprofiled
>> +	 * jobs don't need to keep track of this at all.
>> +	 */
>> +	for (u32 i = 0; i < last_flag; i++) {
>> +		if (BIT(i) & PANTHOR_DEVICE_PROFILING_ALL)
> 
> I'll get rid of this check when applying, as suggested by Steve. Steve,
> with this modification do you want me to add your R-b?

Yes, please do.

Thanks,
Steve

> BTW, I've also fixed a bunch of checkpatch errors/warnings, so you
> might want to run checkpatch --strict next time.
> 
>> +			min_profiled_job_instrs =
>> +				min(min_profiled_job_instrs, calc_job_credits(BIT(i)));
>> +	}
>> +
>> +	return DIV_ROUND_UP(cs_ringbuf_size, min_profiled_job_instrs * sizeof(u64));
>> +}


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor
  2024-09-23 23:06 [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Adrián Larumbe
                   ` (4 preceding siblings ...)
  2024-09-23 23:06 ` [PATCH v8 5/5] drm/panthor: add sysfs knob for enabling job profiling Adrián Larumbe
@ 2024-10-02  8:57 ` Boris Brezillon
  2024-10-02  9:26   ` Jani Nikula
  5 siblings, 1 reply; 14+ messages in thread
From: Boris Brezillon @ 2024-10-02  8:57 UTC (permalink / raw)
  To: Adrián Larumbe
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König, kernel, dri-devel, linux-kernel,
	linux-media, linaro-mm-sig

On Tue, 24 Sep 2024 00:06:20 +0100
Adrián Larumbe <adrian.larumbe@collabora.com> wrote:

> This patch series enables userspace utilities like gputop and nvtop to query a
> render context's fdinfo file and figure out rates of engine and memory utilisation.
> 
> Previous discussion can be found at
> https://lore.kernel.org/dri-devel/20240920234436.207563-1-adrian.larumbe@collabora.com/
> 
> Changelog:
> v8:
>  - Fixed uninitialised stack variable bug that was triggering an invalid memory reference.
>  - Added a few R-b tags to commits
> v7:
>  - Fixed some kernel test bot-reported documentation and sign mismatch errors.
>  - Defined convenience macros for specifying CS instructions according to their profiled status.
>  - Explicitly initialised instruction count for structure containing a job's
>  instructions when calculating its amount of credits for the scheduler.
>  - Some minor cosmetic nits. 
> v6:
>  - Addressed some nits and style issues.
>  - Enforced static assert equality of instruction buffer when calculating job
>  credits or copying them into the ringbuffer.
>  - Added explanation to the way in which job credits and profiled job size is done.
>  - Broke down fdinfo enablement patch into two, one of them dealing with adding
>  support for calculating the current and top operating device frequencies
>  - Fixed race at the time drm file-wide profiling stats are gathered from groups.
> v5:
>  - Moved profiling information slots into a per-queue BO and away from syncobjs.
>  - Decide on size of profiling slots BO from size of CS for minimal profiled job
>  - Turn job and device profiling flag into a bit mask so that individual metrics
>  can be enabled separately.
>  - Shrunk ringbuffer slot size to that of a cache line.
>  - Track profiling slot indeces separately from the job's queue ringbuffer's
>  - Emit CS instructions one by one and tag them depending on profiling mask
>  - New helper for calculating job credits depending on profiling flags
>  - Add Documentation file for sysfs profiling knob
>  - fdinfo will only show engines or cycles tags if these are respectively enabled.
> v4:
>  - Fixed wrong assignment location for frequency values in Panthor's devfreq
>  - Removed the last two commits about registering size of internal BO's
>  - Rearranged patch series so that sysfs knob is done last and all the previous
>  time sampling and fdinfo show dependencies are already in place
> v3:
>  - Fixed some nits and removed useless bounds check in panthor_sched.c
>  - Added support for sysfs profiling knob and optional job accounting
>  - Added new patches for calculating size of internal BO's
> v2:
>  - Split original first patch in two, one for FW CS cycle and timestamp
>  calculations and job accounting memory management, and a second one
>  that enables fdinfo.
>  - Moved NUM_INSTRS_PER_SLOT to the file prelude
>  - Removed nelem variable from the group's struct definition.
>  - Precompute size of group's syncobj BO to avoid code duplication.
>  - Some minor nits.
> 
> Adrián Larumbe (5):
>   drm/panthor: introduce job cycle and timestamp accounting
>   drm/panthor: record current and maximum device clock frequencies
>   drm/panthor: add DRM fdinfo support
>   drm/panthor: enable fdinfo for memory stats
>   drm/panthor: add sysfs knob for enabling job profiling

Queued to drm-misc-next after applying the few modifications I
mentioned. Also added Steve's ack (given on IRC) on the first patch.

> 
>  .../testing/sysfs-driver-panthor-profiling    |  10 +
>  Documentation/gpu/panthor.rst                 |  46 +++
>  drivers/gpu/drm/panthor/panthor_devfreq.c     |  18 +-
>  drivers/gpu/drm/panthor/panthor_device.h      |  36 ++
>  drivers/gpu/drm/panthor/panthor_drv.c         |  73 ++++
>  drivers/gpu/drm/panthor/panthor_gem.c         |  12 +
>  drivers/gpu/drm/panthor/panthor_sched.c       | 384 +++++++++++++++---
>  drivers/gpu/drm/panthor/panthor_sched.h       |   2 +
>  8 files changed, 531 insertions(+), 50 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-driver-panthor-profiling
>  create mode 100644 Documentation/gpu/panthor.rst
> 


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor
  2024-10-02  8:57 ` [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Boris Brezillon
@ 2024-10-02  9:26   ` Jani Nikula
  2024-10-02  9:58     ` Boris Brezillon
  0 siblings, 1 reply; 14+ messages in thread
From: Jani Nikula @ 2024-10-02  9:26 UTC (permalink / raw)
  To: Boris Brezillon, Adrián Larumbe
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
	Christian König, kernel, dri-devel, linux-kernel,
	linux-media, linaro-mm-sig

On Wed, 02 Oct 2024, Boris Brezillon <boris.brezillon@collabora.com> wrote:
> Queued to drm-misc-next after applying the few modifications I
> mentioned. Also added Steve's ack (given on IRC) on the first patch.

Can we have the drm-tip rebuild conflict resolution too, please?

diff --cc drivers/gpu/drm/panthor/panthor_drv.c
index c520f156e2d7,f9b93f84d611..000000000000
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@@ -1383,7 -1476,7 +1476,11 @@@ static const struct file_operations pan
        .read = drm_read,
        .llseek = noop_llseek,
        .mmap = panthor_mmap,
++<<<<<<< HEAD
 +      .fop_flags = FOP_UNSIGNED_OFFSET,
++=======
+       .show_fdinfo = drm_show_fdinfo,
++>>>>>>> drm-misc/drm-misc-next
  };
  
  #ifdef CONFIG_DEBUG_FS



>
>> 
>>  .../testing/sysfs-driver-panthor-profiling    |  10 +
>>  Documentation/gpu/panthor.rst                 |  46 +++
>>  drivers/gpu/drm/panthor/panthor_devfreq.c     |  18 +-
>>  drivers/gpu/drm/panthor/panthor_device.h      |  36 ++
>>  drivers/gpu/drm/panthor/panthor_drv.c         |  73 ++++
>>  drivers/gpu/drm/panthor/panthor_gem.c         |  12 +
>>  drivers/gpu/drm/panthor/panthor_sched.c       | 384 +++++++++++++++---
>>  drivers/gpu/drm/panthor/panthor_sched.h       |   2 +
>>  8 files changed, 531 insertions(+), 50 deletions(-)
>>  create mode 100644 Documentation/ABI/testing/sysfs-driver-panthor-profiling
>>  create mode 100644 Documentation/gpu/panthor.rst
>> 
>

-- 
Jani Nikula, Intel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor
  2024-10-02  9:26   ` Jani Nikula
@ 2024-10-02  9:58     ` Boris Brezillon
  0 siblings, 0 replies; 14+ messages in thread
From: Boris Brezillon @ 2024-10-02  9:58 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Adrián Larumbe, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
	Sumit Semwal, Christian König, kernel, dri-devel,
	linux-kernel, linux-media, linaro-mm-sig

On Wed, 02 Oct 2024 12:26:49 +0300
Jani Nikula <jani.nikula@linux.intel.com> wrote:

> On Wed, 02 Oct 2024, Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > Queued to drm-misc-next after applying the few modifications I
> > mentioned. Also added Steve's ack (given on IRC) on the first patch.  
> 
> Can we have the drm-tip rebuild conflict resolution too, please?

Oops, sorry. Should be good now.

> 
> diff --cc drivers/gpu/drm/panthor/panthor_drv.c
> index c520f156e2d7,f9b93f84d611..000000000000
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@@ -1383,7 -1476,7 +1476,11 @@@ static const struct file_operations pan
>         .read = drm_read,
>         .llseek = noop_llseek,
>         .mmap = panthor_mmap,
> ++<<<<<<< HEAD
>  +      .fop_flags = FOP_UNSIGNED_OFFSET,
> ++=======
> +       .show_fdinfo = drm_show_fdinfo,
> ++>>>>>>> drm-misc/drm-misc-next  
>   };
>   
>   #ifdef CONFIG_DEBUG_FS

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2024-10-02  9:58 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-23 23:06 [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Adrián Larumbe
2024-09-23 23:06 ` [PATCH v8 1/5] drm/panthor: introduce job cycle and timestamp accounting Adrián Larumbe
2024-10-02  8:38   ` Boris Brezillon
2024-10-02  8:54     ` Steven Price
2024-09-23 23:06 ` [PATCH v8 2/5] drm/panthor: record current and maximum device clock frequencies Adrián Larumbe
2024-09-23 23:06 ` [PATCH v8 3/5] drm/panthor: add DRM fdinfo support Adrián Larumbe
2024-09-29  0:25   ` kernel test robot
2024-10-02  8:14     ` Boris Brezillon
2024-10-02  8:40   ` Boris Brezillon
2024-09-23 23:06 ` [PATCH v8 4/5] drm/panthor: enable fdinfo for memory stats Adrián Larumbe
2024-09-23 23:06 ` [PATCH v8 5/5] drm/panthor: add sysfs knob for enabling job profiling Adrián Larumbe
2024-10-02  8:57 ` [PATCH v8 0/5] Support fdinfo runtime and memory stats on Panthor Boris Brezillon
2024-10-02  9:26   ` Jani Nikula
2024-10-02  9:58     ` Boris Brezillon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).