* [PATCH v4 0/7] Performance counter implementation with single manual client support
@ 2025-05-16 15:49 Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 1/7] drm/panthor: Add performance counter uAPI Lukas Zapolskas
` (6 more replies)
0 siblings, 7 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-05-16 15:49 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe, Lukas Zapolskas
Hello,
This patch set implements initial support for performance counter
sampling in Panthor, as a follow-up for Adrián Larumbe's patch
set [1]. This version of the patch series fixes a number of issues,
including FW ring buffer wrapping and IRQ handling for the
performance counter IRQs. The size of the sample is also added
to the uAPI, allowing for the PERF_INFO DEV_QUERY to be sufficient
to handle backwards and forwards compatibility of the interface.
The Mesa implementation is also now available [3].
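To illustrate how the reported sizes let a client stay compatible, here is a minimal sketch. It assumes a sample is exactly one sample header followed by equally sized blocks, as the uAPI documentation describes. The struct perf_layout and the helper names are hypothetical local stand-ins for the size fields of struct drm_panthor_perf_info and are not part of the uAPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local stand-in for the size fields of drm_panthor_perf_info. */
struct perf_layout {
	uint32_t counters_per_block;
	uint32_t sample_header_size;
	uint32_t block_header_size;
	uint32_t sample_size;
};

/* Bytes occupied by one block: its header followed by 64-bit counters. */
static size_t block_stride(const struct perf_layout *l)
{
	return (size_t)l->block_header_size +
	       (size_t)l->counters_per_block * sizeof(uint64_t);
}

/*
 * Number of blocks in one sample, derived only from the reported sizes.
 * Returns 0 if the sizes are inconsistent with the assumed layout.
 */
static size_t blocks_per_sample(const struct perf_layout *l)
{
	size_t stride = block_stride(l);
	size_t payload;

	if (stride == 0 || l->sample_size < l->sample_header_size)
		return 0;

	payload = l->sample_size - l->sample_header_size;
	if (payload % stride)
		return 0;

	return payload / stride;
}

/* Byte offset of block number idx from the start of a sample. */
static size_t block_offset(const struct perf_layout *l, size_t idx)
{
	assert(idx < blocks_per_sample(l));
	return l->sample_header_size + idx * block_stride(l);
}
```

A client that walks a sample this way can skip over block types it does not recognise, since the stride never depends on the block type.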
Existing performance counter workflows, such as those in game
engines and user-space power model/governor implementations,
require the ability to obtain counter data simultaneously. The
hardware and firmware interfaces support a single global
configuration, so the kernel must multiplex it between clients.
It is also in the best position to supplement the counter data
with contextual information about elapsed sampling periods,
information on the power state transitions undergone during
the sampling period, and cycles elapsed on specific clocks chosen
by the integrator.
Each userspace client creates a session, providing an enable
mask of counter values it requires, a BO for a ring buffer,
and a separate BO for the insert and extract indices, along with
an eventfd to signal counter capture, all of which are kept fixed
for the lifetime of the session. When emitting a sample for a
session, counters that were not requested are stripped out,
and non-counter information needed to interpret counter values
is added to either the sample header, or the block header,
which are stored in-line with the counter values in the sample.
The proposed uAPI specifies two major sources of supplemental
information:
- coarse-grained block state transitions are provided on newer
FW versions which support the metadata block, a FW-provided
counter block which indicates the reason a sample was taken
when entering or exiting a non-counting region, or when a
shader core has powered down.
- the assignment of clocks to individual blocks is done by
integrators, so in order to normalize counter values
which count cycles, userspace must know both the clock
cycles elapsed over the sampling period and which
of the clocks that particular block is associated
with.
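The normalization described in the second point can be sketched as follows. This is only an illustration: the enum and struct are hypothetical mirrors of enum drm_panthor_perf_clock and of the three cycle counts in the sample header, and it assumes that bit N of supported_clocks corresponds to enum value N.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of enum drm_panthor_perf_clock. */
enum perf_clock {
	PERF_CLOCK_TOPLEVEL,
	PERF_CLOCK_COREGROUP,
	PERF_CLOCK_SHADER,
};

/* Hypothetical mirror of the per-clock cycle counts in the sample header. */
struct sample_cycles {
	uint64_t toplevel;
	uint64_t coregroup;
	uint64_t shader;
};

/*
 * Pick the elapsed cycle count matching the clock id found in a block
 * header. Returns -1 for an unknown or unsupported clock.
 * Assumption: bit N of supported_clocks maps to enum value N.
 */
static int cycles_for_block(const struct sample_cycles *c, uint8_t clock,
			    uint32_t supported_clocks, uint64_t *cycles)
{
	assert(c && cycles);

	if (clock > PERF_CLOCK_SHADER || !(supported_clocks & (1u << clock)))
		return -1;

	switch (clock) {
	case PERF_CLOCK_TOPLEVEL:
		*cycles = c->toplevel;
		break;
	case PERF_CLOCK_COREGROUP:
		*cycles = c->coregroup;
		break;
	default:
		*cycles = c->shader;
		break;
	}
	return 0;
}

/* Express a cycle-counting counter as a fraction of the elapsed cycles. */
static double normalize_cycles(uint64_t counter, uint64_t elapsed)
{
	return elapsed ? (double)counter / (double)elapsed : 0.0;
}
```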
All of the sessions are then aggregated by the sampler, which
handles the programming of the FW interface and subsequent
handling of the samples coming from FW.
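As a rough illustration of the aggregation step, the sketch below assumes the sampler programs the union of all session enable masks and later checks each counter against the mask of the session being served. The type and helper names are hypothetical and do not come from the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit enable mask, matching the __u64[2] uAPI fields. */
struct enable_mask {
	uint64_t w[2];
};

/* Union of the masks of all sessions: what would be enabled globally. */
static struct enable_mask mask_union(const struct enable_mask *masks, size_t n)
{
	struct enable_mask out = { { 0, 0 } };
	size_t i;

	assert(masks || n == 0);

	for (i = 0; i < n; i++) {
		out.w[0] |= masks[i].w[0];
		out.w[1] |= masks[i].w[1];
	}
	return out;
}

/* True if the given counter index was requested in mask m. */
static bool mask_test(const struct enable_mask *m, unsigned int counter)
{
	if (counter >= 128)
		return false;
	return (m->w[counter / 64] >> (counter % 64)) & 1;
}
```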
v2:
- Fixed offset issues into FW ring buffer
- Fixed sparse shader core handling
- Added pre- and post- reset handlers
- Added module param to control size of FW ring buffer
- Clarified naming on sampler functions
- Added error logging for PERF_SETUP
v3:
- Added sample size to the uAPI.
- Clarified the bit-to-counter mapping for enable masks.
- Fixed IRQ handling: the PERFCNT_THRESHOLD and PERFCNT_OVERFLOW
interrupts can be handled by checking the difference between the
REQ and ACK bits, whereas PERFCNT_SAMPLE needs external data to
validate.
- FW ring buffer indices are now only wrapped when reading the buffer
and are otherwise left in their pre-wrapped form.
- Accumulation index is now bumped after the first copy.
- All insert and extract index reads now use the proper, full-width
type.
- L2 slices are now computed via a macro to extract the relevant
bits from the MEM_FEATURES register. This macro was moved from
the uAPI due to changes in the register making it unstable.
- Consistently take the sampler lock to check if a sample has been
requested.
[1]: https://lore.kernel.org/lkml/20240305165820.585245-1-adrian.larumbe@collabora.com/T/#m67d1f89614fe35dc0560e8304d6731eb1a6942b6
[2]: https://lore.kernel.org/lkml/20241211165024.490748-1-lukas.zapolskas@arm.com/
[3]: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35024
Adrián Larumbe (1):
drm/panthor: Implement the counter sampler and sample handling
Lukas Zapolskas (6):
drm/panthor: Add performance counter uAPI
drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10
drm/panthor: Add panthor perf initialization and termination
drm/panthor: Introduce sampling sessions to handle userspace clients
drm/panthor: Add suspend, resume and reset handling
drm/panthor: Expose the panthor perf ioctls
base-commit: 96c85e428ebaeacd2c640eba075479ab92072ccd
drivers/gpu/drm/panthor/Makefile | 1 +
drivers/gpu/drm/panthor/panthor_device.c | 14 +-
drivers/gpu/drm/panthor/panthor_device.h | 11 +-
drivers/gpu/drm/panthor/panthor_drv.c | 150 +-
drivers/gpu/drm/panthor/panthor_fw.c | 6 +
drivers/gpu/drm/panthor/panthor_fw.h | 9 +-
drivers/gpu/drm/panthor/panthor_perf.c | 1982 ++++++++++++++++++++++
drivers/gpu/drm/panthor/panthor_perf.h | 40 +
drivers/gpu/drm/panthor/panthor_regs.h | 1 +
include/uapi/drm/panthor_drm.h | 565 ++++++
10 files changed, 2774 insertions(+), 5 deletions(-)
create mode 100644 drivers/gpu/drm/panthor/panthor_perf.c
create mode 100644 drivers/gpu/drm/panthor/panthor_perf.h
--
2.33.0.dirty
* [PATCH v4 1/7] drm/panthor: Add performance counter uAPI
2025-05-16 15:49 [PATCH v4 0/7] Performance counter implementation with single manual client support Lukas Zapolskas
@ 2025-05-16 15:49 ` Lukas Zapolskas
2025-07-18 2:43 ` Adrián Larumbe
2025-05-16 15:49 ` [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10 Lukas Zapolskas
` (5 subsequent siblings)
6 siblings, 1 reply; 29+ messages in thread
From: Lukas Zapolskas @ 2025-05-16 15:49 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe, Lukas Zapolskas, Mihail Atanassov
This patch extends the DEV_QUERY ioctl to return information about the
performance counter setup for userspace, and introduces the new
ioctl DRM_PANTHOR_PERF_CONTROL in order to allow for the sampling of
performance counters.
The new design is inspired by the perf aux ringbuffer, with the insert
and extract indices being mapped to userspace, allowing multiple samples
to be exposed at any given time. To avoid pointer chasing, the sample
metadata and block metadata are inline with the elements they
describe.
Userspace is responsible for passing in resources for samples to be
exposed, including the event file descriptor for notification of new
sample availability, the ringbuffer BO to store samples, and the
control BO along with the offset for mapping the insert and extract
indices. Since these indices take up only 16 bytes in total, userspace
can reuse the same physical page for tracking the state of
multiple buffers by giving different offsets from the BO start to
map them.
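A userspace consumer of this design could look roughly like the sketch below. It assumes both indices are free-running 64-bit values that are only reduced to a slot number when the buffer is accessed, and that extract_idx names the next sample to read. struct ringbuf_control is a hypothetical mirror of struct drm_panthor_perf_ringbuf_control and the helpers are illustrative only. A real client would also need acquire/release ordering on the shared indices, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct drm_panthor_perf_ringbuf_control. */
struct ringbuf_control {
	uint64_t extract_idx;	/* advanced by userspace */
	uint64_t insert_idx;	/* advanced by the kernel */
};

/* Number of samples written by the kernel but not yet consumed. */
static uint64_t samples_pending(const struct ringbuf_control *ctrl)
{
	return ctrl->insert_idx - ctrl->extract_idx;
}

/* Byte offset of the slot for idx; sample_slots must be a power of two. */
static size_t slot_offset(uint64_t idx, uint32_t sample_slots,
			  size_t sample_size)
{
	assert(sample_slots && !(sample_slots & (sample_slots - 1)));
	return (size_t)(idx & (sample_slots - 1)) * sample_size;
}

/* Example consumer: counts the samples it is handed. */
static void count_sample(const uint8_t *sample, void *ctx)
{
	(void)sample;
	++*(unsigned int *)ctx;
}

/* Hand every pending sample to consume() and advance the extract index. */
static uint64_t drain_samples(struct ringbuf_control *ctrl, const uint8_t *ring,
			      uint32_t sample_slots, size_t sample_size,
			      void (*consume)(const uint8_t *sample, void *ctx),
			      void *ctx)
{
	uint64_t insert = ctrl->insert_idx;
	uint64_t done = 0;

	while (ctrl->extract_idx != insert) {
		consume(ring + slot_offset(ctrl->extract_idx, sample_slots,
					   sample_size), ctx);
		ctrl->extract_idx++;
		done++;
	}
	return done;
}
```

Because each control structure is only 16 bytes, several sessions could place theirs in one page at offsets such as 0, 16 and 32, passing the matching control_offset at setup.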
Co-developed-by: Mihail Atanassov <mihail.atanassov@arm.com>
Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com>
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
---
include/uapi/drm/panthor_drm.h | 565 +++++++++++++++++++++++++++++++++
1 file changed, 565 insertions(+)
diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
index 97e2c4510e69..a74eabcabbcb 100644
--- a/include/uapi/drm/panthor_drm.h
+++ b/include/uapi/drm/panthor_drm.h
@@ -127,6 +127,9 @@ enum drm_panthor_ioctl_id {
/** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */
DRM_PANTHOR_TILER_HEAP_DESTROY,
+
+ /** @DRM_PANTHOR_PERF_CONTROL: Control a performance counter session. */
+ DRM_PANTHOR_PERF_CONTROL,
};
/**
@@ -226,6 +229,9 @@ enum drm_panthor_dev_query_type {
* @DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO: Query allowed group priorities information.
*/
DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO,
+
+ /** @DRM_PANTHOR_DEV_QUERY_PERF_INFO: Query performance counter interface information. */
+ DRM_PANTHOR_DEV_QUERY_PERF_INFO,
};
/**
@@ -379,6 +385,135 @@ struct drm_panthor_group_priorities_info {
__u8 pad[3];
};
+/**
+ * enum drm_panthor_perf_feat_flags - Performance counter configuration feature flags.
+ */
+enum drm_panthor_perf_feat_flags {
+ /** @DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT: Coarse-grained block states are supported. */
+ DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT = 1 << 0,
+};
+
+/**
+ * enum drm_panthor_perf_block_type - Performance counter supported block types.
+ */
+enum drm_panthor_perf_block_type {
+ /** @DRM_PANTHOR_PERF_BLOCK_METADATA: Internal use only. */
+ DRM_PANTHOR_PERF_BLOCK_METADATA = 0,
+
+ /** @DRM_PANTHOR_PERF_BLOCK_FW: The FW counter block. */
+ DRM_PANTHOR_PERF_BLOCK_FW,
+
+ /** @DRM_PANTHOR_PERF_BLOCK_CSHW: The CSHW counter block. */
+ DRM_PANTHOR_PERF_BLOCK_CSHW,
+
+ /** @DRM_PANTHOR_PERF_BLOCK_TILER: The tiler counter block. */
+ DRM_PANTHOR_PERF_BLOCK_TILER,
+
+ /** @DRM_PANTHOR_PERF_BLOCK_MEMSYS: A memsys counter block. */
+ DRM_PANTHOR_PERF_BLOCK_MEMSYS,
+
+ /** @DRM_PANTHOR_PERF_BLOCK_SHADER: A shader core counter block. */
+ DRM_PANTHOR_PERF_BLOCK_SHADER,
+
+ /** @DRM_PANTHOR_PERF_BLOCK_FIRST: Internal use only. */
+ DRM_PANTHOR_PERF_BLOCK_FIRST = DRM_PANTHOR_PERF_BLOCK_FW,
+
+ /** @DRM_PANTHOR_PERF_BLOCK_LAST: Internal use only. */
+ DRM_PANTHOR_PERF_BLOCK_LAST = DRM_PANTHOR_PERF_BLOCK_SHADER,
+
+ /** @DRM_PANTHOR_PERF_BLOCK_MAX: Internal use only. */
+ DRM_PANTHOR_PERF_BLOCK_MAX = DRM_PANTHOR_PERF_BLOCK_LAST + 1,
+};
+
+/**
+ * enum drm_panthor_perf_clock - Identifier of the clock used to produce the cycle count values
+ * in a given block.
+ *
+ * Since the integrator has the choice of using one or more clocks, there may be some confusion
+ * as to which blocks are counted by which clock values unless this information is explicitly
+ * provided as part of every block sample. Not every single clock here can be used: in the simplest
+ * case, all cycle counts will be associated with the top-level clock.
+ */
+enum drm_panthor_perf_clock {
+ /** @DRM_PANTHOR_PERF_CLOCK_TOPLEVEL: Top-level CSF clock. */
+ DRM_PANTHOR_PERF_CLOCK_TOPLEVEL,
+
+ /**
+ * @DRM_PANTHOR_PERF_CLOCK_COREGROUP: Core group clock, responsible for the MMU, L2
+ * caches and the tiler.
+ */
+ DRM_PANTHOR_PERF_CLOCK_COREGROUP,
+
+ /** @DRM_PANTHOR_PERF_CLOCK_SHADER: Clock for the shader cores. */
+ DRM_PANTHOR_PERF_CLOCK_SHADER,
+};
+
+/**
+ * struct drm_panthor_perf_info - Performance counter interface information
+ *
+ * Structure grouping all queryable information relating to the performance counter
+ * interfaces.
+ */
+struct drm_panthor_perf_info {
+ /**
+ * @counters_per_block: The number of 8-byte counters available in a block.
+ */
+ __u32 counters_per_block;
+
+ /**
+ * @sample_header_size: The size of the header struct available at the beginning
+ * of every sample.
+ */
+ __u32 sample_header_size;
+
+ /**
+ * @block_header_size: The size of the header struct inline with the counters for a
+ * single block.
+ */
+ __u32 block_header_size;
+
+ /**
+ * @sample_size: The size of a fully annotated sample, starting with a sample header
+ * of size @sample_header_size bytes, and all available blocks for the current
+ * configuration, each comprised of @counters_per_block 64-bit counters and
+ * a block header of @block_header_size bytes.
+ *
+ * The user must use this field to allocate size for the ring buffer. In
+ * the case of new blocks being added, an old userspace can always use
+ * this field and ignore any blocks it does not know about.
+ */
+ __u32 sample_size;
+
+ /** @flags: Combination of drm_panthor_perf_feat_flags flags. */
+ __u32 flags;
+
+ /**
+ * @supported_clocks: Bitmask of the clocks supported by the GPU.
+ *
+ * Each bit represents a variant of the enum drm_panthor_perf_clock.
+ *
+ * For the same GPU, different implementers may have different clocks for the same hardware
+ * block. At the moment, up to four clocks are supported, and any clocks that are present
+ * will be reported here.
+ */
+ __u32 supported_clocks;
+
+ /** @fw_blocks: Number of FW blocks available. */
+ __u32 fw_blocks;
+
+ /** @cshw_blocks: Number of CSHW blocks available. */
+ __u32 cshw_blocks;
+
+ /** @tiler_blocks: Number of tiler blocks available. */
+ __u32 tiler_blocks;
+
+ /** @memsys_blocks: Number of memsys blocks available. */
+ __u32 memsys_blocks;
+
+ /** @shader_blocks: Number of shader core blocks available. */
+ __u32 shader_blocks;
+};
+
/**
* struct drm_panthor_dev_query - Arguments passed to DRM_PANTHOR_IOCTL_DEV_QUERY
*/
@@ -977,6 +1112,434 @@ struct drm_panthor_tiler_heap_destroy {
__u32 pad;
};
+/**
+ * DOC: Performance counter decoding in userspace.
+ *
+ * Each sample will be exposed to userspace in the following manner:
+ *
+ * +--------+--------+------------------------+--------+-------------------------+-----+
+ * | Sample | Block | Block | Block | Block | ... |
+ * | header | header | counters | header | counters | |
+ * +--------+--------+------------------------+--------+-------------------------+-----+
+ *
+ * Each sample will start with a sample header of type struct drm_panthor_perf_sample_header,
+ * providing sample-wide information like the start and end timestamps, the counter set currently
+ * configured, and any errors that may have occurred during sampling.
+ *
+ * After the fixed size header, the sample will consist of blocks of
+ * drm_panthor_perf_info::counters_per_block 64-bit counters, each prefaced with a
+ * header of its own, indicating source block type, as well as the cycle count needed to normalize
+ * cycle values within that block, and a clock source identifier.
+ */
+
+/**
+ * enum drm_panthor_perf_block_state - Bitmask of the power and execution states that an individual
+ * hardware block went through in a sampling period.
+ *
+ * Because the sampling period is controlled from userspace, the block may undergo multiple
+ * state transitions, so this must be interpreted as one or more such transitions occurring.
+ */
+enum drm_panthor_perf_block_state {
+ /**
+ * @DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN: The state of this block was unknown during
+ * the sampling period.
+ */
+ DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN = 0,
+
+ /**
+ * @DRM_PANTHOR_PERF_BLOCK_STATE_ON: This block was powered on for some or all of
+ * the sampling period.
+ */
+ DRM_PANTHOR_PERF_BLOCK_STATE_ON = 1 << 0,
+
+ /**
+ * @DRM_PANTHOR_PERF_BLOCK_STATE_OFF: This block was powered off for some or all of the
+ * sampling period.
+ */
+ DRM_PANTHOR_PERF_BLOCK_STATE_OFF = 1 << 1,
+
+ /**
+ * @DRM_PANTHOR_PERF_BLOCK_STATE_AVAILABLE: This block was available for execution for
+ * some or all of the sampling period.
+ */
+ DRM_PANTHOR_PERF_BLOCK_STATE_AVAILABLE = 1 << 2,
+ /**
+ * @DRM_PANTHOR_PERF_BLOCK_STATE_UNAVAILABLE: This block was unavailable for execution for
+ * some or all of the sampling period.
+ */
+ DRM_PANTHOR_PERF_BLOCK_STATE_UNAVAILABLE = 1 << 3,
+
+ /**
+ * @DRM_PANTHOR_PERF_BLOCK_STATE_NORMAL: This block was executing in normal mode
+ * for some or all of the sampling period.
+ */
+ DRM_PANTHOR_PERF_BLOCK_STATE_NORMAL = 1 << 4,
+
+ /**
+ * @DRM_PANTHOR_PERF_BLOCK_STATE_PROTECTED: This block was executing in protected mode
+ * for some or all of the sampling period.
+ */
+ DRM_PANTHOR_PERF_BLOCK_STATE_PROTECTED = 1 << 5,
+};
+
+/**
+ * struct drm_panthor_perf_block_header - Header present before every block in the
+ * sample ringbuffer.
+ */
+struct drm_panthor_perf_block_header {
+ /** @block_type: Type of the block. */
+ __u8 block_type;
+
+ /** @block_idx: Block index. */
+ __u8 block_idx;
+
+ /**
+ * @block_states: Coarse-grained block transitions, bitmask of enum
+ * drm_panthor_perf_block_state.
+ */
+ __u8 block_states;
+
+ /**
+ * @clock: Clock used to produce the cycle count for this block, taken from
+ * enum drm_panthor_perf_clock. The cycle counts are stored in the sample header.
+ */
+ __u8 clock;
+
+ /** @pad: MBZ. */
+ __u8 pad[4];
+
+ /** @enable_mask: Bitmask of counters requested during the session setup. */
+ __u64 enable_mask[2];
+};
+
+/**
+ * enum drm_panthor_perf_sample_flags - Sample-wide events that occurred over the sampling
+ * period.
+ */
+enum drm_panthor_perf_sample_flags {
+ /**
+ * @DRM_PANTHOR_PERF_SAMPLE_OVERFLOW: This sample contains overflows due to the duration
+ * of the sampling period.
+ */
+ DRM_PANTHOR_PERF_SAMPLE_OVERFLOW = 1 << 0,
+
+ /**
+ * @DRM_PANTHOR_PERF_SAMPLE_ERROR: This sample encountered an error condition during
+ * the sample duration.
+ */
+ DRM_PANTHOR_PERF_SAMPLE_ERROR = 1 << 1,
+};
+
+/**
+ * struct drm_panthor_perf_sample_header - Header present before every sample.
+ */
+struct drm_panthor_perf_sample_header {
+ /**
+ * @timestamp_start_ns: Earliest timestamp that values in this sample represent, in
+ * nanoseconds. Derived from CLOCK_MONOTONIC_RAW.
+ */
+ __u64 timestamp_start_ns;
+
+ /**
+ * @timestamp_end_ns: Latest timestamp that values in this sample represent, in
+ * nanoseconds. Derived from CLOCK_MONOTONIC_RAW.
+ */
+ __u64 timestamp_end_ns;
+
+ /** @block_set: Set of performance counter blocks. */
+ __u8 block_set;
+
+ /** @pad: MBZ. */
+ __u8 pad[3];
+
+ /** @flags: Current sample flags, combination of drm_panthor_perf_sample_flags. */
+ __u32 flags;
+
+ /**
+ * @user_data: User data provided as part of the command that triggered this sample.
+ *
+ * - Automatic samples (periodic ones or those around non-counting periods or power state
+ * transitions) will be tagged with the user_data provided as part of the
+ * DRM_PANTHOR_PERF_COMMAND_START call.
+ * - Manual samples will be tagged with the user_data provided with the
+ * DRM_PANTHOR_PERF_COMMAND_SAMPLE call.
+ * - A session's final automatic sample will be tagged with the user_data provided with the
+ * DRM_PANTHOR_PERF_COMMAND_STOP call.
+ */
+ __u64 user_data;
+
+ /**
+ * @toplevel_clock_cycles: The number of cycles elapsed between
+ * drm_panthor_perf_sample_header::timestamp_start_ns and
+ * drm_panthor_perf_sample_header::timestamp_end_ns on the top-level clock if the
+ * corresponding bit is set in drm_panthor_perf_info::supported_clocks.
+ */
+ __u64 toplevel_clock_cycles;
+
+ /**
+ * @coregroup_clock_cycles: The number of cycles elapsed between
+ * drm_panthor_perf_sample_header::timestamp_start_ns and
+ * drm_panthor_perf_sample_header::timestamp_end_ns on the coregroup clock if the
+ * corresponding bit is set in drm_panthor_perf_info::supported_clocks.
+ */
+ __u64 coregroup_clock_cycles;
+
+ /**
+ * @shader_clock_cycles: The number of cycles elapsed between
+ * drm_panthor_perf_sample_header::timestamp_start_ns and
+ * drm_panthor_perf_sample_header::timestamp_end_ns on the shader core clock if the
+ * corresponding bit is set in drm_panthor_perf_info::supported_clocks.
+ */
+ __u64 shader_clock_cycles;
+};
+
+/**
+ * enum drm_panthor_perf_command - Command type passed to the DRM_PANTHOR_PERF_CONTROL
+ * IOCTL.
+ */
+enum drm_panthor_perf_command {
+ /** @DRM_PANTHOR_PERF_COMMAND_SETUP: Create a new performance counter sampling context. */
+ DRM_PANTHOR_PERF_COMMAND_SETUP,
+
+ /** @DRM_PANTHOR_PERF_COMMAND_TEARDOWN: Teardown a performance counter sampling context. */
+ DRM_PANTHOR_PERF_COMMAND_TEARDOWN,
+
+ /** @DRM_PANTHOR_PERF_COMMAND_START: Start a sampling session on the indicated context. */
+ DRM_PANTHOR_PERF_COMMAND_START,
+
+ /** @DRM_PANTHOR_PERF_COMMAND_STOP: Stop the sampling session on the indicated context. */
+ DRM_PANTHOR_PERF_COMMAND_STOP,
+
+ /**
+ * @DRM_PANTHOR_PERF_COMMAND_SAMPLE: Request a manual sample on the indicated context.
+ *
+ * When the sampling session is configured with a non-zero sampling frequency, any
+ * DRM_PANTHOR_PERF_CONTROL calls with this command will be rejected with
+ * -EINVAL.
+ */
+ DRM_PANTHOR_PERF_COMMAND_SAMPLE,
+};
+
+/**
+ * struct drm_panthor_perf_control - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL.
+ */
+struct drm_panthor_perf_control {
+ /** @cmd: Command from enum drm_panthor_perf_command. */
+ __u32 cmd;
+
+ /**
+ * @handle: session handle.
+ *
+ * Returned by the DRM_PANTHOR_PERF_COMMAND_SETUP call.
+ * It must be used in subsequent commands for the same context.
+ */
+ __u32 handle;
+
+ /**
+ * @size: size of the command structure.
+ *
+ * If the pointer is NULL, the size is updated by the driver to provide the size of the
+ * output structure. If the pointer is not NULL, the driver will only copy min(size,
+ * struct_size) to the pointer and update the size accordingly.
+ */
+ __u64 size;
+
+ /**
+ * @pointer: user pointer to a command type struct, such as
+ * @struct drm_panthor_perf_cmd_start.
+ */
+ __u64 pointer;
+};
+
+/**
+ * enum drm_panthor_perf_counter_set - The counter set to be requested from the hardware.
+ *
+ * The hardware supports a single performance counter set at a time, so requesting any set other
+ * than the primary may fail if another process is sampling at the same time.
+ *
+ * If in doubt, the primary counter set has the most commonly used counters and requires no
+ * additional permissions to open.
+ */
+enum drm_panthor_perf_counter_set {
+ /**
+ * @DRM_PANTHOR_PERF_SET_PRIMARY: The default set configured on the hardware.
+ *
+ * This is the only set for which all counters in all blocks are defined.
+ */
+ DRM_PANTHOR_PERF_SET_PRIMARY,
+
+ /**
+ * @DRM_PANTHOR_PERF_SET_SECONDARY: The secondary performance counter set.
+ *
+ * Some blocks may not have any defined counters for this set, and the block will
+ * have the UNAVAILABLE block state permanently set in the block header.
+ *
+ * Accessing this set requires the calling process to have the CAP_PERFMON capability.
+ */
+ DRM_PANTHOR_PERF_SET_SECONDARY,
+
+ /**
+ * @DRM_PANTHOR_PERF_SET_TERTIARY: The tertiary performance counter set.
+ *
+ * Some blocks may not have any defined counters for this set, and the block will have
+ * the UNAVAILABLE block state permanently set in the block header. Note that the
+ * tertiary set has the fewest defined counter blocks.
+ *
+ * Accessing this set requires the calling process to have the CAP_PERFMON capability.
+ */
+ DRM_PANTHOR_PERF_SET_TERTIARY,
+};
+
+/**
+ * struct drm_panthor_perf_ringbuf_control - Struct used to map in the ring buffer control indices
+ * into memory shared between user and kernel.
+ *
+ */
+struct drm_panthor_perf_ringbuf_control {
+ /**
+ * @extract_idx: The index of the latest sample that was processed by userspace. Only
+ * modifiable by userspace.
+ */
+ __u64 extract_idx;
+
+ /**
+ * @insert_idx: The index of the latest sample emitted by the kernel. Only
+ * modifiable by the kernel.
+ */
+ __u64 insert_idx;
+};
+
+/**
+ * struct drm_panthor_perf_cmd_setup - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
+ * when the DRM_PANTHOR_PERF_COMMAND_SETUP command is specified.
+ */
+struct drm_panthor_perf_cmd_setup {
+ /**
+ * @block_set: Set of performance counter blocks, member of
+ * enum drm_panthor_perf_counter_set.
+ *
+ * This is a global configuration and only one set can be active at a time. If
+ * another client has already requested a counter set, any further requests
+ * for a different counter set will fail and return an -EBUSY.
+ *
+ * If the requested set does not exist, the request will fail and return an -EINVAL.
+ *
+ * Some sets have additional requirements to be enabled, and the setup request will
+ * fail with an -EACCES if these requirements are not satisfied.
+ */
+ __u8 block_set;
+
+ /** @pad: MBZ. */
+ __u8 pad[7];
+
+ /** @fd: eventfd for signalling the availability of a new sample. */
+ __u32 fd;
+
+ /** @ringbuf_handle: Handle to the BO to write perf counter sample to. */
+ __u32 ringbuf_handle;
+
+ /**
+ * @control_handle: Handle to the BO containing a contiguous 16 byte range, used for the
+ * insert and extract indices for the ringbuffer.
+ */
+ __u32 control_handle;
+
+ /**
+ * @sample_slots: The number of slots available in the userspace-provided BO. Must be
+ * a power of 2.
+ *
+ * If sample_slots * sample_size does not match the BO size, the setup request will fail.
+ */
+ __u32 sample_slots;
+
+ /**
+ * @control_offset: Offset into the control BO where the insert and extract indices are
+ * located.
+ */
+ __u64 control_offset;
+
+ /**
+ * @sample_freq_ns: Period between automatic counter sample collection in nanoseconds. Zero
+ * disables automatic collection and all collection must be done through explicit calls
+ * to DRM_PANTHOR_PERF_CONTROL.SAMPLE. Non-zero values will disable manual counter sampling
+ * via the DRM_PANTHOR_PERF_COMMAND_SAMPLE command.
+ *
+ * Zero only disables software-triggered periodic sampling; hardware will still trigger
+ * automatic samples on certain events, including shader core power transitions, and
+ * entries to and exits from non-counting periods. The final stop command will also
+ * trigger a sample to ensure no data is lost.
+ */
+ __u64 sample_freq_ns;
+
+ /**
+ * @fw_enable_mask: Bitmask of counters to request from the FW counter block. Any bits
+ * past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit 0
+ * corresponds to counter 0.
+ */
+ __u64 fw_enable_mask[2];
+
+ /**
+ * @cshw_enable_mask: Bitmask of counters to request from the CSHW counter block. Any bits
+ * past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit 0
+ * corresponds to counter 0.
+ */
+ __u64 cshw_enable_mask[2];
+
+ /**
+ * @tiler_enable_mask: Bitmask of counters to request from the tiler counter block. Any
+ * bits past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit
+ * 0 corresponds to counter 0.
+ */
+ __u64 tiler_enable_mask[2];
+
+ /**
+ * @memsys_enable_mask: Bitmask of counters to request from the memsys counter blocks. Any
+ * bits past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit 0
+ * corresponds to counter 0.
+ */
+ __u64 memsys_enable_mask[2];
+
+ /**
+ * @shader_enable_mask: Bitmask of counters to request from the shader core counter blocks.
+ * Any bits past the first drm_panthor_perf_info.counters_per_block bits will be ignored.
+ * Bit 0 corresponds to counter 0.
+ */
+ __u64 shader_enable_mask[2];
+};
+
+/**
+ * struct drm_panthor_perf_cmd_start - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
+ * when the DRM_PANTHOR_PERF_COMMAND_START command is specified.
+ */
+struct drm_panthor_perf_cmd_start {
+ /**
+ * @user_data: User provided data that will be attached to automatic samples collected
+ * until the next DRM_PANTHOR_PERF_COMMAND_STOP.
+ */
+ __u64 user_data;
+};
+
+/**
+ * struct drm_panthor_perf_cmd_stop - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
+ * when the DRM_PANTHOR_PERF_COMMAND_STOP command is specified.
+ */
+struct drm_panthor_perf_cmd_stop {
+ /**
+ * @user_data: User provided data that will be attached to the automatic sample collected
+ * at the end of this sampling session.
+ */
+ __u64 user_data;
+};
+
+/**
+ * struct drm_panthor_perf_cmd_sample - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
+ * when the DRM_PANTHOR_PERF_COMMAND_SAMPLE command is specified.
+ */
+struct drm_panthor_perf_cmd_sample {
+ /** @user_data: User provided data that will be attached to the sample. */
+ __u64 user_data;
+};
+
/**
* DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
* @__access: Access type. Must be R, W or RW.
@@ -1019,6 +1582,8 @@ enum {
DRM_IOCTL_PANTHOR(WR, TILER_HEAP_CREATE, tiler_heap_create),
DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY =
DRM_IOCTL_PANTHOR(WR, TILER_HEAP_DESTROY, tiler_heap_destroy),
+ DRM_IOCTL_PANTHOR_PERF_CONTROL =
+ DRM_IOCTL_PANTHOR(WR, PERF_CONTROL, perf_control)
};
#if defined(__cplusplus)
--
2.33.0.dirty
* [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10
2025-05-16 15:49 [PATCH v4 0/7] Performance counter implementation with single manual client support Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 1/7] drm/panthor: Add performance counter uAPI Lukas Zapolskas
@ 2025-05-16 15:49 ` Lukas Zapolskas
2025-07-18 2:52 ` Adrián Larumbe
2025-07-18 15:11 ` Adrián Larumbe
2025-05-16 15:49 ` [PATCH v4 3/7] drm/panthor: Add panthor perf initialization and termination Lukas Zapolskas
` (4 subsequent siblings)
6 siblings, 2 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-05-16 15:49 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe, Lukas Zapolskas
This change adds the IOCTL to query data about the performance counter
setup. Some of this data was available via previous DEV_QUERY calls,
for instance for GPU info, but exposing it via PERF_INFO
means that a client only needs this one aggregate query
before creating a session.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
drivers/gpu/drm/panthor/Makefile | 1 +
drivers/gpu/drm/panthor/panthor_device.c | 5 ++
drivers/gpu/drm/panthor/panthor_device.h | 3 +
drivers/gpu/drm/panthor/panthor_drv.c | 10 +++-
drivers/gpu/drm/panthor/panthor_fw.h | 3 +
drivers/gpu/drm/panthor/panthor_perf.c | 76 ++++++++++++++++++++++++
drivers/gpu/drm/panthor/panthor_perf.h | 15 +++++
drivers/gpu/drm/panthor/panthor_regs.h | 1 +
8 files changed, 113 insertions(+), 1 deletion(-)
create mode 100644 drivers/gpu/drm/panthor/panthor_perf.c
create mode 100644 drivers/gpu/drm/panthor/panthor_perf.h
diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile
index 15294719b09c..0df9947f3575 100644
--- a/drivers/gpu/drm/panthor/Makefile
+++ b/drivers/gpu/drm/panthor/Makefile
@@ -9,6 +9,7 @@ panthor-y := \
panthor_gpu.o \
panthor_heap.o \
panthor_mmu.o \
+ panthor_perf.o \
panthor_sched.o
obj-$(CONFIG_DRM_PANTHOR) += panthor.o
diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index a9da1d1eeb70..76b4cf3dc391 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -19,6 +19,7 @@
#include "panthor_fw.h"
#include "panthor_gpu.h"
#include "panthor_mmu.h"
+#include "panthor_perf.h"
#include "panthor_regs.h"
#include "panthor_sched.h"
@@ -259,6 +260,10 @@ int panthor_device_init(struct panthor_device *ptdev)
if (ret)
goto err_unplug_fw;
+ ret = panthor_perf_init(ptdev);
+ if (ret)
+ goto err_unplug_fw;
+
/* ~3 frames */
pm_runtime_set_autosuspend_delay(ptdev->base.dev, 50);
pm_runtime_use_autosuspend(ptdev->base.dev);
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index da6574021664..657ccc39568c 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -120,6 +120,9 @@ struct panthor_device {
/** @csif_info: Command stream interface information. */
struct drm_panthor_csif_info csif_info;
+ /** @perf_info: Performance counter interface information. */
+ struct drm_panthor_perf_info perf_info;
+
/** @gpu: GPU management data. */
struct panthor_gpu *gpu;
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 06fe46e32073..9d2b716cca45 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -175,7 +175,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
- PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
+ PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
+ PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks))
/**
* PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
@@ -835,6 +836,10 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
args->size = sizeof(priorities_info);
return 0;
+ case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
+ args->size = sizeof(ptdev->perf_info);
+ return 0;
+
default:
return -EINVAL;
}
@@ -859,6 +864,9 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
panthor_query_group_priorities_info(file, &priorities_info);
return PANTHOR_UOBJ_SET(args->pointer, args->size, priorities_info);
+ case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
+ return PANTHOR_UOBJ_SET(args->pointer, args->size, ptdev->perf_info);
+
default:
return -EINVAL;
}
diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
index 6598d96c6d2a..8bcb933fa790 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.h
+++ b/drivers/gpu/drm/panthor/panthor_fw.h
@@ -197,8 +197,11 @@ struct panthor_fw_global_control_iface {
u32 output_va;
u32 group_num;
u32 group_stride;
+#define GLB_PERFCNT_FW_SIZE(x) ((((x) >> 16) << 8))
u32 perfcnt_size;
u32 instr_features;
+#define PERFCNT_FEATURES_MD_SIZE(x) (((x) & GENMASK(3, 0)) << 8)
+ u32 perfcnt_features;
};
struct panthor_fw_global_input_iface {
diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
new file mode 100644
index 000000000000..66e9a197ac1f
--- /dev/null
+++ b/drivers/gpu/drm/panthor/panthor_perf.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0 or MIT
+/* Copyright 2023 Collabora Ltd */
+/* Copyright 2025 Arm ltd. */
+
+#include <linux/bitops.h>
+#include <drm/panthor_drm.h>
+
+#include "panthor_device.h"
+#include "panthor_fw.h"
+#include "panthor_perf.h"
+
+struct panthor_perf_counter_block {
+ struct drm_panthor_perf_block_header header;
+ u64 counters[];
+};
+
+static size_t get_annotated_block_size(size_t counters_per_block)
+{
+ return struct_size_t(struct panthor_perf_counter_block, counters, counters_per_block);
+}
+
+static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
+{
+ const size_t block_size = get_annotated_block_size(info->counters_per_block);
+ const size_t block_nr = info->cshw_blocks + info->fw_blocks +
+ info->tiler_blocks + info->memsys_blocks + info->shader_blocks;
+
+ return sizeof(struct drm_panthor_perf_sample_header) + (block_size * block_nr);
+}
+
+/**
+ * PANTHOR_PERF_COUNTERS_PER_BLOCK - On CSF architectures pre-11.x, the number of counters
+ * per block was hardcoded to be 64. Arch 11.0 onwards supports the PRFCNT_FEATURES GPU register,
+ * which indicates the same information.
+ */
+#define PANTHOR_PERF_COUNTERS_PER_BLOCK (64)
+
+static void panthor_perf_info_init(struct panthor_device *ptdev)
+{
+ struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
+ struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
+
+ if (PERFCNT_FEATURES_MD_SIZE(glb_iface->control->perfcnt_features))
+ perf_info->flags |= DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT;
+
+ perf_info->counters_per_block = PANTHOR_PERF_COUNTERS_PER_BLOCK;
+
+ perf_info->sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
+ perf_info->block_header_size = sizeof(struct drm_panthor_perf_block_header);
+
+ if (GLB_PERFCNT_FW_SIZE(glb_iface->control->perfcnt_size))
+ perf_info->fw_blocks = 1;
+
+ perf_info->cshw_blocks = 1;
+ perf_info->tiler_blocks = 1;
+ perf_info->memsys_blocks = GPU_MEM_FEATURES_L2_SLICES(ptdev->gpu_info.mem_features);
+ perf_info->shader_blocks = hweight64(ptdev->gpu_info.shader_present);
+
+ perf_info->sample_size = session_get_user_sample_size(perf_info);
+}
+
+/**
+ * panthor_perf_init - Initialize the performance counter subsystem.
+ * @ptdev: Panthor device
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int panthor_perf_init(struct panthor_device *ptdev)
+{
+ if (!ptdev)
+ return -EINVAL;
+
+ panthor_perf_info_init(ptdev);
+
+ return 0;
+}
diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
new file mode 100644
index 000000000000..3c32c24c164c
--- /dev/null
+++ b/drivers/gpu/drm/panthor/panthor_perf.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 or MIT */
+/* Copyright 2025 Collabora Ltd */
+/* Copyright 2025 Arm ltd. */
+
+#ifndef __PANTHOR_PERF_H__
+#define __PANTHOR_PERF_H__
+
+#include <linux/types.h>
+
+struct panthor_device;
+
+int panthor_perf_init(struct panthor_device *ptdev);
+
+#endif /* __PANTHOR_PERF_H__ */
+
diff --git a/drivers/gpu/drm/panthor/panthor_regs.h b/drivers/gpu/drm/panthor/panthor_regs.h
index b7b3b3add166..d9e9379d1a20 100644
--- a/drivers/gpu/drm/panthor/panthor_regs.h
+++ b/drivers/gpu/drm/panthor/panthor_regs.h
@@ -27,6 +27,7 @@
#define GPU_TILER_FEATURES 0xC
#define GPU_MEM_FEATURES 0x10
#define GROUPS_L2_COHERENT BIT(0)
+#define GPU_MEM_FEATURES_L2_SLICES(x) ((((x) & GENMASK(11, 8)) >> 8) + 1)
#define GPU_MMU_FEATURES 0x14
#define GPU_MMU_FEATURES_VA_BITS(x) ((x) & GENMASK(7, 0))
--
2.33.0.dirty
* [PATCH v4 3/7] drm/panthor: Add panthor perf initialization and termination
2025-05-16 15:49 [PATCH v4 0/7] Performance counter implementation with single manual client support Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 1/7] drm/panthor: Add performance counter uAPI Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10 Lukas Zapolskas
@ 2025-05-16 15:49 ` Lukas Zapolskas
2025-07-18 3:10 ` Adrián Larumbe
2025-05-16 15:49 ` [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients Lukas Zapolskas
` (3 subsequent siblings)
6 siblings, 1 reply; 29+ messages in thread
From: Lukas Zapolskas @ 2025-05-16 15:49 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe, Lukas Zapolskas
Add the panthor_perf subsystem initialization and unplug code, so that
the handling of userspace sessions can be added in follow-up
patches.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
---
drivers/gpu/drm/panthor/panthor_device.c | 2 +
drivers/gpu/drm/panthor/panthor_device.h | 5 +-
drivers/gpu/drm/panthor/panthor_perf.c | 62 +++++++++++++++++++++++-
drivers/gpu/drm/panthor/panthor_perf.h | 1 +
4 files changed, 68 insertions(+), 2 deletions(-)
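Note on the ownership handoff in panthor_perf_init(): the __free(kfree) and no_free_ptr() pair frees the allocation automatically on every early return and releases it only once it has been published in ptdev->perf. The userspace sketch below models that pattern with the GCC/Clang cleanup attribute. AUTO_FREE, TAKE_PTR, struct perf_state and perf_init_model() are hypothetical names chosen for illustration; they are not the kernel macros.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct panthor_perf. */
struct perf_state {
	unsigned int next_session;
};

/* Counts frees performed by the cleanup handler, for illustration only. */
static int frees;

/* Cleanup handler: receives a pointer to the annotated variable. */
static void auto_free(void *p)
{
	void **pp = p;

	if (*pp) {
		free(*pp);
		frees++;
	}
}

/* Models __free(kfree): run auto_free() when the variable leaves scope. */
#define AUTO_FREE __attribute__((cleanup(auto_free)))

/* Models no_free_ptr(): hand the pointer out and disarm the cleanup. */
#define TAKE_PTR(p) ({ __typeof__(p) tmp_ = (p); (p) = NULL; tmp_; })

static int perf_init_model(struct perf_state **out, int fail_late)
{
	struct perf_state *perf AUTO_FREE = calloc(1, sizeof(*perf));

	if (!perf)
		return -ENOMEM;

	/* Any early return from here on frees perf automatically. */
	if (fail_late)
		return -EINVAL;

	/* Success: ownership moves to the caller, nothing is freed on exit. */
	*out = TAKE_PTR(perf);
	return 0;
}
```

The design choice this illustrates is that error paths need no explicit kfree() or goto label, while the success path must explicitly give up ownership.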
diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index 76b4cf3dc391..7ac985d44655 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -98,6 +98,7 @@ void panthor_device_unplug(struct panthor_device *ptdev)
/* Now, try to cleanly shutdown the GPU before the device resources
* get reclaimed.
*/
+ panthor_perf_unplug(ptdev);
panthor_sched_unplug(ptdev);
panthor_fw_unplug(ptdev);
panthor_mmu_unplug(ptdev);
@@ -277,6 +278,7 @@ int panthor_device_init(struct panthor_device *ptdev)
err_disable_autosuspend:
pm_runtime_dont_use_autosuspend(ptdev->base.dev);
+ panthor_perf_unplug(ptdev);
panthor_sched_unplug(ptdev);
err_unplug_fw:
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 657ccc39568c..818c4d96d448 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -27,7 +27,7 @@ struct panthor_heap_pool;
struct panthor_job;
struct panthor_mmu;
struct panthor_fw;
-struct panthor_perfcnt;
+struct panthor_perf;
struct panthor_vm;
struct panthor_vm_pool;
@@ -138,6 +138,9 @@ struct panthor_device {
/** @devfreq: Device frequency scaling management data. */
struct panthor_devfreq *devfreq;
+ /** @perf: Performance counter management data. */
+ struct panthor_perf *perf;
+
/** @unplug: Device unplug related fields. */
struct {
/** @lock: Lock used to serialize unplug operations. */
diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
index 66e9a197ac1f..9365ce9fed04 100644
--- a/drivers/gpu/drm/panthor/panthor_perf.c
+++ b/drivers/gpu/drm/panthor/panthor_perf.c
@@ -9,6 +9,19 @@
#include "panthor_fw.h"
#include "panthor_perf.h"
+struct panthor_perf {
+ /** @next_session: The ID of the next session. */
+ u32 next_session;
+
+ /** @session_range: The number of sessions supported at a time. */
+ struct xa_limit session_range;
+
+ /**
+ * @sessions: Global map of sessions, accessed by their ID.
+ */
+ struct xarray sessions;
+};
+
struct panthor_perf_counter_block {
struct drm_panthor_perf_block_header header;
u64 counters[];
@@ -63,14 +76,61 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
* panthor_perf_init - Initialize the performance counter subsystem.
* @ptdev: Panthor device
*
+ * The performance counters require the FW interface to be available to set up the
+ * sampling ringbuffers, so this must be called only after FW is initialized.
+ *
* Return: 0 on success, negative error code on failure.
*/
int panthor_perf_init(struct panthor_device *ptdev)
{
+ struct panthor_perf *perf __free(kfree) = NULL;
+ int ret = 0;
+
if (!ptdev)
return -EINVAL;
panthor_perf_info_init(ptdev);
- return 0;
+ perf = kzalloc(sizeof(*perf), GFP_KERNEL);
+ if (!perf)
+ return -ENOMEM;
+
+ xa_init_flags(&perf->sessions, XA_FLAGS_ALLOC);
+
+ perf->session_range = (struct xa_limit) {
+ .min = 0,
+ .max = 1,
+ };
+
+ drm_info(&ptdev->base, "Performance counter subsystem initialized\n");
+
+ ptdev->perf = no_free_ptr(perf);
+
+ return ret;
+}
+
+/**
+ * panthor_perf_unplug - Terminate the performance counter subsystem.
+ * @ptdev: Panthor device.
+ *
+ * This function will terminate the performance counter control structures and any remaining
+ * sessions, after waiting for any pending interrupts.
+ */
+void panthor_perf_unplug(struct panthor_device *ptdev)
+{
+ struct panthor_perf *perf = ptdev->perf;
+
+ if (!perf)
+ return;
+
+ if (!xa_empty(&perf->sessions)) {
+ drm_err(&ptdev->base,
+ "Performance counter sessions active when unplugging the driver!\n");
+ }
+
+ xa_destroy(&perf->sessions);
+
+ kfree(ptdev->perf);
+
+ ptdev->perf = NULL;
}
diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
index 3c32c24c164c..e4805727b9e7 100644
--- a/drivers/gpu/drm/panthor/panthor_perf.h
+++ b/drivers/gpu/drm/panthor/panthor_perf.h
@@ -10,6 +10,7 @@
struct panthor_device;
int panthor_perf_init(struct panthor_device *ptdev);
+void panthor_perf_unplug(struct panthor_device *ptdev);
#endif /* __PANTHOR_PERF_H__ */
--
2.33.0.dirty
* [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients
2025-05-16 15:49 [PATCH v4 0/7] Performance counter implementation with single manual client support Lukas Zapolskas
` (2 preceding siblings ...)
2025-05-16 15:49 ` [PATCH v4 3/7] drm/panthor: Add panthor perf initialization and termination Lukas Zapolskas
@ 2025-05-16 15:49 ` Lukas Zapolskas
2025-05-17 7:53 ` kernel test robot
` (2 more replies)
2025-05-16 15:49 ` [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling Lukas Zapolskas
` (2 subsequent siblings)
6 siblings, 3 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-05-16 15:49 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe, Lukas Zapolskas
To allow for combining the requests from multiple userspace clients, an
intermediary layer between the HW/FW interfaces and userspace is
created, containing the information for the counter requests and
tracking of insert and extract indices. Each session starts inactive and
must be explicitly activated via PERF_CONTROL.START, and explicitly
stopped via PERF_CONTROL.STOP. A client is identified by its session ID
together with the panthor file it is associated with.
The SAMPLE and STOP commands both produce a single sample when called,
and these samples can be disambiguated via the opaque user data field
passed in the PERF_CONTROL uAPI. If this functionality is not desired,
these fields can be kept as zero, as the kernel copies this value into
the corresponding sample without attempting to interpret it.
Currently, only manual sampling sessions are supported, providing
samples when userspace calls PERF_CONTROL.SAMPLE, and only a single
session is allowed at a time. Multiple sessions and periodic sampling
will be enabled in following patches.
No protection is provided against 32-bit hardware counter overflows,
so for the moment it is up to userspace to ensure that the counters are
sampled at a reasonable frequency.
The counter set enum is added to the uAPI to clarify the restrictions on
calling the interface.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
---
drivers/gpu/drm/panthor/panthor_device.h | 3 +
drivers/gpu/drm/panthor/panthor_drv.c | 1 +
drivers/gpu/drm/panthor/panthor_perf.c | 694 ++++++++++++++++++++++-
drivers/gpu/drm/panthor/panthor_perf.h | 16 +
4 files changed, 713 insertions(+), 1 deletion(-)
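Note on the session lifecycle: the START, SAMPLE and STOP rules described in the commit message can be summarised by the simplified userspace model below. It is a sketch under stated assumptions, not the driver code: struct session_model and the model_* helpers are hypothetical, the slot count is assumed to be a power of two, and free space is computed as a plain free-slot count rather than with the kernel's CIRC_SPACE_TO_END() helper that the patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a manual sampling session. */
struct session_model {
	bool active;
	uint32_t insert_idx;	/* written by the kernel */
	uint32_t extract_idx;	/* written by userspace */
	uint32_t slots;		/* assumed to be a power of two */
	uint64_t user_data;
};

/* Free slots; one slot always stays empty to tell "full" from "empty". */
static uint32_t free_slots(const struct session_model *s)
{
	return (s->extract_idx - s->insert_idx - 1) & (s->slots - 1);
}

/* START: no-op if already active; manual sessions discard the user data. */
static int model_start(struct session_model *s, uint64_t user_data)
{
	(void)user_data;
	s->active = true;
	return 0;
}

/* SAMPLE: needs two free slots, one of them reserved for the STOP sample. */
static int model_sample(struct session_model *s, uint64_t user_data)
{
	if (!s->active)
		return 0;
	if (free_slots(s) < 2)
		return -EBUSY;

	s->user_data = user_data;
	return 0;
}

/* STOP: needs one free slot for the final sample, then deactivates. */
static int model_stop(struct session_model *s, uint64_t user_data)
{
	if (!s->active)
		return 0;
	if (free_slots(s) < 1)
		return -EBUSY;

	s->user_data = user_data;
	s->active = false;
	return 0;
}
```

Reserving the extra slot in the SAMPLE path is what lets a later STOP always record its final sample, as long as userspace has not let the buffer fill completely.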
diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
index 818c4d96d448..3fa0882fe81b 100644
--- a/drivers/gpu/drm/panthor/panthor_device.h
+++ b/drivers/gpu/drm/panthor/panthor_device.h
@@ -225,6 +225,9 @@ struct panthor_file {
/** @ptdev: Device attached to this file. */
struct panthor_device *ptdev;
+ /** @drm_file: Corresponding drm_file */
+ struct drm_file *drm_file;
+
/** @vms: VM pool attached to this file. */
struct panthor_vm_pool *vms;
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 9d2b716cca45..4c1381320859 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -1356,6 +1356,7 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
}
pfile->ptdev = ptdev;
+ pfile->drm_file = file;
ret = panthor_vm_pool_create(pfile);
if (ret)
diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
index 9365ce9fed04..15fa533731f3 100644
--- a/drivers/gpu/drm/panthor/panthor_perf.c
+++ b/drivers/gpu/drm/panthor/panthor_perf.c
@@ -2,13 +2,177 @@
/* Copyright 2023 Collabora Ltd */
/* Copyright 2025 Arm ltd. */
-#include <linux/bitops.h>
+#include <drm/drm_gem.h>
#include <drm/panthor_drm.h>
+#include <linux/bitops.h>
+#include <linux/circ_buf.h>
#include "panthor_device.h"
#include "panthor_fw.h"
#include "panthor_perf.h"
+/**
+ * PANTHOR_PERF_EM_BITS - Number of bits in a user-facing enable mask. This must correspond
+ * to the maximum number of counters available for selection on the newest
+ * Mali GPUs (128 as of the Mali-Gx15).
+ */
+#define PANTHOR_PERF_EM_BITS (BITS_PER_TYPE(u64) * 2)
+
+enum panthor_perf_session_state {
+ /** @PANTHOR_PERF_SESSION_ACTIVE: The session is active and can be used for sampling. */
+ PANTHOR_PERF_SESSION_ACTIVE = 0,
+
+ /**
+ * @PANTHOR_PERF_SESSION_OVERFLOW: The session encountered an overflow in one of the
+ * counters during the last sampling period. This flag
+ * gets propagated as part of samples emitted for this
+ * session, to ensure the userspace client can gracefully
+ * handle this data corruption.
+ */
+ PANTHOR_PERF_SESSION_OVERFLOW,
+
+ /* Must be last */
+ PANTHOR_PERF_SESSION_MAX,
+};
+
+struct panthor_perf_enable_masks {
+ /**
+ * @mask: Array of bitmasks indicating the counters userspace requested, where
+ * one bit represents a single counter. Used to build the firmware configuration
+ * and ensure that userspace clients obtain only the counters they requested.
+ */
+ unsigned long mask[DRM_PANTHOR_PERF_BLOCK_MAX][BITS_TO_LONGS(PANTHOR_PERF_EM_BITS)];
+};
+
+struct panthor_perf_counter_block {
+ struct drm_panthor_perf_block_header header;
+ u64 counters[];
+};
+
+/**
+ * enum session_sample_type - Enum of the types of samples a session can request.
+ */
+enum session_sample_type {
+ /** @SAMPLE_TYPE_NONE: A sample has not been requested by this session. */
+ SAMPLE_TYPE_NONE,
+
+ /** @SAMPLE_TYPE_INITIAL: An initial sample has been requested by this session. */
+ SAMPLE_TYPE_INITIAL,
+
+ /** @SAMPLE_TYPE_REGULAR: A regular sample has been requested by this session. */
+ SAMPLE_TYPE_REGULAR,
+};
+
+struct panthor_perf_session {
+ DECLARE_BITMAP(state, PANTHOR_PERF_SESSION_MAX);
+
+ /**
+ * @pending_sample_request: The type of sample request that is currently pending:
+ * - when a sample is not requested, the data should be accumulated
+ * into the next slot of its ring buffer, but the extract index
+ * should not be updated, and the user-space session must
+ * not be signaled.
+ * - when an initial sample is requested, the data must not be
+ * emitted into the target ring buffer and the userspace client
+ * must not be notified.
+ * - when a regular sample is requested, the data must be emitted
+ * into the target ring buffer, and the userspace client must
+ * be signaled.
+ */
+ enum session_sample_type pending_sample_request;
+
+ /**
+ * @user_sample_size: The size of a single sample as exposed to userspace. For the sake of
+ * simplicity, the current implementation exposes the same structure
+ * as provided by firmware, after annotating the sample and the blocks,
+ * and zero-extending the counters themselves (to account for in-kernel
+ * accumulation).
+ *
+ * This may also allow further memory-optimizations of compressing the
+ * sample to provide only requested blocks, if deemed to be worth the
+ * additional complexity.
+ */
+ size_t user_sample_size;
+
+ /**
+ * @accum_idx: The last insert index indicates whether the current sample
+ * needs zeroing before accumulation. This is used to disambiguate
+ * between accumulating into an intermediate slot in the user ring buffer
+ * and zero-ing the buffer before copying data over.
+ */
+ u32 accum_idx;
+
+ /**
+ * @sample_freq_ns: Period between subsequent sample requests. Zero indicates that
+ * userspace will be responsible for requesting samples.
+ */
+ u64 sample_freq_ns;
+
+ /** @sample_start_ns: Sample request time, obtained from a monotonic raw clock. */
+ u64 sample_start_ns;
+
+ /**
+ * @user_data: Opaque handle passed in when starting a session, requesting a sample (for
+ * manual sampling sessions only) and when stopping a session. This handle
+ * allows the disambiguation of a sample in the ringbuffer.
+ */
+ u64 user_data;
+
+ /**
+ * @eventfd: Event file descriptor context used to signal userspace of a new sample
+ * being emitted.
+ */
+ struct eventfd_ctx *eventfd;
+
+ /**
+ * @enabled_counters: This session's requested counters. Note that these cannot change
+ * for the lifetime of the session.
+ */
+ struct panthor_perf_enable_masks *enabled_counters;
+
+ /** @ringbuf_slots: Slots in the user-facing ringbuffer. */
+ size_t ringbuf_slots;
+
+ /** @ring_buf: BO for the userspace ringbuffer. */
+ struct drm_gem_object *ring_buf;
+
+ /**
+ * @control_buf: BO for the insert and extract indices.
+ */
+ struct drm_gem_object *control_buf;
+
+ /** @control: The mapped insert and extract indices. */
+ struct drm_panthor_perf_ringbuf_control *control;
+
+ /** @samples: The mapping of the @ring_buf into the kernel's VA space. */
+ u8 *samples;
+
+ /**
+ * @pending: The list node used by the sampler to track the sessions that have not yet
+ * received a sample.
+ */
+ struct list_head pending;
+
+ /**
+ * @sessions: The list node used by the sampler to track the sessions waiting for a sample.
+ */
+ struct list_head sessions;
+
+ /**
+ * @pfile: The panthor file which was used to create a session, used for the postclose
+ * handling and to prevent a misconfigured userspace from closing unrelated
+ * sessions.
+ */
+ struct panthor_file *pfile;
+
+ /**
+ * @ref: Session reference count. The sample delivery to userspace is asynchronous, meaning
+ * the lifetime of the session must extend at least until the sample is exposed to
+ * userspace.
+ */
+ struct kref ref;
+};
+
struct panthor_perf {
/** @next_session: The ID of the next session. */
u32 next_session;
@@ -72,6 +236,122 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
perf_info->sample_size = session_get_user_sample_size(perf_info);
}
+static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panthor_perf_cmd_setup
+ *setup_args)
+{
+ struct panthor_perf_enable_masks *em = kmalloc(sizeof(*em), GFP_KERNEL);
+ if (IS_ERR_OR_NULL(em))
+ return em;
+
+ bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_FW],
+ setup_args->fw_enable_mask, PANTHOR_PERF_EM_BITS);
+ bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_CSHW],
+ setup_args->cshw_enable_mask, PANTHOR_PERF_EM_BITS);
+ bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_TILER],
+ setup_args->tiler_enable_mask, PANTHOR_PERF_EM_BITS);
+ bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_MEMSYS],
+ setup_args->memsys_enable_mask, PANTHOR_PERF_EM_BITS);
+ bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_SHADER],
+ setup_args->shader_enable_mask, PANTHOR_PERF_EM_BITS);
+
+ return em;
+}
+
+static u64 session_read_extract_idx(struct panthor_perf_session *session)
+{
+ const u64 slots = session->ringbuf_slots;
+
+ /* Userspace will update their own extract index to indicate that a sample is consumed
+ * from the ringbuffer, and we must ensure we read the latest value.
+ */
+ return smp_load_acquire(&session->control->extract_idx) % slots;
+}
+
+static u64 session_read_insert_idx(struct panthor_perf_session *session)
+{
+ const u64 slots = session->ringbuf_slots;
+
+ /*
+ * Userspace is able to write to the insert index, since it is mapped
+ * on the same page as the extract index. This should not happen
+ * in regular operation.
+ */
+ return smp_load_acquire(&session->control->insert_idx) % slots;
+}
+
+static void session_get(struct panthor_perf_session *session)
+{
+ kref_get(&session->ref);
+}
+
+static void session_free(struct kref *ref)
+{
+ struct panthor_perf_session *session = container_of(ref, typeof(*session), ref);
+
+ if (session->samples && session->ring_buf) {
+ struct iosys_map map = IOSYS_MAP_INIT_VADDR(session->samples);
+
+ drm_gem_vunmap_unlocked(session->ring_buf, &map);
+ drm_gem_object_put(session->ring_buf);
+ }
+
+ if (session->control && session->control_buf) {
+ struct iosys_map map = IOSYS_MAP_INIT_VADDR(session->control);
+
+ drm_gem_vunmap_unlocked(session->control_buf, &map);
+ drm_gem_object_put(session->control_buf);
+ }
+
+ eventfd_ctx_put(session->eventfd);
+
+ kfree(session);
+}
+
+static void session_put(struct panthor_perf_session *session)
+{
+ kref_put(&session->ref, session_free);
+}
+
+/**
+ * session_find - Find a session associated with the given session ID and
+ * panthor_file.
+ * @pfile: Panthor file.
+ * @perf: Panthor perf.
+ * @sid: Session ID.
+ *
+ * The reference count of a valid session is increased to ensure it does not disappear
+ * in the window between the XA lock being dropped and the internal session functions
+ * being called.
+ *
+ * Return: valid session pointer or an ERR_PTR.
+ */
+static struct panthor_perf_session *session_find(struct panthor_file *pfile,
+ struct panthor_perf *perf, u32 sid)
+{
+ struct panthor_perf_session *session;
+
+ if (!perf)
+ return ERR_PTR(-EINVAL);
+
+ xa_lock(&perf->sessions);
+ session = xa_load(&perf->sessions, sid);
+
+ if (!session || xa_is_err(session)) {
+ xa_unlock(&perf->sessions);
+ return ERR_PTR(-EBADF);
+ }
+
+ if (session->pfile != pfile) {
+ xa_unlock(&perf->sessions);
+ return ERR_PTR(-EINVAL);
+ }
+
+ session_get(session);
+ xa_unlock(&perf->sessions);
+
+ return session;
+}
+
/**
* panthor_perf_init - Initialize the performance counter subsystem.
* @ptdev: Panthor device
@@ -109,6 +389,412 @@ int panthor_perf_init(struct panthor_device *ptdev)
return ret;
}
+static int session_validate_set(u8 set)
+{
+ if (set > DRM_PANTHOR_PERF_SET_TERTIARY)
+ return -EINVAL;
+
+ if (set == DRM_PANTHOR_PERF_SET_PRIMARY)
+ return 0;
+
+ if (set > DRM_PANTHOR_PERF_SET_PRIMARY)
+ return capable(CAP_PERFMON) ? 0 : -EACCES;
+
+ return -EINVAL;
+}
+
+/**
+ * panthor_perf_session_setup - Create a user-visible session.
+ *
+ * @ptdev: Handle to the panthor device.
+ * @perf: Handle to the perf control structure.
+ * @setup_args: Setup arguments passed in via ioctl.
+ * @pfile: Panthor file associated with the request.
+ *
+ * Creates a new session associated with the session ID returned. When initialized, the
+ * session must explicitly start sampling with a subsequent call to PERF_CONTROL.START.
+ *
+ * Return: non-negative session identifier on success or negative error code on failure.
+ */
+int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
+ struct drm_panthor_perf_cmd_setup *setup_args,
+ struct panthor_file *pfile)
+{
+ struct panthor_perf_session *session;
+ struct drm_gem_object *ringbuffer;
+ struct drm_gem_object *control;
+ const size_t slots = setup_args->sample_slots;
+ struct panthor_perf_enable_masks *em;
+ struct iosys_map rb_map, ctrl_map;
+ size_t user_sample_size;
+ u32 session_id;
+ int ret;
+
+ ret = session_validate_set(setup_args->block_set);
+ if (ret) {
+ drm_err(&ptdev->base, "Did not meet requirements for set %d\n",
+ setup_args->block_set);
+ return ret;
+ }
+
+ session = kzalloc(sizeof(*session), GFP_KERNEL);
+ if (!session)
+ return -ENOMEM;
+
+ ringbuffer = drm_gem_object_lookup(pfile->drm_file, setup_args->ringbuf_handle);
+ if (!ringbuffer) {
+ drm_err(&ptdev->base, "Could not find handle %d!\n", setup_args->ringbuf_handle);
+ ret = -EINVAL;
+ goto cleanup_session;
+ }
+
+ control = drm_gem_object_lookup(pfile->drm_file, setup_args->control_handle);
+ if (!control) {
+ drm_err(&ptdev->base, "Could not find handle %d!\n", setup_args->control_handle);
+ ret = -EINVAL;
+ goto cleanup_ringbuf;
+ }
+
+ user_sample_size = session_get_user_sample_size(&ptdev->perf_info) * slots;
+
+ if (ringbuffer->size != PFN_ALIGN(user_sample_size)) {
+ drm_err(&ptdev->base, "Incorrect ringbuffer size from userspace: user %zu vs kernel %lu\n",
+ ringbuffer->size, PFN_ALIGN(user_sample_size));
+ ret = -ENOMEM;
+ goto cleanup_control;
+ }
+
+ ret = drm_gem_vmap_unlocked(ringbuffer, &rb_map);
+ if (ret)
+ goto cleanup_control;
+
+ ret = drm_gem_vmap_unlocked(control, &ctrl_map);
+ if (ret)
+ goto cleanup_ring_map;
+
+ session->eventfd = eventfd_ctx_fdget(setup_args->fd);
+ if (IS_ERR(session->eventfd)) {
+ drm_err(&ptdev->base, "Invalid eventfd %d!\n", setup_args->fd);
+ ret = PTR_ERR_OR_ZERO(session->eventfd) ?: -EINVAL;
+ goto cleanup_control_map;
+ }
+
+ em = panthor_perf_create_em(setup_args);
+ if (IS_ERR_OR_NULL(em)) {
+ ret = -ENOMEM;
+ goto cleanup_eventfd;
+ }
+
+ INIT_LIST_HEAD(&session->sessions);
+ INIT_LIST_HEAD(&session->pending);
+
+ session->control = ctrl_map.vaddr;
+ *session->control = (struct drm_panthor_perf_ringbuf_control) { 0 };
+
+ session->samples = rb_map.vaddr;
+
+ /* TODO This will need validation when we support periodic sampling sessions */
+ if (setup_args->sample_freq_ns) {
+ ret = -EOPNOTSUPP;
+ goto cleanup_em;
+ }
+
+ ret = xa_alloc_cyclic(&perf->sessions, &session_id, session, perf->session_range,
+ &perf->next_session, GFP_KERNEL);
+ if (ret < 0) {
+ drm_err(&ptdev->base, "System session limit exceeded.\n");
+ ret = -EBUSY;
+ goto cleanup_em;
+ }
+
+ kref_init(&session->ref);
+ session->enabled_counters = em;
+
+ session->sample_freq_ns = setup_args->sample_freq_ns;
+ session->user_sample_size = user_sample_size;
+ session->ring_buf = ringbuffer;
+ session->ringbuf_slots = slots;
+ session->control_buf = control;
+ session->pfile = pfile;
+ session->accum_idx = U32_MAX;
+
+ return session_id;
+
+cleanup_em:
+ kfree(em);
+
+cleanup_eventfd:
+ eventfd_ctx_put(session->eventfd);
+
+cleanup_control_map:
+ drm_gem_vunmap_unlocked(control, &ctrl_map);
+
+cleanup_ring_map:
+ drm_gem_vunmap_unlocked(ringbuffer, &rb_map);
+
+cleanup_control:
+ drm_gem_object_put(control);
+
+cleanup_ringbuf:
+ drm_gem_object_put(ringbuffer);
+
+cleanup_session:
+ kfree(session);
+
+ return ret;
+}
+
+static int session_stop(struct panthor_perf *perf, struct panthor_perf_session *session,
+ u64 user_data)
+{
+ if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
+ return 0;
+
+ const u64 extract_idx = session_read_extract_idx(session);
+ const u64 insert_idx = session_read_insert_idx(session);
+
+ /* Must have at least one slot remaining in the ringbuffer to sample. */
+ if (WARN_ON_ONCE(!CIRC_SPACE_TO_END(insert_idx, extract_idx, session->ringbuf_slots)))
+ return -EBUSY;
+
+ session->user_data = user_data;
+
+ clear_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state);
+
+ /* TODO Calls to the FW interface will go here in later patches. */
+ return 0;
+}
+
+static int session_start(struct panthor_perf *perf, struct panthor_perf_session *session,
+ u64 user_data)
+{
+ if (test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
+ return 0;
+
+ set_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state);
+
+ /*
+ * For manual sampling sessions, a start command does not correspond to a sample,
+ * and so the user data gets discarded.
+ */
+ if (session->sample_freq_ns)
+ session->user_data = user_data;
+
+ /* TODO Calls to the FW interface will go here in later patches. */
+ return 0;
+}
+
+static int session_sample(struct panthor_perf *perf, struct panthor_perf_session *session,
+ u64 user_data)
+{
+ if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
+ return 0;
+
+ const u64 extract_idx = session_read_extract_idx(session);
+ const u64 insert_idx = session_read_insert_idx(session);
+
+ /* Manual sampling for periodic sessions is forbidden. */
+ if (session->sample_freq_ns)
+ return -EINVAL;
+
+ /*
+ * Must have at least two slots remaining in the ringbuffer to sample: one for
+ * the current sample, and one for a stop sample, since a stop command should
+ * always be acknowledged by taking a final sample and stopping the session.
+ */
+ if (CIRC_SPACE_TO_END(insert_idx, extract_idx, session->ringbuf_slots) < 2)
+ return -EBUSY;
+
+ session->sample_start_ns = ktime_get_raw_ns();
+ session->user_data = user_data;
+
+ return 0;
+}
+
+static int session_destroy(struct panthor_perf *perf, struct panthor_perf_session *session)
+{
+ session_put(session);
+
+ return 0;
+}
+
+static int session_teardown(struct panthor_perf *perf, struct panthor_perf_session *session)
+{
+ if (test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
+ return -EINVAL;
+
+ if (READ_ONCE(session->pending_sample_request) != SAMPLE_TYPE_NONE)
+ return -EBUSY;
+
+ return session_destroy(perf, session);
+}
+
+/**
+ * panthor_perf_session_teardown - Teardown the session associated with the @sid.
+ * @pfile: Open panthor file.
+ * @perf: Handle to the perf control structure.
+ * @sid: Session identifier.
+ *
+ * Destroys a stopped session where the last sample has been explicitly consumed
+ * or discarded. Tearing down an active session fails with -EINVAL.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int panthor_perf_session_teardown(struct panthor_file *pfile, struct panthor_perf *perf, u32 sid)
+{
+ int err;
+ struct panthor_perf_session *session;
+
+ xa_lock(&perf->sessions);
+ session = __xa_store(&perf->sessions, sid, NULL, GFP_KERNEL);
+
+ if (!session || xa_is_err(session)) {
+ err = session ? xa_err(session) : -EBADF;
+ goto restore;
+ }
+
+ if (session->pfile != pfile) {
+ err = -EINVAL;
+ goto restore;
+ }
+
+ session_get(session);
+ xa_unlock(&perf->sessions);
+
+ err = session_teardown(perf, session);
+
+ session_put(session);
+
+ return err;
+
+restore:
+ __xa_store(&perf->sessions, sid, session, GFP_KERNEL);
+ xa_unlock(&perf->sessions);
+
+ return err;
+}
+
+/**
+ * panthor_perf_session_start - Start sampling on a stopped session.
+ * @pfile: Open panthor file.
+ * @perf: Handle to the panthor perf control structure.
+ * @sid: Session identifier for the desired session.
+ * @user_data: An opaque value passed in from userspace.
+ *
+ * A session counts as stopped when it is created or when it is explicitly stopped after being
+ * started. Starting an active session is treated as a no-op.
+ *
+ * The @user_data parameter will be associated with all subsequent samples for a periodic
+ * sampling session and will be ignored for manual sampling ones in favor of the user data
+ * passed in the PERF_CONTROL.SAMPLE ioctl call.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *perf,
+ u32 sid, u64 user_data)
+{
+ struct panthor_perf_session *session = session_find(pfile, perf, sid);
+ int err;
+
+ if (IS_ERR_OR_NULL(session))
+ return IS_ERR(session) ? PTR_ERR(session) : -EINVAL;
+
+ err = session_start(perf, session, user_data);
+
+ session_put(session);
+
+ return err;
+}
+
+/**
+ * panthor_perf_session_stop - Stop sampling on an active session.
+ * @pfile: Open panthor file.
+ * @perf: Handle to the panthor perf control structure.
+ * @sid: Session identifier for the desired session.
+ * @user_data: An opaque value passed in from userspace.
+ *
+ * A session counts as active when it has been explicitly started via the PERF_CONTROL.START
+ * ioctl. Stopping a stopped session is treated as a no-op.
+ *
+ * To ensure data is not lost when sampling is stopping, there must always be at least one slot
+ * available for the final automatic sample, and the stop command will be rejected if there is not.
+ *
+ * The @user_data will always be associated with the final sample.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *perf,
+ u32 sid, u64 user_data)
+{
+ struct panthor_perf_session *session = session_find(pfile, perf, sid);
+ int err;
+
+ if (IS_ERR_OR_NULL(session))
+ return IS_ERR(session) ? PTR_ERR(session) : -EINVAL;
+
+ err = session_stop(perf, session, user_data);
+
+ session_put(session);
+
+ return err;
+}
+
+/**
+ * panthor_perf_session_sample - Request a sample on a manual sampling session.
+ * @pfile: Open panthor file.
+ * @perf: Handle to the panthor perf control structure.
+ * @sid: Session identifier for the desired session.
+ * @user_data: An opaque value passed in from userspace.
+ *
+ * Only an active manual sampler is permitted to request samples directly. Failing to meet either
+ * of these conditions will cause the sampling request to be rejected. A manual sample
+ * requested while the ringbuffer is full is rejected as well.
+ *
+ * The @user_data will always be unambiguously associated one-to-one with the resultant sample.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf *perf,
+ u32 sid, u64 user_data)
+{
+ struct panthor_perf_session *session = session_find(pfile, perf, sid);
+ int err;
+
+ if (IS_ERR_OR_NULL(session))
+ return IS_ERR(session) ? PTR_ERR(session) : -EINVAL;
+
+ err = session_sample(perf, session, user_data);
+
+ session_put(session);
+
+ return err;
+}
+
+/**
+ * panthor_perf_session_destroy - Destroy all sampling sessions associated with the @pfile.
+ * @pfile: The file being closed.
+ * @perf: Handle to the panthor perf control structure.
+ *
+ * Must be called when the corresponding userspace process is destroyed and cannot close its
+ * own sessions. As such, we offer no guarantees about data delivery.
+ */
+void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_perf *perf)
+{
+ unsigned long sid;
+ struct panthor_perf_session *session;
+
+ if (!pfile || !perf)
+ return;
+
+ xa_for_each(&perf->sessions, sid, session) {
+ if (session->pfile == pfile) {
+ session_destroy(perf, session);
+ xa_erase(&perf->sessions, sid);
+ }
+ }
+}
+
/**
* panthor_perf_unplug - Terminate the performance counter subsystem.
* @ptdev: Panthor device.
@@ -124,8 +810,14 @@ void panthor_perf_unplug(struct panthor_device *ptdev)
return;
if (!xa_empty(&perf->sessions)) {
+ unsigned long sid;
+ struct panthor_perf_session *session;
+
drm_err(&ptdev->base,
"Performance counter sessions active when unplugging the driver!");
+
+ xa_for_each(&perf->sessions, sid, session)
+ session_destroy(perf, session);
}
xa_destroy(&perf->sessions);
diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
index e4805727b9e7..89d61cd1f017 100644
--- a/drivers/gpu/drm/panthor/panthor_perf.h
+++ b/drivers/gpu/drm/panthor/panthor_perf.h
@@ -7,10 +7,26 @@
#include <linux/types.h>
+struct drm_panthor_perf_cmd_setup;
struct panthor_device;
+struct panthor_file;
+struct panthor_perf;
int panthor_perf_init(struct panthor_device *ptdev);
void panthor_perf_unplug(struct panthor_device *ptdev);
+int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
+ struct drm_panthor_perf_cmd_setup *setup_args,
+ struct panthor_file *pfile);
+int panthor_perf_session_teardown(struct panthor_file *pfile, struct panthor_perf *perf,
+ u32 sid);
+int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *perf,
+ u32 sid, u64 user_data);
+int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *perf,
+ u32 sid, u64 user_data);
+int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf *perf,
+ u32 sid, u64 user_data);
+void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_perf *perf);
+
#endif /* __PANTHOR_PERF_H__ */
--
2.33.0.dirty
* [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling
2025-05-16 15:49 [PATCH v4 0/7] Performance counter implementation with single manual client support Lukas Zapolskas
` (3 preceding siblings ...)
2025-05-16 15:49 ` [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients Lukas Zapolskas
@ 2025-05-16 15:49 ` Lukas Zapolskas
2025-05-17 8:56 ` kernel test robot
2025-07-18 14:49 ` Adrián Larumbe
2025-05-16 15:49 ` [PATCH v4 6/7] drm/panthor: Add suspend, resume and reset handling Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 7/7] drm/panthor: Expose the panthor perf ioctls Lukas Zapolskas
6 siblings, 2 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-05-16 15:49 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe, Lukas Zapolskas
From: Adrián Larumbe <adrian.larumbe@collabora.com>
The sampler aggregates counter and set requests coming from userspace
and mediates interactions with the FW interface, to ensure that user
sessions cannot override the global configuration.
From the top-level interface, the sampler supports two different types
of samples: clearing samples and regular samples. Clearing samples are
a special sample type that allows for the creation of a sampling
baseline, to ensure that a session does not obtain counter data from
before its creation.
Upon receipt of a relevant interrupt, corresponding to one of the three
relevant bits of the GLB_ACK register, the sampler walks the FW ring
buffer from the extract index to the insert index and accumulates each
new sample into an internal storage buffer, zero-extending the 32-bit
counters emitted by the hardware to 64 bits as it does so.
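The accumulation step described above can be sketched in portable C as
follows. This is only an illustration under stated assumptions: the
names accumulate_block and HEADER_COUNTERS are hypothetical stand-ins
(the latter mirrors PANTHOR_HEADER_COUNTERS), and the sketch ignores the
annotated block headers that the driver also fills in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PANTHOR_HEADER_COUNTERS. */
#define HEADER_COUNTERS 4

/*
 * Fold one 32-bit firmware counter block into a 64-bit accumulation block.
 * The header counters (timestamp low/high, enable mask, reserved) are not
 * additive, so they are overwritten with the latest value. Every other
 * counter is zero-extended to 64 bits and added to the running total.
 */
static void accumulate_block(uint64_t *acc, const uint32_t *fw,
			     size_t counters_per_block)
{
	assert(counters_per_block >= HEADER_COUNTERS);

	for (size_t k = 0; k < HEADER_COUNTERS; k++)
		acc[k] = fw[k];

	for (size_t k = HEADER_COUNTERS; k < counters_per_block; k++)
		acc[k] += (uint64_t)fw[k];
}
```

Accumulating in 64 bits means that several automatic samples can be
summed without the total wrapping at the 32-bit boundary.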
When the performance counters are enabled, the FW ensures no counter
data is lost when entering and leaving non-counting regions by producing
automatic samples that do not correspond to a GLB_REQ.PRFCNT_SAMPLE
request. Such regions may be per hardware unit, such as when a shader
core powers down, or global. Most of these events do not directly
correspond to session sample requests, so any intermediary counter data
must be stored into a temporary accumulation buffer.
If there are sessions waiting for a sample, this accumulated buffer will
be taken, and emitted for each waiting client. During this phase,
information like the timestamps of sample request and sample emission,
type of the counter block and block index annotations are added to the
sample header and block headers. If no sessions are waiting for
a sample, this accumulation buffer is kept until the next time a sample
is requested.
Special handling is needed for the PRFCNT_OVERFLOW interrupt, which is
an indication that the internal sample handling rate was insufficient.
The sampler also maintains a buffer descriptor indicating the structure
of a firmware sample, since neither the firmware nor the hardware give
any indication of the sample structure, only that it is composed of
three parts:
- the metadata is an optional initial counter block on supporting
firmware versions that contains a single counter, indicating the
reason a sample was taken when entering global non-counting regions.
This is used to provide coarse-grained information about why a sample
was taken to userspace, to help userspace interpret variations in
counter magnitude.
- the firmware component of the sample consists of a global
firmware counter block on supporting firmware versions.
- the hardware component is the most sizeable of the three and contains
a block of counters for each of the underlying hardware resources. It
has a fixed structure that is described in the architecture
specification, and contains the command stream hardware block(s), the
tiler block(s), the MMU and L2 blocks (collectively named the memsys
blocks) and the shader core blocks, in that order.
The structure of this buffer changes based on the firmware and hardware
combination, but is constant on a single system.
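As a rough sketch of how such a descriptor can be derived, the snippet
below lays the block types out back to back in the order given above and
rejects the layout if the computed size disagrees with the size the
firmware reports. All names here (blk_type, blk_layout, layout_build)
are hypothetical and only mirror the idea, not the driver's structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block types, in the order they appear in a FW sample. */
enum blk_type {
	BLK_METADATA, BLK_FW, BLK_CSHW, BLK_TILER, BLK_MEMSYS, BLK_SHADER,
	BLK_MAX
};

struct blk_layout {
	size_t offset[BLK_MAX];	/* Byte offset of the first block of each type. */
	size_t total;		/* Total size of one sample in bytes. */
};

/*
 * Compute per-type offsets from the block counts. Returns 0 on success or
 * -1 if the computed size does not match the size reported by the firmware.
 */
static int layout_build(struct blk_layout *l, const size_t count[BLK_MAX],
			size_t counters_per_block, size_t reported_size)
{
	const size_t block_size = counters_per_block * sizeof(uint32_t);
	size_t offset = 0;

	assert(counters_per_block > 0);

	for (int t = 0; t < BLK_MAX; t++) {
		l->offset[t] = offset;
		offset += count[t] * block_size;
	}

	if (offset != reported_size)
		return -1;

	l->total = offset;
	return 0;
}
```

Cross-checking against the reported size catches a mismatch between the
assumed layout and the actual firmware before any sample is parsed.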
This buffer descriptor also handles the sparseness of the shader cores,
wherein the physical core mask contains holes, but memory is reserved
for every position up to and including the most significant set bit. In
cases with highly sparse core masks, this means that a lot of shader
counter blocks are empty, and must be skipped.
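The mapping from a slot position in the sample to a dense block index
can be illustrated as below. This is a minimal sketch with hypothetical
helper names (reserved_slots, slot_to_index) written in portable C; the
patch itself relies on kernel helpers such as fls64() and hweight64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Portable population count (number of set bits). */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Slots reserved in the sample: position of the most significant set bit + 1. */
static unsigned int reserved_slots(uint64_t present)
{
	unsigned int n = 0;

	for (; present; present >>= 1)
		n++;
	return n;
}

/*
 * Translate a slot position into a dense block index. Returns false for a
 * hole in the mask, in which case the slot must be skipped.
 */
static bool slot_to_index(uint64_t present, unsigned int pos, unsigned int *idx)
{
	assert(pos < 64);

	if (!(present & (UINT64_C(1) << pos)))
		return false;

	/* The index is the number of present cores strictly below pos. */
	*idx = popcount64(present & ((UINT64_C(1) << pos) - 1));
	return true;
}
```

With a core mask of 0x16 (cores 1, 2 and 4 present), five slots are
reserved, slots 0 and 3 are holes, and slot 4 maps to block index 2.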
The number of ring buffer slots is configurable through a module
parameter to allow for a lower memory footprint on memory-constrained
systems.
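The constraints on the slot count can be sketched as follows. The names
SLOTS_MIN, SLOTS_MAX, validate_slots and slot_for_index are
hypothetical; the bounds are taken from the
PANTHOR_PERF_RINGBUF_SLOTS_MIN and PANTHOR_PERF_RINGBUF_SLOTS_MAX
definitions in this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirrors of PANTHOR_PERF_RINGBUF_SLOTS_MIN and _MAX. */
#define SLOTS_MIN 16u
#define SLOTS_MAX 256u

/* Accept only power-of-two slot counts within [SLOTS_MIN, SLOTS_MAX]. */
static int validate_slots(unsigned int slots)
{
	if (slots < SLOTS_MIN || slots > SLOTS_MAX)
		return -1;

	/* A power of two has exactly one bit set. */
	if (slots & (slots - 1))
		return -1;

	return 0;
}

/* With a power-of-two size, a free-running index maps to a slot by masking. */
static unsigned int slot_for_index(uint32_t idx, unsigned int slots)
{
	assert(validate_slots(slots) == 0);

	return idx & (slots - 1);
}
```

Requiring a power of two keeps the index-to-slot mapping consistent even
when a free-running 32-bit index wraps around.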
Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
Co-developed-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
---
drivers/gpu/drm/panthor/panthor_fw.c | 6 +
drivers/gpu/drm/panthor/panthor_fw.h | 6 +-
drivers/gpu/drm/panthor/panthor_perf.c | 1082 +++++++++++++++++++++++-
drivers/gpu/drm/panthor/panthor_perf.h | 2 +
4 files changed, 1080 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index 0f52766a3120..e3948354daa4 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -22,6 +22,7 @@
#include "panthor_gem.h"
#include "panthor_gpu.h"
#include "panthor_mmu.h"
+#include "panthor_perf.h"
#include "panthor_regs.h"
#include "panthor_sched.h"
@@ -987,9 +988,12 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
/* Enable interrupts we care about. */
glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
+ GLB_PERFCNT_SAMPLE |
GLB_PING |
GLB_CFG_PROGRESS_TIMER |
GLB_CFG_POWEROFF_TIMER |
+ GLB_PERFCNT_THRESHOLD |
+ GLB_PERFCNT_OVERFLOW |
GLB_IDLE_EN |
GLB_IDLE;
@@ -1018,6 +1022,8 @@ static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
return;
panthor_sched_report_fw_events(ptdev, status);
+
+ panthor_perf_report_irq(ptdev, status);
}
PANTHOR_IRQ_HANDLER(job, JOB, panthor_job_irq_handler);
diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
index 8bcb933fa790..5a561e72e88b 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.h
+++ b/drivers/gpu/drm/panthor/panthor_fw.h
@@ -198,6 +198,7 @@ struct panthor_fw_global_control_iface {
u32 group_num;
u32 group_stride;
#define GLB_PERFCNT_FW_SIZE(x) ((((x) >> 16) << 8))
+#define GLB_PERFCNT_HW_SIZE(x) (((x) & GENMASK(15, 0)) << 8)
u32 perfcnt_size;
u32 instr_features;
#define PERFCNT_FEATURES_MD_SIZE(x) (((x) & GENMASK(3, 0)) << 8)
@@ -210,7 +211,7 @@ struct panthor_fw_global_input_iface {
#define GLB_CFG_ALLOC_EN BIT(2)
#define GLB_CFG_POWEROFF_TIMER BIT(3)
#define GLB_PROTM_ENTER BIT(4)
-#define GLB_PERFCNT_EN BIT(5)
+#define GLB_PERFCNT_ENABLE BIT(5)
#define GLB_PERFCNT_SAMPLE BIT(6)
#define GLB_COUNTER_EN BIT(7)
#define GLB_PING BIT(8)
@@ -243,6 +244,9 @@ struct panthor_fw_global_input_iface {
u64 perfcnt_base;
u32 perfcnt_extract;
u32 reserved3[3];
+#define GLB_PERFCNT_CONFIG_SIZE(x) ((x) & GENMASK(7, 0))
+#define GLB_PERFCNT_CONFIG_SET(x) (((x) & GENMASK(1, 0)) << 8)
+#define GLB_PERFCNT_METADATA_ENABLE BIT(10)
u32 perfcnt_config;
u32 perfcnt_csg_select;
u32 perfcnt_fw_enable;
diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
index 15fa533731f3..97603b168d2d 100644
--- a/drivers/gpu/drm/panthor/panthor_perf.c
+++ b/drivers/gpu/drm/panthor/panthor_perf.c
@@ -9,7 +9,11 @@
#include "panthor_device.h"
#include "panthor_fw.h"
+#include "panthor_gem.h"
+#include "panthor_gpu.h"
+#include "panthor_mmu.h"
#include "panthor_perf.h"
+#include "panthor_regs.h"
/**
* PANTHOR_PERF_EM_BITS - Number of bits in a user-facing enable mask. This must correspond
@@ -18,6 +22,81 @@
*/
#define PANTHOR_PERF_EM_BITS (BITS_PER_TYPE(u64) * 2)
+/**
+ * PANTHOR_CTR_TIMESTAMP_LO - The first architecturally mandated counter of every block type
+ * contains the low 32-bits of the TIMESTAMP value.
+ */
+#define PANTHOR_CTR_TIMESTAMP_LO (0)
+
+/**
+ * PANTHOR_CTR_TIMESTAMP_HI - The register offset containing the high 32-bits of the TIMESTAMP
+ * value.
+ */
+#define PANTHOR_CTR_TIMESTAMP_HI (1)
+
+/**
+ * PANTHOR_CTR_PRFCNT_EN - The register offset containing the enable mask for the enabled counters
+ * that were written to memory.
+ */
+#define PANTHOR_CTR_PRFCNT_EN (2)
+
+/**
+ * PANTHOR_HEADER_COUNTERS - The first four counters of every block type are architecturally
+ * defined to be equivalent. The fourth counter is always reserved,
+ * and should be zero and as such, does not have a separate define.
+ *
+ * These are the only four counters that are the same between different
+ * blocks and are consistent between different architectures.
+ */
+#define PANTHOR_HEADER_COUNTERS (4)
+
+/**
+ * PANTHOR_CTR_SAMPLE_REASON - The metadata block has a single value in position three which
+ * indicates the reason a sample was taken.
+ */
+#define PANTHOR_CTR_SAMPLE_REASON (3)
+
+/**
+ * PANTHOR_HW_COUNTER_SIZE - The size of a hardware counter in the FW ring buffer.
+ */
+#define PANTHOR_HW_COUNTER_SIZE (sizeof(u32))
+
+/**
+ * PANTHOR_PERF_RINGBUF_SLOTS_MIN - The minimum permitted number of slots in the Panthor perf
+ * ring buffer.
+ */
+#define PANTHOR_PERF_RINGBUF_SLOTS_MIN (16)
+
+/**
+ * PANTHOR_PERF_RINGBUF_SLOTS_MAX - The maximum permitted number of slots in the Panthor perf
+ * ring buffer.
+ */
+#define PANTHOR_PERF_RINGBUF_SLOTS_MAX (256)
+
+static unsigned int perf_ringbuf_slots = 32;
+
+static int perf_ringbuf_slots_set(const char *val, const struct kernel_param *kp)
+{
+ unsigned int slots;
+ int ret = kstrtouint(val, 0, &slots);
+
+ if (ret)
+ return ret;
+
+ if (!is_power_of_2(slots))
+ return -EINVAL;
+
+ return param_set_uint_minmax(val, kp, PANTHOR_PERF_RINGBUF_SLOTS_MIN,
+ PANTHOR_PERF_RINGBUF_SLOTS_MAX);
+}
+
+static const struct kernel_param_ops perf_ringbuf_ops = {
+ .set = perf_ringbuf_slots_set,
+ .get = param_get_uint,
+};
+module_param_cb(perf_ringbuf_slots, &perf_ringbuf_ops, &perf_ringbuf_slots, 0400);
+MODULE_PARM_DESC(perf_ringbuf_slots,
+ "Power of two slots allocated for the Panthor perf kernel-FW ringbuffer");
+
enum panthor_perf_session_state {
/** @PANTHOR_PERF_SESSION_ACTIVE: The session is active and can be used for sampling. */
PANTHOR_PERF_SESSION_ACTIVE = 0,
@@ -63,6 +142,116 @@ enum session_sample_type {
SAMPLE_TYPE_REGULAR,
};
+struct panthor_perf_buffer_descriptor {
+ /**
+ * @block_size: The size of a single block in the FW ring buffer, equal to
+ * sizeof(u32) * counters_per_block.
+ */
+ size_t block_size;
+
+ /**
+ * @buffer_size: The total size of the buffer, equal to (#hardware blocks +
+ * #firmware blocks) * block_size.
+ */
+ size_t buffer_size;
+
+ /**
+ * @available_blocks: Bitmask indicating the blocks supported by the hardware and firmware
+ * combination. Note that this can also include blocks that will not
+ * be exposed to the user.
+ */
+ DECLARE_BITMAP(available_blocks, DRM_PANTHOR_PERF_BLOCK_MAX);
+ struct {
+ /** @offset: Starting offset of a block of type @type in the FW ringbuffer. */
+ size_t offset;
+
+ /** @block_count: Number of blocks of the given @type, starting at @offset. */
+ size_t block_count;
+
+ /** @phys_mask: Bitmask of the physically available blocks. */
+ u64 phys_mask;
+ } blocks[DRM_PANTHOR_PERF_BLOCK_MAX];
+};
+
+/**
+ * struct panthor_perf_sampler - Interface to de-multiplex firmware interaction and handle
+ * global interactions.
+ */
+struct panthor_perf_sampler {
+ /**
+ * @enabled_clients: The number of clients concurrently requesting samples. To ensure that
+ * one client cannot deny samples to another, we must ensure that clients
+ * are effectively reference counted.
+ */
+ atomic_t enabled_clients;
+
+ /**
+ * @sample_handled: Synchronization point between the interrupt bottom half and the
+ * main sampler interface. Must be re-armed solely on a new request
+ * coming to the sampler.
+ */
+ struct completion sample_handled;
+
+ /** @rb: Kernel BO in the FW AS containing the sample ringbuffer. */
+ struct panthor_kernel_bo *rb;
+
+ /**
+ * @sample_slots: Number of slots for samples in the FW ringbuffer. Could be static,
+ * but may be useful to customize for low-memory devices.
+ */
+ size_t sample_slots;
+
+ /** @em: Combined enable mask for all of the active sessions. */
+ struct panthor_perf_enable_masks *em;
+
+ /**
+ * @desc: Buffer descriptor for a sample in the FW ringbuffer. Note that this buffer
+ * currently treats the zeroth block type specially. On
+ * newer FW revisions, the first counter block of the sample is the METADATA block,
+ * which contains a single value indicating the reason the sample was taken (if
+ * any). This block must not be exposed to userspace, as userspace does not
+ * have sufficient context to interpret it. As such, this block type is not
+ * added to the uAPI, but we still use it in the kernel.
+ */
+ struct panthor_perf_buffer_descriptor desc;
+
+ /**
+ * @sample: Pointer to an upscaled and annotated sample that may be emitted to userspace.
+ * This is used both as an intermediate buffer to do the zero-extension of the
+ * 32-bit counters to 64-bits and as a storage buffer in case the sampler
+ * requests an additional sample that was not requested by any of the top-level
+ * sessions (for instance, when changing the enable masks).
+ */
+ u8 *sample;
+
+ /**
+ * @sampler_lock: Lock used to guard the list of sessions and the sampler configuration.
+ * In particular, it guards the @session_list and the @em.
+ */
+ struct mutex sampler_lock;
+
+ /** @session_list: List of all sessions. */
+ struct list_head session_list;
+
+ /** @pend_lock: Lock used to guard the list of sessions with pending samples. */
+ spinlock_t pend_lock;
+
+ /** @pending_samples: List of sessions requesting samples. */
+ struct list_head pending_samples;
+
+ /** @sample_requested: A sample has been requested. */
+ bool sample_requested;
+
+ /** @set_config: The set that will be configured onto the hardware. */
+ u8 set_config;
+
+ /**
+ * @ptdev: Backpointer to the Panthor device, needed to ring the global doorbell and
+ * interface with FW.
+ */
+ struct panthor_device *ptdev;
+};
+
struct panthor_perf_session {
DECLARE_BITMAP(state, PANTHOR_PERF_SESSION_MAX);
@@ -184,6 +373,9 @@ struct panthor_perf {
* @sessions: Global map of sessions, accessed by their ID.
*/
struct xarray sessions;
+
+ /** @sampler: FW control interface. */
+ struct panthor_perf_sampler sampler;
};
struct panthor_perf_counter_block {
@@ -237,7 +429,7 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
}
static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panthor_perf_cmd_setup
- *setup_args)
+ *setup_args)
{
struct panthor_perf_enable_masks *em = kmalloc(sizeof(*em), GFP_KERNEL);
if (IS_ERR_OR_NULL(em))
@@ -257,6 +449,23 @@ static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panth
return em;
}
+static void panthor_perf_em_add(struct panthor_perf_enable_masks *dst_em,
+ const struct panthor_perf_enable_masks *const src_em)
+{
+ size_t i = 0;
+
+ for (i = DRM_PANTHOR_PERF_BLOCK_FIRST; i <= DRM_PANTHOR_PERF_BLOCK_LAST; i++)
+ bitmap_or(dst_em->mask[i], dst_em->mask[i], src_em->mask[i], PANTHOR_PERF_EM_BITS);
+}
+
+static void panthor_perf_em_zero(struct panthor_perf_enable_masks *em)
+{
+ size_t i = 0;
+
+ for (i = DRM_PANTHOR_PERF_BLOCK_FIRST; i <= DRM_PANTHOR_PERF_BLOCK_LAST; i++)
+ bitmap_zero(em->mask[i], PANTHOR_PERF_EM_BITS);
+}
+
static u64 session_read_extract_idx(struct panthor_perf_session *session)
{
const u64 slots = session->ringbuf_slots;
@@ -267,6 +476,12 @@ static u64 session_read_extract_idx(struct panthor_perf_session *session)
return smp_load_acquire(&session->control->extract_idx) % slots;
}
+static void session_write_insert_idx(struct panthor_perf_session *session, u64 idx)
+{
+ /* Userspace needs the insert index to know where to look for the sample. */
+ smp_store_release(&session->control->insert_idx, idx);
+}
+
static u64 session_read_insert_idx(struct panthor_perf_session *session)
{
const u64 slots = session->ringbuf_slots;
@@ -326,7 +541,7 @@ static void session_put(struct panthor_perf_session *session)
* Return: valid session pointer or an ERR_PTR.
*/
static struct panthor_perf_session *session_find(struct panthor_file *pfile,
- struct panthor_perf *perf, u32 sid)
+ struct panthor_perf *perf, u32 sid)
{
struct panthor_perf_session *session;
@@ -352,6 +567,761 @@ static struct panthor_perf_session *session_find(struct panthor_file *pfile,
return session;
}
+static u32 compress_enable_mask(unsigned long *const src)
+{
+ size_t i;
+ u32 result = 0;
+ unsigned long clump;
+
+ for_each_set_clump8(i, clump, src, PANTHOR_PERF_EM_BITS) {
+ const unsigned long shift = div_u64(i, 4);
+
+ result |= !!(clump & GENMASK(3, 0)) << shift;
+ result |= !!(clump & GENMASK(7, 4)) << (shift + 1);
+ }
+
+ return result;
+}
+
+static void expand_enable_mask(u32 em, unsigned long *const dst)
+{
+ size_t i;
+ DECLARE_BITMAP(emb, BITS_PER_TYPE(u32));
+
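+ /* Callers pass an uninitialized on-stack bitmap, so clear it before setting bits. */
+ bitmap_zero(dst, PANTHOR_PERF_EM_BITS);
+ bitmap_from_arr32(emb, &em, BITS_PER_TYPE(u32));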
+
+ for_each_set_bit(i, emb, BITS_PER_TYPE(u32))
+ bitmap_set(dst, i * 4, 4);
+}
+
+/**
+ * panthor_perf_block_data - Identify the block index and type based on the offset.
+ *
+ * @desc: FW buffer descriptor.
+ * @offset: The current offset being examined.
+ * @idx: Pointer to an output index.
+ * @type: Pointer to an output block type.
+ *
+ * To disambiguate different types of blocks as well as different blocks of the same type,
+ * the offset into the FW ringbuffer is used to uniquely identify the block being considered.
+ *
+ * In the future, this would be a good place to identify whether a block will be empty,
+ * allowing us to short-circuit its processing after emitting header information.
+ *
+ * Return: True if the current block is available, false otherwise.
+ */
+static bool panthor_perf_block_data(struct panthor_perf_buffer_descriptor *const desc,
+ size_t offset, u32 *idx,
+ enum drm_panthor_perf_block_type *type)
+{
+ unsigned long id;
+
+ for_each_set_bit(id, desc->available_blocks, DRM_PANTHOR_PERF_BLOCK_MAX) {
+ const size_t block_start = desc->blocks[id].offset;
+ const size_t block_count = desc->blocks[id].block_count;
+ const size_t block_end = desc->blocks[id].offset +
+ desc->block_size * block_count;
+
+ if (!block_count)
+ continue;
+
+ if ((offset >= block_start) && (offset < block_end)) {
+ const unsigned long phys_mask[] = {
+ BITMAP_FROM_U64(desc->blocks[id].phys_mask),
+ };
+ const size_t pos =
+ div_u64(offset - desc->blocks[id].offset, desc->block_size);
+
+ *type = id;
+
+ if (test_bit(pos, phys_mask)) {
+ const u64 mask = GENMASK_ULL(pos, 0);
+ const u64 zeroes = ~desc->blocks[id].phys_mask & mask;
+
+ *idx = pos - hweight64(zeroes);
+ return true;
+ }
+ return false;
+ }
+ }
+
+ return false;
+}
+
+static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
+{
+ const size_t block_size = get_annotated_block_size(info->counters_per_block);
+ const size_t block_nr = info->cshw_blocks + info->fw_blocks +
+ info->tiler_blocks + info->memsys_blocks + info->shader_blocks;
+
+ return sizeof(struct drm_panthor_perf_sample_header) + (block_size * block_nr);
+}
+
+static u32 panthor_perf_handle_sample(struct panthor_device *ptdev, u32 extract_idx, u32 insert_idx)
+{
+ struct panthor_perf *perf = ptdev->perf;
+ struct panthor_perf_sampler *sampler = &ptdev->perf->sampler;
+ const size_t ann_block_size =
+ get_annotated_block_size(ptdev->perf_info.counters_per_block);
+ u32 i;
+
+ for (i = extract_idx; i != insert_idx; i++) {
+ u32 slot = i % sampler->sample_slots;
+ u8 *fw_sample = (u8 *)sampler->rb->kmap + slot * sampler->desc.buffer_size;
+
+ for (size_t fw_off = 0, ann_off = sizeof(struct drm_panthor_perf_sample_header);
+ fw_off < sampler->desc.buffer_size;
+ fw_off += sampler->desc.block_size) {
+ u32 idx = 0;
+ enum drm_panthor_perf_block_type type = 0;
+ DECLARE_BITMAP(expanded_em, PANTHOR_PERF_EM_BITS);
+ struct panthor_perf_counter_block *blk =
+ (typeof(blk))(perf->sampler.sample + ann_off);
+ u32 *const block = (u32 *)(fw_sample + fw_off);
+ const u32 prfcnt_en = block[PANTHOR_CTR_PRFCNT_EN];
+
+ if (!panthor_perf_block_data(&sampler->desc, fw_off, &idx, &type))
+ continue;
+
+ /*
+ * TODO Data from the metadata block must be used to populate the
+ * block state information.
+ */
+ if (type == DRM_PANTHOR_PERF_BLOCK_METADATA) {
+ /*
+ * The host must clear the SAMPLE_REASON to acknowledge it has
+ * consumed the sample.
+ */
+ block[PANTHOR_CTR_SAMPLE_REASON] = 0;
+ continue;
+ }
+
+ expand_enable_mask(prfcnt_en, expanded_em);
+
+ blk->header = (struct drm_panthor_perf_block_header) {
+ .clock = 0,
+ .block_idx = idx,
+ .block_type = type,
+ .block_states = DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN
+ };
+ bitmap_to_arr64(blk->header.enable_mask, expanded_em, PANTHOR_PERF_EM_BITS);
+
+ /*
+ * The four header counters must be treated differently, because they are
+ * not additive. For the fourth, the assignment does not matter, as it
+ * is reserved and should be zero.
+ */
+ blk->counters[PANTHOR_CTR_TIMESTAMP_LO] = block[PANTHOR_CTR_TIMESTAMP_LO];
+ blk->counters[PANTHOR_CTR_TIMESTAMP_HI] = block[PANTHOR_CTR_TIMESTAMP_HI];
+ blk->counters[PANTHOR_CTR_PRFCNT_EN] = block[PANTHOR_CTR_PRFCNT_EN];
+
+ /*
+ * The host must clear PRFCNT_EN to acknowledge it has consumed the sample.
+ */
+ block[PANTHOR_CTR_PRFCNT_EN] = 0;
+
+ for (size_t k = PANTHOR_HEADER_COUNTERS;
+ k < ptdev->perf_info.counters_per_block;
+ k++)
+ blk->counters[k] += block[k];
+
+ ann_off += ann_block_size;
+ }
+ }
+
+ return i;
+}
+
+static size_t panthor_perf_get_fw_reported_size(struct panthor_device *ptdev)
+{
+ struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
+
+ size_t fw_size = GLB_PERFCNT_FW_SIZE(glb_iface->control->perfcnt_size);
+ size_t hw_size = GLB_PERFCNT_HW_SIZE(glb_iface->control->perfcnt_size);
+ size_t md_size = PERFCNT_FEATURES_MD_SIZE(glb_iface->control->perfcnt_features);
+
+ return md_size + fw_size + hw_size;
+}
+
+#define PANTHOR_PERF_SET_BLOCK_DESC_DATA(__desc, __type, __blk_count, __phys_mask, __offset) \
+ ({ \
+ (__desc)->blocks[(__type)].offset = (__offset); \
+ (__desc)->blocks[(__type)].block_count = (__blk_count); \
+ (__desc)->blocks[(__type)].phys_mask = (__phys_mask); \
+ if ((__blk_count)) \
+ set_bit((__type), (__desc)->available_blocks); \
+ (__offset) + ((__desc)->block_size) * (__blk_count); \
+ })
+
+static size_t get_reserved_shader_core_blocks(struct panthor_device *ptdev)
+{
+ const u64 sc_mask = ptdev->gpu_info.shader_present;
+
+ return fls64(sc_mask);
+}
+
+#define BLK_MASK(x) GENMASK_ULL((x) - 1, 0)
+
+static u64 get_shader_core_mask(struct panthor_device *ptdev)
+{
+ const u64 sc_mask = ptdev->gpu_info.shader_present;
+
+ return BLK_MASK(hweight64(sc_mask));
+}
+
+static int panthor_perf_setup_fw_buffer_desc(struct panthor_device *ptdev,
+ struct panthor_perf_sampler *sampler)
+{
+ const struct drm_panthor_perf_info *const info = &ptdev->perf_info;
+ const size_t block_size = info->counters_per_block * PANTHOR_HW_COUNTER_SIZE;
+ struct panthor_perf_buffer_descriptor *desc = &sampler->desc;
+ const size_t fw_sample_size = panthor_perf_get_fw_reported_size(ptdev);
+ size_t offset = 0;
+
+ desc->block_size = block_size;
+
+ for (enum drm_panthor_perf_block_type type = 0; type < DRM_PANTHOR_PERF_BLOCK_MAX; type++) {
+ switch (type) {
+ case DRM_PANTHOR_PERF_BLOCK_METADATA:
+ if (info->flags & DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT)
+ offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, 1,
+ BLK_MASK(1), offset);
+ break;
+ case DRM_PANTHOR_PERF_BLOCK_FW:
+ offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->fw_blocks,
+ BLK_MASK(info->fw_blocks),
+ offset);
+ break;
+ case DRM_PANTHOR_PERF_BLOCK_CSHW:
+ offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->cshw_blocks,
+ BLK_MASK(info->cshw_blocks),
+ offset);
+ break;
+ case DRM_PANTHOR_PERF_BLOCK_TILER:
+ offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->tiler_blocks,
+ BLK_MASK(info->tiler_blocks),
+ offset);
+ break;
+ case DRM_PANTHOR_PERF_BLOCK_MEMSYS:
+ offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->memsys_blocks,
+ BLK_MASK(info->memsys_blocks),
+ offset);
+ break;
+ case DRM_PANTHOR_PERF_BLOCK_SHADER:
+ offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type,
+ get_reserved_shader_core_blocks(ptdev),
+ get_shader_core_mask(ptdev),
+ offset);
+ break;
+ case DRM_PANTHOR_PERF_BLOCK_MAX:
+ drm_WARN_ONCE(&ptdev->base, true,
+ "DRM_PANTHOR_PERF_BLOCK_MAX should be unreachable!");
+ break;
+ }
+ }
+
+ /* Computed size is not the same as the reported size, so we should not proceed in
+ * initializing the sampling session.
+ */
+ if (offset != fw_sample_size)
+ return -EINVAL;
+
+ desc->buffer_size = offset;
+
+ return 0;
+}
+
+static int panthor_perf_fw_stop_sampling(struct panthor_device *ptdev)
+{
+ struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
+ u32 acked;
+ int ret;
+
+ if (~READ_ONCE(glb_iface->input->req) & GLB_PERFCNT_ENABLE)
+ return 0;
+
+ panthor_fw_update_reqs(glb_iface, req, 0, GLB_PERFCNT_ENABLE);
+ gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
+ ret = panthor_fw_glb_wait_acks(ptdev, GLB_PERFCNT_ENABLE, &acked, 100);
+ if (ret)
+ drm_warn(&ptdev->base, "Could not disable performance counters");
+
+ return ret;
+}
+
+static int panthor_perf_fw_start_sampling(struct panthor_device *ptdev)
+{
+ struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
+ u32 acked;
+ int ret;
+
+ if (READ_ONCE(glb_iface->input->req) & GLB_PERFCNT_ENABLE)
+ return 0;
+
+ panthor_fw_update_reqs(glb_iface, req, GLB_PERFCNT_ENABLE, GLB_PERFCNT_ENABLE);
+ gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
+ ret = panthor_fw_glb_wait_acks(ptdev, GLB_PERFCNT_ENABLE, &acked, 100);
+ if (ret)
+ drm_warn(&ptdev->base, "Could not enable performance counters");
+
+ return ret;
+}
+
+static void panthor_perf_fw_write_config(struct panthor_perf_sampler *sampler,
+ struct panthor_perf_enable_masks *em)
+{
+ struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(sampler->ptdev);
+ u32 perfcnt_config;
+
+ glb_iface->input->perfcnt_csf_enable =
+ compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_CSHW]);
+ glb_iface->input->perfcnt_shader_enable =
+ compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_SHADER]);
+ glb_iface->input->perfcnt_mmu_l2_enable =
+ compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_MEMSYS]);
+ glb_iface->input->perfcnt_tiler_enable =
+ compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_TILER]);
+ glb_iface->input->perfcnt_fw_enable =
+ compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_FW]);
+
+ WRITE_ONCE(glb_iface->input->perfcnt_as, panthor_vm_as(panthor_fw_vm(sampler->ptdev)));
+ WRITE_ONCE(glb_iface->input->perfcnt_base, panthor_kernel_bo_gpuva(sampler->rb));
+
+ perfcnt_config = GLB_PERFCNT_CONFIG_SIZE(perf_ringbuf_slots);
+ perfcnt_config |= GLB_PERFCNT_CONFIG_SET(sampler->set_config);
+ if (sampler->ptdev->perf_info.flags & DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT)
+ perfcnt_config |= GLB_PERFCNT_METADATA_ENABLE;
+
+ WRITE_ONCE(glb_iface->input->perfcnt_config, perfcnt_config);
+
+ /*
+ * The spec mandates that the host zero the PRFCNT_EXTRACT register before an enable
+ * operation, and each (re-)enable will require an enable-disable pair to program
+ * the new changes onto the FW interface.
+ */
+ WRITE_ONCE(glb_iface->input->perfcnt_extract, 0);
+}
+
+static void panthor_perf_fw_write_sampler_config(struct panthor_perf_sampler *sampler)
+{
+ panthor_perf_fw_write_config(sampler, sampler->em);
+}
+
+static void session_populate_sample_header(struct panthor_perf_session *session,
+ struct drm_panthor_perf_sample_header *hdr, u8 set)
+{
+ *hdr = (struct drm_panthor_perf_sample_header) {
+ .block_set = set,
+ .user_data = session->user_data,
+ .timestamp_start_ns = session->sample_start_ns,
+ /*
+ * TODO This should be changed to use the GPU clocks and the TIMESTAMP register,
+ * when support is added.
+ */
+ .timestamp_end_ns = ktime_get_raw_ns(),
+ };
+}
+
+/**
+ * session_accumulate_sample - Accumulate the counters that are requested by the session
+ * into the target buffer.
+ *
+ * @ptdev: Panthor device
+ * @session: Perf session
+ * @session_sample: Starting offset of the sample in the userspace mapping.
+ * @sampler_sample: Starting offset of the sample in the sampler intermediate buffer.
+ *
+ * The hardware supports counter selection at the granularity of 1 bit per 4 counters, and there
+ * is a single global FW frontend to program the counter requests from multiple sessions. This may
+ * lead to a large disparity between the requested and provided counters for an individual client.
+ * To remove this cross-talk, we patch out the counters that have not been requested by this
+ * session and update the PRFCNT_EN, the header counter containing a bitmask of enabled counters,
+ * accordingly.
+ */
+static void session_accumulate_sample(struct panthor_device *ptdev,
+ struct panthor_perf_session *session,
+ u8 *session_sample, u8 *sampler_sample)
+{
+ const struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
+ const size_t block_size = get_annotated_block_size(perf_info->counters_per_block);
+ const size_t sample_size = session_get_user_sample_size(perf_info);
+ const size_t sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
+ const size_t data_size = sample_size - sample_header_size;
+ struct drm_panthor_perf_sample_header *hdr = (typeof(hdr))session_sample;
+
+ hdr->timestamp_end_ns = ktime_get_raw_ns();
+
+ session_sample += sample_header_size;
+ sampler_sample += sample_header_size;
+
+ for (size_t i = 0; i < data_size; i += block_size) {
+ size_t ctr_idx;
+ DECLARE_BITMAP(enabled_ctrs, PANTHOR_PERF_EM_BITS);
+ struct panthor_perf_counter_block *dst_blk = (typeof(dst_blk))(session_sample + i);
+ struct panthor_perf_counter_block *src_blk = (typeof(src_blk))(sampler_sample + i);
+
+ bitmap_from_arr64(enabled_ctrs, dst_blk->header.enable_mask, PANTHOR_PERF_EM_BITS);
+ bitmap_clear(enabled_ctrs, 0, PANTHOR_HEADER_COUNTERS);
+
+ dst_blk->counters[PANTHOR_CTR_TIMESTAMP_HI] =
+ src_blk->counters[PANTHOR_CTR_TIMESTAMP_HI];
+ dst_blk->counters[PANTHOR_CTR_TIMESTAMP_LO] =
+ src_blk->counters[PANTHOR_CTR_TIMESTAMP_LO];
+
+ for_each_set_bit(ctr_idx, enabled_ctrs, PANTHOR_PERF_EM_BITS)
+ dst_blk->counters[ctr_idx] += src_blk->counters[ctr_idx];
+ }
+}
+
+static void panthor_perf_fw_request_sample(struct panthor_perf_sampler *sampler)
+{
+ struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(sampler->ptdev);
+
+ panthor_fw_toggle_reqs(glb_iface, req, ack, GLB_PERFCNT_SAMPLE);
+ gpu_write(sampler->ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
+}
+
+/**
+ * session_populate_sample - Write out a new sample into a previously populated slot in the user
+ * ringbuffer and update both the header of the block and the PRFCNT_EN
+ * counter to contain only the selected subset of counters for that block.
+ *
+ * @ptdev: Panthor device
+ * @session: Perf session
+ * @session_sample: Pointer aligned to the start of the data section of the sample in the targeted
+ * slot.
+ * @sampler_sample: Pointer aligned to the start of the data section of the intermediate sampler
+ * buffer.
+ *
+ * When a new sample slot is targeted, it must be cleared of the data already existing there,
+ * enabling a direct copy from the intermediate buffer and then zeroing out any counters
+ * that are not required for the current session.
+ */
+static void session_populate_sample(struct panthor_device *ptdev,
+ struct panthor_perf_session *session, u8 *session_sample,
+ u8 *sampler_sample)
+{
+ const struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
+
+ const size_t block_size = get_annotated_block_size(perf_info->counters_per_block);
+ const size_t sample_size = session_get_user_sample_size(perf_info);
+ const size_t sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
+ const size_t data_size = sample_size - sample_header_size;
+
+ memcpy(session_sample, sampler_sample, sample_size);
+
+ session_populate_sample_header(session,
+ (struct drm_panthor_perf_sample_header *)session_sample,
+ ptdev->perf->sampler.set_config);
+
+ session_sample += sample_header_size;
+
+ for (size_t i = 0; i < data_size; i += block_size) {
+ size_t ctr_idx;
+ DECLARE_BITMAP(em_diff, PANTHOR_PERF_EM_BITS);
+ struct panthor_perf_counter_block *blk = (typeof(blk))(session_sample + i);
+ enum drm_panthor_perf_block_type type = blk->header.block_type;
+ unsigned long *blk_em = session->enabled_counters->mask[type];
+
+ bitmap_from_arr64(em_diff, blk->header.enable_mask, PANTHOR_PERF_EM_BITS);
+
+ bitmap_andnot(em_diff, em_diff, blk_em, PANTHOR_PERF_EM_BITS);
+ bitmap_clear(em_diff, 0, PANTHOR_HEADER_COUNTERS);
+
+ blk->counters[PANTHOR_CTR_PRFCNT_EN] = compress_enable_mask(blk_em);
+
+ for_each_set_bit(ctr_idx, em_diff, PANTHOR_PERF_EM_BITS)
+ blk->counters[ctr_idx] = 0;
+
+ bitmap_to_arr64(&blk->header.enable_mask, blk_em, PANTHOR_PERF_EM_BITS);
+ }
+}
+
+static int session_copy_sample(struct panthor_device *ptdev, struct panthor_perf_session *session)
+{
+ struct panthor_perf *perf = ptdev->perf;
+ const size_t sample_size = session_get_user_sample_size(&ptdev->perf_info);
+ const u64 insert_idx = session_read_insert_idx(session);
+ const u64 extract_idx = session_read_extract_idx(session);
+ u8 *new_sample;
+
+ if (!CIRC_SPACE_TO_END(insert_idx, extract_idx, session->ringbuf_slots))
+ return -ENOSPC;
+
+ if (READ_ONCE(session->pending_sample_request) == SAMPLE_TYPE_INITIAL)
+ return 0;
+
+ new_sample = session->samples + insert_idx * sample_size;
+
+ if (session->accum_idx != insert_idx) {
+ session_populate_sample(ptdev, session, new_sample, perf->sampler.sample);
+ session->accum_idx = insert_idx;
+ } else
+ session_accumulate_sample(ptdev, session, new_sample, perf->sampler.sample);
+
+ return 0;
+}
+
+static void session_emit_sample(struct panthor_perf_session *session)
+{
+ const u64 insert_idx = session_read_insert_idx(session);
+ const enum session_sample_type type = READ_ONCE(session->pending_sample_request);
+
+ if (type == SAMPLE_TYPE_INITIAL || type == SAMPLE_TYPE_NONE)
+ goto reset_sample_request;
+
+ session_write_insert_idx(session, (insert_idx + 1) % session->ringbuf_slots);
+
+ /* Since we are about to notify userspace, we must ensure that all changes to memory
+ * are visible.
+ */
+ wmb();
+
+ eventfd_signal(session->eventfd);
+
+reset_sample_request:
+ WRITE_ONCE(session->pending_sample_request, SAMPLE_TYPE_NONE);
+}
+
+#define PRFCNT_IRQS (GLB_PERFCNT_OVERFLOW | GLB_PERFCNT_SAMPLE | GLB_PERFCNT_THRESHOLD)
+
+void panthor_perf_report_irq(struct panthor_device *ptdev, u32 status)
+{
+ struct panthor_perf *const perf = ptdev->perf;
+ struct panthor_perf_sampler *sampler;
+ struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
+ bool sample_requested;
+
+ if (!(status & JOB_INT_GLOBAL_IF))
+ return;
+
+ if (!perf)
+ return;
+
+ sampler = &perf->sampler;
+
+ const u32 ack = READ_ONCE(glb_iface->output->ack);
+ const u32 req = READ_ONCE(glb_iface->input->req);
+
+ scoped_guard(spinlock_irqsave, &sampler->pend_lock)
+ sample_requested = sampler->sample_requested;
+
+
+ /*
+ * TODO Fix up the error handling for overflow. Currently, the user is unblocked
+ * with a completely empty sample, which is not the intended behaviour.
+ */
+ if (drm_WARN_ON_ONCE(&ptdev->base, (req ^ ack) & GLB_PERFCNT_OVERFLOW))
+ goto emit;
+
+ if ((sample_requested && (req & GLB_PERFCNT_SAMPLE) == (ack & GLB_PERFCNT_SAMPLE)) ||
+ ((req ^ ack) & GLB_PERFCNT_THRESHOLD)) {
+ const u32 extract_idx = READ_ONCE(glb_iface->input->perfcnt_extract);
+ const u32 insert_idx = READ_ONCE(glb_iface->output->perfcnt_insert);
+
+ /* If the sample was requested around a reset, some time may be needed
+ * for the FW interface to be updated, so we reschedule a sample
+ * and return immediately.
+ */
+ if (insert_idx == extract_idx) {
+ guard(spinlock_irqsave)(&sampler->pend_lock);
+ if (sampler->sample_requested)
+ panthor_perf_fw_request_sample(sampler);
+
+ return;
+ }
+
+ WRITE_ONCE(glb_iface->input->perfcnt_extract,
+ panthor_perf_handle_sample(ptdev, extract_idx, insert_idx));
+ }
+
+ scoped_guard(mutex, &sampler->sampler_lock)
+ {
+ struct list_head *pos;
+
+ list_for_each(pos, &sampler->session_list) {
+ struct panthor_perf_session *session = list_entry(pos,
+ struct panthor_perf_session, sessions);
+
+ session_copy_sample(ptdev, session);
+ }
+ }
+
+emit:
+ scoped_guard(spinlock_irqsave, &sampler->pend_lock) {
+ struct list_head *pos, *tmp;
+
+ list_for_each_safe(pos, tmp, &sampler->pending_samples) {
+ struct panthor_perf_session *session = list_entry(pos,
+ struct panthor_perf_session, pending);
+
+ session_emit_sample(session);
+ list_del(pos);
+ session_put(session);
+ }
+
+ sampler->sample_requested = false;
+ }
+
+ memset(sampler->sample, 0, session_get_user_sample_size(&ptdev->perf_info));
+ complete(&sampler->sample_handled);
+}
+
+static int panthor_perf_sampler_init(struct panthor_perf_sampler *sampler,
+ struct panthor_device *ptdev)
+{
+ struct panthor_kernel_bo *bo;
+ u8 *sample;
+ int ret;
+
+ ret = panthor_perf_setup_fw_buffer_desc(ptdev, sampler);
+ if (ret) {
+ drm_err(&ptdev->base,
+ "Failed to setup descriptor for FW ring buffer, err = %d", ret);
+ return ret;
+ }
+
+ bo = panthor_kernel_bo_create(ptdev, panthor_fw_vm(ptdev),
+ sampler->desc.buffer_size * perf_ringbuf_slots,
+ DRM_PANTHOR_BO_NO_MMAP,
+ DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
+ DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
+ PANTHOR_VM_KERNEL_AUTO_VA);
+
+ if (IS_ERR_OR_NULL(bo))
+ return IS_ERR(bo) ? PTR_ERR(bo) : -ENOMEM;
+
+ ret = panthor_kernel_bo_vmap(bo);
+ if (ret)
+ goto cleanup_bo;
+
+ sample = kzalloc(session_get_user_sample_size(&ptdev->perf_info), GFP_KERNEL);
+ if (ZERO_OR_NULL_PTR(sample)) {
+ ret = -ENOMEM;
+ goto cleanup_vmap;
+ }
+
+ sampler->rb = bo;
+ sampler->sample = sample;
+ sampler->sample_slots = perf_ringbuf_slots;
+ sampler->em = kzalloc(sizeof(*sampler->em), GFP_KERNEL);
+
+ mutex_init(&sampler->sampler_lock);
+ spin_lock_init(&sampler->pend_lock);
+ INIT_LIST_HEAD(&sampler->session_list);
+ INIT_LIST_HEAD(&sampler->pending_samples);
+ init_completion(&sampler->sample_handled);
+
+ sampler->ptdev = ptdev;
+
+ return 0;
+
+cleanup_vmap:
+ panthor_kernel_bo_vunmap(bo);
+
+cleanup_bo:
+ panthor_kernel_bo_destroy(bo);
+
+ return ret;
+}
+
+static void panthor_perf_sampler_term(struct panthor_perf_sampler *sampler)
+{
+ int ret;
+ bool requested;
+
+ scoped_guard(spinlock_irqsave, &sampler->pend_lock)
+ requested = sampler->sample_requested;
+
+ if (requested)
+ wait_for_completion_killable(&sampler->sample_handled);
+
+ panthor_perf_fw_write_config(sampler, &(struct panthor_perf_enable_masks){});
+
+ ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
+ if (ret)
+ drm_warn_once(&sampler->ptdev->base, "Sampler termination failed, ret = %d", ret);
+
+ kfree(sampler->sample);
+
+ panthor_kernel_bo_destroy(sampler->rb);
+}
+
+static int panthor_perf_sampler_add(struct panthor_perf_sampler *sampler,
+ struct panthor_perf_session *session, u8 set)
+{
+ int ret = 0;
+ struct panthor_perf_enable_masks *session_em = session->enabled_counters;
+
+ guard(mutex)(&sampler->sampler_lock);
+
+ /* Early check for whether a new set can be configured. */
+ if (!atomic_read(&sampler->enabled_clients))
+ sampler->set_config = set;
+ else
+ if (sampler->set_config != set)
+ return -EBUSY;
+
+ panthor_perf_em_add(sampler->em, session_em);
+ ret = pm_runtime_resume_and_get(sampler->ptdev->base.dev);
+ if (ret)
+ return ret;
+
+ if (atomic_read(&sampler->enabled_clients)) {
+ ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
+ if (ret)
+ return ret;
+ }
+
+ panthor_perf_fw_write_sampler_config(sampler);
+
+ ret = panthor_perf_fw_start_sampling(sampler->ptdev);
+ if (ret)
+ return ret;
+
+ session_get(session);
+ list_add_tail(&session->sessions, &sampler->session_list);
+ atomic_inc(&sampler->enabled_clients);
+
+ return 0;
+}
+
+static int panthor_perf_sampler_remove_session(struct panthor_perf_sampler *sampler,
+ struct panthor_perf_session *session)
+{
+ int ret;
+ struct list_head *snode;
+
+ guard(mutex)(&sampler->sampler_lock);
+
+ list_del_init(&session->sessions);
+ session_put(session);
+
+ panthor_perf_em_zero(sampler->em);
+ list_for_each(snode, &sampler->session_list)
+ {
+ struct panthor_perf_session *session =
+ container_of(snode, typeof(*session), sessions);
+
+ panthor_perf_em_add(sampler->em, session->enabled_counters);
+ }
+
+ ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
+ if (ret)
+ return ret;
+
+ atomic_dec(&sampler->enabled_clients);
+ pm_runtime_put_sync(sampler->ptdev->base.dev);
+
+ panthor_perf_fw_write_sampler_config(sampler);
+
+ if (atomic_read(&sampler->enabled_clients))
+ return panthor_perf_fw_start_sampling(sampler->ptdev);
+ return 0;
+}
+
/**
* panthor_perf_init - Initialize the performance counter subsystem.
* @ptdev: Panthor device
@@ -382,6 +1352,10 @@ int panthor_perf_init(struct panthor_device *ptdev)
.max = 1,
};
+ ret = panthor_perf_sampler_init(&perf->sampler, ptdev);
+ if (ret)
+ return ret;
+
drm_info(&ptdev->base, "Performance counter subsystem initialized");
ptdev->perf = no_free_ptr(perf);
@@ -389,6 +1363,69 @@ int panthor_perf_init(struct panthor_device *ptdev)
return ret;
}
+static int sampler_request(struct panthor_perf_sampler *sampler,
+ struct panthor_perf_session *session, enum session_sample_type type)
+{
+ guard(spinlock_irqsave)(&sampler->pend_lock);
+
+ /*
+ * If a previous sample has not been handled yet, the session cannot request another
+ * sample. If this happens too often, the requested sample rate is too high.
+ */
+ if (READ_ONCE(session->pending_sample_request) != SAMPLE_TYPE_NONE)
+ return -EBUSY;
+
+ WRITE_ONCE(session->pending_sample_request, type);
+ session_get(session);
+ list_add_tail(&session->pending, &sampler->pending_samples);
+
+ if (!sampler->sample_requested) {
+ reinit_completion(&sampler->sample_handled);
+ sampler->sample_requested = true;
+ panthor_perf_fw_request_sample(sampler);
+ }
+
+ return 0;
+}
+
+/**
+ * panthor_perf_sampler_request_initial - Request an initial sample.
+ * @sampler: Panthor sampler
+ * @session: Target session
+ *
+ * Perform a synchronous sample that gets immediately discarded. This sets a baseline at the point
+ * in time when a new session is started, so that counts from before the session are not reported.
+ */
+static int panthor_perf_sampler_request_initial(struct panthor_perf_sampler *sampler,
+ struct panthor_perf_session *session)
+{
+ int ret = sampler_request(sampler, session, SAMPLE_TYPE_INITIAL);
+
+ if (ret)
+ return ret;
+
+ return wait_for_completion_timeout(&sampler->sample_handled,
+ msecs_to_jiffies(1000)) ? 0 : -ETIMEDOUT;
+}
+
+/**
+ * panthor_perf_sampler_request_sample - Request a counter sample for the userspace client.
+ * @sampler: Panthor sampler
+ * @session: Target session
+ *
+ * A session that has already requested a sample cannot request another one until the previous
+ * sample has been delivered.
+ *
+ * Return:
+ * * %0 - The sample has been requested successfully.
+ * * %-EBUSY - The target session has already requested a sample and has not received it yet.
+ */
+static int panthor_perf_sampler_request_sample(struct panthor_perf_sampler *sampler,
+ struct panthor_perf_session *session)
+{
+ return sampler_request(sampler, session, SAMPLE_TYPE_REGULAR);
+}
+
static int session_validate_set(u8 set)
{
if (set > DRM_PANTHOR_PERF_SET_TERTIARY)
@@ -417,8 +1454,8 @@ static int session_validate_set(u8 set)
* Return: non-negative session identifier on success or negative error code on failure.
*/
int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
- struct drm_panthor_perf_cmd_setup *setup_args,
- struct panthor_file *pfile)
+ struct drm_panthor_perf_cmd_setup *setup_args,
+ struct panthor_file *pfile)
{
struct panthor_perf_session *session;
struct drm_gem_object *ringbuffer;
@@ -510,6 +1547,10 @@ int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf
kref_init(&session->ref);
session->enabled_counters = em;
+ ret = panthor_perf_sampler_add(&perf->sampler, session, setup_args->block_set);
+ if (ret)
+ goto cleanup_xa_alloc;
+
session->sample_freq_ns = setup_args->sample_freq_ns;
session->user_sample_size = user_sample_size;
session->ring_buf = ringbuffer;
@@ -520,6 +1561,9 @@ int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf
return session_id;
+cleanup_xa_alloc:
+ xa_store(&perf->sessions, session_id, NULL, GFP_KERNEL);
+
cleanup_em:
kfree(em);
@@ -545,8 +1589,10 @@ int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf
}
static int session_stop(struct panthor_perf *perf, struct panthor_perf_session *session,
- u64 user_data)
+ u64 user_data)
{
+ int ret;
+
if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
return 0;
@@ -559,14 +1605,17 @@ static int session_stop(struct panthor_perf *perf, struct panthor_perf_session *
session->user_data = user_data;
+ ret = panthor_perf_sampler_request_sample(&perf->sampler, session);
+ if (ret)
+ return ret;
+
clear_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state);
- /* TODO Calls to the FW interface will go here in later patches. */
return 0;
}
static int session_start(struct panthor_perf *perf, struct panthor_perf_session *session,
- u64 user_data)
+ u64 user_data)
{
if (test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
return 0;
@@ -580,12 +1629,11 @@ static int session_start(struct panthor_perf *perf, struct panthor_perf_session
if (session->sample_freq_ns)
session->user_data = user_data;
- /* TODO Calls to the FW interface will go here in later patches. */
- return 0;
+ return panthor_perf_sampler_request_initial(&perf->sampler, session);
}
static int session_sample(struct panthor_perf *perf, struct panthor_perf_session *session,
- u64 user_data)
+ u64 user_data)
{
if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
return 0;
@@ -608,14 +1656,16 @@ static int session_sample(struct panthor_perf *perf, struct panthor_perf_session
session->sample_start_ns = ktime_get_raw_ns();
session->user_data = user_data;
- return 0;
+ return panthor_perf_sampler_request_sample(&perf->sampler, session);
}
static int session_destroy(struct panthor_perf *perf, struct panthor_perf_session *session)
{
+ int ret = panthor_perf_sampler_remove_session(&perf->sampler, session);
+
session_put(session);
- return 0;
+ return ret;
}
static int session_teardown(struct panthor_perf *perf, struct panthor_perf_session *session)
@@ -691,7 +1741,7 @@ int panthor_perf_session_teardown(struct panthor_file *pfile, struct panthor_per
* Return: 0 on success, negative error code on failure.
*/
int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *perf,
- u32 sid, u64 user_data)
+ u32 sid, u64 user_data)
{
struct panthor_perf_session *session = session_find(pfile, perf, sid);
int err;
@@ -724,7 +1774,7 @@ int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *
* Return: 0 on success, negative error code on failure.
*/
int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *perf,
- u32 sid, u64 user_data)
+ u32 sid, u64 user_data)
{
struct panthor_perf_session *session = session_find(pfile, perf, sid);
int err;
@@ -755,7 +1805,7 @@ int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *p
* Return: 0 on success, negative error code on failure.
*/
int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf *perf,
- u32 sid, u64 user_data)
+ u32 sid, u64 user_data)
{
struct panthor_perf_session *session = session_find(pfile, perf, sid);
int err;
@@ -822,6 +1872,8 @@ void panthor_perf_unplug(struct panthor_device *ptdev)
xa_destroy(&perf->sessions);
+ panthor_perf_sampler_term(&perf->sampler);
+
kfree(ptdev->perf);
ptdev->perf = NULL;
diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
index 89d61cd1f017..c482198b6fbd 100644
--- a/drivers/gpu/drm/panthor/panthor_perf.h
+++ b/drivers/gpu/drm/panthor/panthor_perf.h
@@ -28,5 +28,7 @@ int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf
u32 sid, u64 user_data);
void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_perf *perf);
+void panthor_perf_report_irq(struct panthor_device *ptdev, u32 status);
+
#endif /* __PANTHOR_PERF_H__ */
--
2.33.0.dirty
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 6/7] drm/panthor: Add suspend, resume and reset handling
2025-05-16 15:49 [PATCH v4 0/7] Performance counter implementation with single manual client support Lukas Zapolskas
` (4 preceding siblings ...)
2025-05-16 15:49 ` [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling Lukas Zapolskas
@ 2025-05-16 15:49 ` Lukas Zapolskas
2025-07-18 15:01 ` Adrián Larumbe
2025-05-16 15:49 ` [PATCH v4 7/7] drm/panthor: Expose the panthor perf ioctls Lukas Zapolskas
6 siblings, 1 reply; 29+ messages in thread
From: Lukas Zapolskas @ 2025-05-16 15:49 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe, Lukas Zapolskas
The sampler must disable and re-enable counter sampling around suspends,
and must re-program the FW interface after a reset to avoid losing
data.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
---
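The ordering described above can be modelled in userspace as a rough sketch.
Everything prefixed with mock_ is hypothetical and only mirrors the shape of
the hooks added below; in particular, the assumption that a reset clears the
FW-side counter configuration while a suspend/resume cycle keeps it is part of
the model, not something this sketch verifies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the FW state that the sampler programs. */
struct mock_fw {
	bool perfcnt_enabled;   /* models GLB_PERFCNT_ENABLE in the request word */
	bool config_programmed; /* models enable masks, ring buffer base, ... */
};

struct mock_sampler {
	struct mock_fw fw;
	int enabled_clients;
};

static void mock_write_config(struct mock_sampler *s)
{
	s->fw.config_programmed = true;
}

static void mock_start(struct mock_sampler *s)
{
	/* Enabling sampling only makes sense once the config is in place. */
	assert(s->fw.config_programmed);
	s->fw.perfcnt_enabled = true;
}

static void mock_stop(struct mock_sampler *s)
{
	s->fw.perfcnt_enabled = false;
}

/* Suspend/resume: toggle sampling, leave the configuration alone. */
static void mock_suspend(struct mock_sampler *s)
{
	if (s->enabled_clients)
		mock_stop(s);
}

static void mock_resume(struct mock_sampler *s)
{
	if (s->enabled_clients)
		mock_start(s);
}

/* Reset: stop first, then re-program the configuration before restarting. */
static void mock_pre_reset(struct mock_sampler *s)
{
	if (s->enabled_clients)
		mock_stop(s);
}

/* Models the reset itself wiping all FW-side state (an assumption). */
static void mock_reset(struct mock_sampler *s)
{
	s->fw = (struct mock_fw){ 0 };
}

static void mock_post_reset(struct mock_sampler *s)
{
	if (!s->enabled_clients)
		return;

	mock_write_config(s);
	mock_start(s);
}
```

With no enabled clients every hook is a no-op, which matches the early
returns on enabled_clients in the functions added by this patch.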
drivers/gpu/drm/panthor/panthor_device.c | 7 +-
drivers/gpu/drm/panthor/panthor_perf.c | 102 +++++++++++++++++++++++
drivers/gpu/drm/panthor/panthor_perf.h | 6 ++
3 files changed, 114 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
index 7ac985d44655..92624a8717c5 100644
--- a/drivers/gpu/drm/panthor/panthor_device.c
+++ b/drivers/gpu/drm/panthor/panthor_device.c
@@ -139,6 +139,7 @@ static void panthor_device_reset_work(struct work_struct *work)
if (!drm_dev_enter(&ptdev->base, &cookie))
return;
+ panthor_perf_pre_reset(ptdev);
panthor_sched_pre_reset(ptdev);
panthor_fw_pre_reset(ptdev, true);
panthor_mmu_pre_reset(ptdev);
@@ -148,6 +149,7 @@ static void panthor_device_reset_work(struct work_struct *work)
ret = panthor_fw_post_reset(ptdev);
atomic_set(&ptdev->reset.pending, 0);
panthor_sched_post_reset(ptdev, ret != 0);
+ panthor_perf_post_reset(ptdev);
drm_dev_exit(cookie);
if (ret) {
@@ -496,8 +498,10 @@ int panthor_device_resume(struct device *dev)
ret = panthor_device_resume_hw_components(ptdev);
}
- if (!ret)
+ if (!ret) {
panthor_sched_resume(ptdev);
+ panthor_perf_resume(ptdev);
+ }
drm_dev_exit(cookie);
@@ -561,6 +565,7 @@ int panthor_device_suspend(struct device *dev)
/* We prepare everything as if we were resetting the GPU.
* The end of the reset will happen in the resume path though.
*/
+ panthor_perf_suspend(ptdev);
panthor_sched_suspend(ptdev);
panthor_fw_suspend(ptdev);
panthor_mmu_suspend(ptdev);
diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
index 97603b168d2d..438319cf71ab 100644
--- a/drivers/gpu/drm/panthor/panthor_perf.c
+++ b/drivers/gpu/drm/panthor/panthor_perf.c
@@ -1845,6 +1845,76 @@ void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_per
}
}
+static int panthor_perf_sampler_resume(struct panthor_perf_sampler *sampler)
+{
+ int ret;
+
+ if (!atomic_read(&sampler->enabled_clients))
+ return 0;
+
+ ret = panthor_perf_fw_start_sampling(sampler->ptdev);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+static int panthor_perf_sampler_suspend(struct panthor_perf_sampler *sampler)
+{
+ int ret;
+
+ if (!atomic_read(&sampler->enabled_clients))
+ return 0;
+
+ ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+/**
+ * panthor_perf_suspend - Prepare the performance counter subsystem for system suspend.
+ * @ptdev: Panthor device.
+ *
+ * Indicate to the performance counters that the system is suspending.
+ *
+ * This function must not be used to handle MCU power state transitions: just before MCU goes
+ * from on to any inactive state, an automatic sample will be performed by the firmware, and
+ * the performance counter firmware state will be restored on warm boot.
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int panthor_perf_suspend(struct panthor_device *ptdev)
+{
+ struct panthor_perf *perf = ptdev->perf;
+
+ if (!perf)
+ return 0;
+
+ return panthor_perf_sampler_suspend(&perf->sampler);
+}
+
+/**
+ * panthor_perf_resume - Resume the performance counter subsystem after system resumption.
+ * @ptdev: Panthor device.
+ *
+ * Indicate to the performance counters that the system has resumed. This must not be used
+ * to handle MCU state transitions, for the same reasons as detailed in the kerneldoc for
+ * panthor_perf_suspend().
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int panthor_perf_resume(struct panthor_device *ptdev)
+{
+ struct panthor_perf *perf = ptdev->perf;
+
+ if (!perf)
+ return 0;
+
+ return panthor_perf_sampler_resume(&perf->sampler);
+}
+
/**
* panthor_perf_unplug - Terminate the performance counter subsystem.
* @ptdev: Panthor device.
@@ -1878,3 +1948,35 @@ void panthor_perf_unplug(struct panthor_device *ptdev)
ptdev->perf = NULL;
}
+
+void panthor_perf_pre_reset(struct panthor_device *ptdev)
+{
+ struct panthor_perf_sampler *sampler;
+
+ if (!ptdev || !ptdev->perf)
+ return;
+
+ sampler = &ptdev->perf->sampler;
+
+ if (!atomic_read(&sampler->enabled_clients))
+ return;
+
+ panthor_perf_fw_stop_sampling(sampler->ptdev);
+}
+
+void panthor_perf_post_reset(struct panthor_device *ptdev)
+{
+ struct panthor_perf_sampler *sampler;
+
+ if (!ptdev || !ptdev->perf)
+ return;
+
+ sampler = &ptdev->perf->sampler;
+
+ if (!atomic_read(&sampler->enabled_clients))
+ return;
+
+ panthor_perf_fw_write_sampler_config(sampler);
+
+ panthor_perf_fw_start_sampling(sampler->ptdev);
+}
diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
index c482198b6fbd..fc08a5440a35 100644
--- a/drivers/gpu/drm/panthor/panthor_perf.h
+++ b/drivers/gpu/drm/panthor/panthor_perf.h
@@ -13,6 +13,8 @@ struct panthor_file;
struct panthor_perf;
int panthor_perf_init(struct panthor_device *ptdev);
+int panthor_perf_suspend(struct panthor_device *ptdev);
+int panthor_perf_resume(struct panthor_device *ptdev);
void panthor_perf_unplug(struct panthor_device *ptdev);
int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
@@ -30,5 +32,9 @@ void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_per
void panthor_perf_report_irq(struct panthor_device *ptdev, u32 status);
+void panthor_perf_pre_reset(struct panthor_device *ptdev);
+
+void panthor_perf_post_reset(struct panthor_device *ptdev);
+
#endif /* __PANTHOR_PERF_H__ */
--
2.33.0.dirty
^ permalink raw reply related [flat|nested] 29+ messages in thread
* [PATCH v4 7/7] drm/panthor: Expose the panthor perf ioctls
2025-05-16 15:49 [PATCH v4 0/7] Performance counter implementation with single manual client support Lukas Zapolskas
` (5 preceding siblings ...)
2025-05-16 15:49 ` [PATCH v4 6/7] drm/panthor: Add suspend, resume and reset handling Lukas Zapolskas
@ 2025-05-16 15:49 ` Lukas Zapolskas
2025-07-18 15:05 ` Adrián Larumbe
2025-07-18 15:19 ` Adrián Larumbe
6 siblings, 2 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-05-16 15:49 UTC (permalink / raw)
To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe, Lukas Zapolskas
This patch implements the PANTHOR_PERF_CONTROL ioctl series, and
a PANTHOR_GET_UOBJ wrapper to deal with the backwards and forwards
compatibility of the uAPI.
The minor version is bumped to indicate that the feature is now
supported.
Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
---
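As a rough illustration of the compatibility behaviour the wrapper relies on,
the size handling of copy_struct_from_user() can be imitated in userspace as
follows. mock_copy_struct() is a hypothetical name used only for this sketch;
it is not the kernel helper and skips everything related to user memory
access.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace imitation of the size rules: copy the common
 * prefix, zero-fill what an older caller did not supply, and reject a
 * newer caller that set bytes this side does not understand.
 */
static int mock_copy_struct(void *dst, size_t ksize, const void *src, size_t usize)
{
	size_t size = ksize < usize ? ksize : usize;
	const unsigned char *tail;

	assert(dst && src);
	tail = (const unsigned char *)src + size;

	/* Newer userspace: trailing bytes beyond ksize must all be zero. */
	for (size_t i = 0; i < usize - size; i++)
		if (tail[i])
			return -E2BIG;

	/* Older userspace: fields it did not provide read back as zero. */
	memset((unsigned char *)dst + size, 0, ksize - size);
	memcpy(dst, src, size);

	return 0;
}
```

A shorter user struct is accepted and padded with zeroes, and a longer one is
accepted only if its extra bytes are zero, which is what lets old and new
userspace share one ioctl.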
drivers/gpu/drm/panthor/panthor_drv.c | 141 +++++++++++++++++++++++++-
1 file changed, 139 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
index 4c1381320859..850a894fe91b 100644
--- a/drivers/gpu/drm/panthor/panthor_drv.c
+++ b/drivers/gpu/drm/panthor/panthor_drv.c
@@ -31,6 +31,7 @@
#include "panthor_gpu.h"
#include "panthor_heap.h"
#include "panthor_mmu.h"
+#include "panthor_perf.h"
#include "panthor_regs.h"
#include "panthor_sched.h"
@@ -73,6 +74,39 @@ panthor_set_uobj(u64 usr_ptr, u32 usr_size, u32 min_size, u32 kern_size, const v
return 0;
}
+/**
+ * panthor_get_uobj() - Copy user object to kernel object.
+ * @usr_ptr: User pointer.
+ * @usr_size: Size of the user object.
+ * @min_size: Minimum size for this object.
+ *
+ * Helper automating user -> kernel object copies.
+ *
+ * Don't use this function directly, use PANTHOR_UOBJ_GET() instead.
+ *
+ * Return: valid pointer on success, an encoded error code otherwise.
+ */
+static void *
+panthor_get_uobj(u64 usr_ptr, u32 usr_size, u32 min_size)
+{
+ int ret;
+ void *out_alloc __free(kvfree) = NULL;
+
+ /* User size shouldn't be smaller than the minimal object size. */
+ if (usr_size < min_size)
+ return ERR_PTR(-EINVAL);
+
+ out_alloc = kvmalloc(min_size, GFP_KERNEL);
+ if (!out_alloc)
+ return ERR_PTR(-ENOMEM);
+
+ ret = copy_struct_from_user(out_alloc, min_size, u64_to_user_ptr(usr_ptr), usr_size);
+ if (ret)
+ return ERR_PTR(ret);
+
+ return_ptr(out_alloc);
+}
+
/**
* panthor_get_uobj_array() - Copy a user object array into a kernel accessible object array.
* @in: The object array to copy.
@@ -176,7 +210,12 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
- PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks))
+ PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks), \
+ PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_setup, shader_enable_mask), \
+ PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_start, user_data), \
+ PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_stop, user_data), \
+ PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_sample, user_data))
+
/**
* PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
@@ -191,6 +230,24 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
PANTHOR_UOBJ_MIN_SIZE(_src_obj), \
sizeof(_src_obj), &(_src_obj))
+/**
+ * PANTHOR_UOBJ_GET() - Copies a user object from _usr_ptr to a kernel accessible _dest_ptr.
+ * @_dest_ptr: Local variable
+ * @_usr_size: Size of the user object.
+ * @_usr_ptr: The pointer of the object in userspace.
+ *
+ * Return: Error code. See panthor_get_uobj().
+ */
+#define PANTHOR_UOBJ_GET(_dest_ptr, _usr_size, _usr_ptr) \
+ ({ \
+ typeof(_dest_ptr) _tmp; \
+ _tmp = panthor_get_uobj(_usr_ptr, _usr_size, \
+ PANTHOR_UOBJ_MIN_SIZE(_tmp[0])); \
+ if (!IS_ERR(_tmp)) \
+ _dest_ptr = _tmp; \
+ PTR_ERR_OR_ZERO(_tmp); \
+ })
+
/**
* PANTHOR_UOBJ_GET_ARRAY() - Copy a user object array to a kernel accessible
* object array.
@@ -1339,6 +1396,83 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data,
return 0;
}
+#define perf_cmd(command) \
+ ({ \
+ struct drm_panthor_perf_cmd_##command *command##_args __free(kvfree) = NULL; \
+ int _ret = PANTHOR_UOBJ_GET(command##_args, args->size, args->pointer); \
+ if (_ret) \
+ return _ret; \
+ return panthor_perf_session_##command(pfile, ptdev->perf, args->handle, \
+ command##_args->user_data); \
+ })
+
+static int panthor_ioctl_perf_control(struct drm_device *ddev, void *data,
+ struct drm_file *file)
+{
+ struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
+ struct panthor_file *pfile = file->driver_priv;
+ struct drm_panthor_perf_control *args = data;
+ int ret;
+
+ if (!args->pointer) {
+ switch (args->cmd) {
+ case DRM_PANTHOR_PERF_COMMAND_SETUP:
+ args->size = sizeof(struct drm_panthor_perf_cmd_setup);
+ return 0;
+
+ case DRM_PANTHOR_PERF_COMMAND_TEARDOWN:
+ args->size = 0;
+ return 0;
+
+ case DRM_PANTHOR_PERF_COMMAND_START:
+ args->size = sizeof(struct drm_panthor_perf_cmd_start);
+ return 0;
+
+ case DRM_PANTHOR_PERF_COMMAND_STOP:
+ args->size = sizeof(struct drm_panthor_perf_cmd_stop);
+ return 0;
+
+ case DRM_PANTHOR_PERF_COMMAND_SAMPLE:
+ args->size = sizeof(struct drm_panthor_perf_cmd_sample);
+ return 0;
+
+ default:
+ return -EINVAL;
+ }
+ }
+
+ switch (args->cmd) {
+ case DRM_PANTHOR_PERF_COMMAND_SETUP:
+ {
+ struct drm_panthor_perf_cmd_setup *setup_args __free(kvfree) = NULL;
+
+ ret = PANTHOR_UOBJ_GET(setup_args, args->size, args->pointer);
+ if (ret)
+ return ret;
+
+ return panthor_perf_session_setup(ptdev, ptdev->perf, setup_args, pfile);
+ }
+ case DRM_PANTHOR_PERF_COMMAND_TEARDOWN:
+ {
+ return panthor_perf_session_teardown(pfile, ptdev->perf, args->handle);
+ }
+ case DRM_PANTHOR_PERF_COMMAND_START:
+ {
+ perf_cmd(start);
+ }
+ case DRM_PANTHOR_PERF_COMMAND_STOP:
+ {
+ perf_cmd(stop);
+ }
+ case DRM_PANTHOR_PERF_COMMAND_SAMPLE:
+ {
+ perf_cmd(sample);
+ }
+ default:
+ return -EINVAL;
+ }
+}
+
static int
panthor_open(struct drm_device *ddev, struct drm_file *file)
{
@@ -1409,6 +1543,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW),
PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW),
+ PANTHOR_IOCTL(PERF_CONTROL, perf_control, DRM_RENDER_ALLOW),
};
static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
@@ -1518,6 +1653,8 @@ static void panthor_debugfs_init(struct drm_minor *minor)
* - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query
* - adds PANTHOR_GROUP_PRIORITY_REALTIME priority
* - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
+ * - 1.4 - adds DEV_QUERY_PERF_INFO query
+ * - adds PERF_CONTROL ioctl
*/
static const struct drm_driver panthor_drm_driver = {
.driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
@@ -1531,7 +1668,7 @@ static const struct drm_driver panthor_drm_driver = {
.name = "panthor",
.desc = "Panthor DRM driver",
.major = 1,
- .minor = 3,
+ .minor = 4,
.gem_create_object = panthor_gem_create_object,
.gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
--
2.33.0.dirty
* Re: [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients
2025-05-16 15:49 ` [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients Lukas Zapolskas
@ 2025-05-17 7:53 ` kernel test robot
2025-06-20 15:28 ` Steven Price
2025-07-18 3:34 ` Adrián Larumbe
2 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2025-05-17 7:53 UTC (permalink / raw)
To: Lukas Zapolskas, Boris Brezillon, Steven Price, Liviu Dudau,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, dri-devel, linux-kernel
Cc: llvm, oe-kbuild-all, Adrián Larumbe, Lukas Zapolskas
Hi Lukas,
kernel test robot noticed the following build errors:
[auto build test ERROR on 96c85e428ebaeacd2c640eba075479ab92072ccd]
url: https://github.com/intel-lab-lkp/linux/commits/Lukas-Zapolskas/drm-panthor-Add-performance-counter-uAPI/20250517-000257
base: 96c85e428ebaeacd2c640eba075479ab92072ccd
patch link: https://lore.kernel.org/r/0319137f966f2dbffc54e51f7a2a3cbac837507b.1747148172.git.lukas.zapolskas%40arm.com
patch subject: [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients
config: i386-buildonly-randconfig-002-20250517 (https://download.01.org/0day-ci/archive/20250517/202505171509.6i95NZ0n-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250517/202505171509.6i95NZ0n-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505171509.6i95NZ0n-lkp@intel.com/
All errors (new ones prefixed by >>):
>> drivers/gpu/drm/panthor/panthor_perf.c:189:8: error: redefinition of 'panthor_perf_counter_block'
189 | struct panthor_perf_counter_block {
| ^
drivers/gpu/drm/panthor/panthor_perf.c:47:8: note: previous definition is here
47 | struct panthor_perf_counter_block {
| ^
drivers/gpu/drm/panthor/panthor_perf.c:233:29: error: call to undeclared function 'GPU_MEM_FEATURES_L2_SLICES'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
233 | perf_info->memsys_blocks = GPU_MEM_FEATURES_L2_SLICES(ptdev->gpu_info.mem_features);
| ^
2 errors generated.
vim +/panthor_perf_counter_block +189 drivers/gpu/drm/panthor/panthor_perf.c
05182d1d6cff3c7 Lukas Zapolskas 2025-05-16 188
1c26af93f15f9e2 Lukas Zapolskas 2025-05-16 @189 struct panthor_perf_counter_block {
1c26af93f15f9e2 Lukas Zapolskas 2025-05-16 190 struct drm_panthor_perf_block_header header;
1c26af93f15f9e2 Lukas Zapolskas 2025-05-16 191 u64 counters[];
1c26af93f15f9e2 Lukas Zapolskas 2025-05-16 192 };
1c26af93f15f9e2 Lukas Zapolskas 2025-05-16 193
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling
2025-05-16 15:49 ` [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling Lukas Zapolskas
@ 2025-05-17 8:56 ` kernel test robot
2025-07-18 14:49 ` Adrián Larumbe
1 sibling, 0 replies; 29+ messages in thread
From: kernel test robot @ 2025-05-17 8:56 UTC (permalink / raw)
To: Lukas Zapolskas, Boris Brezillon, Steven Price, Liviu Dudau,
Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
Simona Vetter, dri-devel, linux-kernel
Cc: llvm, oe-kbuild-all, Adrián Larumbe, Lukas Zapolskas
Hi Lukas,
kernel test robot noticed the following build errors:
[auto build test ERROR on 96c85e428ebaeacd2c640eba075479ab92072ccd]
url: https://github.com/intel-lab-lkp/linux/commits/Lukas-Zapolskas/drm-panthor-Add-performance-counter-uAPI/20250517-000257
base: 96c85e428ebaeacd2c640eba075479ab92072ccd
patch link: https://lore.kernel.org/r/7005fb2eba3abbb2ee95282d117f70c8a7c8555f.1747148172.git.lukas.zapolskas%40arm.com
patch subject: [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling
config: i386-buildonly-randconfig-002-20250517 (https://download.01.org/0day-ci/archive/20250517/202505171601.i0qhMG1O-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250517/202505171601.i0qhMG1O-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202505171601.i0qhMG1O-lkp@intel.com/
All errors (new ones prefixed by >>):
drivers/gpu/drm/panthor/panthor_perf.c:381:8: error: redefinition of 'panthor_perf_counter_block'
381 | struct panthor_perf_counter_block {
| ^
drivers/gpu/drm/panthor/panthor_perf.c:126:8: note: previous definition is here
126 | struct panthor_perf_counter_block {
| ^
>> drivers/gpu/drm/panthor/panthor_perf.c:651:15: error: redefinition of 'session_get_user_sample_size'
651 | static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
| ^
drivers/gpu/drm/panthor/panthor_perf.c:391:15: note: previous definition is here
391 | static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
| ^
>> drivers/gpu/drm/panthor/panthor_perf.c:1038:19: error: incompatible pointer types passing '__u64 (*)[2]' (aka 'unsigned long long (*)[2]') to parameter of type 'u64 *' (aka 'unsigned long long *') [-Werror,-Wincompatible-pointer-types]
1038 | bitmap_to_arr64(&blk->header.enable_mask, blk_em, PANTHOR_PERF_EM_BITS);
| ^~~~~~~~~~~~~~~~~~~~~~~~
include/linux/bitmap.h:313:27: note: passing argument to parameter 'buf' here
313 | void bitmap_to_arr64(u64 *buf, const unsigned long *bitmap, unsigned int nbits);
| ^
3 errors generated.
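The third error above comes down to the difference between an array and the address of an array. The following is a minimal sketch, not the driver code: `sketch_fill()` and `struct sketch_header` are hypothetical stand-ins that only mirror the parameter type of `bitmap_to_arr64()` and the `enable_mask[2]` member shown in the log. Passing the array itself lets it decay to a pointer to its first element, which matches a `u64 *` style parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: same first-parameter type as the callee in the log. */
static void sketch_fill(uint64_t *buf, unsigned int nwords)
{
    for (unsigned int i = 0; i < nwords; i++)
        buf[i] = i + 1;
}

/* Hypothetical stand-in for a header carrying a two-word enable mask. */
struct sketch_header {
    uint64_t enable_mask[2];
};

static_assert(sizeof(((struct sketch_header *)0)->enable_mask) ==
              2 * sizeof(uint64_t), "mask is two 64-bit words");

static void sketch_caller(struct sketch_header *hdr)
{
    /*
     * hdr->enable_mask decays to uint64_t *, which is accepted.
     * &hdr->enable_mask would have type uint64_t (*)[2] and trigger
     * the incompatible-pointer-types diagnostic quoted above.
     */
    sketch_fill(hdr->enable_mask, 2);
}
```

Whether dropping the address-of operator is the right fix for the driver itself is for the patch author to confirm; the sketch only illustrates the type mismatch.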
vim +/session_get_user_sample_size +651 drivers/gpu/drm/panthor/panthor_perf.c
650
> 651 static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
652 {
653 const size_t block_size = get_annotated_block_size(info->counters_per_block);
654 const size_t block_nr = info->cshw_blocks + info->fw_blocks +
655 info->tiler_blocks + info->memsys_blocks + info->shader_blocks;
656
657 return sizeof(struct drm_panthor_perf_sample_header) + (block_size * block_nr);
658 }
659
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients
2025-05-16 15:49 ` [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients Lukas Zapolskas
2025-05-17 7:53 ` kernel test robot
@ 2025-06-20 15:28 ` Steven Price
2025-07-21 9:58 ` Lukas Zapolskas
2025-07-18 3:34 ` Adrián Larumbe
2 siblings, 1 reply; 29+ messages in thread
From: Steven Price @ 2025-06-20 15:28 UTC (permalink / raw)
To: Lukas Zapolskas, Boris Brezillon, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe
Hi Lukas,
I was going to try testing this out, but it doesn't look functional. See
below.
On 16/05/2025 16:49, Lukas Zapolskas wrote:
[...]
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
> index 9365ce9fed04..15fa533731f3 100644
> --- a/drivers/gpu/drm/panthor/panthor_perf.c
> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
> @@ -2,13 +2,177 @@
> /* Copyright 2023 Collabora Ltd */
> /* Copyright 2025 Arm ltd. */
>
> -#include <linux/bitops.h>
> +#include <drm/drm_gem.h>
> #include <drm/panthor_drm.h>
> +#include <linux/bitops.h>
> +#include <linux/circ_buf.h>
>
> #include "panthor_device.h"
> #include "panthor_fw.h"
> #include "panthor_perf.h"
>
> +/**
> + * PANTHOR_PERF_EM_BITS - Number of bits in a user-facing enable mask. This must correspond
> + * to the maximum number of counters available for selection on the newest
> + * Mali GPUs (128 as of the Mali-Gx15).
> + */
> +#define PANTHOR_PERF_EM_BITS (BITS_PER_TYPE(u64) * 2)
> +
> +enum panthor_perf_session_state {
> + /** @PANTHOR_PERF_SESSION_ACTIVE: The session is active and can be used for sampling. */
> + PANTHOR_PERF_SESSION_ACTIVE = 0,
> +
> + /**
> + * @PANTHOR_PERF_SESSION_OVERFLOW: The session encountered an overflow in one of the
> + * counters during the last sampling period. This flag
> + * gets propagated as part of samples emitted for this
> + * session, to ensure the userspace client can gracefully
> + * handle this data corruption.
> + */
> + PANTHOR_PERF_SESSION_OVERFLOW,
> +
> + /* Must be last */
> + PANTHOR_PERF_SESSION_MAX,
> +};
> +
> +struct panthor_perf_enable_masks {
> + /**
> + * @mask: Array of bitmasks indicating the counters userspace requested, where
> + * one bit represents a single counter. Used to build the firmware configuration
> + * and ensure that userspace clients obtain only the counters they requested.
> + */
> + unsigned long mask[DRM_PANTHOR_PERF_BLOCK_MAX][BITS_TO_LONGS(PANTHOR_PERF_EM_BITS)];
> +};
> +
> +struct panthor_perf_counter_block {
> + struct drm_panthor_perf_block_header header;
> + u64 counters[];
> +};
I think something has gone rather wrong in a rebasing. This struct was
already added in patch 2. So this causes a build error (that the kernel
test robot caught too).
[...]
> @@ -72,6 +236,122 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
> perf_info->sample_size = session_get_user_sample_size(perf_info);
> }
>
> +static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panthor_perf_cmd_setup
> + *setup_args)
There's some code style mis-formatting like this - which is then fixed
up in patch 5. So it looks like you've applied fixups to the wrong commit.
Also this series will need rebasing because there are some upstream
changes that it now conflicts with. The base commit looks pretty
ancient now.
Thanks,
Steve
* Re: [PATCH v4 1/7] drm/panthor: Add performance counter uAPI
2025-05-16 15:49 ` [PATCH v4 1/7] drm/panthor: Add performance counter uAPI Lukas Zapolskas
@ 2025-07-18 2:43 ` Adrián Larumbe
2025-07-21 8:46 ` Lukas Zapolskas
0 siblings, 1 reply; 29+ messages in thread
From: Adrián Larumbe @ 2025-07-18 2:43 UTC (permalink / raw)
To: Lukas Zapolskas
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel, Mihail Atanassov
Hi Lukas,
On 16.05.2025 16:49, Lukas Zapolskas wrote:
> This patch extends the DEV_QUERY ioctl to return information about the
> performance counter setup for userspace, and introduces the new
> ioctl DRM_PANTHOR_PERF_CONTROL in order to allow for the sampling of
> performance counters.
>
> The new design is inspired by the perf aux ringbuffer, with the insert
> and extract indices being mapped to userspace, allowing multiple samples
> to be exposed at any given time. To avoid pointer chasing, the sample
> metadata and block metadata are inline with the elements they
> describe.
Is the perf aux ringbuffer something internal to ARM's DDK?
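For reference, the insert/extract scheme described in the quoted paragraph could be consumed from userspace roughly as sketched below. This is an illustration under stated assumptions, not the uAPI contract: `struct sketch_ringbuf_control`, `sketch_peek()` and `sketch_release()` are hypothetical names, the indices are assumed to be free-running counters, the slot count is assumed to be a power of two, and the acquire/release ordering is a guess at what a real implementation would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared control block; layout is assumed. */
struct sketch_ringbuf_control {
    _Atomic uint64_t extract_idx; /* advanced by the consumer only */
    _Atomic uint64_t insert_idx;  /* advanced by the producer only */
};

/*
 * Return a pointer to the oldest unread sample, or NULL if the ring is empty.
 * Assumes free-running indices and a power-of-two slot_count.
 */
static const uint8_t *sketch_peek(struct sketch_ringbuf_control *ctrl,
                                  const uint8_t *ring, size_t sample_size,
                                  uint64_t slot_count)
{
    assert(slot_count != 0 && (slot_count & (slot_count - 1)) == 0);

    const uint64_t extract =
        atomic_load_explicit(&ctrl->extract_idx, memory_order_relaxed);
    /* Acquire pairs with the producer publishing a completed sample. */
    const uint64_t insert =
        atomic_load_explicit(&ctrl->insert_idx, memory_order_acquire);

    if (extract == insert)
        return NULL;

    return ring + (extract & (slot_count - 1)) * sample_size;
}

/* Mark the oldest sample as consumed so the producer may reuse its slot. */
static void sketch_release(struct sketch_ringbuf_control *ctrl)
{
    atomic_fetch_add_explicit(&ctrl->extract_idx, 1, memory_order_release);
}
```

A consumer would typically wait on the eventfd, then loop on `sketch_peek()` and `sketch_release()` until the former returns NULL.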
> Userspace is responsible for passing in resources for samples to be
> exposed, including the event file descriptor for notification of new
> sample availability, the ringbuffer BO to store samples, and the
> control BO along with the offset for mapping the insert and extract
> indices. Though these indices are only a total of 8 bytes, userspace
> can then reuse the same physical page for tracking the state of
> multiple buffers by giving different offsets from the BO start to
> map them.
>
> Co-developed-by: Mihail Atanassov <mihail.atanassov@arm.com>
> Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com>
> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> ---
> include/uapi/drm/panthor_drm.h | 565 +++++++++++++++++++++++++++++++++
> 1 file changed, 565 insertions(+)
>
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index 97e2c4510e69..a74eabcabbcb 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -127,6 +127,9 @@ enum drm_panthor_ioctl_id {
>
> /** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */
> DRM_PANTHOR_TILER_HEAP_DESTROY,
> +
> + /** @DRM_PANTHOR_PERF_CONTROL: Control a performance counter session. */
> + DRM_PANTHOR_PERF_CONTROL,
> };
>
> /**
> @@ -226,6 +229,9 @@ enum drm_panthor_dev_query_type {
> * @DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO: Query allowed group priorities information.
> */
> DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO,
> +
> + /** @DRM_PANTHOR_DEV_QUERY_PERF_INFO: Query performance counter interface information. */
> + DRM_PANTHOR_DEV_QUERY_PERF_INFO,
> };
>
> /**
> @@ -379,6 +385,135 @@ struct drm_panthor_group_priorities_info {
> __u8 pad[3];
> };
>
> +/**
> + * enum drm_panthor_perf_feat_flags - Performance counter configuration feature flags.
> + */
> +enum drm_panthor_perf_feat_flags {
> + /** @DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT: Coarse-grained block states are supported. */
> + DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT = 1 << 0,
> +};
> +
> +/**
> + * enum drm_panthor_perf_block_type - Performance counter supported block types.
> + */
> +enum drm_panthor_perf_block_type {
> + /** @DRM_PANTHOR_PERF_BLOCK_METADATA: Internal use only. */
> + DRM_PANTHOR_PERF_BLOCK_METADATA = 0,
> +
> + /** @DRM_PANTHOR_PERF_BLOCK_FW: The FW counter block. */
> + DRM_PANTHOR_PERF_BLOCK_FW,
> +
> + /** @DRM_PANTHOR_PERF_BLOCK_CSHW: The CSHW counter block. */
> + DRM_PANTHOR_PERF_BLOCK_CSHW,
> +
> + /** @DRM_PANTHOR_PERF_BLOCK_TILER: The tiler counter block. */
> + DRM_PANTHOR_PERF_BLOCK_TILER,
> +
> + /** @DRM_PANTHOR_PERF_BLOCK_MEMSYS: A memsys counter block. */
> + DRM_PANTHOR_PERF_BLOCK_MEMSYS,
> +
> + /** @DRM_PANTHOR_PERF_BLOCK_SHADER: A shader core counter block. */
> + DRM_PANTHOR_PERF_BLOCK_SHADER,
> +
> + /** @DRM_PANTHOR_PERF_BLOCK_FIRST: Internal use only. */
> + DRM_PANTHOR_PERF_BLOCK_FIRST = DRM_PANTHOR_PERF_BLOCK_FW,
> +
> + /** @DRM_PANTHOR_PERF_BLOCK_LAST: Internal use only. */
> + DRM_PANTHOR_PERF_BLOCK_LAST = DRM_PANTHOR_PERF_BLOCK_SHADER,
> +
> + /** @DRM_PANTHOR_PERF_BLOCK_MAX: Internal use only. */
> + DRM_PANTHOR_PERF_BLOCK_MAX = DRM_PANTHOR_PERF_BLOCK_LAST + 1,
> +};
> +
> +/**
> + * enum drm_panthor_perf_clock - Identifier of the clock used to produce the cycle count values
> + * in a given block.
> + *
> + * Since the integrator has the choice of using one or more clocks, there may be some confusion
> + * as to which blocks are counted by which clock values unless this information is explicitly
> + * provided as part of every block sample. Not every single clock here can be used: in the simplest
> + * case, all cycle counts will be associated with the top-level clock.
> + */
> +enum drm_panthor_perf_clock {
> + /** @DRM_PANTHOR_PERF_CLOCK_TOPLEVEL: Top-level CSF clock. */
> + DRM_PANTHOR_PERF_CLOCK_TOPLEVEL,
> +
> + /**
> + * @DRM_PANTHOR_PERF_CLOCK_COREGROUP: Core group clock, responsible for the MMU, L2
> + * caches and the tiler.
> + */
> + DRM_PANTHOR_PERF_CLOCK_COREGROUP,
> +
> + /** @DRM_PANTHOR_PERF_CLOCK_SHADER: Clock for the shader cores. */
> + DRM_PANTHOR_PERF_CLOCK_SHADER,
> +};
> +
> +/**
> + * struct drm_panthor_perf_info - Performance counter interface information
> + *
> + * Structure grouping all queryable information relating to the performance counter
> + * interfaces.
> + */
> +struct drm_panthor_perf_info {
> + /**
> + * @counters_per_block: The number of 8-byte counters available in a block.
> + */
> + __u32 counters_per_block;
> +
> + /**
> + * @sample_header_size: The size of the header struct available at the beginning
> + * of every sample.
> + */
> + __u32 sample_header_size;
> +
> + /**
> + * @block_header_size: The size of the header struct inline with the counters for a
> + * single block.
> + */
> + __u32 block_header_size;
> +
> + /**
> + * @sample_size: The size of a fully annotated sample, starting with a sample header
> + * of size @sample_header_size bytes, and all available blocks for the current
> + * configuration, each comprised of @counters_per_block 64-bit counters and
> + * a block header of @block_header_size bytes.
> + *
> + * The user must use this field to allocate size for the ring buffer. In
> + * the case of new blocks being added, an old userspace can always use
> + * this field and ignore any blocks it does not know about.
> + */
> + __u32 sample_size;
I might've asked this question in a previous review, but couldn't user space easily calculate
the sample size as sample_header_size + (?_blocks) * (block_header_size + counters_per_block * 8)?
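A minimal sketch of that calculation, assuming 8-byte counters (as the counters_per_block documentation quoted above states) and that every block carries its own header. `struct perf_info_sketch` and `sketch_sample_size()` are hypothetical names mirroring the quoted fields; this is not the real uAPI header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the relevant drm_panthor_perf_info fields. */
struct perf_info_sketch {
    uint32_t counters_per_block;
    uint32_t sample_header_size;
    uint32_t block_header_size;
    uint32_t fw_blocks;
    uint32_t cshw_blocks;
    uint32_t tiler_blocks;
    uint32_t memsys_blocks;
    uint32_t shader_blocks;
};

/* Size in bytes of one fully annotated sample, assuming 8-byte counters. */
static size_t sketch_sample_size(const struct perf_info_sketch *info)
{
    assert(info != NULL);

    /* One block: its header followed by counters_per_block 64-bit values. */
    const size_t block_size = info->block_header_size +
                              (size_t)info->counters_per_block * sizeof(uint64_t);
    const size_t block_nr = (size_t)info->fw_blocks + info->cshw_blocks +
                            info->tiler_blocks + info->memsys_blocks +
                            info->shader_blocks;

    return info->sample_header_size + block_size * block_nr;
}
```

Such a calculation only stays correct for as long as userspace knows every block type, which is the forward-compatibility case the sample_size field documentation mentions.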
> + /** @flags: Combination of drm_panthor_perf_feat_flags flags. */
> + __u32 flags;
> +
> + /**
> + * @supported_clocks: Bitmask of the clocks supported by the GPU.
> + *
> + * Each bit represents a variant of the enum drm_panthor_perf_clock.
> + *
> + * For the same GPU, different implementers may have different clocks for the same hardware
> + * block. At the moment, up to four clocks are supported, and any clocks that are present
> + * will be reported here.
However, there seem to be just three clocks in the drm_panthor_perf_clock enum definition.
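To make the bitmask semantics concrete, here is a small sketch of how userspace might test the mask. It assumes that bit N of supported_clocks corresponds to enum value N, which the quoted documentation implies but does not state outright. The `sketch_` names are hypothetical and only mirror the three enum values quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the three quoted drm_panthor_perf_clock values. */
enum sketch_perf_clock {
    SKETCH_CLOCK_TOPLEVEL,
    SKETCH_CLOCK_COREGROUP,
    SKETCH_CLOCK_SHADER,
    SKETCH_CLOCK_COUNT, /* sentinel, not part of the quoted enum */
};

/* Assumption: bit N of supported_clocks corresponds to enum value N. */
static bool sketch_clock_supported(uint32_t supported_clocks,
                                   enum sketch_perf_clock clk)
{
    assert(clk < SKETCH_CLOCK_COUNT);
    return (supported_clocks & (UINT32_C(1) << clk)) != 0;
}

/* Count the known clocks reported; bits beyond the enum are ignored. */
static unsigned int sketch_known_clock_count(uint32_t supported_clocks)
{
    unsigned int n = 0;

    for (int clk = 0; clk < SKETCH_CLOCK_COUNT; clk++)
        if (sketch_clock_supported(supported_clocks,
                                   (enum sketch_perf_clock)clk))
            n++;

    return n;
}
```

With only three enum values defined, a fourth bit in the mask would have no name userspace could map it to.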
> + */
> + __u32 supported_clocks;
> +
> + /** @fw_blocks: Number of FW blocks available. */
> + __u32 fw_blocks;
> +
> + /** @cshw_blocks: Number of CSHW blocks available. */
> + __u32 cshw_blocks;
> +
> + /** @tiler_blocks: Number of tiler blocks available. */
> + __u32 tiler_blocks;
> +
> + /** @memsys_blocks: Number of memsys blocks available. */
> + __u32 memsys_blocks;
> +
> + /** @shader_blocks: Number of shader core blocks available. */
> + __u32 shader_blocks;
> +};
> +
> /**
> * struct drm_panthor_dev_query - Arguments passed to DRM_PANTHOR_IOCTL_DEV_QUERY
> */
> @@ -977,6 +1112,434 @@ struct drm_panthor_tiler_heap_destroy {
> __u32 pad;
> };
>
> +/**
> + * DOC: Performance counter decoding in userspace.
> + *
> + * Each sample will be exposed to userspace in the following manner:
> + *
> + * +--------+--------+------------------------+--------+-------------------------+-----+
> + * | Sample | Block | Block | Block | Block | ... |
> + * | header | header | counters | header | counters | |
> + * +--------+--------+------------------------+--------+-------------------------+-----+
> + *
> + * Each sample will start with a sample header of type struct drm_panthor_perf_sample_header,
> + * providing sample-wide information like the start and end timestamps, the counter set currently
> + * configured, and any errors that may have occurred during sampling.
> + *
> + * After the fixed size header, the sample will consist of blocks of
> + * 64-bit @drm_panthor_dev_query_perf_info::counters_per_block counters, each prefaced with a
> + * header of its own, indicating source block type, as well as the cycle count needed to normalize
> + * cycle values within that block, and a clock source identifier.
> + */
> +
> +/**
> + * enum drm_panthor_perf_block_state - Bitmask of the power and execution states that an individual
> + * hardware block went through in a sampling period.
> + *
> + * Because the sampling period is controlled from userspace, the block may undergo multiple
> + * state transitions, so this must be interpreted as one or more such transitions occurring.
> + */
> +enum drm_panthor_perf_block_state {
> + /**
> + * @DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN: The state of this block was unknown during
> + * the sampling period.
> + */
> + DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN = 0,
> +
> + /**
> + * @DRM_PANTHOR_PERF_BLOCK_STATE_ON: This block was powered on for some or all of
> + * the sampling period.
> + */
> + DRM_PANTHOR_PERF_BLOCK_STATE_ON = 1 << 0,
> +
> + /**
> + * @DRM_PANTHOR_PERF_BLOCK_STATE_OFF: This block was powered off for some or all of the
> + * sampling period.
> + */
> + DRM_PANTHOR_PERF_BLOCK_STATE_OFF = 1 << 1,
> +
> + /**
> + * @DRM_PANTHOR_PERF_BLOCK_STATE_AVAILABLE: This block was available for execution for
> + * some or all of the sampling period.
> + */
> + DRM_PANTHOR_PERF_BLOCK_STATE_AVAILABLE = 1 << 2,
> + /**
> + * @DRM_PANTHOR_PERF_BLOCK_STATE_UNAVAILABLE: This block was unavailable for execution for
> + * some or all of the sampling period.
> + */
> + DRM_PANTHOR_PERF_BLOCK_STATE_UNAVAILABLE = 1 << 3,
> +
> + /**
> + * @DRM_PANTHOR_PERF_BLOCK_STATE_NORMAL: This block was executing in normal mode
> + * for some or all of the sampling period.
> + */
> + DRM_PANTHOR_PERF_BLOCK_STATE_NORMAL = 1 << 4,
> +
> + /**
> + * @DRM_PANTHOR_PERF_BLOCK_STATE_PROTECTED: This block was executing in protected mode
> + * for some or all of the sampling period.
> + */
> + DRM_PANTHOR_PERF_BLOCK_STATE_PROTECTED = 1 << 5,
> +};
> +
> +/**
> + * struct drm_panthor_perf_block_header - Header present before every block in the
> + * sample ringbuffer.
> + */
> +struct drm_panthor_perf_block_header {
> + /** @block_type: Type of the block. */
> + __u8 block_type;
> +
> + /** @block_idx: Block index. */
> + __u8 block_idx;
> +
> + /**
> + * @block_states: Coarse-grained block transitions, bitmask of enum
> + * drm_panthor_perf_block_state.
> + */
> + __u8 block_states;
> +
> + /**
> + * @clock: Clock used to produce the cycle count for this block, taken from
> + * enum drm_panthor_perf_clock. The cycle counts are stored in the sample header.
> + */
> + __u8 clock;
> +
> + /** @pad: MBZ. */
> + __u8 pad[4];
> +
> + /** @enable_mask: Bitmask of counters requested during the session setup. */
> + __u64 enable_mask[2];
> +};
> +
> +/**
> + * enum drm_panthor_perf_sample_flags - Sample-wide events that occurred over the sampling
> + * period.
> + */
> +enum drm_panthor_perf_sample_flags {
> + /**
> + * @DRM_PANTHOR_PERF_SAMPLE_OVERFLOW: This sample contains overflows due to the duration
> + * of the sampling period.
> + */
> + DRM_PANTHOR_PERF_SAMPLE_OVERFLOW = 1 << 0,
> +
> + /**
> + * @DRM_PANTHOR_PERF_SAMPLE_ERROR: This sample encountered an error condition during
> + * the sample duration.
> + */
> + DRM_PANTHOR_PERF_SAMPLE_ERROR = 1 << 1,
> +};
> +
> +/**
> + * struct drm_panthor_perf_sample_header - Header present before every sample.
> + */
> +struct drm_panthor_perf_sample_header {
> + /**
> + * @timestamp_start_ns: Earliest timestamp that values in this sample represent, in
> + * nanoseconds. Derived from CLOCK_MONOTONIC_RAW.
> + */
> + __u64 timestamp_start_ns;
> +
> + /**
> + * @timestamp_end_ns: Latest timestamp that values in this sample represent, in
> + * nanoseconds. Derived from CLOCK_MONOTONIC_RAW.
> + */
> + __u64 timestamp_end_ns;
> +
> + /** @block_set: Set of performance counter blocks. */
> + __u8 block_set;
> +
> + /** @pad: MBZ. */
> + __u8 pad[3];
> +
> + /** @flags: Current sample flags, combination of drm_panthor_perf_sample_flags. */
> + __u32 flags;
> +
> + /**
> + * @user_data: User data provided as part of the command that triggered this sample.
> + *
> + * - Automatic samples (periodic ones or those around non-counting periods or power state
> + * transitions) will be tagged with the user_data provided as part of the
> + * DRM_PANTHOR_PERF_COMMAND_START call.
> + * - Manual samples will be tagged with the user_data provided with the
> + * DRM_PANTHOR_PERF_COMMAND_SAMPLE call.
> + * - A session's final automatic sample will be tagged with the user_data provided with the
> + * DRM_PANTHOR_PERF_COMMAND_STOP call.
> + */
> + __u64 user_data;
> +
> + /**
> + * @toplevel_clock_cycles: The number of cycles elapsed between
> + * drm_panthor_perf_sample_header::timestamp_start_ns and
> + * drm_panthor_perf_sample_header::timestamp_end_ns on the top-level clock if the
> + * corresponding bit is set in drm_panthor_perf_info::supported_clocks.
> + */
> + __u64 toplevel_clock_cycles;
> +
> + /**
> + * @coregroup_clock_cycles: The number of cycles elapsed between
> + * drm_panthor_perf_sample_header::timestamp_start_ns and
> + * drm_panthor_perf_sample_header::timestamp_end_ns on the coregroup clock if the
> + * corresponding bit is set in drm_panthor_perf_info::supported_clocks.
> + */
> + __u64 coregroup_clock_cycles;
> +
> + /**
> + * @shader_clock_cycles: The number of cycles elapsed between
> + * drm_panthor_perf_sample_header::timestamp_start_ns and
> + * drm_panthor_perf_sample_header::timestamp_end_ns on the shader core clock if the
> + * corresponding bit is set in drm_panthor_perf_info::supported_clocks.
> + */
> + __u64 shader_clock_cycles;
> +};
> +
> +/**
> + * enum drm_panthor_perf_command - Command type passed to the DRM_PANTHOR_PERF_CONTROL
> + * IOCTL.
> + */
> +enum drm_panthor_perf_command {
> + /** @DRM_PANTHOR_PERF_COMMAND_SETUP: Create a new performance counter sampling context. */
> + DRM_PANTHOR_PERF_COMMAND_SETUP,
> +
> + /** @DRM_PANTHOR_PERF_COMMAND_TEARDOWN: Teardown a performance counter sampling context. */
> + DRM_PANTHOR_PERF_COMMAND_TEARDOWN,
> +
> + /** @DRM_PANTHOR_PERF_COMMAND_START: Start a sampling session on the indicated context. */
> + DRM_PANTHOR_PERF_COMMAND_START,
> +
> + /** @DRM_PANTHOR_PERF_COMMAND_STOP: Stop the sampling session on the indicated context. */
> + DRM_PANTHOR_PERF_COMMAND_STOP,
> +
> + /**
> + * @DRM_PANTHOR_PERF_COMMAND_SAMPLE: Request a manual sample on the indicated context.
> + *
> + * When the sampling session is configured with a non-zero sampling frequency, any
> + * DRM_PANTHOR_PERF_CONTROL calls with this command will be ignored and return an
> + * -EINVAL.
> + */
> + DRM_PANTHOR_PERF_COMMAND_SAMPLE,
> +};
> +
> +/**
> + * struct drm_panthor_perf_control - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL.
> + */
> +struct drm_panthor_perf_control {
> + /** @cmd: Command from enum drm_panthor_perf_command. */
> + __u32 cmd;
> +
> + /**
> + * @handle: session handle.
> + *
> + * Returned by the DRM_PANTHOR_PERF_COMMAND_SETUP call.
> + * It must be used in subsequent commands for the same context.
> + */
> + __u32 handle;
> +
> + /**
> + * @size: size of the command structure.
> + *
> + * If the pointer is NULL, the size is updated by the driver to provide the size of the
> + * output structure. If the pointer is not NULL, the driver will only copy min(size,
> + * struct_size) to the pointer and update the size accordingly.
> + */
> + __u64 size;
> +
> + /**
> + * @pointer: user pointer to a command type struct, such as
> + * @struct drm_panthor_perf_cmd_start.
> + */
> + __u64 pointer;
> +};
> +
> +/**
> + * enum drm_panthor_perf_counter_set - The counter set to be requested from the hardware.
> + *
> + * The hardware supports a single performance counter set at a time, so requesting any set other
> + * than the primary may fail if another process is sampling at the same time.
> + *
> + * If in doubt, the primary counter set has the most commonly used counters and requires no
> + * additional permissions to open.
> + */
> +enum drm_panthor_perf_counter_set {
> + /**
> + * @DRM_PANTHOR_PERF_SET_PRIMARY: The default set configured on the hardware.
> + *
> + * This is the only set for which all counters in all blocks are defined.
> + */
> + DRM_PANTHOR_PERF_SET_PRIMARY,
> +
> + /**
> + * @DRM_PANTHOR_PERF_SET_SECONDARY: The secondary performance counter set.
> + *
> + * Some blocks may not have any defined counters for this set, and the block will
> + * have the UNAVAILABLE block state permanently set in the block header.
> + *
> + * Accessing this set requires the calling process to have the CAP_PERFMON capability.
> + */
> + DRM_PANTHOR_PERF_SET_SECONDARY,
> +
> + /**
> + * @DRM_PANTHOR_PERF_SET_TERTIARY: The tertiary performance counter set.
> + *
> + * Some blocks may not have any defined counters for this set, and the block will have
> + * the UNAVAILABLE block state permanently set in the block header. Note that the
> + * tertiary set has the fewest defined counter blocks.
> + *
> + * Accessing this set requires the calling process to have the CAP_PERFMON capability.
> + */
> + DRM_PANTHOR_PERF_SET_TERTIARY,
> +};
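A minimal sketch of how I read the counter set rules: the primary set needs no capability, the other sets need CAP_PERFMON, and only one set can be active globally. The error codes are the ones listed in the block_set documentation of this patch. The function name, the local enum and the order of the checks are my assumptions, not the driver's code.

```c
#include <errno.h>
#include <stdbool.h>

/* Local stand-in for the uAPI enum, for illustration only. */
enum counter_set { SET_PRIMARY, SET_SECONDARY, SET_TERTIARY, SET_COUNT };

/*
 * Hypothetical check: active_set is -1 when no client has configured a
 * set yet, otherwise the currently active set.
 */
static int check_set_request(unsigned int requested, int active_set,
			     bool has_cap_perfmon)
{
	if (requested >= SET_COUNT)
		return -EINVAL;	/* set does not exist */

	if (requested != SET_PRIMARY && !has_cap_perfmon)
		return -EACCES;	/* extra requirement not satisfied */

	if (active_set >= 0 && (unsigned int)active_set != requested)
		return -EBUSY;	/* another set is already active */

	return 0;
}
```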
> +
> +/**
> + * struct drm_panthor_perf_ringbuf_control - Struct used to map in the ring buffer control indices
> + * into memory shared between user and kernel.
> + *
> + */
> +struct drm_panthor_perf_ringbuf_control {
> + /**
> + * @extract_idx: The index of the latest sample that was processed by userspace. Only
> + * modifiable by userspace.
> + */
> + __u64 extract_idx;
> +
> + /**
> + * @insert_idx: The index of the latest sample emitted by the kernel. Only
> + * modifiable by the kernel.
> + */
> + __u64 insert_idx;
> +};
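To make the single-writer split concrete, here is a rough sketch of what a userspace consumer could do with the two indices. It assumes the indices are free-running counters that are reduced modulo the slot count on use, which is my reading rather than something the comment states; the struct below is a local stand-in, not the uAPI header.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-in mirroring the two documented indices. */
struct ringbuf_control {
	uint64_t extract_idx;	/* written by userspace only */
	uint64_t insert_idx;	/* written by the kernel only */
};

/* Samples produced by the kernel but not yet consumed by userspace. */
static uint64_t samples_pending(const struct ringbuf_control *ctrl)
{
	return ctrl->insert_idx - ctrl->extract_idx;
}

/*
 * Byte offset of the next sample to read. slots must be a power of two,
 * so the modulo reduces to a mask.
 */
static size_t next_sample_offset(const struct ringbuf_control *ctrl,
				 uint32_t slots, size_t sample_size)
{
	return (size_t)(ctrl->extract_idx & (slots - 1)) * sample_size;
}
```

After reading the sample at that offset, the consumer would increment extract_idx by one and leave insert_idx alone.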
> +
> +/**
> + * struct drm_panthor_perf_cmd_setup - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
> + * when the DRM_PANTHOR_PERF_COMMAND_SETUP command is specified.
> + */
> +struct drm_panthor_perf_cmd_setup {
> + /**
> + * @block_set: Set of performance counter blocks, member of
> + * enum drm_panthor_perf_counter_set.
> + *
> + * This is a global configuration and only one set can be active at a time. If
> + * another client has already requested a counter set, any further requests
> + * for a different counter set will fail and return an -EBUSY.
> + *
> + * If the requested set does not exist, the request will fail and return an -EINVAL.
> + *
> + * Some sets have additional requirements to be enabled, and the setup request will
> + * fail with an -EACCES if these requirements are not satisfied.
> + */
> + __u8 block_set;
> +
> + /** @pad: MBZ. */
> + __u8 pad[7];
> +
> + /** @fd: eventfd for signalling the availability of a new sample. */
> + __u32 fd;
> +
> + /** @ringbuf_handle: Handle to the BO to write perf counter sample to. */
> + __u32 ringbuf_handle;
> +
> + /**
> + * @control_handle: Handle to the BO containing a contiguous 16 byte range, used for the
> + * insert and extract indices for the ringbuffer.
> + */
> + __u32 control_handle;
> +
> + /**
> + * @sample_slots: The number of slots available in the userspace-provided BO. Must be
> + * a power of 2.
> + *
> + * If sample_slots * sample_size does not match the BO size, the setup request will fail.
> + */
> + __u32 sample_slots;
> +
> + /**
> + * @control_offset: Offset into the control BO where the insert and extract indices are
> + * located.
> + */
> + __u64 control_offset;
> +
> + /**
> + * @sample_freq_ns: Period between automatic counter sample collection in nanoseconds. Zero
> + * disables automatic collection and all collection must be done through explicit calls
> + * to DRM_PANTHOR_PERF_CONTROL.SAMPLE. Non-zero values will disable manual counter sampling
> + * via the DRM_PANTHOR_PERF_COMMAND_SAMPLE command.
> + *
> + * A value of zero disables software-triggered periodic sampling, but hardware will still trigger
> + * automatic samples on certain events, including shader core power transitions, and
> + * entries to and exits from non-counting periods. The final stop command will also
> + * trigger a sample to ensure no data is lost.
> + */
> + __u64 sample_freq_ns;
> +
> + /**
> + * @fw_enable_mask: Bitmask of counters to request from the FW counter block. Any bits
> + * past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit 0
> + * corresponds to counter 0.
> + */
> + __u64 fw_enable_mask[2];
> +
> + /**
> + * @cshw_enable_mask: Bitmask of counters to request from the CSHW counter block. Any bits
> + * past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit 0
> + * corresponds to counter 0.
> + */
> + __u64 cshw_enable_mask[2];
> +
> + /**
> + * @tiler_enable_mask: Bitmask of counters to request from the tiler counter block. Any
> + * bits past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit
> + * 0 corresponds to counter 0.
> + */
> + __u64 tiler_enable_mask[2];
> +
> + /**
> + * @memsys_enable_mask: Bitmask of counters to request from the memsys counter blocks. Any
> + * bits past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit 0
> + * corresponds to counter 0.
> + */
> + __u64 memsys_enable_mask[2];
> +
> + /**
> + * @shader_enable_mask: Bitmask of counters to request from the shader core counter blocks.
> + * Any bits past the first drm_panthor_perf_info.counters_per_block bits will be ignored.
> + * Bit 0 corresponds to counter 0.
> + */
> + __u64 shader_enable_mask[2];
> +};
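As an illustration of how the setup arguments fit together, here is a small sketch of filling an enable mask and sanity-checking the ring buffer geometry. Both helpers are hypothetical. The word layout (counters 0-63 in element 0, 64-127 in element 1) is an assumption on my side, based on the masks being handed to bitmap_from_arr64() later in the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EM_WORDS 2	/* matches the __u64 ..._enable_mask[2] arrays */

/* Request counter @counter in a two-word enable mask; bit 0 is counter 0. */
static void enable_counter(uint64_t mask[EM_WORDS], unsigned int counter)
{
	assert(counter < EM_WORDS * 64);
	mask[counter / 64] |= UINT64_C(1) << (counter % 64);
}

/*
 * Mirror of the documented constraints on @sample_slots: it must be a
 * power of two, and sample_slots * sample_size must match the BO size.
 */
static bool ringbuf_args_valid(uint32_t sample_slots, uint64_t sample_size,
			       uint64_t bo_size)
{
	bool pow2 = sample_slots != 0 &&
		    (sample_slots & (sample_slots - 1)) == 0;

	return pow2 && (uint64_t)sample_slots * sample_size == bo_size;
}
```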
> +
> +/**
> + * struct drm_panthor_perf_cmd_start - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
> + * when the DRM_PANTHOR_PERF_COMMAND_START command is specified.
> + */
> +struct drm_panthor_perf_cmd_start {
> + /**
> + * @user_data: User provided data that will be attached to automatic samples collected
> + * until the next DRM_PANTHOR_PERF_COMMAND_STOP.
> + */
> + __u64 user_data;
> +};
> +
> +/**
> + * struct drm_panthor_perf_cmd_stop - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
> + * when the DRM_PANTHOR_PERF_COMMAND_STOP command is specified.
> + */
> +struct drm_panthor_perf_cmd_stop {
> + /**
> + * @user_data: User provided data that will be attached to the automatic sample collected
> + * at the end of this sampling session.
> + */
> + __u64 user_data;
> +};
> +
> +/**
> + * struct drm_panthor_perf_cmd_sample - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
> + * when the DRM_PANTHOR_PERF_COMMAND_SAMPLE command is specified.
> + */
> +struct drm_panthor_perf_cmd_sample {
> + /** @user_data: User provided data that will be attached to the sample. */
> + __u64 user_data;
> +};
> +
> /**
> * DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
> * @__access: Access type. Must be R, W or RW.
> @@ -1019,6 +1582,8 @@ enum {
> DRM_IOCTL_PANTHOR(WR, TILER_HEAP_CREATE, tiler_heap_create),
> DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY =
> DRM_IOCTL_PANTHOR(WR, TILER_HEAP_DESTROY, tiler_heap_destroy),
> + DRM_IOCTL_PANTHOR_PERF_CONTROL =
> + DRM_IOCTL_PANTHOR(WR, PERF_CONTROL, perf_control)
> };
>
> #if defined(__cplusplus)
> --
> 2.33.0.dirty
Adrian Larumbe
* Re: [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10
2025-05-16 15:49 ` [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10 Lukas Zapolskas
@ 2025-07-18 2:52 ` Adrián Larumbe
2025-07-21 9:04 ` Lukas Zapolskas
2025-07-18 15:11 ` Adrián Larumbe
1 sibling, 1 reply; 29+ messages in thread
From: Adrián Larumbe @ 2025-07-18 2:52 UTC (permalink / raw)
To: Lukas Zapolskas
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 16.05.2025 16:49, Lukas Zapolskas wrote:
> This change adds the IOCTL to query data about the performance counter
> setup. Some of this data was available via previous DEV_QUERY calls,
> for instance for GPU info, but exposing it via PERF_INFO
> minimizes the overhead of creating a single session to just the one
> aggregate IOCTL.
>
> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
> drivers/gpu/drm/panthor/Makefile | 1 +
> drivers/gpu/drm/panthor/panthor_device.c | 5 ++
> drivers/gpu/drm/panthor/panthor_device.h | 3 +
> drivers/gpu/drm/panthor/panthor_drv.c | 10 +++-
> drivers/gpu/drm/panthor/panthor_fw.h | 3 +
> drivers/gpu/drm/panthor/panthor_perf.c | 76 ++++++++++++++++++++++++
> drivers/gpu/drm/panthor/panthor_perf.h | 15 +++++
> drivers/gpu/drm/panthor/panthor_regs.h | 1 +
> 8 files changed, 113 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/panthor/panthor_perf.c
> create mode 100644 drivers/gpu/drm/panthor/panthor_perf.h
>
> diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile
> index 15294719b09c..0df9947f3575 100644
> --- a/drivers/gpu/drm/panthor/Makefile
> +++ b/drivers/gpu/drm/panthor/Makefile
> @@ -9,6 +9,7 @@ panthor-y := \
> panthor_gpu.o \
> panthor_heap.o \
> panthor_mmu.o \
> + panthor_perf.o \
> panthor_sched.o
>
> obj-$(CONFIG_DRM_PANTHOR) += panthor.o
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index a9da1d1eeb70..76b4cf3dc391 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -19,6 +19,7 @@
> #include "panthor_fw.h"
> #include "panthor_gpu.h"
> #include "panthor_mmu.h"
> +#include "panthor_perf.h"
> #include "panthor_regs.h"
> #include "panthor_sched.h"
>
> @@ -259,6 +260,10 @@ int panthor_device_init(struct panthor_device *ptdev)
> if (ret)
> goto err_unplug_fw;
>
> + ret = panthor_perf_init(ptdev);
> + if (ret)
> + goto err_unplug_fw;
goto err_unplug_sched;
[...]
err_disable_autosuspend:
pm_runtime_dont_use_autosuspend(ptdev->base.dev);
err_unplug_sched:
panthor_sched_unplug(ptdev);
[...]
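To spell out the unwinding order I have in mind, here is a standalone sketch with stub state instead of the real panthor calls. All names are made up for illustration; the point is only that a step failing after the scheduler came up has to jump to the label that also tears the scheduler down.

```c
/* Stub state tracking which fake subsystems are currently up. */
enum { FW_UP = 1 << 0, SCHED_UP = 1 << 1 };
static unsigned int live;

/*
 * Hypothetical init sequence: fw, then sched, then perf. The return
 * values of the sched and perf steps are injected by the caller.
 */
static int fake_device_init(int sched_ret, int perf_ret)
{
	int ret;

	live = FW_UP;		/* firmware came up */

	ret = sched_ret;
	if (ret)
		goto err_unplug_fw;
	live |= SCHED_UP;	/* scheduler came up */

	ret = perf_ret;
	if (ret)
		goto err_unplug_sched;	/* sched must be undone too */

	return 0;

err_unplug_sched:
	live &= ~SCHED_UP;
err_unplug_fw:
	live &= ~FW_UP;
	return ret;
}
```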
> +
> /* ~3 frames */
> pm_runtime_set_autosuspend_delay(ptdev->base.dev, 50);
> pm_runtime_use_autosuspend(ptdev->base.dev);
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index da6574021664..657ccc39568c 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -120,6 +120,9 @@ struct panthor_device {
> /** @csif_info: Command stream interface information. */
> struct drm_panthor_csif_info csif_info;
>
> + /** @perf_info: Performance counter interface information. */
> + struct drm_panthor_perf_info perf_info;
> +
> /** @gpu: GPU management data. */
> struct panthor_gpu *gpu;
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 06fe46e32073..9d2b716cca45 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -175,7 +175,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
> PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
> - PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
> + PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks))
>
> /**
> * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
> @@ -835,6 +836,10 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
> args->size = sizeof(priorities_info);
> return 0;
>
> + case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
> + args->size = sizeof(ptdev->perf_info);
> + return 0;
> +
> default:
> return -EINVAL;
> }
> @@ -859,6 +864,9 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
> panthor_query_group_priorities_info(file, &priorities_info);
> return PANTHOR_UOBJ_SET(args->pointer, args->size, priorities_info);
>
> + case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
> + return PANTHOR_UOBJ_SET(args->pointer, args->size, ptdev->perf_info);
> +
> default:
> return -EINVAL;
> }
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
> index 6598d96c6d2a..8bcb933fa790 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.h
> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
> @@ -197,8 +197,11 @@ struct panthor_fw_global_control_iface {
> u32 output_va;
> u32 group_num;
> u32 group_stride;
> +#define GLB_PERFCNT_FW_SIZE(x) ((((x) >> 16) << 8))
> u32 perfcnt_size;
> u32 instr_features;
> +#define PERFCNT_FEATURES_MD_SIZE(x) (((x) & GENMASK(3, 0)) << 8)
What does MD stand for here?
> + u32 perfcnt_features;
> };
>
> struct panthor_fw_global_input_iface {
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
> new file mode 100644
> index 000000000000..66e9a197ac1f
> --- /dev/null
> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0 or MIT
> +/* Copyright 2023 Collabora Ltd */
> +/* Copyright 2025 Arm ltd. */
> +
> +#include <linux/bitops.h>
> +#include <drm/panthor_drm.h>
> +
> +#include "panthor_device.h"
> +#include "panthor_fw.h"
> +#include "panthor_perf.h"
> +
> +struct panthor_perf_counter_block {
> + struct drm_panthor_perf_block_header header;
> + u64 counters[];
> +};
> +
> +static size_t get_annotated_block_size(u32 counters_per_block)
> +{
> + return struct_size_t(struct panthor_perf_counter_block, counters, counters_per_block);
> +}
> +
> +static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
> +{
> + const size_t block_size = get_annotated_block_size(info->counters_per_block);
> + const size_t block_nr = info->cshw_blocks + info->fw_blocks +
> + info->tiler_blocks + info->memsys_blocks + info->shader_blocks;
> +
> + return sizeof(struct drm_panthor_perf_sample_header) + (block_size * block_nr);
> +}
You're assigning perf_info->counters_per_block the same sizeof() slightly further below
so maybe you can use that value here straight away.
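For what it's worth, the size computation quoted above boils down to the arithmetic below. This is a userspace sketch with the header sizes passed in as parameters; the numbers one would plug in are placeholders, since the real values come from the sizeof() of the uAPI structs.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * One annotated block is a block header followed by counters_per_block
 * 64-bit counters; a sample is a sample header followed by block_nr
 * such blocks.
 */
static size_t user_sample_size(size_t sample_hdr_size, size_t block_hdr_size,
			       uint32_t counters_per_block, uint32_t block_nr)
{
	size_t block_size = block_hdr_size +
			    (size_t)counters_per_block * sizeof(uint64_t);

	return sample_hdr_size + block_size * block_nr;
}
```

With placeholder header sizes of 32 and 16 bytes, 64 counters per block and 5 blocks, this gives 32 + (16 + 512) * 5 = 2672 bytes.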
> +
> +/**
> + * PANTHOR_PERF_COUNTERS_PER_BLOCK - On CSF architectures pre-11.x, the number of counters
> + * per block was hardcoded to be 64. Arch 11.0 onwards supports the PRFCNT_FEATURES GPU register,
> + * which indicates the same information.
> + */
I guess you're waiting for the commit in ML message <20250320111741.1937892-7-karunika.choo@arm.com>
("drm/panthor: Add support for Mali-G715 family of GPUs") to check whether GPU_ARCH_MAJOR(ptdev->gpu_info.gpu_id)
returns a value equal to or above 11, so that you can add support for reading the number of counters from PRFCNT_FEATURES?
I don't remember whether that series is already merged, but it'd be nice to have it in this one too.
> +#define PANTHOR_PERF_COUNTERS_PER_BLOCK (64)
> +
> +static void panthor_perf_info_init(struct panthor_device *ptdev)
> +{
> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
> + struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
> +
> + if (PERFCNT_FEATURES_MD_SIZE(glb_iface->control->perfcnt_features))
> + perf_info->flags |= DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT;
> +
> + perf_info->counters_per_block = PANTHOR_PERF_COUNTERS_PER_BLOCK;
> +
> + perf_info->sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
> + perf_info->block_header_size = sizeof(struct drm_panthor_perf_block_header);
> +
> + if (GLB_PERFCNT_FW_SIZE(glb_iface->control->perfcnt_size))
> + perf_info->fw_blocks = 1;
> +
> + perf_info->cshw_blocks = 1;
> + perf_info->tiler_blocks = 1;
> + perf_info->memsys_blocks = GPU_MEM_FEATURES_L2_SLICES(ptdev->gpu_info.mem_features);
> + perf_info->shader_blocks = hweight64(ptdev->gpu_info.shader_present);
> +
> + perf_info->sample_size = session_get_user_sample_size(perf_info);
> +}
> +
> +/**
> + * panthor_perf_init - Initialize the performance counter subsystem.
> + * @ptdev: Panthor device
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int panthor_perf_init(struct panthor_device *ptdev)
> +{
> + if (!ptdev)
> + return -EINVAL;
> +
> + panthor_perf_info_init(ptdev);
> +
> + return 0;
> +}
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
> new file mode 100644
> index 000000000000..3c32c24c164c
> --- /dev/null
> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 or MIT */
> +/* Copyright 2025 Collabora Ltd */
> +/* Copyright 2025 Arm ltd. */
> +
> +#ifndef __PANTHOR_PERF_H__
> +#define __PANTHOR_PERF_H__
> +
> +#include <linux/types.h>
> +
> +struct panthor_device;
> +
> +int panthor_perf_init(struct panthor_device *ptdev);
> +
> +#endif /* __PANTHOR_PERF_H__ */
> +
> diff --git a/drivers/gpu/drm/panthor/panthor_regs.h b/drivers/gpu/drm/panthor/panthor_regs.h
> index b7b3b3add166..d9e9379d1a20 100644
> --- a/drivers/gpu/drm/panthor/panthor_regs.h
> +++ b/drivers/gpu/drm/panthor/panthor_regs.h
> @@ -27,6 +27,7 @@
> #define GPU_TILER_FEATURES 0xC
> #define GPU_MEM_FEATURES 0x10
> #define GROUPS_L2_COHERENT BIT(0)
> +#define GPU_MEM_FEATURES_L2_SLICES(x) ((((x) & GENMASK(11, 8)) >> 8) + 1)
>
> #define GPU_MMU_FEATURES 0x14
> #define GPU_MMU_FEATURES_VA_BITS(x) ((x) & GENMASK(7, 0))
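A quick userspace check of the field extraction done by the new macros in this patch. GENMASK32() below is my own stand-in for the kernel's GENMASK(), and the wrapper function names are made up; the shifts and masks are copied from the patch.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK() on 32-bit values. */
#define GENMASK32(h, l) \
	((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

/* Bits 11:8 of MEM_FEATURES hold the number of L2 slices minus one. */
static uint32_t l2_slices(uint32_t mem_features)
{
	return ((mem_features & GENMASK32(11, 8)) >> 8) + 1;
}

/* Upper 16 bits of perfcnt_size, scaled by 256. */
static uint32_t perfcnt_fw_size(uint32_t perfcnt_size)
{
	return (perfcnt_size >> 16) << 8;
}

/* Bits 3:0 of perfcnt_features, scaled by 256. */
static uint32_t perfcnt_md_size(uint32_t perfcnt_features)
{
	return (perfcnt_features & GENMASK32(3, 0)) << 8;
}
```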
> --
> 2.33.0.dirty
Adrian Larumbe
* Re: [PATCH v4 3/7] drm/panthor: Add panthor perf initialization and termination
2025-05-16 15:49 ` [PATCH v4 3/7] drm/panthor: Add panthor perf initialization and termination Lukas Zapolskas
@ 2025-07-18 3:10 ` Adrián Larumbe
2025-07-21 9:10 ` Lukas Zapolskas
0 siblings, 1 reply; 29+ messages in thread
From: Adrián Larumbe @ 2025-07-18 3:10 UTC (permalink / raw)
To: Lukas Zapolskas
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 16.05.2025 16:49, Lukas Zapolskas wrote:
> Added the panthor_perf system initialization and unplug code to allow
> for the handling of userspace sessions to be added in follow-up
> patches.
>
> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> ---
> drivers/gpu/drm/panthor/panthor_device.c | 2 +
> drivers/gpu/drm/panthor/panthor_device.h | 5 +-
> drivers/gpu/drm/panthor/panthor_perf.c | 62 +++++++++++++++++++++++-
> drivers/gpu/drm/panthor/panthor_perf.h | 1 +
> 4 files changed, 68 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index 76b4cf3dc391..7ac985d44655 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -98,6 +98,7 @@ void panthor_device_unplug(struct panthor_device *ptdev)
> /* Now, try to cleanly shutdown the GPU before the device resources
> * get reclaimed.
> */
> + panthor_perf_unplug(ptdev);
> panthor_sched_unplug(ptdev);
> panthor_fw_unplug(ptdev);
> panthor_mmu_unplug(ptdev);
> @@ -277,6 +278,7 @@ int panthor_device_init(struct panthor_device *ptdev)
>
> err_disable_autosuspend:
> pm_runtime_dont_use_autosuspend(ptdev->base.dev);
> + panthor_perf_unplug(ptdev);
> panthor_sched_unplug(ptdev);
>
> err_unplug_fw:
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 657ccc39568c..818c4d96d448 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -27,7 +27,7 @@ struct panthor_heap_pool;
> struct panthor_job;
> struct panthor_mmu;
> struct panthor_fw;
> -struct panthor_perfcnt;
> +struct panthor_perf;
> struct panthor_vm;
> struct panthor_vm_pool;
>
> @@ -138,6 +138,9 @@ struct panthor_device {
> /** @devfreq: Device frequency scaling management data. */
> struct panthor_devfreq *devfreq;
>
> + /** @perf: Performance counter management data. */
> + struct panthor_perf *perf;
> +
> /** @unplug: Device unplug related fields. */
> struct {
> /** @lock: Lock used to serialize unplug operations. */
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
> index 66e9a197ac1f..9365ce9fed04 100644
> --- a/drivers/gpu/drm/panthor/panthor_perf.c
> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
> @@ -9,6 +9,19 @@
> #include "panthor_fw.h"
> #include "panthor_perf.h"
You must include "panthor_regs.h" here or else GPU_MEM_FEATURES_L2_SLICES() won't be available.
However, it seems this is something that should be done in the previous patch.
>
> +struct panthor_perf {
> + /** @next_session: The ID of the next session. */
> + u32 next_session;
> +
> + /** @session_range: The number of sessions supported at a time. */
> + struct xa_limit session_range;
> +
> + /**
> + * @sessions: Global map of sessions, accessed by their ID.
> + */
> + struct xarray sessions;
> +};
> +
> struct panthor_perf_counter_block {
> struct drm_panthor_perf_block_header header;
> u64 counters[];
> @@ -63,14 +76,61 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
> * panthor_perf_init - Initialize the performance counter subsystem.
> * @ptdev: Panthor device
> *
> + * The performance counters require the FW interface to be available to setup the
> + * sampling ringbuffers, so this must be called only after FW is initialized.
> + *
> * Return: 0 on success, negative error code on failure.
> */
> int panthor_perf_init(struct panthor_device *ptdev)
> {
> + struct panthor_perf *perf __free(kfree) = NULL;
> + int ret = 0;
> +
> if (!ptdev)
> return -EINVAL;
>
> panthor_perf_info_init(ptdev);
>
> - return 0;
> + perf = kzalloc(sizeof(*perf), GFP_KERNEL);
> + if (ZERO_OR_NULL_PTR(perf))
> + return -ENOMEM;
> +
> + xa_init_flags(&perf->sessions, XA_FLAGS_ALLOC);
> +
> + perf->session_range = (struct xa_limit) {
> + .min = 0,
> + .max = 1,
> + };
> +
> + drm_info(&ptdev->base, "Performance counter subsystem initialized");
> +
> + ptdev->perf = no_free_ptr(perf);
> +
> + return ret;
> +}
> +
> +/**
> + * panthor_perf_unplug - Terminate the performance counter subsystem.
> + * @ptdev: Panthor device.
> + *
> + * This function will terminate the performance counter control structures and any remaining
> + * sessions, after waiting for any pending interrupts.
> + */
> +void panthor_perf_unplug(struct panthor_device *ptdev)
> +{
> + struct panthor_perf *perf = ptdev->perf;
> +
> + if (!perf)
> + return;
> +
> + if (!xa_empty(&perf->sessions)) {
> + drm_err(&ptdev->base,
> + "Performance counter sessions active when unplugging the driver!");
> + }
I think this could only happen if someone forces a module unload while
there are still processes which haven't yet closed the DRM file?
> +
> + xa_destroy(&perf->sessions);
> +
> + kfree(ptdev->perf);
> +
> + ptdev->perf = NULL;
> }
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
> index 3c32c24c164c..e4805727b9e7 100644
> --- a/drivers/gpu/drm/panthor/panthor_perf.h
> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
> @@ -10,6 +10,7 @@
> struct panthor_device;
>
> int panthor_perf_init(struct panthor_device *ptdev);
> +void panthor_perf_unplug(struct panthor_device *ptdev);
>
> #endif /* __PANTHOR_PERF_H__ */
>
> --
> 2.33.0.dirty
Adrian Larumbe
* Re: [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients
2025-05-16 15:49 ` [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients Lukas Zapolskas
2025-05-17 7:53 ` kernel test robot
2025-06-20 15:28 ` Steven Price
@ 2025-07-18 3:34 ` Adrián Larumbe
2025-07-21 9:53 ` Lukas Zapolskas
2 siblings, 1 reply; 29+ messages in thread
From: Adrián Larumbe @ 2025-07-18 3:34 UTC (permalink / raw)
To: Lukas Zapolskas
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 16.05.2025 16:49, Lukas Zapolskas wrote:
> To allow for combining the requests from multiple userspace clients, an
> intermediary layer between the HW/FW interfaces and userspace is
> created, containing the information for the counter requests and
> tracking of insert and extract indices. Each session starts inactive and
> must be explicitly activated via PERF_CONTROL.START, and explicitly
> stopped via PERF_CONTROL.STOP. Userspace identifies a single client with
> its session ID and the panthor file it is associated with.
>
> The SAMPLE and STOP commands both produce a single sample when called,
> and these samples can be disambiguated via the opaque user data field
> passed in the PERF_CONTROL uAPI. If this functionality is not desired,
> these fields can be kept as zero, as the kernel copies this value into
> the corresponding sample without attempting to interpret it.
>
> Currently, only manual sampling sessions are supported, providing
> samples when userspace calls PERF_CONTROL.SAMPLE, and only a single
> session is allowed at a time. Multiple sessions and periodic sampling
> will be enabled in following patches.
>
> No protection is provided against the 32-bit hardware counter overflows,
> so for the moment it is up to userspace to ensure that the counters are
> sampled at a reasonable frequency.
>
> The counter set enum is added to the uapi to clarify the restrictions on
> calling the interface.
>
> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> ---
> drivers/gpu/drm/panthor/panthor_device.h | 3 +
> drivers/gpu/drm/panthor/panthor_drv.c | 1 +
> drivers/gpu/drm/panthor/panthor_perf.c | 694 ++++++++++++++++++++++-
> drivers/gpu/drm/panthor/panthor_perf.h | 16 +
> 4 files changed, 713 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index 818c4d96d448..3fa0882fe81b 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -225,6 +225,9 @@ struct panthor_file {
> /** @ptdev: Device attached to this file. */
> struct panthor_device *ptdev;
>
> + /** @drm_file: Corresponding drm_file */
> + struct drm_file *drm_file;
I'm sceptical about adding this here, and suspect we don't need it. I mentioned why in the
review for the next patch.
> +
> /** @vms: VM pool attached to this file. */
> struct panthor_vm_pool *vms;
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 9d2b716cca45..4c1381320859 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -1356,6 +1356,7 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
> }
>
> pfile->ptdev = ptdev;
> + pfile->drm_file = file;
>
> ret = panthor_vm_pool_create(pfile);
> if (ret)
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
> index 9365ce9fed04..15fa533731f3 100644
> --- a/drivers/gpu/drm/panthor/panthor_perf.c
> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
> @@ -2,13 +2,177 @@
> /* Copyright 2023 Collabora Ltd */
> /* Copyright 2025 Arm ltd. */
>
> -#include <linux/bitops.h>
> +#include <drm/drm_gem.h>
> #include <drm/panthor_drm.h>
> +#include <linux/bitops.h>
> +#include <linux/circ_buf.h>
>
> #include "panthor_device.h"
> #include "panthor_fw.h"
> #include "panthor_perf.h"
>
> +/**
> + * PANTHOR_PERF_EM_BITS - Number of bits in a user-facing enable mask. This must correspond
> + * to the maximum number of counters available for selection on the newest
> + * Mali GPUs (128 as of the Mali-Gx15).
> + */
> +#define PANTHOR_PERF_EM_BITS (BITS_PER_TYPE(u64) * 2)
> +
> +enum panthor_perf_session_state {
> + /** @PANTHOR_PERF_SESSION_ACTIVE: The session is active and can be used for sampling. */
> + PANTHOR_PERF_SESSION_ACTIVE = 0,
> +
> + /**
> + * @PANTHOR_PERF_SESSION_OVERFLOW: The session encountered an overflow in one of the
> + * counters during the last sampling period. This flag
> + * gets propagated as part of samples emitted for this
> + * session, to ensure the userspace client can gracefully
> + * handle this data corruption.
> + */
> + PANTHOR_PERF_SESSION_OVERFLOW,
> +
> + /* Must be last */
> + PANTHOR_PERF_SESSION_MAX,
> +};
> +
> +struct panthor_perf_enable_masks {
> + /**
> + * @mask: Array of bitmasks indicating the counters userspace requested, where
> + * one bit represents a single counter. Used to build the firmware configuration
> + * and ensure that userspace clients obtain only the counters they requested.
> + */
> + unsigned long mask[DRM_PANTHOR_PERF_BLOCK_MAX][BITS_TO_LONGS(PANTHOR_PERF_EM_BITS)];
> +};
> +
> +struct panthor_perf_counter_block {
> + struct drm_panthor_perf_block_header header;
> + u64 counters[];
> +};
This is a redefinition.
> +/**
> + * enum session_sample_type - Enum of the types of samples a session can request.
> + */
> +enum session_sample_type {
> + /** @SAMPLE_TYPE_NONE: A sample has not been requested by this session. */
> + SAMPLE_TYPE_NONE,
> +
> + /** @SAMPLE_TYPE_INITIAL: An initial sample has been requested by this session. */
> + SAMPLE_TYPE_INITIAL,
> +
> + /** @SAMPLE_TYPE_REGULAR: A regular sample has been requested by this session. */
> + SAMPLE_TYPE_REGULAR,
> +};
> +
> +struct panthor_perf_session {
> + DECLARE_BITMAP(state, PANTHOR_PERF_SESSION_MAX);
> +
> + /**
> + * @pending_sample_request: The type of sample request that is currently pending:
> + * - when a sample is not requested, the data should be accumulated
> + * into the next slot of its ring buffer, but the extract index
> + * should not be updated, and the user-space session must
> + * not be signaled.
> + * - when an initial sample is requested, the data must not be
> + * emitted into the target ring buffer and the userspace client
> + * must not be notified.
> + * - when a regular sample is requested, the data must be emitted
> + * into the target ring buffer, and the userspace client must
> + * be signalled.
> + */
> + enum session_sample_type pending_sample_request;
> +
> + /**
> + * @user_sample_size: The size of a single sample as exposed to userspace. For the sake of
> + * simplicity, the current implementation exposes the same structure
> + * as provided by firmware, after annotating the sample and the blocks,
> + * and zero-extending the counters themselves (to account for in-kernel
> + * accumulation).
> + *
> + * This may also allow further memory-optimizations of compressing the
> + * sample to provide only requested blocks, if deemed to be worth the
> + * additional complexity.
> + */
> + size_t user_sample_size;
> +
> + /**
> + * @accum_idx: The last insert index indicates whether the current sample
> + * needs zeroing before accumulation. This is used to disambiguate
> + * between accumulating into an intermediate slot in the user ring buffer
> + * and zero-ing the buffer before copying data over.
> + */
> + u32 accum_idx;
> +
> + /**
> + * @sample_freq_ns: Period between subsequent sample requests. Zero indicates that
> + * userspace will be responsible for requesting samples.
> + */
> + u64 sample_freq_ns;
> +
> + /** @sample_start_ns: Sample request time, obtained from a monotonic raw clock. */
> + u64 sample_start_ns;
> +
> + /**
> + * @user_data: Opaque handle passed in when starting a session, requesting a sample (for
> + * manual sampling sessions only) and when stopping a session. This handle
> + * allows the disambiguation of a sample in the ringbuffer.
> + */
> + u64 user_data;
> +
> + /**
> + * @eventfd: Event file descriptor context used to signal userspace of a new sample
> + * being emitted.
> + */
> + struct eventfd_ctx *eventfd;
> +
> + /**
> + * @enabled_counters: This session's requested counters. Note that these cannot change
> + * for the lifetime of the session.
> + */
> + struct panthor_perf_enable_masks *enabled_counters;
> +
> + /** @ringbuf_slots: Slots in the user-facing ringbuffer. */
> + size_t ringbuf_slots;
> +
> + /** @ring_buf: BO for the userspace ringbuffer. */
> + struct drm_gem_object *ring_buf;
> +
> + /**
> + * @control_buf: BO for the insert and extract indices.
> + */
> + struct drm_gem_object *control_buf;
> +
> + /** @control: The mapped insert and extract indices. */
> + struct drm_panthor_perf_ringbuf_control *control;
> +
> + /** @samples: The mapping of the @ring_buf into the kernel's VA space. */
> + u8 *samples;
> +
> + /**
> + * @pending: The list node used by the sampler to track the sessions that have not yet
> + * received a sample.
> + */
> + struct list_head pending;
> +
> + /**
> + * @sessions: The list node used by the sampler to track the sessions waiting for a sample.
> + */
> + struct list_head sessions;
> +
> + /**
> + * @pfile: The panthor file which was used to create a session, used for the postclose
> + * handling and to prevent a misconfigured userspace from closing unrelated
> + * sessions.
> + */
> + struct panthor_file *pfile;
> +
> + /**
> + * @ref: Session reference count. The sample delivery to userspace is asynchronous, meaning
> + * the lifetime of the session must extend at least until the sample is exposed to
> + * userspace.
> + */
> + struct kref ref;
> +};
> +
> struct panthor_perf {
> /** @next_session: The ID of the next session. */
> u32 next_session;
> @@ -72,6 +236,122 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
> perf_info->sample_size = session_get_user_sample_size(perf_info);
> }
>
> +static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panthor_perf_cmd_setup
> + *setup_args)
> +{
> + struct panthor_perf_enable_masks *em = kmalloc(sizeof(*em), GFP_KERNEL);
> + if (IS_ERR_OR_NULL(em))
> + return em;
> +
> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_FW],
> + setup_args->fw_enable_mask, PANTHOR_PERF_EM_BITS);
> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_CSHW],
> + setup_args->cshw_enable_mask, PANTHOR_PERF_EM_BITS);
> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_TILER],
> + setup_args->tiler_enable_mask, PANTHOR_PERF_EM_BITS);
> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_MEMSYS],
> + setup_args->memsys_enable_mask, PANTHOR_PERF_EM_BITS);
> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_SHADER],
> + setup_args->shader_enable_mask, PANTHOR_PERF_EM_BITS);
> +
> + return em;
> +}
> +
> +static u64 session_read_extract_idx(struct panthor_perf_session *session)
> +{
> + const u64 slots = session->ringbuf_slots;
> +
> + /* Userspace will update their own extract index to indicate that a sample is consumed
> + * from the ringbuffer, and we must ensure we read the latest value.
> + */
> + return smp_load_acquire(&session->control->extract_idx) % slots;
> +}
> +
> +static u64 session_read_insert_idx(struct panthor_perf_session *session)
> +{
> + const u64 slots = session->ringbuf_slots;
> +
> + /*
> + * Userspace is able to write to the insert index, since it is mapped
> + * on the same page as the extract index. This should not happen
> + * in regular operation.
Why would userspace be able to write to the insert index? In a
ringbuffer setup I would expect UM to update only the extract index when
it consumes a sample, and the kernel to advance the insert index when it
writes a new sample into the user-facing ringbuffer.
> + */
> + return smp_load_acquire(&session->control->insert_idx) % slots;
> +}
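For illustration, the split of index ownership discussed above can be sketched in plain user-space C. This is a minimal sketch and not code from the patch: `struct ring_ctrl`, `ring_produce()` and `ring_consume()` are hypothetical names, the indices are free-running rather than reduced modulo the slot count as the patch does on read, and the acquire/release ordering a real shared mapping needs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared control page; not the uAPI struct. */
struct ring_ctrl {
	uint64_t insert_idx;  /* advanced only by the producer (kernel side) */
	uint64_t extract_idx; /* advanced only by the consumer (userspace) */
};

/* Number of filled slots; unsigned subtraction handles index wrap-around. */
static uint64_t ring_count(const struct ring_ctrl *c)
{
	return c->insert_idx - c->extract_idx;
}

/* Producer: claim the next slot, or return false when the ring is full. */
static bool ring_produce(struct ring_ctrl *c, uint64_t slots, uint64_t *slot)
{
	assert(slots && (slots & (slots - 1)) == 0); /* power of two */

	if (ring_count(c) >= slots)
		return false;

	*slot = c->insert_idx % slots;
	c->insert_idx++;
	return true;
}

/* Consumer: take the oldest slot, or return false when the ring is empty. */
static bool ring_consume(struct ring_ctrl *c, uint64_t slots, uint64_t *slot)
{
	assert(slots && (slots & (slots - 1)) == 0); /* power of two */

	if (ring_count(c) == 0)
		return false;

	*slot = c->extract_idx % slots;
	c->extract_idx++;
	return true;
}
```

Each side writes exactly one index and only reads the other, which is why a producer that shares a writable page with the consumer has to treat the value it reads back as untrusted.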
> +
> +static void session_get(struct panthor_perf_session *session)
> +{
> + kref_get(&session->ref);
> +}
> +
> +static void session_free(struct kref *ref)
> +{
> + struct panthor_perf_session *session = container_of(ref, typeof(*session), ref);
> +
> + if (session->samples && session->ring_buf) {
> + struct iosys_map map = IOSYS_MAP_INIT_VADDR(session->samples);
> +
> + drm_gem_vunmap_unlocked(session->ring_buf, &map);
drm_gem_vunmap_unlocked() isn't declared in drm_gem.h when I rebase the
patch series onto drm-misc. I guess that means either this series is based
on an older WIP branch, or the function name is misspelt?
> + drm_gem_object_put(session->ring_buf);
> + }
> +
> + if (session->control && session->control_buf) {
> + struct iosys_map map = IOSYS_MAP_INIT_VADDR(session->control);
> +
> + drm_gem_vunmap_unlocked(session->control_buf, &map);
> + drm_gem_object_put(session->control_buf);
> + }
> +
> + eventfd_ctx_put(session->eventfd);
> +
> + kfree(session);
> +}
> +
> +static void session_put(struct panthor_perf_session *session)
> +{
> + kref_put(&session->ref, session_free);
> +}
> +
> +/**
> + * session_find - Find a session associated with the given session ID and
> + * panthor_file.
> + * @pfile: Panthor file.
> + * @perf: Panthor perf.
> + * @sid: Session ID.
> + *
> + * The reference count of a valid session is increased to ensure it does not disappear
> + * in the window between the XA lock being dropped and the internal session functions
> + * being called.
> + *
> + * Return: valid session pointer or an ERR_PTR.
> + */
> +static struct panthor_perf_session *session_find(struct panthor_file *pfile,
> + struct panthor_perf *perf, u32 sid)
> +{
> + struct panthor_perf_session *session;
> +
> + if (!perf)
> + return ERR_PTR(-EINVAL);
> +
> + xa_lock(&perf->sessions);
> + session = xa_load(&perf->sessions, sid);
> +
> + if (!session || xa_is_err(session)) {
> + xa_unlock(&perf->sessions);
> + return ERR_PTR(-EBADF);
> + }
> +
> + if (session->pfile != pfile) {
> + xa_unlock(&perf->sessions);
> + return ERR_PTR(-EINVAL);
> + }
> +
> + session_get(session);
> + xa_unlock(&perf->sessions);
> +
> + return session;
> +}
> +
> /**
> * panthor_perf_init - Initialize the performance counter subsystem.
> * @ptdev: Panthor device
> @@ -109,6 +389,412 @@ int panthor_perf_init(struct panthor_device *ptdev)
> return ret;
> }
>
> +static int session_validate_set(u8 set)
> +{
> + if (set > DRM_PANTHOR_PERF_SET_TERTIARY)
> + return -EINVAL;
> +
> + if (set == DRM_PANTHOR_PERF_SET_PRIMARY)
> + return 0;
> +
> + if (set > DRM_PANTHOR_PERF_SET_PRIMARY)
> + return capable(CAP_PERFMON) ? 0 : -EACCES;
> +
> + return -EINVAL;
> +}
> +
> +/**
> + * panthor_perf_session_setup - Create a user-visible session.
> + *
> + * @ptdev: Handle to the panthor device.
> + * @perf: Handle to the perf control structure.
> + * @setup_args: Setup arguments passed in via ioctl.
> + * @pfile: Panthor file associated with the request.
> + *
> + * Creates a new session associated with the session ID returned. When initialized, the
> + * session must explicitly request sampling to start with a successive call to PERF_CONTROL.START.
> + *
> + * Return: non-negative session identifier on success or negative error code on failure.
> + */
> +int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
> + struct drm_panthor_perf_cmd_setup *setup_args,
> + struct panthor_file *pfile)
> +{
> + struct panthor_perf_session *session;
> + struct drm_gem_object *ringbuffer;
> + struct drm_gem_object *control;
> + const size_t slots = setup_args->sample_slots;
> + struct panthor_perf_enable_masks *em;
> + struct iosys_map rb_map, ctrl_map;
> + size_t user_sample_size;
> + int session_id;
> + int ret;
> +
> + ret = session_validate_set(setup_args->block_set);
> + if (ret) {
> + drm_err(&ptdev->base, "Did not meet requirements for set %d\n",
> + setup_args->block_set);
> + return ret;
> + }
> +
> + session = kzalloc(sizeof(*session), GFP_KERNEL);
> + if (ZERO_OR_NULL_PTR(session))
> + return -ENOMEM;
> +
> + ringbuffer = drm_gem_object_lookup(pfile->drm_file, setup_args->ringbuf_handle);
> + if (!ringbuffer) {
> + drm_err(&ptdev->base, "Could not find handle %d!\n", setup_args->ringbuf_handle);
> + ret = -EINVAL;
> + goto cleanup_session;
> + }
> +
> + control = drm_gem_object_lookup(pfile->drm_file, setup_args->control_handle);
> + if (!control) {
> + drm_err(&ptdev->base, "Could not find handle %d!\n", setup_args->control_handle);
> + ret = -EINVAL;
> + goto cleanup_ringbuf;
> + }
> +
> + user_sample_size = session_get_user_sample_size(&ptdev->perf_info) * slots;
> +
> + if (ringbuffer->size != PFN_ALIGN(user_sample_size)) {
> + drm_err(&ptdev->base, "Incorrect ringbuffer size from userspace: user %zu vs kernel %lu\n",
> + ringbuffer->size, PFN_ALIGN(user_sample_size));
> + ret = -ENOMEM;
> + goto cleanup_control;
> + }
> +
> + ret = drm_gem_vmap_unlocked(ringbuffer, &rb_map);
Same here: drm_gem_vmap_unlocked() isn't declared in any header file.
> + if (ret)
> + goto cleanup_control;
> +
> + ret = drm_gem_vmap_unlocked(control, &ctrl_map);
> + if (ret)
> + goto cleanup_ring_map;
> +
> + session->eventfd = eventfd_ctx_fdget(setup_args->fd);
> + if (IS_ERR(session->eventfd)) {
> + drm_err(&ptdev->base, "Invalid eventfd %d!\n", setup_args->fd);
> + ret = PTR_ERR_OR_ZERO(session->eventfd) ?: -EINVAL;
> + goto cleanup_control_map;
> + }
> +
> + em = panthor_perf_create_em(setup_args);
> + if (IS_ERR_OR_NULL(em)) {
> + ret = -ENOMEM;
> + goto cleanup_eventfd;
> + }
> +
> + INIT_LIST_HEAD(&session->sessions);
> + INIT_LIST_HEAD(&session->pending);
> +
> + session->control = ctrl_map.vaddr;
> + *session->control = (struct drm_panthor_perf_ringbuf_control) { 0 };
> +
> + session->samples = rb_map.vaddr;
> +
> + /* TODO This will need validation when we support periodic sampling sessions */
> + if (setup_args->sample_freq_ns) {
> + ret = -EOPNOTSUPP;
> + goto cleanup_em;
> + }
> +
> + ret = xa_alloc_cyclic(&perf->sessions, &session_id, session, perf->session_range,
> + &perf->next_session, GFP_KERNEL);
> + if (ret < 0) {
> + drm_err(&ptdev->base, "System session limit exceeded.\n");
> + ret = -EBUSY;
> + goto cleanup_em;
> + }
> +
> + kref_init(&session->ref);
> + session->enabled_counters = em;
> +
> + session->sample_freq_ns = setup_args->sample_freq_ns;
> + session->user_sample_size = user_sample_size;
> + session->ring_buf = ringbuffer;
> + session->ringbuf_slots = slots;
> + session->control_buf = control;
> + session->pfile = pfile;
> + session->accum_idx = U32_MAX;
> +
> + return session_id;
> +
> +cleanup_em:
> + kfree(em);
> +
> +cleanup_eventfd:
> + eventfd_ctx_put(session->eventfd);
> +
> +cleanup_control_map:
> + drm_gem_vunmap_unlocked(control, &ctrl_map);
> +
> +cleanup_ring_map:
> + drm_gem_vunmap_unlocked(ringbuffer, &rb_map);
> +
> +cleanup_control:
> + drm_gem_object_put(control);
> +
> +cleanup_ringbuf:
> + drm_gem_object_put(ringbuffer);
> +
> +cleanup_session:
> + kfree(session);
> +
> + return ret;
> +}
> +
> +static int session_stop(struct panthor_perf *perf, struct panthor_perf_session *session,
> + u64 user_data)
> +{
> + if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
> + return 0;
> +
> + const u64 extract_idx = session_read_extract_idx(session);
> + const u64 insert_idx = session_read_insert_idx(session);
> +
> + /* Must have at least one slot remaining in the ringbuffer to sample. */
> + if (WARN_ON_ONCE(!CIRC_SPACE_TO_END(insert_idx, extract_idx, session->ringbuf_slots)))
> + return -EBUSY;
> +
> + session->user_data = user_data;
> +
> + clear_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state);
> +
> + /* TODO Calls to the FW interface will go here in later patches. */
> + return 0;
> +}
> +
> +static int session_start(struct panthor_perf *perf, struct panthor_perf_session *session,
> + u64 user_data)
> +{
> + if (test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
> + return 0;
> +
> + set_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state);
> +
> + /*
> + * For manual sampling sessions, a start command does not correspond to a sample,
> + * and so the user data gets discarded.
> + */
> + if (session->sample_freq_ns)
> + session->user_data = user_data;
> +
> + /* TODO Calls to the FW interface will go here in later patches. */
> + return 0;
> +}
> +
> +static int session_sample(struct panthor_perf *perf, struct panthor_perf_session *session,
> + u64 user_data)
> +{
> + if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
> + return 0;
> +
> + const u64 extract_idx = session_read_extract_idx(session);
> + const u64 insert_idx = session_read_insert_idx(session);
> +
> + /* Manual sampling for periodic sessions is forbidden. */
> + if (session->sample_freq_ns)
> + return -EINVAL;
> +
> + /*
> + * Must have at least two slots remaining in the ringbuffer to sample: one for
> + * the current sample, and one for a stop sample, since a stop command should
> + * always be acknowledged by taking a final sample and stopping the session.
> + */
> + if (CIRC_SPACE_TO_END(insert_idx, extract_idx, session->ringbuf_slots) < 2)
> + return -EBUSY;
> +
> + session->sample_start_ns = ktime_get_raw_ns();
> + session->user_data = user_data;
> +
> + return 0;
> +}
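The slot reservation rule in the comment above (two free slots to take a manual sample, one to stop) can be sketched as follows. This is only an illustration under stated assumptions: `ring_free_slots()`, `can_take_manual_sample()` and `can_stop()` are hypothetical helpers, both indices are assumed to be already reduced modulo a power-of-two slot count, and the helper counts total free slots, whereas the patch calls CIRC_SPACE_TO_END(), which only reports contiguous space up to the end of the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Free slots in a power-of-two ring where one slot is always left unused
 * so that a full ring can be told apart from an empty one.
 */
static uint32_t ring_free_slots(uint32_t insert, uint32_t extract, uint32_t slots)
{
	assert(slots && (slots & (slots - 1)) == 0); /* power of two */

	return (extract - insert - 1) & (slots - 1);
}

/* A manual sample needs its own slot plus one held back for the stop sample. */
static bool can_take_manual_sample(uint32_t insert, uint32_t extract, uint32_t slots)
{
	return ring_free_slots(insert, extract, slots) >= 2;
}

/* Stopping only needs the single slot for the final sample. */
static bool can_stop(uint32_t insert, uint32_t extract, uint32_t slots)
{
	return ring_free_slots(insert, extract, slots) >= 1;
}
```

With this rule a client that fills the ring with manual samples can still always stop cleanly, because the last free slot is never handed out to a regular sample.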
> +
> +static int session_destroy(struct panthor_perf *perf, struct panthor_perf_session *session)
> +{
> + session_put(session);
> +
> + return 0;
> +}
> +
> +static int session_teardown(struct panthor_perf *perf, struct panthor_perf_session *session)
> +{
> + if (test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
> + return -EINVAL;
> +
> + if (READ_ONCE(session->pending_sample_request) == SAMPLE_TYPE_NONE)
> + return -EBUSY;
> +
> + return session_destroy(perf, session);
> +}
> +
> +/**
> + * panthor_perf_session_teardown - Teardown the session associated with the @sid.
> + * @pfile: Open panthor file.
> + * @perf: Handle to the perf control structure.
> + * @sid: Session identifier.
> + *
> + * Destroys a stopped session where the last sample has been explicitly consumed
> + * or discarded. Active sessions will be ignored.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int panthor_perf_session_teardown(struct panthor_file *pfile, struct panthor_perf *perf, u32 sid)
> +{
> + int err;
> + struct panthor_perf_session *session;
> +
> + xa_lock(&perf->sessions);
> + session = __xa_store(&perf->sessions, sid, NULL, GFP_KERNEL);
> +
> + if (xa_is_err(session)) {
> + err = xa_err(session);
> + goto restore;
> + }
> +
> + if (session->pfile != pfile) {
> + err = -EINVAL;
> + goto restore;
> + }
> +
> + session_get(session);
> + xa_unlock(&perf->sessions);
> +
> + err = session_teardown(perf, session);
> +
> + session_put(session);
> +
> + return err;
> +
> +restore:
> + __xa_store(&perf->sessions, sid, session, GFP_KERNEL);
> + xa_unlock(&perf->sessions);
> +
> + return err;
> +}
> +
> +/**
> + * panthor_perf_session_start - Start sampling on a stopped session.
> + * @pfile: Open panthor file.
> + * @perf: Handle to the panthor perf control structure.
> + * @sid: Session identifier for the desired session.
> + * @user_data: An opaque value passed in from userspace.
> + *
> + * A session counts as stopped when it is created or when it is explicitly stopped after being
> + * started. Starting an active session is treated as a no-op.
> + *
> + * The @user_data parameter will be associated with all subsequent samples for a periodic
> + * sampling session and will be ignored for manual sampling ones in favor of the user data
> + * passed in the PERF_CONTROL.SAMPLE ioctl call.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *perf,
> + u32 sid, u64 user_data)
> +{
> + struct panthor_perf_session *session = session_find(pfile, perf, sid);
> + int err;
> +
> + if (IS_ERR_OR_NULL(session))
> + return IS_ERR(session) ? PTR_ERR(session) : -EINVAL;
> +
> + err = session_start(perf, session, user_data);
> +
> + session_put(session);
> +
> + return err;
> +}
> +
> +/**
> + * panthor_perf_session_stop - Stop sampling on an active session.
> + * @pfile: Open panthor file.
> + * @perf: Handle to the panthor perf control structure.
> + * @sid: Session identifier for the desired session.
> + * @user_data: An opaque value passed in from userspace.
> + *
> + * A session counts as active when it has been explicitly started via the PERF_CONTROL.START
> + * ioctl. Stopping a stopped session is treated as a no-op.
> + *
> + * To ensure data is not lost when sampling is stopping, there must always be at least one slot
> + * available for the final automatic sample, and the stop command will be rejected if there is not.
> + *
> + * The @user_data will always be associated with the final sample.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *perf,
> + u32 sid, u64 user_data)
> +{
> + struct panthor_perf_session *session = session_find(pfile, perf, sid);
> + int err;
> +
> + if (IS_ERR_OR_NULL(session))
> + return IS_ERR(session) ? PTR_ERR(session) : -EINVAL;
> +
> + err = session_stop(perf, session, user_data);
> +
> + session_put(session);
> +
> + return err;
> +}
> +
> +/**
> + * panthor_perf_session_sample - Request a sample on a manual sampling session.
> + * @pfile: Open panthor file.
> + * @perf: Handle to the panthor perf control structure.
> + * @sid: Session identifier for the desired session.
> + * @user_data: An opaque value passed in from userspace.
> + *
> + * Only an active manual sampler is permitted to request samples directly. Failing to meet either
> + * of these conditions will cause the sampling request to be rejected. Requesting a manual sample
> + * with a full ringbuffer will see the request being rejected.
> + *
> + * The @user_data will always be unambiguously associated one-to-one with the resultant sample.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf *perf,
> + u32 sid, u64 user_data)
> +{
> + struct panthor_perf_session *session = session_find(pfile, perf, sid);
> + int err;
> +
> + if (IS_ERR_OR_NULL(session))
> + return IS_ERR(session) ? PTR_ERR(session) : -EINVAL;
> +
> + err = session_sample(perf, session, user_data);
> +
> + session_put(session);
> +
> + return err;
> +}
> +
> +/**
> + * panthor_perf_session_destroy - Destroy a sampling session associated with the @pfile.
> + * @perf: Handle to the panthor perf control structure.
> + * @pfile: The file being closed.
> + *
> + * Must be called when the corresponding userspace process is destroyed and cannot close its
> + * own sessions. As such, we offer no guarantees about data delivery.
> + */
> +void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_perf *perf)
> +{
> + unsigned long sid;
> + struct panthor_perf_session *session;
> +
> + if (!pfile || !perf)
> + return;
> +
> + xa_for_each(&perf->sessions, sid, session)
> + {
> + if (session->pfile == pfile) {
> + session_destroy(perf, session);
> + xa_erase(&perf->sessions, sid);
> + }
> + }
> +}
> +
> /**
> * panthor_perf_unplug - Terminate the performance counter subsystem.
> * @ptdev: Panthor device.
> @@ -124,8 +810,14 @@ void panthor_perf_unplug(struct panthor_device *ptdev)
> return;
>
> if (!xa_empty(&perf->sessions)) {
> + unsigned long sid;
> + struct panthor_perf_session *session;
> +
> drm_err(&ptdev->base,
> "Performance counter sessions active when unplugging the driver!");
> +
> + xa_for_each(&perf->sessions, sid, session)
> + session_destroy(perf, session);
> }
>
> xa_destroy(&perf->sessions);
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
> index e4805727b9e7..89d61cd1f017 100644
> --- a/drivers/gpu/drm/panthor/panthor_perf.h
> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
> @@ -7,10 +7,26 @@
>
> #include <linux/types.h>
>
> +struct drm_panthor_perf_cmd_setup;
> struct panthor_device;
> +struct panthor_file;
> +struct panthor_perf;
>
> int panthor_perf_init(struct panthor_device *ptdev);
> void panthor_perf_unplug(struct panthor_device *ptdev);
>
> +int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
> + struct drm_panthor_perf_cmd_setup *setup_args,
> + struct panthor_file *pfile);
> +int panthor_perf_session_teardown(struct panthor_file *pfile, struct panthor_perf *perf,
> + u32 sid);
> +int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *perf,
> + u32 sid, u64 user_data);
> +int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *perf,
> + u32 sid, u64 user_data);
> +int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf *perf,
> + u32 sid, u64 user_data);
> +void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_perf *perf);
> +
> #endif /* __PANTHOR_PERF_H__ */
>
> --
> 2.33.0.dirty
Adrian Larumbe
* Re: [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling
2025-05-16 15:49 ` [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling Lukas Zapolskas
2025-05-17 8:56 ` kernel test robot
@ 2025-07-18 14:49 ` Adrián Larumbe
2025-07-25 10:29 ` Lukas Zapolskas
1 sibling, 1 reply; 29+ messages in thread
From: Adrián Larumbe @ 2025-07-18 14:49 UTC (permalink / raw)
To: Lukas Zapolskas
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 16.05.2025 16:49, Lukas Zapolskas wrote:
> From: Adrián Larumbe <adrian.larumbe@collabora.com>
>
> The sampler aggregates counter and set requests coming from userspace
> and mediates interactions with the FW interface, to ensure that user
> sessions cannot override the global configuration.
>
> From the top-level interface, the sampler supports two different types
> of samples: clearing samples and regular samples. Clearing samples are
> a special sample type that allows for the creation of a sampling
> baseline, to ensure that a session does not obtain counter data from
> before its creation.
>
> Upon receipt of a relevant interrupt, corresponding to one of the three
> relevant bits of the GLB_ACK register, the sampler takes any samples
> that occurred, and, based on the insert and extract indices, accumulates
> them to an internal storage buffer after zero-extending the counters
> from the 32-bit counters emitted by the hardware to 64-bit counters
> for internal accumulation.
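As a rough illustration of the zero-extension and accumulation step described in this paragraph, here is a minimal user-space sketch. `accumulate_block()` and its `zero_first` flag are hypothetical and not taken from the patch; the sketch also sums every counter uniformly, while the real code presumably treats the header counters (timestamp, enable mask) differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fold one block of 32-bit hardware counters into a 64-bit accumulation
 * buffer. When zero_first is set, the previous contents are discarded and
 * the block becomes the new baseline.
 */
static void accumulate_block(uint64_t *accum, const uint32_t *hw, size_t n,
			     bool zero_first)
{
	assert(accum && hw);

	for (size_t i = 0; i < n; i++) {
		uint64_t wide = hw[i]; /* implicit zero-extension to 64 bits */

		accum[i] = zero_first ? wide : accum[i] + wide;
	}
}
```

Widening before the addition is what lets several automatic samples be merged without the 32-bit values wrapping in the accumulated result.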
>
> When the performance counters are enabled, the FW ensures no counter
> data is lost when entering and leaving non-counting regions by producing
> automatic samples that do not correspond to a GLB_REQ.PRFCNT_SAMPLE
> request. Such regions may be per hardware unit, such as when a shader
> core powers down, or global. Most of these events do not directly
> correspond to session sample requests, so any intermediary counter data
> must be stored into a temporary accumulation buffer.
>
> If there are sessions waiting for a sample, this accumulated buffer will
> be taken, and emitted for each waiting client. During this phase,
> information like the timestamps of sample request and sample emission,
> type of the counter block and block index annotations are added to the
> sample header and block headers. If no sessions are waiting for
> a sample, this accumulation buffer is kept until the next time a sample
> is requested.
>
> Special handling is needed for the PRFCNT_OVERFLOW interrupt, which is
> an indication that the internal sample handling rate was insufficient.
>
> The sampler also maintains a buffer descriptor indicating the structure
> of a firmware sample, since neither the firmware nor the hardware give
> any indication of the sample structure, only that it is composed out of
> three parts:
> - the metadata is an optional initial counter block on supporting
> firmware versions that contains a single counter, indicating the
> reason a sample was taken when entering global non-counting regions.
> This is used to provide coarse-grained information about why a sample
> was taken to userspace, to help userspace interpret variations in
> counter magnitude.
> - the firmware component of the sample is composed out of a global
> firmware counter block on supporting firmware versions.
> - the hardware component is the most sizeable of the three and contains
> a block of counters for each of the underlying hardware resources. It
> has a fixed structure that is described in the architecture
> specification, and contains the command stream hardware block(s), the
> tiler block(s), the MMU and L2 blocks (collectively named the memsys
> blocks) and the shader core blocks, in that order.
> The structure of this buffer changes based on the firmware and hardware
> combination, but is constant on a single system.
>
> This buffer descriptor also handles the sparseness of the shader cores,
> wherein the physical core mask contains holes, but the memory allocated
> for it is done based on the position of the most significant bit. In
> cases with highly sparse core masks, this means that a lot of shader
> counter blocks are empty, and must be skipped.
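The sparse core mask handling described here could look roughly like the sketch below. It is an assumption-laden illustration, not the patch's implementation: `allocated_block_count()` and `collect_present_blocks()` are hypothetical names, and the mask is treated as a plain 64-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Slots reserved in memory: one per bit up to and including the MSB. */
static size_t allocated_block_count(uint64_t phys_mask)
{
	size_t n = 0;

	while (phys_mask) {
		n++;
		phys_mask >>= 1;
	}
	return n;
}

/*
 * Record the slot index of every physically present block, skipping the
 * holes in the mask. Returns the number of indices written to slots[].
 */
static size_t collect_present_blocks(uint64_t phys_mask, size_t *slots, size_t cap)
{
	size_t found = 0;

	assert(slots || cap == 0);

	for (size_t bit = 0; phys_mask && found < cap; bit++, phys_mask >>= 1) {
		if (phys_mask & 1)
			slots[found++] = bit;
	}
	return found;
}
```

For a mask of 0xA5, eight block slots are allocated but only slots 0, 2, 5 and 7 carry counter data; the other four are the empty blocks that must be skipped.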
>
> The number of ring buffer slots is configurable through a module parameter to
> allow for a lower memory footprint on memory constrained systems.
>
> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> Co-developed-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> ---
> drivers/gpu/drm/panthor/panthor_fw.c | 6 +
> drivers/gpu/drm/panthor/panthor_fw.h | 6 +-
> drivers/gpu/drm/panthor/panthor_perf.c | 1082 +++++++++++++++++++++++-
> drivers/gpu/drm/panthor/panthor_perf.h | 2 +
> 4 files changed, 1080 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 0f52766a3120..e3948354daa4 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -22,6 +22,7 @@
> #include "panthor_gem.h"
> #include "panthor_gpu.h"
> #include "panthor_mmu.h"
> +#include "panthor_perf.h"
> #include "panthor_regs.h"
> #include "panthor_sched.h"
>
> @@ -987,9 +988,12 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
>
> /* Enable interrupts we care about. */
> glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
> + GLB_PERFCNT_SAMPLE |
> GLB_PING |
> GLB_CFG_PROGRESS_TIMER |
> GLB_CFG_POWEROFF_TIMER |
> + GLB_PERFCNT_THRESHOLD |
> + GLB_PERFCNT_OVERFLOW |
> GLB_IDLE_EN |
> GLB_IDLE;
>
> @@ -1018,6 +1022,8 @@ static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
> return;
>
> panthor_sched_report_fw_events(ptdev, status);
> +
> + panthor_perf_report_irq(ptdev, status);
> }
> PANTHOR_IRQ_HANDLER(job, JOB, panthor_job_irq_handler);
>
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
> index 8bcb933fa790..5a561e72e88b 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.h
> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
> @@ -198,6 +198,7 @@ struct panthor_fw_global_control_iface {
> u32 group_num;
> u32 group_stride;
> #define GLB_PERFCNT_FW_SIZE(x) ((((x) >> 16) << 8))
> +#define GLB_PERFCNT_HW_SIZE(x) (((x) & GENMASK(15, 0)) << 8)
> u32 perfcnt_size;
> u32 instr_features;
> #define PERFCNT_FEATURES_MD_SIZE(x) (((x) & GENMASK(3, 0)) << 8)
> @@ -210,7 +211,7 @@ struct panthor_fw_global_input_iface {
> #define GLB_CFG_ALLOC_EN BIT(2)
> #define GLB_CFG_POWEROFF_TIMER BIT(3)
> #define GLB_PROTM_ENTER BIT(4)
> -#define GLB_PERFCNT_EN BIT(5)
> +#define GLB_PERFCNT_ENABLE BIT(5)
> #define GLB_PERFCNT_SAMPLE BIT(6)
> #define GLB_COUNTER_EN BIT(7)
> #define GLB_PING BIT(8)
> @@ -243,6 +244,9 @@ struct panthor_fw_global_input_iface {
> u64 perfcnt_base;
> u32 perfcnt_extract;
> u32 reserved3[3];
> +#define GLB_PERFCNT_CONFIG_SIZE(x) ((x) & GENMASK(7, 0))
> +#define GLB_PERFCNT_CONFIG_SET(x) (((x) & GENMASK(1, 0)) << 8)
> +#define GLB_PERFCNT_METADATA_ENABLE BIT(10)
> u32 perfcnt_config;
> u32 perfcnt_csg_select;
> u32 perfcnt_fw_enable;
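For reference, the two size macros touched in this header, GLB_PERFCNT_FW_SIZE() and the new GLB_PERFCNT_HW_SIZE(), decode the same 32-bit register. A minimal sketch of that arithmetic follows; `struct perfcnt_sizes` and `perfcnt_decode_size()` are hypothetical names, and the assumption that the scaled values are byte counts is mine, not something the patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded sizes; the unit is assumed to be bytes. */
struct perfcnt_sizes {
	uint32_t fw_size; /* upper 16 bits of the register, scaled by 256 */
	uint32_t hw_size; /* lower 16 bits of the register, scaled by 256 */
};

static void perfcnt_decode_size(uint32_t reg, struct perfcnt_sizes *out)
{
	assert(out);

	out->fw_size = (reg >> 16) << 8;      /* mirrors GLB_PERFCNT_FW_SIZE() */
	out->hw_size = (reg & 0xffffu) << 8;  /* mirrors GLB_PERFCNT_HW_SIZE() */
}
```

A register value of 0x00020010 would decode to a firmware size of 512 and a hardware size of 4096 under these definitions.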
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
> index 15fa533731f3..97603b168d2d 100644
> --- a/drivers/gpu/drm/panthor/panthor_perf.c
> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
> @@ -9,7 +9,11 @@
>
> #include "panthor_device.h"
> #include "panthor_fw.h"
> +#include "panthor_gem.h"
> +#include "panthor_gpu.h"
> +#include "panthor_mmu.h"
> #include "panthor_perf.h"
> +#include "panthor_regs.h"
>
> /**
> * PANTHOR_PERF_EM_BITS - Number of bits in a user-facing enable mask. This must correspond
> @@ -18,6 +22,81 @@
> */
> #define PANTHOR_PERF_EM_BITS (BITS_PER_TYPE(u64) * 2)
>
> +/**
> + * PANTHOR_CTR_TIMESTAMP_LO - The first architecturally mandated counter of every block type
> + * contains the low 32-bits of the TIMESTAMP value.
> + */
> +#define PANTHOR_CTR_TIMESTAMP_LO (0)
> +
> +/**
> + * PANTHOR_CTR_TIMESTAMP_HI - The register offset containing the high 32-bits of the TIMESTAMP
> + * value.
> + */
> +#define PANTHOR_CTR_TIMESTAMP_HI (1)
> +
> +/**
> + * PANTHOR_CTR_PRFCNT_EN - The register offset containing the enable mask for the enabled counters
> + * that were written to memory.
> + */
> +#define PANTHOR_CTR_PRFCNT_EN (2)
> +
> +/**
> + * PANTHOR_HEADER_COUNTERS - The first four counters of every block type are architecturally
> + * defined to be equivalent. The fourth counter is always reserved,
> + * and should be zero and as such, does not have a separate define.
> + *
> + * These are the only four counters that are the same between different
> + * blocks and are consistent between different architectures.
> + */
> +#define PANTHOR_HEADER_COUNTERS (4)
> +
> +/**
> + * PANTHOR_CTR_SAMPLE_REASON - The metadata block has a single value in position three which
> + * indicates the reason a sample was taken.
> + */
> +#define PANTHOR_CTR_SAMPLE_REASON (3)
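To make the header layout defined above concrete, here is a small sketch of how the first counters of a block might be read. The enum mirrors the offsets from the defines in the patch, but `block_timestamp()` and `block_enable_mask()` are hypothetical helpers written only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offsets of the architecturally defined header counters in a block. */
enum {
	CTR_TIMESTAMP_LO = 0,
	CTR_TIMESTAMP_HI = 1,
	CTR_PRFCNT_EN    = 2,
	HEADER_COUNTERS  = 4, /* the fourth header counter is reserved */
};

/* Combine the two 32-bit halves into the 64-bit TIMESTAMP value. */
static uint64_t block_timestamp(const uint32_t *block, size_t counters)
{
	assert(block && counters >= HEADER_COUNTERS);

	return ((uint64_t)block[CTR_TIMESTAMP_HI] << 32) | block[CTR_TIMESTAMP_LO];
}

/* Enable mask of the counters that were actually written for this block. */
static uint32_t block_enable_mask(const uint32_t *block, size_t counters)
{
	assert(block && counters >= HEADER_COUNTERS);

	return block[CTR_PRFCNT_EN];
}
```

The cast to 64 bits before the shift matters: shifting a 32-bit value left by 32 is undefined in C.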
> +
> +/**
> + * PANTHOR_HW_COUNTER_SIZE - The size of a hardware counter in the FW ring buffer.
> + */
> +#define PANTHOR_HW_COUNTER_SIZE (sizeof(u32))
> +
> +/**
> + * PANTHOR_PERF_RINGBUF_SLOTS_MIN - The minimum permitted number of slots in the Panthor perf
> + * ring buffer.
> + */
> +#define PANTHOR_PERF_RINGBUF_SLOTS_MIN (16)
> +
> +/**
> + * PANTHOR_PERF_RINGBUF_SLOTS_MAX - The maximum permitted number of slots in the Panthor perf
> + * ring buffer.
> + */
> +#define PANTHOR_PERF_RINGBUF_SLOTS_MAX (256)
> +
> +static unsigned int perf_ringbuf_slots = 32;
> +
> +static int perf_ringbuf_slots_set(const char *val, const struct kernel_param *kp)
> +{
> + unsigned int slots;
> + int ret = kstrtouint(val, 0, &slots);
> +
> + if (ret)
> + return ret;
> +
> + if (!is_power_of_2(slots))
> + return -EINVAL;
> +
> + return param_set_uint_minmax(val, kp, 16, 256);
> +}
> +
> +static const struct kernel_param_ops perf_ringbuf_ops = {
> + .set = perf_ringbuf_slots_set,
> + .get = param_get_uint,
> +};
> +module_param_cb(perf_ringbuf_slots, &perf_ringbuf_ops, &perf_ringbuf_slots, 0400);
> +MODULE_PARM_DESC(perf_ringbuf_slots,
> + "Power of two slots allocated for the Panthor perf kernel-FW ringbuffer");
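The constraints enforced by the setter above (a power of two between 16 and 256) can be sketched in user space as follows. This is not the kernel code path: `parse_ringbuf_slots()` is a hypothetical helper built on strtoul(), and unlike kstrtouint() it does not accept a trailing newline.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define RINGBUF_SLOTS_MIN 16u
#define RINGBUF_SLOTS_MAX 256u

/* Parse and validate a slot count; returns 0 on success or -EINVAL. */
static int parse_ringbuf_slots(const char *val, unsigned int *out)
{
	char *end;
	unsigned long v;

	assert(val && out);

	errno = 0;
	v = strtoul(val, &end, 0); /* base 0 accepts decimal, hex and octal */
	if (errno || end == val || *end != '\0')
		return -EINVAL;

	if (v < RINGBUF_SLOTS_MIN || v > RINGBUF_SLOTS_MAX)
		return -EINVAL;

	if (v & (v - 1)) /* more than one bit set: not a power of two */
		return -EINVAL;

	*out = (unsigned int)v;
	return 0;
}
```

Under these rules the only accepted values are 16, 32, 64, 128 and 256; anything else, including 48 or 512, is rejected and the previous value is left untouched.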
> +
> enum panthor_perf_session_state {
> /** @PANTHOR_PERF_SESSION_ACTIVE: The session is active and can be used for sampling. */
> PANTHOR_PERF_SESSION_ACTIVE = 0,
> @@ -63,6 +142,116 @@ enum session_sample_type {
> SAMPLE_TYPE_REGULAR,
> };
>
> +struct panthor_perf_buffer_descriptor {
> + /**
> + * @block_size: The size of a single block in the FW ring buffer, equal to
> + * sizeof(u32) * counters_per_block.
> + */
> + size_t block_size;
> +
> + /**
> + * @buffer_size: The total size of the buffer, equal to (#hardware blocks +
> + * #firmware blocks) * block_size.
> + */
> + size_t buffer_size;
> +
> + /**
> + * @available_blocks: Bitmask indicating the blocks supported by the hardware and firmware
> + * combination. Note that this can also include blocks that will not
> + * be exposed to the user.
> + */
> + DECLARE_BITMAP(available_blocks, DRM_PANTHOR_PERF_BLOCK_MAX);
> + struct {
> + /** @offset: Starting offset of a block of type @type in the FW ringbuffer. */
> + size_t offset;
> +
> + /** @block_count: Number of blocks of the given @type, starting at @offset. */
> + size_t block_count;
> +
> + /** @phys_mask: Bitmask of the physically available blocks. */
> + u64 phys_mask;
> + } blocks[DRM_PANTHOR_PERF_BLOCK_MAX];
> +};
> +
> +/**
> + * struct panthor_perf_sampler - Interface to de-multiplex firmware interaction and handle
> + * global interactions.
> + */
> +struct panthor_perf_sampler {
> + /**
> + * @enabled_clients: The number of clients concurrently requesting samples. To ensure that
> + * one client cannot deny samples to another, we must ensure that clients
> + * are effectively reference counted.
> + */
> + atomic_t enabled_clients;
> +
> + /**
> + * @sample_handled: Synchronization point between the interrupt bottom half and the
> + * main sampler interface. Must be re-armed solely on a new request
> + * coming to the sampler.
> + */
> + struct completion sample_handled;
> +
> + /** @rb: Kernel BO in the FW AS containing the sample ringbuffer. */
> + struct panthor_kernel_bo *rb;
> +
> + /**
> + * @sample_slots: Number of slots for samples in the FW ringbuffer. Could be static,
> + * but may be useful to customize for low-memory devices.
> + */
> + size_t sample_slots;
> +
> + /** @em: Combined enable mask for all of the active sessions. */
> + struct panthor_perf_enable_masks *em;
> +
> + /**
> + * @desc: Buffer descriptor for a sample in the FW ringbuffer. Note that this buffer
> + * at current time does some interesting things with the zeroth block type. On
> + * newer FW revisions, the first counter block of the sample is the METADATA block,
> + * which contains a single value indicating the reason the sample was taken (if
> + * any). This block must not be exposed to userspace, as userspace does not
> + * have sufficient context to interpret it. As such, this block type is not
> + * added to the uAPI, but we still use it in the kernel.
> + */
> + struct panthor_perf_buffer_descriptor desc;
> +
> + /**
> + * @sample: Pointer to an upscaled and annotated sample that may be emitted to userspace.
> + * This is used both as an intermediate buffer to do the zero-extension of the
> + * 32-bit counters to 64-bits and as a storage buffer in case the sampler
> + * requests an additional sample that was not requested by any of the top-level
> + * sessions (for instance, when changing the enable masks).
> + */
> + u8 *sample;
> +
> + /**
> + * @sampler_lock: Lock used to guard the list of sessions and the sampler configuration.
> + * In particular, it guards the @session_list and the @em.
> + */
> + struct mutex sampler_lock;
> +
> + /** @session_list: List of all sessions. */
> + struct list_head session_list;
> +
> + /** @pend_lock: Lock used to guard the list of sessions with pending samples. */
> + spinlock_t pend_lock;
> +
> + /** @pending_samples: List of sessions requesting samples. */
> + struct list_head pending_samples;
> +
> + /** @sample_requested: A sample has been requested. */
> + bool sample_requested;
> +
> + /** @set_config: The set that will be configured onto the hardware. */
> + u8 set_config;
> +
> + /**
> + * @ptdev: Backpointer to the Panthor device, needed to ring the global doorbell and
> + * interface with FW.
> + */
> + struct panthor_device *ptdev;
> +};
> +
> struct panthor_perf_session {
> DECLARE_BITMAP(state, PANTHOR_PERF_SESSION_MAX);
>
> @@ -184,6 +373,9 @@ struct panthor_perf {
> * @sessions: Global map of sessions, accessed by their ID.
> */
> struct xarray sessions;
> +
> + /** @sampler: FW control interface. */
> + struct panthor_perf_sampler sampler;
> };
>
> struct panthor_perf_counter_block {
> @@ -237,7 +429,7 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
> }
>
> static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panthor_perf_cmd_setup
> - *setup_args)
> + *setup_args)
> {
> struct panthor_perf_enable_masks *em = kmalloc(sizeof(*em), GFP_KERNEL);
> if (IS_ERR_OR_NULL(em))
> @@ -257,6 +449,23 @@ static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panth
> return em;
> }
>
> +static void panthor_perf_em_add(struct panthor_perf_enable_masks *dst_em,
> + const struct panthor_perf_enable_masks *const src_em)
For consistency, maybe also make dst_em const? Just the pointer variable
itself, not what it points to:
struct panthor_perf_enable_masks *const dst_em
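To illustrate the distinction, here is a minimal standalone sketch. 'struct em', EM_WORDS and em_add() are made-up stand-ins for illustration, not the driver's types:

```c
#include <assert.h>
#include <stddef.h>

#define EM_WORDS 2 /* hypothetical: 128 mask bits held in two 64-bit words */

struct em {
	unsigned long long mask[EM_WORDS];
};

/*
 * dst is a const pointer to mutable data: the pointee may be written, but
 * the pointer cannot be reseated. src is a const pointer to const data:
 * neither the pointer nor the pointee may change.
 */
static void em_add(struct em *const dst, const struct em *const src)
{
	assert(dst && src);

	for (size_t i = 0; i < EM_WORDS; i++)
		dst->mask[i] |= src->mask[i];
}
```

With this signature, 'dst = NULL;' or 'src->mask[0] = 0;' inside the body would be rejected by the compiler, while 'dst->mask[0] |= 1;' is still fine.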
> +{
> + size_t i = 0;
> +
> + for (i = DRM_PANTHOR_PERF_BLOCK_FIRST; i <= DRM_PANTHOR_PERF_BLOCK_LAST; i++)
> + bitmap_or(dst_em->mask[i], dst_em->mask[i], src_em->mask[i], PANTHOR_PERF_EM_BITS);
> +}
> +
> +static void panthor_perf_em_zero(struct panthor_perf_enable_masks *em)
> +{
> + size_t i = 0;
> +
> + for (i = DRM_PANTHOR_PERF_BLOCK_FIRST; i <= DRM_PANTHOR_PERF_BLOCK_LAST; i++)
> + bitmap_zero(em->mask[i], PANTHOR_PERF_EM_BITS);
> +}
> +
> static u64 session_read_extract_idx(struct panthor_perf_session *session)
> {
> const u64 slots = session->ringbuf_slots;
> @@ -267,6 +476,12 @@ static u64 session_read_extract_idx(struct panthor_perf_session *session)
> return smp_load_acquire(&session->control->extract_idx) % slots;
> }
>
> +static void session_write_insert_idx(struct panthor_perf_session *session, u64 idx)
> +{
> + /* Userspace needs the insert index to know where to look for the sample. */
> + smp_store_release(&session->control->insert_idx, idx);
> +}
> +
> static u64 session_read_insert_idx(struct panthor_perf_session *session)
> {
> const u64 slots = session->ringbuf_slots;
> @@ -326,7 +541,7 @@ static void session_put(struct panthor_perf_session *session)
> * Return: valid session pointer or an ERR_PTR.
> */
> static struct panthor_perf_session *session_find(struct panthor_file *pfile,
> - struct panthor_perf *perf, u32 sid)
> + struct panthor_perf *perf, u32 sid)
> {
> struct panthor_perf_session *session;
>
> @@ -352,6 +567,761 @@ static struct panthor_perf_session *session_find(struct panthor_file *pfile,
> return session;
> }
>
> +static u32 compress_enable_mask(unsigned long *const src)
> +{
> + size_t i;
> + u32 result = 0;
> + unsigned long clump;
> +
> + for_each_set_clump8(i, clump, src, PANTHOR_PERF_EM_BITS) {
> + const unsigned long shift = div_u64(i, 4);
> +
> + result |= !!(clump & GENMASK(3, 0)) << shift;
> + result |= !!(clump & GENMASK(7, 4)) << (shift + 1);
> + }
> +
> + return result;
> +}
I think what you're trying to do here is this: because one enable bit enables four consecutive counters,
a 32-bit mask can stand for the enablement status of 32 * 4 = 128 counters altogether.
However, one thing I keep wondering is under what circumstances the four bits in a group
would not all be enabled at the same time (since they seem to be enabled
at the user's request in groups of four anyway).
Also, I think there's a problem in how you handle the shift.
In the case of four consecutive bytes being handled:
In for_each_set_clump8(i, [...]), 'i' can take values between 0 and 15, so div_u64(i, 4)
will return values between 0 and 3.
Then, say, for 8-bit clumps with indices between 0 and 3, you'd want to flag the 8 LS bits
of 'result', but at the moment you're just overwriting them for successive clumps that share the same div_u64(i, 4).
I think what you meant was most likely this:
```
for_each_set_clump8(i, clump, src, PANTHOR_PERF_EM_BITS) {
result |= !!(clump & GENMASK(3, 0)) << (i * 2);
result |= !!(clump & GENMASK(7, 4)) << ((i * 2) + 1);
}
```
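For reference, a standalone model of the 4:1 compression that can be compiled and checked in userspace. It assumes the 128-bit mask lives in two 64-bit words, and compress128() is a hypothetical name. It walks the clumps by their ordinal c (0..15). It is worth double-checking which value for_each_set_clump8() actually stores in 'i': if it is the clump's bit offset (0, 8, ..., 120) rather than its ordinal, then offset / 4 gives exactly the same shift as 2 * c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compress a 128-bit counter mask into 32 bits: output bit n is set if any
 * of the counters 4n..4n+3 is set in the input.
 */
static uint32_t compress128(const uint64_t src[2])
{
	uint32_t result = 0;

	assert(src);

	/* c is the ordinal of each 8-bit clump, 0..15. */
	for (unsigned int c = 0; c < 16; c++) {
		const uint8_t clump = (uint8_t)(src[c / 8] >> ((c % 8) * 8));

		/* Low nibble feeds output bit 2c, high nibble bit 2c + 1. */
		result |= (uint32_t)!!(clump & 0x0f) << (2 * c);
		result |= (uint32_t)!!(clump & 0xf0) << (2 * c + 1);
	}

	return result;
}
```

A single set counter anywhere in a group of four is enough to set the group's output bit, so the round trip through expand_enable_mask() can only widen a mask, never narrow it.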
> +
> +static void expand_enable_mask(u32 em, unsigned long *const dst)
> +{
> + size_t i;
> + DECLARE_BITMAP(emb, BITS_PER_TYPE(u32));
> +
> + bitmap_from_arr32(emb, &em, BITS_PER_TYPE(u32));
> +
> + for_each_set_bit(i, emb, BITS_PER_TYPE(u32))
> + bitmap_set(dst, i * 4, 4);
> +}
> +
> +/**
> + * panthor_perf_block_data - Identify the block index and type based on the offset.
> + *
> + * @desc: FW buffer descriptor.
> + * @offset: The current offset being examined.
> + * @idx: Pointer to an output index.
> + * @type: Pointer to an output block type.
> + *
> + * To disambiguate different types of blocks as well as different blocks of the same type,
> + * the offset into the FW ringbuffer is used to uniquely identify the block being considered.
> + *
> + * In the future, this is a good time to identify whether a block will be empty,
> + * allowing us to short-circuit its processing after emitting header information.
> + *
> + * Return: True if the current block is available, false otherwise.
> + */
> +static bool panthor_perf_block_data(struct panthor_perf_buffer_descriptor *const desc,
> + size_t offset, u32 *idx,
> + enum drm_panthor_perf_block_type *type)
> +{
> + unsigned long id;
> +
> + for_each_set_bit(id, desc->available_blocks, DRM_PANTHOR_PERF_BLOCK_LAST) {
I don't see the point of keeping this bitmask, because you set every single available bit
in panthor_perf_setup_fw_buffer_desc() as you traverse all the enum drm_panthor_perf_block_type values.
What you're effectively doing is turning the enum itself into a bitmask, so I'd just get rid of it and do
'for (enum drm_panthor_perf_block_type id = 0; id < DRM_PANTHOR_PERF_BLOCK_MAX; id++)'
right here.
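Roughly what I have in mind, as a standalone sketch. The enum, 'struct blk_range' and find_type() are simplified stand-ins, not the driver's definitions. The existing zero-count check already skips block types that are absent, so no separate bitmap is consulted:

```c
#include <assert.h>
#include <stddef.h>

enum blk_type { BLK_FW, BLK_CSHW, BLK_TILER, BLK_MAX }; /* hypothetical subset */

struct blk_range {
	size_t offset; /* start of this type's blocks in the sample */
	size_t count;  /* number of blocks of this type, 0 if absent */
};

/* Return the block type covering @offset, or -1 if none does. */
static int find_type(const struct blk_range blocks[BLK_MAX],
		     size_t block_size, size_t offset)
{
	assert(blocks && block_size);

	for (int type = 0; type < BLK_MAX; type++) {
		const size_t start = blocks[type].offset;
		const size_t end = start + blocks[type].count * block_size;

		if (!blocks[type].count)
			continue;

		if (offset >= start && offset < end)
			return type;
	}

	return -1;
}
```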
> + const size_t block_start = desc->blocks[id].offset;
> + const size_t block_count = desc->blocks[id].block_count;
> + const size_t block_end = desc->blocks[id].offset +
> + desc->block_size * block_count;
> +
> + if (!block_count)
> + continue;
> +
> + if ((offset >= block_start) && (offset < block_end)) {
> + const unsigned long phys_mask[] = {
> + BITMAP_FROM_U64(desc->blocks[id].phys_mask),
> + };
> + const size_t pos =
> + div_u64(offset - desc->blocks[id].offset, desc->block_size);
> +
> + *type = id;
> +
> + if (test_bit(pos, phys_mask)) {
> + const u64 mask = GENMASK_ULL(pos, 0);
> + const u64 zeroes = ~desc->blocks[id].phys_mask & mask;
> +
> + *idx = pos - hweight64(zeroes);
> + return true;
I don't quite follow what you're trying to do here.
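My best guess at the intent, sketched standalone: a physical slot position is mapped to a dense logical index by subtracting the number of absent slots at or below it. The helper names are hypothetical and this is only my reading of the code above, so please correct me if it is off:

```c
#include <assert.h>
#include <stdint.h>

/* Portable population count, standing in for hweight64(). */
static unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;

	return n;
}

/* Return the logical index of physical slot @pos, or -1 if it is absent. */
static int phys_to_logical(uint64_t phys_mask, unsigned int pos)
{
	uint64_t upto, zeroes;

	assert(pos < 64);

	if (!(phys_mask & (UINT64_C(1) << pos)))
		return -1;

	/* Bits 0..pos inclusive, like GENMASK_ULL(pos, 0). */
	upto = (pos == 63) ? UINT64_MAX : (UINT64_C(1) << (pos + 1)) - 1;
	zeroes = ~phys_mask & upto;

	return (int)(pos - popcount64(zeroes));
}
```

With phys_mask = 0b1011, slots 0, 1 and 3 map to logical indices 0, 1 and 2, and slot 2 is reported as absent. If that is the intent, a short comment saying so would help.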
> + }
> + return false;
> + }
> + }
> +
> + return false;
> +}
> +
> +static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
> +{
> + const size_t block_size = get_annotated_block_size(info->counters_per_block);
> + const size_t block_nr = info->cshw_blocks + info->fw_blocks +
> + info->tiler_blocks + info->memsys_blocks + info->shader_blocks;
> +
> + return sizeof(struct drm_panthor_perf_sample_header) + (block_size * block_nr);
> +}
You've redefined session_get_user_sample_size here.
> +
> +static u32 panthor_perf_handle_sample(struct panthor_device *ptdev, u32 extract_idx, u32 insert_idx)
> +{
> + struct panthor_perf *perf = ptdev->perf;
> + struct panthor_perf_sampler *sampler = &ptdev->perf->sampler;
> + const size_t ann_block_size =
> + get_annotated_block_size(ptdev->perf_info.counters_per_block);
> + u32 i;
> +
> + for (i = extract_idx; i != insert_idx; i++) {
> + u32 slot = i % sampler->sample_slots;
> + u8 *fw_sample = (u8 *)sampler->rb->kmap + slot * sampler->desc.buffer_size;
> +
> + for (size_t fw_off = 0, ann_off = sizeof(struct drm_panthor_perf_sample_header);
> + fw_off < sampler->desc.buffer_size;
> + fw_off += sampler->desc.block_size)
> +
> + {
> + u32 idx = 0;
> + enum drm_panthor_perf_block_type type = 0;
> + DECLARE_BITMAP(expanded_em, PANTHOR_PERF_EM_BITS);
> + struct panthor_perf_counter_block *blk =
> + (typeof(blk))(perf->sampler.sample + ann_off);
> + u32 *const block = (u32 *)(fw_sample + fw_off);
> + const u32 prfcnt_en = block[PANTHOR_CTR_PRFCNT_EN];
> +
> + if (!panthor_perf_block_data(&sampler->desc, fw_off, &idx, &type))
> + continue;
> +
> + /**
> + * TODO Data from the metadata block must be used to populate the
> + * block state information.
> + */
> + if (type == DRM_PANTHOR_PERF_BLOCK_METADATA) {
> + /*
> + * The host must clear the SAMPLE_REASON to acknowledge it has
> + * consumed the sample.
> + */
> + block[PANTHOR_CTR_SAMPLE_REASON] = 0;
> + continue;
> + }
> +
> + expand_enable_mask(prfcnt_en, expanded_em);
> +
> + blk->header = (struct drm_panthor_perf_block_header) {
> + .clock = 0,
> + .block_idx = idx,
> + .block_type = type,
> + .block_states = DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN
Are you using any states other than DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN?
> + };
> + bitmap_to_arr64(blk->header.enable_mask, expanded_em, PANTHOR_PERF_EM_BITS);
> +
> + /*
> + * The four header counters must be treated differently, because they are
> + * not additive. For the fourth, the assignment does not matter, as it
> + * is reserved and should be zero.
> + */
> + blk->counters[PANTHOR_CTR_TIMESTAMP_LO] = block[PANTHOR_CTR_TIMESTAMP_LO];
> + blk->counters[PANTHOR_CTR_TIMESTAMP_HI] = block[PANTHOR_CTR_TIMESTAMP_HI];
> + blk->counters[PANTHOR_CTR_PRFCNT_EN] = block[PANTHOR_CTR_PRFCNT_EN];
> +
> + /*
> + * The host must clear PRFCNT_EN to acknowledge it has consumed the sample.
> + */
> + block[PANTHOR_CTR_PRFCNT_EN] = 0;
> +
> + for (size_t k = PANTHOR_HEADER_COUNTERS;
> + k < ptdev->perf_info.counters_per_block;
> + k++)
> + blk->counters[k] += block[k];
> +
> + ann_off += ann_block_size;
Why not move this into the for loop's step expression?
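One thing to keep in mind if it does move: the step expression also runs on 'continue', so the output offset would then advance for skipped blocks as well. Whether that matters depends on whether the annotated sample is meant to be densely packed, which I am assuming from the 'continue' paths above. A standalone sketch of the difference, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Offset advanced in the body: skipped inputs leave it untouched. */
static size_t end_offset_in_body(const int *keep, size_t n, size_t stride)
{
	size_t out = 0;

	assert(keep);

	for (size_t i = 0; i < n; i++) {
		if (!keep[i])
			continue;

		/* A real loop would write the block at 'out' here. */
		out += stride;
	}

	return out;
}

/* Offset advanced in the step: it moves on every iteration, kept or not. */
static size_t end_offset_in_step(const int *keep, size_t n, size_t stride)
{
	size_t out = 0;

	assert(keep);

	for (size_t i = 0; i < n; i++, out += stride) {
		if (!keep[i])
			continue;

		/* A real loop would write the block at 'out' here. */
	}

	return out;
}
```

For three inputs of which the middle one is skipped, and a stride of 16, the first variant ends at offset 32 and the second at 48.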
> + }
> + }
> +
> + return i;
> +}
> +
> +static size_t panthor_perf_get_fw_reported_size(struct panthor_device *ptdev)
> +{
> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
> +
> + size_t fw_size = GLB_PERFCNT_FW_SIZE(glb_iface->control->perfcnt_size);
> + size_t hw_size = GLB_PERFCNT_HW_SIZE(glb_iface->control->perfcnt_size);
> + size_t md_size = PERFCNT_FEATURES_MD_SIZE(glb_iface->control->perfcnt_features);
> +
> + return md_size + fw_size + hw_size;
> +}
> +
> +#define PANTHOR_PERF_SET_BLOCK_DESC_DATA(__desc, __type, __blk_count, __phys_mask, __offset) \
> + ({ \
> + (__desc)->blocks[(__type)].offset = (__offset); \
> + (__desc)->blocks[(__type)].block_count = (__blk_count); \
> + (__desc)->blocks[(__type)].phys_mask = (__phys_mask); \
> + if ((__blk_count)) \
> + set_bit((__type), (__desc)->available_blocks); \
> + (__offset) + ((__desc)->block_size) * (__blk_count); \
> + })
> +
> +static size_t get_reserved_shader_core_blocks(struct panthor_device *ptdev)
> +{
> + const u64 sc_mask = ptdev->gpu_info.shader_present;
> +
> + return fls64(sc_mask);
> +}
> +
> +#define BLK_MASK(x) GENMASK_ULL((x) - 1, 0)
> +
> +static u64 get_shader_core_mask(struct panthor_device *ptdev)
> +{
> + const u64 sc_mask = ptdev->gpu_info.shader_present;
> +
> + return BLK_MASK(hweight64(sc_mask));
> +}
> +
> +static int panthor_perf_setup_fw_buffer_desc(struct panthor_device *ptdev,
> + struct panthor_perf_sampler *sampler)
> +{
> + const struct drm_panthor_perf_info *const info = &ptdev->perf_info;
> + const size_t block_size = info->counters_per_block * PANTHOR_HW_COUNTER_SIZE;
> + struct panthor_perf_buffer_descriptor *desc = &sampler->desc;
> + const size_t fw_sample_size = panthor_perf_get_fw_reported_size(ptdev);
> + size_t offset = 0;
> +
> + desc->block_size = block_size;
> +
> + for (enum drm_panthor_perf_block_type type = 0; type < DRM_PANTHOR_PERF_BLOCK_MAX; type++) {
> + switch (type) {
> + case DRM_PANTHOR_PERF_BLOCK_METADATA:
> + if (info->flags & DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT)
> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, 1,
> + BLK_MASK(1), offset);
> + break;
> + case DRM_PANTHOR_PERF_BLOCK_FW:
> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->fw_blocks,
> + BLK_MASK(info->fw_blocks),
> + offset);
> + break;
> + case DRM_PANTHOR_PERF_BLOCK_CSHW:
> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->cshw_blocks,
> + BLK_MASK(info->cshw_blocks),
> + offset);
> + break;
> + case DRM_PANTHOR_PERF_BLOCK_TILER:
> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->tiler_blocks,
> + BLK_MASK(info->tiler_blocks),
> + offset);
> + break;
> + case DRM_PANTHOR_PERF_BLOCK_MEMSYS:
> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->memsys_blocks,
> + BLK_MASK(info->memsys_blocks),
> + offset);
> + break;
> + case DRM_PANTHOR_PERF_BLOCK_SHADER:
> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type,
> + get_reserved_shader_core_blocks(ptdev),
> + get_shader_core_mask(ptdev),
> + offset);
> + break;
> + case DRM_PANTHOR_PERF_BLOCK_MAX:
> + drm_WARN_ON_ONCE(&ptdev->base,
> + "DRM_PANTHOR_PERF_BLOCK_MAX should be unreachable!");
> + break;
> + }
> + }
> +
> + /* Computed size is not the same as the reported size, so we should not proceed in
> + * initializing the sampling session.
> + */
> + if (offset != fw_sample_size)
> + return -EINVAL;
> +
> + desc->buffer_size = offset;
> +
> + return 0;
> +}
> +
> +static int panthor_perf_fw_stop_sampling(struct panthor_device *ptdev)
> +{
> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
> + u32 acked;
> + int ret;
> +
> + if (~READ_ONCE(glb_iface->input->req) & GLB_PERFCNT_ENABLE)
> + return 0;
> +
> + panthor_fw_update_reqs(glb_iface, req, 0, GLB_PERFCNT_ENABLE);
> + gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
> + ret = panthor_fw_glb_wait_acks(ptdev, GLB_PERFCNT_ENABLE, &acked, 100);
> + if (ret)
> + drm_warn(&ptdev->base, "Could not disable performance counters");
> +
> + return ret;
> +}
> +
> +static int panthor_perf_fw_start_sampling(struct panthor_device *ptdev)
> +{
> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
> + u32 acked;
> + int ret;
> +
> + if (READ_ONCE(glb_iface->input->req) & GLB_PERFCNT_ENABLE)
> + return 0;
> +
> + panthor_fw_update_reqs(glb_iface, req, GLB_PERFCNT_ENABLE, GLB_PERFCNT_ENABLE);
> + gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
> + ret = panthor_fw_glb_wait_acks(ptdev, GLB_PERFCNT_ENABLE, &acked, 100);
> + if (ret)
> + drm_warn(&ptdev->base, "Could not enable performance counters");
> +
> + return ret;
> +}
> +
> +static void panthor_perf_fw_write_config(struct panthor_perf_sampler *sampler,
> + struct panthor_perf_enable_masks *em)
I've noticed some inconsistency in how you declare enablement mask pointer parameters
as const pointers, e.g., in panthor_perf_em_add() you declare it as struct panthor_perf_enable_masks *const.
Here it would be alright to declare it as const too, because it's only being read from.
> +{
> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(sampler->ptdev);
> + u32 perfcnt_config;
> +
> + glb_iface->input->perfcnt_csf_enable =
> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_CSHW]);
> + glb_iface->input->perfcnt_shader_enable =
> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_SHADER]);
> + glb_iface->input->perfcnt_mmu_l2_enable =
> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_MEMSYS]);
> + glb_iface->input->perfcnt_tiler_enable =
> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_TILER]);
> + glb_iface->input->perfcnt_fw_enable =
> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_FW]);
> +
> + WRITE_ONCE(glb_iface->input->perfcnt_as, panthor_vm_as(panthor_fw_vm(sampler->ptdev)));
> + WRITE_ONCE(glb_iface->input->perfcnt_base, panthor_kernel_bo_gpuva(sampler->rb));
Some of the things you do here need to be done only once at perf init time, like
```
WRITE_ONCE(glb_iface->input->perfcnt_as, panthor_vm_as(panthor_fw_vm(sampler->ptdev)));
WRITE_ONCE(glb_iface->input->perfcnt_base, panthor_kernel_bo_gpuva(sampler->rb));
[...]
perfcnt_config = GLB_PERFCNT_CONFIG_SIZE(perf_ringbuf_slots);
[...]
if (sampler->ptdev->perf_info.flags & DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT)
perfcnt_config |= GLB_PERFCNT_METADATA_ENABLE;
```
However, this is done every single time a new session is introduced.
Also, perhaps it's a good idea to allocate the FW ring buffer only when the first session is registered,
and free it when no sessions are active, to save some memory.
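The allocate-on-first-session idea could look something like this standalone sketch. 'struct sampler', sampler_get() and sampler_put() are hypothetical, calloc()/free() stand in for the kernel BO create/destroy calls, and locking is left out (in the driver I would expect this to sit under sampler_lock):

```c
#include <assert.h>
#include <stdlib.h>

struct sampler {
	void *rb;           /* ring buffer, present only while users > 0 */
	size_t rb_size;     /* size to allocate for the ring buffer */
	unsigned int users; /* number of registered sessions */
};

/* Register a session; the first one allocates the ring buffer. */
static int sampler_get(struct sampler *s)
{
	assert(s);

	if (s->users == 0) {
		s->rb = calloc(1, s->rb_size);
		if (!s->rb)
			return -1;
	}

	s->users++;
	return 0;
}

/* Drop a session; the last one frees the ring buffer. */
static void sampler_put(struct sampler *s)
{
	assert(s && s->users > 0);

	if (--s->users == 0) {
		free(s->rb);
		s->rb = NULL;
	}
}
```

The one-time FW interface writes (address space, base address, ring size) would naturally go next to the allocation in the first-session path.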
> +
> + perfcnt_config = GLB_PERFCNT_CONFIG_SIZE(perf_ringbuf_slots);
> + perfcnt_config |= GLB_PERFCNT_CONFIG_SET(sampler->set_config);
> + if (sampler->ptdev->perf_info.flags & DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT)
> + perfcnt_config |= GLB_PERFCNT_METADATA_ENABLE;
> +
> + WRITE_ONCE(glb_iface->input->perfcnt_config, perfcnt_config);
> +
> + /**
> + * The spec mandates that the host zero the PRFCNT_EXTRACT register before an enable
> + * operation, and each (re-)enable will require an enable-disable pair to program
> + * the new changes onto the FW interface.
> + */
> + WRITE_ONCE(glb_iface->input->perfcnt_extract, 0);
> +}
> +
> +static void panthor_perf_fw_write_sampler_config(struct panthor_perf_sampler *sampler)
> +{
> + panthor_perf_fw_write_config(sampler, sampler->em);
> +}
> +
> +static void session_populate_sample_header(struct panthor_perf_session *session,
> + struct drm_panthor_perf_sample_header *hdr, u8 set)
> +{
> + *hdr = (struct drm_panthor_perf_sample_header) {
> + .block_set = set,
> + .user_data = session->user_data,
> + .timestamp_start_ns = session->sample_start_ns,
> + /**
> + * TODO This should be changed to use the GPU clocks and the TIMESTAMP register,
> + * when support is added.
Access to the timestamp registers has been available since the panthor fdinfo support patch series was merged.
> + */
> + .timestamp_end_ns = ktime_get_raw_ns(),
> + };
> +}
> +
> +/**
> + * session_accumulate_sample - Accumulate the counters that are requested by the session
> + * into the target buffer.
> + *
> + * @ptdev: Panthor device
> + * @session: Perf session
> + * @session_sample: Starting offset of the sample in the userspace mapping.
> + * @sampler_sample: Starting offset of the sample in the sampler intermediate buffer.
> + *
> + * The hardware supports counter selection at the granularity of 1 bit per 4 counters, and there
> + * is a single global FW frontend to program the counter requests from multiple sessions. This may
> + * lead to a large disparity between the requested and provided counters for an individual client.
> + * To remove this cross-talk, we patch out the counters that have not been requested by this
> + * session and update the PRFCNT_EN, the header counter containing a bitmask of enabled counters,
> + * accordingly.
> + */
> +static void session_accumulate_sample(struct panthor_device *ptdev,
> + struct panthor_perf_session *session,
> + u8 *session_sample, u8 *sampler_sample)
> +{
> + const struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
> + const size_t block_size = get_annotated_block_size(perf_info->counters_per_block);
> + const size_t sample_size = session_get_user_sample_size(perf_info);
> + const size_t sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
> + const size_t data_size = sample_size - sample_header_size;
> + struct drm_panthor_perf_sample_header *hdr = (typeof(hdr))session_sample;
> +
> + hdr->timestamp_end_ns = ktime_get_raw_ns();
> +
> + session_sample += sample_header_size;
> + sampler_sample += sample_header_size;
> +
> + for (size_t i = 0; i < data_size; i += block_size) {
> + size_t ctr_idx;
> + DECLARE_BITMAP(enabled_ctrs, PANTHOR_PERF_EM_BITS);
> + struct panthor_perf_counter_block *dst_blk = (typeof(dst_blk))(session_sample + i);
> + struct panthor_perf_counter_block *src_blk = (typeof(src_blk))(sampler_sample + i);
> +
> + bitmap_from_arr64(enabled_ctrs, dst_blk->header.enable_mask, PANTHOR_PERF_EM_BITS);
> + bitmap_clear(enabled_ctrs, 0, PANTHOR_HEADER_COUNTERS);
> +
> + dst_blk->counters[PANTHOR_CTR_TIMESTAMP_HI] =
> + src_blk->counters[PANTHOR_CTR_TIMESTAMP_HI];
> + dst_blk->counters[PANTHOR_CTR_TIMESTAMP_LO] =
> + src_blk->counters[PANTHOR_CTR_TIMESTAMP_LO];
> +
> + for_each_set_bit(ctr_idx, enabled_ctrs, PANTHOR_PERF_EM_BITS)
> + dst_blk->counters[ctr_idx] += src_blk->counters[ctr_idx];
> + }
> +}
> +
> +static void panthor_perf_fw_request_sample(struct panthor_perf_sampler *sampler)
> +{
> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(sampler->ptdev);
> +
> + panthor_fw_toggle_reqs(glb_iface, req, ack, GLB_PERFCNT_SAMPLE);
> + gpu_write(sampler->ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
> +}
> +
> +/**
> + * session_populate_sample - Write out a new sample into a previously populated slot in the user
> + * ringbuffer and update both the header of the block and the PRFCNT_EN
> + * counter to contain only the selected subset of counters for that block.
> + *
> + * @ptdev: Panthor device
> + * @session: Perf session
> + * @session_sample: Pointer aligned to the start of the data section of the sample in the targeted
> + * slot.
> + * @sampler_sample: Pointer aligned to the start of the data section of the intermediate sampler
> + * buffer.
> + *
> + * When a new sample slot is targeted, it must be cleared of the data already existing there,
> + * enabling a direct copy from the intermediate buffer and then zeroing out any counters
> + * that are not required for the current session.
> + */
> +static void session_populate_sample(struct panthor_device *ptdev,
> + struct panthor_perf_session *session, u8 *session_sample,
> + u8 *sampler_sample)
> +{
> + const struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
> +
> + const size_t block_size = get_annotated_block_size(perf_info->counters_per_block);
> + const size_t sample_size = session_get_user_sample_size(perf_info);
> + const size_t sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
> + const size_t data_size = sample_size - sample_header_size;
> +
> + memcpy(session_sample, sampler_sample, sample_size);
If you're overwriting the sample header on the next line anyway, maybe what you meant was to copy only the data section?
memcpy(session_sample + sample_header_size, sampler_sample + sample_header_size, data_size);
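To spell out the data-only copy: both pointers need to skip the header and the length has to shrink by the header size, otherwise the copy runs past the end of the destination slot. A standalone sketch, where copy_sample_data() is a hypothetical helper and both buffers are assumed to share the same header-then-data layout:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy only the data section of a sample, leaving dst's header untouched. */
static void copy_sample_data(unsigned char *dst, const unsigned char *src,
			     size_t sample_size, size_t header_size)
{
	assert(dst && src);
	assert(header_size <= sample_size);

	memcpy(dst + header_size, src + header_size, sample_size - header_size);
}
```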
> +
> + session_populate_sample_header(session,
> + (struct drm_panthor_perf_sample_header *)session_sample,
> + ptdev->perf->sampler.set_config);
> +
> + session_sample += sample_header_size;
> +
> + for (size_t i = 0; i < data_size; i += block_size) {
> + size_t ctr_idx;
> + DECLARE_BITMAP(em_diff, PANTHOR_PERF_EM_BITS);
> + struct panthor_perf_counter_block *blk = (typeof(blk))(session_sample + i);
> + enum drm_panthor_perf_block_type type = blk->header.block_type;
> + unsigned long *blk_em = session->enabled_counters->mask[type];
> +
> + bitmap_from_arr64(em_diff, blk->header.enable_mask, PANTHOR_PERF_EM_BITS);
> +
> + bitmap_andnot(em_diff, em_diff, blk_em, PANTHOR_PERF_EM_BITS);
> + bitmap_clear(em_diff, 0, PANTHOR_HEADER_COUNTERS);
> +
> + blk->counters[PANTHOR_CTR_PRFCNT_EN] = compress_enable_mask(blk_em);
> +
> + for_each_set_bit(ctr_idx, em_diff, PANTHOR_PERF_EM_BITS)
> + blk->counters[ctr_idx] = 0;
> +
> + bitmap_to_arr64(&blk->header.enable_mask, blk_em, PANTHOR_PERF_EM_BITS);
I'm wondering about the need to do this (writing the session enablement mask into the block header),
since we're already zeroing out the unrequested counters and userspace already knows which counters it requested.
> + }
> +}
> +
> +static int session_copy_sample(struct panthor_device *ptdev, struct panthor_perf_session *session)
> +{
> + struct panthor_perf *perf = ptdev->perf;
> + const size_t sample_size = session_get_user_sample_size(&ptdev->perf_info);
> + const u64 insert_idx = session_read_insert_idx(session);
> + const u64 extract_idx = session_read_extract_idx(session);
> + u8 *new_sample;
> +
> + if (!CIRC_SPACE_TO_END(insert_idx, extract_idx, session->ringbuf_slots))
> + return -ENOSPC;
> +
> + if (READ_ONCE(session->pending_sample_request) == SAMPLE_TYPE_INITIAL)
> + return 0;
> +
> + new_sample = session->samples + insert_idx * sample_size;
> +
> + if (session->accum_idx != insert_idx) {
> + session_populate_sample(ptdev, session, new_sample, perf->sampler.sample);
> + session->accum_idx = insert_idx;
> + } else
> + session_accumulate_sample(ptdev, session, new_sample, perf->sampler.sample);
> +
> + return 0;
> +}
> +
> +static void session_emit_sample(struct panthor_perf_session *session)
> +{
> + const u64 insert_idx = session_read_insert_idx(session);
> + const enum session_sample_type type = READ_ONCE(session->pending_sample_request);
> +
> + if (type == SAMPLE_TYPE_INITIAL || type == SAMPLE_TYPE_NONE)
> + goto reset_sample_request;
> +
> + session_write_insert_idx(session, (insert_idx + 1) % session->ringbuf_slots);
> +
> + /* Since we are about to notify userspace, we must ensure that all changes to memory
> + * are visible.
> + */
> + wmb();
> +
> + eventfd_signal(session->eventfd);
> +
> +reset_sample_request:
> + WRITE_ONCE(session->pending_sample_request, SAMPLE_TYPE_NONE);
> +}
> +
> +#define PRFCNT_IRQS (GLB_PERFCNT_OVERFLOW | GLB_PERFCNT_SAMPLE | GLB_PERFCNT_THRESHOLD)
> +
> +void panthor_perf_report_irq(struct panthor_device *ptdev, u32 status)
> +{
> + struct panthor_perf *const perf = ptdev->perf;
> + struct panthor_perf_sampler *sampler;
> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
> + bool sample_requested;
> +
> + if (!(status & JOB_INT_GLOBAL_IF))
> + return;
> +
> + if (!perf)
> + return;
> +
> + sampler = &perf->sampler;
> +
> + const u32 ack = READ_ONCE(glb_iface->output->ack);
> + const u32 req = READ_ONCE(glb_iface->input->req);
> +
> + scoped_guard(spinlock_irqsave, &sampler->pend_lock)
> + sample_requested = sampler->sample_requested;
> +
> +
> + /*
> + * TODO Fix up the error handling for overflow. Currently, the user is unblocked
> + * with a completely empty sample, whic is not the intended behaviour.
> + */
> + if (drm_WARN_ON_ONCE(&ptdev->base, (req ^ ack) & GLB_PERFCNT_OVERFLOW))
> + goto emit;
> +
> + if ((sample_requested && (req & GLB_PERFCNT_SAMPLE) == (ack & GLB_PERFCNT_SAMPLE)) ||
> + ((req ^ ack) & GLB_PERFCNT_THRESHOLD)) {
> + const u32 extract_idx = READ_ONCE(glb_iface->input->perfcnt_extract);
> + const u32 insert_idx = READ_ONCE(glb_iface->output->perfcnt_insert);
> +
> + /* If the sample was requested around a reset, some time may be needed
> + * for the FW interface to be updated, so we reschedule a sample
> + * and return immediately.
> + */
> + if (insert_idx == extract_idx) {
> + guard(spinlock_irqsave)(&sampler->pend_lock);
> + if (sampler->sample_requested)
> + panthor_perf_fw_request_sample(sampler);
> +
> + return;
> + }
> +
> + WRITE_ONCE(glb_iface->input->perfcnt_extract,
> + panthor_perf_handle_sample(ptdev, extract_idx, insert_idx));
Here you'd always be writing insert_idx into glb_iface->input->perfcnt_extract,
so is it really necessary to let panthor_perf_handle_sample return it?
> + }
> +
> + scoped_guard(mutex, &sampler->sampler_lock)
> + {
> + struct list_head *pos;
> +
> + list_for_each(pos, &sampler->session_list) {
> + struct panthor_perf_session *session = list_entry(pos,
> + struct panthor_perf_session, sessions);
> +
> + session_copy_sample(ptdev, session);
> + }
> + }
> +
> +emit:
> + scoped_guard(spinlock_irqsave, &sampler->pend_lock) {
> + struct list_head *pos, *tmp;
> +
> + list_for_each_safe(pos, tmp, &sampler->pending_samples) {
> + struct panthor_perf_session *session = list_entry(pos,
> + struct panthor_perf_session, pending);
> +
> + session_emit_sample(session);
> + list_del(pos);
> + session_put(session);
> + }
> +
> + sampler->sample_requested = false;
> + }
> +
> + memset(sampler->sample, 0, session_get_user_sample_size(&ptdev->perf_info));
I wonder why we'd want to zero out the intermediate sample buffer, since we don't need to do
that to tell the hardware that the FW sample was consumed (that is done in the FW ringbuffer),
and it'll be overwritten the next time a sample is produced by the FW anyway. However, if the next
time there's an IRQ notification for a sample it turns out that session->accum_idx == insert_idx,
perhaps we could defer the zeroing until then? Alternatively, adding a field to the
sample header in the sampler->sample buffer that would tell us whether it needs to be overwritten
in the next occurrence of a copy might be enough?
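The flag variant could be as simple as this standalone sketch. 'struct accum', NR_CTRS and both helpers are hypothetical and only model the idea, with plain arrays standing in for the intermediate buffer and the FW block:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NR_CTRS 4 /* hypothetical number of counters per block */

struct accum {
	uint64_t counters[NR_CTRS];
	bool stale; /* contents were consumed and must not be accumulated onto */
};

/* Fold a FW block into the buffer, resetting it first if it is stale. */
static void accum_add(struct accum *a, const uint32_t fw[NR_CTRS])
{
	assert(a && fw);

	if (a->stale) {
		memset(a->counters, 0, sizeof(a->counters));
		a->stale = false;
	}

	for (size_t i = 0; i < NR_CTRS; i++)
		a->counters[i] += fw[i];
}

/* Called once the sample has been emitted; defers the actual zeroing. */
static void accum_mark_consumed(struct accum *a)
{
	assert(a);
	a->stale = true;
}
```

That keeps the memset off the emit path and only pays for it when another sample actually arrives.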
> + complete(&sampler->sample_handled);
> +}
> +
> +static int panthor_perf_sampler_init(struct panthor_perf_sampler *sampler,
> + struct panthor_device *ptdev)
> +{
> + struct panthor_kernel_bo *bo;
> + u8 *sample;
> + int ret;
> +
> + ret = panthor_perf_setup_fw_buffer_desc(ptdev, sampler);
> + if (ret) {
> + drm_err(&ptdev->base,
> + "Failed to setup descriptor for FW ring buffer, err = %d", ret);
> + return ret;
> + }
> +
> + bo = panthor_kernel_bo_create(ptdev, panthor_fw_vm(ptdev),
> + sampler->desc.buffer_size * perf_ringbuf_slots,
> + DRM_PANTHOR_BO_NO_MMAP,
> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
> + PANTHOR_VM_KERNEL_AUTO_VA);
> +
> + if (IS_ERR_OR_NULL(bo))
> + return IS_ERR(bo) ? PTR_ERR(bo) : -ENOMEM;
> +
> + ret = panthor_kernel_bo_vmap(bo);
> + if (ret)
> + goto cleanup_bo;
> +
> + sample = kzalloc(session_get_user_sample_size(&ptdev->perf_info), GFP_KERNEL);
> + if (ZERO_OR_NULL_PTR(sample)) {
> + ret = -ENOMEM;
> + goto cleanup_vmap;
> + }
> +
> + sampler->rb = bo;
> + sampler->sample = sample;
> + sampler->sample_slots = perf_ringbuf_slots;
> + sampler->em = kzalloc(sizeof(*sampler->em), GFP_KERNEL);
> +
> + mutex_init(&sampler->sampler_lock);
> + spin_lock_init(&sampler->pend_lock);
> + INIT_LIST_HEAD(&sampler->session_list);
> + INIT_LIST_HEAD(&sampler->pending_samples);
> + init_completion(&sampler->sample_handled);
> +
> + sampler->ptdev = ptdev;
> +
> + return 0;
> +
> +cleanup_vmap:
> + panthor_kernel_bo_vunmap(bo);
> +
> +cleanup_bo:
> + panthor_kernel_bo_destroy(bo);
> +
> + return ret;
> +}
> +
> +static void panthor_perf_sampler_term(struct panthor_perf_sampler *sampler)
> +{
> + int ret;
> + bool requested;
> +
> + scoped_guard(spinlock_irqsave, &sampler->pend_lock)
> + requested = sampler->sample_requested;
> +
> + if (requested)
> + wait_for_completion_killable(&sampler->sample_handled);
> +
> + panthor_perf_fw_write_config(sampler, &(struct panthor_perf_enable_masks){});
When you remove a session, you first call 'panthor_perf_em_zero(sampler->em);'
and then compose a new global sampler enablement mask with the OR'd bitmap
of the different sessions' enablement masks. But if there are no sessions
left, you're guaranteed that sampler->em will be all zeros here,
so you can just call 'panthor_perf_fw_write_sampler_config(sampler)' and
inline the definition of 'panthor_perf_fw_write_config()' into it.
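The invariant this relies on can be shown with a small userspace model. The mask
layout (EM_WORDS 64-bit words) and the em_rebuild() helper are assumptions made for
illustration, not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified enable mask: a few 64-bit words. */
#define EM_WORDS 4
struct enable_masks { uint64_t mask[EM_WORDS]; };

/* Rebuild the global mask as the OR of all remaining sessions' masks. */
static void em_rebuild(struct enable_masks *global,
		       const struct enable_masks *sessions, size_t nr_sessions)
{
	assert(global && (sessions || nr_sessions == 0));

	/* Equivalent of zeroing the global mask first. */
	for (size_t w = 0; w < EM_WORDS; w++)
		global->mask[w] = 0;

	/* With no sessions left this loop never runs and the mask stays zero. */
	for (size_t s = 0; s < nr_sessions; s++)
		for (size_t w = 0; w < EM_WORDS; w++)
			global->mask[w] |= sessions[s].mask[w];
}
```

With zero sessions the rebuilt mask is all zeros, which is why writing the sampler's
own mask at termination should be equivalent to writing an explicitly empty one.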
> +
> + ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
> + if (ret)
> + drm_warn_once(&sampler->ptdev->base, "Sampler termination failed, ret = %d", ret);
> +
> + kfree(sampler->sample);
> +
> + panthor_kernel_bo_destroy(sampler->rb);
> +}
> +
> +static int panthor_perf_sampler_add(struct panthor_perf_sampler *sampler,
> + struct panthor_perf_session *session, u8 set)
> +{
> + int ret = 0;
> + struct panthor_perf_enable_masks *session_em = session->enabled_counters;
> +
> + guard(mutex)(&sampler->sampler_lock);
> +
> + /* Early check for whether a new set can be configured. */
> + if (!atomic_read(&sampler->enabled_clients))
> + sampler->set_config = set;
> + else
> + if (sampler->set_config != set)
> + return -EBUSY;
> +
> + panthor_perf_em_add(sampler->em, session_em);
> + ret = pm_runtime_resume_and_get(sampler->ptdev->base.dev);
> + if (ret)
> + return ret;
> +
> + if (atomic_read(&sampler->enabled_clients)) {
> + ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
> + if (ret)
> + return ret;
> + }
> +
> + panthor_perf_fw_write_sampler_config(sampler);
> +
> + ret = panthor_perf_fw_start_sampling(sampler->ptdev);
> + if (ret)
> + return ret;
> +
> + session_get(session);
> + list_add_tail(&session->sessions, &sampler->session_list);
> + atomic_inc(&sampler->enabled_clients);
> +
> + return 0;
> +}
> +
> +static int panthor_perf_sampler_remove_session(struct panthor_perf_sampler *sampler,
> + struct panthor_perf_session *session)
> +{
> + int ret;
> + struct list_head *snode;
> +
> + guard(mutex)(&sampler->sampler_lock);
> +
> + list_del_init(&session->sessions);
> + session_put(session);
> +
> + panthor_perf_em_zero(sampler->em);
> + list_for_each(snode, &sampler->session_list)
> + {
> + struct panthor_perf_session *session =
> + container_of(snode, typeof(*session), sessions);
> +
> + panthor_perf_em_add(sampler->em, session->enabled_counters);
> + }
> +
> + ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
> + if (ret)
> + return ret;
> +
> + atomic_dec(&sampler->enabled_clients);
> + pm_runtime_put_sync(sampler->ptdev->base.dev);
> +
> + panthor_perf_fw_write_sampler_config(sampler);
> +
> + if (atomic_read(&sampler->enabled_clients))
> + return panthor_perf_fw_start_sampling(sampler->ptdev);
> + return 0;
> +}
> +
> /**
> * panthor_perf_init - Initialize the performance counter subsystem.
> * @ptdev: Panthor device
> @@ -382,6 +1352,10 @@ int panthor_perf_init(struct panthor_device *ptdev)
> .max = 1,
> };
>
> + ret = panthor_perf_sampler_init(&perf->sampler, ptdev);
> + if (ret)
> + return ret;
> +
> drm_info(&ptdev->base, "Performance counter subsystem initialized");
>
> ptdev->perf = no_free_ptr(perf);
> @@ -389,6 +1363,69 @@ int panthor_perf_init(struct panthor_device *ptdev)
> return ret;
> }
>
> +static int sampler_request(struct panthor_perf_sampler *sampler,
> + struct panthor_perf_session *session, enum session_sample_type type)
> +{
> + guard(spinlock_irqsave)(&sampler->pend_lock);
You're extending the lock to the entire function, but the only time you
modify the pending_samples list is after getting the session, so why not
just limit the critical section to that statement?
> +
> + /*
> + * If a previous sample has not been handled yet, the session cannot request another
> + * sample. If this happens too often, the requested sample rate is too high.
> + */
> + if (READ_ONCE(session->pending_sample_request) != SAMPLE_TYPE_NONE)
> + return -EBUSY;
> +
> + WRITE_ONCE(session->pending_sample_request, type);
> + session_get(session);
Why do we increase the refcount for the session here?
Is it because someone might try to tear the session down while a sample
is being waited for?
I think that, in that case, the sample processing logic could determine that
no sessions are left and then refuse to copy the FW sample.
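For reference, the lifetime pattern in question looks like this in a minimal
userspace sketch. The struct and the `released` flag are hypothetical stand-ins,
and the sketch assumes the reference taken for a pending request is dropped once
the sample has been delivered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted session. */
struct session {
	int refcount;
	bool released;  /* stands in for the object actually being freed */
};

/* Take an extra reference, e.g. while a sample request is pending. */
static void session_get(struct session *s)
{
	assert(s && s->refcount > 0);
	s->refcount++;
}

/* Drop one reference; returns true when this was the last one. */
static bool session_put(struct session *s)
{
	assert(s && s->refcount > 0);
	if (--s->refcount == 0) {
		s->released = true;
		return true;
	}
	return false;
}
```

Under that assumption, a teardown racing with a pending request only drops the
owner's reference, and the session stays valid until the sample path drops its own.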
> + list_add_tail(&session->pending, &sampler->pending_samples);
> +
> + if (!sampler->sample_requested) {
> + reinit_completion(&sampler->sample_handled);
> + sampler->sample_requested = true;
> + panthor_perf_fw_request_sample(sampler);
> + }
> +
> + return 0;
> +}
> +
> +/**
> + * panthor_perf_sampler_request_initial - Request an initial sample.
> + * @sampler: Panthor sampler
> + * @session: Target session
> + *
> + * Perform a synchronous sample that gets immediately discarded. This sets a baseline at the point
> + * of time a new session is started, to avoid having counters from before the session.
> + */
> +static int panthor_perf_sampler_request_initial(struct panthor_perf_sampler *sampler,
> + struct panthor_perf_session *session)
> +{
> + int ret = sampler_request(sampler, session, SAMPLE_TYPE_INITIAL);
> +
> + if (ret)
> + return ret;
> +
> + return wait_for_completion_timeout(&sampler->sample_handled,
> + msecs_to_jiffies(1000));
> +}
> +
> +/**
> + * panthor_perf_sampler_request_sample - Request a counter sample for the userspace client.
> + * @sampler: Panthor sampler
> + * @session: Target session
> + *
> + * A session that has already requested a sample cannot request another one until the previous
> + * sample has been delivered.
> + *
> + * Return:
> + * * %0 - The sample has been requested successfully.
> + * * %-EBUSY - The target session has already requested a sample and has not received it yet.
> + */
> +static int panthor_perf_sampler_request_sample(struct panthor_perf_sampler *sampler,
> + struct panthor_perf_session *session)
> +{
> + return sampler_request(sampler, session, SAMPLE_TYPE_REGULAR);
> +}
> +
> static int session_validate_set(u8 set)
> {
> if (set > DRM_PANTHOR_PERF_SET_TERTIARY)
> @@ -417,8 +1454,8 @@ static int session_validate_set(u8 set)
> * Return: non-negative session identifier on success or negative error code on failure.
> */
> int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
> - struct drm_panthor_perf_cmd_setup *setup_args,
> - struct panthor_file *pfile)
> + struct drm_panthor_perf_cmd_setup *setup_args,
> + struct panthor_file *pfile)
> {
> struct panthor_perf_session *session;
> struct drm_gem_object *ringbuffer;
> @@ -510,6 +1547,10 @@ int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf
> kref_init(&session->ref);
> session->enabled_counters = em;
>
> + ret = panthor_perf_sampler_add(&perf->sampler, session, setup_args->block_set);
> + if (ret)
> + goto cleanup_xa_alloc;
> +
> session->sample_freq_ns = setup_args->sample_freq_ns;
> session->user_sample_size = user_sample_size;
> session->ring_buf = ringbuffer;
> @@ -520,6 +1561,9 @@ int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf
>
> return session_id;
>
> +cleanup_xa_alloc:
> + xa_store(&perf->sessions, session_id, NULL, GFP_KERNEL);
> +
> cleanup_em:
> kfree(em);
>
> @@ -545,8 +1589,10 @@ int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf
> }
>
> static int session_stop(struct panthor_perf *perf, struct panthor_perf_session *session,
> - u64 user_data)
> + u64 user_data)
You've changed the indentation of a few function headers in this
commit. It's best to fix it in the original one right away.
> {
> + int ret;
> +
> if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
> return 0;
>
> @@ -559,14 +1605,17 @@ static int session_stop(struct panthor_perf *perf, struct panthor_perf_session *
>
> session->user_data = user_data;
>
> + ret = panthor_perf_sampler_request_sample(&perf->sampler, session);
> + if (ret)
> + return ret;
> +
> clear_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state);
>
> - /* TODO Calls to the FW interface will go here in later patches. */
> return 0;
> }
>
> static int session_start(struct panthor_perf *perf, struct panthor_perf_session *session,
> - u64 user_data)
> + u64 user_data)
> {
> if (test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
> return 0;
> @@ -580,12 +1629,11 @@ static int session_start(struct panthor_perf *perf, struct panthor_perf_session
> if (session->sample_freq_ns)
> session->user_data = user_data;
>
> - /* TODO Calls to the FW interface will go here in later patches. */
> - return 0;
> + return panthor_perf_sampler_request_initial(&perf->sampler, session);
> }
>
> static int session_sample(struct panthor_perf *perf, struct panthor_perf_session *session,
> - u64 user_data)
> + u64 user_data)
> {
> if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
> return 0;
> @@ -608,14 +1656,16 @@ static int session_sample(struct panthor_perf *perf, struct panthor_perf_session
> session->sample_start_ns = ktime_get_raw_ns();
> session->user_data = user_data;
>
> - return 0;
> + return panthor_perf_sampler_request_sample(&perf->sampler, session);
> }
>
> static int session_destroy(struct panthor_perf *perf, struct panthor_perf_session *session)
> {
> + int ret = panthor_perf_sampler_remove_session(&perf->sampler, session);
> +
> session_put(session);
>
> - return 0;
> + return ret;
> }
>
> static int session_teardown(struct panthor_perf *perf, struct panthor_perf_session *session)
> @@ -691,7 +1741,7 @@ int panthor_perf_session_teardown(struct panthor_file *pfile, struct panthor_per
> * Return: 0 on success, negative error code on failure.
> */
> int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *perf,
> - u32 sid, u64 user_data)
> + u32 sid, u64 user_data)
> {
> struct panthor_perf_session *session = session_find(pfile, perf, sid);
> int err;
> @@ -724,7 +1774,7 @@ int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *
> * Return: 0 on success, negative error code on failure.
> */
> int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *perf,
> - u32 sid, u64 user_data)
> + u32 sid, u64 user_data)
> {
> struct panthor_perf_session *session = session_find(pfile, perf, sid);
> int err;
> @@ -755,7 +1805,7 @@ int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *p
> * Return: 0 on success, negative error code on failure.
> */
> int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf *perf,
> - u32 sid, u64 user_data)
> + u32 sid, u64 user_data)
> {
> struct panthor_perf_session *session = session_find(pfile, perf, sid);
> int err;
> @@ -822,6 +1872,8 @@ void panthor_perf_unplug(struct panthor_device *ptdev)
>
> xa_destroy(&perf->sessions);
>
> + panthor_perf_sampler_term(&perf->sampler);
> +
> kfree(ptdev->perf);
>
> ptdev->perf = NULL;
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
> index 89d61cd1f017..c482198b6fbd 100644
> --- a/drivers/gpu/drm/panthor/panthor_perf.h
> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
> @@ -28,5 +28,7 @@ int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf
> u32 sid, u64 user_data);
> void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_perf *perf);
>
> +void panthor_perf_report_irq(struct panthor_device *ptdev, u32 status);
> +
> #endif /* __PANTHOR_PERF_H__ */
>
> --
> 2.33.0.dirty
Adrian Larumbe
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 6/7] drm/panthor: Add suspend, resume and reset handling
2025-05-16 15:49 ` [PATCH v4 6/7] drm/panthor: Add suspend, resume and reset handling Lukas Zapolskas
@ 2025-07-18 15:01 ` Adrián Larumbe
2025-07-25 9:26 ` Lukas Zapolskas
0 siblings, 1 reply; 29+ messages in thread
From: Adrián Larumbe @ 2025-07-18 15:01 UTC (permalink / raw)
To: Lukas Zapolskas
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 16.05.2025 16:49, Lukas Zapolskas wrote:
> The sampler must disable and re-enable counter sampling around suspends,
> and must re-program the FW interface after a reset to avoid losing
> data.
>
> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> ---
> drivers/gpu/drm/panthor/panthor_device.c | 7 +-
> drivers/gpu/drm/panthor/panthor_perf.c | 102 +++++++++++++++++++++++
> drivers/gpu/drm/panthor/panthor_perf.h | 6 ++
> 3 files changed, 114 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index 7ac985d44655..92624a8717c5 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -139,6 +139,7 @@ static void panthor_device_reset_work(struct work_struct *work)
> if (!drm_dev_enter(&ptdev->base, &cookie))
> return;
>
> + panthor_perf_pre_reset(ptdev);
> panthor_sched_pre_reset(ptdev);
> panthor_fw_pre_reset(ptdev, true);
> panthor_mmu_pre_reset(ptdev);
> @@ -148,6 +149,7 @@ static void panthor_device_reset_work(struct work_struct *work)
> ret = panthor_fw_post_reset(ptdev);
> atomic_set(&ptdev->reset.pending, 0);
> panthor_sched_post_reset(ptdev, ret != 0);
> + panthor_perf_post_reset(ptdev);
> drm_dev_exit(cookie);
>
> if (ret) {
> @@ -496,8 +498,10 @@ int panthor_device_resume(struct device *dev)
> ret = panthor_device_resume_hw_components(ptdev);
> }
>
> - if (!ret)
> + if (!ret) {
> panthor_sched_resume(ptdev);
> + panthor_perf_resume(ptdev);
> + }
>
> drm_dev_exit(cookie);
>
> @@ -561,6 +565,7 @@ int panthor_device_suspend(struct device *dev)
> /* We prepare everything as if we were resetting the GPU.
> * The end of the reset will happen in the resume path though.
> */
> + panthor_perf_suspend(ptdev);
> panthor_sched_suspend(ptdev);
> panthor_fw_suspend(ptdev);
> panthor_mmu_suspend(ptdev);
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
> index 97603b168d2d..438319cf71ab 100644
> --- a/drivers/gpu/drm/panthor/panthor_perf.c
> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
> @@ -1845,6 +1845,76 @@ void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_per
> }
> }
>
> +static int panthor_perf_sampler_resume(struct panthor_perf_sampler *sampler)
> +{
> + int ret;
> +
> + if (!atomic_read(&sampler->enabled_clients))
> + return 0;
> +
> + ret = panthor_perf_fw_start_sampling(sampler->ptdev);
> + if (ret)
> + return ret;
> +
> + return 0;
> +}
> +
> +static int panthor_perf_sampler_suspend(struct panthor_perf_sampler *sampler)
> +{
> + int ret;
> +
> + if (!atomic_read(&sampler->enabled_clients))
> + return 0;
> +
> + ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
> + if (ret)
> + return ret;
> +
> + return 0;
> +}
> +
> +/**
> + * panthor_perf_suspend - Prepare the performance counter subsystem for system suspend.
> + * @ptdev: Panthor device.
> + *
> + * Indicate to the performance counters that the system is suspending.
> + *
> + * This function must not be used to handle MCU power state transitions: just before MCU goes
> + * from on to any inactive state, an automatic sample will be performed by the firmware, and
> + * the performance counter firmware state will be restored on warm boot.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int panthor_perf_suspend(struct panthor_device *ptdev)
> +{
> + struct panthor_perf *perf = ptdev->perf;
> +
> + if (!perf)
> + return 0;
> +
> + return panthor_perf_sampler_suspend(&perf->sampler);
> +}
> +
> +/**
> + * panthor_perf_resume - Resume the performance counter subsystem after system resumption.
> + * @ptdev: Panthor device.
> + *
> + * Indicate to the performance counters that the system has resumed. This must not be used
> + * to handle MCU state transitions, for the same reasons as detailed in the kerneldoc for
> + * @panthor_perf_suspend.
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int panthor_perf_resume(struct panthor_device *ptdev)
> +{
> + struct panthor_perf *perf = ptdev->perf;
> +
> + if (!perf)
> + return 0;
> +
> + return panthor_perf_sampler_resume(&perf->sampler);
> +}
In the two previous functions, you return an int, but the callers never
use it. Also, in both of them, for the sake of consistency, I'd get rid of
the *sampler* subcalls, because later in 'panthor_perf_pre_reset' and
'panthor_perf_post_reset' you manipulate the sampler directly without
delegating to another function. The functions are short enough that the
content of 'panthor_perf_sampler_resume' can be inlined into
'panthor_perf_resume', and likewise for the suspend pair.
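Roughly the shape being suggested, as a userspace sketch. All types here are
simplified stand-ins, fw_start_sampling() is a stub for the FW call, and logging to
stderr is only an assumption about how a failure might be reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins for the driver types. */
struct sampler { int enabled_clients; int sampling; };
struct perf { struct sampler sampler; };
struct device { struct perf *perf; };

/* Stub for the FW start-sampling call; always succeeds in this model. */
static int fw_start_sampling(struct sampler *s)
{
	s->sampling = 1;
	return 0;
}

/* Resume with the sampler helper folded in and no unused return value. */
static void perf_resume(struct device *dev)
{
	assert(dev);

	/* Nothing to do without a perf context or without active clients. */
	if (!dev->perf || !dev->perf->sampler.enabled_clients)
		return;

	if (fw_start_sampling(&dev->perf->sampler))
		fprintf(stderr, "failed to restart counter sampling\n");
}
```

The suspend side would mirror this with the stop call, which keeps all four
suspend/resume/reset hooks structured the same way.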
> +
> /**
> * panthor_perf_unplug - Terminate the performance counter subsystem.
> * @ptdev: Panthor device.
> @@ -1878,3 +1948,35 @@ void panthor_perf_unplug(struct panthor_device *ptdev)
>
> ptdev->perf = NULL;
> }
> +
> +void panthor_perf_pre_reset(struct panthor_device *ptdev)
> +{
> + struct panthor_perf_sampler *sampler;
> +
> + if (!ptdev || !ptdev->perf)
> + return;
> +
> + sampler = &ptdev->perf->sampler;
> +
> + if (!atomic_read(&sampler->enabled_clients))
> + return;
> +
> + panthor_perf_fw_stop_sampling(sampler->ptdev);
> +}
> +
> +void panthor_perf_post_reset(struct panthor_device *ptdev)
> +{
> + struct panthor_perf_sampler *sampler;
> +
> + if (!ptdev || !ptdev->perf)
> + return;
In both this function and the preceding one, ptdev is meant to be
available by the time they're called, so I'd turn the check of ptdev not
being null into a drm_WARN().
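A userspace sketch of the distinction, with warn_on() standing in for the kernel
warning macro. The helper, the counter and the simplified types are all
hypothetical; only the control flow is the point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static int warn_count;

/* Userspace stand-in for a kernel WARN: report loudly, return the condition. */
static bool warn_on(bool cond, const char *msg)
{
	assert(msg);
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Hypothetical, simplified device and perf context. */
struct perf { int enabled_clients; int sampling; };
struct device { struct perf *perf; };

static void perf_pre_reset(struct device *dev)
{
	/* A NULL device is a caller bug, so flag it instead of hiding it. */
	if (warn_on(!dev, "pre_reset called without a device"))
		return;

	/* A missing perf context or no clients is a legitimate no-op. */
	if (!dev->perf || !dev->perf->enabled_clients)
		return;

	dev->perf->sampling = 0; /* stands in for the FW stop-sampling call */
}
```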
> +
> + sampler = &ptdev->perf->sampler;
> +
> + if (!atomic_read(&sampler->enabled_clients))
> + return;
> +
> + panthor_perf_fw_write_sampler_config(sampler);
> +
> + panthor_perf_fw_start_sampling(sampler->ptdev);
> +}
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
> index c482198b6fbd..fc08a5440a35 100644
> --- a/drivers/gpu/drm/panthor/panthor_perf.h
> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
> @@ -13,6 +13,8 @@ struct panthor_file;
> struct panthor_perf;
>
> int panthor_perf_init(struct panthor_device *ptdev);
> +int panthor_perf_suspend(struct panthor_device *ptdev);
> +int panthor_perf_resume(struct panthor_device *ptdev);
> void panthor_perf_unplug(struct panthor_device *ptdev);
>
> int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
> @@ -30,5 +32,9 @@ void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_per
>
> void panthor_perf_report_irq(struct panthor_device *ptdev, u32 status);
>
> +void panthor_perf_pre_reset(struct panthor_device *ptdev);
> +
> +void panthor_perf_post_reset(struct panthor_device *ptdev);
> +
> #endif /* __PANTHOR_PERF_H__ */
>
> --
> 2.33.0.dirty
Adrian Larumbe
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 7/7] drm/panthor: Expose the panthor perf ioctls
2025-05-16 15:49 ` [PATCH v4 7/7] drm/panthor: Expose the panthor perf ioctls Lukas Zapolskas
@ 2025-07-18 15:05 ` Adrián Larumbe
2025-07-18 15:19 ` Adrián Larumbe
1 sibling, 0 replies; 29+ messages in thread
From: Adrián Larumbe @ 2025-07-18 15:05 UTC (permalink / raw)
To: Lukas Zapolskas
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Hi Lukas, this whole patch is alright, but don't forget to bump the
driver minor revision up, because 1.5 has already been assigned:
1.5 - adds DRM_PANTHOR_SET_USER_MMIO_OFFSET ioctl
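The minor matters because userspace typically gates feature use on it. A minimal
sketch of such a check follows; the struct and helper are hypothetical, and the
minor that will finally advertise the perf ioctls is deliberately left as a
parameter since it is not settled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical holder for a driver's major.minor version. */
struct drv_version { int major; int minor; };

/* True if the running driver is at least the version that introduced a feature. */
static bool drv_version_at_least(struct drv_version have, struct drv_version need)
{
	assert(have.major >= 0 && need.major >= 0);

	if (have.major != need.major)
		return have.major > need.major;
	return have.minor >= need.minor;
}
```

If two features claimed the same minor, a client seeing that minor on a kernel
carrying only one of them would wrongly assume the other is present as well.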
On 16.05.2025 16:49, Lukas Zapolskas wrote:
> This patch implements the PANTHOR_PERF_CONTROL ioctl series, and
> a PANTHOR_GET_UOBJ wrapper to deal with the backwards and forwards
> compatibility of the uAPI.
>
> The minor version is bumped to indicate that the feature is now
> supported.
>
> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
> drivers/gpu/drm/panthor/panthor_drv.c | 141 +++++++++++++++++++++++++-
> 1 file changed, 139 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 4c1381320859..850a894fe91b 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -31,6 +31,7 @@
> #include "panthor_gpu.h"
> #include "panthor_heap.h"
> #include "panthor_mmu.h"
> +#include "panthor_perf.h"
> #include "panthor_regs.h"
> #include "panthor_sched.h"
>
> @@ -73,6 +74,39 @@ panthor_set_uobj(u64 usr_ptr, u32 usr_size, u32 min_size, u32 kern_size, const v
> return 0;
> }
>
> +/**
> + * panthor_get_uobj() - Copy kernel object to user object.
> + * @usr_ptr: Users pointer.
> + * @usr_size: Size of the user object.
> + * @min_size: Minimum size for this object.
> + *
> + * Helper automating kernel -> user object copies.
> + *
> + * Don't use this function directly, use PANTHOR_UOBJ_GET() instead.
> + *
> + * Return: valid pointer on success, an encoded error code otherwise.
> + */
> +static void*
> +panthor_get_uobj(u64 usr_ptr, u32 usr_size, u32 min_size)
> +{
> + int ret;
> + void *out_alloc __free(kvfree) = NULL;
> +
> + /* User size shouldn't be smaller than the minimal object size. */
> + if (usr_size < min_size)
> + return ERR_PTR(-EINVAL);
> +
> + out_alloc = kvmalloc(min_size, GFP_KERNEL);
> + if (!out_alloc)
> + return ERR_PTR(-ENOMEM);
> +
> + ret = copy_struct_from_user(out_alloc, min_size, u64_to_user_ptr(usr_ptr), usr_size);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + return_ptr(out_alloc);
> +}
> +
> /**
> * panthor_get_uobj_array() - Copy a user object array into a kernel accessible object array.
> * @in: The object array to copy.
> @@ -176,7 +210,12 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
> - PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks))
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_setup, shader_enable_mask), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_start, user_data), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_stop, user_data), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_sample, user_data))
> +
>
> /**
> * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
> @@ -191,6 +230,24 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
> PANTHOR_UOBJ_MIN_SIZE(_src_obj), \
> sizeof(_src_obj), &(_src_obj))
>
> +/**
> + * PANTHOR_UOBJ_GET() - Copies a user object from _usr_ptr to a kernel accessible _dest_ptr.
> + * @_dest_ptr: Local variable
> + * @_usr_size: Size of the user object.
> + * @_usr_ptr: The pointer of the object in userspace.
> + *
> + * Return: Error code. See panthor_get_uobj().
> + */
> +#define PANTHOR_UOBJ_GET(_dest_ptr, _usr_size, _usr_ptr) \
> + ({ \
> + typeof(_dest_ptr) _tmp; \
> + _tmp = panthor_get_uobj(_usr_ptr, _usr_size, \
> + PANTHOR_UOBJ_MIN_SIZE(_tmp[0])); \
> + if (!IS_ERR(_tmp)) \
> + _dest_ptr = _tmp; \
> + PTR_ERR_OR_ZERO(_tmp); \
> + })
> +
> /**
> * PANTHOR_UOBJ_GET_ARRAY() - Copy a user object array to a kernel accessible
> * object array.
> @@ -1339,6 +1396,83 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data,
> return 0;
> }
>
> +#define perf_cmd(command) \
> + ({ \
> + struct drm_panthor_perf_cmd_##command *command##_args __free(kvfree) = NULL; \
> + int _ret = PANTHOR_UOBJ_GET(command##_args, args->size, args->pointer); \
> + if (_ret) \
> + return _ret; \
> + return panthor_perf_session_##command(pfile, ptdev->perf, args->handle, \
> + command##_args->user_data); \
> + })
> +
> +static int panthor_ioctl_perf_control(struct drm_device *ddev, void *data,
> + struct drm_file *file)
> +{
> + struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
> + struct panthor_file *pfile = file->driver_priv;
> + struct drm_panthor_perf_control *args = data;
> + int ret;
> +
> + if (!args->pointer) {
> + switch (args->cmd) {
> + case DRM_PANTHOR_PERF_COMMAND_SETUP:
> + args->size = sizeof(struct drm_panthor_perf_cmd_setup);
> + return 0;
> +
> + case DRM_PANTHOR_PERF_COMMAND_TEARDOWN:
> + args->size = 0;
> + return 0;
> +
> + case DRM_PANTHOR_PERF_COMMAND_START:
> + args->size = sizeof(struct drm_panthor_perf_cmd_start);
> + return 0;
> +
> + case DRM_PANTHOR_PERF_COMMAND_STOP:
> + args->size = sizeof(struct drm_panthor_perf_cmd_stop);
> + return 0;
> +
> + case DRM_PANTHOR_PERF_COMMAND_SAMPLE:
> + args->size = sizeof(struct drm_panthor_perf_cmd_sample);
> + return 0;
> +
> + default:
> + return -EINVAL;
> + }
> + }
> +
> + switch (args->cmd) {
> + case DRM_PANTHOR_PERF_COMMAND_SETUP:
> + {
> + struct drm_panthor_perf_cmd_setup *setup_args __free(kvfree) = NULL;
> +
> + ret = PANTHOR_UOBJ_GET(setup_args, args->size, args->pointer);
> + if (ret)
> + return -EINVAL;
> +
> + return panthor_perf_session_setup(ptdev, ptdev->perf, setup_args, pfile);
> + }
> + case DRM_PANTHOR_PERF_COMMAND_TEARDOWN:
> + {
> + return panthor_perf_session_teardown(pfile, ptdev->perf, args->handle);
> + }
> + case DRM_PANTHOR_PERF_COMMAND_START:
> + {
> + perf_cmd(start);
> + }
> + case DRM_PANTHOR_PERF_COMMAND_STOP:
> + {
> + perf_cmd(stop);
> + }
> + case DRM_PANTHOR_PERF_COMMAND_SAMPLE:
> + {
> + perf_cmd(sample);
> + }
> + default:
> + return -EINVAL;
> + }
> +}
> +
> static int
> panthor_open(struct drm_device *ddev, struct drm_file *file)
> {
> @@ -1409,6 +1543,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
> PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW),
> PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW),
> PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW),
> + PANTHOR_IOCTL(PERF_CONTROL, perf_control, DRM_RENDER_ALLOW),
> };
>
> static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
> @@ -1518,6 +1653,8 @@ static void panthor_debugfs_init(struct drm_minor *minor)
> * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query
> * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority
> * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
> + * - 1.4 - adds DEV_QUERY_PERF_INFO query
> + * - adds PERF_CONTROL ioctl
> */
> static const struct drm_driver panthor_drm_driver = {
> .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
> @@ -1531,7 +1668,7 @@ static const struct drm_driver panthor_drm_driver = {
> .name = "panthor",
> .desc = "Panthor DRM driver",
> .major = 1,
> - .minor = 3,
> + .minor = 4,
>
> .gem_create_object = panthor_gem_create_object,
> .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
> --
> 2.33.0.dirty
Adrian Larumbe
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10
2025-05-16 15:49 ` [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10 Lukas Zapolskas
2025-07-18 2:52 ` Adrián Larumbe
@ 2025-07-18 15:11 ` Adrián Larumbe
2025-07-21 9:06 ` Lukas Zapolskas
1 sibling, 1 reply; 29+ messages in thread
From: Adrián Larumbe @ 2025-07-18 15:11 UTC (permalink / raw)
To: Lukas Zapolskas
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Hi Lukas, forgot to add one comment in the previous patch review,
On 16.05.2025 16:49, Lukas Zapolskas wrote:
> This change adds the IOCTL to query data about the performance counter
> setup. Some of this data was available via previous DEV_QUERY calls,
> for instance for GPU info, but exposing it via PERF_INFO
> minimizes the overhead of creating a single session to just the one
> aggregate IOCTL.
>
> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
> drivers/gpu/drm/panthor/Makefile | 1 +
> drivers/gpu/drm/panthor/panthor_device.c | 5 ++
> drivers/gpu/drm/panthor/panthor_device.h | 3 +
> drivers/gpu/drm/panthor/panthor_drv.c | 10 +++-
> drivers/gpu/drm/panthor/panthor_fw.h | 3 +
> drivers/gpu/drm/panthor/panthor_perf.c | 76 ++++++++++++++++++++++++
> drivers/gpu/drm/panthor/panthor_perf.h | 15 +++++
> drivers/gpu/drm/panthor/panthor_regs.h | 1 +
> 8 files changed, 113 insertions(+), 1 deletion(-)
> create mode 100644 drivers/gpu/drm/panthor/panthor_perf.c
> create mode 100644 drivers/gpu/drm/panthor/panthor_perf.h
>
> diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile
> index 15294719b09c..0df9947f3575 100644
> --- a/drivers/gpu/drm/panthor/Makefile
> +++ b/drivers/gpu/drm/panthor/Makefile
> @@ -9,6 +9,7 @@ panthor-y := \
> panthor_gpu.o \
> panthor_heap.o \
> panthor_mmu.o \
> + panthor_perf.o \
> panthor_sched.o
>
> obj-$(CONFIG_DRM_PANTHOR) += panthor.o
> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
> index a9da1d1eeb70..76b4cf3dc391 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.c
> +++ b/drivers/gpu/drm/panthor/panthor_device.c
> @@ -19,6 +19,7 @@
> #include "panthor_fw.h"
> #include "panthor_gpu.h"
> #include "panthor_mmu.h"
> +#include "panthor_perf.h"
> #include "panthor_regs.h"
> #include "panthor_sched.h"
>
> @@ -259,6 +260,10 @@ int panthor_device_init(struct panthor_device *ptdev)
> if (ret)
> goto err_unplug_fw;
>
> + ret = panthor_perf_init(ptdev);
> + if (ret)
> + goto err_unplug_fw;
> +
> /* ~3 frames */
> pm_runtime_set_autosuspend_delay(ptdev->base.dev, 50);
> pm_runtime_use_autosuspend(ptdev->base.dev);
> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> index da6574021664..657ccc39568c 100644
> --- a/drivers/gpu/drm/panthor/panthor_device.h
> +++ b/drivers/gpu/drm/panthor/panthor_device.h
> @@ -120,6 +120,9 @@ struct panthor_device {
> /** @csif_info: Command stream interface information. */
> struct drm_panthor_csif_info csif_info;
>
> + /** @perf_info: Performance counter interface information. */
> + struct drm_panthor_perf_info perf_info;
> +
> /** @gpu: GPU management data. */
> struct panthor_gpu *gpu;
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 06fe46e32073..9d2b716cca45 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -175,7 +175,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
> PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
> - PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
> + PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks))
>
> /**
> * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
> @@ -835,6 +836,10 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
> args->size = sizeof(priorities_info);
> return 0;
>
> + case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
> + args->size = sizeof(ptdev->perf_info);
> + return 0;
> +
> default:
> return -EINVAL;
> }
> @@ -859,6 +864,9 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
> panthor_query_group_priorities_info(file, &priorities_info);
> return PANTHOR_UOBJ_SET(args->pointer, args->size, priorities_info);
>
> + case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
> + return PANTHOR_UOBJ_SET(args->pointer, args->size, ptdev->perf_info);
> +
> default:
> return -EINVAL;
> }
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
> index 6598d96c6d2a..8bcb933fa790 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.h
> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
> @@ -197,8 +197,11 @@ struct panthor_fw_global_control_iface {
> u32 output_va;
> u32 group_num;
> u32 group_stride;
> +#define GLB_PERFCNT_FW_SIZE(x) ((((x) >> 16) << 8))
> u32 perfcnt_size;
> u32 instr_features;
> +#define PERFCNT_FEATURES_MD_SIZE(x) (((x) & GENMASK(3, 0)) << 8)
> + u32 perfcnt_features;
> };
>
> struct panthor_fw_global_input_iface {
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
> new file mode 100644
> index 000000000000..66e9a197ac1f
> --- /dev/null
> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0 or MIT
> +/* Copyright 2023 Collabora Ltd */
> +/* Copyright 2025 Arm ltd. */
> +
> +#include <linux/bitops.h>
> +#include <drm/panthor_drm.h>
> +
> +#include "panthor_device.h"
> +#include "panthor_fw.h"
> +#include "panthor_perf.h"
> +
> +struct panthor_perf_counter_block {
> + struct drm_panthor_perf_block_header header;
> + u64 counters[];
> +};
> +
> +static size_t get_annotated_block_size(size_t counters_per_block)
> +{
> + return struct_size_t(struct panthor_perf_counter_block, counters, counters_per_block);
> +}
> +
> +static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
> +{
> + const size_t block_size = get_annotated_block_size(info->counters_per_block);
> + const size_t block_nr = info->cshw_blocks + info->fw_blocks +
> + info->tiler_blocks + info->memsys_blocks + info->shader_blocks;
> +
> + return sizeof(struct drm_panthor_perf_sample_header) + (block_size * block_nr);
I think you could use 'info->sample_header_size' here, since the calling function assigns it before making this call.
> +}
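For illustration only, a minimal standalone sketch of the size computation when the header size is read back from the info struct rather than recomputed with sizeof(). The struct perf_info_sketch and sketch_sample_size() names are hypothetical stand-ins, not part of the patch, and the block size assumes the block header is a multiple of 8 bytes so that no padding precedes the counters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant fields of struct drm_panthor_perf_info. */
struct perf_info_sketch {
	uint32_t counters_per_block;
	uint32_t sample_header_size;
	uint32_t block_header_size;
	uint32_t fw_blocks;
	uint32_t cshw_blocks;
	uint32_t tiler_blocks;
	uint32_t memsys_blocks;
	uint32_t shader_blocks;
};

/* Sample size derived only from fields already stored in the info struct. */
static size_t sketch_sample_size(const struct perf_info_sketch *info)
{
	/* One block: its header followed by counters_per_block 64-bit counters. */
	const size_t block_size = info->block_header_size +
				  (size_t)info->counters_per_block * sizeof(uint64_t);
	const size_t block_nr = (size_t)info->fw_blocks + info->cshw_blocks +
				info->tiler_blocks + info->memsys_blocks +
				info->shader_blocks;

	return info->sample_header_size + block_size * block_nr;
}
```

This only works if the caller fills in sample_header_size and block_header_size before computing sample_size, which is the order the quoted panthor_perf_info_init() already follows.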
> +
> +/**
> + * PANTHOR_PERF_COUNTERS_PER_BLOCK - On CSF architectures pre-11.x, the number of counters
> + * per block was hardcoded to be 64. Arch 11.0 onwards supports the PRFCNT_FEATURES GPU register,
> + * which indicates the same information.
> + */
> +#define PANTHOR_PERF_COUNTERS_PER_BLOCK (64)
> +
> +static void panthor_perf_info_init(struct panthor_device *ptdev)
> +{
> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
> + struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
> +
> + if (PERFCNT_FEATURES_MD_SIZE(glb_iface->control->perfcnt_features))
> + perf_info->flags |= DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT;
> +
> + perf_info->counters_per_block = PANTHOR_PERF_COUNTERS_PER_BLOCK;
> +
> + perf_info->sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
> + perf_info->block_header_size = sizeof(struct drm_panthor_perf_block_header);
> +
> + if (GLB_PERFCNT_FW_SIZE(glb_iface->control->perfcnt_size))
> + perf_info->fw_blocks = 1;
> +
> + perf_info->cshw_blocks = 1;
> + perf_info->tiler_blocks = 1;
> + perf_info->memsys_blocks = GPU_MEM_FEATURES_L2_SLICES(ptdev->gpu_info.mem_features);
> + perf_info->shader_blocks = hweight64(ptdev->gpu_info.shader_present);
> +
> + perf_info->sample_size = session_get_user_sample_size(perf_info);
> +}
> +
> +/**
> + * panthor_perf_init - Initialize the performance counter subsystem.
> + * @ptdev: Panthor device
> + *
> + * Return: 0 on success, negative error code on failure.
> + */
> +int panthor_perf_init(struct panthor_device *ptdev)
> +{
> + if (!ptdev)
> + return -EINVAL;
> +
> + panthor_perf_info_init(ptdev);
> +
> + return 0;
> +}
> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
> new file mode 100644
> index 000000000000..3c32c24c164c
> --- /dev/null
> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 or MIT */
> +/* Copyright 2025 Collabora Ltd */
> +/* Copyright 2025 Arm ltd. */
> +
> +#ifndef __PANTHOR_PERF_H__
> +#define __PANTHOR_PERF_H__
> +
> +#include <linux/types.h>
> +
> +struct panthor_device;
> +
> +int panthor_perf_init(struct panthor_device *ptdev);
> +
> +#endif /* __PANTHOR_PERF_H__ */
> +
> diff --git a/drivers/gpu/drm/panthor/panthor_regs.h b/drivers/gpu/drm/panthor/panthor_regs.h
> index b7b3b3add166..d9e9379d1a20 100644
> --- a/drivers/gpu/drm/panthor/panthor_regs.h
> +++ b/drivers/gpu/drm/panthor/panthor_regs.h
> @@ -27,6 +27,7 @@
> #define GPU_TILER_FEATURES 0xC
> #define GPU_MEM_FEATURES 0x10
> #define GROUPS_L2_COHERENT BIT(0)
> +#define GPU_MEM_FEATURES_L2_SLICES(x) ((((x) & GENMASK(11, 8)) >> 8) + 1)
>
> #define GPU_MMU_FEATURES 0x14
> #define GPU_MMU_FEATURES_VA_BITS(x) ((x) & GENMASK(7, 0))
> --
> 2.33.0.dirty
Adrian Larumbe
* Re: [PATCH v4 7/7] drm/panthor: Expose the panthor perf ioctls
2025-05-16 15:49 ` [PATCH v4 7/7] drm/panthor: Expose the panthor perf ioctls Lukas Zapolskas
2025-07-18 15:05 ` Adrián Larumbe
@ 2025-07-18 15:19 ` Adrián Larumbe
2025-07-25 9:09 ` Lukas Zapolskas
1 sibling, 1 reply; 29+ messages in thread
From: Adrián Larumbe @ 2025-07-18 15:19 UTC (permalink / raw)
To: Lukas Zapolskas
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Hi Lukas, one more remark that was missing from the original review:
On 16.05.2025 16:49, Lukas Zapolskas wrote:
> This patch implements the PANTHOR_PERF_CONTROL ioctl series, and
> a PANTHOR_GET_UOBJ wrapper to deal with the backwards and forwards
> compatibility of the uAPI.
>
> The minor version is bumped to indicate that the feature is now
> supported.
>
> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
> Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
> ---
> drivers/gpu/drm/panthor/panthor_drv.c | 141 +++++++++++++++++++++++++-
> 1 file changed, 139 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index 4c1381320859..850a894fe91b 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -31,6 +31,7 @@
> #include "panthor_gpu.h"
> #include "panthor_heap.h"
> #include "panthor_mmu.h"
> +#include "panthor_perf.h"
> #include "panthor_regs.h"
> #include "panthor_sched.h"
>
> @@ -73,6 +74,39 @@ panthor_set_uobj(u64 usr_ptr, u32 usr_size, u32 min_size, u32 kern_size, const v
> return 0;
> }
>
> +/**
> + * panthor_get_uobj() - Copy user object to kernel object.
> + * @usr_ptr: User pointer.
> + * @usr_size: Size of the user object.
> + * @min_size: Minimum size for this object.
> + *
> + * Helper automating user -> kernel object copies.
> + *
> + * Don't use this function directly, use PANTHOR_UOBJ_GET() instead.
> + *
> + * Return: valid pointer on success, an encoded error code otherwise.
> + */
> +static void*
> +panthor_get_uobj(u64 usr_ptr, u32 usr_size, u32 min_size)
> +{
> + int ret;
> + void *out_alloc __free(kvfree) = NULL;
> +
> + /* User size shouldn't be smaller than the minimal object size. */
> + if (usr_size < min_size)
> + return ERR_PTR(-EINVAL);
> +
> + out_alloc = kvmalloc(min_size, GFP_KERNEL);
> + if (!out_alloc)
> + return ERR_PTR(-ENOMEM);
> +
> + ret = copy_struct_from_user(out_alloc, min_size, u64_to_user_ptr(usr_ptr), usr_size);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + return_ptr(out_alloc);
> +}
> +
> /**
> * panthor_get_uobj_array() - Copy a user object array into a kernel accessible object array.
> * @in: The object array to copy.
> @@ -176,7 +210,12 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
> PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
> - PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks))
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_setup, shader_enable_mask), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_start, user_data), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_stop, user_data), \
> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_sample, user_data))
> +
>
> /**
> * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
> @@ -191,6 +230,24 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
> PANTHOR_UOBJ_MIN_SIZE(_src_obj), \
> sizeof(_src_obj), &(_src_obj))
>
> +/**
> + * PANTHOR_UOBJ_GET() - Copies a user object from _usr_ptr to a kernel accessible _dest_ptr.
> + * @_dest_ptr: Local variable
> + * @_usr_size: Size of the user object.
> + * @_usr_ptr: The pointer of the object in userspace.
> + *
> + * Return: Error code. See panthor_get_uobj().
> + */
> +#define PANTHOR_UOBJ_GET(_dest_ptr, _usr_size, _usr_ptr) \
> + ({ \
> + typeof(_dest_ptr) _tmp; \
> + _tmp = panthor_get_uobj(_usr_ptr, _usr_size, \
> + PANTHOR_UOBJ_MIN_SIZE(_tmp[0])); \
> + if (!IS_ERR(_tmp)) \
> + _dest_ptr = _tmp; \
> + PTR_ERR_OR_ZERO(_tmp); \
> + })
> +
> /**
> * PANTHOR_UOBJ_GET_ARRAY() - Copy a user object array to a kernel accessible
> * object array.
> @@ -1339,6 +1396,83 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data,
> return 0;
> }
>
> +#define perf_cmd(command) \
> + ({ \
> + struct drm_panthor_perf_cmd_##command *command##_args __free(kvfree) = NULL; \
> + int _ret = PANTHOR_UOBJ_GET(command##_args, args->size, args->pointer); \
> + if (_ret) \
> + return _ret; \
> + return panthor_perf_session_##command(pfile, ptdev->perf, args->handle, \
> + command##_args->user_data); \
> + })
> +
> +static int panthor_ioctl_perf_control(struct drm_device *ddev, void *data,
> + struct drm_file *file)
> +{
> + struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
> + struct panthor_file *pfile = file->driver_priv;
> + struct drm_panthor_perf_control *args = data;
> + int ret;
> +
> + if (!args->pointer) {
> + switch (args->cmd) {
> + case DRM_PANTHOR_PERF_COMMAND_SETUP:
> + args->size = sizeof(struct drm_panthor_perf_cmd_setup);
> + return 0;
> +
> + case DRM_PANTHOR_PERF_COMMAND_TEARDOWN:
> + args->size = 0;
> + return 0;
> +
> + case DRM_PANTHOR_PERF_COMMAND_START:
> + args->size = sizeof(struct drm_panthor_perf_cmd_start);
> + return 0;
> +
> + case DRM_PANTHOR_PERF_COMMAND_STOP:
> + args->size = sizeof(struct drm_panthor_perf_cmd_stop);
> + return 0;
> +
> + case DRM_PANTHOR_PERF_COMMAND_SAMPLE:
> + args->size = sizeof(struct drm_panthor_perf_cmd_sample);
> + return 0;
> +
> + default:
> + return -EINVAL;
> + }
> + }
> +
> + switch (args->cmd) {
> + case DRM_PANTHOR_PERF_COMMAND_SETUP:
> + {
> + struct drm_panthor_perf_cmd_setup *setup_args __free(kvfree) = NULL;
> +
> + ret = PANTHOR_UOBJ_GET(setup_args, args->size, args->pointer);
> + if (ret)
> + return -EINVAL;
> +
> + return panthor_perf_session_setup(ptdev, ptdev->perf, setup_args, pfile);
I believe I already raised this when reviewing v2 of the series: I would pass the drm_file here
directly rather than the panthor_file, and then retrieve the panthor_file pointer from the file's
driver_priv field inside panthor_perf_session_setup(). That way you can get rid of
struct panthor_file::drm_file. This should be fine, because the only place where keeping a copy
of the drm_file is essential is the session struct, to make sure sessions match their DRM device fds.
> + }
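As a rough standalone illustration of the suggestion above, the sketch below passes the drm_file into a setup helper, which then derives the driver-private pointer from driver_priv and records the drm_file in the session. The struct definitions are minimal mocks, and perf_session_sketch and sketch_session_setup() are hypothetical names; the real panthor_perf_session_setup() signature is whatever the series defines.

```c
#include <errno.h>
#include <stddef.h>

/* Minimal mocks of the kernel types; only the fields used in this sketch. */
struct drm_file {
	void *driver_priv;
};

struct panthor_file {
	int placeholder;
};

/* Hypothetical session struct: the one place that keeps the drm_file around. */
struct perf_session_sketch {
	struct drm_file *file;
	struct panthor_file *pfile;
};

/* Hypothetical setup helper that takes the drm_file directly. */
static int sketch_session_setup(struct drm_file *file, struct perf_session_sketch *session)
{
	if (!file || !file->driver_priv || !session)
		return -EINVAL;

	/* Kept so later commands can check they come from the same fd. */
	session->file = file;
	/* The driver-private data is reachable from the drm_file itself. */
	session->pfile = file->driver_priv;

	return 0;
}
```

With this shape the per-file struct would not need a back-pointer to its drm_file, since every caller that has the drm_file can reach the private data from it.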
> + case DRM_PANTHOR_PERF_COMMAND_TEARDOWN:
> + {
> + return panthor_perf_session_teardown(pfile, ptdev->perf, args->handle);
> + }
> + case DRM_PANTHOR_PERF_COMMAND_START:
> + {
> + perf_cmd(start);
> + }
> + case DRM_PANTHOR_PERF_COMMAND_STOP:
> + {
> + perf_cmd(stop);
> + }
> + case DRM_PANTHOR_PERF_COMMAND_SAMPLE:
> + {
> + perf_cmd(sample);
> + }
> + default:
> + return -EINVAL;
> + }
> +}
> +
> static int
> panthor_open(struct drm_device *ddev, struct drm_file *file)
> {
> @@ -1409,6 +1543,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
> PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW),
> PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW),
> PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW),
> + PANTHOR_IOCTL(PERF_CONTROL, perf_control, DRM_RENDER_ALLOW),
> };
>
> static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
> @@ -1518,6 +1653,8 @@ static void panthor_debugfs_init(struct drm_minor *minor)
> * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query
> * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority
> * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
> + * - 1.4 - adds DEV_QUERY_PERF_INFO query
> + * - adds PERF_CONTROL ioctl
> */
> static const struct drm_driver panthor_drm_driver = {
> .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
> @@ -1531,7 +1668,7 @@ static const struct drm_driver panthor_drm_driver = {
> .name = "panthor",
> .desc = "Panthor DRM driver",
> .major = 1,
> - .minor = 3,
> + .minor = 4,
>
> .gem_create_object = panthor_gem_create_object,
> .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
> --
> 2.33.0.dirty
Adrian Larumbe
* Re: [PATCH v4 1/7] drm/panthor: Add performance counter uAPI
2025-07-18 2:43 ` Adrián Larumbe
@ 2025-07-21 8:46 ` Lukas Zapolskas
0 siblings, 0 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-07-21 8:46 UTC (permalink / raw)
To: Adrián Larumbe
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel, Mihail Atanassov
Hello Adrián,
Thanks for taking a look!
On 18/07/2025 03:43, Adrián Larumbe wrote:
> Hi Lukas,
>
> On 16.05.2025 16:49, Lukas Zapolskas wrote:
>> This patch extends the DEV_QUERY ioctl to return information about the
>> performance counter setup for userspace, and introduces the new
>> ioctl DRM_PANTHOR_PERF_CONTROL in order to allow for the sampling of
>> performance counters.
>>
>> The new design is inspired by the perf aux ringbuffer, with the insert
>> and extract indices being mapped to userspace, allowing multiple samples
>> to be exposed at any given time. To avoid pointer chasing, the sample
>> metadata and block metadata are inline with the elements they
>> describe.
>
> Is the perf aux ringbuffer something internal to ARM's DDK?
>
I'm referring to the in-tree perf tool, which has its ring buffer
design documented here [0].
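To make the insert/extract scheme concrete, here is a minimal userspace-style sketch of a consumer draining such a ring buffer. It assumes, which the patch text does not spell out, that both indices are free-running counters, that a sample lives in slot (index % slots), and that insert_idx - extract_idx is the number of unread samples. All names are hypothetical, and the __atomic builtins are GCC/Clang specific.

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors the two indices of the quoted struct drm_panthor_perf_ringbuf_control. */
struct ringbuf_control_sketch {
	uint64_t extract_idx; /* advanced by the consumer (userspace) only */
	uint64_t insert_idx;  /* advanced by the producer (kernel) only */
};

typedef void (*sketch_sample_fn)(const uint8_t *sample, size_t sample_size, void *ctx);

/* Example callback: adds the first byte of each sample to a uint64_t accumulator. */
static void sketch_sum_first_byte(const uint8_t *sample, size_t sample_size, void *ctx)
{
	if (sample_size)
		*(uint64_t *)ctx += sample[0];
}

/* Hand every unread sample to cb, then publish the new extract index. */
static uint64_t sketch_drain(struct ringbuf_control_sketch *ctrl, const uint8_t *ring,
			     size_t sample_size, uint64_t slots,
			     sketch_sample_fn cb, void *ctx)
{
	uint64_t insert, extract, consumed = 0;

	if (!slots)
		return 0;

	/* Acquire pairs with the producer's release store of insert_idx. */
	insert = __atomic_load_n(&ctrl->insert_idx, __ATOMIC_ACQUIRE);
	extract = ctrl->extract_idx;

	for (; extract != insert; extract++, consumed++)
		cb(ring + (extract % slots) * sample_size, sample_size, ctx);

	/* Release so the producer only reuses slots after they have been read. */
	__atomic_store_n(&ctrl->extract_idx, extract, __ATOMIC_RELEASE);

	return consumed;
}
```

Because each side writes only its own index, no lock is needed between producer and consumer; the indices themselves carry the full state of the buffer.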
>> Userspace is responsible for passing in resources for samples to be
>> exposed, including the event file descriptor for notification of new
>> sample availability, the ringbuffer BO to store samples, and the
>> control BO along with the offset for mapping the insert and extract
>> indices. Though these indices are only a total of 8 bytes, userspace
>> can then reuse the same physical page for tracking the state of
>> multiple buffers by giving different offsets from the BO start to
>> map them.
>>
>> Co-developed-by: Mihail Atanassov <mihail.atanassov@arm.com>
>> Signed-off-by: Mihail Atanassov <mihail.atanassov@arm.com>
>> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
>> ---
>> include/uapi/drm/panthor_drm.h | 565 +++++++++++++++++++++++++++++++++
>> 1 file changed, 565 insertions(+)
>>
>> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
>> index 97e2c4510e69..a74eabcabbcb 100644
>> --- a/include/uapi/drm/panthor_drm.h
>> +++ b/include/uapi/drm/panthor_drm.h
>> @@ -127,6 +127,9 @@ enum drm_panthor_ioctl_id {
>>
>> /** @DRM_PANTHOR_TILER_HEAP_DESTROY: Destroy a tiler heap. */
>> DRM_PANTHOR_TILER_HEAP_DESTROY,
>> +
>> + /** @DRM_PANTHOR_PERF_CONTROL: Control a performance counter session. */
>> + DRM_PANTHOR_PERF_CONTROL,
>> };
>>
>> /**
>> @@ -226,6 +229,9 @@ enum drm_panthor_dev_query_type {
>> * @DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO: Query allowed group priorities information.
>> */
>> DRM_PANTHOR_DEV_QUERY_GROUP_PRIORITIES_INFO,
>> +
>> + /** @DRM_PANTHOR_DEV_QUERY_PERF_INFO: Query performance counter interface information. */
>> + DRM_PANTHOR_DEV_QUERY_PERF_INFO,
>> };
>>
>> /**
>> @@ -379,6 +385,135 @@ struct drm_panthor_group_priorities_info {
>> __u8 pad[3];
>> };
>>
>> +/**
>> + * enum drm_panthor_perf_feat_flags - Performance counter configuration feature flags.
>> + */
>> +enum drm_panthor_perf_feat_flags {
>> + /** @DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT: Coarse-grained block states are supported. */
>> + DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT = 1 << 0,
>> +};
>> +
>> +/**
>> + * enum drm_panthor_perf_block_type - Performance counter supported block types.
>> + */
>> +enum drm_panthor_perf_block_type {
>> + /** @DRM_PANTHOR_PERF_BLOCK_METADATA: Internal use only. */
>> + DRM_PANTHOR_PERF_BLOCK_METADATA = 0,
>> +
>> + /** @DRM_PANTHOR_PERF_BLOCK_FW: The FW counter block. */
>> + DRM_PANTHOR_PERF_BLOCK_FW,
>> +
>> + /** @DRM_PANTHOR_PERF_BLOCK_CSHW: The CSHW counter block. */
>> + DRM_PANTHOR_PERF_BLOCK_CSHW,
>> +
>> + /** @DRM_PANTHOR_PERF_BLOCK_TILER: The tiler counter block. */
>> + DRM_PANTHOR_PERF_BLOCK_TILER,
>> +
>> + /** @DRM_PANTHOR_PERF_BLOCK_MEMSYS: A memsys counter block. */
>> + DRM_PANTHOR_PERF_BLOCK_MEMSYS,
>> +
>> + /** @DRM_PANTHOR_PERF_BLOCK_SHADER: A shader core counter block. */
>> + DRM_PANTHOR_PERF_BLOCK_SHADER,
>> +
>> + /** @DRM_PANTHOR_PERF_BLOCK_FIRST: Internal use only. */
>> + DRM_PANTHOR_PERF_BLOCK_FIRST = DRM_PANTHOR_PERF_BLOCK_FW,
>> +
>> + /** @DRM_PANTHOR_PERF_BLOCK_LAST: Internal use only. */
>> + DRM_PANTHOR_PERF_BLOCK_LAST = DRM_PANTHOR_PERF_BLOCK_SHADER,
>> +
>> + /** @DRM_PANTHOR_PERF_BLOCK_MAX: Internal use only. */
>> + DRM_PANTHOR_PERF_BLOCK_MAX = DRM_PANTHOR_PERF_BLOCK_LAST + 1,
>> +};
>> +
>> +/**
>> + * enum drm_panthor_perf_clock - Identifier of the clock used to produce the cycle count values
>> + * in a given block.
>> + *
>> + * Since the integrator has the choice of using one or more clocks, there may be some confusion
>> + * as to which blocks are counted by which clock values unless this information is explicitly
>> + * provided as part of every block sample. Not every single clock here can be used: in the simplest
>> + * case, all cycle counts will be associated with the top-level clock.
>> + */
>> +enum drm_panthor_perf_clock {
>> + /** @DRM_PANTHOR_PERF_CLOCK_TOPLEVEL: Top-level CSF clock. */
>> + DRM_PANTHOR_PERF_CLOCK_TOPLEVEL,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_CLOCK_COREGROUP: Core group clock, responsible for the MMU, L2
>> + * caches and the tiler.
>> + */
>> + DRM_PANTHOR_PERF_CLOCK_COREGROUP,
>> +
>> + /** @DRM_PANTHOR_PERF_CLOCK_SHADER: Clock for the shader cores. */
>> + DRM_PANTHOR_PERF_CLOCK_SHADER,
>> +};
>> +
>> +/**
>> + * struct drm_panthor_perf_info - Performance counter interface information
>> + *
>> + * Structure grouping all queryable information relating to the performance counter
>> + * interfaces.
>> + */
>> +struct drm_panthor_perf_info {
>> + /**
>> + * @counters_per_block: The number of 8-byte counters available in a block.
>> + */
>> + __u32 counters_per_block;
>> +
>> + /**
>> + * @sample_header_size: The size of the header struct available at the beginning
>> + * of every sample.
>> + */
>> + __u32 sample_header_size;
>> +
>> + /**
>> + * @block_header_size: The size of the header struct inline with the counters for a
>> + * single block.
>> + */
>> + __u32 block_header_size;
>> +
>> + /**
>> + * @sample_size: The size of a fully annotated sample, starting with a sample header
>> + * of size @sample_header_size bytes, and all available blocks for the current
>> + * configuration, each comprised of @counters_per_block 64-bit counters and
>> + * a block header of @block_header_size bytes.
>> + *
>> + * The user must use this field to allocate size for the ring buffer. In
>> + * the case of new blocks being added, an old userspace can always use
>> + * this field and ignore any blocks it does not know about.
>> + */
>> + __u32 sample_size;
>
> I might've asked this question in a previous review, but couldn't user space easily calculate
> the sample size with sample_header_size + block_header_size*(?_blocks) + (?_blocks)*counters_per_block ?
>
It can, provided userspace and the kernel are in lockstep. With an old userspace on a newer
kernel, a new block count field may have been added to the end of this struct, and userspace would
not know how to interpret it. Without sample_size, it could then not size the ring buffer
correctly and so could not successfully create a new session.
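A small sketch of how an older userspace might rely on the reported sizes: it walks one sample using only sample_size, sample_header_size, block_header_size and counters_per_block, and skips any block whose type it does not recognise. The names are hypothetical; the sketch assumes block_type is the first byte of each block header (as in the quoted struct drm_panthor_perf_block_header) and that known types run from 1 (FW) to 5 (SHADER) as in the quoted enum.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the values returned by the PERF_INFO query. */
struct perf_layout_sketch {
	uint32_t counters_per_block;
	uint32_t sample_header_size;
	uint32_t block_header_size;
	uint32_t sample_size;
};

/* Block types this (old) userspace knows about: FW = 1 ... SHADER = 5. */
#define SKETCH_BLOCK_FIRST_KNOWN 1
#define SKETCH_BLOCK_LAST_KNOWN  5

/* Count the blocks in one sample whose type is understood; skip the rest. */
static size_t sketch_count_known_blocks(const struct perf_layout_sketch *l,
					const uint8_t *sample)
{
	const size_t stride = l->block_header_size +
			      (size_t)l->counters_per_block * sizeof(uint64_t);
	size_t known = 0;

	if (!stride)
		return 0;

	for (size_t off = l->sample_header_size; off + stride <= l->sample_size; off += stride) {
		/* block_type is assumed to be the first byte of the block header. */
		const uint8_t block_type = sample[off];

		if (block_type >= SKETCH_BLOCK_FIRST_KNOWN &&
		    block_type <= SKETCH_BLOCK_LAST_KNOWN)
			known++;
	}

	return known;
}
```

The walk never needs the per-type block counts, so a block type added by a newer kernel only changes how many iterations are skipped.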
>> + /** @flags: Combination of drm_panthor_perf_feat_flags flags. */
>> + __u32 flags;
>> +
>> + /**
>> + * @supported_clocks: Bitmask of the clocks supported by the GPU.
>> + *
>> + * Each bit represents a variant of the enum drm_panthor_perf_clock.
>> + *
>> + * For the same GPU, different implementers may have different clocks for the same hardware
>> + * block. At the moment, up to four clocks are supported, and any clocks that are present
>> + * will be reported here.
>
> However, there seem to be just three clocks in the drm_panthor_perf_clock enum definition.
>
Thanks for pointing that out! Need to clean this up.
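For reference, a minimal sketch of how userspace might test the supported_clocks bitmask against the three clocks currently in the enum. It assumes that bit N of the mask corresponds to enum value N, which the quoted comment implies but does not state outright; the sketch_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror enum drm_panthor_perf_clock as quoted above. */
enum sketch_clock {
	SKETCH_CLOCK_TOPLEVEL = 0,
	SKETCH_CLOCK_COREGROUP = 1,
	SKETCH_CLOCK_SHADER = 2,
	SKETCH_CLOCK_COUNT,
};

/* True if the given clock is both known to this sketch and set in the mask. */
static bool sketch_clock_supported(uint32_t supported_clocks, unsigned int clk)
{
	/* Assumption: bit N of supported_clocks stands for enum value N. */
	return clk < SKETCH_CLOCK_COUNT && (supported_clocks & (UINT32_C(1) << clk)) != 0;
}
```

Rejecting indices beyond the known enum range means a stray bit set by a newer kernel is simply ignored.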
>> + */
>> + __u32 supported_clocks;
>> +
>> + /** @fw_blocks: Number of FW blocks available. */
>> + __u32 fw_blocks;
>> +
>> + /** @cshw_blocks: Number of CSHW blocks available. */
>> + __u32 cshw_blocks;
>> +
>> + /** @tiler_blocks: Number of tiler blocks available. */
>> + __u32 tiler_blocks;
>> +
>> + /** @memsys_blocks: Number of memsys blocks available. */
>> + __u32 memsys_blocks;
>> +
>> + /** @shader_blocks: Number of shader core blocks available. */
>> + __u32 shader_blocks;
>> +};
>> +
>> /**
>> * struct drm_panthor_dev_query - Arguments passed to DRM_PANTHOR_IOCTL_DEV_QUERY
>> */
>> @@ -977,6 +1112,434 @@ struct drm_panthor_tiler_heap_destroy {
>> __u32 pad;
>> };
>>
>> +/**
>> + * DOC: Performance counter decoding in userspace.
>> + *
>> + * Each sample will be exposed to userspace in the following manner:
>> + *
>> + * +--------+--------+------------------------+--------+-------------------------+-----+
>> + * | Sample | Block | Block | Block | Block | ... |
>> + * | header | header | counters | header | counters | |
>> + * +--------+--------+------------------------+--------+-------------------------+-----+
>> + *
>> + * Each sample will start with a sample header of type @struct drm_panthor_perf_sample_header,
>> + * providing sample-wide information like the start and end timestamps, the counter set currently
>> + * configured, and any errors that may have occurred during sampling.
>> + *
>> + * After the fixed size header, the sample will consist of blocks of
>> + * 64-bit @drm_panthor_perf_info::counters_per_block counters, each prefaced with a
>> + * header of its own, indicating source block type, as well as the cycle count needed to normalize
>> + * cycle values within that block, and a clock source identifier.
>> + */
>> +
>> +/**
>> + * enum drm_panthor_perf_block_state - Bitmask of the power and execution states that an individual
>> + * hardware block went through in a sampling period.
>> + *
>> + * Because the sampling period is controlled from userspace, the block may undergo multiple
>> + * state transitions, so this must be interpreted as one or more such transitions occurring.
>> + */
>> +enum drm_panthor_perf_block_state {
>> + /**
>> + * @DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN: The state of this block was unknown during
>> + * the sampling period.
>> + */
>> + DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN = 0,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_BLOCK_STATE_ON: This block was powered on for some or all of
>> + * the sampling period.
>> + */
>> + DRM_PANTHOR_PERF_BLOCK_STATE_ON = 1 << 0,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_BLOCK_STATE_OFF: This block was powered off for some or all of the
>> + * sampling period.
>> + */
>> + DRM_PANTHOR_PERF_BLOCK_STATE_OFF = 1 << 1,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_BLOCK_STATE_AVAILABLE: This block was available for execution for
>> + * some or all of the sampling period.
>> + */
>> + DRM_PANTHOR_PERF_BLOCK_STATE_AVAILABLE = 1 << 2,
>> + /**
>> + * @DRM_PANTHOR_PERF_BLOCK_STATE_UNAVAILABLE: This block was unavailable for execution for
>> + * some or all of the sampling period.
>> + */
>> + DRM_PANTHOR_PERF_BLOCK_STATE_UNAVAILABLE = 1 << 3,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_BLOCK_STATE_NORMAL: This block was executing in normal mode
>> + * for some or all of the sampling period.
>> + */
>> + DRM_PANTHOR_PERF_BLOCK_STATE_NORMAL = 1 << 4,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_BLOCK_STATE_PROTECTED: This block was executing in protected mode
>> + * for some or all of the sampling period.
>> + */
>> + DRM_PANTHOR_PERF_BLOCK_STATE_PROTECTED = 1 << 5,
>> +};
>> +
>> +/**
>> + * struct drm_panthor_perf_block_header - Header present before every block in the
>> + * sample ringbuffer.
>> + */
>> +struct drm_panthor_perf_block_header {
>> + /** @block_type: Type of the block. */
>> + __u8 block_type;
>> +
>> + /** @block_idx: Block index. */
>> + __u8 block_idx;
>> +
>> + /**
>> + * @block_states: Coarse-grained block transitions, bitmask of enum
>> + * drm_panthor_perf_block_state.
>> + */
>> + __u8 block_states;
>> +
>> + /**
>> + * @clock: Clock used to produce the cycle count for this block, taken from
>> + * enum drm_panthor_perf_clock. The cycle counts are stored in the sample header.
>> + */
>> + __u8 clock;
>> +
>> + /** @pad: MBZ. */
>> + __u8 pad[4];
>> +
>> + /** @enable_mask: Bitmask of counters requested during the session setup. */
>> + __u64 enable_mask[2];
>> +};
>> +
>> +/**
>> + * enum drm_panthor_perf_sample_flags - Sample-wide events that occurred over the sampling
>> + * period.
>> + */
>> +enum drm_panthor_perf_sample_flags {
>> + /**
>> + * @DRM_PANTHOR_PERF_SAMPLE_OVERFLOW: This sample contains overflows due to the duration
>> + * of the sampling period.
>> + */
>> + DRM_PANTHOR_PERF_SAMPLE_OVERFLOW = 1 << 0,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_SAMPLE_ERROR: This sample encountered an error condition during
>> + * the sample duration.
>> + */
>> + DRM_PANTHOR_PERF_SAMPLE_ERROR = 1 << 1,
>> +};
>> +
>> +/**
>> + * struct drm_panthor_perf_sample_header - Header present before every sample.
>> + */
>> +struct drm_panthor_perf_sample_header {
>> + /**
>> + * @timestamp_start_ns: Earliest timestamp that values in this sample represent, in
>> + * nanoseconds. Derived from CLOCK_MONOTONIC_RAW.
>> + */
>> + __u64 timestamp_start_ns;
>> +
>> + /**
>> + * @timestamp_end_ns: Latest timestamp that values in this sample represent, in
>> + * nanoseconds. Derived from CLOCK_MONOTONIC_RAW.
>> + */
>> + __u64 timestamp_end_ns;
>> +
>> + /** @block_set: Set of performance counter blocks. */
>> + __u8 block_set;
>> +
>> + /** @pad: MBZ. */
>> + __u8 pad[3];
>> +
>> + /** @flags: Current sample flags, combination of drm_panthor_perf_sample_flags. */
>> + __u32 flags;
>> +
>> + /**
>> + * @user_data: User data provided as part of the command that triggered this sample.
>> + *
>> + * - Automatic samples (periodic ones or those around non-counting periods or power state
>> + * transitions) will be tagged with the user_data provided as part of the
>> + * DRM_PANTHOR_PERF_COMMAND_START call.
>> + * - Manual samples will be tagged with the user_data provided with the
>> + * DRM_PANTHOR_PERF_COMMAND_SAMPLE call.
>> + * - A session's final automatic sample will be tagged with the user_data provided with the
>> + * DRM_PANTHOR_PERF_COMMAND_STOP call.
>> + */
>> + __u64 user_data;
>> +
>> + /**
>> + * @toplevel_clock_cycles: The number of cycles elapsed between
>> + * drm_panthor_perf_sample_header::timestamp_start_ns and
>> + * drm_panthor_perf_sample_header::timestamp_end_ns on the top-level clock if the
>> + * corresponding bit is set in drm_panthor_perf_info::supported_clocks.
>> + */
>> + __u64 toplevel_clock_cycles;
>> +
>> + /**
>> + * @coregroup_clock_cycles: The number of cycles elapsed between
>> + * drm_panthor_perf_sample_header::timestamp_start_ns and
>> + * drm_panthor_perf_sample_header::timestamp_end_ns on the coregroup clock if the
>> + * corresponding bit is set in drm_panthor_perf_info::supported_clocks.
>> + */
>> + __u64 coregroup_clock_cycles;
>> +
>> + /**
>> + * @shader_clock_cycles: The number of cycles elapsed between
>> + * drm_panthor_perf_sample_header::timestamp_start_ns and
>> + * drm_panthor_perf_sample_header::timestamp_end_ns on the shader core clock if the
>> + * corresponding bit is set in drm_panthor_perf_info::supported_clocks.
>> + */
>> + __u64 shader_clock_cycles;
>> +};
>> +
>> +/**
>> + * enum drm_panthor_perf_command - Command type passed to the DRM_PANTHOR_PERF_CONTROL
>> + * IOCTL.
>> + */
>> +enum drm_panthor_perf_command {
>> + /** @DRM_PANTHOR_PERF_COMMAND_SETUP: Create a new performance counter sampling context. */
>> + DRM_PANTHOR_PERF_COMMAND_SETUP,
>> +
>> + /** @DRM_PANTHOR_PERF_COMMAND_TEARDOWN: Teardown a performance counter sampling context. */
>> + DRM_PANTHOR_PERF_COMMAND_TEARDOWN,
>> +
>> + /** @DRM_PANTHOR_PERF_COMMAND_START: Start a sampling session on the indicated context. */
>> + DRM_PANTHOR_PERF_COMMAND_START,
>> +
>> + /** @DRM_PANTHOR_PERF_COMMAND_STOP: Stop the sampling session on the indicated context. */
>> + DRM_PANTHOR_PERF_COMMAND_STOP,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_COMMAND_SAMPLE: Request a manual sample on the indicated context.
>> + *
>> + * When the sampling session is configured with a non-zero sampling frequency, any
>> + * DRM_PANTHOR_PERF_CONTROL calls with this command will be ignored and return an
>> + * -EINVAL.
>> + */
>> + DRM_PANTHOR_PERF_COMMAND_SAMPLE,
>> +};
>> +
>> +/**
>> + * struct drm_panthor_perf_control - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL.
>> + */
>> +struct drm_panthor_perf_control {
>> + /** @cmd: Command from enum drm_panthor_perf_command. */
>> + __u32 cmd;
>> +
>> + /**
>> + * @handle: session handle.
>> + *
>> + * Returned by the DRM_PANTHOR_PERF_COMMAND_SETUP call.
>> + * It must be used in subsequent commands for the same context.
>> + */
>> + __u32 handle;
>> +
>> + /**
>> + * @size: size of the command structure.
>> + *
>> + * If the pointer is NULL, the size is updated by the driver to provide the size of the
>> + * output structure. If the pointer is not NULL, the driver will only copy min(size,
>> + * struct_size) to the pointer and update the size accordingly.
>> + */
>> + __u64 size;
>> +
>> + /**
>> + * @pointer: user pointer to a command type struct, such as
>> + * @struct drm_panthor_perf_cmd_start.
>> + */
>> + __u64 pointer;
>> +};
>> +
>> +/**
>> + * enum drm_panthor_perf_counter_set - The counter set to be requested from the hardware.
>> + *
>> + * The hardware supports a single performance counter set at a time, so requesting any set other
>> + * than the primary may fail if another process is sampling at the same time.
>> + *
>> + * If in doubt, the primary counter set has the most commonly used counters and requires no
>> + * additional permissions to open.
>> + */
>> +enum drm_panthor_perf_counter_set {
>> + /**
>> + * @DRM_PANTHOR_PERF_SET_PRIMARY: The default set configured on the hardware.
>> + *
>> + * This is the only set for which all counters in all blocks are defined.
>> + */
>> + DRM_PANTHOR_PERF_SET_PRIMARY,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_SET_SECONDARY: The secondary performance counter set.
>> + *
>> + * Some blocks may not have any defined counters for this set, and the block will
>> + * have the UNAVAILABLE block state permanently set in the block header.
>> + *
>> + * Accessing this set requires the calling process to have the CAP_PERFMON capability.
>> + */
>> + DRM_PANTHOR_PERF_SET_SECONDARY,
>> +
>> + /**
>> + * @DRM_PANTHOR_PERF_SET_TERTIARY: The tertiary performance counter set.
>> + *
>> + * Some blocks may not have any defined counters for this set, and the block will have
>> + * the UNAVAILABLE block state permanently set in the block header. Note that the
>> + * tertiary set has the fewest defined counter blocks.
>> + *
>> + * Accessing this set requires the calling process to have the CAP_PERFMON capability.
>> + */
>> + DRM_PANTHOR_PERF_SET_TERTIARY,
>> +};
>> +
>> +/**
>> + * struct drm_panthor_perf_ringbuf_control - Struct used to map in the ring buffer control indices
>> + * into memory shared between user and kernel.
>> + *
>> + */
>> +struct drm_panthor_perf_ringbuf_control {
>> + /**
>> + * @extract_idx: The index of the latest sample that was processed by userspace. Only
>> + * modifiable by userspace.
>> + */
>> + __u64 extract_idx;
>> +
>> + /**
>> + * @insert_idx: The index of the latest sample emitted by the kernel. Only modifiable by
>> + * the kernel.
>> + */
>> + __u64 insert_idx;
>> +};
>> +
>> +/**
>> + * struct drm_panthor_perf_cmd_setup - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
>> + * when the DRM_PANTHOR_PERF_COMMAND_SETUP command is specified.
>> + */
>> +struct drm_panthor_perf_cmd_setup {
>> + /**
>> + * @block_set: Set of performance counter blocks, member of
>> + * enum drm_panthor_perf_counter_set.
>> + *
>> + * This is a global configuration and only one set can be active at a time. If
>> + * another client has already requested a counter set, any further requests
>> + * for a different counter set will fail and return an -EBUSY.
>> + *
>> + * If the requested set does not exist, the request will fail and return an -EINVAL.
>> + *
>> + * Some sets have additional requirements to be enabled, and the setup request will
>> + * fail with an -EACCES if these requirements are not satisfied.
>> + */
>> + __u8 block_set;
>> +
>> + /** @pad: MBZ. */
>> + __u8 pad[7];
>> +
>> + /** @fd: eventfd for signalling the availability of a new sample. */
>> + __u32 fd;
>> +
>> + /** @ringbuf_handle: Handle to the BO to write perf counter sample to. */
>> + __u32 ringbuf_handle;
>> +
>> + /**
>> + * @control_handle: Handle to the BO containing a contiguous 16 byte range, used for the
>> + * insert and extract indices for the ringbuffer.
>> + */
>> + __u32 control_handle;
>> +
>> + /**
>> + * @sample_slots: The number of slots available in the userspace-provided BO. Must be
>> + * a power of 2.
>> + *
>> + * If sample_slots * sample_size does not match the BO size, the setup request will fail.
>> + */
>> + __u32 sample_slots;
>> +
>> + /**
>> + * @control_offset: Offset into the control BO where the insert and extract indices are
>> + * located.
>> + */
>> + __u64 control_offset;
>> +
>> + /**
>> + * @sample_freq_ns: Period between automatic counter sample collection in nanoseconds. Zero
>> + * disables automatic collection and all collection must be done through explicit calls
>> + * to DRM_PANTHOR_PERF_CONTROL.SAMPLE. Non-zero values will disable manual counter sampling
>> + * via the DRM_PANTHOR_PERF_COMMAND_SAMPLE command.
>> + *
>> + * This disables software-triggered periodic sampling, but hardware will still trigger
>> + * automatic samples on certain events, including shader core power transitions, and
>> + * entries to and exits from non-counting periods. The final stop command will also
>> + * trigger a sample to ensure no data is lost.
>> + */
>> + __u64 sample_freq_ns;
>> +
>> + /**
>> + * @fw_enable_mask: Bitmask of counters to request from the FW counter block. Any bits
>> + * past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit 0
>> + * corresponds to counter 0.
>> + */
>> + __u64 fw_enable_mask[2];
>> +
>> + /**
>> + * @cshw_enable_mask: Bitmask of counters to request from the CSHW counter block. Any bits
>> + * past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit 0
>> + * corresponds to counter 0.
>> + */
>> + __u64 cshw_enable_mask[2];
>> +
>> + /**
>> + * @tiler_enable_mask: Bitmask of counters to request from the tiler counter block. Any
>> + * bits past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit
>> + * 0 corresponds to counter 0.
>> + */
>> + __u64 tiler_enable_mask[2];
>> +
>> + /**
>> + * @memsys_enable_mask: Bitmask of counters to request from the memsys counter blocks. Any
>> + * bits past the first drm_panthor_perf_info.counters_per_block bits will be ignored. Bit 0
>> + * corresponds to counter 0.
>> + */
>> + __u64 memsys_enable_mask[2];
>> +
>> + /**
>> + * @shader_enable_mask: Bitmask of counters to request from the shader core counter blocks.
>> + * Any bits past the first drm_panthor_perf_info.counters_per_block bits will be ignored.
>> + * Bit 0 corresponds to counter 0.
>> + */
>> + __u64 shader_enable_mask[2];
>> +};
>> +
>> +/**
>> + * struct drm_panthor_perf_cmd_start - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
>> + * when the DRM_PANTHOR_PERF_COMMAND_START command is specified.
>> + */
>> +struct drm_panthor_perf_cmd_start {
>> + /**
>> + * @user_data: User provided data that will be attached to automatic samples collected
>> + * until the next DRM_PANTHOR_PERF_COMMAND_STOP.
>> + */
>> + __u64 user_data;
>> +};
>> +
>> +/**
>> + * struct drm_panthor_perf_cmd_stop - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
>> + * when the DRM_PANTHOR_PERF_COMMAND_STOP command is specified.
>> + */
>> +struct drm_panthor_perf_cmd_stop {
>> + /**
>> + * @user_data: User provided data that will be attached to the automatic sample collected
>> + * at the end of this sampling session.
>> + */
>> + __u64 user_data;
>> +};
>> +
>> +/**
>> + * struct drm_panthor_perf_cmd_sample - Arguments passed to DRM_PANTHOR_IOCTL_PERF_CONTROL
>> + * when the DRM_PANTHOR_PERF_COMMAND_SAMPLE command is specified.
>> + */
>> +struct drm_panthor_perf_cmd_sample {
>> + /** @user_data: User provided data that will be attached to the sample. */
>> + __u64 user_data;
>> +};
>> +
>> /**
>> * DRM_IOCTL_PANTHOR() - Build a Panthor IOCTL number
>> * @__access: Access type. Must be R, W or RW.
>> @@ -1019,6 +1582,8 @@ enum {
>> DRM_IOCTL_PANTHOR(WR, TILER_HEAP_CREATE, tiler_heap_create),
>> DRM_IOCTL_PANTHOR_TILER_HEAP_DESTROY =
>> DRM_IOCTL_PANTHOR(WR, TILER_HEAP_DESTROY, tiler_heap_destroy),
>> + DRM_IOCTL_PANTHOR_PERF_CONTROL =
>> + DRM_IOCTL_PANTHOR(WR, PERF_CONTROL, perf_control)
>> };
>>
>> #if defined(__cplusplus)
>> --
>> 2.33.0.dirty
>
>
>
> Adrian Larumbe
Kind regards,
Lukas Zapolskas
[0]: https://docs.kernel.org/next/userspace-api/perf_ring_buffer.html
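As a rough illustration of the ring buffer control indices quoted above, here is a minimal userspace sketch of the index arithmetic a consumer might use. It assumes that insert_idx and extract_idx are free-running 64-bit counters that are only masked when used to address a slot; the quoted header does not spell that out, so treat it as an assumption. All function names are invented for this sketch and are not part of the uAPI. A real consumer would also need acquire/release ordering when reading insert_idx and writing extract_idx, which is left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when n is a non-zero power of two, as @sample_slots must be. */
static inline bool ring_slots_valid(uint32_t sample_slots)
{
	return sample_slots != 0 && (sample_slots & (sample_slots - 1)) == 0;
}

/* Mirrors the documented check that sample_slots * sample_size matches the BO size. */
static inline bool ring_size_matches(uint32_t sample_slots, uint64_t sample_size,
				     uint64_t bo_size)
{
	return ring_slots_valid(sample_slots) &&
	       (uint64_t)sample_slots * sample_size == bo_size;
}

/*
 * Number of samples written but not yet consumed. Assumes free-running
 * indices; unsigned subtraction stays correct across wraparound.
 */
static inline uint64_t ring_pending(uint64_t insert_idx, uint64_t extract_idx)
{
	return insert_idx - extract_idx;
}

/* Byte offset of the slot addressed by idx inside the ring buffer BO. */
static inline uint64_t ring_slot_offset(uint64_t idx, uint32_t sample_slots,
					uint64_t sample_size)
{
	assert(ring_slots_valid(sample_slots));
	return (idx & (sample_slots - 1)) * sample_size;
}
```

The power-of-two requirement on sample_slots is what allows the slot to be picked with a mask rather than a modulo.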
* Re: [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10
2025-07-18 2:52 ` Adrián Larumbe
@ 2025-07-21 9:04 ` Lukas Zapolskas
0 siblings, 0 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-07-21 9:04 UTC (permalink / raw)
To: Adrián Larumbe
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 18/07/2025 03:52, Adrián Larumbe wrote:
> On 16.05.2025 16:49, Lukas Zapolskas wrote:
>> This change adds the IOCTL to query data about the performance counter
>> setup. Some of this data was available via previous DEV_QUERY calls,
>> for instance for GPU info, but exposing it via PERF_INFO
>> minimizes the overhead of creating a single session to just the one
>> aggregate IOCTL.
>>
>> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
>> Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>> drivers/gpu/drm/panthor/Makefile | 1 +
>> drivers/gpu/drm/panthor/panthor_device.c | 5 ++
>> drivers/gpu/drm/panthor/panthor_device.h | 3 +
>> drivers/gpu/drm/panthor/panthor_drv.c | 10 +++-
>> drivers/gpu/drm/panthor/panthor_fw.h | 3 +
>> drivers/gpu/drm/panthor/panthor_perf.c | 76 ++++++++++++++++++++++++
>> drivers/gpu/drm/panthor/panthor_perf.h | 15 +++++
>> drivers/gpu/drm/panthor/panthor_regs.h | 1 +
>> 8 files changed, 113 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/gpu/drm/panthor/panthor_perf.c
>> create mode 100644 drivers/gpu/drm/panthor/panthor_perf.h
>>
>> diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile
>> index 15294719b09c..0df9947f3575 100644
>> --- a/drivers/gpu/drm/panthor/Makefile
>> +++ b/drivers/gpu/drm/panthor/Makefile
>> @@ -9,6 +9,7 @@ panthor-y := \
>> panthor_gpu.o \
>> panthor_heap.o \
>> panthor_mmu.o \
>> + panthor_perf.o \
>> panthor_sched.o
>>
>> obj-$(CONFIG_DRM_PANTHOR) += panthor.o
>> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
>> index a9da1d1eeb70..76b4cf3dc391 100644
>> --- a/drivers/gpu/drm/panthor/panthor_device.c
>> +++ b/drivers/gpu/drm/panthor/panthor_device.c
>> @@ -19,6 +19,7 @@
>> #include "panthor_fw.h"
>> #include "panthor_gpu.h"
>> #include "panthor_mmu.h"
>> +#include "panthor_perf.h"
>> #include "panthor_regs.h"
>> #include "panthor_sched.h"
>>
>> @@ -259,6 +260,10 @@ int panthor_device_init(struct panthor_device *ptdev)
>> if (ret)
>> goto err_unplug_fw;
>>
>> + ret = panthor_perf_init(ptdev);
>> + if (ret)
>> + goto err_unplug_fw;
> goto err_unplug_sched;
>
> [...]
>
> err_disable_autosuspend:
> pm_runtime_dont_use_autosuspend(ptdev->base.dev);
>
> err_unplug_sched:
> panthor_sched_unplug(ptdev);
>
> [...]
>
>> +
>> /* ~3 frames */
>> pm_runtime_set_autosuspend_delay(ptdev->base.dev, 50);
>> pm_runtime_use_autosuspend(ptdev->base.dev);
>> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
>> index da6574021664..657ccc39568c 100644
>> --- a/drivers/gpu/drm/panthor/panthor_device.h
>> +++ b/drivers/gpu/drm/panthor/panthor_device.h
>> @@ -120,6 +120,9 @@ struct panthor_device {
>> /** @csif_info: Command stream interface information. */
>> struct drm_panthor_csif_info csif_info;
>>
>> + /** @perf_info: Performance counter interface information. */
>> + struct drm_panthor_perf_info perf_info;
>> +
>> /** @gpu: GPU management data. */
>> struct panthor_gpu *gpu;
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
>> index 06fe46e32073..9d2b716cca45 100644
>> --- a/drivers/gpu/drm/panthor/panthor_drv.c
>> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
>> @@ -175,7 +175,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
>> PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
>> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
>> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
>> - PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
>> + PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
>> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks))
>>
>> /**
>> * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
>> @@ -835,6 +836,10 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
>> args->size = sizeof(priorities_info);
>> return 0;
>>
>> + case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
>> + args->size = sizeof(ptdev->perf_info);
>> + return 0;
>> +
>> default:
>> return -EINVAL;
>> }
>> @@ -859,6 +864,9 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
>> panthor_query_group_priorities_info(file, &priorities_info);
>> return PANTHOR_UOBJ_SET(args->pointer, args->size, priorities_info);
>>
>> + case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
>> + return PANTHOR_UOBJ_SET(args->pointer, args->size, ptdev->perf_info);
>> +
>> default:
>> return -EINVAL;
>> }
>> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
>> index 6598d96c6d2a..8bcb933fa790 100644
>> --- a/drivers/gpu/drm/panthor/panthor_fw.h
>> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
>> @@ -197,8 +197,11 @@ struct panthor_fw_global_control_iface {
>> u32 output_va;
>> u32 group_num;
>> u32 group_stride;
>> +#define GLB_PERFCNT_FW_SIZE(x) ((((x) >> 16) << 8))
>> u32 perfcnt_size;
>> u32 instr_features;
>> +#define PERFCNT_FEATURES_MD_SIZE(x) (((x) & GENMASK(3, 0)) << 8)
>
> What does MD stand for here?
>
Metadata. I will spell this out fully in the next patch set, since shortening it is not
too helpful.
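
For anyone following along, a minimal userspace sketch of what the two field-extraction macros above compute, going only by their definitions in this patch: the upper 16 bits of perfcnt_size and the low 4 bits of perfcnt_features, each scaled by 256 (a left shift by 8). GENMASK32 and the function names are stand-ins invented for this sketch, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's GENMASK(h, l), limited to 32-bit values. */
#define GENMASK32(h, l) ((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l)))

static_assert(GENMASK32(3, 0) == 0xF, "low nibble mask");

/* Same arithmetic as GLB_PERFCNT_FW_SIZE(): bits 31:16, in units of 256 bytes. */
static inline uint32_t glb_perfcnt_fw_size(uint32_t perfcnt_size)
{
	return (perfcnt_size >> 16) << 8;
}

/* Same arithmetic as PERFCNT_FEATURES_MD_SIZE(): bits 3:0, in units of 256 bytes. */
static inline uint32_t perfcnt_features_metadata_size(uint32_t perfcnt_features)
{
	return (perfcnt_features & GENMASK32(3, 0)) << 8;
}
```

A non-zero metadata size is what sets DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT in panthor_perf_info_init(), and a non-zero FW size is what sets fw_blocks to 1.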
>> + u32 perfcnt_features;
>> };
>>
>> struct panthor_fw_global_input_iface {
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
>> new file mode 100644
>> index 000000000000..66e9a197ac1f
>> --- /dev/null
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
>> @@ -0,0 +1,76 @@
>> +// SPDX-License-Identifier: GPL-2.0 or MIT
>> +/* Copyright 2023 Collabora Ltd */
>> +/* Copyright 2025 Arm ltd. */
>> +
>> +#include <linux/bitops.h>
>> +#include <drm/panthor_drm.h>
>> +
>> +#include "panthor_device.h"
>> +#include "panthor_fw.h"
>> +#include "panthor_perf.h"
>> +
>> +struct panthor_perf_counter_block {
>> + struct drm_panthor_perf_block_header header;
>> + u64 counters[];
>> +};
>> +
>> +static size_t get_annotated_block_size(size_t counters_per_block)
>> +{
>> + return struct_size_t(struct panthor_perf_counter_block, counters, counters_per_block);
>> +}
>> +
>> +static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
>> +{
>> + const size_t block_size = get_annotated_block_size(info->counters_per_block);
>> + const size_t block_nr = info->cshw_blocks + info->fw_blocks +
>> + info->tiler_blocks + info->memsys_blocks + info->shader_blocks;
>> +
>> + return sizeof(struct drm_panthor_perf_sample_header) + (block_size * block_nr);
>> +}
>
> You're assigning perf_info->counters_per_block the same sizeof() slightly further below
> so maybe you can use that value here straight away.
>
Will do, thanks.
>> +
>> +/**
>> + * PANTHOR_PERF_COUNTERS_PER_BLOCK - On CSF architectures pre-11.x, the number of counters
>> + * per block was hardcoded to be 64. Arch 11.0 onwards supports the PRFCNT_FEATURES GPU register,
>> + * which indicates the same information.
>> + */
>
> I guess you're waiting for the commit in ML message <20250320111741.1937892-7-karunika.choo@arm.com>
> ("drm/panthor: Add support for Mali-G715 family of GPUs") to check whether GPU_ARCH_MAJOR(ptdev->gpu_info.gpu_id)
> returns anything equal or above 11 to add support for reading the number of counters from PRFCNT_FEATURES?
>
> I don't remember whether that series is already merged, but it'd be nice to have it in this one too.
>
That's right. For the moment, I was only targeting the Gx10, but can add that when the mentioned patch is merged
(I don't think it is yet).
>> +#define PANTHOR_PERF_COUNTERS_PER_BLOCK (64)
>> +
>> +static void panthor_perf_info_init(struct panthor_device *ptdev)
>> +{
>> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>> + struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
>> +
>> + if (PERFCNT_FEATURES_MD_SIZE(glb_iface->control->perfcnt_features))
>> + perf_info->flags |= DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT;
>> +
>> + perf_info->counters_per_block = PANTHOR_PERF_COUNTERS_PER_BLOCK;
>> +
>> + perf_info->sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
>> + perf_info->block_header_size = sizeof(struct drm_panthor_perf_block_header);
>> +
>> + if (GLB_PERFCNT_FW_SIZE(glb_iface->control->perfcnt_size))
>> + perf_info->fw_blocks = 1;
>> +
>> + perf_info->cshw_blocks = 1;
>> + perf_info->tiler_blocks = 1;
>> + perf_info->memsys_blocks = GPU_MEM_FEATURES_L2_SLICES(ptdev->gpu_info.mem_features);
>> + perf_info->shader_blocks = hweight64(ptdev->gpu_info.shader_present);
>> +
>> + perf_info->sample_size = session_get_user_sample_size(perf_info);
>> +}
>> +
>> +/**
>> + * panthor_perf_init - Initialize the performance counter subsystem.
>> + * @ptdev: Panthor device
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int panthor_perf_init(struct panthor_device *ptdev)
>> +{
>> + if (!ptdev)
>> + return -EINVAL;
>> +
>> + panthor_perf_info_init(ptdev);
>> +
>> + return 0;
>> +}
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
>> new file mode 100644
>> index 000000000000..3c32c24c164c
>> --- /dev/null
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
>> @@ -0,0 +1,15 @@
>> +/* SPDX-License-Identifier: GPL-2.0 or MIT */
>> +/* Copyright 2025 Collabora Ltd */
>> +/* Copyright 2025 Arm ltd. */
>> +
>> +#ifndef __PANTHOR_PERF_H__
>> +#define __PANTHOR_PERF_H__
>> +
>> +#include <linux/types.h>
>> +
>> +struct panthor_device;
>> +
>> +int panthor_perf_init(struct panthor_device *ptdev);
>> +
>> +#endif /* __PANTHOR_PERF_H__ */
>> +
>> diff --git a/drivers/gpu/drm/panthor/panthor_regs.h b/drivers/gpu/drm/panthor/panthor_regs.h
>> index b7b3b3add166..d9e9379d1a20 100644
>> --- a/drivers/gpu/drm/panthor/panthor_regs.h
>> +++ b/drivers/gpu/drm/panthor/panthor_regs.h
>> @@ -27,6 +27,7 @@
>> #define GPU_TILER_FEATURES 0xC
>> #define GPU_MEM_FEATURES 0x10
>> #define GROUPS_L2_COHERENT BIT(0)
>> +#define GPU_MEM_FEATURES_L2_SLICES(x) ((((x) & GENMASK(11, 8)) >> 8) + 1)
>>
>> #define GPU_MMU_FEATURES 0x14
>> #define GPU_MMU_FEATURES_VA_BITS(x) ((x) & GENMASK(7, 0))
>> --
>> 2.33.0.dirty
>
> Adrian Larumbe
* Re: [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10
2025-07-18 15:11 ` Adrián Larumbe
@ 2025-07-21 9:06 ` Lukas Zapolskas
0 siblings, 0 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-07-21 9:06 UTC (permalink / raw)
To: Adrián Larumbe
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 18/07/2025 16:11, Adrián Larumbe wrote:
> Hi Lukas, forgot to add one comment in the previous patch review,
>
> On 16.05.2025 16:49, Lukas Zapolskas wrote:
>> This change adds the IOCTL to query data about the performance counter
>> setup. Some of this data was available via previous DEV_QUERY calls,
>> for instance for GPU info, but exposing it via PERF_INFO
>> minimizes the overhead of creating a single session to just the one
>> aggregate IOCTL.
>>
>> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
>> Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>> drivers/gpu/drm/panthor/Makefile | 1 +
>> drivers/gpu/drm/panthor/panthor_device.c | 5 ++
>> drivers/gpu/drm/panthor/panthor_device.h | 3 +
>> drivers/gpu/drm/panthor/panthor_drv.c | 10 +++-
>> drivers/gpu/drm/panthor/panthor_fw.h | 3 +
>> drivers/gpu/drm/panthor/panthor_perf.c | 76 ++++++++++++++++++++++++
>> drivers/gpu/drm/panthor/panthor_perf.h | 15 +++++
>> drivers/gpu/drm/panthor/panthor_regs.h | 1 +
>> 8 files changed, 113 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/gpu/drm/panthor/panthor_perf.c
>> create mode 100644 drivers/gpu/drm/panthor/panthor_perf.h
>>
>> diff --git a/drivers/gpu/drm/panthor/Makefile b/drivers/gpu/drm/panthor/Makefile
>> index 15294719b09c..0df9947f3575 100644
>> --- a/drivers/gpu/drm/panthor/Makefile
>> +++ b/drivers/gpu/drm/panthor/Makefile
>> @@ -9,6 +9,7 @@ panthor-y := \
>> panthor_gpu.o \
>> panthor_heap.o \
>> panthor_mmu.o \
>> + panthor_perf.o \
>> panthor_sched.o
>>
>> obj-$(CONFIG_DRM_PANTHOR) += panthor.o
>> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
>> index a9da1d1eeb70..76b4cf3dc391 100644
>> --- a/drivers/gpu/drm/panthor/panthor_device.c
>> +++ b/drivers/gpu/drm/panthor/panthor_device.c
>> @@ -19,6 +19,7 @@
>> #include "panthor_fw.h"
>> #include "panthor_gpu.h"
>> #include "panthor_mmu.h"
>> +#include "panthor_perf.h"
>> #include "panthor_regs.h"
>> #include "panthor_sched.h"
>>
>> @@ -259,6 +260,10 @@ int panthor_device_init(struct panthor_device *ptdev)
>> if (ret)
>> goto err_unplug_fw;
>>
>> + ret = panthor_perf_init(ptdev);
>> + if (ret)
>> + goto err_unplug_fw;
>> +
>> /* ~3 frames */
>> pm_runtime_set_autosuspend_delay(ptdev->base.dev, 50);
>> pm_runtime_use_autosuspend(ptdev->base.dev);
>> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
>> index da6574021664..657ccc39568c 100644
>> --- a/drivers/gpu/drm/panthor/panthor_device.h
>> +++ b/drivers/gpu/drm/panthor/panthor_device.h
>> @@ -120,6 +120,9 @@ struct panthor_device {
>> /** @csif_info: Command stream interface information. */
>> struct drm_panthor_csif_info csif_info;
>>
>> + /** @perf_info: Performance counter interface information. */
>> + struct drm_panthor_perf_info perf_info;
>> +
>> /** @gpu: GPU management data. */
>> struct panthor_gpu *gpu;
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
>> index 06fe46e32073..9d2b716cca45 100644
>> --- a/drivers/gpu/drm/panthor/panthor_drv.c
>> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
>> @@ -175,7 +175,8 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
>> PANTHOR_UOBJ_DECL(struct drm_panthor_sync_op, timeline_value), \
>> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
>> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
>> - PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs))
>> + PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
>> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks))
>>
>> /**
>> * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
>> @@ -835,6 +836,10 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
>> args->size = sizeof(priorities_info);
>> return 0;
>>
>> + case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
>> + args->size = sizeof(ptdev->perf_info);
>> + return 0;
>> +
>> default:
>> return -EINVAL;
>> }
>> @@ -859,6 +864,9 @@ static int panthor_ioctl_dev_query(struct drm_device *ddev, void *data, struct d
>> panthor_query_group_priorities_info(file, &priorities_info);
>> return PANTHOR_UOBJ_SET(args->pointer, args->size, priorities_info);
>>
>> + case DRM_PANTHOR_DEV_QUERY_PERF_INFO:
>> + return PANTHOR_UOBJ_SET(args->pointer, args->size, ptdev->perf_info);
>> +
>> default:
>> return -EINVAL;
>> }
>> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
>> index 6598d96c6d2a..8bcb933fa790 100644
>> --- a/drivers/gpu/drm/panthor/panthor_fw.h
>> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
>> @@ -197,8 +197,11 @@ struct panthor_fw_global_control_iface {
>> u32 output_va;
>> u32 group_num;
>> u32 group_stride;
>> +#define GLB_PERFCNT_FW_SIZE(x) ((((x) >> 16) << 8))
>> u32 perfcnt_size;
>> u32 instr_features;
>> +#define PERFCNT_FEATURES_MD_SIZE(x) (((x) & GENMASK(3, 0)) << 8)
>> + u32 perfcnt_features;
>> };
>>
>> struct panthor_fw_global_input_iface {
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
>> new file mode 100644
>> index 000000000000..66e9a197ac1f
>> --- /dev/null
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
>> @@ -0,0 +1,76 @@
>> +// SPDX-License-Identifier: GPL-2.0 or MIT
>> +/* Copyright 2023 Collabora Ltd */
>> +/* Copyright 2025 Arm ltd. */
>> +
>> +#include <linux/bitops.h>
>> +#include <drm/panthor_drm.h>
>> +
>> +#include "panthor_device.h"
>> +#include "panthor_fw.h"
>> +#include "panthor_perf.h"
>> +
>> +struct panthor_perf_counter_block {
>> + struct drm_panthor_perf_block_header header;
>> + u64 counters[];
>> +};
>> +
>> +static size_t get_annotated_block_size(size_t counters_per_block)
>> +{
>> + return struct_size_t(struct panthor_perf_counter_block, counters, counters_per_block);
>> +}
>> +
>> +static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
>> +{
>> + const size_t block_size = get_annotated_block_size(info->counters_per_block);
>> + const size_t block_nr = info->cshw_blocks + info->fw_blocks +
>> + info->tiler_blocks + info->memsys_blocks + info->shader_blocks;
>> +
>> + return sizeof(struct drm_panthor_perf_sample_header) + (block_size * block_nr);
>
> I think you could use 'perf_info->sample_header_size' because you assign it in the calling function.
>
I think so. Will go over and see where I'm duplicating the sizeof() calls, since I may be doing so
in several places.
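
Roughly what the sample size computation would look like once it reuses the sizes already stored in perf_info, written as a standalone userspace sketch. The struct below is a cut-down stand-in invented for this example (only the fields used by the computation), and it assumes the block header size is a multiple of 8 bytes so that the annotated block has no padding between the header and the 64-bit counters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(uint64_t) == 8, "counters are 64-bit values");

/* Hypothetical stand-in holding only the perf_info fields used below. */
struct perf_info_sketch {
	uint32_t counters_per_block;
	uint32_t sample_header_size;
	uint32_t block_header_size;
	uint32_t fw_blocks;
	uint32_t cshw_blocks;
	uint32_t tiler_blocks;
	uint32_t memsys_blocks;
	uint32_t shader_blocks;
};

/* One sample = sample header + one annotated block per counter block. */
static size_t sketch_user_sample_size(const struct perf_info_sketch *info)
{
	const size_t block_size = info->block_header_size +
				  (size_t)info->counters_per_block * sizeof(uint64_t);
	const size_t block_nr = (size_t)info->fw_blocks + info->cshw_blocks +
				info->tiler_blocks + info->memsys_blocks +
				info->shader_blocks;

	return info->sample_header_size + block_size * block_nr;
}
```

With 64 counters per block, made-up header sizes of 32 and 16 bytes, and 9 blocks in total, each block takes 16 + 64 * 8 = 528 bytes and the sample comes to 32 + 9 * 528 = 4784 bytes.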
>> +}
>> +
>> +/**
>> + * PANTHOR_PERF_COUNTERS_PER_BLOCK - On CSF architectures pre-11.x, the number of counters
>> + * per block was hardcoded to be 64. Arch 11.0 onwards supports the PRFCNT_FEATURES GPU register,
>> + * which indicates the same information.
>> + */
>> +#define PANTHOR_PERF_COUNTERS_PER_BLOCK (64)
>> +
>> +static void panthor_perf_info_init(struct panthor_device *ptdev)
>> +{
>> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>> + struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
>> +
>> + if (PERFCNT_FEATURES_MD_SIZE(glb_iface->control->perfcnt_features))
>> + perf_info->flags |= DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT;
>> +
>> + perf_info->counters_per_block = PANTHOR_PERF_COUNTERS_PER_BLOCK;
>> +
>> + perf_info->sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
>> + perf_info->block_header_size = sizeof(struct drm_panthor_perf_block_header);
>> +
>> + if (GLB_PERFCNT_FW_SIZE(glb_iface->control->perfcnt_size))
>> + perf_info->fw_blocks = 1;
>> +
>> + perf_info->cshw_blocks = 1;
>> + perf_info->tiler_blocks = 1;
>> + perf_info->memsys_blocks = GPU_MEM_FEATURES_L2_SLICES(ptdev->gpu_info.mem_features);
>> + perf_info->shader_blocks = hweight64(ptdev->gpu_info.shader_present);
>> +
>> + perf_info->sample_size = session_get_user_sample_size(perf_info);
>> +}
>> +
>> +/**
>> + * panthor_perf_init - Initialize the performance counter subsystem.
>> + * @ptdev: Panthor device
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int panthor_perf_init(struct panthor_device *ptdev)
>> +{
>> + if (!ptdev)
>> + return -EINVAL;
>> +
>> + panthor_perf_info_init(ptdev);
>> +
>> + return 0;
>> +}
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
>> new file mode 100644
>> index 000000000000..3c32c24c164c
>> --- /dev/null
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
>> @@ -0,0 +1,15 @@
>> +/* SPDX-License-Identifier: GPL-2.0 or MIT */
>> +/* Copyright 2025 Collabora Ltd */
>> +/* Copyright 2025 Arm ltd. */
>> +
>> +#ifndef __PANTHOR_PERF_H__
>> +#define __PANTHOR_PERF_H__
>> +
>> +#include <linux/types.h>
>> +
>> +struct panthor_device;
>> +
>> +int panthor_perf_init(struct panthor_device *ptdev);
>> +
>> +#endif /* __PANTHOR_PERF_H__ */
>> +
>> diff --git a/drivers/gpu/drm/panthor/panthor_regs.h b/drivers/gpu/drm/panthor/panthor_regs.h
>> index b7b3b3add166..d9e9379d1a20 100644
>> --- a/drivers/gpu/drm/panthor/panthor_regs.h
>> +++ b/drivers/gpu/drm/panthor/panthor_regs.h
>> @@ -27,6 +27,7 @@
>> #define GPU_TILER_FEATURES 0xC
>> #define GPU_MEM_FEATURES 0x10
>> #define GROUPS_L2_COHERENT BIT(0)
>> +#define GPU_MEM_FEATURES_L2_SLICES(x) ((((x) & GENMASK(11, 8)) >> 8) + 1)
>>
>> #define GPU_MMU_FEATURES 0x14
>> #define GPU_MMU_FEATURES_VA_BITS(x) ((x) & GENMASK(7, 0))
>> --
>> 2.33.0.dirty
>
>
> Adrian Larumbe
* Re: [PATCH v4 3/7] drm/panthor: Add panthor perf initialization and termination
2025-07-18 3:10 ` Adrián Larumbe
@ 2025-07-21 9:10 ` Lukas Zapolskas
0 siblings, 0 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-07-21 9:10 UTC (permalink / raw)
To: Adrián Larumbe
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 18/07/2025 04:10, Adrián Larumbe wrote:
> On 16.05.2025 16:49, Lukas Zapolskas wrote:
>> Added the panthor_perf system initialization and unplug code to allow
>> for the handling of userspace sessions to be added in follow-up
>> patches.
>>
>> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
>> ---
>> drivers/gpu/drm/panthor/panthor_device.c | 2 +
>> drivers/gpu/drm/panthor/panthor_device.h | 5 +-
>> drivers/gpu/drm/panthor/panthor_perf.c | 62 +++++++++++++++++++++++-
>> drivers/gpu/drm/panthor/panthor_perf.h | 1 +
>> 4 files changed, 68 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
>> index 76b4cf3dc391..7ac985d44655 100644
>> --- a/drivers/gpu/drm/panthor/panthor_device.c
>> +++ b/drivers/gpu/drm/panthor/panthor_device.c
>> @@ -98,6 +98,7 @@ void panthor_device_unplug(struct panthor_device *ptdev)
>> /* Now, try to cleanly shutdown the GPU before the device resources
>> * get reclaimed.
>> */
>> + panthor_perf_unplug(ptdev);
>> panthor_sched_unplug(ptdev);
>> panthor_fw_unplug(ptdev);
>> panthor_mmu_unplug(ptdev);
>> @@ -277,6 +278,7 @@ int panthor_device_init(struct panthor_device *ptdev)
>>
>> err_disable_autosuspend:
>> pm_runtime_dont_use_autosuspend(ptdev->base.dev);
>> + panthor_perf_unplug(ptdev);
>> panthor_sched_unplug(ptdev);
>>
>> err_unplug_fw:
>> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
>> index 657ccc39568c..818c4d96d448 100644
>> --- a/drivers/gpu/drm/panthor/panthor_device.h
>> +++ b/drivers/gpu/drm/panthor/panthor_device.h
>> @@ -27,7 +27,7 @@ struct panthor_heap_pool;
>> struct panthor_job;
>> struct panthor_mmu;
>> struct panthor_fw;
>> -struct panthor_perfcnt;
>> +struct panthor_perf;
>> struct panthor_vm;
>> struct panthor_vm_pool;
>>
>> @@ -138,6 +138,9 @@ struct panthor_device {
>> /** @devfreq: Device frequency scaling management data. */
>> struct panthor_devfreq *devfreq;
>>
>> + /** @perf: Performance counter management data. */
>> + struct panthor_perf *perf;
>> +
>> /** @unplug: Device unplug related fields. */
>> struct {
>> /** @lock: Lock used to serialize unplug operations. */
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
>> index 66e9a197ac1f..9365ce9fed04 100644
>> --- a/drivers/gpu/drm/panthor/panthor_perf.c
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
>> @@ -9,6 +9,19 @@
>> #include "panthor_fw.h"
>> #include "panthor_perf.h"
>
> You must include "panthor_regs.h" here or else GPU_MEM_FEATURES_L2_SLICES() won't be available.
> However, it seems this is something that should be done in the previous patch.
>
Will add that to the perf_info patch.
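
For reference, a small userspace sketch of how panthor_perf_info_init() derives the per-sample block counts from the GPU registers, based only on the GPU_MEM_FEATURES_L2_SLICES() definition and the hweight64() call in the perf_info patch. The names below are invented for this sketch, and popcount64() is a portable stand-in for the kernel's hweight64().

```c
#include <assert.h>
#include <stdint.h>

#define L2_SLICES_SHIFT 8
#define L2_SLICES_MASK  UINT32_C(0xF00) /* bits 11:8 of GPU_MEM_FEATURES */

static_assert((L2_SLICES_MASK >> L2_SLICES_SHIFT) == 0xF, "4-bit field");

/* Same arithmetic as GPU_MEM_FEATURES_L2_SLICES(): the field stores slices - 1. */
static inline uint32_t l2_slices(uint32_t mem_features)
{
	return ((mem_features & L2_SLICES_MASK) >> L2_SLICES_SHIFT) + 1;
}

/* Portable population count, standing in for hweight64(). */
static inline unsigned int popcount64(uint64_t v)
{
	unsigned int n = 0;

	/* Clear the lowest set bit on each iteration. */
	for (; v; v &= v - 1)
		n++;

	return n;
}
```

memsys_blocks is then l2_slices(mem_features) and shader_blocks is popcount64(shader_present), one counter block per L2 slice and per present shader core.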
>>
>> +struct panthor_perf {
>> + /** @next_session: The ID of the next session. */
>> + u32 next_session;
>> +
>> + /** @session_range: The number of sessions supported at a time. */
>> + struct xa_limit session_range;
>> +
>> + /**
>> + * @sessions: Global map of sessions, accessed by their ID.
>> + */
>> + struct xarray sessions;
>> +};
>> +
>> struct panthor_perf_counter_block {
>> struct drm_panthor_perf_block_header header;
>> u64 counters[];
>> @@ -63,14 +76,61 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
>> * panthor_perf_init - Initialize the performance counter subsystem.
>> * @ptdev: Panthor device
>> *
>> + * The performance counters require the FW interface to be available to set up the
>> + * sampling ringbuffers, so this must be called only after FW is initialized.
>> + *
>> * Return: 0 on success, negative error code on failure.
>> */
>> int panthor_perf_init(struct panthor_device *ptdev)
>> {
>> + struct panthor_perf *perf __free(kfree) = NULL;
>> + int ret = 0;
>> +
>> if (!ptdev)
>> return -EINVAL;
>>
>> panthor_perf_info_init(ptdev);
>>
>> - return 0;
>> + perf = kzalloc(sizeof(*perf), GFP_KERNEL);
>> + if (ZERO_OR_NULL_PTR(perf))
>> + return -ENOMEM;
>> +
>> + xa_init_flags(&perf->sessions, XA_FLAGS_ALLOC);
>> +
>> + perf->session_range = (struct xa_limit) {
>> + .min = 0,
>> + .max = 1,
>> + };
>> +
>> + drm_info(&ptdev->base, "Performance counter subsystem initialized");
>> +
>> + ptdev->perf = no_free_ptr(perf);
>> +
>> + return ret;
>> +}
>> +
>> +/**
>> + * panthor_perf_unplug - Terminate the performance counter subsystem.
>> + * @ptdev: Panthor device.
>> + *
>> + * This function will terminate the performance counter control structures and any remaining
>> + * sessions, after waiting for any pending interrupts.
>> + */
>> +void panthor_perf_unplug(struct panthor_device *ptdev)
>> +{
>> + struct panthor_perf *perf = ptdev->perf;
>> +
>> + if (!perf)
>> + return;
>> +
>> + if (!xa_empty(&perf->sessions)) {
>> + drm_err(&ptdev->base,
>> + "Performance counter sessions active when unplugging the driver!");
>> + }
>
> I think this could only happen if someone forces module unload, even
> though there might still be processes which haven't yet closed the DRM
> file?
>
That sounds about right. The only time I have seen that warning was in development when
the session cleanup was not being done properly on process termination.
>> +
>> + xa_destroy(&perf->sessions);
>> +
>> + kfree(ptdev->perf);
>> +
>> + ptdev->perf = NULL;
>> }
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
>> index 3c32c24c164c..e4805727b9e7 100644
>> --- a/drivers/gpu/drm/panthor/panthor_perf.h
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
>> @@ -10,6 +10,7 @@
>> struct panthor_device;
>>
>> int panthor_perf_init(struct panthor_device *ptdev);
>> +void panthor_perf_unplug(struct panthor_device *ptdev);
>>
>> #endif /* __PANTHOR_PERF_H__ */
>>
>> --
>> 2.33.0.dirty
>
> Adrian Larumbe
* Re: [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients
2025-07-18 3:34 ` Adrián Larumbe
@ 2025-07-21 9:53 ` Lukas Zapolskas
0 siblings, 0 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-07-21 9:53 UTC (permalink / raw)
To: Adrián Larumbe
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 18/07/2025 04:34, Adrián Larumbe wrote:
> On 16.05.2025 16:49, Lukas Zapolskas wrote:
>> To allow for combining the requests from multiple userspace clients, an
>> intermediary layer between the HW/FW interfaces and userspace is
>> created, containing the information for the counter requests and
>> tracking of insert and extract indices. Each session starts inactive and
>> must be explicitly activated via PERF_CONTROL.START, and explicitly
>> stopped via PERF_CONTROL.STOP. Userspace identifies a single client with
>> its session ID and the panthor file it is associated with.
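The lifecycle described above can be modelled in a few lines of plain C.
This is only a userspace sketch, not driver code: struct model_session and
the model_*() helpers are hypothetical names, and the ring buffer space
checks are left out. It follows the posted session_start(), session_stop()
and session_sample() in treating redundant commands as no-ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel-side session object. */
struct model_session {
	bool active;        /* mirrors the ACTIVE state bit */
	uint64_t user_data; /* opaque value copied into the next sample */
};

/* START: a session is created inactive; starting an active one is a no-op. */
int model_start(struct model_session *s)
{
	s->active = true;
	return 0;
}

/* SAMPLE: an inactive session is left untouched, as in the posted code. */
int model_sample(struct model_session *s, uint64_t user_data)
{
	if (!s->active)
		return 0;

	s->user_data = user_data;
	return 0;
}

/* STOP: tags the final sample, then deactivates; no-op if already stopped. */
int model_stop(struct model_session *s, uint64_t user_data)
{
	if (!s->active)
		return 0;

	s->user_data = user_data;
	s->active = false;
	return 0;
}
```

The kernel never interprets the user data; the model only records which
command last supplied it, which is what lets a client tell a SAMPLE-produced
sample apart from a STOP-produced one.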
>>
>> The SAMPLE and STOP commands both produce a single sample when called,
>> and these samples can be disambiguated via the opaque user data field
>> passed in the PERF_CONTROL uAPI. If this functionality is not desired,
>> these fields can be kept as zero, as the kernel copies this value into
>> the corresponding sample without attempting to interpret it.
>>
>> Currently, only manual sampling sessions are supported, providing
>> samples when userspace calls PERF_CONTROL.SAMPLE, and only a single
>> session is allowed at a time. Multiple sessions and periodic sampling
>> will be enabled in following patches.
>>
>> No protection is provided against the 32-bit hardware counter overflows,
>> so for the moment it is up to userspace to ensure that the counters are
>> sampled at a reasonable frequency.
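For a rough sense of what "reasonable" means here, the worst-case wrap time
of a 32-bit counter can be estimated as below. This is back-of-the-envelope
arithmetic, not something taken from the patch: it assumes a counter that
increments at most once per clock cycle, and the clock rates in the usage
note are made-up example values. wrap_interval_ms() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Worst-case time, in milliseconds, before a 32-bit counter that increments
 * once per cycle wraps around. Returns 0 for a zero clock rate.
 */
uint64_t wrap_interval_ms(uint64_t clock_hz)
{
	const uint64_t wrap = (uint64_t)UINT32_MAX + 1; /* 2^32 increments */

	if (!clock_hz)
		return 0;

	/* 2^32 * 1000 still fits comfortably in 64 bits. */
	return wrap * 1000 / clock_hz;
}
```

Under those assumptions a 1 GHz clock gives roughly 4.3 seconds between
wraps and a 500 MHz clock roughly 8.6 seconds, so a client would need to
sample more often than that to avoid losing data.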
>>
>> The counter set enum is added to the uapi to clarify the restrictions on
>> calling the interface.
>>
>> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
>> ---
>> drivers/gpu/drm/panthor/panthor_device.h | 3 +
>> drivers/gpu/drm/panthor/panthor_drv.c | 1 +
>> drivers/gpu/drm/panthor/panthor_perf.c | 694 ++++++++++++++++++++++-
>> drivers/gpu/drm/panthor/panthor_perf.h | 16 +
>> 4 files changed, 713 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
>> index 818c4d96d448..3fa0882fe81b 100644
>> --- a/drivers/gpu/drm/panthor/panthor_device.h
>> +++ b/drivers/gpu/drm/panthor/panthor_device.h
>> @@ -225,6 +225,9 @@ struct panthor_file {
>> /** @ptdev: Device attached to this file. */
>> struct panthor_device *ptdev;
>>
>> + /** @drm_file: Corresponding drm_file */
>
>> + struct drm_file *drm_file;
>
> I'm sceptical about adding this here, and suspect we don't need it. I mentioned why in the
> review for the next patch.
>
I see what you mean. Will try that out, thanks.
>> +
>> /** @vms: VM pool attached to this file. */
>> struct panthor_vm_pool *vms;
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
>> index 9d2b716cca45..4c1381320859 100644
>> --- a/drivers/gpu/drm/panthor/panthor_drv.c
>> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
>> @@ -1356,6 +1356,7 @@ panthor_open(struct drm_device *ddev, struct drm_file *file)
>> }
>>
>> pfile->ptdev = ptdev;
>> + pfile->drm_file = file;
>>
>> ret = panthor_vm_pool_create(pfile);
>> if (ret)
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
>> index 9365ce9fed04..15fa533731f3 100644
>> --- a/drivers/gpu/drm/panthor/panthor_perf.c
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
>> @@ -2,13 +2,177 @@
>> /* Copyright 2023 Collabora Ltd */
>> /* Copyright 2025 Arm ltd. */
>>
>> -#include <linux/bitops.h>
>> +#include <drm/drm_gem.h>
>> #include <drm/panthor_drm.h>
>> +#include <linux/bitops.h>
>> +#include <linux/circ_buf.h>
>>
>> #include "panthor_device.h"
>> #include "panthor_fw.h"
>> #include "panthor_perf.h"
>>
>> +/**
>> + * PANTHOR_PERF_EM_BITS - Number of bits in a user-facing enable mask. This must correspond
>> + * to the maximum number of counters available for selection on the newest
>> + * Mali GPUs (128 as of the Mali-Gx15).
>> + */
>> +#define PANTHOR_PERF_EM_BITS (BITS_PER_TYPE(u64) * 2)
>> +
>> +enum panthor_perf_session_state {
>> + /** @PANTHOR_PERF_SESSION_ACTIVE: The session is active and can be used for sampling. */
>> + PANTHOR_PERF_SESSION_ACTIVE = 0,
>> +
>> + /**
>> + * @PANTHOR_PERF_SESSION_OVERFLOW: The session encountered an overflow in one of the
>> + * counters during the last sampling period. This flag
>> + * gets propagated as part of samples emitted for this
>> + * session, to ensure the userspace client can gracefully
>> + * handle this data corruption.
>> + */
>> + PANTHOR_PERF_SESSION_OVERFLOW,
>> +
>> + /* Must be last */
>> + PANTHOR_PERF_SESSION_MAX,
>> +};
>> +
>> +struct panthor_perf_enable_masks {
>> + /**
>> + * @mask: Array of bitmasks indicating the counters userspace requested, where
>> + * one bit represents a single counter. Used to build the firmware configuration
>> + * and ensure that userspace clients obtain only the counters they requested.
>> + */
>> + unsigned long mask[DRM_PANTHOR_PERF_BLOCK_MAX][BITS_TO_LONGS(PANTHOR_PERF_EM_BITS)];
>> +};
>> +
>> +struct panthor_perf_counter_block {
>> + struct drm_panthor_perf_block_header header;
>> + u64 counters[];
>> +};
>
> This is a redefinition.
>
>> +/**
>> + * enum session_sample_type - Enum of the types of samples a session can request.
>> + */
>> +enum session_sample_type {
>> + /** @SAMPLE_TYPE_NONE: A sample has not been requested by this session. */
>> + SAMPLE_TYPE_NONE,
>> +
>> + /** @SAMPLE_TYPE_INITIAL: An initial sample has been requested by this session. */
>> + SAMPLE_TYPE_INITIAL,
>> +
>> + /** @SAMPLE_TYPE_REGULAR: A regular sample has been requested by this session. */
>> + SAMPLE_TYPE_REGULAR,
>> +};
>> +
>> +struct panthor_perf_session {
>> + DECLARE_BITMAP(state, PANTHOR_PERF_SESSION_MAX);
>> +
>> + /**
>> + * @pending_sample_request: The type of sample request that is currently pending:
>> + * - when a sample is not requested, the data should be accumulated
>> + * into the next slot of its ring buffer, but the extract index
>> + * should not be updated, and the user-space session must
>> + * not be signaled.
>> + * - when an initial sample is requested, the data must not be
>> + * emitted into the target ring buffer and the userspace client
>> + * must not be notified.
>> + * - when a regular sample is requested, the data must be emitted
>> + * into the target ring buffer, and the userspace client must
>> + * be signalled.
>> + */
>> + enum session_sample_type pending_sample_request;
>> +
>> + /**
>> + * @user_sample_size: The size of a single sample as exposed to userspace. For the sake of
>> + * simplicity, the current implementation exposes the same structure
>> + * as provided by firmware, after annotating the sample and the blocks,
>> + * and zero-extending the counters themselves (to account for in-kernel
>> + * accumulation).
>> + *
>> + * This may also allow further memory-optimizations of compressing the
>> + * sample to provide only requested blocks, if deemed to be worth the
>> + * additional complexity.
>> + */
>> + size_t user_sample_size;
>> +
>> + /**
>> + * @accum_idx: The last insert index indicates whether the current sample
>> + * needs zeroing before accumulation. This is used to disambiguate
>> + * between accumulating into an intermediate slot in the user ring buffer
>> + * and zero-ing the buffer before copying data over.
>> + */
>> + u32 accum_idx;
>> +
>> + /**
>> + * @sample_freq_ns: Period between subsequent sample requests. Zero indicates that
>> + * userspace will be responsible for requesting samples.
>> + */
>> + u64 sample_freq_ns;
>> +
>> + /** @sample_start_ns: Sample request time, obtained from a monotonic raw clock. */
>> + u64 sample_start_ns;
>> +
>> + /**
>> + * @user_data: Opaque handle passed in when starting a session, requesting a sample (for
>> + * manual sampling sessions only) and when stopping a session. This handle
>> + * allows the disambiguation of a sample in the ringbuffer.
>> + */
>> + u64 user_data;
>> +
>> + /**
>> + * @eventfd: Event file descriptor context used to signal userspace of a new sample
>> + * being emitted.
>> + */
>> + struct eventfd_ctx *eventfd;
>> +
>> + /**
>> + * @enabled_counters: This session's requested counters. Note that these cannot change
>> + * for the lifetime of the session.
>> + */
>> + struct panthor_perf_enable_masks *enabled_counters;
>> +
>> + /** @ringbuf_slots: Slots in the user-facing ringbuffer. */
>> + size_t ringbuf_slots;
>> +
>> + /** @ring_buf: BO for the userspace ringbuffer. */
>> + struct drm_gem_object *ring_buf;
>> +
>> + /**
>> + * @control_buf: BO for the insert and extract indices.
>> + */
>> + struct drm_gem_object *control_buf;
>> +
>> + /** @control: The mapped insert and extract indices. */
>> + struct drm_panthor_perf_ringbuf_control *control;
>> +
>> + /** @samples: The mapping of the @ring_buf into the kernel's VA space. */
>> + u8 *samples;
>> +
>> + /**
>> + * @pending: The list node used by the sampler to track the sessions that have not yet
>> + * received a sample.
>> + */
>> + struct list_head pending;
>> +
>> + /**
>> + * @sessions: The list node used by the sampler to track the sessions waiting for a sample.
>> + */
>> + struct list_head sessions;
>> +
>> + /**
>> + * @pfile: The panthor file which was used to create a session, used for the postclose
>> + * handling and to prevent a misconfigured userspace from closing unrelated
>> + * sessions.
>> + */
>> + struct panthor_file *pfile;
>> +
>> + /**
>> + * @ref: Session reference count. The sample delivery to userspace is asynchronous, meaning
>> + * the lifetime of the session must extend at least until the sample is exposed to
>> + * userspace.
>> + */
>> + struct kref ref;
>> +};
>> +
>> struct panthor_perf {
>> /** @next_session: The ID of the next session. */
>> u32 next_session;
>> @@ -72,6 +236,122 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
>> perf_info->sample_size = session_get_user_sample_size(perf_info);
>> }
>>
>> +static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panthor_perf_cmd_setup
>> + *setup_args)
>> +{
>> + struct panthor_perf_enable_masks *em = kmalloc(sizeof(*em), GFP_KERNEL);
>> + if (IS_ERR_OR_NULL(em))
>> + return em;
>> +
>> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_FW],
>> + setup_args->fw_enable_mask, PANTHOR_PERF_EM_BITS);
>> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_CSHW],
>> + setup_args->cshw_enable_mask, PANTHOR_PERF_EM_BITS);
>> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_TILER],
>> + setup_args->tiler_enable_mask, PANTHOR_PERF_EM_BITS);
>> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_MEMSYS],
>> + setup_args->memsys_enable_mask, PANTHOR_PERF_EM_BITS);
>> + bitmap_from_arr64(em->mask[DRM_PANTHOR_PERF_BLOCK_SHADER],
>> + setup_args->shader_enable_mask, PANTHOR_PERF_EM_BITS);
>> +
>> + return em;
>> +}
>> +
>> +static u64 session_read_extract_idx(struct panthor_perf_session *session)
>> +{
>> + const u64 slots = session->ringbuf_slots;
>> +
>> + /* Userspace will update their own extract index to indicate that a sample is consumed
>> + * from the ringbuffer, and we must ensure we read the latest value.
>> + */
>> + return smp_load_acquire(&session->control->extract_idx) % slots;
>> +}
>> +
>> +static u64 session_read_insert_idx(struct panthor_perf_session *session)
>> +{
>> + const u64 slots = session->ringbuf_slots;
>> +
>> + /*
>> + * Userspace is able to write to the insert index, since it is mapped
>> + * on the same page as the extract index. This should not happen
>> + * in regular operation.
>
> Why would userspace be able to write into the insert index? I guess in a
> ringbuffer setup, UM updates the extract index when it consumes a
> sample, and the kernel increases the insert index when it writes a new
> sample into the user-facing ringbuffer.
>
Currently, the insert index is mapped at a fixed offset from the extract index on the same
user-visible page. Since both are mapped read/write, userspace could write to the insert index,
and we need to handle that case to ensure there's no overflow.
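As a minimal userspace sketch of why the modulo matters (not the driver
code): struct model_control, model_slot() and model_insert_offset() are
hypothetical names, the struct is not the uAPI layout, and the
smp_load_acquire() used by the real accessors is omitted. Whatever value
ends up in the shared page, the reduced index always names a valid slot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared control page; not the uAPI layout. */
struct model_control {
	uint64_t extract_idx; /* advanced by userspace */
	uint64_t insert_idx;  /* advanced by the kernel, but writable by userspace */
};

/* Reduce an untrusted index to a valid slot number (slots must be non-zero). */
uint64_t model_slot(uint64_t raw_idx, uint64_t slots)
{
	return raw_idx % slots;
}

/* Byte offset of the next sample; always below slots * sample_size. */
uint64_t model_insert_offset(const struct model_control *ctrl,
			     uint64_t slots, uint64_t sample_size)
{
	return model_slot(ctrl->insert_idx, slots) * sample_size;
}
```

A client that scribbles over the insert index can still corrupt its own
view of the ring buffer, but the write offset stays inside the mapped BO.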
>> + */
>> + return smp_load_acquire(&session->control->insert_idx) % slots;
>> +}
>> +
>> +static void session_get(struct panthor_perf_session *session)
>> +{
>> + kref_get(&session->ref);
>> +}
>> +
>> +static void session_free(struct kref *ref)
>> +{
>> + struct panthor_perf_session *session = container_of(ref, typeof(*session), ref);
>> +
>> + if (session->samples && session->ring_buf) {
>> + struct iosys_map map = IOSYS_MAP_INIT_VADDR(session->samples);
>> +
>> + drm_gem_vunmap_unlocked(session->ring_buf, &map);
>
> drm_gem_vunmap_unlocked() isn't declared in drm_gem.h when I rebase the
> patch series onto drm-misc. I guess it means either you're basing this
> patch series on a previous WIP branch or else it's misspelt?
>
Looks like I was based on an old commit; the unlocked variants were renamed to drop the
`_unlocked` suffix in [0]. Will fix these up.
>> + drm_gem_object_put(session->ring_buf);
>> + }
>> +
>> + if (session->control && session->control_buf) {
>> + struct iosys_map map = IOSYS_MAP_INIT_VADDR(session->control);
>> +
>> + drm_gem_vunmap_unlocked(session->control_buf, &map);
>> + drm_gem_object_put(session->control_buf);
>> + }
>> +
>> + eventfd_ctx_put(session->eventfd);
>> +
>> + kfree(session);
>> +}
>> +
>> +static void session_put(struct panthor_perf_session *session)
>> +{
>> + kref_put(&session->ref, session_free);
>> +}
>> +
>> +/**
>> + * session_find - Find a session associated with the given session ID and
>> + * panthor_file.
>> + * @pfile: Panthor file.
>> + * @perf: Panthor perf.
>> + * @sid: Session ID.
>> + *
>> + * The reference count of a valid session is increased to ensure it does not disappear
>> + * in the window between the XA lock being dropped and the internal session functions
>> + * being called.
>> + *
>> + * Return: valid session pointer or an ERR_PTR.
>> + */
>> +static struct panthor_perf_session *session_find(struct panthor_file *pfile,
>> + struct panthor_perf *perf, u32 sid)
>> +{
>> + struct panthor_perf_session *session;
>> +
>> + if (!perf)
>> + return ERR_PTR(-EINVAL);
>> +
>> + xa_lock(&perf->sessions);
>> + session = xa_load(&perf->sessions, sid);
>> +
>> + if (!session || xa_is_err(session)) {
>> + xa_unlock(&perf->sessions);
>> + return ERR_PTR(-EBADF);
>> + }
>> +
>> + if (session->pfile != pfile) {
>> + xa_unlock(&perf->sessions);
>> + return ERR_PTR(-EINVAL);
>> + }
>> +
>> + session_get(session);
>> + xa_unlock(&perf->sessions);
>> +
>> + return session;
>> +}
>> +
>> /**
>> * panthor_perf_init - Initialize the performance counter subsystem.
>> * @ptdev: Panthor device
>> @@ -109,6 +389,412 @@ int panthor_perf_init(struct panthor_device *ptdev)
>> return ret;
>> }
>>
>> +static int session_validate_set(u8 set)
>> +{
>> + if (set > DRM_PANTHOR_PERF_SET_TERTIARY)
>> + return -EINVAL;
>> +
>> + if (set == DRM_PANTHOR_PERF_SET_PRIMARY)
>> + return 0;
>> +
>> + if (set > DRM_PANTHOR_PERF_SET_PRIMARY)
>> + return capable(CAP_PERFMON) ? 0 : -EACCES;
>> +
>> + return -EINVAL;
>> +}
>> +
>> +/**
>> + * panthor_perf_session_setup - Create a user-visible session.
>> + *
>> + * @ptdev: Handle to the panthor device.
>> + * @perf: Handle to the perf control structure.
>> + * @setup_args: Setup arguments passed in via ioctl.
>> + * @pfile: Panthor file associated with the request.
>> + *
>> + * Creates a new session associated with the session ID returned. When initialized, the
>> + * session must explicitly request sampling to start with a successive call to PERF_CONTROL.START.
>> + *
>> + * Return: non-negative session identifier on success or negative error code on failure.
>> + */
>> +int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
>> + struct drm_panthor_perf_cmd_setup *setup_args,
>> + struct panthor_file *pfile)
>> +{
>> + struct panthor_perf_session *session;
>> + struct drm_gem_object *ringbuffer;
>> + struct drm_gem_object *control;
>> + const size_t slots = setup_args->sample_slots;
>> + struct panthor_perf_enable_masks *em;
>> + struct iosys_map rb_map, ctrl_map;
>> + size_t user_sample_size;
>> + int session_id;
>> + int ret;
>> +
>> + ret = session_validate_set(setup_args->block_set);
>> + if (ret) {
>> + drm_err(&ptdev->base, "Did not meet requirements for set %d\n",
>> + setup_args->block_set);
>> + return ret;
>> + }
>> +
>> + session = kzalloc(sizeof(*session), GFP_KERNEL);
>> + if (ZERO_OR_NULL_PTR(session))
>> + return -ENOMEM;
>> +
>> + ringbuffer = drm_gem_object_lookup(pfile->drm_file, setup_args->ringbuf_handle);
>> + if (!ringbuffer) {
>> + drm_err(&ptdev->base, "Could not find handle %d!\n", setup_args->ringbuf_handle);
>> + ret = -EINVAL;
>> + goto cleanup_session;
>> + }
>> +
>> + control = drm_gem_object_lookup(pfile->drm_file, setup_args->control_handle);
>> + if (!control) {
>> + drm_err(&ptdev->base, "Could not find handle %d!\n", setup_args->control_handle);
>> + ret = -EINVAL;
>> + goto cleanup_ringbuf;
>> + }
>> +
>> + user_sample_size = session_get_user_sample_size(&ptdev->perf_info) * slots;
>> +
>> + if (ringbuffer->size != PFN_ALIGN(user_sample_size)) {
>> + drm_err(&ptdev->base, "Incorrect ringbuffer size from userspace: user %zu vs kernel %lu\n",
>> + ringbuffer->size, PFN_ALIGN(user_sample_size));
>> + ret = -ENOMEM;
>> + goto cleanup_control;
>> + }
>> +
>> + ret = drm_gem_vmap_unlocked(ringbuffer, &rb_map);
>
> Same here, drm_gem_vmap_unlocked() isn't declared in any header files.
>
Will get this one too, thanks.
>> + if (ret)
>> + goto cleanup_control;
>> +
>> + ret = drm_gem_vmap_unlocked(control, &ctrl_map);
>> + if (ret)
>> + goto cleanup_ring_map;
>> +
>> + session->eventfd = eventfd_ctx_fdget(setup_args->fd);
>> + if (IS_ERR(session->eventfd)) {
>> + drm_err(&ptdev->base, "Invalid eventfd %d!\n", setup_args->fd);
>> + ret = PTR_ERR_OR_ZERO(session->eventfd) ?: -EINVAL;
>> + goto cleanup_control_map;
>> + }
>> +
>> + em = panthor_perf_create_em(setup_args);
>> + if (IS_ERR_OR_NULL(em)) {
>> + ret = -ENOMEM;
>> + goto cleanup_eventfd;
>> + }
>> +
>> + INIT_LIST_HEAD(&session->sessions);
>> + INIT_LIST_HEAD(&session->pending);
>> +
>> + session->control = ctrl_map.vaddr;
>> + *session->control = (struct drm_panthor_perf_ringbuf_control) { 0 };
>> +
>> + session->samples = rb_map.vaddr;
>> +
>> + /* TODO This will need validation when we support periodic sampling sessions */
>> + if (setup_args->sample_freq_ns) {
>> + ret = -EOPNOTSUPP;
>> + goto cleanup_em;
>> + }
>> +
>> + ret = xa_alloc_cyclic(&perf->sessions, &session_id, session, perf->session_range,
>> + &perf->next_session, GFP_KERNEL);
>> + if (ret < 0) {
>> + drm_err(&ptdev->base, "System session limit exceeded.\n");
>> + ret = -EBUSY;
>> + goto cleanup_em;
>> + }
>> +
>> + kref_init(&session->ref);
>> + session->enabled_counters = em;
>> +
>> + session->sample_freq_ns = setup_args->sample_freq_ns;
>> + session->user_sample_size = user_sample_size;
>> + session->ring_buf = ringbuffer;
>> + session->ringbuf_slots = slots;
>> + session->control_buf = control;
>> + session->pfile = pfile;
>> + session->accum_idx = U32_MAX;
>> +
>> + return session_id;
>> +
>> +cleanup_em:
>> + kfree(em);
>> +
>> +cleanup_eventfd:
>> + eventfd_ctx_put(session->eventfd);
>> +
>> +cleanup_control_map:
>> + drm_gem_vunmap_unlocked(control, &ctrl_map);
>> +
>> +cleanup_ring_map:
>> + drm_gem_vunmap_unlocked(ringbuffer, &rb_map);
>> +
>> +cleanup_control:
>> + drm_gem_object_put(control);
>> +
>> +cleanup_ringbuf:
>> + drm_gem_object_put(ringbuffer);
>> +
>> +cleanup_session:
>> + kfree(session);
>> +
>> + return ret;
>> +}
>> +
>> +static int session_stop(struct panthor_perf *perf, struct panthor_perf_session *session,
>> + u64 user_data)
>> +{
>> + if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
>> + return 0;
>> +
>> + const u64 extract_idx = session_read_extract_idx(session);
>> + const u64 insert_idx = session_read_insert_idx(session);
>> +
>> + /* Must have at least one slot remaining in the ringbuffer to sample. */
>> + if (WARN_ON_ONCE(!CIRC_SPACE_TO_END(insert_idx, extract_idx, session->ringbuf_slots)))
>> + return -EBUSY;
>> +
>> + session->user_data = user_data;
>> +
>> + clear_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state);
>> +
>> + /* TODO Calls to the FW interface will go here in later patches. */
>> + return 0;
>> +}
>> +
>> +static int session_start(struct panthor_perf *perf, struct panthor_perf_session *session,
>> + u64 user_data)
>> +{
>> + if (test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
>> + return 0;
>> +
>> + set_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state);
>> +
>> + /*
>> + * For manual sampling sessions, a start command does not correspond to a sample,
>> + * and so the user data gets discarded.
>> + */
>> + if (session->sample_freq_ns)
>> + session->user_data = user_data;
>> +
>> + /* TODO Calls to the FW interface will go here in later patches. */
>> + return 0;
>> +}
>> +
>> +static int session_sample(struct panthor_perf *perf, struct panthor_perf_session *session,
>> + u64 user_data)
>> +{
>> + if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
>> + return 0;
>> +
>> + const u64 extract_idx = session_read_extract_idx(session);
>> + const u64 insert_idx = session_read_insert_idx(session);
>> +
>> + /* Manual sampling for periodic sessions is forbidden. */
>> + if (session->sample_freq_ns)
>> + return -EINVAL;
>> +
>> + /*
>> + * Must have at least two slots remaining in the ringbuffer to sample: one for
>> + * the current sample, and one for a stop sample, since a stop command should
>> + * always be acknowledged by taking a final sample and stopping the session.
>> + */
>> + if (CIRC_SPACE_TO_END(insert_idx, extract_idx, session->ringbuf_slots) < 2)
>> + return -EBUSY;
>> +
>> + session->sample_start_ns = ktime_get_raw_ns();
>> + session->user_data = user_data;
>> +
>> + return 0;
>> +}
>> +
>> +static int session_destroy(struct panthor_perf *perf, struct panthor_perf_session *session)
>> +{
>> + session_put(session);
>> +
>> + return 0;
>> +}
>> +
>> +static int session_teardown(struct panthor_perf *perf, struct panthor_perf_session *session)
>> +{
>> + if (test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
>> + return -EINVAL;
>> +
>> + if (READ_ONCE(session->pending_sample_request) == SAMPLE_TYPE_NONE)
>> + return -EBUSY;
>> +
>> + return session_destroy(perf, session);
>> +}
>> +
>> +/**
>> + * panthor_perf_session_teardown - Teardown the session associated with the @sid.
>> + * @pfile: Open panthor file.
>> + * @perf: Handle to the perf control structure.
>> + * @sid: Session identifier.
>> + *
>> + * Destroys a stopped session where the last sample has been explicitly consumed
>> + * or discarded. Active sessions will be ignored.
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int panthor_perf_session_teardown(struct panthor_file *pfile, struct panthor_perf *perf, u32 sid)
>> +{
>> + int err;
>> + struct panthor_perf_session *session;
>> +
>> + xa_lock(&perf->sessions);
>> + session = __xa_store(&perf->sessions, sid, NULL, GFP_KERNEL);
>> +
>> + if (xa_is_err(session)) {
>> + err = xa_err(session);
>> + goto restore;
>> + }
>> +
>> + if (session->pfile != pfile) {
>> + err = -EINVAL;
>> + goto restore;
>> + }
>> +
>> + session_get(session);
>> + xa_unlock(&perf->sessions);
>> +
>> + err = session_teardown(perf, session);
>> +
>> + session_put(session);
>> +
>> + return err;
>> +
>> +restore:
>> + __xa_store(&perf->sessions, sid, session, GFP_KERNEL);
>> + xa_unlock(&perf->sessions);
>> +
>> + return err;
>> +}
>> +
>> +/**
>> + * panthor_perf_session_start - Start sampling on a stopped session.
>> + * @pfile: Open panthor file.
>> + * @perf: Handle to the panthor perf control structure.
>> + * @sid: Session identifier for the desired session.
>> + * @user_data: An opaque value passed in from userspace.
>> + *
>> + * A session counts as stopped when it is created or when it is explicitly stopped after being
>> + * started. Starting an active session is treated as a no-op.
>> + *
>> + * The @user_data parameter will be associated with all subsequent samples for a periodic
>> + * sampling session and will be ignored for manual sampling ones in favor of the user data
>> + * passed in the PERF_CONTROL.SAMPLE ioctl call.
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *perf,
>> + u32 sid, u64 user_data)
>> +{
>> + struct panthor_perf_session *session = session_find(pfile, perf, sid);
>> + int err;
>> +
>> + if (IS_ERR_OR_NULL(session))
>> + return IS_ERR(session) ? PTR_ERR(session) : -EINVAL;
>> +
>> + err = session_start(perf, session, user_data);
>> +
>> + session_put(session);
>> +
>> + return err;
>> +}
>> +
>> +/**
>> + * panthor_perf_session_stop - Stop sampling on an active session.
>> + * @pfile: Open panthor file.
>> + * @perf: Handle to the panthor perf control structure.
>> + * @sid: Session identifier for the desired session.
>> + * @user_data: An opaque value passed in from userspace.
>> + *
>> + * A session counts as active when it has been explicitly started via the PERF_CONTROL.START
>> + * ioctl. Stopping a stopped session is treated as a no-op.
>> + *
>> + * To ensure data is not lost when sampling is stopping, there must always be at least one slot
>> + * available for the final automatic sample, and the stop command will be rejected if there is not.
>> + *
>> + * The @user_data will always be associated with the final sample.
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *perf,
>> + u32 sid, u64 user_data)
>> +{
>> + struct panthor_perf_session *session = session_find(pfile, perf, sid);
>> + int err;
>> +
>> + if (IS_ERR_OR_NULL(session))
>> + return IS_ERR(session) ? PTR_ERR(session) : -EINVAL;
>> +
>> + err = session_stop(perf, session, user_data);
>> +
>> + session_put(session);
>> +
>> + return err;
>> +}
>> +
>> +/**
>> + * panthor_perf_session_sample - Request a sample on a manual sampling session.
>> + * @pfile: Open panthor file.
>> + * @perf: Handle to the panthor perf control structure.
>> + * @sid: Session identifier for the desired session.
>> + * @user_data: An opaque value passed in from userspace.
>> + *
>> + * Only an active manual sampler is permitted to request samples directly. Failing to meet either
>> + * of these conditions will cause the sampling request to be rejected. Requesting a manual sample
>> + * with a full ringbuffer will see the request being rejected.
>> + *
>> + * The @user_data will always be unambiguously associated one-to-one with the resultant sample.
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf *perf,
>> + u32 sid, u64 user_data)
>> +{
>> + struct panthor_perf_session *session = session_find(pfile, perf, sid);
>> + int err;
>> +
>> + if (IS_ERR_OR_NULL(session))
>> + return IS_ERR(session) ? PTR_ERR(session) : -EINVAL;
>> +
>> + err = session_sample(perf, session, user_data);
>> +
>> + session_put(session);
>> +
>> + return err;
>> +}
>> +
>> +/**
>> + * panthor_perf_session_destroy - Destroy a sampling session associated with the @pfile.
>> + * @perf: Handle to the panthor perf control structure.
>> + * @pfile: The file being closed.
>> + *
>> + * Must be called when the corresponding userspace process is destroyed and cannot close its
>> + * own sessions. As such, we offer no guarantees about data delivery.
>> + */
>> +void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_perf *perf)
>> +{
>> + unsigned long sid;
>> + struct panthor_perf_session *session;
>> +
>> + if (!pfile || !perf)
>> + return;
>> +
>> + xa_for_each(&perf->sessions, sid, session)
>> + {
>> + if (session->pfile == pfile) {
>> + session_destroy(perf, session);
>> + xa_erase(&perf->sessions, sid);
>> + }
>> + }
>> +}
>> +
>> /**
>> * panthor_perf_unplug - Terminate the performance counter subsystem.
>> * @ptdev: Panthor device.
>> @@ -124,8 +810,14 @@ void panthor_perf_unplug(struct panthor_device *ptdev)
>> return;
>>
>> if (!xa_empty(&perf->sessions)) {
>> + unsigned long sid;
>> + struct panthor_perf_session *session;
>> +
>> drm_err(&ptdev->base,
>> "Performance counter sessions active when unplugging the driver!");
>> +
>> + xa_for_each(&perf->sessions, sid, session)
>> + session_destroy(perf, session);
>> }
>>
>> xa_destroy(&perf->sessions);
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
>> index e4805727b9e7..89d61cd1f017 100644
>> --- a/drivers/gpu/drm/panthor/panthor_perf.h
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
>> @@ -7,10 +7,26 @@
>>
>> #include <linux/types.h>
>>
>> +struct drm_panthor_perf_cmd_setup;
>> struct panthor_device;
>> +struct panthor_file;
>> +struct panthor_perf;
>>
>> int panthor_perf_init(struct panthor_device *ptdev);
>> void panthor_perf_unplug(struct panthor_device *ptdev);
>>
>> +int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
>> + struct drm_panthor_perf_cmd_setup *setup_args,
>> + struct panthor_file *pfile);
>> +int panthor_perf_session_teardown(struct panthor_file *pfile, struct panthor_perf *perf,
>> + u32 sid);
>> +int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *perf,
>> + u32 sid, u64 user_data);
>> +int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *perf,
>> + u32 sid, u64 user_data);
>> +int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf *perf,
>> + u32 sid, u64 user_data);
>> +void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_perf *perf);
>> +
>> #endif /* __PANTHOR_PERF_H__ */
>>
>> --
>> 2.33.0.dirty
>
>
> Adrian Larumbe
[0]: https://lore.kernel.org/dri-devel/20250322212608.40511-2-dmitry.osipenko@collabora.com/
* Re: [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients
2025-06-20 15:28 ` Steven Price
@ 2025-07-21 9:58 ` Lukas Zapolskas
0 siblings, 0 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-07-21 9:58 UTC (permalink / raw)
To: Steven Price, Boris Brezillon, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
Cc: Adrián Larumbe
Hello Steve,
Thanks for taking a look. Looks like my rebase wasn't great, and I missed a patch
removing some of the GEM functions. Will get that fixed.
Kind regards,
Lukas
On 20/06/2025 16:28, Steven Price wrote:
> Hi Lukas,
>
> I was going to try testing this out, but it doesn't look functional. See
> below.
>
> On 16/05/2025 16:49, Lukas Zapolskas wrote:
> [...]
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
>> index 9365ce9fed04..15fa533731f3 100644
>> --- a/drivers/gpu/drm/panthor/panthor_perf.c
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
>> @@ -2,13 +2,177 @@
>> /* Copyright 2023 Collabora Ltd */
>> /* Copyright 2025 Arm ltd. */
>>
>> -#include <linux/bitops.h>
>> +#include <drm/drm_gem.h>
>> #include <drm/panthor_drm.h>
>> +#include <linux/bitops.h>
>> +#include <linux/circ_buf.h>
>>
>> #include "panthor_device.h"
>> #include "panthor_fw.h"
>> #include "panthor_perf.h"
>>
>> +/**
>> + * PANTHOR_PERF_EM_BITS - Number of bits in a user-facing enable mask. This must correspond
>> + * to the maximum number of counters available for selection on the newest
>> + * Mali GPUs (128 as of the Mali-Gx15).
>> + */
>> +#define PANTHOR_PERF_EM_BITS (BITS_PER_TYPE(u64) * 2)
>> +
>> +enum panthor_perf_session_state {
>> + /** @PANTHOR_PERF_SESSION_ACTIVE: The session is active and can be used for sampling. */
>> + PANTHOR_PERF_SESSION_ACTIVE = 0,
>> +
>> + /**
>> + * @PANTHOR_PERF_SESSION_OVERFLOW: The session encountered an overflow in one of the
>> + * counters during the last sampling period. This flag
>> + * gets propagated as part of samples emitted for this
>> + * session, to ensure the userspace client can gracefully
>> + * handle this data corruption.
>> + */
>> + PANTHOR_PERF_SESSION_OVERFLOW,
>> +
>> + /* Must be last */
>> + PANTHOR_PERF_SESSION_MAX,
>> +};
>> +
>> +struct panthor_perf_enable_masks {
>> + /**
>> + * @mask: Array of bitmasks indicating the counters userspace requested, where
>> + * one bit represents a single counter. Used to build the firmware configuration
>> + * and ensure that userspace clients obtain only the counters they requested.
>> + */
>> + unsigned long mask[DRM_PANTHOR_PERF_BLOCK_MAX][BITS_TO_LONGS(PANTHOR_PERF_EM_BITS)];
>> +};
>> +
>> +struct panthor_perf_counter_block {
>> + struct drm_panthor_perf_block_header header;
>> + u64 counters[];
>> +};
>
> I think something has gone rather wrong in a rebasing. This struct was
> already added in patch 2. So this causes a build error (that the kernel
> test robot caught too).
>
> [...]
>> @@ -72,6 +236,122 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
>> perf_info->sample_size = session_get_user_sample_size(perf_info);
>> }
>>
>> +static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panthor_perf_cmd_setup
>> + *setup_args)
>
> There's some code style mis-formatting like this - which is then fixed
> up in patch 5. So it looks like you've applied fixups to the wrong commit.
>
> Also this series will need rebasing because there's some upstream
> changes that it's now conflicting with. The base commit looks pretty
> ancient now.
>
> Thanks,
> Steve
>
* Re: [PATCH v4 7/7] drm/panthor: Expose the panthor perf ioctls
2025-07-18 15:19 ` Adrián Larumbe
@ 2025-07-25 9:09 ` Lukas Zapolskas
0 siblings, 0 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-07-25 9:09 UTC (permalink / raw)
To: Adrián Larumbe
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 18/07/2025 16:19, Adrián Larumbe wrote:
> Hi Lukas, another missing remark from the original review,
>
> On 16.05.2025 16:49, Lukas Zapolskas wrote:
>> This patch implements the PANTHOR_PERF_CONTROL ioctl series, and
>> a PANTHOR_GET_UOBJ wrapper to deal with the backwards and forwards
>> compatibility of the uAPI.
>>
>> The minor version is bumped to indicate that the feature is now
>> supported.
>>
>> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
>> Reviewed-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> ---
>> drivers/gpu/drm/panthor/panthor_drv.c | 141 +++++++++++++++++++++++++-
>> 1 file changed, 139 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
>> index 4c1381320859..850a894fe91b 100644
>> --- a/drivers/gpu/drm/panthor/panthor_drv.c
>> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
>> @@ -31,6 +31,7 @@
>> #include "panthor_gpu.h"
>> #include "panthor_heap.h"
>> #include "panthor_mmu.h"
>> +#include "panthor_perf.h"
>> #include "panthor_regs.h"
>> #include "panthor_sched.h"
>>
>> @@ -73,6 +74,39 @@ panthor_set_uobj(u64 usr_ptr, u32 usr_size, u32 min_size, u32 kern_size, const v
>> return 0;
>> }
>>
>> +/**
>> + * panthor_get_uobj() - Copy user object to kernel object.
>> + * @usr_ptr: User pointer.
>> + * @usr_size: Size of the user object.
>> + * @min_size: Minimum size for this object.
>> + *
>> + * Helper automating user -> kernel object copies.
>> + *
>> + * Don't use this function directly, use PANTHOR_UOBJ_GET() instead.
>> + *
>> + * Return: valid pointer on success, an encoded error code otherwise.
>> + */
>> +static void*
>> +panthor_get_uobj(u64 usr_ptr, u32 usr_size, u32 min_size)
>> +{
>> + int ret;
>> + void *out_alloc __free(kvfree) = NULL;
>> +
>> + /* User size shouldn't be smaller than the minimal object size. */
>> + if (usr_size < min_size)
>> + return ERR_PTR(-EINVAL);
>> +
>> + out_alloc = kvmalloc(min_size, GFP_KERNEL);
>> + if (!out_alloc)
>> + return ERR_PTR(-ENOMEM);
>> +
>> + ret = copy_struct_from_user(out_alloc, min_size, u64_to_user_ptr(usr_ptr), usr_size);
>> + if (ret)
>> + return ERR_PTR(ret);
>> +
>> + return_ptr(out_alloc);
>> +}
>> +
>> /**
>> * panthor_get_uobj_array() - Copy a user object array into a kernel accessible object array.
>> * @in: The object array to copy.
>> @@ -176,7 +210,12 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
>> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_submit, syncs), \
>> PANTHOR_UOBJ_DECL(struct drm_panthor_queue_create, ringbuf_size), \
>> PANTHOR_UOBJ_DECL(struct drm_panthor_vm_bind_op, syncs), \
>> - PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks))
>> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_info, shader_blocks), \
>> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_setup, shader_enable_mask), \
>> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_start, user_data), \
>> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_stop, user_data), \
>> + PANTHOR_UOBJ_DECL(struct drm_panthor_perf_cmd_sample, user_data))
>> +
>>
>> /**
>> * PANTHOR_UOBJ_SET() - Copy a kernel object to a user object.
>> @@ -191,6 +230,24 @@ panthor_get_uobj_array(const struct drm_panthor_obj_array *in, u32 min_stride,
>> PANTHOR_UOBJ_MIN_SIZE(_src_obj), \
>> sizeof(_src_obj), &(_src_obj))
>>
>> +/**
>> + * PANTHOR_UOBJ_GET() - Copies a user object from _usr_ptr to a kernel accessible _dest_ptr.
>> + * @_dest_ptr: Local variable
>> + * @_usr_size: Size of the user object.
>> + * @_usr_ptr: The pointer of the object in userspace.
>> + *
>> + * Return: Error code. See panthor_get_uobj().
>> + */
>> +#define PANTHOR_UOBJ_GET(_dest_ptr, _usr_size, _usr_ptr) \
>> + ({ \
>> + typeof(_dest_ptr) _tmp; \
>> + _tmp = panthor_get_uobj(_usr_ptr, _usr_size, \
>> + PANTHOR_UOBJ_MIN_SIZE(_tmp[0])); \
>> + if (!IS_ERR(_tmp)) \
>> + _dest_ptr = _tmp; \
>> + PTR_ERR_OR_ZERO(_tmp); \
>> + })
>> +
>> /**
>> * PANTHOR_UOBJ_GET_ARRAY() - Copy a user object array to a kernel accessible
>> * object array.
>> @@ -1339,6 +1396,83 @@ static int panthor_ioctl_vm_get_state(struct drm_device *ddev, void *data,
>> return 0;
>> }
>>
>> +#define perf_cmd(command) \
>> + ({ \
>> + struct drm_panthor_perf_cmd_##command *command##_args __free(kvfree) = NULL; \
>> + int _ret = PANTHOR_UOBJ_GET(command##_args, args->size, args->pointer); \
>> + if (_ret) \
>> + return _ret; \
>> + return panthor_perf_session_##command(pfile, ptdev->perf, args->handle, \
>> + command##_args->user_data); \
>> + })
>> +
>> +static int panthor_ioctl_perf_control(struct drm_device *ddev, void *data,
>> + struct drm_file *file)
>> +{
>> + struct panthor_device *ptdev = container_of(ddev, struct panthor_device, base);
>> + struct panthor_file *pfile = file->driver_priv;
>> + struct drm_panthor_perf_control *args = data;
>> + int ret;
>> +
>> + if (!args->pointer) {
>> + switch (args->cmd) {
>> + case DRM_PANTHOR_PERF_COMMAND_SETUP:
>> + args->size = sizeof(struct drm_panthor_perf_cmd_setup);
>> + return 0;
>> +
>> + case DRM_PANTHOR_PERF_COMMAND_TEARDOWN:
>> + args->size = 0;
>> + return 0;
>> +
>> + case DRM_PANTHOR_PERF_COMMAND_START:
>> + args->size = sizeof(struct drm_panthor_perf_cmd_start);
>> + return 0;
>> +
>> + case DRM_PANTHOR_PERF_COMMAND_STOP:
>> + args->size = sizeof(struct drm_panthor_perf_cmd_stop);
>> + return 0;
>> +
>> + case DRM_PANTHOR_PERF_COMMAND_SAMPLE:
>> + args->size = sizeof(struct drm_panthor_perf_cmd_sample);
>> + return 0;
>> +
>> + default:
>> + return -EINVAL;
>> + }
>> + }
>> +
>> + switch (args->cmd) {
>> + case DRM_PANTHOR_PERF_COMMAND_SETUP:
>> + {
>> + struct drm_panthor_perf_cmd_setup *setup_args __free(kvfree) = NULL;
>> +
>> + ret = PANTHOR_UOBJ_GET(setup_args, args->size, args->pointer);
>> + if (ret)
>> + return -EINVAL;
>> +
>> + return panthor_perf_session_setup(ptdev, ptdev->perf, setup_args, pfile);
>
> I think this is something I had already brought up in the revision for v2 of the patch series,
> but I think I would pass the drm_file here straight away rather than the panthor file,
> then retrieve the panthor_file pointer from the file's driver_priv field inside
> panthor_perf_session_setup, and that way you can get rid of struct panthor_file::drm_file.
>
> I think this should be alright, because the only place where it'd be essential to keep
> a copy of the drm_file is in the session struct, to make sure sessions match their DRM device fd's.
>
Thank you for pointing this out, I hadn't quite understood the suggestion for v2. Will update
the patches accordingly.
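As a rough illustration of the suggestion above, here is a minimal userspace sketch. It assumes simplified stand-in types (every `mock_` name is hypothetical and not part of the driver): the setup helper takes the DRM file directly, derives the panthor file from `driver_priv`, and records the DRM file in the session so that later commands can be matched against the fd that created it.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct panthor_file and struct drm_file. */
struct mock_panthor_file {
	int id;
};

struct mock_drm_file {
	void *driver_priv; /* points at the mock_panthor_file */
};

/* Hypothetical session: remembers which DRM file created it. */
struct mock_session {
	struct mock_drm_file *file;
	struct mock_panthor_file *pfile;
};

/* Takes the DRM file; the panthor file is looked up via driver_priv. */
static int mock_session_setup(struct mock_session *s, struct mock_drm_file *file)
{
	if (!s || !file || !file->driver_priv)
		return -EINVAL;

	s->file = file;
	s->pfile = file->driver_priv;
	return 0;
}

/* A session may only be driven through the fd that created it. */
static int mock_session_owned_by(const struct mock_session *s,
				 const struct mock_drm_file *file)
{
	return s->file == file;
}
```

With this shape the panthor file no longer needs a back-pointer to its DRM file; only the session keeps one.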
>> + }
>> + case DRM_PANTHOR_PERF_COMMAND_TEARDOWN:
>> + {
>> + return panthor_perf_session_teardown(pfile, ptdev->perf, args->handle);
>> + }
>> + case DRM_PANTHOR_PERF_COMMAND_START:
>> + {
>> + perf_cmd(start);
>> + }
>> + case DRM_PANTHOR_PERF_COMMAND_STOP:
>> + {
>> + perf_cmd(stop);
>> + }
>> + case DRM_PANTHOR_PERF_COMMAND_SAMPLE:
>> + {
>> + perf_cmd(sample);
>> + }
>> + default:
>> + return -EINVAL;
>> + }
>> +}
>> +
>> static int
>> panthor_open(struct drm_device *ddev, struct drm_file *file)
>> {
>> @@ -1409,6 +1543,7 @@ static const struct drm_ioctl_desc panthor_drm_driver_ioctls[] = {
>> PANTHOR_IOCTL(TILER_HEAP_CREATE, tiler_heap_create, DRM_RENDER_ALLOW),
>> PANTHOR_IOCTL(TILER_HEAP_DESTROY, tiler_heap_destroy, DRM_RENDER_ALLOW),
>> PANTHOR_IOCTL(GROUP_SUBMIT, group_submit, DRM_RENDER_ALLOW),
>> + PANTHOR_IOCTL(PERF_CONTROL, perf_control, DRM_RENDER_ALLOW),
>> };
>>
>> static int panthor_mmap(struct file *filp, struct vm_area_struct *vma)
>> @@ -1518,6 +1653,8 @@ static void panthor_debugfs_init(struct drm_minor *minor)
>> * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query
>> * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority
>> * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
>> + * - 1.4 - adds DEV_QUERY_PERF_INFO query
>> + * - adds PERF_CONTROL ioctl
>> */
>> static const struct drm_driver panthor_drm_driver = {
>> .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
>> @@ -1531,7 +1668,7 @@ static const struct drm_driver panthor_drm_driver = {
>> .name = "panthor",
>> .desc = "Panthor DRM driver",
>> .major = 1,
>> - .minor = 3,
>> + .minor = 4,
>>
>> .gem_create_object = panthor_gem_create_object,
>> .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
>> --
>> 2.33.0.dirty
>
>
> Adrian Larumbe
* Re: [PATCH v4 6/7] drm/panthor: Add suspend, resume and reset handling
2025-07-18 15:01 ` Adrián Larumbe
@ 2025-07-25 9:26 ` Lukas Zapolskas
0 siblings, 0 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-07-25 9:26 UTC (permalink / raw)
To: Adrián Larumbe
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 18/07/2025 16:01, Adrián Larumbe wrote:
> On 16.05.2025 16:49, Lukas Zapolskas wrote:
>> The sampler must disable and re-enable counter sampling around suspends,
>> and must re-program the FW interface after a reset to avoid losing
>> data.
>>
>> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
>> ---
>> drivers/gpu/drm/panthor/panthor_device.c | 7 +-
>> drivers/gpu/drm/panthor/panthor_perf.c | 102 +++++++++++++++++++++++
>> drivers/gpu/drm/panthor/panthor_perf.h | 6 ++
>> 3 files changed, 114 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_device.c b/drivers/gpu/drm/panthor/panthor_device.c
>> index 7ac985d44655..92624a8717c5 100644
>> --- a/drivers/gpu/drm/panthor/panthor_device.c
>> +++ b/drivers/gpu/drm/panthor/panthor_device.c
>> @@ -139,6 +139,7 @@ static void panthor_device_reset_work(struct work_struct *work)
>> if (!drm_dev_enter(&ptdev->base, &cookie))
>> return;
>>
>> + panthor_perf_pre_reset(ptdev);
>> panthor_sched_pre_reset(ptdev);
>> panthor_fw_pre_reset(ptdev, true);
>> panthor_mmu_pre_reset(ptdev);
>> @@ -148,6 +149,7 @@ static void panthor_device_reset_work(struct work_struct *work)
>> ret = panthor_fw_post_reset(ptdev);
>> atomic_set(&ptdev->reset.pending, 0);
>> panthor_sched_post_reset(ptdev, ret != 0);
>> + panthor_perf_post_reset(ptdev);
>> drm_dev_exit(cookie);
>>
>> if (ret) {
>> @@ -496,8 +498,10 @@ int panthor_device_resume(struct device *dev)
>> ret = panthor_device_resume_hw_components(ptdev);
>> }
>>
>> - if (!ret)
>> + if (!ret) {
>> panthor_sched_resume(ptdev);
>> + panthor_perf_resume(ptdev);
>> + }
>>
>> drm_dev_exit(cookie);
>>
>> @@ -561,6 +565,7 @@ int panthor_device_suspend(struct device *dev)
>> /* We prepare everything as if we were resetting the GPU.
>> * The end of the reset will happen in the resume path though.
>> */
>> + panthor_perf_suspend(ptdev);
>> panthor_sched_suspend(ptdev);
>> panthor_fw_suspend(ptdev);
>> panthor_mmu_suspend(ptdev);
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
>> index 97603b168d2d..438319cf71ab 100644
>> --- a/drivers/gpu/drm/panthor/panthor_perf.c
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
>> @@ -1845,6 +1845,76 @@ void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_per
>> }
>> }
>>
>> +static int panthor_perf_sampler_resume(struct panthor_perf_sampler *sampler)
>> +{
>> + int ret;
>> +
>> + if (!atomic_read(&sampler->enabled_clients))
>> + return 0;
>> +
>> + ret = panthor_perf_fw_start_sampling(sampler->ptdev);
>> + if (ret)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +static int panthor_perf_sampler_suspend(struct panthor_perf_sampler *sampler)
>> +{
>> + int ret;
>> +
>> + if (!atomic_read(&sampler->enabled_clients))
>> + return 0;
>> +
>> + ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
>> + if (ret)
>> + return ret;
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * panthor_perf_suspend - Prepare the performance counter subsystem for system suspend.
>> + * @ptdev: Panthor device.
>> + *
>> + * Indicate to the performance counters that the system is suspending.
>> + *
>> + * This function must not be used to handle MCU power state transitions: just before MCU goes
>> + * from on to any inactive state, an automatic sample will be performed by the firmware, and
>> + * the performance counter firmware state will be restored on warm boot.
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int panthor_perf_suspend(struct panthor_device *ptdev)
>> +{
>> + struct panthor_perf *perf = ptdev->perf;
>> +
>> + if (!perf)
>> + return 0;
>> +
>> + return panthor_perf_sampler_suspend(&perf->sampler);
>> +}
>> +
>> +/**
>> + * panthor_perf_resume - Resume the performance counter subsystem after system resumption.
>> + * @ptdev: Panthor device.
>> + *
>> + * Indicate to the performance counters that the system has resumed. This must not be used
>> + * to handle MCU state transitions, for the same reasons as detailed in the kerneldoc for
>> + * @panthor_perf_suspend.
>> + *
>> + * Return: 0 on success, negative error code on failure.
>> + */
>> +int panthor_perf_resume(struct panthor_device *ptdev)
>> +{
>> + struct panthor_perf *perf = ptdev->perf;
>> +
>> + if (!perf)
>> + return 0;
>> +
>> + return panthor_perf_sampler_resume(&perf->sampler);
>> +}
>
> In the two previous functions, you return an int, but you never used it
> from where they're called.
Thanks, will drop the return values from the perf_{suspend,resume} functions.
> Also, in both of them, for the sake of
> coherence, I'd get rid of the *sampler* subcalls because later in
> 'panthor_perf_pre_reset' and 'panthor_perf_post_reset' you manipulate the
> sampler directly without referring it to another function. The functions
> are short enough for us to be able to inline the content of
> 'panthor_perf_sampler_resume' into 'panthor_perf_resume'.
>
Will do.
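A minimal userspace sketch of what the two agreed changes could look like together, assuming simplified stand-in types (the `mock_` names and the `fw_sampling` flag are hypothetical; the flag stands in for the FW start/stop requests): the functions return void, and the sampler logic is inlined rather than delegated to separate sampler helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the sampler, perf and device structures. */
struct mock_sampler {
	atomic_int enabled_clients; /* sessions currently sampling */
	bool fw_sampling;           /* stands in for the FW start/stop request */
};

struct mock_perf {
	struct mock_sampler sampler;
};

struct mock_device {
	struct mock_perf *perf;
};

/* No return value: the suspend path never consumed one. */
static void mock_perf_suspend(struct mock_device *dev)
{
	struct mock_perf *perf = dev->perf;

	if (!perf || !atomic_load(&perf->sampler.enabled_clients))
		return;

	perf->sampler.fw_sampling = false;
}

/* Mirror image of suspend, again with the sampler logic inlined. */
static void mock_perf_resume(struct mock_device *dev)
{
	struct mock_perf *perf = dev->perf;

	if (!perf || !atomic_load(&perf->sampler.enabled_clients))
		return;

	perf->sampler.fw_sampling = true;
}
```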
>> +
>> /**
>> * panthor_perf_unplug - Terminate the performance counter subsystem.
>> * @ptdev: Panthor device.
>> @@ -1878,3 +1948,35 @@ void panthor_perf_unplug(struct panthor_device *ptdev)
>>
>> ptdev->perf = NULL;
>> }
>> +
>> +void panthor_perf_pre_reset(struct panthor_device *ptdev)
>> +{
>> + struct panthor_perf_sampler *sampler;
>> +
>> + if (!ptdev || !ptdev->perf)
>> + return;
>> +
>> + sampler = &ptdev->perf->sampler;
>> +
>> + if (!atomic_read(&sampler->enabled_clients))
>> + return;
>> +
>> + panthor_perf_fw_stop_sampling(sampler->ptdev);
>> +}
>> +
>> +void panthor_perf_post_reset(struct panthor_device *ptdev)
>> +{
>> + struct panthor_perf_sampler *sampler;
>> +
>> + if (!ptdev || !ptdev->perf)
>> + return;
>
> In both this function and the preceding one, ptdev is meant to be
> available by the time they're called, so I'd turn the check of ptdev not
> being null into a drm_WARN().
>
I'll drop the check for the ptdev entirely, since it looks like there will
be other issues before these functions are even called if it's null, and
add the drm_WARN_ON_ONCE for the perf pointer, since that should also be
initialized by this point.
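The plan above might look roughly like the following userspace sketch. It assumes stand-in types and a hypothetical `mock_warn_on_once()` helper in place of drm_WARN_ON_ONCE(); none of the `mock_` names exist in the driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static int mock_warn_count; /* number of warnings printed so far */

/*
 * Hypothetical stand-in for drm_WARN_ON_ONCE(): reports a failed
 * expectation once per flag and returns the condition.
 */
static bool mock_warn_on_once(bool cond, bool *warned, const char *what)
{
	if (cond && !*warned) {
		*warned = true;
		mock_warn_count++;
		fprintf(stderr, "WARN: %s\n", what);
	}
	return cond;
}

struct mock_perf {
	int enabled_clients;
	bool config_written; /* stands in for re-programming the FW interface */
	bool sampling;
};

struct mock_device {
	struct mock_perf *perf;
};

static void mock_perf_pre_reset(struct mock_device *dev)
{
	static bool warned;

	/* dev is not checked: a NULL device would have failed much earlier. */
	if (mock_warn_on_once(!dev->perf, &warned, "perf not initialized"))
		return;

	if (!dev->perf->enabled_clients)
		return;

	dev->perf->sampling = false;
}

static void mock_perf_post_reset(struct mock_device *dev)
{
	static bool warned;

	if (mock_warn_on_once(!dev->perf, &warned, "perf not initialized"))
		return;

	if (!dev->perf->enabled_clients)
		return;

	dev->perf->config_written = true;
	dev->perf->sampling = true;
}
```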
>> +
>> + sampler = &ptdev->perf->sampler;
>> +
>> + if (!atomic_read(&sampler->enabled_clients))
>> + return;
>> +
>> + panthor_perf_fw_write_sampler_config(sampler);
>> +
>> + panthor_perf_fw_start_sampling(sampler->ptdev);
>> +}
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
>> index c482198b6fbd..fc08a5440a35 100644
>> --- a/drivers/gpu/drm/panthor/panthor_perf.h
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
>> @@ -13,6 +13,8 @@ struct panthor_file;
>> struct panthor_perf;
>>
>> int panthor_perf_init(struct panthor_device *ptdev);
>> +int panthor_perf_suspend(struct panthor_device *ptdev);
>> +int panthor_perf_resume(struct panthor_device *ptdev);
>> void panthor_perf_unplug(struct panthor_device *ptdev);
>>
>> int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
>> @@ -30,5 +32,9 @@ void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_per
>>
>> void panthor_perf_report_irq(struct panthor_device *ptdev, u32 status);
>>
>> +void panthor_perf_pre_reset(struct panthor_device *ptdev);
>> +
>> +void panthor_perf_post_reset(struct panthor_device *ptdev);
>> +
>> #endif /* __PANTHOR_PERF_H__ */
>>
>> --
>> 2.33.0.dirty
>
>
> Adrian Larumbe
* Re: [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling
2025-07-18 14:49 ` Adrián Larumbe
@ 2025-07-25 10:29 ` Lukas Zapolskas
0 siblings, 0 replies; 29+ messages in thread
From: Lukas Zapolskas @ 2025-07-25 10:29 UTC (permalink / raw)
To: Adrián Larumbe
Cc: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
dri-devel, linux-kernel
On 18/07/2025 15:49, Adrián Larumbe wrote:
> On 16.05.2025 16:49, Lukas Zapolskas wrote:
>> From: Adrián Larumbe <adrian.larumbe@collabora.com>
>>
>> The sampler aggregates counter and set requests coming from userspace
>> and mediates interactions with the FW interface, to ensure that user
>> sessions cannot override the global configuration.
>>
>> From the top-level interface, the sampler supports two different types
>> of samples: clearing samples and regular samples. Clearing samples are
>> a special sample type that allow for the creation of a sampling
>> baseline, to ensure that a session does not obtain counter data from
>> before its creation.
>>
>> Upon receipt of a relevant interrupt, corresponding to one of the three
>> relevant bits of the GLB_ACK register, the sampler takes any samples
>> that occurred, and, based on the insert and extract indices, accumulates
>> them to an internal storage buffer after zero-extending the counters
>> from the 32-bit counters emitted by the hardware to 64-bit counters
>> for internal accumulation.
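The accumulation step described in the paragraph above can be sketched in isolation as follows. This is a minimal illustration, not the patch's code: `accumulate_block` is a hypothetical helper, and it treats every word of the block as a plain counter, which may differ from how the driver handles the header words (timestamp and enable mask).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: add one block of 32-bit counters, as read from
 * the FW ring buffer, into a 64-bit accumulation block of the same length.
 */
static void accumulate_block(uint64_t *acc, const uint32_t *hw, size_t n)
{
	for (size_t i = 0; i < n; i++)
		acc[i] += (uint64_t)hw[i]; /* zero-extend, then add */
}
```

Accumulating into 64-bit storage means that several automatic samples can be folded together without the sum wrapping at 32 bits.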
>>
>> When the performance counters are enabled, the FW ensures no counter
>> data is lost when entering and leaving non-counting regions by producing
>> automatic samples that do not correspond to a GLB_REQ.PRFCNT_SAMPLE
>> request. Such regions may be per hardware unit, such as when a shader
>> core powers down, or global. Most of these events do not directly
>> correspond to session sample requests, so any intermediary counter data
>> must be stored into a temporary accumulation buffer.
>>
>> If there are sessions waiting for a sample, this accumulated buffer will
>> be taken, and emitted for each waiting client. During this phase,
>> information like the timestamps of sample request and sample emission,
>> type of the counter block and block index annotations are added to the
>> sample header and block headers. If no sessions are waiting for
>> a sample, this accumulation buffer is kept until the next time a sample
>> is requested.
>>
>> Special handling is needed for the PRFCNT_OVERFLOW interrupt, which is
>> an indication that the internal sample handling rate was insufficient.
>>
>> The sampler also maintains a buffer descriptor indicating the structure
>> of a firmware sample, since neither the firmware nor the hardware give
>> any indication of the sample structure, only that it is composed out of
>> three parts:
>> - the metadata is an optional initial counter block on supporting
>> firmware versions that contains a single counter, indicating the
>> reason a sample was taken when entering global non-counting regions.
>> This is used to provide coarse-grained information about why a sample
>> was taken to userspace, to help userspace interpret variations in
>> counter magnitude.
>> - the firmware component of the sample is composed out of a global
>> firmware counter block on supporting firmware versions.
>> - the hardware component is the most sizeable of the three and contains
>> a block of counters for each of the underlying hardware resources. It
>> has a fixed structure that is described in the architecture
>> specification, and contains the command stream hardware block(s), the
>> tiler block(s), the MMU and L2 blocks (collectively named the memsys
>> blocks) and the shader core blocks, in that order.
>> The structure of this buffer changes based on the firmware and hardware
>> combination, but is constant on a single system.
>>
>> This buffer descriptor also handles the sparseness of the shader cores,
>> wherein the physical core mask contains holes, but the memory allocated
>> for it is done based on the position of the most significant bit. In
>> cases with highly sparse core masks, this means that a lot of shader
>> counter blocks are empty, and must be skipped.
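To make the sparse core mask handling concrete, here is a small sketch under stated assumptions: both helpers are hypothetical, the mask is a plain 64-bit value, and slot `i` of the shader region is assumed to belong to core bit `i`.

```c
#include <stddef.h>
#include <stdint.h>

/* Slots are allocated up to the most significant set bit of the mask. */
static size_t allocated_slots(uint64_t phys_mask)
{
	size_t slots = 0;

	while (phys_mask) {
		slots++;
		phys_mask >>= 1;
	}
	return slots;
}

/*
 * Collect the slot indices that hold real counter data into out
 * (capacity cap), skipping the holes. Returns the number collected.
 */
static size_t present_slots(uint64_t phys_mask, size_t *out, size_t cap)
{
	size_t n = 0;

	for (size_t slot = 0; slot < 64 && n < cap; slot++) {
		if (phys_mask & (UINT64_C(1) << slot))
			out[n++] = slot;
	}
	return n;
}
```

For a mask of 0x25 (cores 0, 2 and 5), six slots are allocated but only three carry data.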
>>
>> The number of ring buffer slots is configurable through module param to
>> allow for a lower memory footprint on memory constrained systems.
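A sketch of the constraint on the slot count, using the bounds that appear in the patch (a power of two between 16 and 256). The helper names are hypothetical, and the masking in `slot_index` assumes free-running insert/extract indices, which is one common reason to require a power of two rather than something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds taken from PANTHOR_PERF_RINGBUF_SLOTS_MIN/MAX in the patch. */
#define MOCK_SLOTS_MIN 16u
#define MOCK_SLOTS_MAX 256u

/* A slot count is accepted only if it is in range and a power of two. */
static bool slots_valid(unsigned int slots)
{
	return slots >= MOCK_SLOTS_MIN && slots <= MOCK_SLOTS_MAX &&
	       (slots & (slots - 1)) == 0;
}

/* Assumption: indices run freely and are reduced by masking. */
static unsigned int slot_index(uint32_t idx, unsigned int slots)
{
	return idx & (slots - 1);
}
```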
>>
>> Signed-off-by: Adrián Larumbe <adrian.larumbe@collabora.com>
>> Co-developed-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
>> Signed-off-by: Lukas Zapolskas <lukas.zapolskas@arm.com>
>> ---
>> drivers/gpu/drm/panthor/panthor_fw.c | 6 +
>> drivers/gpu/drm/panthor/panthor_fw.h | 6 +-
>> drivers/gpu/drm/panthor/panthor_perf.c | 1082 +++++++++++++++++++++++-
>> drivers/gpu/drm/panthor/panthor_perf.h | 2 +
>> 4 files changed, 1080 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
>> index 0f52766a3120..e3948354daa4 100644
>> --- a/drivers/gpu/drm/panthor/panthor_fw.c
>> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
>> @@ -22,6 +22,7 @@
>> #include "panthor_gem.h"
>> #include "panthor_gpu.h"
>> #include "panthor_mmu.h"
>> +#include "panthor_perf.h"
>> #include "panthor_regs.h"
>> #include "panthor_sched.h"
>>
>> @@ -987,9 +988,12 @@ static void panthor_fw_init_global_iface(struct panthor_device *ptdev)
>>
>> /* Enable interrupts we care about. */
>> glb_iface->input->ack_irq_mask = GLB_CFG_ALLOC_EN |
>> + GLB_PERFCNT_SAMPLE |
>> GLB_PING |
>> GLB_CFG_PROGRESS_TIMER |
>> GLB_CFG_POWEROFF_TIMER |
>> + GLB_PERFCNT_THRESHOLD |
>> + GLB_PERFCNT_OVERFLOW |
>> GLB_IDLE_EN |
>> GLB_IDLE;
>>
>> @@ -1018,6 +1022,8 @@ static void panthor_job_irq_handler(struct panthor_device *ptdev, u32 status)
>> return;
>>
>> panthor_sched_report_fw_events(ptdev, status);
>> +
>> + panthor_perf_report_irq(ptdev, status);
>> }
>> PANTHOR_IRQ_HANDLER(job, JOB, panthor_job_irq_handler);
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_fw.h b/drivers/gpu/drm/panthor/panthor_fw.h
>> index 8bcb933fa790..5a561e72e88b 100644
>> --- a/drivers/gpu/drm/panthor/panthor_fw.h
>> +++ b/drivers/gpu/drm/panthor/panthor_fw.h
>> @@ -198,6 +198,7 @@ struct panthor_fw_global_control_iface {
>> u32 group_num;
>> u32 group_stride;
>> #define GLB_PERFCNT_FW_SIZE(x) ((((x) >> 16) << 8))
>> +#define GLB_PERFCNT_HW_SIZE(x) (((x) & GENMASK(15, 0)) << 8)
>> u32 perfcnt_size;
>> u32 instr_features;
>> #define PERFCNT_FEATURES_MD_SIZE(x) (((x) & GENMASK(3, 0)) << 8)
>> @@ -210,7 +211,7 @@ struct panthor_fw_global_input_iface {
>> #define GLB_CFG_ALLOC_EN BIT(2)
>> #define GLB_CFG_POWEROFF_TIMER BIT(3)
>> #define GLB_PROTM_ENTER BIT(4)
>> -#define GLB_PERFCNT_EN BIT(5)
>> +#define GLB_PERFCNT_ENABLE BIT(5)
>> #define GLB_PERFCNT_SAMPLE BIT(6)
>> #define GLB_COUNTER_EN BIT(7)
>> #define GLB_PING BIT(8)
>> @@ -243,6 +244,9 @@ struct panthor_fw_global_input_iface {
>> u64 perfcnt_base;
>> u32 perfcnt_extract;
>> u32 reserved3[3];
>> +#define GLB_PERFCNT_CONFIG_SIZE(x) ((x) & GENMASK(7, 0))
>> +#define GLB_PERFCNT_CONFIG_SET(x) (((x) & GENMASK(1, 0)) << 8)
>> +#define GLB_PERFCNT_METADATA_ENABLE BIT(10)
>> u32 perfcnt_config;
>> u32 perfcnt_csg_select;
>> u32 perfcnt_fw_enable;
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.c b/drivers/gpu/drm/panthor/panthor_perf.c
>> index 15fa533731f3..97603b168d2d 100644
>> --- a/drivers/gpu/drm/panthor/panthor_perf.c
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.c
>> @@ -9,7 +9,11 @@
>>
>> #include "panthor_device.h"
>> #include "panthor_fw.h"
>> +#include "panthor_gem.h"
>> +#include "panthor_gpu.h"
>> +#include "panthor_mmu.h"
>> #include "panthor_perf.h"
>> +#include "panthor_regs.h"
>>
>> /**
>> * PANTHOR_PERF_EM_BITS - Number of bits in a user-facing enable mask. This must correspond
>> @@ -18,6 +22,81 @@
>> */
>> #define PANTHOR_PERF_EM_BITS (BITS_PER_TYPE(u64) * 2)
>>
>> +/**
>> + * PANTHOR_CTR_TIMESTAMP_LO - The first architecturally mandated counter of every block type
>> + * contains the low 32-bits of the TIMESTAMP value.
>> + */
>> +#define PANTHOR_CTR_TIMESTAMP_LO (0)
>> +
>> +/**
>> + * PANTHOR_CTR_TIMESTAMP_HI - The register offset containing the high 32-bits of the TIMESTAMP
>> + * value.
>> + */
>> +#define PANTHOR_CTR_TIMESTAMP_HI (1)
>> +
>> +/**
>> + * PANTHOR_CTR_PRFCNT_EN - The register offset containing the enable mask for the enabled counters
>> + * that were written to memory.
>> + */
>> +#define PANTHOR_CTR_PRFCNT_EN (2)
>> +
>> +/**
>> + * PANTHOR_HEADER_COUNTERS - The first four counters of every block type are architecturally
>> + * defined to be equivalent. The fourth counter is always reserved,
>> + * and should be zero and as such, does not have a separate define.
>> + *
>> + * These are the only four counters that are the same between different
>> + * blocks and are consistent between different architectures.
>> + */
>> +#define PANTHOR_HEADER_COUNTERS (4)
>> +
>> +/**
>> + * PANTHOR_CTR_SAMPLE_REASON - The metadata block has a single value in position three which
>> + * indicates the reason a sample was taken.
>> + */
>> +#define PANTHOR_CTR_SAMPLE_REASON (3)
>> +
>> +/**
>> + * PANTHOR_HW_COUNTER_SIZE - The size of a hardware counter in the FW ring buffer.
>> + */
>> +#define PANTHOR_HW_COUNTER_SIZE (sizeof(u32))
>> +
>> +/**
>> + * PANTHOR_PERF_RINGBUF_SLOTS_MIN - The minimum permitted number of slots in the Panthor perf
>> + * ring buffer.
>> + */
>> +#define PANTHOR_PERF_RINGBUF_SLOTS_MIN (16)
>> +
>> +/**
>> + * PANTHOR_PERF_RINGBUF_SLOTS_MAX - The maximum permitted number of slots in the Panthor perf
>> + * ring buffer.
>> + */
>> +#define PANTHOR_PERF_RINGBUF_SLOTS_MAX (256)
>> +
>> +static unsigned int perf_ringbuf_slots = 32;
>> +
>> +static int perf_ringbuf_slots_set(const char *val, const struct kernel_param *kp)
>> +{
>> + unsigned int slots;
>> + int ret = kstrtouint(val, 0, &slots);
>> +
>> + if (ret)
>> + return ret;
>> +
>> + if (!is_power_of_2(slots))
>> + return -EINVAL;
>> +
>> + return param_set_uint_minmax(val, kp, 16, 256);
>> +}
>> +
>> +static const struct kernel_param_ops perf_ringbuf_ops = {
>> + .set = perf_ringbuf_slots_set,
>> + .get = param_get_uint,
>> +};
>> +module_param_cb(perf_ringbuf_slots, &perf_ringbuf_ops, &perf_ringbuf_slots, 0400);
>> +MODULE_PARM_DESC(perf_ringbuf_slots,
>> + "Power of two slots allocated for the Panthor perf kernel-FW ringbuffer");
>> +
>> enum panthor_perf_session_state {
>> /** @PANTHOR_PERF_SESSION_ACTIVE: The session is active and can be used for sampling. */
>> PANTHOR_PERF_SESSION_ACTIVE = 0,
>> @@ -63,6 +142,116 @@ enum session_sample_type {
>> SAMPLE_TYPE_REGULAR,
>> };
>>
>> +struct panthor_perf_buffer_descriptor {
>> + /**
>> + * @block_size: The size of a single block in the FW ring buffer, equal to
>> + * sizeof(u32) * counters_per_block.
>> + */
>> + size_t block_size;
>> +
>> + /**
>> + * @buffer_size: The total size of the buffer, equal to (#hardware blocks +
>> + * #firmware blocks) * block_size.
>> + */
>> + size_t buffer_size;
>> +
>> + /**
>> + * @available_blocks: Bitmask indicating the blocks supported by the hardware and firmware
>> + * combination. Note that this can also include blocks that will not
>> + * be exposed to the user.
>> + */
>> + DECLARE_BITMAP(available_blocks, DRM_PANTHOR_PERF_BLOCK_MAX);
>> + struct {
>> + /** @offset: Starting offset of a block of type @type in the FW ringbuffer. */
>> + size_t offset;
>> +
>> + /** @block_count: Number of blocks of the given @type, starting at @offset. */
>> + size_t block_count;
>> +
>> + /** @phys_mask: Bitmask of the physically available blocks. */
>> + u64 phys_mask;
>> + } blocks[DRM_PANTHOR_PERF_BLOCK_MAX];
>> +};
>> +
>> +/**
>> + * struct panthor_perf_sampler - Interface to de-multiplex firmware interaction and handle
>> + * global interactions.
>> + */
>> +struct panthor_perf_sampler {
>> + /**
>> + * @enabled_clients: The number of clients concurrently requesting samples. To ensure that
>> + * one client cannot deny samples to another, we must ensure that clients
>> + * are effectively reference counted.
>> + */
>> + atomic_t enabled_clients;
>> +
>> + /**
>> + * @sample_handled: Synchronization point between the interrupt bottom half and the
>> + * main sampler interface. Must be re-armed solely on a new request
>> + * coming to the sampler.
>> + */
>> + struct completion sample_handled;
>> +
>> + /** @rb: Kernel BO in the FW AS containing the sample ringbuffer. */
>> + struct panthor_kernel_bo *rb;
>> +
>> + /**
>> + * @sample_slots: Number of slots for samples in the FW ringbuffer. Could be static,
>> + * but may be useful to customize for low-memory devices.
>> + */
>> + size_t sample_slots;
>> +
>> + /** @em: Combined enable mask for all of the active sessions. */
>> + struct panthor_perf_enable_masks *em;
>> +
>> + /**
>> + * @desc: Buffer descriptor for a sample in the FW ringbuffer. Note that this buffer
>> + * at current time does some interesting things with the zeroth block type. On
>> + * newer FW revisions, the first counter block of the sample is the METADATA block,
>> + * which contains a single value indicating the reason the sample was taken (if
>> + * any). This block must not be exposed to userspace, as userspace does not
>> + * have sufficient context to interpret it. As such, this block type is not
>> + * added to the uAPI, but we still use it in the kernel.
>> + */
>> + struct panthor_perf_buffer_descriptor desc;
>> +
>> + /**
>> + * @sample: Pointer to an upscaled and annotated sample that may be emitted to userspace.
>> + * This is used both as an intermediate buffer to do the zero-extension of the
>> + * 32-bit counters to 64-bits and as a storage buffer in case the sampler
>> + * requests an additional sample that was not requested by any of the top-level
>> + * sessions (for instance, when changing the enable masks).
>> + */
>> + u8 *sample;
>> +
>> + /**
>> + * @sampler_lock: Lock used to guard the list of sessions and the sampler configuration.
>> + * In particular, it guards the @session_list and the @em.
>> + */
>> + struct mutex sampler_lock;
>> +
>> + /** @session_list: List of all sessions. */
>> + struct list_head session_list;
>> +
>> + /** @pend_lock: Lock used to guard the list of sessions with pending samples. */
>> + spinlock_t pend_lock;
>> +
>> + /** @pending_samples: List of sessions requesting samples. */
>> + struct list_head pending_samples;
>> +
>> + /** @sample_requested: A sample has been requested. */
>> + bool sample_requested;
>> +
>> + /** @set_config: The set that will be configured onto the hardware. */
>> + u8 set_config;
>> +
>> + /**
>> + * @ptdev: Backpointer to the Panthor device, needed to ring the global doorbell and
>> + * interface with FW.
>> + */
>> + struct panthor_device *ptdev;
>> +};
>> +
>> struct panthor_perf_session {
>> DECLARE_BITMAP(state, PANTHOR_PERF_SESSION_MAX);
>>
>> @@ -184,6 +373,9 @@ struct panthor_perf {
>> * @sessions: Global map of sessions, accessed by their ID.
>> */
>> struct xarray sessions;
>> +
>> + /** @sampler: FW control interface. */
>> + struct panthor_perf_sampler sampler;
>> };
>>
>> struct panthor_perf_counter_block {
>> @@ -237,7 +429,7 @@ static void panthor_perf_info_init(struct panthor_device *ptdev)
>> }
>>
>> static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panthor_perf_cmd_setup
>> - *setup_args)
>> + *setup_args)
>> {
>> struct panthor_perf_enable_masks *em = kmalloc(sizeof(*em), GFP_KERNEL);
>> if (IS_ERR_OR_NULL(em))
>> @@ -257,6 +449,23 @@ static struct panthor_perf_enable_masks *panthor_perf_create_em(struct drm_panth
>> return em;
>> }
>>
>> +static void panthor_perf_em_add(struct panthor_perf_enable_masks *dst_em,
>> + const struct panthor_perf_enable_masks *const src_em)
>
> I think that, maybe for the sake of consistency, also make dst_em const? Just
> the pointer variable itself, not what it points to:
> struct panthor_perf_enable_masks *const dst_em
>
>> +{
>> + size_t i = 0;
>> +
>> + for (i = DRM_PANTHOR_PERF_BLOCK_FIRST; i <= DRM_PANTHOR_PERF_BLOCK_LAST; i++)
>> + bitmap_or(dst_em->mask[i], dst_em->mask[i], src_em->mask[i], PANTHOR_PERF_EM_BITS);
>> +}
>> +
>> +static void panthor_perf_em_zero(struct panthor_perf_enable_masks *em)
>> +{
>> + size_t i = 0;
>> +
>> + for (i = DRM_PANTHOR_PERF_BLOCK_FIRST; i <= DRM_PANTHOR_PERF_BLOCK_LAST; i++)
>> + bitmap_zero(em->mask[i], PANTHOR_PERF_EM_BITS);
>> +}
>> +
>> static u64 session_read_extract_idx(struct panthor_perf_session *session)
>> {
>> const u64 slots = session->ringbuf_slots;
>> @@ -267,6 +476,12 @@ static u64 session_read_extract_idx(struct panthor_perf_session *session)
>> return smp_load_acquire(&session->control->extract_idx) % slots;
>> }
>>
>> +static void session_write_insert_idx(struct panthor_perf_session *session, u64 idx)
>> +{
>> + /* Userspace needs the insert index to know where to look for the sample. */
>> + smp_store_release(&session->control->insert_idx, idx);
>> +}
>> +
>> static u64 session_read_insert_idx(struct panthor_perf_session *session)
>> {
>> const u64 slots = session->ringbuf_slots;
>> @@ -326,7 +541,7 @@ static void session_put(struct panthor_perf_session *session)
>> * Return: valid session pointer or an ERR_PTR.
>> */
>> static struct panthor_perf_session *session_find(struct panthor_file *pfile,
>> - struct panthor_perf *perf, u32 sid)
>> + struct panthor_perf *perf, u32 sid)
>> {
>> struct panthor_perf_session *session;
>>
>> @@ -352,6 +567,761 @@ static struct panthor_perf_session *session_find(struct panthor_file *pfile,
>> return session;
>> }
>>
>> +static u32 compress_enable_mask(unsigned long *const src)
>> +{
>> + size_t i;
>> + u32 result = 0;
>> + unsigned long clump;
>> +
>> + for_each_set_clump8(i, clump, src, PANTHOR_PERF_EM_BITS) {
>> + const unsigned long shift = div_u64(i, 4);
>> +
>> + result |= !!(clump & GENMASK(3, 0)) << shift;
>> + result |= !!(clump & GENMASK(7, 4)) << (shift + 1);
>> + }
>> +
>> + return result;
>> +}
>
> I think what you're trying to do here is this: because one enable bit enables four consecutive
> counters, a 32-bit mask can describe the enablement status of 32 * 4 = 128 counters altogether.
> However, one thing I keep wondering is in what circumstances the bits in a group of four would
> not all be enabled at the same time (since they seem to be enabled at the user's request in
> groups of four anyway).
> Also, I think there's a problem in how you handle the shift.
> In the case of four consecutive bytes being handled:
> In for_each_set_clump8(i, [...]) , 'i' can take values between 0 and 15, so div_u64(i, 4)
> will return values between 0 and 3.
> Then let's say for 8-bit clumps with indices between 0 and 3, you'd want to flag the 8 LS bits
> of 'result', but at the moment you're just overwriting them for successive clumps with the same modulo 'i'.
>
> I think what you meant was most likely this:
> ```
> for_each_set_clump8(i, clump, src, PANTHOR_PERF_EM_BITS) {
> result |= !!(clump & GENMASK(3, 0)) << (i * 2);
> result |= !!(clump & GENMASK(7, 4)) << (i * 2) + 1;
> }
> ```
>
The idea is that the user can request counters at a granularity of one counter per bit,
and this should work equivalently both when the hardware enables 2 counters per bit
(systems with 64 counters per block) and when it enables 4 counters per bit (systems
with 128 counters per block).
The shift is wrong, as is the number of bits we want to iterate over: for the Gx10
series, we want to shift by two bits at a time and only go through bits 0-63
of the enable mask.
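
To illustrate the intended mapping, here is a rough standalone sketch (not the driver code:
compress_em_sketch() and its ctrs_per_bit parameter are made up for this example, and the
128-bit user mask is assumed to be stored as two 64-bit words). Each hardware bit is set if
any of the counters it covers was requested:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: fold a 128-bit user enable
 * mask (one bit per counter) into a 32-bit hardware mask in which each bit
 * covers `ctrs_per_bit` consecutive counters (assumed to be 2 or 4).
 */
static uint32_t compress_em_sketch(const uint64_t user_em[2], unsigned int ctrs_per_bit)
{
	/* 32 hardware bits cover 64 or 128 counters in total. */
	const unsigned int nr_ctrs = 32 * ctrs_per_bit;
	uint32_t hw_em = 0;

	for (unsigned int ctr = 0; ctr < nr_ctrs; ctr++) {
		/* Any requested counter in a group enables the whole group. */
		if (user_em[ctr / 64] & (UINT64_C(1) << (ctr % 64)))
			hw_em |= UINT32_C(1) << (ctr / ctrs_per_bit);
	}

	return hw_em;
}
```

With ctrs_per_bit == 2 only user bits 0-63 are examined; with ctrs_per_bit == 4 all 128 are.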
>> +
>> +static void expand_enable_mask(u32 em, unsigned long *const dst)
>> +{
>> + size_t i;
>> + DECLARE_BITMAP(emb, BITS_PER_TYPE(u32));
>> +
>> + bitmap_from_arr32(emb, &em, BITS_PER_TYPE(u32));
>> +
>> + for_each_set_bit(i, emb, BITS_PER_TYPE(u32))
>> + bitmap_set(dst, i * 4, 4);
>> +}
>> +
>> +/**
>> + * panthor_perf_block_data - Identify the block index and type based on the offset.
>> + *
>> + * @desc: FW buffer descriptor.
>> + * @offset: The current offset being examined.
>> + * @idx: Pointer to an output index.
>> + * @type: Pointer to an output block type.
>> + *
>> + * To disambiguate different types of blocks as well as different blocks of the same type,
>> + * the offset into the FW ringbuffer is used to uniquely identify the block being considered.
>> + *
>> + * In the future, this is a good time to identify whether a block will be empty,
>> + * allowing us to short-circuit its processing after emitting header information.
>> + *
>> + * Return: True if the current block is available, false otherwise.
>> + */
>> +static bool panthor_perf_block_data(struct panthor_perf_buffer_descriptor *const desc,
>> + size_t offset, u32 *idx,
>> + enum drm_panthor_perf_block_type *type)
>> +{
>> + unsigned long id;
>> +
>> + for_each_set_bit(id, desc->available_blocks, DRM_PANTHOR_PERF_BLOCK_LAST) {
>
> I don't see the point of keeping this bitmask, because you set every single available bit
> in panthor_perf_setup_fw_buffer_desc, as you traverse all the enum drm_panthor_perf_block_type values.
> What you effectively do is turning the enum itself into a bitmask, so I'd just get rid of it and do
> 'for (enum drm_panthor_perf_block_type type = 0; type < DRM_PANTHOR_PERF_BLOCK_MAX; type++)'
> right here.
We do traverse all of the values, but the bitmask is meant to skip over the ones that are not
present on a given system. For instance, the metadata block and FW blocks may be missing, and if
new blocks are added, those would not be present on older systems.
>
>> + const size_t block_start = desc->blocks[id].offset;
>> + const size_t block_count = desc->blocks[id].block_count;
>> + const size_t block_end = desc->blocks[id].offset +
>> + desc->block_size * block_count;
>> +
>> + if (!block_count)
>> + continue;
>> +
>> + if ((offset >= block_start) && (offset < block_end)) {
>> + const unsigned long phys_mask[] = {
>> + BITMAP_FROM_U64(desc->blocks[id].phys_mask),
>> + };
>> + const size_t pos =
>> + div_u64(offset - desc->blocks[id].offset, desc->block_size);
>> +
>> + *type = id;
>> +
>> + if (test_bit(pos, phys_mask)) {
>> + const u64 mask = GENMASK_ULL(pos, 0);
>> + const u64 zeroes = ~desc->blocks[id].phys_mask & mask;
>> +
>> + *idx = pos - hweight64(zeroes);
>> + return true;
>
> I don't understand very well what you're trying to do here.
>
The code here is a bit awkward, but here is the idea: the architecturally
defined layout algorithm accounts for the total possible number of shader
cores by using the most significant bit of the shader core mask. For instance,
on one of the Rock 5Bs (not sure if this is consistent across boards), the shader
core mask is 0b1010000000000000101, so the hardware will allocate 19 slots for
shader core counters.
However, the blocks corresponding to zeroes in the shader core mask will always be empty.
The above code checks whether the current block has counters in it, and then assigns
each of the cores a stable index, based on the position of its bit in the bitmask.
When setting up the buffer descriptor, we create an artificial physical mask for each
of the block types, where only the shader core one can currently be sparse.
With the bitmask above, the user would see a sample with four shader core blocks,
consistently indexed as 0 through 3 (inclusive), with no gaps between them.
For systems with highly sparse core masks, this is a fairly large memory saving. On
the Rock 5B, if I recall correctly, the total block count is typically around 8-9,
whereas without the removal of architecturally mandated but empty slots, it would
be 23-24 blocks.
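
As a minimal standalone sketch of the indexing idea (dense_core_index() is a made-up name,
and __builtin_popcountll() is the GCC/Clang builtin standing in for the kernel's
hweight64()), the dense index of a slot is the number of present cores below it:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: map a slot position in the
 * sparse layout to a dense, gap-free index. Returns -1 for slots whose
 * bit is clear in the core mask, i.e. slots that are always empty.
 */
static int dense_core_index(uint64_t core_mask, unsigned int slot)
{
	if (slot >= 64 || !(core_mask & (UINT64_C(1) << slot)))
		return -1;

	/* Count the present cores strictly below this slot. */
	return __builtin_popcountll(core_mask & ((UINT64_C(1) << slot) - 1));
}
```

For the 19-bit mask quoted above (0x50005), slots 0, 2, 16 and 18 map to indices 0, 1, 2
and 3, and every other slot is reported as empty.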
>> + }
>> + return false;
>> + }
>> + }
>> +
>> + return false;
>> +}
>> +
>> +static size_t session_get_user_sample_size(const struct drm_panthor_perf_info *const info)
>> +{
>> + const size_t block_size = get_annotated_block_size(info->counters_per_block);
>> + const size_t block_nr = info->cshw_blocks + info->fw_blocks +
>> + info->tiler_blocks + info->memsys_blocks + info->shader_blocks;
>> +
>> + return sizeof(struct drm_panthor_perf_sample_header) + (block_size * block_nr);
>> +}
>
> You've redefined session_get_user_sample_size here.
>
Will remove, thanks.
>> +
>> +static u32 panthor_perf_handle_sample(struct panthor_device *ptdev, u32 extract_idx, u32 insert_idx)
>> +{
>> + struct panthor_perf *perf = ptdev->perf;
>> + struct panthor_perf_sampler *sampler = &ptdev->perf->sampler;
>> + const size_t ann_block_size =
>> + get_annotated_block_size(ptdev->perf_info.counters_per_block);
>> + u32 i;
>> +
>> + for (i = extract_idx; i != insert_idx; i++) {
>> + u32 slot = i % sampler->sample_slots;
>> + u8 *fw_sample = (u8 *)sampler->rb->kmap + slot * sampler->desc.buffer_size;
>> +
>> + for (size_t fw_off = 0, ann_off = sizeof(struct drm_panthor_perf_sample_header);
>> + fw_off < sampler->desc.buffer_size;
>> + fw_off += sampler->desc.block_size)
>> +
>> + {
>> + u32 idx = 0;
>> + enum drm_panthor_perf_block_type type = 0;
>> + DECLARE_BITMAP(expanded_em, PANTHOR_PERF_EM_BITS);
>> + struct panthor_perf_counter_block *blk =
>> + (typeof(blk))(perf->sampler.sample + ann_off);
>> + u32 *const block = (u32 *)(fw_sample + fw_off);
>> + const u32 prfcnt_en = block[PANTHOR_CTR_PRFCNT_EN];
>> +
>> + if (!panthor_perf_block_data(&sampler->desc, fw_off, &idx, &type))
>> + continue;
>> +
>> + /**
>> + * TODO Data from the metadata block must be used to populate the
>> + * block state information.
>> + */
>> + if (type == DRM_PANTHOR_PERF_BLOCK_METADATA) {
>> + /*
>> + * The host must clear the SAMPLE_REASON to acknowledge it has
>> + * consumed the sample.
>> + */
>> + block[PANTHOR_CTR_SAMPLE_REASON] = 0;
>> + continue;
>> + }
>> +
>> + expand_enable_mask(prfcnt_en, expanded_em);
>> +
>> + blk->header = (struct drm_panthor_perf_block_header) {
>> + .clock = 0,
>> + .block_idx = idx,
>> + .block_type = type,
>> + .block_states = DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN
>
> Are you using any states other than DRM_PANTHOR_PERF_BLOCK_STATE_UNKNOWN?
>
Not at the moment. Wanted to keep this functionality out of this patch series.
>> + };
>> + bitmap_to_arr64(blk->header.enable_mask, expanded_em, PANTHOR_PERF_EM_BITS);
>> +
>> + /*
>> + * The four header counters must be treated differently, because they are
>> + * not additive. For the fourth, the assignment does not matter, as it
>> + * is reserved and should be zero.
>> + */
>> + blk->counters[PANTHOR_CTR_TIMESTAMP_LO] = block[PANTHOR_CTR_TIMESTAMP_LO];
>> + blk->counters[PANTHOR_CTR_TIMESTAMP_HI] = block[PANTHOR_CTR_TIMESTAMP_HI];
>> + blk->counters[PANTHOR_CTR_PRFCNT_EN] = block[PANTHOR_CTR_PRFCNT_EN];
>> +
>> + /*
>> + * The host must clear PRFCNT_EN to acknowledge it has consumed the sample.
>> + */
>> + block[PANTHOR_CTR_PRFCNT_EN] = 0;
>> +
>> + for (size_t k = PANTHOR_HEADER_COUNTERS;
>> + k < ptdev->perf_info.counters_per_block;
>> + k++)
>> + blk->counters[k] += block[k];
>> +
>> + ann_off += ann_block_size;
>
> Why wouldn't you include this inside the for loop step definition?
>
Because we do not want to advance the annotated offset for blocks that will be ignored,
like the reserved but unavailable shader core ones. This determination is made in
panthor_perf_block_data.
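
The pattern, reduced to a standalone sketch (compact_blocks() and its arguments are made up
for illustration, with one u32 standing in for a whole block), is an input cursor that always
advances and an output cursor that advances only for blocks that are kept:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: copy only the slots marked
 * present from `src` into `dst`, packing them without gaps. Returns the
 * number of entries written to `dst`.
 */
static size_t compact_blocks(const uint32_t *src, size_t nr_slots,
			     uint64_t present, uint32_t *dst)
{
	size_t out = 0;

	for (size_t slot = 0; slot < nr_slots && slot < 64; slot++) {
		/* Skipped slot: the output cursor must not move. */
		if (!(present & (UINT64_C(1) << slot)))
			continue;

		dst[out] = src[slot];
		out++;
	}

	return out;
}
```

Moving the output increment into the loop step would leave holes in `dst` for every
skipped slot.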
>> + }
>> + }
>> +
>> + return i;
>> +}
>> +
>> +static size_t panthor_perf_get_fw_reported_size(struct panthor_device *ptdev)
>> +{
>> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>> +
>> + size_t fw_size = GLB_PERFCNT_FW_SIZE(glb_iface->control->perfcnt_size);
>> + size_t hw_size = GLB_PERFCNT_HW_SIZE(glb_iface->control->perfcnt_size);
>> + size_t md_size = PERFCNT_FEATURES_MD_SIZE(glb_iface->control->perfcnt_features);
>> +
>> + return md_size + fw_size + hw_size;
>> +}
>> +
>> +#define PANTHOR_PERF_SET_BLOCK_DESC_DATA(__desc, __type, __blk_count, __phys_mask, __offset) \
>> + ({ \
>> + (__desc)->blocks[(__type)].offset = (__offset); \
>> + (__desc)->blocks[(__type)].block_count = (__blk_count); \
>> + (__desc)->blocks[(__type)].phys_mask = (__phys_mask); \
>> + if ((__blk_count)) \
>> + set_bit((__type), (__desc)->available_blocks); \
>> + (__offset) + ((__desc)->block_size) * (__blk_count); \
>> + })
>> +
>> +static size_t get_reserved_shader_core_blocks(struct panthor_device *ptdev)
>> +{
>> + const u64 sc_mask = ptdev->gpu_info.shader_present;
>> +
>> + return fls64(sc_mask);
>> +}
>> +
>> +#define BLK_MASK(x) GENMASK_ULL((x) - 1, 0)
>> +
>> +static u64 get_shader_core_mask(struct panthor_device *ptdev)
>> +{
>> + const u64 sc_mask = ptdev->gpu_info.shader_present;
>> +
>> + return BLK_MASK(hweight64(sc_mask));
>> +}
>> +
>> +static int panthor_perf_setup_fw_buffer_desc(struct panthor_device *ptdev,
>> + struct panthor_perf_sampler *sampler)
>> +{
>> + const struct drm_panthor_perf_info *const info = &ptdev->perf_info;
>> + const size_t block_size = info->counters_per_block * PANTHOR_HW_COUNTER_SIZE;
>> + struct panthor_perf_buffer_descriptor *desc = &sampler->desc;
>> + const size_t fw_sample_size = panthor_perf_get_fw_reported_size(ptdev);
>> + size_t offset = 0;
>> +
>> + desc->block_size = block_size;
>> +
>> + for (enum drm_panthor_perf_block_type type = 0; type < DRM_PANTHOR_PERF_BLOCK_MAX; type++) {
>> + switch (type) {
>> + case DRM_PANTHOR_PERF_BLOCK_METADATA:
>> + if (info->flags & DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT)
>> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, 1,
>> + BLK_MASK(1), offset);
>> + break;
>> + case DRM_PANTHOR_PERF_BLOCK_FW:
>> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->fw_blocks,
>> + BLK_MASK(info->fw_blocks),
>> + offset);
>> + break;
>> + case DRM_PANTHOR_PERF_BLOCK_CSHW:
>> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->cshw_blocks,
>> + BLK_MASK(info->cshw_blocks),
>> + offset);
>> + break;
>> + case DRM_PANTHOR_PERF_BLOCK_TILER:
>> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->tiler_blocks,
>> + BLK_MASK(info->tiler_blocks),
>> + offset);
>> + break;
>> + case DRM_PANTHOR_PERF_BLOCK_MEMSYS:
>> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type, info->memsys_blocks,
>> + BLK_MASK(info->memsys_blocks),
>> + offset);
>> + break;
>> + case DRM_PANTHOR_PERF_BLOCK_SHADER:
>> + offset = PANTHOR_PERF_SET_BLOCK_DESC_DATA(desc, type,
>> + get_reserved_shader_core_blocks(ptdev),
>> + get_shader_core_mask(ptdev),
>> + offset);
>> + break;
>> + case DRM_PANTHOR_PERF_BLOCK_MAX:
>> + drm_WARN_ON_ONCE(&ptdev->base,
>> + "DRM_PANTHOR_PERF_BLOCK_MAX should be unreachable!");
>> + break;
>> + }
>> + }
>> +
>> + /* Computed size is not the same as the reported size, so we should not proceed in
>> + * initializing the sampling session.
>> + */
>> + if (offset != fw_sample_size)
>> + return -EINVAL;
>> +
>> + desc->buffer_size = offset;
>> +
>> + return 0;
>> +}
>> +
>> +static int panthor_perf_fw_stop_sampling(struct panthor_device *ptdev)
>> +{
>> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>> + u32 acked;
>> + int ret;
>> +
>> + if (~READ_ONCE(glb_iface->input->req) & GLB_PERFCNT_ENABLE)
>> + return 0;
>> +
>> + panthor_fw_update_reqs(glb_iface, req, 0, GLB_PERFCNT_ENABLE);
>> + gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
>> + ret = panthor_fw_glb_wait_acks(ptdev, GLB_PERFCNT_ENABLE, &acked, 100);
>> + if (ret)
>> + drm_warn(&ptdev->base, "Could not disable performance counters");
>> +
>> + return ret;
>> +}
>> +
>> +static int panthor_perf_fw_start_sampling(struct panthor_device *ptdev)
>> +{
>> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>> + u32 acked;
>> + int ret;
>> +
>> + if (READ_ONCE(glb_iface->input->req) & GLB_PERFCNT_ENABLE)
>> + return 0;
>> +
>> + panthor_fw_update_reqs(glb_iface, req, GLB_PERFCNT_ENABLE, GLB_PERFCNT_ENABLE);
>> + gpu_write(ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
>> + ret = panthor_fw_glb_wait_acks(ptdev, GLB_PERFCNT_ENABLE, &acked, 100);
>> + if (ret)
>> + drm_warn(&ptdev->base, "Could not enable performance counters");
>> +
>> + return ret;
>> +}
>> +
>> +static void panthor_perf_fw_write_config(struct panthor_perf_sampler *sampler,
>> + struct panthor_perf_enable_masks *em)
>
> I've noticed inconsistent usage in the way you declare some local variables (enablement mask pointers specifically)
> as pointer constants, eg, in panthor_perf_em_add() you declare it as struct panthor_perf_enable_masks *const.
> Here it would be alright to declare it as const too, because it's only being read from.
>
Let me go through and make that more consistent, thanks.
>> +{
>> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(sampler->ptdev);
>> + u32 perfcnt_config;
>> +
>> + glb_iface->input->perfcnt_csf_enable =
>> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_CSHW]);
>> + glb_iface->input->perfcnt_shader_enable =
>> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_SHADER]);
>> + glb_iface->input->perfcnt_mmu_l2_enable =
>> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_MEMSYS]);
>> + glb_iface->input->perfcnt_tiler_enable =
>> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_TILER]);
>> + glb_iface->input->perfcnt_fw_enable =
>> + compress_enable_mask(em->mask[DRM_PANTHOR_PERF_BLOCK_FW]);
>> +
>> + WRITE_ONCE(glb_iface->input->perfcnt_as, panthor_vm_as(panthor_fw_vm(sampler->ptdev)));
>> + WRITE_ONCE(glb_iface->input->perfcnt_base, panthor_kernel_bo_gpuva(sampler->rb));
>
> Some of the things you do here need to be done only once at perf init time, like
>
> ```
> WRITE_ONCE(glb_iface->input->perfcnt_as, panthor_vm_as(panthor_fw_vm(sampler->ptdev)));
> WRITE_ONCE(glb_iface->input->perfcnt_base, panthor_kernel_bo_gpuva(sampler->rb));
> [...]
> perfcnt_config = GLB_PERFCNT_CONFIG_SIZE(perf_ringbuf_slots);
> [...]
> if (sampler->ptdev->perf_info.flags & DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT)
> perfcnt_config |= GLB_PERFCNT_METADATA_ENABLE;
> ```
> However, this is done every single time a new session is introduced.
We could save a little bit by skipping writing the AS and base address of the ring buffer,
but this is sometimes required. For instance, these functions are also called in the
post reset path, since the reset zeroes them out.
> Also, perhaps it's a good idea to allocate the FW ring buffer only when the first session is registered,
> and remove it when no sessions are active to save some memory
It's on the list of things to improve on, but would you mind if I did that in a follow-up
patch set? I'd like to progress this series without adding more features on top. Currently,
the module param can already be used to reduce the memory consumption of the FW ring buffer
to around a few pages.
>> +
>> + perfcnt_config = GLB_PERFCNT_CONFIG_SIZE(perf_ringbuf_slots);
>> + perfcnt_config |= GLB_PERFCNT_CONFIG_SET(sampler->set_config);
>> + if (sampler->ptdev->perf_info.flags & DRM_PANTHOR_PERF_BLOCK_STATES_SUPPORT)
>> + perfcnt_config |= GLB_PERFCNT_METADATA_ENABLE;
>> +
>> + WRITE_ONCE(glb_iface->input->perfcnt_config, perfcnt_config);
>> +
>> + /**
>> + * The spec mandates that the host zero the PRFCNT_EXTRACT register before an enable
>> + * operation, and each (re-)enable will require an enable-disable pair to program
>> + * the new changes onto the FW interface.
>> + */
>> + WRITE_ONCE(glb_iface->input->perfcnt_extract, 0);
>> +}
>> +
>> +static void panthor_perf_fw_write_sampler_config(struct panthor_perf_sampler *sampler)
>> +{
>> + panthor_perf_fw_write_config(sampler, sampler->em);
>> +}
>> +
>> +static void session_populate_sample_header(struct panthor_perf_session *session,
>> + struct drm_panthor_perf_sample_header *hdr, u8 set)
>> +{
>> + *hdr = (struct drm_panthor_perf_sample_header) {
>> + .block_set = set,
>> + .user_data = session->user_data,
>> + .timestamp_start_ns = session->sample_start_ns,
>> + /**
>> + * TODO This should be changed to use the GPU clocks and the TIMESTAMP register,
>> + * when support is added.
>
> Access to the timestamp registers is available since the merging of the panthor fdinfo support patch series
>
It is, but I'd like to address that in a separate patch series.
>> + */
>> + .timestamp_end_ns = ktime_get_raw_ns(),
>> + };
>> +}
>> +
>> +/**
>> + * session_accumulate_sample - Accumulate the counters that are requested by the session
>> + * into the target buffer.
>> + *
>> + * @ptdev: Panthor device
>> + * @session: Perf session
>> + * @session_sample: Starting offset of the sample in the userspace mapping.
>> + * @sampler_sample: Starting offset of the sample in the sampler intermediate buffer.
>> + *
>> + * The hardware supports counter selection at the granularity of 1 bit per 4 counters, and there
>> + * is a single global FW frontend to program the counter requests from multiple sessions. This may
>> + * lead to a large disparity between the requested and provided counters for an individual client.
>> + * To remove this cross-talk, we patch out the counters that have not been requested by this
>> + * session and update the PRFCNT_EN, the header counter containing a bitmask of enabled counters,
>> + * accordingly.
>> + */
>> +static void session_accumulate_sample(struct panthor_device *ptdev,
>> + struct panthor_perf_session *session,
>> + u8 *session_sample, u8 *sampler_sample)
>> +{
>> + const struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
>> + const size_t block_size = get_annotated_block_size(perf_info->counters_per_block);
>> + const size_t sample_size = session_get_user_sample_size(perf_info);
>> + const size_t sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
>> + const size_t data_size = sample_size - sample_header_size;
>> + struct drm_panthor_perf_sample_header *hdr = (typeof(hdr))session_sample;
>> +
>> + hdr->timestamp_end_ns = ktime_get_raw_ns();
>> +
>> + session_sample += sample_header_size;
>> + sampler_sample += sample_header_size;
>> +
>> + for (size_t i = 0; i < data_size; i += block_size) {
>> + size_t ctr_idx;
>> + DECLARE_BITMAP(enabled_ctrs, PANTHOR_PERF_EM_BITS);
>> + struct panthor_perf_counter_block *dst_blk = (typeof(dst_blk))(session_sample + i);
>> + struct panthor_perf_counter_block *src_blk = (typeof(src_blk))(sampler_sample + i);
>> +
>> + bitmap_from_arr64(enabled_ctrs, dst_blk->header.enable_mask, PANTHOR_PERF_EM_BITS);
>> + bitmap_clear(enabled_ctrs, 0, PANTHOR_HEADER_COUNTERS);
>> +
>> + dst_blk->counters[PANTHOR_CTR_TIMESTAMP_HI] =
>> + src_blk->counters[PANTHOR_CTR_TIMESTAMP_HI];
>> + dst_blk->counters[PANTHOR_CTR_TIMESTAMP_LO] =
>> + src_blk->counters[PANTHOR_CTR_TIMESTAMP_LO];
>> +
>> + for_each_set_bit(ctr_idx, enabled_ctrs, PANTHOR_PERF_EM_BITS)
>> + dst_blk->counters[ctr_idx] += src_blk->counters[ctr_idx];
>> + }
>> +}
>> +
>> +static void panthor_perf_fw_request_sample(struct panthor_perf_sampler *sampler)
>> +{
>> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(sampler->ptdev);
>> +
>> + panthor_fw_toggle_reqs(glb_iface, req, ack, GLB_PERFCNT_SAMPLE);
>> + gpu_write(sampler->ptdev, CSF_DOORBELL(CSF_GLB_DOORBELL_ID), 1);
>> +}
>> +
>> +/**
>> + * session_populate_sample - Write out a new sample into a previously populated slot in the user
>> + * ringbuffer and update both the header of the block and the PRFCNT_EN
>> + * counter to contain only the selected subset of counters for that block.
>> + *
>> + * @ptdev: Panthor device
>> + * @session: Perf session
>> + * @session_sample: Pointer aligned to the start of the data section of the sample in the targeted
>> + * slot.
>> + * @sampler_sample: Pointer aligned to the start of the data section of the intermediate sampler
>> + * buffer.
>> + *
>> + * When a new sample slot is targeted, it must be cleared of the data already existing there,
>> + * enabling a direct copy from the intermediate buffer and then zeroing out any counters
>> + * that are not required for the current session.
>> + */
>> +static void session_populate_sample(struct panthor_device *ptdev,
>> + struct panthor_perf_session *session, u8 *session_sample,
>> + u8 *sampler_sample)
>> +{
>> + const struct drm_panthor_perf_info *const perf_info = &ptdev->perf_info;
>> +
>> + const size_t block_size = get_annotated_block_size(perf_info->counters_per_block);
>> + const size_t sample_size = session_get_user_sample_size(perf_info);
>> + const size_t sample_header_size = sizeof(struct drm_panthor_perf_sample_header);
>> + const size_t data_size = sample_size - sample_header_size;
>> +
>> + memcpy(session_sample, sampler_sample, sample_size);
>
> If you're overwriting the sample header next line, maybe what you meant was?
>
> memcpy(session_sample + sample_header_size, sampler_sample, sample_size);
>
Initially, I had done this to simplify the handling, but can equally convert it to
memcpy(session_sample, sampler_sample + sample_header_size, data_size)
after the start of the session sample has been incremented several lines down.
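
Roughly, and only as a standalone sketch (struct sample_hdr_sketch is a made-up stand-in for
the real sample header, and copy_sample_data() does not exist in the series), the copy would
skip the header on both sides so that the header is written exactly once, by
session_populate_sample_header():

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct drm_panthor_perf_sample_header. */
struct sample_hdr_sketch {
	uint64_t timestamp_start_ns;
	uint64_t timestamp_end_ns;
};

/*
 * Hypothetical helper, not part of the patch: copy only the data section
 * of a sample, leaving the destination header untouched.
 */
static void copy_sample_data(uint8_t *dst, const uint8_t *src, size_t sample_size)
{
	const size_t hdr_size = sizeof(struct sample_hdr_sketch);

	/* Nothing to copy if the sample holds no data beyond the header. */
	if (sample_size <= hdr_size)
		return;

	memcpy(dst + hdr_size, src + hdr_size, sample_size - hdr_size);
}
```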
>> +
>> + session_populate_sample_header(session,
>> + (struct drm_panthor_perf_sample_header *)session_sample,
>> + ptdev->perf->sampler.set_config);
>> +
>> + session_sample += sample_header_size;
>> +
>> + for (size_t i = 0; i < data_size; i += block_size) {
>> + size_t ctr_idx;
>> + DECLARE_BITMAP(em_diff, PANTHOR_PERF_EM_BITS);
>> + struct panthor_perf_counter_block *blk = (typeof(blk))(session_sample + i);
>> + enum drm_panthor_perf_block_type type = blk->header.block_type;
>> + unsigned long *blk_em = session->enabled_counters->mask[type];
>> +
>> + bitmap_from_arr64(em_diff, blk->header.enable_mask, PANTHOR_PERF_EM_BITS);
>> +
>> + bitmap_andnot(em_diff, em_diff, blk_em, PANTHOR_PERF_EM_BITS);
>> + bitmap_clear(em_diff, 0, PANTHOR_HEADER_COUNTERS);
>> +
>> + blk->counters[PANTHOR_CTR_PRFCNT_EN] = compress_enable_mask(blk_em);
>> +
>> + for_each_set_bit(ctr_idx, em_diff, PANTHOR_PERF_EM_BITS)
>> + blk->counters[ctr_idx] = 0;
>> +
>> + bitmap_to_arr64(&blk->header.enable_mask, blk_em, PANTHOR_PERF_EM_BITS);
>
> I'm wondering about the need to do this (writing the session enablement mask into the block header)
> since we're already zeroing out the unrequested counters and also UM knows it.
>
This handles multiple sessions with different enable masks, where the UM must not know the enable
masks of other sessions.
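
A simplified standalone sketch of that stripping step (strip_unrequested() and the *_SKETCH
constants are made up, and a single 64-bit word stands in for the full enable mask): counters
that were sampled for some other session are zeroed, and the block header ends up carrying
only this session's mask.

```c
#include <stdint.h>

#define NR_CTRS_SKETCH		64	/* counters per block, assumed for the sketch */
#define HEADER_CTRS_SKETCH	4	/* header counters are never zeroed */

/*
 * Hypothetical helper, not part of the patch. On entry *hdr_em holds the
 * mask the block was sampled with; on exit it holds the session's mask.
 */
static void strip_unrequested(uint64_t counters[NR_CTRS_SKETCH],
			      uint64_t *hdr_em, uint64_t session_em)
{
	/* Counters that were sampled but that this session did not ask for. */
	uint64_t em_diff = *hdr_em & ~session_em;

	/* Leave the header counters alone. */
	em_diff &= ~((UINT64_C(1) << HEADER_CTRS_SKETCH) - 1);

	for (unsigned int i = 0; i < NR_CTRS_SKETCH; i++) {
		if (em_diff & (UINT64_C(1) << i))
			counters[i] = 0;
	}

	/* Publish only this session's mask, never the combined one. */
	*hdr_em = session_em;
}
```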
>> + }
>> +}
>> +
>> +static int session_copy_sample(struct panthor_device *ptdev, struct panthor_perf_session *session)
>> +{
>> + struct panthor_perf *perf = ptdev->perf;
>> + const size_t sample_size = session_get_user_sample_size(&ptdev->perf_info);
>> + const u64 insert_idx = session_read_insert_idx(session);
>> + const u64 extract_idx = session_read_extract_idx(session);
>> + u8 *new_sample;
>> +
>> + if (!CIRC_SPACE_TO_END(insert_idx, extract_idx, session->ringbuf_slots))
>> + return -ENOSPC;
>> +
>> + if (READ_ONCE(session->pending_sample_request) == SAMPLE_TYPE_INITIAL)
>> + return 0;
>> +
>> + new_sample = session->samples + insert_idx * sample_size;
>> +
>> + if (session->accum_idx != insert_idx) {
>> + session_populate_sample(ptdev, session, new_sample, perf->sampler.sample);
>> + session->accum_idx = insert_idx;
>> + } else
>> + session_accumulate_sample(ptdev, session, new_sample, perf->sampler.sample);
>> +
>> + return 0;
>> +}
>> +
>> +static void session_emit_sample(struct panthor_perf_session *session)
>> +{
>> + const u64 insert_idx = session_read_insert_idx(session);
>> + const enum session_sample_type type = READ_ONCE(session->pending_sample_request);
>> +
>> + if (type == SAMPLE_TYPE_INITIAL || type == SAMPLE_TYPE_NONE)
>> + goto reset_sample_request;
>> +
>> + session_write_insert_idx(session, (insert_idx + 1) % session->ringbuf_slots);
>> +
>> + /* Since we are about to notify userspace, we must ensure that all changes to memory
>> + * are visible.
>> + */
>> + wmb();
>> +
>> + eventfd_signal(session->eventfd);
>> +
>> +reset_sample_request:
>> + WRITE_ONCE(session->pending_sample_request, SAMPLE_TYPE_NONE);
>> +}
>> +
>> +#define PRFCNT_IRQS (GLB_PERFCNT_OVERFLOW | GLB_PERFCNT_SAMPLE | GLB_PERFCNT_THRESHOLD)
>> +
>> +void panthor_perf_report_irq(struct panthor_device *ptdev, u32 status)
>> +{
>> + struct panthor_perf *const perf = ptdev->perf;
>> + struct panthor_perf_sampler *sampler;
>> + struct panthor_fw_global_iface *glb_iface = panthor_fw_get_glb_iface(ptdev);
>> + bool sample_requested;
>> +
>> + if (!(status & JOB_INT_GLOBAL_IF))
>> + return;
>> +
>> + if (!perf)
>> + return;
>> +
>> + sampler = &perf->sampler;
>> +
>> + const u32 ack = READ_ONCE(glb_iface->output->ack);
>> + const u32 req = READ_ONCE(glb_iface->input->req);
>> +
>> + scoped_guard(spinlock_irqsave, &sampler->pend_lock)
>> + sample_requested = sampler->sample_requested;
>> +
>> +
>> + /*
>> + * TODO Fix up the error handling for overflow. Currently, the user is unblocked
>> + * with a completely empty sample, which is not the intended behaviour.
>> + */
>> + if (drm_WARN_ON_ONCE(&ptdev->base, (req ^ ack) & GLB_PERFCNT_OVERFLOW))
>> + goto emit;
>> +
>> + if ((sample_requested && (req & GLB_PERFCNT_SAMPLE) == (ack & GLB_PERFCNT_SAMPLE)) ||
>> + ((req ^ ack) & GLB_PERFCNT_THRESHOLD)) {
>> + const u32 extract_idx = READ_ONCE(glb_iface->input->perfcnt_extract);
>> + const u32 insert_idx = READ_ONCE(glb_iface->output->perfcnt_insert);
>> +
>> + /* If the sample was requested around a reset, some time may be needed
>> + * for the FW interface to be updated, so we reschedule a sample
>> + * and return immediately.
>> + */
>> + if (insert_idx == extract_idx) {
>> + guard(spinlock_irqsave)(&sampler->pend_lock);
>> + if (sampler->sample_requested)
>> + panthor_perf_fw_request_sample(sampler);
>> +
>> + return;
>> + }
>> +
>> + WRITE_ONCE(glb_iface->input->perfcnt_extract,
>> + panthor_perf_handle_sample(ptdev, extract_idx, insert_idx));
>
> Here you'd always be writing insert_idx into glb_iface->input->perfcnt_extract,
> so is it really necessary to let panthor_perf_handle_sample return it?
>
I had originally written it this way to have more flexibility in not immediately
handling some samples in the FW ring buffer, but since this functionality is not
being used, it can be removed.
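
For illustration only, a user-space model of the simplified flow might look like the sketch below. The struct fw_ring type, the fw_ring_drain name, the free-running 32-bit indices and the power-of-two slot count are all assumptions of this sketch, not the driver's or the FW's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the FW ring buffer indices (not the driver's types). */
struct fw_ring {
	uint32_t insert;  /* free-running, advanced by the producer */
	uint32_t extract; /* free-running, advanced by the consumer */
	uint32_t slots;   /* number of slots, assumed to be a power of two */
};

/*
 * Visit every slot between extract and insert, recording the slot numbers
 * in @visited (which must hold at least rb->slots entries), then mark them
 * all as consumed. Nothing is left behind, so the new extract index is
 * always the insert index and only the count is worth returning.
 */
static uint32_t fw_ring_drain(struct fw_ring *rb, uint32_t *visited)
{
	uint32_t n = 0;

	assert(rb->slots && (rb->slots & (rb->slots - 1)) == 0);
	assert(rb->insert - rb->extract <= rb->slots);

	for (uint32_t idx = rb->extract; idx != rb->insert; idx++)
		visited[n++] = idx & (rb->slots - 1);

	rb->extract = rb->insert;
	return n;
}
```

Comparing the indices with != rather than < keeps the loop correct when the 32-bit indices wrap around.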
>> + }
>> +
>> + scoped_guard(mutex, &sampler->sampler_lock)
>> + {
>> + struct list_head *pos;
>> +
>> + list_for_each(pos, &sampler->session_list) {
>> + struct panthor_perf_session *session = list_entry(pos,
>> + struct panthor_perf_session, sessions);
>> +
>> + session_copy_sample(ptdev, session);
>> + }
>> + }
>> +
>> +emit:
>> + scoped_guard(spinlock_irqsave, &sampler->pend_lock) {
>> + struct list_head *pos, *tmp;
>> +
>> + list_for_each_safe(pos, tmp, &sampler->pending_samples) {
>> + struct panthor_perf_session *session = list_entry(pos,
>> + struct panthor_perf_session, pending);
>> +
>> + session_emit_sample(session);
>> + list_del(pos);
>> + session_put(session);
>> + }
>> +
>> + sampler->sample_requested = false;
>> + }
>> +
>> + memset(sampler->sample, 0, session_get_user_sample_size(&ptdev->perf_info));
>
> I wonder why we'd want to zero out the intermediate sample buffer, since we don't need to do
> that to tell the hardware that the FW sample was consumed (that is done in the FW ringbuffer),
> and also it'll be overwritten next time a sample is produced by the FW. However, next time
> there's an irq notification for a sample, it turns out that session->accum_idx == insert_idx,
> perhaps we could defer the zero'ing out until then? Alternatively, adding a field to the
> sample header in the sampler->sample buffer that would tell us if it needs to be overwritten
> in the next occurrence of a copy might be enough?
>
There is no architectural reason to do this here, but in my mind, it simplifies re-use of the
buffer. We can't use the accum_idx to determine when to zero it, because at that point, it
would already be too late, so we would have to move it back to right before the sample
handling. Considering the single UM session case without any set re-configurations in
between, we may be able to get away without zeroing it at all. At the same time, we would
then have to be careful about how this interacts with the secondary and tertiary sets. If two
subsequent sessions choose different sets, it's simpler to ensure that the buffer
is always zeroed rather than tracking under what conditions we can skip the memset.
>> + complete(&sampler->sample_handled);
>> +}
>> +
>> +static int panthor_perf_sampler_init(struct panthor_perf_sampler *sampler,
>> + struct panthor_device *ptdev)
>> +{
>> + struct panthor_kernel_bo *bo;
>> + u8 *sample;
>> + int ret;
>> +
>> + ret = panthor_perf_setup_fw_buffer_desc(ptdev, sampler);
>> + if (ret) {
>> + drm_err(&ptdev->base,
>> + "Failed to setup descriptor for FW ring buffer, err = %d", ret);
>> + return ret;
>> + }
>> +
>> + bo = panthor_kernel_bo_create(ptdev, panthor_fw_vm(ptdev),
>> + sampler->desc.buffer_size * perf_ringbuf_slots,
>> + DRM_PANTHOR_BO_NO_MMAP,
>> + DRM_PANTHOR_VM_BIND_OP_MAP_NOEXEC |
>> + DRM_PANTHOR_VM_BIND_OP_MAP_UNCACHED,
>> + PANTHOR_VM_KERNEL_AUTO_VA);
>> +
>> + if (IS_ERR_OR_NULL(bo))
>> + return IS_ERR(bo) ? PTR_ERR(bo) : -ENOMEM;
>> +
>> + ret = panthor_kernel_bo_vmap(bo);
>> + if (ret)
>> + goto cleanup_bo;
>> +
>> + sample = kzalloc(session_get_user_sample_size(&ptdev->perf_info), GFP_KERNEL);
>> + if (ZERO_OR_NULL_PTR(sample)) {
>> + ret = -ENOMEM;
>> + goto cleanup_vmap;
>> + }
>> +
>> + sampler->rb = bo;
>> + sampler->sample = sample;
>> + sampler->sample_slots = perf_ringbuf_slots;
>> + sampler->em = kzalloc(sizeof(*sampler->em), GFP_KERNEL);
>> +
>> + mutex_init(&sampler->sampler_lock);
>> + spin_lock_init(&sampler->pend_lock);
>> + INIT_LIST_HEAD(&sampler->session_list);
>> + INIT_LIST_HEAD(&sampler->pending_samples);
>> + init_completion(&sampler->sample_handled);
>> +
>> + sampler->ptdev = ptdev;
>> +
>> + return 0;
>> +
>> +cleanup_vmap:
>> + panthor_kernel_bo_vunmap(bo);
>> +
>> +cleanup_bo:
>> + panthor_kernel_bo_destroy(bo);
>> +
>> + return ret;
>> +}
>> +
>> +static void panthor_perf_sampler_term(struct panthor_perf_sampler *sampler)
>> +{
>> + int ret;
>> + bool requested;
>> +
>> + scoped_guard(spinlock_irqsave, &sampler->pend_lock)
>> + requested = sampler->sample_requested;
>> +
>> + if (requested)
>> + wait_for_completion_killable(&sampler->sample_handled);
>> +
>> + panthor_perf_fw_write_config(sampler, &(struct panthor_perf_enable_masks){});
>
> When you remove a session, you first call 'panthor_perf_em_zero(sampler->em);'
> and then compose a new global sampler enablement mask with the OR'd bitmap
> of the different sessions' enablement masks. But if there are no sessions
> left, you're guaranteed that here sampler->em will be all zeros,
> so you can just do 'panthor_perf_fw_write_sampler_config(sampler)' and
> inline the definition of 'panthor_perf_fw_write_config()' into it
>
Hadn't spotted that, thank you.
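
To spell out the observation, here is a minimal user-space sketch of the mask recomposition. The struct enable_mask layout, EM_WORDS and em_recompute are made up for this example; only the zero-then-OR pattern is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EM_WORDS 2 /* hypothetical mask width for this sketch */

struct enable_mask {
	uint64_t bits[EM_WORDS];
};

static void em_zero(struct enable_mask *em)
{
	memset(em, 0, sizeof(*em));
}

/* OR the counters enabled in @src into @dst. */
static void em_add(struct enable_mask *dst, const struct enable_mask *src)
{
	for (size_t i = 0; i < EM_WORDS; i++)
		dst->bits[i] |= src->bits[i];
}

/* Rebuild the global mask as the union of the remaining sessions' masks. */
static void em_recompute(struct enable_mask *global,
			 const struct enable_mask *sessions, size_t count)
{
	assert(global && (sessions || !count));

	em_zero(global);
	for (size_t i = 0; i < count; i++)
		em_add(global, &sessions[i]);
}
```

With count == 0 the loop never runs and the global mask stays all zeros, which is why the termination path should not need a separately constructed empty mask.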
>> +
>> + ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
>> + if (ret)
>> + drm_warn_once(&sampler->ptdev->base, "Sampler termination failed, ret = %d", ret);
>> +
>> + kfree(sampler->sample);
>> +
>> + panthor_kernel_bo_destroy(sampler->rb);
>> +}
>> +
>> +static int panthor_perf_sampler_add(struct panthor_perf_sampler *sampler,
>> + struct panthor_perf_session *session, u8 set)
>> +{
>> + int ret = 0;
>> + struct panthor_perf_enable_masks *session_em = session->enabled_counters;
>> +
>> + guard(mutex)(&sampler->sampler_lock);
>> +
>> + /* Early check for whether a new set can be configured. */
>> + if (!atomic_read(&sampler->enabled_clients))
>> + sampler->set_config = set;
>> + else
>> + if (sampler->set_config != set)
>> + return -EBUSY;
>> +
>> + panthor_perf_em_add(sampler->em, session_em);
>> + ret = pm_runtime_resume_and_get(sampler->ptdev->base.dev);
>> + if (ret)
>> + return ret;
>> +
>> + if (atomic_read(&sampler->enabled_clients)) {
>> + ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
>> + if (ret)
>> + return ret;
>> + }
>> +
>> + panthor_perf_fw_write_sampler_config(sampler);
>> +
>> + ret = panthor_perf_fw_start_sampling(sampler->ptdev);
>> + if (ret)
>> + return ret;
>> +
>> + session_get(session);
>> + list_add_tail(&session->sessions, &sampler->session_list);
>> + atomic_inc(&sampler->enabled_clients);
>> +
>> + return 0;
>> +}
>> +
>> +static int panthor_perf_sampler_remove_session(struct panthor_perf_sampler *sampler,
>> + struct panthor_perf_session *session)
>> +{
>> + int ret;
>> + struct list_head *snode;
>> +
>> + guard(mutex)(&sampler->sampler_lock);
>> +
>> + list_del_init(&session->sessions);
>> + session_put(session);
>> +
>> + panthor_perf_em_zero(sampler->em);
>> + list_for_each(snode, &sampler->session_list)
>> + {
>> + struct panthor_perf_session *session =
>> + container_of(snode, typeof(*session), sessions);
>> +
>> + panthor_perf_em_add(sampler->em, session->enabled_counters);
>> + }
>> +
>> + ret = panthor_perf_fw_stop_sampling(sampler->ptdev);
>> + if (ret)
>> + return ret;
>> +
>> + atomic_dec(&sampler->enabled_clients);
>> + pm_runtime_put_sync(sampler->ptdev->base.dev);
>> +
>> + panthor_perf_fw_write_sampler_config(sampler);
>> +
>> + if (atomic_read(&sampler->enabled_clients))
>> + return panthor_perf_fw_start_sampling(sampler->ptdev);
>> + return 0;
>> +}
>> +
>> /**
>> * panthor_perf_init - Initialize the performance counter subsystem.
>> * @ptdev: Panthor device
>> @@ -382,6 +1352,10 @@ int panthor_perf_init(struct panthor_device *ptdev)
>> .max = 1,
>> };
>>
>> + ret = panthor_perf_sampler_init(&perf->sampler, ptdev);
>> + if (ret)
>> + return ret;
>> +
>> drm_info(&ptdev->base, "Performance counter subsystem initialized");
>>
>> ptdev->perf = no_free_ptr(perf);
>> @@ -389,6 +1363,69 @@ int panthor_perf_init(struct panthor_device *ptdev)
>> return ret;
>> }
>>
>> +static int sampler_request(struct panthor_perf_sampler *sampler,
>> + struct panthor_perf_session *session, enum session_sample_type type)
>> +{
>> + guard(spinlock_irqsave)(&sampler->pend_lock);
>
> You're extending the lock to the entire function, but the only time you
> modify the pending_samples list is after getting the session, so why not
> just limit the critical section to that statement?
>
No good reason to do so, will reduce the scope.
>> +
>> + /*
>> + * If a previous sample has not been handled yet, the session cannot request another
>> + * sample. If this happens too often, the requested sample rate is too high.
>> + */
>> + if (READ_ONCE(session->pending_sample_request) != SAMPLE_TYPE_NONE)
>> + return -EBUSY;
>> +
>> + WRITE_ONCE(session->pending_sample_request, type);
>> + session_get(session);
>
> Why do we increase the refcnt for the session here?
> Is it because someone might try to tear the session down while a sample
> is being waited for?
That's the idea.
> I think, in that case the sample processing logic can determine no
> sessions might be left and then refuse to copy the FW sample.
I think we can run into a race around the UM calling a stop sample
and a teardown simultaneously. If we hold the refcount throughout the
sampling operation, then this never occurs.
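
A single-threaded sketch of that lifetime rule, purely as an illustration: the struct and helpers below are hypothetical stand-ins, and the kernel side would use a kref with proper locking rather than a plain counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, single-threaded model; the kernel would use a kref. */
struct session {
	unsigned int refs;
	bool sample_pending;
	bool *released; /* set once the last reference is dropped */
};

static struct session *session_new(bool *released)
{
	struct session *s = calloc(1, sizeof(*s));

	if (!s)
		return NULL;

	s->refs = 1; /* reference owned by the session table */
	s->released = released;
	return s;
}

static void session_get(struct session *s)
{
	assert(s->refs > 0);
	s->refs++;
}

static void session_put(struct session *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0) {
		*s->released = true;
		free(s);
	}
}

/* Requesting a sample pins the session until the sample is emitted. */
static void request_sample(struct session *s)
{
	session_get(s);
	s->sample_pending = true;
}

/* Called from the sample handler once the data has been delivered. */
static void emit_sample(struct session *s)
{
	s->sample_pending = false;
	session_put(s);
}
```

If teardown drops the table's reference while a sample is still pending, the session memory stays valid until emit_sample() drops the last reference, so the handler never touches freed memory.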
>> + list_add_tail(&session->pending, &sampler->pending_samples);
>> +
>> + if (!sampler->sample_requested) {
>> + reinit_completion(&sampler->sample_handled);
>> + sampler->sample_requested = true;
>> + panthor_perf_fw_request_sample(sampler);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/**
>> + * panthor_perf_sampler_request_initial - Request an initial sample.
>> + * @sampler: Panthor sampler
>> + * @session: Target session
>> + *
>> + * Perform a synchronous sample that gets immediately discarded. This sets a baseline at the point
>> + * of time a new session is started, to avoid having counters from before the session.
>> + */
>> +static int panthor_perf_sampler_request_initial(struct panthor_perf_sampler *sampler,
>> + struct panthor_perf_session *session)
>> +{
>> + int ret = sampler_request(sampler, session, SAMPLE_TYPE_INITIAL);
>> +
>> + if (ret)
>> + return ret;
>> +
>> + return wait_for_completion_timeout(&sampler->sample_handled,
>> + msecs_to_jiffies(1000));
>> +}
>> +
>> +/**
>> + * panthor_perf_sampler_request_sample - Request a counter sample for the userspace client.
>> + * @sampler: Panthor sampler
>> + * @session: Target session
>> + *
>> + * A session that has already requested a sample cannot request another one until the previous
>> + * sample has been delivered.
>> + *
>> + * Return:
>> + * * %0 - The sample has been requested successfully.
>> + * * %-EBUSY - The target session has already requested a sample and has not received it yet.
>> + */
>> +static int panthor_perf_sampler_request_sample(struct panthor_perf_sampler *sampler,
>> + struct panthor_perf_session *session)
>> +{
>> + return sampler_request(sampler, session, SAMPLE_TYPE_REGULAR);
>> +}
>> +
>> static int session_validate_set(u8 set)
>> {
>> if (set > DRM_PANTHOR_PERF_SET_TERTIARY)
>> @@ -417,8 +1454,8 @@ static int session_validate_set(u8 set)
>> * Return: non-negative session identifier on success or negative error code on failure.
>> */
>> int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf *perf,
>> - struct drm_panthor_perf_cmd_setup *setup_args,
>> - struct panthor_file *pfile)
>> + struct drm_panthor_perf_cmd_setup *setup_args,
>> + struct panthor_file *pfile)
>> {
>> struct panthor_perf_session *session;
>> struct drm_gem_object *ringbuffer;
>> @@ -510,6 +1547,10 @@ int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf
>> kref_init(&session->ref);
>> session->enabled_counters = em;
>>
>> + ret = panthor_perf_sampler_add(&perf->sampler, session, setup_args->block_set);
>> + if (ret)
>> + goto cleanup_xa_alloc;
>> +
>> session->sample_freq_ns = setup_args->sample_freq_ns;
>> session->user_sample_size = user_sample_size;
>> session->ring_buf = ringbuffer;
>> @@ -520,6 +1561,9 @@ int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf
>>
>> return session_id;
>>
>> +cleanup_xa_alloc:
>> + xa_store(&perf->sessions, session_id, NULL, GFP_KERNEL);
>> +
>> cleanup_em:
>> kfree(em);
>>
>> @@ -545,8 +1589,10 @@ int panthor_perf_session_setup(struct panthor_device *ptdev, struct panthor_perf
>> }
>>
>> static int session_stop(struct panthor_perf *perf, struct panthor_perf_session *session,
>> - u64 user_data)
>> + u64 user_data)
>
> You've changed the indentation of a few function headers in this
> commit. It's best to fix it in the original one right away.
>
Agreed, sorry for the noise!
>> {
>> + int ret;
>> +
>> if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
>> return 0;
>>
>> @@ -559,14 +1605,17 @@ static int session_stop(struct panthor_perf *perf, struct panthor_perf_session *
>>
>> session->user_data = user_data;
>>
>> + ret = panthor_perf_sampler_request_sample(&perf->sampler, session);
>> + if (ret)
>> + return ret;
>> +
>> clear_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state);
>>
>> - /* TODO Calls to the FW interface will go here in later patches. */
>> return 0;
>> }
>>
>> static int session_start(struct panthor_perf *perf, struct panthor_perf_session *session,
>> - u64 user_data)
>> + u64 user_data)
>> {
>> if (test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
>> return 0;
>> @@ -580,12 +1629,11 @@ static int session_start(struct panthor_perf *perf, struct panthor_perf_session
>> if (session->sample_freq_ns)
>> session->user_data = user_data;
>>
>> - /* TODO Calls to the FW interface will go here in later patches. */
>> - return 0;
>> + return panthor_perf_sampler_request_initial(&perf->sampler, session);
>> }
>>
>> static int session_sample(struct panthor_perf *perf, struct panthor_perf_session *session,
>> - u64 user_data)
>> + u64 user_data)
>> {
>> if (!test_bit(PANTHOR_PERF_SESSION_ACTIVE, session->state))
>> return 0;
>> @@ -608,14 +1656,16 @@ static int session_sample(struct panthor_perf *perf, struct panthor_perf_session
>> session->sample_start_ns = ktime_get_raw_ns();
>> session->user_data = user_data;
>>
>> - return 0;
>> + return panthor_perf_sampler_request_sample(&perf->sampler, session);
>> }
>>
>> static int session_destroy(struct panthor_perf *perf, struct panthor_perf_session *session)
>> {
>> + int ret = panthor_perf_sampler_remove_session(&perf->sampler, session);
>> +
>> session_put(session);
>>
>> - return 0;
>> + return ret;
>> }
>>
>> static int session_teardown(struct panthor_perf *perf, struct panthor_perf_session *session)
>> @@ -691,7 +1741,7 @@ int panthor_perf_session_teardown(struct panthor_file *pfile, struct panthor_per
>> * Return: 0 on success, negative error code on failure.
>> */
>> int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *perf,
>> - u32 sid, u64 user_data)
>> + u32 sid, u64 user_data)
>> {
>> struct panthor_perf_session *session = session_find(pfile, perf, sid);
>> int err;
>> @@ -724,7 +1774,7 @@ int panthor_perf_session_start(struct panthor_file *pfile, struct panthor_perf *
>> * Return: 0 on success, negative error code on failure.
>> */
>> int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *perf,
>> - u32 sid, u64 user_data)
>> + u32 sid, u64 user_data)
>> {
>> struct panthor_perf_session *session = session_find(pfile, perf, sid);
>> int err;
>> @@ -755,7 +1805,7 @@ int panthor_perf_session_stop(struct panthor_file *pfile, struct panthor_perf *p
>> * Return: 0 on success, negative error code on failure.
>> */
>> int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf *perf,
>> - u32 sid, u64 user_data)
>> + u32 sid, u64 user_data)
>> {
>> struct panthor_perf_session *session = session_find(pfile, perf, sid);
>> int err;
>> @@ -822,6 +1872,8 @@ void panthor_perf_unplug(struct panthor_device *ptdev)
>>
>> xa_destroy(&perf->sessions);
>>
>> + panthor_perf_sampler_term(&perf->sampler);
>> +
>> kfree(ptdev->perf);
>>
>> ptdev->perf = NULL;
>> diff --git a/drivers/gpu/drm/panthor/panthor_perf.h b/drivers/gpu/drm/panthor/panthor_perf.h
>> index 89d61cd1f017..c482198b6fbd 100644
>> --- a/drivers/gpu/drm/panthor/panthor_perf.h
>> +++ b/drivers/gpu/drm/panthor/panthor_perf.h
>> @@ -28,5 +28,7 @@ int panthor_perf_session_sample(struct panthor_file *pfile, struct panthor_perf
>> u32 sid, u64 user_data);
>> void panthor_perf_session_destroy(struct panthor_file *pfile, struct panthor_perf *perf);
>>
>> +void panthor_perf_report_irq(struct panthor_device *ptdev, u32 status);
>> +
>> #endif /* __PANTHOR_PERF_H__ */
>>
>> --
>> 2.33.0.dirty
>
>
> Adrian Larumbe
Thread overview: 29+ messages
2025-05-16 15:49 [PATCH v4 0/7] Performance counter implementation with single manual client support Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 1/7] drm/panthor: Add performance counter uAPI Lukas Zapolskas
2025-07-18 2:43 ` Adrián Larumbe
2025-07-21 8:46 ` Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 2/7] drm/panthor: Add DEV_QUERY.PERF_INFO handling for Gx10 Lukas Zapolskas
2025-07-18 2:52 ` Adrián Larumbe
2025-07-21 9:04 ` Lukas Zapolskas
2025-07-18 15:11 ` Adrián Larumbe
2025-07-21 9:06 ` Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 3/7] drm/panthor: Add panthor perf initialization and termination Lukas Zapolskas
2025-07-18 3:10 ` Adrián Larumbe
2025-07-21 9:10 ` Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 4/7] drm/panthor: Introduce sampling sessions to handle userspace clients Lukas Zapolskas
2025-05-17 7:53 ` kernel test robot
2025-06-20 15:28 ` Steven Price
2025-07-21 9:58 ` Lukas Zapolskas
2025-07-18 3:34 ` Adrián Larumbe
2025-07-21 9:53 ` Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 5/7] drm/panthor: Implement the counter sampler and sample handling Lukas Zapolskas
2025-05-17 8:56 ` kernel test robot
2025-07-18 14:49 ` Adrián Larumbe
2025-07-25 10:29 ` Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 6/7] drm/panthor: Add suspend, resume and reset handling Lukas Zapolskas
2025-07-18 15:01 ` Adrián Larumbe
2025-07-25 9:26 ` Lukas Zapolskas
2025-05-16 15:49 ` [PATCH v4 7/7] drm/panthor: Expose the panthor perf ioctls Lukas Zapolskas
2025-07-18 15:05 ` Adrián Larumbe
2025-07-18 15:19 ` Adrián Larumbe
2025-07-25 9:09 ` Lukas Zapolskas