From: Boris Brezillon <boris.brezillon@collabora.com>
To: "Boris Brezillon" <boris.brezillon@collabora.com>,
"Steven Price" <steven.price@arm.com>,
"Liviu Dudau" <liviu.dudau@arm.com>,
"Adrián Larumbe" <adrian.larumbe@collabora.com>
Cc: dri-devel@lists.freedesktop.org, kernel@collabora.com
Subject: Re: [PATCH v4] drm/panthor: Report innocent group kill
Date: Tue, 17 Dec 2024 11:11:07 +0100
Message-ID: <20241217111107.379e95ac@collabora.com>
In-Reply-To: <20241211080500.2349505-1-boris.brezillon@collabora.com>
On Wed, 11 Dec 2024 09:05:00 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:
> Groups can be killed during a reset even though they did nothing wrong.
> That usually happens when the FW is put in a bad state by other groups,
> resulting in group suspension failures when the reset happens.
>
> If we end up in that situation, flag the group innocent and report
> innocence through a new DRM_PANTHOR_GROUP_STATE flag.
>
> Bump the minor driver version to reflect the uAPI change.
>
> Changes in v4:
> - Add an entry to the driver version changelog
> - Add R-bs
>
> Changes in v3:
> - Actually report innocence to userspace
>
> Changes in v2:
> - New patch
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com>
> Reviewed-by: Steven Price <steven.price@arm.com>
Queued to drm-misc-next.
> ---
> drivers/gpu/drm/panthor/panthor_drv.c | 3 ++-
> drivers/gpu/drm/panthor/panthor_sched.c | 18 ++++++++++++++++++
> include/uapi/drm/panthor_drm.h | 9 +++++++++
> 3 files changed, 29 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/panthor/panthor_drv.c b/drivers/gpu/drm/panthor/panthor_drv.c
> index ac7e53f6e3f0..1498c97b4b85 100644
> --- a/drivers/gpu/drm/panthor/panthor_drv.c
> +++ b/drivers/gpu/drm/panthor/panthor_drv.c
> @@ -1493,6 +1493,7 @@ static void panthor_debugfs_init(struct drm_minor *minor)
> * - 1.1 - adds DEV_QUERY_TIMESTAMP_INFO query
> * - 1.2 - adds DEV_QUERY_GROUP_PRIORITIES_INFO query
> * - adds PANTHOR_GROUP_PRIORITY_REALTIME priority
> + * - 1.3 - adds DRM_PANTHOR_GROUP_STATE_INNOCENT flag
> */
> static const struct drm_driver panthor_drm_driver = {
> .driver_features = DRIVER_RENDER | DRIVER_GEM | DRIVER_SYNCOBJ |
> @@ -1507,7 +1508,7 @@ static const struct drm_driver panthor_drm_driver = {
> .desc = "Panthor DRM driver",
> .date = "20230801",
> .major = 1,
> - .minor = 2,
> + .minor = 3,
>
> .gem_create_object = panthor_gem_create_object,
> .gem_prime_import_sg_table = drm_gem_shmem_prime_import_sg_table,
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index ef4bec7ff9c7..97ed5fe5a191 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -610,6 +610,16 @@ struct panthor_group {
> */
> bool timedout;
>
> + /**
> + * @innocent: True when the group becomes unusable because the group suspension
> + * failed during a reset.
> + *
> + * Sometimes the FW was put in a bad state by other groups, causing the group
> + * suspension happening in the reset path to fail. In that case, we consider the
> + * group innocent.
> + */
> + bool innocent;
> +
> /**
> * @syncobjs: Pool of per-queue synchronization objects.
> *
> @@ -2690,6 +2700,12 @@ void panthor_sched_suspend(struct panthor_device *ptdev)
> u32 csg_id = ffs(slot_mask) - 1;
> struct panthor_csg_slot *csg_slot = &sched->csg_slots[csg_id];
>
> + /* If the group was still usable before that point, we consider
> + * it innocent.
> + */
> + if (group_can_run(csg_slot->group))
> + csg_slot->group->innocent = true;
> +
> /* We consider group suspension failures as fatal and flag the
> * group as unusable by setting timedout=true.
> */
> @@ -3570,6 +3586,8 @@ int panthor_group_get_state(struct panthor_file *pfile,
> get_state->state |= DRM_PANTHOR_GROUP_STATE_FATAL_FAULT;
> get_state->fatal_queues = group->fatal_queues;
> }
> + if (group->innocent)
> + get_state->state |= DRM_PANTHOR_GROUP_STATE_INNOCENT;
> mutex_unlock(&sched->lock);
>
> group_put(group);
> diff --git a/include/uapi/drm/panthor_drm.h b/include/uapi/drm/panthor_drm.h
> index 87c9cb555dd1..b99763cbae48 100644
> --- a/include/uapi/drm/panthor_drm.h
> +++ b/include/uapi/drm/panthor_drm.h
> @@ -923,6 +923,15 @@ enum drm_panthor_group_state_flags {
> * When a group ends up with this flag set, no jobs can be submitted to its queues.
> */
> DRM_PANTHOR_GROUP_STATE_FATAL_FAULT = 1 << 1,
> +
> + /**
> + * @DRM_PANTHOR_GROUP_STATE_INNOCENT: Group was killed during a reset caused by other
> + * groups.
> + *
> + * This flag can only be set if DRM_PANTHOR_GROUP_STATE_TIMEDOUT is set and
> + * DRM_PANTHOR_GROUP_STATE_FATAL_FAULT is not.
> + */
> + DRM_PANTHOR_GROUP_STATE_INNOCENT = 1 << 2,
> };
>
> /**
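
For readers wondering how userspace might consume the new flag, here is a
minimal sketch, not taken from the patch or from any existing userspace
driver. It assumes the state word has already been fetched through the group
get-state ioctl and the driver version is already known. The flag values are
mirrored locally so the snippet builds without kernel headers: the
FATAL_FAULT and INNOCENT values come from the diff above, while
TIMEDOUT = 1 << 0 is an assumption about the existing header. The names
group_verdict, driver_reports_innocence() and classify_group() are
hypothetical helpers introduced only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local mirrors of the uAPI flags. FATAL_FAULT and INNOCENT match the
 * diff above; TIMEDOUT = 1 << 0 is assumed, it is not shown in the patch.
 */
#define GROUP_STATE_TIMEDOUT    (1u << 0)
#define GROUP_STATE_FATAL_FAULT (1u << 1)
#define GROUP_STATE_INNOCENT    (1u << 2)

/* Hypothetical classification a userspace driver could derive. */
enum group_verdict {
	GROUP_USABLE,        /* neither TIMEDOUT nor FATAL_FAULT is set */
	GROUP_LOST_INNOCENT, /* killed during a reset caused by other groups */
	GROUP_LOST_GUILTY,   /* killed, and innocence was not reported */
};

/*
 * The flag is only reported from driver version 1.3 onwards. Treating
 * later major versions as supporting it is an assumption of this sketch.
 */
static bool driver_reports_innocence(int major, int minor)
{
	return major > 1 || (major == 1 && minor >= 3);
}

/*
 * Per the uAPI comment, INNOCENT is only meaningful when TIMEDOUT is set
 * and FATAL_FAULT is not, so both conditions are checked here.
 */
static enum group_verdict classify_group(uint32_t state, bool has_innocent)
{
	if (!(state & (GROUP_STATE_TIMEDOUT | GROUP_STATE_FATAL_FAULT)))
		return GROUP_USABLE;

	if (has_innocent &&
	    (state & GROUP_STATE_TIMEDOUT) &&
	    (state & GROUP_STATE_INNOCENT) &&
	    !(state & GROUP_STATE_FATAL_FAULT))
		return GROUP_LOST_INNOCENT;

	return GROUP_LOST_GUILTY;
}
```

On a kernel older than 1.3 the INNOCENT bit is never set, so the sketch
falls back to treating every killed group as guilty, which matches what
userspace could observe before this patch.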