From: "Maíra Canal" <mcanal@igalia.com>
To: Melissa Wen <mwen@igalia.com>,
Iago Toral Quiroga <itoral@igalia.com>,
Jose Maria Casanova Crespo <jmcasanova@igalia.com>,
David Airlie <airlied@gmail.com>, Simona Vetter <simona@ffwll.ch>
Cc: kernel-dev@igalia.com, dri-devel@lists.freedesktop.org
Subject: Re: [PATCH v2 4/6] drm/v3d: Address race-condition between per-fd GPU stats and fd release
Date: Mon, 25 Aug 2025 17:53:07 -0300 [thread overview]
Message-ID: <8a5ecf46-897e-48ea-92c2-927980ddff0b@igalia.com> (raw)
In-Reply-To: <8cf9cd49-b4e5-4cd9-b4db-9bb81fdf6b0b@igalia.com>
Hi Melissa,
On 25/08/25 17:03, Melissa Wen wrote:
>
>
> On 15/08/2025 16:58, Maíra Canal wrote:
>> When the file descriptor is closed while a job is still running,
>> there's a race condition between the job completion callback and the
>> file descriptor cleanup. This can lead to accessing freed memory when
>> updating per-fd GPU stats, as in the following example:
>>
>> [56120.512903] Unable to handle kernel paging request at virtual address 0000330a92b9688a
>> [56120.520881] Mem abort info:
>> [56120.523687]   ESR = 0x0000000096000005
>> [56120.527454]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [56120.532785]   SET = 0, FnV = 0
>> [56120.535847]   EA = 0, S1PTW = 0
>> [56120.538995]   FSC = 0x05: level 1 translation fault
>> [56120.543891] Data abort info:
>> [56120.546778]   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
>> [56120.552289]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>> [56120.557362]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>> [56120.562690] user pgtable: 16k pages, 47-bit VAs, pgdp=0000000023f54000
>> [56120.569239] [0000330a92b9688a] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
>> [56120.577975] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
>> CPU: 0 UID: 1000 PID: 1497409 Comm: mpv Not tainted 6.12.37-ncvm5+ #1
>> Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT)
>> pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : v3d_job_update_stats+0x64/0x168 [v3d]
>> lr : v3d_job_update_stats+0x40/0x168 [v3d]
>> sp : ffffc00080003e60
>> x29: ffffc00080003e60 x28: ffff800002860000 x27: 0000000000000000
>> x26: 0000000000000000 x25: ffff800002860000 x24: ffff800002630800
>> x23: ffff800060786000 x22: 0000330a933c31fb x21: 0000000000000001
>> x20: 0000330a92b96302 x19: ffff800060786b10 x18: 0000000000000000
>> x17: ffffaf90506a0000 x16: ffffd06fce57c360 x15: 0000000000000000
>> x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
>> x11: 0000000000000000 x10: 0000000000000000 x9 : ffffd06f5d0fec40
>> x8 : 0000000000000000 x7 : 0000000000000000 x6 : 000002978dbd535a
>> x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : 0000300001fddf88
>> x2 : 0000000000000020 x1 : 0000000000010001 x0 : 0000330a92b96872
>> Call trace:
>>  v3d_job_update_stats+0x64/0x168 [v3d]
>>  v3d_irq+0x118/0x2e0 [v3d]
>>  __handle_irq_event_percpu+0x60/0x220
>>
>> Fix this issue by protecting all accesses to `job->file_priv` with
>> the queue's lock. With that, we can clear `job->file_priv` before the
>> V3D per-fd structure is freed and ensure that `job->file_priv` exists
>> during the per-fd GPU stats updates.
>>
>> Fixes: e1bc3a13bd77 ("drm/v3d: Avoid NULL pointer dereference in `v3d_job_update_stats()`")
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
>> ---
>> drivers/gpu/drm/v3d/v3d_drv.c | 14 +++++++++++++-
>> drivers/gpu/drm/v3d/v3d_drv.h | 2 +-
>> drivers/gpu/drm/v3d/v3d_sched.c | 14 +++++++-------
>> 3 files changed, 21 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_drv.c b/drivers/gpu/drm/v3d/v3d_drv.c
>> index 2def155ce496ec5f159f6bda9929aeaae141d1a6..c5a3bbbc74c5c6c9a34a6e6111b4e4e21bee0b34 100644
>> --- a/drivers/gpu/drm/v3d/v3d_drv.c
>> +++ b/drivers/gpu/drm/v3d/v3d_drv.c
>> @@ -157,12 +157,24 @@ v3d_open(struct drm_device *dev, struct drm_file *file)
>>  static void
>>  v3d_postclose(struct drm_device *dev, struct drm_file *file)
>>  {
>> +	struct v3d_dev *v3d = to_v3d_dev(dev);
>>  	struct v3d_file_priv *v3d_priv = file->driver_priv;
>> +	unsigned long irqflags;
>>  	enum v3d_queue q;
>>
>> -	for (q = 0; q < V3D_MAX_QUEUES; q++)
>> +	for (q = 0; q < V3D_MAX_QUEUES; q++) {
>> +		struct v3d_queue_state *queue = &v3d->queue[q];
>> +		struct v3d_job *job = queue->active_job;
>> +
>>  		drm_sched_entity_destroy(&v3d_priv->sched_entity[q]);
>>
>> +		if (job && job->base.entity == &v3d_priv->sched_entity[q]) {
>> +			spin_lock_irqsave(&queue->queue_lock, irqflags);
>> +			job->file_priv = NULL;
>> +			spin_unlock_irqrestore(&queue->queue_lock, irqflags);
>> +		}
>> +	}
>> +
>>  	v3d_perfmon_close_file(v3d_priv);
>>  	kfree(v3d_priv);
>>  }
>> diff --git a/drivers/gpu/drm/v3d/v3d_drv.h b/drivers/gpu/drm/v3d/v3d_drv.h
>> index cfc2f9c123aa99f6f1875b297eaf6c226b03d4ec..0317f3d7452a3f01b91bfdc691b5a98e54b3a4ec 100644
>> --- a/drivers/gpu/drm/v3d/v3d_drv.h
>> +++ b/drivers/gpu/drm/v3d/v3d_drv.h
>> @@ -608,7 +608,7 @@ void v3d_timestamp_query_info_free(struct v3d_timestamp_query_info *query_info,
>>  				   unsigned int count);
>>  void v3d_performance_query_info_free(struct v3d_performance_query_info *query_info,
>>  				     unsigned int count);
>> -void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue);
>> +void v3d_job_update_stats(struct v3d_job *job, enum v3d_queue q);
>>  int v3d_sched_init(struct v3d_dev *v3d);
>>  void v3d_sched_fini(struct v3d_dev *v3d);
>> diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c
>> index e348816b691ef05909828accbe15399816e69369..60e251367f42170b30de968682deb6370604db44 100644
>> --- a/drivers/gpu/drm/v3d/v3d_sched.c
>> +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>> @@ -194,11 +194,11 @@ v3d_stats_update(struct v3d_stats *stats, u64 now)
>>  }
>>
>>  void
>> -v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
>> +v3d_job_update_stats(struct v3d_job *job, enum v3d_queue q)
>>  {
>>  	struct v3d_dev *v3d = job->v3d;
>> -	struct v3d_file_priv *file = job->file_priv;
>> -	struct v3d_stats *global_stats = &v3d->queue[queue].stats;
>> +	struct v3d_queue_state *queue = &v3d->queue[q];
>> +	struct v3d_stats *global_stats = &queue->stats;
>>  	u64 now = local_clock();
>>  	unsigned long flags;
>>
>> @@ -209,10 +209,10 @@ v3d_job_update_stats(struct v3d_job *job, enum v3d_queue queue)
>>  	preempt_disable();
>>
>>  	/* Don't update the local stats if the file context has already closed */
>> -	if (file)
>> -		v3d_stats_update(&file->stats[queue], now);
>> -	else
>> -		drm_dbg(&v3d->drm, "The file descriptor was closed before job completion\n");
>> +	spin_lock(&queue->queue_lock);
>> +	if (job->file_priv)
>> +		v3d_stats_update(&job->file_priv->stats[q], now);
> Why not keep the drm_dbg() if !job->file_priv?
After some thought, I concluded that this debug message is no longer
useful. After this patch, it no longer matters whether `job->file_priv`
is NULL; the job will still complete gracefully.
Best Regards,
- Maíra
Thread overview:
2025-08-15 19:58 [PATCH v2 0/6] drm/v3d: General job locking improvements + race-condition fixes Maíra Canal
2025-08-15 19:58 ` [PATCH v2 1/6] drm/v3d: Store a pointer to `struct v3d_file_priv` inside each job Maíra Canal
2025-08-25 19:56 ` Melissa Wen
2025-08-25 20:30 ` Maíra Canal
2025-08-25 20:33 ` Melissa Wen
2025-08-15 19:58 ` [PATCH v2 2/6] drm/v3d: Store the active job inside the queue's state Maíra Canal
2025-08-25 20:23 ` Melissa Wen
2025-08-15 19:58 ` [PATCH v2 3/6] drm/v3d: Replace a global spinlock with a per-queue spinlock Maíra Canal
2025-08-25 19:57 ` Melissa Wen
2025-08-25 20:35 ` Maíra Canal
2025-08-15 19:58 ` [PATCH v2 4/6] drm/v3d: Address race-condition between per-fd GPU stats and fd release Maíra Canal
2025-08-25 20:03 ` Melissa Wen
2025-08-25 20:53 ` Maíra Canal [this message]
2025-08-25 23:09 ` Melissa Wen
2025-08-26 12:03 ` Maíra Canal
2025-08-15 19:58 ` [PATCH v2 5/6] drm/v3d: Synchronous operations can't timeout Maíra Canal
2025-08-25 20:31 ` Melissa Wen
2025-08-15 19:58 ` [PATCH v2 6/6] drm/v3d: Protect per-fd reset counter against fd release Maíra Canal
2025-08-25 20:15 ` Melissa Wen
2025-08-25 20:50 ` Maíra Canal