* [PATCH] drm/panthor: Fix UAF race between device unplug and FW event processing
@ 2025-10-08 10:53 Ketil Johnsen
  2025-10-09  9:45 ` Boris Brezillon
  2025-10-09 22:18 ` kernel test robot
  0 siblings, 2 replies; 4+ messages in thread
From: Ketil Johnsen @ 2025-10-08 10:53 UTC (permalink / raw)
  To: Boris Brezillon, Steven Price, Liviu Dudau, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter
  Cc: Ketil Johnsen, dri-devel, linux-kernel

The function panthor_fw_unplug() will free the FW memory sections.
The problem is that there could still be pending FW events which are not
yet handled at this point. process_fw_events_work() can in this case try
to access said freed memory.

The fix is to stop FW event processing after IRQs are disabled but before
the FW memory is freed.

Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com>
---
 drivers/gpu/drm/panthor/panthor_fw.c    |  3 +++
 drivers/gpu/drm/panthor/panthor_sched.c | 12 ++++++++++++
 drivers/gpu/drm/panthor/panthor_sched.h |  1 +
 3 files changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
index 9bf06e55eaee..4f393c5cd26f 100644
--- a/drivers/gpu/drm/panthor/panthor_fw.c
+++ b/drivers/gpu/drm/panthor/panthor_fw.c
@@ -1172,6 +1172,9 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
 		panthor_fw_stop(ptdev);
 	}
 
+	/* Any pending FW event processing must stop before we free FW memory */
+	panthor_sched_stop_fw_events(ptdev);
+
 	list_for_each_entry(section, &ptdev->fw->sections, node)
 		panthor_kernel_bo_destroy(section->mem);
 
diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
index 0cc9055f4ee5..d150c8d99432 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.c
+++ b/drivers/gpu/drm/panthor/panthor_sched.c
@@ -1794,6 +1794,18 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
 	sched_queue_work(ptdev->scheduler, fw_events);
 }
 
+/**
+ * panthor_sched_stop_fw_events() - Stop processing FW events.
+ */
+void panthor_sched_stop_fw_events(struct panthor_device *ptdev)
+{
+	if (!ptdev->scheduler)
+		return;
+
+	atomic_set(&ptdev->scheduler->fw_events, 0);
+	cancel_work_sync(&ptdev->scheduler->fw_events_work);
+}
+
 static const char *fence_get_driver_name(struct dma_fence *fence)
 {
 	return "panthor";
diff --git a/drivers/gpu/drm/panthor/panthor_sched.h b/drivers/gpu/drm/panthor/panthor_sched.h
index f4a475aa34c0..4393599ed330 100644
--- a/drivers/gpu/drm/panthor/panthor_sched.h
+++ b/drivers/gpu/drm/panthor/panthor_sched.h
@@ -51,6 +51,7 @@ void panthor_sched_resume(struct panthor_device *ptdev);
 
 void panthor_sched_report_mmu_fault(struct panthor_device *ptdev);
 void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events);
+void panthor_sched_stop_fw_events(struct panthor_device *ptdev);
 
 void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile);
 
-- 
2.43.0
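
The ordering argued for in the commit message above (cancel pending event
work before the memory it uses is freed) can be illustrated with a small
userspace model. This is only a sketch: struct model_dev, model_fw_unplug()
and the other model_* names are hypothetical stand-ins, not panthor code,
and clearing a flag merely stands in for cancel_work_sync().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; not the panthor types. */
struct model_dev {
	int *fw_section;   /* stands in for the FW memory sections */
	bool work_pending; /* stands in for a queued fw_events_work */
	int uaf_hits;      /* times the worker ran after the free */
};

/* Worker body: the real worker would dereference FW memory here. */
static void model_process_fw_events(struct model_dev *d)
{
	if (!d->work_pending)
		return;

	d->work_pending = false;
	if (!d->fw_section)
		d->uaf_hits++; /* the real code would touch freed memory */
	else
		*d->fw_section += 1;
}

/* Teardown: cancel_first selects the ordering introduced by the patch. */
static void model_fw_unplug(struct model_dev *d, bool cancel_first)
{
	if (cancel_first)
		d->work_pending = false; /* models cancel_work_sync() */

	free(d->fw_section);
	d->fw_section = NULL;
}

/* Unplug with one event still queued, then let the worker run late. */
static int model_run(bool cancel_first)
{
	struct model_dev d = {
		.fw_section = calloc(1, sizeof(int)),
		.work_pending = true,
	};

	assert(d.fw_section != NULL);
	model_fw_unplug(&d, cancel_first);
	model_process_fw_events(&d);
	return d.uaf_hits;
}
```

With cancel_first set, the late worker run finds nothing pending and never
looks at the freed section; without it, the model records one access after
the free.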



* Re: [PATCH] drm/panthor: Fix UAF race between device unplug and FW event processing
  2025-10-08 10:53 [PATCH] drm/panthor: Fix UAF race between device unplug and FW event processing Ketil Johnsen
@ 2025-10-09  9:45 ` Boris Brezillon
  2025-10-17  9:46   ` Ketil Johnsen
  2025-10-09 22:18 ` kernel test robot
  1 sibling, 1 reply; 4+ messages in thread
From: Boris Brezillon @ 2025-10-09  9:45 UTC (permalink / raw)
  To: Ketil Johnsen
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On Wed,  8 Oct 2025 12:53:20 +0200
Ketil Johnsen <ketil.johnsen@arm.com> wrote:

> The function panthor_fw_unplug() will free the FW memory sections.
> The problem is that there could still be pending FW events which are not
> yet handled at this point. process_fw_events_work() can in this case try
> to access said freed memory.
> 
> The fix is to stop FW event processing after IRQs are disabled but before
> the FW memory is freed.
> 
> Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com>
> ---
>  drivers/gpu/drm/panthor/panthor_fw.c    |  3 +++
>  drivers/gpu/drm/panthor/panthor_sched.c | 12 ++++++++++++
>  drivers/gpu/drm/panthor/panthor_sched.h |  1 +
>  3 files changed, 16 insertions(+)
> 
> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
> index 9bf06e55eaee..4f393c5cd26f 100644
> --- a/drivers/gpu/drm/panthor/panthor_fw.c
> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
> @@ -1172,6 +1172,9 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>  		panthor_fw_stop(ptdev);
>  	}
>  
> +	/* Any pending FW event processing must stop before we free FW memory */
> +	panthor_sched_stop_fw_events(ptdev);
> +
>  	list_for_each_entry(section, &ptdev->fw->sections, node)
>  		panthor_kernel_bo_destroy(section->mem);
>  
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
> index 0cc9055f4ee5..d150c8d99432 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
> @@ -1794,6 +1794,18 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
>  	sched_queue_work(ptdev->scheduler, fw_events);
>  }
>  
> +/**
> + * panthor_sched_stop_fw_events() - Stop processing FW events.
> + */
> +void panthor_sched_stop_fw_events(struct panthor_device *ptdev)
> +{
> +	if (!ptdev->scheduler)
> +		return;
> +
> +	atomic_set(&ptdev->scheduler->fw_events, 0);
> +	cancel_work_sync(&ptdev->scheduler->fw_events_work);
> +}

Hm, I'd rather have this called from sched_unplug() and then have an
extra check in panthor_sched_report_fw_events() to bail out if the
scheduler component is no longer functional. This way this helper stays
private to panthor_sched.c.
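
A rough userspace model of that shape, for illustration only: the report
path bails out once teardown has started, and the stop logic lives inside
the unplug path instead of being exported. All names here (struct
model_sched, the unplugged flag, the model_* functions) are assumptions made
for this sketch and are not taken from the driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the scheduler state; not the panthor struct. */
struct model_sched {
	atomic_uint fw_events; /* pending event bits */
	atomic_bool unplugged; /* set once teardown has started */
	unsigned int queued;   /* how often the work item was queued */
};

static void model_sched_init(struct model_sched *s)
{
	atomic_init(&s->fw_events, 0);
	atomic_init(&s->unplugged, false);
	s->queued = 0;
}

/* Models the report path with the extra bail-out check. */
static bool model_report_fw_events(struct model_sched *s, uint32_t events)
{
	if (atomic_load(&s->unplugged))
		return false; /* scheduler no longer functional: drop events */

	atomic_fetch_or(&s->fw_events, events);
	s->queued++; /* models queueing the work item */
	return true;
}

/* Models the unplug path; the stop logic stays local to this file. */
static void model_sched_unplug(struct model_sched *s)
{
	assert(s != NULL);
	atomic_store(&s->unplugged, true);
	atomic_store(&s->fw_events, 0);
	/* The kernel version would call cancel_work_sync() at this point. */
}
```

In this model, events reported after model_sched_unplug() are dropped and
never queued. The real driver would additionally have to make sure that no
report can slip in between the check and the cancel, which this
single-threaded sketch does not show.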

> +
>  static const char *fence_get_driver_name(struct dma_fence *fence)
>  {
>  	return "panthor";
> diff --git a/drivers/gpu/drm/panthor/panthor_sched.h b/drivers/gpu/drm/panthor/panthor_sched.h
> index f4a475aa34c0..4393599ed330 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.h
> +++ b/drivers/gpu/drm/panthor/panthor_sched.h
> @@ -51,6 +51,7 @@ void panthor_sched_resume(struct panthor_device *ptdev);
>  
>  void panthor_sched_report_mmu_fault(struct panthor_device *ptdev);
>  void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events);
> +void panthor_sched_stop_fw_events(struct panthor_device *ptdev);
>  
>  void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile);
>  



* Re: [PATCH] drm/panthor: Fix UAF race between device unplug and FW event processing
  2025-10-08 10:53 [PATCH] drm/panthor: Fix UAF race between device unplug and FW event processing Ketil Johnsen
  2025-10-09  9:45 ` Boris Brezillon
@ 2025-10-09 22:18 ` kernel test robot
  1 sibling, 0 replies; 4+ messages in thread
From: kernel test robot @ 2025-10-09 22:18 UTC (permalink / raw)
  To: Ketil Johnsen, Boris Brezillon, Steven Price, Liviu Dudau,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, David Airlie,
	Simona Vetter
  Cc: oe-kbuild-all, Ketil Johnsen, dri-devel, linux-kernel

Hi Ketil,

kernel test robot noticed the following build warnings:

[auto build test WARNING on drm-misc/drm-misc-next]
[also build test WARNING on linus/master v6.17 next-20251009]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ketil-Johnsen/drm-panthor-Fix-UAF-race-between-device-unplug-and-FW-event-processing/20251009-184851
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20251008105322.4077661-1-ketil.johnsen%40arm.com
patch subject: [PATCH] drm/panthor: Fix UAF race between device unplug and FW event processing
config: i386-buildonly-randconfig-001-20251010 (https://download.01.org/0day-ci/archive/20251010/202510100644.YPzFXMEb-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251010/202510100644.YPzFXMEb-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202510100644.YPzFXMEb-lkp@intel.com/

All warnings (new ones prefixed by >>):

   Warning: drivers/gpu/drm/panthor/panthor_sched.c:1788 function parameter 'ptdev' not described in 'panthor_sched_report_fw_events'
   Warning: drivers/gpu/drm/panthor/panthor_sched.c:1788 function parameter 'events' not described in 'panthor_sched_report_fw_events'
>> Warning: drivers/gpu/drm/panthor/panthor_sched.c:1800 function parameter 'ptdev' not described in 'panthor_sched_stop_fw_events'
   Warning: drivers/gpu/drm/panthor/panthor_sched.c:2693 function parameter 'ptdev' not described in 'panthor_sched_report_mmu_fault'

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


* Re: [PATCH] drm/panthor: Fix UAF race between device unplug and FW event processing
  2025-10-09  9:45 ` Boris Brezillon
@ 2025-10-17  9:46   ` Ketil Johnsen
  0 siblings, 0 replies; 4+ messages in thread
From: Ketil Johnsen @ 2025-10-17  9:46 UTC (permalink / raw)
  To: Boris Brezillon
  Cc: Steven Price, Liviu Dudau, Maarten Lankhorst, Maxime Ripard,
	Thomas Zimmermann, David Airlie, Simona Vetter, dri-devel,
	linux-kernel

On 09/10/2025 11:45, Boris Brezillon wrote:
> On Wed,  8 Oct 2025 12:53:20 +0200
> Ketil Johnsen <ketil.johnsen@arm.com> wrote:
> 
>> The function panthor_fw_unplug() will free the FW memory sections.
>> The problem is that there could still be pending FW events which are not
>> yet handled at this point. process_fw_events_work() can in this case try
>> to access said freed memory.
>>
>> The fix is to stop FW event processing after IRQs are disabled but before
>> the FW memory is freed.
>>
>> Signed-off-by: Ketil Johnsen <ketil.johnsen@arm.com>
>> ---
>>   drivers/gpu/drm/panthor/panthor_fw.c    |  3 +++
>>   drivers/gpu/drm/panthor/panthor_sched.c | 12 ++++++++++++
>>   drivers/gpu/drm/panthor/panthor_sched.h |  1 +
>>   3 files changed, 16 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/panthor/panthor_fw.c b/drivers/gpu/drm/panthor/panthor_fw.c
>> index 9bf06e55eaee..4f393c5cd26f 100644
>> --- a/drivers/gpu/drm/panthor/panthor_fw.c
>> +++ b/drivers/gpu/drm/panthor/panthor_fw.c
>> @@ -1172,6 +1172,9 @@ void panthor_fw_unplug(struct panthor_device *ptdev)
>>   		panthor_fw_stop(ptdev);
>>   	}
>>   
>> +	/* Any pending FW event processing must stop before we free FW memory */
>> +	panthor_sched_stop_fw_events(ptdev);
>> +
>>   	list_for_each_entry(section, &ptdev->fw->sections, node)
>>   		panthor_kernel_bo_destroy(section->mem);
>>   
>> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c b/drivers/gpu/drm/panthor/panthor_sched.c
>> index 0cc9055f4ee5..d150c8d99432 100644
>> --- a/drivers/gpu/drm/panthor/panthor_sched.c
>> +++ b/drivers/gpu/drm/panthor/panthor_sched.c
>> @@ -1794,6 +1794,18 @@ void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events)
>>   	sched_queue_work(ptdev->scheduler, fw_events);
>>   }
>>   
>> +/**
>> + * panthor_sched_stop_fw_events() - Stop processing FW events.
>> + */
>> +void panthor_sched_stop_fw_events(struct panthor_device *ptdev)
>> +{
>> +	if (!ptdev->scheduler)
>> +		return;
>> +
>> +	atomic_set(&ptdev->scheduler->fw_events, 0);
>> +	cancel_work_sync(&ptdev->scheduler->fw_events_work);
>> +}
> 
> Hm, I'd rather have this called from sched_unplug() and then have an
> extra check in panthor_sched_report_fw_events() to bail out if the
> scheduler component is no longer functional. This way this helper stays
> private to panthor_sched.c.

A heads up on this from me:

While trying your suggested approach, I found a new race in the driver,
somewhat similar to this one. Simply put, panthor_device_unplug() calls
drm_dev_unplug() at a time when panthor_device_suspend() could still be
running. This causes the suspend routine to skip a lot of work, for
instance synchronizing with running IRQ handlers, and boom, the IRQ
handlers access a powered-off GPU.

I will (most likely) push a v2 with both races fixed, as they both
relate to interrupt processing while the GPU is off.

> 
>> +
>>   static const char *fence_get_driver_name(struct dma_fence *fence)
>>   {
>>   	return "panthor";
>> diff --git a/drivers/gpu/drm/panthor/panthor_sched.h b/drivers/gpu/drm/panthor/panthor_sched.h
>> index f4a475aa34c0..4393599ed330 100644
>> --- a/drivers/gpu/drm/panthor/panthor_sched.h
>> +++ b/drivers/gpu/drm/panthor/panthor_sched.h
>> @@ -51,6 +51,7 @@ void panthor_sched_resume(struct panthor_device *ptdev);
>>   
>>   void panthor_sched_report_mmu_fault(struct panthor_device *ptdev);
>>   void panthor_sched_report_fw_events(struct panthor_device *ptdev, u32 events);
>> +void panthor_sched_stop_fw_events(struct panthor_device *ptdev);
>>   
>>   void panthor_fdinfo_gather_group_samples(struct panthor_file *pfile);
>>   
> 

