Linux kernel -stable discussions
From: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
To: "Niklas Söderlund" <niklas.soderlund@ragnatech.se>
Cc: linux-media@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
	Kieran Bingham <kieran.bingham@ideasonboard.com>,
	Jacopo Mondi <jacopo.mondi@ideasonboard.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH] media: renesas: vsp1: Fix race condition when stopping display pipeline
Date: Tue, 12 May 2026 23:14:35 +0300
Message-ID: <20260512201435.GF4107@killaraus.ideasonboard.com>
In-Reply-To: <20260512164339.GB332351@ragnatech.se>

Hi Niklas,

Thank you for the investigation.

On Tue, May 12, 2026 at 06:43:39PM +0200, Niklas Söderlund wrote:
> On 2026-05-12 17:36:01 +0200, Niklas Söderlund wrote:
> > On 2026-05-12 01:38:32 +0300, Laurent Pinchart wrote:
> > > When stopping a display pipeline, the vsp1_du_setup_lif() function first
> > > stops the hardware by calling vsp1_pipeline_stop(), and then resets
> > > drm_pipe->du_complete to NULL. Stopping the hardware ensures no new
> > > interrupt is generated, but does not wait for completion of any running
> > > interrupt handler. This creates a race with vsp1_du_pipeline_frame_end()
> > > which tests drm_pipe->du_complete before calling the function pointer.
> > > If the drm_pipe->du_complete pointer is reset to NULL between those two
> > > operations, a NULL pointer dereference will occur.
> > > 
> > > Fix this by setting pipe->state to STOPPING before stopping the
> > > hardware, and avoid calling the frame end handler if the state is not
> > > RUNNING. Restrict the latter check to the display pipeline, as the
> > > other pipelines use a different stop procedure that waits for the
> > > frame end handler to set the state to STOPPED.
> > > 
> > > The state check needs to be protected by the pipe->irqlock. The lock is
> > > used by the video and vspx completion handlers already, so move it one
> > > level up to vsp1_pipeline_frame_end().
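> > > 
> > > For illustration, a hypothetical interleaving that triggers the crash:
> > > 
> > >   CPU0 (frame end interrupt)           CPU1 (vsp1_du_setup_lif)
> > >   --------------------------           ------------------------
> > >   vsp1_du_pipeline_frame_end()
> > >     if (drm_pipe->du_complete)  [true]
> > >                                        vsp1_pipeline_stop()
> > >                                        drm_pipe->du_complete = NULL;
> > >     drm_pipe->du_complete(...)  [NULL pointer dereference]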
> > 
> > Running with this patch and the still out-of-tree ISP driver, I hit a
> > splat. It might be an issue in the ISP driver; I will dig some more, but
> > for reference I log the splat here. My stress test also seems to trigger
> > the deadlock.
> 
> I did some more digging, and indeed the deadlock is a combination of the 
> R-Car ISP driver and this change. The usage pattern in the ISP driver is:
> 
> void prepare_next_job_for_vspx(...)
> {
>     lockdep_assert_held(&core->lock);
> 
>     /*
>      * Collect resources protected by core->lock into the next VSPX job.
>      */
>     myjob = ...;
> 
>     if (vsp1_isp_job_run(myjob)) {
>        /* Error path, clean up. */
> 
>        vsp1_isp_job_release(myjob);
> 
>        /* Cleanup resources protected by core->lock */
>     }
> }
> 
> void risp_vspx_frame_end_callback(...)
> {
>     /* I am called by the VSPX from vsp1_pipeline_frame_end() */
> 
>     guard(spinlock_irqsave)(&core->lock);
> 
>     /* 
>      * Act on resources protected by core->lock that the VSPX is now 
>      * done processing.
>      */
> 
>     prepare_next_job_for_vspx(...);
> }
> 
> void start_work_by_queueing_first_job_to_vspx(...)
> {
>     /*
>      * I'm called at stream on time in this example to get the ball 
>      * rolling.
>      */
> 
>     guard(spinlock_irqsave)(&core->lock);
> 
>     prepare_next_job_for_vspx(...)
> }
> 
> The R-Car ISP driver could possibly be reworked to release the 
> core->lock before calling vsp1_isp_job_run(). But it would then need to 
> retake the lock in the error path, which seems messy.
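> 
> A rough sketch of that rework, just to illustrate the shape (untested;
> risp_job_cleanup() is a made-up stand-in for the real cleanup, and
> callers would no longer hold core->lock when calling this):
> 
> void prepare_next_job_for_vspx(...)
> {
>     scoped_guard(spinlock_irqsave, &core->lock) {
>         /*
>          * Collect resources protected by core->lock into the next
>          * VSPX job.
>          */
>         myjob = ...;
>     }
> 
>     /* Call into the VSPX without holding core->lock. */
>     if (vsp1_isp_job_run(myjob)) {
>         /* Error path, clean up. */
>         vsp1_isp_job_release(myjob);
> 
>         /* Retake the lock to clean up the collected resources. */
>         scoped_guard(spinlock_irqsave, &core->lock)
>             risp_job_cleanup(myjob);
>     }
> }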
> 
> If at all possible, do you think this patch could be reworked to call
> the user-registered callback... (see below)
> 
> > [   32.111727] ======================================================
> > [   32.112507] WARNING: possible circular locking dependency detected
> > [   32.113287] 7.1.0-rc1-arm64-renesas-02551-g0fc35b78411a #17 Not tainted
> > [   32.114122] ------------------------------------------------------
> > [   32.114900] swapper/0/0 is trying to acquire lock:
> > [   32.115506] ffff000442e1ef30 (&core->lock){-...}-{3:3}, at: risp_core_svspx_frame_end+0x24/0x60
> > [   32.116624]
> >                but task is already holding lock:
> > [   32.117358] ffff000442e16cc8 (&pipe->irqlock){-...}-{3:3}, at: vsp1_pipeline_frame_end+0x4c/0xc0
> > [   32.118478]
> >                which lock already depends on the new lock.
> > 
> > [   32.119511]
> >                the existing dependency chain (in reverse order) is:
> > [   32.120457]
> >                -> #1 (&pipe->irqlock){-...}-{3:3}:
> > [   32.121223]        lock_acquire+0x27c/0x3fc
> > [   32.121764]        _raw_spin_lock_irqsave+0x58/0x80
> > [   32.122392]        vsp1_isp_job_run+0x84/0xf8
> > [   32.122952]        risp_core_job_run+0x1e4/0x2a4
> > [   32.123540]        risp_core_start_streaming+0x3d8/0x440
> > [   32.124215]        risp_io_start_streaming+0x64/0xe0
> > [   32.124846]        vb2_start_streaming+0x64/0x168
> > [   32.125449]        vb2_core_streamon+0xd0/0x1b8
> > [   32.126028]        vb2_ioctl_streamon+0x50/0x8c
> > [   32.126606]        v4l_streamon+0x20/0x28
> > [   32.127120]        __video_do_ioctl+0x344/0x3f0
> > [   32.127698]        video_usercopy+0x2e8/0x9f0
> > [   32.128254]        video_ioctl2+0x14/0x80
> > [   32.128766]        v4l2_ioctl+0x3c/0x60
> > [   32.129254]        __arm64_sys_ioctl+0x88/0xe0
> > [   32.129825]        invoke_syscall.constprop.0+0x3c/0x100
> > [   32.130504]        el0_svc_common.constprop.0+0x34/0xcc
> > [   32.131170]        do_el0_svc+0x18/0x20
> > [   32.131661]        el0_svc+0x3c/0x2b8
> > [   32.132131]        el0t_64_sync_handler+0x98/0xe0
> > [   32.132731]        el0t_64_sync+0x154/0x158
> > [   32.133265]
> >                -> #0 (&core->lock){-...}-{3:3}:
> > [   32.133999]        check_prev_add+0x10c/0xda0
> > [   32.134554]        __lock_acquire+0x129c/0x1584
> > [   32.135131]        lock_acquire+0x27c/0x3fc
> > [   32.135665]        _raw_spin_lock_irqsave+0x58/0x80
> > [   32.136286]        risp_core_svspx_frame_end+0x24/0x60
> > [   32.136939]        vsp1_vspx_pipeline_frame_end+0x1c/0x28
> > [   32.137626]        vsp1_pipeline_frame_end+0xa8/0xc0
> > [   32.138260]        vsp1_irq_handler+0xfc/0x12c
> > [   32.138828]        __handle_irq_event_percpu+0xa8/0x4cc
> > [   32.139497]        handle_irq_event+0x40/0x100
> > [   32.140065]        handle_fasteoi_irq+0xe8/0x210
> > [   32.140655]        handle_irq_desc+0x30/0x58
> > [   32.141202]        generic_handle_domain_irq+0x14/0x1c
> > [   32.141857]        gic_handle_irq+0x50/0xe0
> > [   32.142389]        call_on_irq_stack+0x30/0x60
> > [   32.142955]        do_interrupt_handler+0x78/0x7c
> > [   32.143555]        el1_interrupt+0x34/0x50
> > [   32.144079]        el1h_64_irq_handler+0x14/0x1c
> > [   32.144668]        el1h_64_irq+0x6c/0x70
> > [   32.145168]        cpuidle_enter_state+0xf4/0x440
> > [   32.145769]        cpuidle_enter+0x30/0x40
> > [   32.146295]        do_idle+0x16c/0x2d0
> > [   32.146776]        cpu_startup_entry+0x30/0x40
> > [   32.147342]        kernel_init+0x0/0x130
> > [   32.147842]        do_one_initcall+0x0/0x248
> > [   32.148392]        __primary_switched+0x88/0x90
> > [   32.148970]
> >                other info that might help us debug this:
> > 
> > [   32.149983]  Possible unsafe locking scenario:
> > 
> > [   32.150741]        CPU0                    CPU1
> > [   32.151325]        ----                    ----
> > [   32.151908]   lock(&pipe->irqlock);
> > [   32.152366]                                lock(&core->lock);
> > [   32.153110]                                lock(&pipe->irqlock);
> > [   32.153886]   lock(&core->lock);
> > [   32.154311]
> >                 *** DEADLOCK ***
> > 
> > [   32.155073] 1 lock held by swapper/0/0:
> > [   32.155572]  #0: ffff000442e16cc8 (&pipe->irqlock){-...}-{3:3}, at: vsp1_pipeline_frame_end+0x4c/0xc0
> > [   32.156780]
> >                stack backtrace:
> > [   32.157347] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 7.1.0-rc1-arm64-renesas-02551-g0fc35b78411a #17 PREEMPT
> > [   32.157359] Hardware name: Retronix Sparrow Hawk board based on r8a779g3 (DT)
> > [   32.157365] Call trace:
> > [   32.157369]  show_stack+0x14/0x1c (C)
> > [   32.157388]  dump_stack_lvl+0x6c/0x90
> > [   32.157396]  dump_stack+0x14/0x1c
> > [   32.157404]  print_circular_bug+0x254/0x2a0
> > [   32.157413]  check_noncircular+0x170/0x190
> > [   32.157422]  check_prev_add+0x10c/0xda0
> > [   32.157431]  __lock_acquire+0x129c/0x1584
> > [   32.157440]  lock_acquire+0x27c/0x3fc
> > [   32.157449]  _raw_spin_lock_irqsave+0x58/0x80
> > [   32.157460]  risp_core_svspx_frame_end+0x24/0x60
> > [   32.157468]  vsp1_vspx_pipeline_frame_end+0x1c/0x28
> > [   32.157480]  vsp1_pipeline_frame_end+0xa8/0xc0
> > [   32.157494]  vsp1_irq_handler+0xfc/0x12c
> > [   32.157507]  __handle_irq_event_percpu+0xa8/0x4cc
> > [   32.157522]  handle_irq_event+0x40/0x100
> > [   32.157537]  handle_fasteoi_irq+0xe8/0x210
> > [   32.157547]  handle_irq_desc+0x30/0x58
> > [   32.157560]  generic_handle_domain_irq+0x14/0x1c
> > [   32.157574]  gic_handle_irq+0x50/0xe0
> > [   32.157581]  call_on_irq_stack+0x30/0x60
> > [   32.157590]  do_interrupt_handler+0x78/0x7c
> > [   32.157600]  el1_interrupt+0x34/0x50
> > [   32.157611]  el1h_64_irq_handler+0x14/0x1c
> > [   32.157623]  el1h_64_irq+0x6c/0x70
> > [   32.157630]  cpuidle_enter_state+0xf4/0x440 (P)
> > [   32.157645]  cpuidle_enter+0x30/0x40
> > [   32.157655]  do_idle+0x16c/0x2d0
> > [   32.157665]  cpu_startup_entry+0x30/0x40
> > [   32.157675]  kernel_init+0x0/0x130
> > [   32.157682]  do_one_initcall+0x0/0x248
> > [   32.157694]  __primary_switched+0x88/0x90
> > 
> > > Fixes: d7ade201ae7f ("v4l: vsp1: Extend VSP1 module API to allow DRM callbacks")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
> > > ---
> > > 
> > > I noticed this race condition while debugging a separate issue, after
> > > adding printk() statements in the display pipeline frame end handler. I
> > > have tested the fix with the DU and VSP test suites, exercising both the
> > > display and video pipelines. I'm fairly confident that the VSPX pipeline
> > > won't be negatively affected, but it would be good to double-check that.
> > > Jacopo, Niklas, would you be able to test it?
> > > 
> > > ---
> > >  drivers/media/platform/renesas/vsp1/vsp1_pipe.c  | 12 ++++++++++--
> > >  drivers/media/platform/renesas/vsp1/vsp1_video.c |  5 -----
> > >  drivers/media/platform/renesas/vsp1/vsp1_vspx.c  | 13 +++++--------
> > >  3 files changed, 15 insertions(+), 15 deletions(-)
> > > 
> > > diff --git a/drivers/media/platform/renesas/vsp1/vsp1_pipe.c b/drivers/media/platform/renesas/vsp1/vsp1_pipe.c
> > > index 5d769cc42fe1..aaec1aa15091 100644
> > > --- a/drivers/media/platform/renesas/vsp1/vsp1_pipe.c
> > > +++ b/drivers/media/platform/renesas/vsp1/vsp1_pipe.c
> > > @@ -509,6 +509,10 @@ int vsp1_pipeline_stop(struct vsp1_pipeline *pipe)
> > >  		 * When using display lists in continuous frame mode the only
> > >  		 * way to stop the pipeline is to reset the hardware.
> > >  		 */
> > > +		scoped_guard(spinlock_irqsave, &pipe->irqlock) {
> > > +			pipe->state = VSP1_PIPELINE_STOPPING;
> > > +		}
> > > +
> > >  		ret = vsp1_reset_wpf(vsp1, pipe->output->entity.index);
> > >  		if (ret == 0) {
> > >  			spin_lock_irqsave(&pipe->irqlock, flags);
> > > @@ -583,8 +587,12 @@ void vsp1_pipeline_frame_end(struct vsp1_pipeline *pipe)
> > >  	 * Regardless of frame completion we still need to notify the pipe
> > >  	 * frame_end to account for vblank events.
> > >  	 */
> > > -	if (pipe->frame_end)
> > > -		pipe->frame_end(pipe, flags);
> > > +	scoped_guard(spinlock_irqsave, &pipe->irqlock) {
> > > +		if (pipe->state == VSP1_PIPELINE_RUNNING || !pipe->lif) {
> > > +			if (pipe->frame_end)
> > > +				pipe->frame_end(pipe, flags);
> 
> .. here without holding the pipe->irqlock? AFAIK this would be safe, as
> the user is in control of when the next VSPX job is scheduled. It would
> also IMHO make the API nicer, as the user would be able to hold its own
> locks when queueing new jobs from the frame_end callback.
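> 
> To illustrate (untested, and glossing over the video pipeline, whose
> handler currently relies on the lock being held):
> 
> 	bool call_unlocked = false;
> 
> 	scoped_guard(spinlock_irqsave, &pipe->irqlock) {
> 		if (pipe->lif) {
> 			if (pipe->state == VSP1_PIPELINE_RUNNING &&
> 			    pipe->frame_end)
> 				pipe->frame_end(pipe, flags);
> 		} else {
> 			call_unlocked = true;
> 		}
> 	}
> 
> 	/*
> 	 * Call the handler without pipe->irqlock held, so the user can take
> 	 * its own locks and queue a new job from the callback.
> 	 */
> 	if (call_unlocked && pipe->frame_end)
> 		pipe->frame_end(pipe, flags);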

We could release the lock here if !pipe->lif. I don't like that very
much though, as it will create different locking patterns for different
pipelines, which I was hoping to avoid :-/

Maybe pushing the pipe->state check to the individual frame_end handlers
would be better. I'll give it a try.
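
Something along those lines, completely untested, with the existing body
of the DU handler elided:

static void vsp1_du_pipeline_frame_end(struct vsp1_pipeline *pipe,
				       unsigned int completion)
{
	scoped_guard(spinlock_irqsave, &pipe->irqlock) {
		/*
		 * Bail out if the pipeline isn't running anymore, to avoid
		 * racing with vsp1_du_setup_lif() resetting
		 * drm_pipe->du_complete.
		 */
		if (pipe->state != VSP1_PIPELINE_RUNNING)
			return;

		/* ... existing du_complete handling ... */
	}
}

vsp1_pipeline_frame_end() would then call pipe->frame_end without
taking pipe->irqlock itself.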

> I'm willing to try to rework the R-Car ISP driver for this, but will
> await your feedback on whether you think it would be better to avoid
> this issue in the API design between the VSPX provider and its user.
> 
> > > +		}
> > > +	}
> > >  
> > >  	pipe->sequence++;
> > >  }
> > > diff --git a/drivers/media/platform/renesas/vsp1/vsp1_video.c b/drivers/media/platform/renesas/vsp1/vsp1_video.c
> > > index fe1dac11d4ae..a8db94bdb670 100644
> > > --- a/drivers/media/platform/renesas/vsp1/vsp1_video.c
> > > +++ b/drivers/media/platform/renesas/vsp1/vsp1_video.c
> > > @@ -325,14 +325,11 @@ static void vsp1_video_pipeline_frame_end(struct vsp1_pipeline *pipe,
> > >  {
> > >  	struct vsp1_device *vsp1 = pipe->output->entity.vsp1;
> > >  	enum vsp1_pipeline_state state;
> > > -	unsigned long flags;
> > >  	unsigned int i;
> > >  
> > >  	/* M2M Pipelines should never call here with an incomplete frame. */
> > >  	WARN_ON_ONCE(!(completion & VSP1_DL_FRAME_END_COMPLETED));
> > >  
> > > -	spin_lock_irqsave(&pipe->irqlock, flags);
> > > -
> > >  	/* Complete buffers on all video nodes. */
> > >  	for (i = 0; i < vsp1->info->rpf_count; ++i) {
> > >  		if (!pipe->inputs[i])
> > > @@ -354,8 +351,6 @@ static void vsp1_video_pipeline_frame_end(struct vsp1_pipeline *pipe,
> > >  		wake_up(&pipe->wq);
> > >  	else if (vsp1_pipeline_ready(pipe))
> > >  		vsp1_video_pipeline_run(pipe);
> > > -
> > > -	spin_unlock_irqrestore(&pipe->irqlock, flags);
> > >  }
> > >  
> > >  static int vsp1_video_pipeline_build_branch(struct vsp1_pipeline *pipe,
> > > diff --git a/drivers/media/platform/renesas/vsp1/vsp1_vspx.c b/drivers/media/platform/renesas/vsp1/vsp1_vspx.c
> > > index 1673479be0ff..26c477708858 100644
> > > --- a/drivers/media/platform/renesas/vsp1/vsp1_vspx.c
> > > +++ b/drivers/media/platform/renesas/vsp1/vsp1_vspx.c
> > > @@ -176,14 +176,11 @@ static void vsp1_vspx_pipeline_frame_end(struct vsp1_pipeline *pipe,
> > >  {
> > >  	struct vsp1_vspx_pipeline *vspx_pipe = to_vsp1_vspx_pipeline(pipe);
> > >  
> > > -	scoped_guard(spinlock_irqsave, &pipe->irqlock) {
> > > -		/*
> > > -		 * Operating the vsp1_pipe in singleshot mode requires to
> > > -		 * manually set the pipeline state to stopped when a transfer
> > > -		 * is completed.
> > > -		 */
> > > -		pipe->state = VSP1_PIPELINE_STOPPED;
> > > -	}
> > > +	/*
> > > +	 * Operating the vsp1_pipe in singleshot mode requires manually setting
> > > +	 * the pipeline state to stopped when a transfer is completed.
> > > +	 */
> > > +	pipe->state = VSP1_PIPELINE_STOPPED;
> > >  
> > >  	if (vspx_pipe->vspx_frame_end)
> > >  		vspx_pipe->vspx_frame_end(vspx_pipe->frame_end_data);
> > > 
> > > base-commit: bc1ba628e37c93cf2abeb2c79716f49087f8a024

-- 
Regards,

Laurent Pinchart

Thread overview: 6+ messages
2026-05-11 22:38 [PATCH] media: renesas: vsp1: Fix race condition when stopping display pipeline Laurent Pinchart
2026-05-12  7:42 ` Jacopo Mondi
2026-05-12  8:28   ` Laurent Pinchart
2026-05-12 15:35 ` Niklas Söderlund
2026-05-12 16:43   ` Niklas Söderlund
2026-05-12 20:14     ` Laurent Pinchart [this message]
