Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater

From: Daniel Vetter <daniel@ffwll.ch>
To: Takashi Iwai <tiwai@suse.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
Date: Thu, 20 Oct 2016 16:34:51 +0200
Message-ID: <20161020143451.GV20761@phenom.ffwll.local>
In-Reply-To: <s5h37jr16rt.wl-tiwai@suse.de>

On Thu, Oct 20, 2016 at 03:36:54PM +0200, Takashi Iwai wrote:
> On Thu, 20 Oct 2016 15:28:14 +0200,
> Ville Syrjälä wrote:
> > 
> > On Thu, Oct 20, 2016 at 03:20:55PM +0200, Takashi Iwai wrote:
> > > Since the 4.7 kernel, we've seen error messages like
> > > 
> > >  kernel: [TTM] Buffer eviction failed
> > >  kernel: qxl 0000:00:02.0: object_init failed for (4026540032, 0x00000001)
> > >  kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
> > > 
> > > on QXL when switching to and accessing a VT.  The culprit is the generic
> > > deferred_io code (the qxl driver switched to it in 4.7).  There is a
> > > race between the dirty clip update and the invocation of the callback.
> > > 
> > > In drm_fb_helper_dirty(), the dirty clip is updated in the spinlock,
> > > while it kicks off the update worker outside the spinlock.  Meanwhile
> > > the update worker clears the dirty clip in the spinlock, too.  Thus,
> > > when drm_fb_helper_dirty() is called concurrently, schedule_work() is
> > > called after the clip is cleared in the first worker call.
> > 
> > Why does that matter? The first worker should have done all the
> > necessary work already, no?
> 
> Before the first call, it clears the clip and passes the copied clip
> to the callback.  Then the second call will be with the cleared and
> untouched clip, i.e. with x1=~0.  This confuses
> qxl_framebuffer_dirty().
> 
> Of course, we can filter this out on the callback side by checking the
> clip.  That was actually my first version.  But basically it's a race
> and is better handled on the caller side.

Hm, I thought schedule_work() also queues the work again when it's
currently being processed. Which means if you're super unlucky you can
still end up with the worker hitting an empty rectangle, even with
schedule_work() under the spinlock. I think filtering empty rects in the
worker is what we need to do instead.
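
Roughly what I have in mind, as an untested userspace sketch and not the
actual drm_fb_helper code. The model_* names and clip_is_empty() are made
up for illustration, and I'm assuming the cleared clip is x1=y1=~0 with
x2=y2=0. The real worker would do the copy-and-reset under
helper->dirty_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the dirty clip; field names follow the quoted patch. */
struct model_clip {
	uint32_t x1, y1, x2, y2;
};

/* Assumed "cleared" state: x1/y1 at ~0, x2/y2 at 0. */
static struct model_clip dirty_clip = { UINT32_MAX, UINT32_MAX, 0, 0 };
static int flush_count;

/* A clip with no area means nothing was marked dirty since the last reset. */
static bool clip_is_empty(const struct model_clip *c)
{
	return c->x1 >= c->x2 || c->y1 >= c->y2;
}

/* Grow the clip to cover the damaged rectangle (locking omitted here). */
static void model_mark_dirty(uint32_t x, uint32_t y, uint32_t w, uint32_t h)
{
	if (x < dirty_clip.x1)
		dirty_clip.x1 = x;
	if (y < dirty_clip.y1)
		dirty_clip.y1 = y;
	if (x + w > dirty_clip.x2)
		dirty_clip.x2 = x + w;
	if (y + h > dirty_clip.y2)
		dirty_clip.y2 = y + h;
}

/* Stand-in for the driver callback; it must never see an empty clip. */
static void model_flush(const struct model_clip *c)
{
	assert(!clip_is_empty(c));
	flush_count++;
}

/* Worker: copy and reset the clip, then bail out if there is nothing to do. */
static void model_dirty_work(void)
{
	struct model_clip copy = dirty_clip;

	dirty_clip = (struct model_clip){ UINT32_MAX, UINT32_MAX, 0, 0 };

	if (clip_is_empty(&copy))
		return;	/* spurious run, the earlier one already flushed */

	model_flush(&copy);
}
```

With that check, calling model_mark_dirty() once and then model_dirty_work()
twice flushes only once. The second run sees the cleared clip and returns
without calling into the driver.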

Or is coffee not working right now?
-Daniel
> 
> 
> thanks,
> 
> Takashi
> 
> > 
> > > 
> > > The fix is simply moving schedule_work() inside the spinlock.
> > > 
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98322
> > > Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1003298
> > > Fixes: eaa434defaca ('drm/fb-helper: Add fb_deferred_io support')
> > > Signed-off-by: Takashi Iwai <tiwai@suse.de>
> > > ---
> > >  drivers/gpu/drm/drm_fb_helper.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
> > > index 03414bde1f15..bae392dea2cc 100644
> > > --- a/drivers/gpu/drm/drm_fb_helper.c
> > > +++ b/drivers/gpu/drm/drm_fb_helper.c
> > > @@ -861,9 +861,8 @@ static void drm_fb_helper_dirty(struct fb_info *info, u32 x, u32 y,
> > >  	clip->y1 = min_t(u32, clip->y1, y);
> > >  	clip->x2 = max_t(u32, clip->x2, x + width);
> > >  	clip->y2 = max_t(u32, clip->y2, y + height);
> > > -	spin_unlock_irqrestore(&helper->dirty_lock, flags);
> > > -
> > >  	schedule_work(&helper->dirty_work);
> > > +	spin_unlock_irqrestore(&helper->dirty_lock, flags);
> > >  }
> > >  
> > >  /**
> > > -- 
> > > 2.10.1
> > > 
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > 
> > -- 
> > Ville Syrjälä
> > Intel OTC
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch