* [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
@ 2016-10-20 13:20 Takashi Iwai
2016-10-20 13:23 ` Takashi Iwai
2016-10-20 13:28 ` Ville Syrjälä
0 siblings, 2 replies; 7+ messages in thread
From: Takashi Iwai @ 2016-10-20 13:20 UTC (permalink / raw)
To: dri-devel; +Cc: David Airlie, linux-kernel, Daniel Vetter, Noralf Trønnes
Since the 4.7 kernel, we've seen error messages like
kernel: [TTM] Buffer eviction failed
kernel: qxl 0000:00:02.0: object_init failed for (4026540032, 0x00000001)
kernel: [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate VRAM BO
on QXL when switching to a VT and accessing it. The culprit is the generic
deferred_io code (the qxl driver switched to it in 4.7). There is a
race between the dirty clip update and the invocation of the callback.
In drm_fb_helper_dirty(), the dirty clip is updated under the spinlock,
but the update worker is kicked off outside the spinlock. Meanwhile
the update worker clears the dirty clip under the spinlock, too. Thus,
when drm_fb_helper_dirty() is called concurrently, schedule_work() may be
called after the clip has already been cleared by the first worker run.
The fix is simply to move schedule_work() inside the spinlock.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98322
Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1003298
Fixes: eaa434defaca ('drm/fb-helper: Add fb_deferred_io support')
Signed-off-by: Takashi Iwai <tiwai@suse.de>
---
drivers/gpu/drm/drm_fb_helper.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index 03414bde1f15..bae392dea2cc 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -861,9 +861,8 @@ static void drm_fb_helper_dirty(struct fb_info *info, u32 x, u32 y,
 	clip->y1 = min_t(u32, clip->y1, y);
 	clip->x2 = max_t(u32, clip->x2, x + width);
 	clip->y2 = max_t(u32, clip->y2, y + height);
-	spin_unlock_irqrestore(&helper->dirty_lock, flags);
-
 	schedule_work(&helper->dirty_work);
+	spin_unlock_irqrestore(&helper->dirty_lock, flags);
 }
 
 /**
--
2.10.1
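The interleaving described in the commit message can be replayed in a single thread to see what the second worker run ends up with. This is only a sketch: struct clip, clip_reset(), clip_merge(), worker_take() and replay_race() are hypothetical userspace stand-ins, not the kernel code, and the locking is implied by the comments rather than modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the helper's dirty clip; not the kernel struct. */
struct clip {
	uint32_t x1, y1, x2, y2;
};

/* Cleared state: the minimums start at ~0, the maximums at 0. */
static void clip_reset(struct clip *c)
{
	c->x1 = c->y1 = UINT32_MAX;
	c->x2 = c->y2 = 0;
}

/* What drm_fb_helper_dirty() does under the lock: grow the clip. */
static void clip_merge(struct clip *c, uint32_t x, uint32_t y,
		       uint32_t width, uint32_t height)
{
	if (x < c->x1)
		c->x1 = x;
	if (y < c->y1)
		c->y1 = y;
	if (x + width > c->x2)
		c->x2 = x + width;
	if (y + height > c->y2)
		c->y2 = y + height;
}

/* What the worker does under the lock: copy the clip, then clear it. */
static struct clip worker_take(struct clip *shared)
{
	struct clip copy = *shared;

	clip_reset(shared);
	return copy;
}

/*
 * Replays the race: two callers update the clip, the worker runs for the
 * first schedule_work(), then runs again for the second one.
 * Stores the first run's clip in *first_seen and returns the second run's.
 */
static struct clip replay_race(struct clip *first_seen)
{
	struct clip shared;

	assert(first_seen);
	clip_reset(&shared);
	clip_merge(&shared, 0, 0, 8, 16);	/* caller A, lock held */
	clip_merge(&shared, 8, 0, 8, 16);	/* caller B, lock held */
	*first_seen = worker_take(&shared);	/* after A's schedule_work() */
	return worker_take(&shared);		/* after B's schedule_work() */
}
```

Calling replay_race() leaves the merged rectangle of both callers in *first_seen, while the returned clip is still in its cleared state (x1 = ~0, x2 = 0), which is the value the driver callback would be handed on the second run.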
* Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
2016-10-20 13:20 [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater Takashi Iwai
@ 2016-10-20 13:23 ` Takashi Iwai
2016-10-20 13:28 ` Ville Syrjälä
1 sibling, 0 replies; 7+ messages in thread
From: Takashi Iwai @ 2016-10-20 13:23 UTC (permalink / raw)
To: dri-devel; +Cc: David Airlie, linux-kernel, Daniel Vetter, Noralf Trønnes

On Thu, 20 Oct 2016 15:20:55 +0200,
Takashi Iwai wrote:
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98322
> Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1003298
> Fixes: eaa434defaca ('drm/fb-helper: Add fb_deferred_io support')
> Signed-off-by: Takashi Iwai <tiwai@suse.de>

I forgot to put Cc to stable. Feel free to add it once if it's
accepted.

thanks,

Takashi
* Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
2016-10-20 13:20 [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater Takashi Iwai
2016-10-20 13:23 ` Takashi Iwai
@ 2016-10-20 13:28 ` Ville Syrjälä
2016-10-20 13:36 ` Takashi Iwai
1 sibling, 1 reply; 7+ messages in thread
From: Ville Syrjälä @ 2016-10-20 13:28 UTC (permalink / raw)
To: Takashi Iwai; +Cc: dri-devel, Daniel Vetter, linux-kernel

On Thu, Oct 20, 2016 at 03:20:55PM +0200, Takashi Iwai wrote:
> In drm_fb_helper_dirty(), the dirty clip is updated in the spinlock,
> while it kicks off the update worker outside the spinlock. Meanwhile
> the update worker clears the dirty clip in the spinlock, too. Thus,
> when drm_fb_helper_dirty() is called concurrently, schedule_work() is
> called after the clip is cleared in the first worker call.

Why does that matter? The first worker should have done all the
necessary work already, no?

--
Ville Syrjälä
Intel OTC
* Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
2016-10-20 13:28 ` Ville Syrjälä
@ 2016-10-20 13:36 ` Takashi Iwai
2016-10-20 14:17 ` Ville Syrjälä
2016-10-20 14:34 ` Daniel Vetter
0 siblings, 2 replies; 7+ messages in thread
From: Takashi Iwai @ 2016-10-20 13:36 UTC (permalink / raw)
To: Ville Syrjälä; +Cc: dri-devel, Daniel Vetter, linux-kernel

On Thu, 20 Oct 2016 15:28:14 +0200,
Ville Syrjälä wrote:
> On Thu, Oct 20, 2016 at 03:20:55PM +0200, Takashi Iwai wrote:
> > In drm_fb_helper_dirty(), the dirty clip is updated in the spinlock,
> > while it kicks off the update worker outside the spinlock. Meanwhile
> > the update worker clears the dirty clip in the spinlock, too. Thus,
> > when drm_fb_helper_dirty() is called concurrently, schedule_work() is
> > called after the clip is cleared in the first worker call.
>
> Why does that matter? The first worker should have done all the
> necessary work already, no?

Before the first call, it clears the clip and passes the copied clip
to the callback. Then the second call will be with the cleared and
untouched clip, i.e. with x1=~0. This confuses
qxl_framebuffer_dirty().

Of course, we can filter out in the callback side by checking the
clip. It was actually my first version. But basically it's a race
and should be covered better in the caller side.

thanks,

Takashi
* Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
2016-10-20 13:36 ` Takashi Iwai
@ 2016-10-20 14:17 ` Ville Syrjälä
2016-10-20 14:35 ` Takashi Iwai
2016-10-20 14:34 ` Daniel Vetter
1 sibling, 1 reply; 7+ messages in thread
From: Ville Syrjälä @ 2016-10-20 14:17 UTC (permalink / raw)
To: Takashi Iwai; +Cc: dri-devel, Daniel Vetter, linux-kernel

On Thu, Oct 20, 2016 at 03:36:54PM +0200, Takashi Iwai wrote:
> Before the first call, it clears the clip and passes the copied clip
> to the callback. Then the second call will be with the cleared and
> untouched clip, i.e. with x1=~0. This confuses
> qxl_framebuffer_dirty().
>
> Of course, we can filter out in the callback side by checking the
> clip. It was actually my first version. But basically it's a race
> and should be covered better in the caller side.

The race is still there AFAICS. The worker may already be executing but
not yet in the critical section, at which point drm_fb_helper_dirty()
will expand the dirty rectangle, and schedule another work. So the first
worker will already see the expanded rectangle, and second worker will
get zilch.

I think the only good fix is to have the worker validate the dirty
rectangle before calling the driver.

--
Ville Syrjälä
Intel OTC
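The suggestion above, having the worker validate the dirty rectangle before calling the driver, could look roughly like the following. This is a minimal userspace sketch and not the patch that was eventually posted: struct clip, clip_is_valid(), worker_flush(), the dirty_cb type and count_cb() are all hypothetical names, and the locking around the copy and reset is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the helper's dirty clip; not the kernel struct. */
struct clip {
	uint32_t x1, y1, x2, y2;
};

/* Cleared state as described in the thread: minimums at ~0, maximums at 0. */
static void clip_reset(struct clip *c)
{
	c->x1 = c->y1 = UINT32_MAX;
	c->x2 = c->y2 = 0;
}

/* A clip is worth flushing only if it spans at least one pixel. */
static bool clip_is_valid(const struct clip *c)
{
	return c->x1 < c->x2 && c->y1 < c->y2;
}

/* Hypothetical driver callback type; "data" is opaque caller context. */
typedef void (*dirty_cb)(const struct clip *c, void *data);

/*
 * Worker body: take a copy, clear the shared clip, and call the driver
 * only when the copy is non-empty. In the kernel the copy and reset would
 * sit under the dirty lock; locking is omitted in this sketch.
 * Returns true if the callback was invoked.
 */
static bool worker_flush(struct clip *shared, dirty_cb cb, void *data)
{
	struct clip copy;

	assert(shared && cb);
	copy = *shared;
	clip_reset(shared);
	if (!clip_is_valid(&copy))
		return false;
	cb(&copy, data);
	return true;
}

/* Example callback: counts how often the "driver" was reached. */
static void count_cb(const struct clip *c, void *data)
{
	(void)c;
	++*(int *)data;
}
```

With this check in the worker, a run that finds the clip already cleared simply returns without reaching the callback, so it no longer matters whether an extra run was scheduled.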
* Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
2016-10-20 14:17 ` Ville Syrjälä
@ 2016-10-20 14:35 ` Takashi Iwai
0 siblings, 0 replies; 7+ messages in thread
From: Takashi Iwai @ 2016-10-20 14:35 UTC (permalink / raw)
To: Ville Syrjälä; +Cc: dri-devel, Daniel Vetter, linux-kernel

On Thu, 20 Oct 2016 16:17:25 +0200,
Ville Syrjälä wrote:
> The race is still there AFAICS. The worker may already be executing but
> not yet in the critical section, at which point drm_fb_helper_dirty()
> will expand the dirty rectangle, and schedule another work. So the first
> worker will already see the expanded rectangle, and second worker will
> get zilch.

Hrm, right, there's a slight race window there.

> I think the only good fix is to have the worker validate the dirty
> rectangle before calling the driver.

OK, let me cook it quickly.
(It was actually the second version of the patch I wrote, and I sent
the third one :)

thanks,

Takashi
* Re: [PATCH] drm/fb-helper: Fix race between deferred_io worker and dirty updater
2016-10-20 13:36 ` Takashi Iwai
2016-10-20 14:17 ` Ville Syrjälä
@ 2016-10-20 14:34 ` Daniel Vetter
1 sibling, 0 replies; 7+ messages in thread
From: Daniel Vetter @ 2016-10-20 14:34 UTC (permalink / raw)
To: Takashi Iwai
Cc: Ville Syrjälä, dri-devel, Daniel Vetter, linux-kernel

On Thu, Oct 20, 2016 at 03:36:54PM +0200, Takashi Iwai wrote:
> Before the first call, it clears the clip and passes the copied clip
> to the callback. Then the second call will be with the cleared and
> untouched clip, i.e. with x1=~0. This confuses
> qxl_framebuffer_dirty().
>
> Of course, we can filter out in the callback side by checking the
> clip. It was actually my first version. But basically it's a race
> and should be covered better in the caller side.

Hm, I thought schedule_work also schedules the work when it's getting
processed right now. Which means if you're super unlucky you can still end
up with the work hitting an empty rectangle. I think filtering empty rects
in the worker is what we need to do instead.

Or is coffee not working right now?
-Daniel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
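The point about schedule_work() queueing the work again while it is being processed can be illustrated with a toy model of the pending flag. The sketch assumes, as described above, that the flag is dropped before the work function starts; struct model_work, model_schedule() and model_start() are hypothetical names and not the kernel workqueue API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a work item; not the kernel's struct work_struct. */
struct model_work {
	bool pending;	/* queued but not yet started */
	int runs;	/* how many times the work function started */
};

/* Models schedule_work(): does nothing and returns false if already pending. */
static bool model_schedule(struct model_work *w)
{
	assert(w);
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/*
 * Models the workqueue picking the item up: the pending flag is dropped
 * before the work function starts, so a schedule issued while the function
 * is executing queues one more run.
 */
static bool model_start(struct model_work *w)
{
	assert(w);
	if (!w->pending)
		return false;
	w->pending = false;
	w->runs++;
	return true;
}
```

In this model, two schedules issued before the worker starts collapse into one run, whereas a schedule issued after model_start() has returned leads to a second run. That second run is the one that can find an empty rectangle, regardless of where schedule_work() sits relative to the spinlock.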