intel-gfx.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] drm/i915: Protect fbdev across slow or failed initialisation
@ 2016-02-04  9:21 Li, Weinan Z
  2016-02-05  0:27 ` Lukas Wunner
  0 siblings, 1 reply; 7+ messages in thread
From: Li, Weinan Z @ 2016-02-04  9:21 UTC (permalink / raw)
  To: gustav.fagerlind@gmail.com, Lukas Wunner; +Cc: intel-gfx@lists.freedesktop.org


[-- Attachment #1.1: Type: text/plain, Size: 5982 bytes --]

Hi Wilson,
We still need this patch. Seems 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
initialization fails") this 2 patches can’t cover this case. It’s access ifbdev->fb before the initialization
finished, but not initialization failed. If don’t have any other patches or code update relative, it may still have in 4.5.

add info NULL check should be better, it is also initialized in the async queue
>       info = ifbdev->helper.fbdev;
> +     if (info == NULL)
> +            return false;
>       if (!info->screen_base)

BRs,
Weinan Li


From: Li, Weinan Z
Sent: Thursday, February 04, 2016 10:34 AM
To: 'gustav.fagerlind@gmail.com'; Lukas Wunner
Cc: Chris Wilson; intel-gfx@lists.freedesktop.org<mailto:intel-gfx@lists.freedesktop.org>
Subject: RE: [Intel-gfx] [PATCH] drm/i915: Protect fbdev across slow or failed initialisation

Thanks for your quick response.
Yes it is not easily be reproduced in native. In  iVGT we startup several VMs  simultaneously, it can be reproduced in several cycles, upon 1/10 fail rate.
Need to use GUI mode but not text mode to reproduce this issue.

BRs,
Weinan Li


From: Gustav Fägerlind [mailto:gustav.fagerlind@gmail.com]
Sent: Thursday, February 04, 2016 1:08 AM
To: Lukas Wunner
Cc: Chris Wilson; intel-gfx@lists.freedesktop.org<mailto:intel-gfx@lists.freedesktop.org>; Li, Weinan Z
Subject: Re: [Intel-gfx] [PATCH] drm/i915: Protect fbdev across slow or failed initialisation

Cool, thank you.
I dont believe I can easily reproduce it, it has only happend few times (and i reboot my lappy >2 times per day).

//
Gustav

2016-02-03 14:25 GMT+01:00 Lukas Wunner <lukas@wunner.de<mailto:lukas@wunner.de>>:
Hi,

On Wed, Feb 03, 2016 at 09:17:37AM +0000, Chris Wilson wrote:
> If the initialisation fails, we may be left with a dangling pointer with
> an incomplete fbdev structure.

This shouldn't happen with 4.5, the fbdev is now clobbered if initialization
fails, the existing "if (dev_priv->fbdev)" checks should thus be sufficient.
See 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
initialization fails").

Gustav Fagerlind and Li Weinan both reported this for 4.3. It would be
interesting to know if it can be reproduced at all with 4.5-rc2.

Best regards,

Lukas

> Here we want to disable internal calls
> into fbdev. Similarly, the initialisation may be slow and we haven't yet
> enabled the fbdev (e.g. quick suspend or last-close before the async init
> completes).
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> Reported-by: "Li, Weinan Z" <weinan.z.li@intel.com<mailto:weinan.z.li@intel.com>>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk<mailto:chris@chris-wilson.co.uk>>
> ---
>  drivers/gpu/drm/i915/intel_fbdev.c | 41 ++++++++++++++++++++++++--------------
>  1 file changed, 26 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> index 09840f4380f9..6218bc5370a1 100644
> --- a/drivers/gpu/drm/i915/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> @@ -114,6 +114,20 @@ static struct fb_ops intelfb_ops = {
>       .fb_debug_leave = drm_fb_helper_debug_leave,
>  };
>
> +static bool intel_fbdev_active(struct intel_fbdev *ifbdev)
> +{
> +     struct fb_info *info;
> +
> +     if (ifbdev == NULL)
> +             return false;
> +
> +     info = ifbdev->helper.fbdev;
> +     if (!info->screen_base)
> +             return false;
> +
> +     return info->state == FBINFO_STATE_RUNNING;
> +}
> +
>  static int intelfb_alloc(struct drm_fb_helper *helper,
>                        struct drm_fb_helper_surface_size *sizes)
>  {
> @@ -753,6 +767,8 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
>               return;
>
>       info = ifbdev->helper.fbdev;
> +     if (!info->screen_base)
> +             return;
>
>       if (synchronous) {
>               /* Flush any pending work to turn the console on, and then
> @@ -794,29 +810,24 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
>
>  void intel_fbdev_output_poll_changed(struct drm_device *dev)
>  {
> -     struct drm_i915_private *dev_priv = dev->dev_private;
> -     if (dev_priv->fbdev)
> +     struct drm_i915_private *dev_priv = to_i915(dev);
> +
> +     if (intel_fbdev_active(dev_priv->fbdev))
>               drm_fb_helper_hotplug_event(&dev_priv->fbdev->helper);
>  }
>
>  void intel_fbdev_restore_mode(struct drm_device *dev)
>  {
> -     int ret;
> -     struct drm_i915_private *dev_priv = dev->dev_private;
> -     struct intel_fbdev *ifbdev = dev_priv->fbdev;
> -     struct drm_fb_helper *fb_helper;
> +     struct intel_fbdev *ifbdev = to_i915(dev)->fbdev;
>
> -     if (!ifbdev)
> +     if (!intel_fbdev_active(ifbdev))
>               return;
>
> -     fb_helper = &ifbdev->helper;
> -
> -     ret = drm_fb_helper_restore_fbdev_mode_unlocked(fb_helper);
> -     if (ret) {
> -             DRM_DEBUG("failed to restore crtc mode\n");
> -     } else {
> -             mutex_lock(&fb_helper->dev->struct_mutex);
> +     if (drm_fb_helper_restore_fbdev_mode_unlocked(&ifbdev->helper) == 0) {
> +             mutex_lock(&dev->struct_mutex);
>               intel_fb_obj_invalidate(ifbdev->fb->obj, ORIGIN_GTT);
> -             mutex_unlock(&fb_helper->dev->struct_mutex);
> +             mutex_unlock(&dev->struct_mutex);
> +     } else {
> +             DRM_DEBUG("failed to restore crtc mode\n");
>       }
>  }
> --
> 2.7.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org<mailto:Intel-gfx@lists.freedesktop.org>
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx


[-- Attachment #1.2: Type: text/html, Size: 17071 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Protect fbdev across slow or failed initialisation
  2016-02-04  9:21 [PATCH] drm/i915: Protect fbdev across slow or failed initialisation Li, Weinan Z
@ 2016-02-05  0:27 ` Lukas Wunner
  2016-02-05 11:09   ` Chris Wilson
  0 siblings, 1 reply; 7+ messages in thread
From: Lukas Wunner @ 2016-02-05  0:27 UTC (permalink / raw)
  To: Li, Weinan Z; +Cc: intel-gfx@lists.freedesktop.org, gustav.fagerlind@gmail.com

Hi,

On Thu, Feb 04, 2016 at 09:21:17AM +0000, Li, Weinan Z wrote:
> We still need this patch. Seems 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
> failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
> initialization fails") this 2 patches can???t cover this case. It???s access ifbdev->fb before the initialization
> finished, but not initialization failed. If don???t have any other patches or code update relative, it may still have in 4.5.

Okay, I see.

> 
> add info NULL check should be better, it is also initialized in the async queue
> >       info = ifbdev->helper.fbdev;
> > +     if (info == NULL)
> > +            return false;
> >       if (!info->screen_base)

So if there's indeed a race between fbdev initialization and the call to
intel_fbdev_restore_mode() (on lastclose), there's more that can go awry:
- intel_fbdev_restore_mode() dereferences ifbdev->fb->obj
- it calls __drm_atomic_helper_set_config() which WARNs if ifbdev->fb is NULL
- it calls drm_fb_helper_set_par() which dereferences ifbdev->helper.fbdev

Instead of checking all that it's probably simpler to call
async_synchronize_full() at the beginning of intel_fbdev_restore_mode(),
as Li Weinan suggested. I'll submit the corresponding one-liner patch
tomorrow if noone else does.

Chris' patch also modified intel_fbdev_set_suspend() as well as
intel_fbdev_output_poll_changed(), not sure if these are racy as well.
At least the stack traces posted by Li Weinan and Gustav Fägerlind
only indicate that lastclose is racy.

Best regards,

Lukas

> From: Li, Weinan Z
> Sent: Thursday, February 04, 2016 10:34 AM
> To: 'gustav.fagerlind@gmail.com'; Lukas Wunner
> Cc: Chris Wilson; intel-gfx@lists.freedesktop.org<mailto:intel-gfx@lists.freedesktop.org>
> Subject: RE: [Intel-gfx] [PATCH] drm/i915: Protect fbdev across slow or failed initialisation
> 
> Thanks for your quick response.
> Yes it is not easily be reproduced in native. In  iVGT we startup several VMs  simultaneously, it can be reproduced in several cycles, upon 1/10 fail rate.
> Need to use GUI mode but not text mode to reproduce this issue.
> 
> BRs,
> Weinan Li
> 
> 
> From: Gustav Fägerlind [mailto:gustav.fagerlind@gmail.com]
> Sent: Thursday, February 04, 2016 1:08 AM
> To: Lukas Wunner
> Cc: Chris Wilson; intel-gfx@lists.freedesktop.org<mailto:intel-gfx@lists.freedesktop.org>; Li, Weinan Z
> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Protect fbdev across slow or failed initialisation
> 
> Cool, thank you.
> I dont believe I can easily reproduce it, it has only happend few times (and i reboot my lappy >2 times per day).
> 
> //
> Gustav
> 
> 2016-02-03 14:25 GMT+01:00 Lukas Wunner <lukas@wunner.de<mailto:lukas@wunner.de>>:
> Hi,
> 
> On Wed, Feb 03, 2016 at 09:17:37AM +0000, Chris Wilson wrote:
> > If the initialisation fails, we may be left with a dangling pointer with
> > an incomplete fbdev structure.
> 
> This shouldn't happen with 4.5, the fbdev is now clobbered if initialization
> fails, the existing "if (dev_priv->fbdev)" checks should thus be sufficient.
> See 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
> failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
> initialization fails").
> 
> Gustav Fagerlind and Li Weinan both reported this for 4.3. It would be
> interesting to know if it can be reproduced at all with 4.5-rc2.
> 
> Best regards,
> 
> Lukas
> 
> > Here we want to disable internal calls
> > into fbdev. Similarly, the initialisation may be slow and we haven't yet
> > enabled the fbdev (e.g. quick suspend or last-close before the async init
> > completes).
> >
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> > Reported-by: "Li, Weinan Z" <weinan.z.li@intel.com<mailto:weinan.z.li@intel.com>>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk<mailto:chris@chris-wilson.co.uk>>
> > ---
> >  drivers/gpu/drm/i915/intel_fbdev.c | 41 ++++++++++++++++++++++++--------------
> >  1 file changed, 26 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> > index 09840f4380f9..6218bc5370a1 100644
> > --- a/drivers/gpu/drm/i915/intel_fbdev.c
> > +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> > @@ -114,6 +114,20 @@ static struct fb_ops intelfb_ops = {
> >       .fb_debug_leave = drm_fb_helper_debug_leave,
> >  };
> >
> > +static bool intel_fbdev_active(struct intel_fbdev *ifbdev)
> > +{
> > +     struct fb_info *info;
> > +
> > +     if (ifbdev == NULL)
> > +             return false;
> > +
> > +     info = ifbdev->helper.fbdev;
> > +     if (!info->screen_base)
> > +             return false;
> > +
> > +     return info->state == FBINFO_STATE_RUNNING;
> > +}
> > +
> >  static int intelfb_alloc(struct drm_fb_helper *helper,
> >                        struct drm_fb_helper_surface_size *sizes)
> >  {
> > @@ -753,6 +767,8 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
> >               return;
> >
> >       info = ifbdev->helper.fbdev;
> > +     if (!info->screen_base)
> > +             return;
> >
> >       if (synchronous) {
> >               /* Flush any pending work to turn the console on, and then
> > @@ -794,29 +810,24 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
> >
> >  void intel_fbdev_output_poll_changed(struct drm_device *dev)
> >  {
> > -     struct drm_i915_private *dev_priv = dev->dev_private;
> > -     if (dev_priv->fbdev)
> > +     struct drm_i915_private *dev_priv = to_i915(dev);
> > +
> > +     if (intel_fbdev_active(dev_priv->fbdev))
> >               drm_fb_helper_hotplug_event(&dev_priv->fbdev->helper);
> >  }
> >
> >  void intel_fbdev_restore_mode(struct drm_device *dev)
> >  {
> > -     int ret;
> > -     struct drm_i915_private *dev_priv = dev->dev_private;
> > -     struct intel_fbdev *ifbdev = dev_priv->fbdev;
> > -     struct drm_fb_helper *fb_helper;
> > +     struct intel_fbdev *ifbdev = to_i915(dev)->fbdev;
> >
> > -     if (!ifbdev)
> > +     if (!intel_fbdev_active(ifbdev))
> >               return;
> >
> > -     fb_helper = &ifbdev->helper;
> > -
> > -     ret = drm_fb_helper_restore_fbdev_mode_unlocked(fb_helper);
> > -     if (ret) {
> > -             DRM_DEBUG("failed to restore crtc mode\n");
> > -     } else {
> > -             mutex_lock(&fb_helper->dev->struct_mutex);
> > +     if (drm_fb_helper_restore_fbdev_mode_unlocked(&ifbdev->helper) == 0) {
> > +             mutex_lock(&dev->struct_mutex);
> >               intel_fb_obj_invalidate(ifbdev->fb->obj, ORIGIN_GTT);
> > -             mutex_unlock(&fb_helper->dev->struct_mutex);
> > +             mutex_unlock(&dev->struct_mutex);
> > +     } else {
> > +             DRM_DEBUG("failed to restore crtc mode\n");
> >       }
> >  }
> > --
> > 2.7.0
> >
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org<mailto:Intel-gfx@lists.freedesktop.org>
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Protect fbdev across slow or failed initialisation
  2016-02-05  0:27 ` Lukas Wunner
@ 2016-02-05 11:09   ` Chris Wilson
  2016-02-05 14:58     ` Lukas Wunner
  0 siblings, 1 reply; 7+ messages in thread
From: Chris Wilson @ 2016-02-05 11:09 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: intel-gfx@lists.freedesktop.org, gustav.fagerlind@gmail.com,
	Li, Weinan Z

On Fri, Feb 05, 2016 at 01:27:10AM +0100, Lukas Wunner wrote:
> Hi,
> 
> On Thu, Feb 04, 2016 at 09:21:17AM +0000, Li, Weinan Z wrote:
> > We still need this patch. Seems 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
> > failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
> > initialization fails") this 2 patches can???t cover this case. It???s access ifbdev->fb before the initialization
> > finished, but not initialization failed. If don???t have any other patches or code update relative, it may still have in 4.5.
> 
> Okay, I see.
> 
> > 
> > add info NULL check should be better, it is also initialized in the async queue
> > >       info = ifbdev->helper.fbdev;
> > > +     if (info == NULL)
> > > +            return false;
> > >       if (!info->screen_base)
> 
> So if there's indeed a race between fbdev initialization and the call to
> intel_fbdev_restore_mode() (on lastclose), there's more that can go awry:
> - intel_fbdev_restore_mode() dereferences ifbdev->fb->obj
> - it calls __drm_atomic_helper_set_config() which WARNs if ifbdev->fb is NULL
> - it calls drm_fb_helper_set_par() which dereferences ifbdev->helper.fbdev
> 
> Instead of checking all that it's probably simpler to call
> async_synchronize_full() at the beginning of intel_fbdev_restore_mode(),
> as Li Weinan suggested. I'll submit the corresponding one-liner patch
> tomorrow if noone else does.
> 
> Chris' patch also modified intel_fbdev_set_suspend() as well as
> intel_fbdev_output_poll_changed(), not sure if these are racy as well.
> At least the stack traces posted by Li Weinan and Gustav Fägerlind
> only indicate that lastclose is racy.

We called set-suspend internally, but we don't do any checks before
hand. I was concerned we may be able to get into a situation where
.fb_probe fails and leaves a dangling dev_priv->ifbdev for which we
would then trip over the NULL info->screen_base. So I was looking for a
suitable guard.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Protect fbdev across slow or failed initialisation
  2016-02-05 11:09   ` Chris Wilson
@ 2016-02-05 14:58     ` Lukas Wunner
  2016-02-15 16:32       ` Daniel Vetter
  0 siblings, 1 reply; 7+ messages in thread
From: Lukas Wunner @ 2016-02-05 14:58 UTC (permalink / raw)
  To: Chris Wilson, Li, Weinan Z, gustav.fagerlind@gmail.com,
	intel-gfx@lists.freedesktop.org

Hi Chris,

On Fri, Feb 05, 2016 at 11:09:27AM +0000, Chris Wilson wrote:
> On Fri, Feb 05, 2016 at 01:27:10AM +0100, Lukas Wunner wrote:
> > On Thu, Feb 04, 2016 at 09:21:17AM +0000, Li, Weinan Z wrote:
> > > We still need this patch. Seems 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
> > > failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
> > > initialization fails") this 2 patches can???t cover this case. It???s access ifbdev->fb before the initialization
> > > finished, but not initialization failed. If don???t have any other patches or code update relative, it may still have in 4.5.
> > 
> > Okay, I see.
> > 
> > > 
> > > add info NULL check should be better, it is also initialized in the async queue
> > > >       info = ifbdev->helper.fbdev;
> > > > +     if (info == NULL)
> > > > +            return false;
> > > >       if (!info->screen_base)
> > 
> > So if there's indeed a race between fbdev initialization and the call to
> > intel_fbdev_restore_mode() (on lastclose), there's more that can go awry:
> > - intel_fbdev_restore_mode() dereferences ifbdev->fb->obj
> > - it calls __drm_atomic_helper_set_config() which WARNs if ifbdev->fb is NULL
> > - it calls drm_fb_helper_set_par() which dereferences ifbdev->helper.fbdev
> > 
> > Instead of checking all that it's probably simpler to call
> > async_synchronize_full() at the beginning of intel_fbdev_restore_mode(),
> > as Li Weinan suggested. I'll submit the corresponding one-liner patch
> > tomorrow if noone else does.
> > 
> > Chris' patch also modified intel_fbdev_set_suspend() as well as
> > intel_fbdev_output_poll_changed(), not sure if these are racy as well.
> > At least the stack traces posted by Li Weinan and Gustav Fägerlind
> > only indicate that lastclose is racy.
> 
> We called set-suspend internally, but we don't do any checks before
> hand. I was concerned we may be able to get into a situation where
> .fb_probe fails and leaves a dangling dev_priv->ifbdev for which we
> would then trip over the NULL info->screen_base. So I was looking for a
> suitable guard.

Yes, looking at this with a fresh pair of eyeballs I think you were
totally right, i915_pm_ops is declared as part of i915_pci_driver and
it might indeed happen that i915_pm_suspend() is called before the fbdev
is fully initialized.

As for intel_fbdev_output_poll_changed(), there's even a comment in
i915_load_modeset_init() stating that hotplug events might occur
during fdbev initial config.

Below patch adds the requisite async_synchronize_full() to the three
functions that you also modified and updates that comment.

However AFAICT a corner case remains where we're still vulnerable to races:
async_schedule() runs synchronously "if we're out of memory or if there's
too much work pending already" (see __async_schedule() in kernel/async.c).
In this case no entry is added to the pending list and
async_synchronize_full() might theoretically return before the synchronously
executed function has finished.

The existing call to async_synchronize_full() in intel_fbdev_fini()
seems to be susceptible to the same race condition, but it's probably
very hard to trigger it. (Unload the module before the fbdev is fully
initialized.)

To make it bullet proof we could declare a completion in intel_fbdev.c
and wait for that instead. Opinions?

Thanks,

Lukas

-- >8 --

Subject: [PATCH] drm/i915: Fix races on fbdev

The ->lastclose callback invokes intel_fbdev_restore_mode() and has
been witnessed to run before intel_fbdev_initial_config_async()
has finished.

We might likewise receive hotplug events or be suspended before
we've had a chance to fully set up the fbdev.

Fix by waiting for the asynchronous thread to finish.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
Reported-by: Gustav FÀgerlind <gustav.fagerlind@gmail.com>
Reported-by: "Li, Weinan Z" <weinan.z.li@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/gpu/drm/i915/i915_dma.c    | 8 +++-----
 drivers/gpu/drm/i915/intel_fbdev.c | 4 ++++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a0f5659..a76b528 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -437,11 +437,9 @@ static int i915_load_modeset_init(struct drm_device *dev)
 	 * Some ports require correctly set-up hpd registers for detection to
 	 * work properly (leading to ghost connected connector status), e.g. VGA
 	 * on gm45.  Hence we can only set up the initial fbdev config after hpd
-	 * irqs are fully enabled. Now we should scan for the initial config
-	 * only once hotplug handling is enabled, but due to screwed-up locking
-	 * around kms/fbdev init we can't protect the fdbev initial config
-	 * scanning against hotplug events. Hence do this first and ignore the
-	 * tiny window where we will loose hotplug notifactions.
+	 * irqs are fully enabled. We protect the fbdev initial config scanning
+	 * against hotplug events by waiting in intel_fbdev_output_poll_changed
+	 * until the asynchronous thread has finished.
 	 */
 	intel_fbdev_initial_config_async(dev);
 
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 09840f4..2002b13 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -749,6 +749,7 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 	struct intel_fbdev *ifbdev = dev_priv->fbdev;
 	struct fb_info *info;
 
+	async_synchronize_full();
 	if (!ifbdev)
 		return;
 
@@ -795,6 +796,8 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 void intel_fbdev_output_poll_changed(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	async_synchronize_full();
 	if (dev_priv->fbdev)
 		drm_fb_helper_hotplug_event(&dev_priv->fbdev->helper);
 }
@@ -806,6 +809,7 @@ void intel_fbdev_restore_mode(struct drm_device *dev)
 	struct intel_fbdev *ifbdev = dev_priv->fbdev;
 	struct drm_fb_helper *fb_helper;
 
+	async_synchronize_full();
 	if (!ifbdev)
 		return;
 
-- 
1.8.5.2 (Apple Git-48)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] drm/i915: Protect fbdev across slow or failed initialisation
  2016-02-05 14:58     ` Lukas Wunner
@ 2016-02-15 16:32       ` Daniel Vetter
  2016-02-15 17:13         ` [PATCH 0/1] drm/i915: Fix races on fbdev [RESEND FOR CI] Lukas Wunner
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Vetter @ 2016-02-15 16:32 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: intel-gfx@lists.freedesktop.org, Li, Weinan Z,
	gustav.fagerlind@gmail.com

On Fri, Feb 05, 2016 at 03:58:31PM +0100, Lukas Wunner wrote:
> Hi Chris,
> 
> On Fri, Feb 05, 2016 at 11:09:27AM +0000, Chris Wilson wrote:
> > On Fri, Feb 05, 2016 at 01:27:10AM +0100, Lukas Wunner wrote:
> > > On Thu, Feb 04, 2016 at 09:21:17AM +0000, Li, Weinan Z wrote:
> > > > We still need this patch. Seems 54632abe8ca3 ("drm/i915: Fix oops caused by fbdev initialization
> > > > failure") as well as 366e39b4d2c5 ("drm/i915: Tear down fbdev if
> > > > initialization fails") this 2 patches can???t cover this case. It???s access ifbdev->fb before the initialization
> > > > finished, but not initialization failed. If don???t have any other patches or code update relative, it may still have in 4.5.
> > > 
> > > Okay, I see.
> > > 
> > > > 
> > > > add info NULL check should be better, it is also initialized in the async queue
> > > > >       info = ifbdev->helper.fbdev;
> > > > > +     if (info == NULL)
> > > > > +            return false;
> > > > >       if (!info->screen_base)
> > > 
> > > So if there's indeed a race between fbdev initialization and the call to
> > > intel_fbdev_restore_mode() (on lastclose), there's more that can go awry:
> > > - intel_fbdev_restore_mode() dereferences ifbdev->fb->obj
> > > - it calls __drm_atomic_helper_set_config() which WARNs if ifbdev->fb is NULL
> > > - it calls drm_fb_helper_set_par() which dereferences ifbdev->helper.fbdev
> > > 
> > > Instead of checking all that it's probably simpler to call
> > > async_synchronize_full() at the beginning of intel_fbdev_restore_mode(),
> > > as Li Weinan suggested. I'll submit the corresponding one-liner patch
> > > tomorrow if noone else does.
> > > 
> > > Chris' patch also modified intel_fbdev_set_suspend() as well as
> > > intel_fbdev_output_poll_changed(), not sure if these are racy as well.
> > > At least the stack traces posted by Li Weinan and Gustav Fägerlind
> > > only indicate that lastclose is racy.
> > 
> > We called set-suspend internally, but we don't do any checks before
> > hand. I was concerned we may be able to get into a situation where
> > .fb_probe fails and leaves a dangling dev_priv->ifbdev for which we
> > would then trip over the NULL info->screen_base. So I was looking for a
> > suitable guard.
> 
> Yes, looking at this with a fresh pair of eyeballs I think you were
> totally right, i915_pm_ops is declared as part of i915_pci_driver and
> it might indeed happen that i915_pm_suspend() is called before the fbdev
> is fully initialized.
> 
> As for intel_fbdev_output_poll_changed(), there's even a comment in
> i915_load_modeset_init() stating that hotplug events might occur
> during fdbev initial config.
> 
> Below patch adds the requisite async_synchronize_full() to the three
> functions that you also modified and updates that comment.
> 
> However AFAICT a corner case remains where we're still vulnerable to races:
> async_schedule() runs synchronously "if we're out of memory or if there's
> too much work pending already" (see __async_schedule() in kernel/async.c).
> In this case no entry is added to the pending list and
> async_synchronize_full() might theoretically return before the synchronously
> executed function has finished.
> 
> The existing call to async_synchronize_full() in intel_fbdev_fini()
> seems to be susceptible to the same race condition, but it's probably
> very hard to trigger it. (Unload the module before the fbdev is fully
> initialized.)
> 
> To make it bullet proof we could declare a completion in intel_fbdev.c
> and wait for that instead. Opinions?
> 
> Thanks,
> 
> Lukas
> 
> -- >8 --
> 
> Subject: [PATCH] drm/i915: Fix races on fbdev
> 
> The ->lastclose callback invokes intel_fbdev_restore_mode() and has
> been witnessed to run before intel_fbdev_initial_config_async()
> has finished.
> 
> We might likewise receive hotplug events or be suspended before
> we've had a chance to fully set up the fbdev.
> 
> Fix by waiting for the asynchronous thread to finish.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
> Reported-by: Gustav FÀgerlind <gustav.fagerlind@gmail.com>
> Reported-by: "Li, Weinan Z" <weinan.z.li@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: stable@vger.kernel.org
> Signed-off-by: Lukas Wunner <lukas@wunner.de>

Can you pls resubmit this patch stand-alone? Patchwork (and therefore CI)
won't pick up inlined patches like this one here unfortunately.

Thanks, Daniel

> ---
>  drivers/gpu/drm/i915/i915_dma.c    | 8 +++-----
>  drivers/gpu/drm/i915/intel_fbdev.c | 4 ++++
>  2 files changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index a0f5659..a76b528 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -437,11 +437,9 @@ static int i915_load_modeset_init(struct drm_device *dev)
>  	 * Some ports require correctly set-up hpd registers for detection to
>  	 * work properly (leading to ghost connected connector status), e.g. VGA
>  	 * on gm45.  Hence we can only set up the initial fbdev config after hpd
> -	 * irqs are fully enabled. Now we should scan for the initial config
> -	 * only once hotplug handling is enabled, but due to screwed-up locking
> -	 * around kms/fbdev init we can't protect the fdbev initial config
> -	 * scanning against hotplug events. Hence do this first and ignore the
> -	 * tiny window where we will loose hotplug notifactions.
> +	 * irqs are fully enabled. We protect the fbdev initial config scanning
> +	 * against hotplug events by waiting in intel_fbdev_output_poll_changed
> +	 * until the asynchronous thread has finished.
>  	 */
>  	intel_fbdev_initial_config_async(dev);
>  
> diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> index 09840f4..2002b13 100644
> --- a/drivers/gpu/drm/i915/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> @@ -749,6 +749,7 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
>  	struct intel_fbdev *ifbdev = dev_priv->fbdev;
>  	struct fb_info *info;
>  
> +	async_synchronize_full();
>  	if (!ifbdev)
>  		return;
>  
> @@ -795,6 +796,8 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
>  void intel_fbdev_output_poll_changed(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	async_synchronize_full();
>  	if (dev_priv->fbdev)
>  		drm_fb_helper_hotplug_event(&dev_priv->fbdev->helper);
>  }
> @@ -806,6 +809,7 @@ void intel_fbdev_restore_mode(struct drm_device *dev)
>  	struct intel_fbdev *ifbdev = dev_priv->fbdev;
>  	struct drm_fb_helper *fb_helper;
>  
> +	async_synchronize_full();
>  	if (!ifbdev)
>  		return;
>  
> -- 
> 1.8.5.2 (Apple Git-48)
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 0/1] drm/i915: Fix races on fbdev [RESEND FOR CI]
  2016-02-15 16:32       ` Daniel Vetter
@ 2016-02-15 17:13         ` Lukas Wunner
  2016-02-15 17:13           ` [PATCH 1/1] drm/i915: Fix races on fbdev Lukas Wunner
  0 siblings, 1 reply; 7+ messages in thread
From: Lukas Wunner @ 2016-02-15 17:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: gustav.fagerlind, Li, Weinan Z

Originally submitted inline with <20160205145831.GA14084@wunner.de>,
hereby resubmitted standalone for CI as requested by Daniel with
<20160215163251.GR11240@phenom.ffwll.local>.

Above-mentioned message contained the following important note:

> However AFAICT a corner case remains where we're still vulnerable to races:
> async_schedule() runs synchronously "if we're out of memory or if there's
> too much work pending already" (see __async_schedule() in kernel/async.c).
> In this case no entry is added to the pending list and
> async_synchronize_full() might theoretically return before the synchronously
> executed function has finished.
>
> The existing call to async_synchronize_full() in intel_fbdev_fini()
> seems to be susceptible to the same race condition, but it's probably
> very hard to trigger it. (Unload the module before the fbdev is fully
> initialized.)
>
> To make it bullet proof we could declare a completion in intel_fbdev.c
> and wait for that instead. Opinions?


Lukas Wunner (1):
  drm/i915: Fix races on fbdev

 drivers/gpu/drm/i915/i915_dma.c    | 8 +++-----
 drivers/gpu/drm/i915/intel_fbdev.c | 4 ++++
 2 files changed, 7 insertions(+), 5 deletions(-)

-- 
1.8.5.2 (Apple Git-48)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/1] drm/i915: Fix races on fbdev
  2016-02-15 17:13         ` [PATCH 0/1] drm/i915: Fix races on fbdev [RESEND FOR CI] Lukas Wunner
@ 2016-02-15 17:13           ` Lukas Wunner
  0 siblings, 0 replies; 7+ messages in thread
From: Lukas Wunner @ 2016-02-15 17:13 UTC (permalink / raw)
  To: intel-gfx; +Cc: gustav.fagerlind, Li, Weinan Z

The ->lastclose callback invokes intel_fbdev_restore_mode() and has
been witnessed to run before intel_fbdev_initial_config_async()
has finished.

We might likewise receive hotplug events or be suspended before
we've had a chance to fully set up the fbdev.

Fix by waiting for the asynchronous thread to finish.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93580
Reported-by: Gustav Fägerlind <gustav.fagerlind@gmail.com>
Reported-by: "Li, Weinan Z" <weinan.z.li@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org
Signed-off-by: Lukas Wunner <lukas@wunner.de>
---
 drivers/gpu/drm/i915/i915_dma.c    | 8 +++-----
 drivers/gpu/drm/i915/intel_fbdev.c | 4 ++++
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a0f5659..a76b528 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -437,11 +437,9 @@ static int i915_load_modeset_init(struct drm_device *dev)
 	 * Some ports require correctly set-up hpd registers for detection to
 	 * work properly (leading to ghost connected connector status), e.g. VGA
 	 * on gm45.  Hence we can only set up the initial fbdev config after hpd
-	 * irqs are fully enabled. Now we should scan for the initial config
-	 * only once hotplug handling is enabled, but due to screwed-up locking
-	 * around kms/fbdev init we can't protect the fdbev initial config
-	 * scanning against hotplug events. Hence do this first and ignore the
-	 * tiny window where we will loose hotplug notifactions.
+	 * irqs are fully enabled. We protect the fbdev initial config scanning
+	 * against hotplug events by waiting in intel_fbdev_output_poll_changed
+	 * until the asynchronous thread has finished.
 	 */
 	intel_fbdev_initial_config_async(dev);
 
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index 09840f4..2002b13 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -749,6 +749,7 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 	struct intel_fbdev *ifbdev = dev_priv->fbdev;
 	struct fb_info *info;
 
+	async_synchronize_full();
 	if (!ifbdev)
 		return;
 
@@ -795,6 +796,8 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int state, bool synchronous
 void intel_fbdev_output_poll_changed(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	async_synchronize_full();
 	if (dev_priv->fbdev)
 		drm_fb_helper_hotplug_event(&dev_priv->fbdev->helper);
 }
@@ -806,6 +809,7 @@ void intel_fbdev_restore_mode(struct drm_device *dev)
 	struct intel_fbdev *ifbdev = dev_priv->fbdev;
 	struct drm_fb_helper *fb_helper;
 
+	async_synchronize_full();
 	if (!ifbdev)
 		return;
 
-- 
1.8.5.2 (Apple Git-48)

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-02-15 17:13 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-02-04  9:21 [PATCH] drm/i915: Protect fbdev across slow or failed initialisation Li, Weinan Z
2016-02-05  0:27 ` Lukas Wunner
2016-02-05 11:09   ` Chris Wilson
2016-02-05 14:58     ` Lukas Wunner
2016-02-15 16:32       ` Daniel Vetter
2016-02-15 17:13         ` [PATCH 0/1] drm/i915: Fix races on fbdev [RESEND FOR CI] Lukas Wunner
2016-02-15 17:13           ` [PATCH 1/1] drm/i915: Fix races on fbdev Lukas Wunner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).