From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ville =?iso-8859-1?Q?Syrj=E4l=E4?= Subject: Re: [PATCH] drm/i915: fix failure to power off after hibernate Date: Thu, 26 Feb 2015 22:05:56 +0200 Message-ID: <20150226200556.GF11371@intel.com> References: <87bnkjcqjt.fsf@nemi.mork.no> <1424789904-26699-1-git-send-email-bjorn@mork.no> <1424794340.15554.3.camel@intel.com> <878ufnjfrr.fsf@nemi.mork.no> <1424889214.5991.4.camel@intel.com> <87bnkhqan8.fsf@nemi.mork.no> <1424976648.17078.1.camel@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE Return-path: Content-Disposition: inline In-Reply-To: <1424976648.17078.1.camel@intel.com> Sender: stable-owner@vger.kernel.org To: Imre Deak Cc: =?iso-8859-1?Q?Bj=F8rn?= Mork , linux-kernel@vger.kernel.org, Daniel Vetter , Jani Nikula , intel-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, stable@vger.kernel.org List-Id: dri-devel@lists.freedesktop.org On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote: > On to, 2015-02-26 at 10:34 +0100, Bj=F8rn Mork wrote: > > Imre Deak writes: > > > > >> That patch fixes the problem, with only pci_set_power_state comm= ented > > >> out. Do you still want me to try with pci_disable_device() comm= ented > > >> out as well? > > > > > > No, but it would help if you could still try the two attached pat= ch > > > separately, without any of the previous workarounds. Based on the > > > result, we'll follow up with a fix that adds for your specific pl= atform > > > either of the attached workarounds or simply avoids putting the d= evice > > > into D3 (corresponding to the patch you already tried). > > > > None of those patches made any difference. The laptop still hangs = at > > power-off. > > > > Not really surprising, is it? Previous testing shows that the hang > > occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in= the > > poweroff_late hook. It is hard to see how replacing it by an attem= pt to > > set D3cold, or adding any call after this point, could possibly cha= nge > > anything. The system is stil hanging at the pci_set_power_state() = call. >=20 > Judging from the blinking LED, I assume that it's not > pci_set_power_state() that hangs the machine, but the hang happens in > BIOS code. >=20 > > The generic pci-driver code will put the i915 device into PCI_D3hot= for > > you, won't it? Why do you need to duplicate that in the driver, > > *knowing* that doing so breaks (at least some) systems? >=20 > Letting the pci core put the device into D3 wouldn't get rid of the p= roblem. > It's putting the device into D3 in the first place what causes it. >=20 > > I honestly don't think this "let's try some random code" is the pro= per > > way to fix this bug (or any other bug for that matter). You need t= o > > start understanding the code you write, and the first step is by > > actually explaining the changes you make. >=20 > We have a good understanding about the issue: the BIOS on your platfo= rm > does something unexpected behind the back of the driver/kernel. In th= at > sense the patches were not random, for instance the first one is base= d on > an existing workaround for an issue that resembles quite a lot yours,= see > pci_pm_poweroff_noirq(). >=20 > > I also believe that you completely miss the fact that this bug has > > survived a full release cycle before you became aware of it, and th= e > > implications this has wrt other affected systems/users. I assume y= ou > > understand that my system isn't one-of-a-kind, This means that ther= e are > > other affected users with identical/similar systems. Now, if none = of > > those users reported the bug to you (we all know why: Linux kernel > > development is currently limited by the available testing resources= , NOT > > by the available developer resources), then how do you know that th= ere > > aren't a number of other systems affected as well? > > > > Let me answer that for you: You don't. > > > > Which is why you must explain the mechanism triggering the bug, pro= ving > > that it is a chipset/system specific issue. Because that's the onl= y way > > you will *know* that you have solved the problem not only for me, b= ut for > > all affected users. > > > > IMHO, the only safe and sane solution at the moment is the revert p= atch > > I posted. It's a simple fix, reverting back to the *known* working > > state before this regression was introduced. > > > > Then you can start over from there, trying to implement this proper= ly. >=20 > The current way is the proper one that we want for the generic case. = The issue > on your platform is the exception, so working around that is a sensib= le choice. >=20 > Attached is the proposed fix for this issue. >=20 > --Imre >=20 > From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 200= 1 > From: Imre Deak > Date: Thu, 26 Feb 2015 18:38:53 +0200 > Subject: [PATCH] drm/i915: gm45: work around hang during hibernation > MIME-Version: 1.0 > Content-Type: text/plain; charset=3DUTF-8 > Content-Transfer-Encoding: 8bit >=20 > Bj=F8rn reported that his machine hang during hibernation and eventua= lly > bisected the problem to the following commit: >=20 > commit da2bc1b9db3351addd293e5b82757efe1f77ed1d > Author: Imre Deak > Date: Thu Oct 23 19:23:26 2014 +0300 >=20 > drm/i915: add poweroff_late handler >=20 > The problem seems to be that after the kernel puts the device into D3 > the BIOS still tries to access it, or otherwise assumes that it's in = D0. > This is clearly bogus, since ACPI mandates that devices are put into = D3 > by the OSPM if they are not wake-up sources. In the future we want to > unify more of the driver's runtime and system suspend paths, for exam= ple > by skipping all the system suspend/hibernation hooks if the device is > runtime suspended already. Accordingly for all other platforms the go= al > is still to properly power down the device during hibernation. >=20 > Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-Febru= ary/060633.html > Reported-and-bisected-by: Bj=F8rn Mork > Signed-off-by: Imre Deak > --- > drivers/gpu/drm/i915/i915_drv.c | 27 ++++++++++++++++++++++----- > 1 file changed, 22 insertions(+), 5 deletions(-) >=20 > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i= 915_drv.c > index 4badb23..67d212b 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *de= v) > return 0; > } > =20 > -static int i915_drm_suspend_late(struct drm_device *drm_dev) > +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hi= bernation) > { > struct drm_i915_private *dev_priv =3D drm_dev->dev_private; > int ret; > @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_devi= ce *drm_dev) > } > =20 > pci_disable_device(drm_dev->pdev); > - pci_set_power_state(drm_dev->pdev, PCI_D3hot); > + /* > + * During hibernation on some GM45 platforms the BIOS may try to ac= cess > + * the device even though it's already in D3 and hang the machine. = So > + * leave the device in D0 on those platforms and hope the BIOS will > + * power down the device properly. Please include the model of the known bad machine in this comment, to help future archaeologists. > + */ > + if (!(hibernation && IS_GM45(dev_priv))) > + pci_set_power_state(drm_dev->pdev, PCI_D3hot); > =20 > return 0; > } > @@ -677,7 +684,7 @@ int i915_suspend_legacy(struct drm_device *dev, p= m_message_t state) > if (error) > return error; > =20 > - return i915_drm_suspend_late(dev); > + return i915_drm_suspend_late(dev, false); > } > =20 > static int i915_drm_resume(struct drm_device *dev) > @@ -965,7 +972,17 @@ static int i915_pm_suspend_late(struct device *d= ev) > if (drm_dev->switch_power_state =3D=3D DRM_SWITCH_POWER_OFF) > return 0; > =20 > - return i915_drm_suspend_late(drm_dev); > + return i915_drm_suspend_late(drm_dev, false); > +} > + > +static int i915_pm_poweroff_late(struct device *dev) > +{ > + struct drm_device *drm_dev =3D dev_to_i915(dev)->dev; > + > + if (drm_dev->switch_power_state =3D=3D DRM_SWITCH_POWER_OFF) > + return 0; > + > + return i915_drm_suspend_late(drm_dev, true); > } > =20 > static int i915_pm_resume_early(struct device *dev) > @@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops =3D = { > .thaw_early =3D i915_pm_resume_early, > .thaw =3D i915_pm_resume, > .poweroff =3D i915_pm_suspend, > - .poweroff_late =3D i915_pm_suspend_late, > + .poweroff_late =3D i915_pm_poweroff_late, > .restore_early =3D i915_pm_resume_early, > .restore =3D i915_pm_resume, > =20 > --=20 > 2.1.0 >=20 --=20 Ville Syrj=E4l=E4 Intel OTC