From: Konrad Rzeszutek Wilk
Subject: Re: nouveau shuts the machine down with v3.9-rc1 (temperature (72 C) hit the 'shutdown' threshold).
Date: Mon, 4 Mar 2013 16:41:10 -0500
Message-ID: <20130304214110.GA17402@phenom.dumpdata.com>
References: <20130304184022.GA8222@phenom.dumpdata.com> <5134F44C.7040700@free.fr>
In-Reply-To: <5134F44C.7040700@free.fr>
To: Martin Peres
Cc: airlied@linux.ie, bskeggs@redhat.com, marcin.slusarz@gmail.com, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
List-Id: dri-devel@lists.freedesktop.org

On Mon, Mar 04, 2013 at 08:21:48PM +0100, Martin Peres wrote:
> Hi Konrad,
>
> On 04/03/2013 19:40, Konrad Rzeszutek Wilk wrote:
> > After git merge ab7826595e9ec51a51f622c5fc91e2f59440481a
> > (Merge tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6)
> > the nouveau driver ends up shutting off the machine when booting.
> >
> > I hadn't done a git bisection yet and was wondering if there are some
> > juicy commits I ought to look at?
>
> Sure, no need to bisect, it is a new (apparently-broken-for-you) feature.
>
> The code is in /drivers/gpu/drm/nouveau/core/subdev/therm/
>
> > Here is the serial console:
>
> > [ 6.940628] nouveau [ PTHERM][0000:00:0d.0] Thermal management: disabled
> > [ 6.957474] nouveau [ PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
> > [ 6.966594] nouveau 6.975100] nouveau [ PTHERM][0000:00:0d.0] Thermal management: automatic
> > [ 6.982059] nouveau [ PTHERM][0000:00:0d.0] temperature (88 C) hit the 'downclock' threshold
> > [ 6.990680] nouveau [ PTHERM][0000:00:0d.0] temperature (88 C) hit the 'critical' threshold
> > [ 6.999194] nouveau [ PTHERM][0000:00:0d.0] temperature (90 C) hit the 'shutdown' threshold
>
> See, this is strange. If I believe the "programmed thresholds" line,
> the fanboost threshold is at 90°C, downclock is at 95°C, critical
> temperature is at 145°C and shutdown is at 135°C.
> So, from the BIOS side, things seem to be in fairly good shape
> (critical should be lower than shutdown, but that's OK).
>
> My theory is that your temperature sensor reading is very variable,
> which would set off the shutdown alarm. So, either the sensor needs more
> settling time or the output is genuinely very variable.

You should see it when I boot it under Xen:

[ 8.427789] nouveau [ PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
[ 8.427855] nouveau [ PTHERM][0000:00:0d.0] temperature (222 C) hit the 'fanboost' threshold
[ 8.427919] nouveau [ PTHERM][0000:00:0d.0] Thermal management: automatic
[ 8.427973] nouveau [ PTHERM][0000:00:0d.0] temperature (222 C) hit the 'downclock' threshold
[ 8.428036] nouveau [ PTHERM][0000:00:0d.0] temperature (222 C) hit the 'critical' threshold
[ 8.428099] nouveau [ PTHERM][0000:00:0d.0] temperature (222 C) hit the 'shutdown' threshold

>
> In the first case, we could fix that by increasing the settling time
> (at the expense of a longer boot period).
> We could also force a 10s wait at boot time before reading the
> temperature.
> If this is the latter case, the only solution is to average the
> temperature over several samples. I would need statistics on the
> variability in order to calculate a proper low-pass filter that
> wouldn't be too slow or too RAM/wakeup-intensive.
>
> I really hope the problem is the settling time!
>
> Here is what you can do to test the theory:
>
> Change the mdelay at line 41 of
> /drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c
> (http://cgit.freedesktop.org/nouveau/linux-2.6/tree/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c#n41)
> from 10 to 1000.
> Please also add an mdelay of 1000 between lines 44 and 45.

Let me do that tomorrow and report my findings.

>
> If it works with this patch, then try decreasing the delay to 20ms.
>
> In any case, I'll send some thermal patches tonight to be more
> resistant to long settling times.

Pls CC me in case you would like me also to test them with the mdelay patch.

>
> Thanks for reporting!

Of course.

>
> Martin (mupuf)
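
For illustration only, the "average the temperature over several samples"
idea quoted above could look roughly like the sketch below. It is a plain
userspace exponential moving average, not nouveau code: the names
temp_filter, temp_filter_init and temp_filter_update are hypothetical, and
the smoothing factor (shift = 2, i.e. 1/4) is an arbitrary assumption, since
no variability statistics are available yet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical low-pass filter state; not part of the nouveau driver. */
struct temp_filter {
	int32_t acc;    /* filtered temperature, scaled by 2^shift */
	unsigned shift; /* smoothing factor is 1 / 2^shift (assumed value) */
	int primed;     /* 0 until the first sample has been seen */
};

static void temp_filter_init(struct temp_filter *f, unsigned shift)
{
	assert(shift < 16); /* keep the scaled accumulator well within 32 bits */
	f->acc = 0;
	f->shift = shift;
	f->primed = 0;
}

/* Feed one raw reading (degrees C) and return the filtered value. */
static int temp_filter_update(struct temp_filter *f, int sample_c)
{
	int32_t scale = (int32_t)1 << f->shift;

	if (!f->primed) {
		/* Seed with the first reading so the filter starts at a sane value. */
		f->acc = (int32_t)sample_c * scale;
		f->primed = 1;
	} else {
		/* acc += sample - acc/scale, i.e. y += (x - y) / scale in scaled form. */
		f->acc += sample_c - f->acc / scale;
	}
	return (int)(f->acc / scale);
}
```

With shift = 2, a single 222 C reading after an 88 C reading is filtered
down to 121 C, which stays below the 135 C shutdown threshold from the logs.
It does not help when every reading is 222 C, as in the Xen boot.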