From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: Martin Peres <martin.peres@free.fr>
Cc: airlied@linux.ie, bskeggs@redhat.com, marcin.slusarz@gmail.com,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: nouveau shuts the machine down with v3.9-rc1 (temperature (72 C) hit the 'shutdown' threshold).
Date: Mon, 4 Mar 2013 16:41:10 -0500
Message-ID: <20130304214110.GA17402@phenom.dumpdata.com>
In-Reply-To: <5134F44C.7040700@free.fr>

On Mon, Mar 04, 2013 at 08:21:48PM +0100, Martin Peres wrote:
> Hi Konrad,
> 
> On 04/03/2013 19:40, Konrad Rzeszutek Wilk wrote:
> > After git merge ab7826595e9ec51a51f622c5fc91e2f59440481a
> > (Merge tag 'mfd-3.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6)
> > the nouveau driver ends up shutting off the machine when booting.
> >
> >
> > I haven't done a git bisection yet and was wondering if there are some
> > juicy commits I ought to look at?
> 
> Sure, no need to bisect, it is a new (apparently-broken-for-you) feature.
> 
> The code is in /drivers/gpu/drm/nouveau/core/subdev/therm/
> 
> 
> >
> > Here is the serial console:
> 
> 
> > [    6.940628] nouveau  [  PTHERM][0000:00:0d.0] Thermal management: disabled
> > [    6.957474] nouveau  [  PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
> > [    6.966594] nouveau     6.975100] nouveau  [  PTHERM][0000:00:0d.0] Thermal management: automatic
> > [    6.982059] nouveau  [  PTHERM][0000:00:0d.0] temperature (88 C) hit the 'downclock' threshold
> > [    6.990680] nouveau  [  PTHERM][0000:00:0d.0] temperature (88 C) hit the 'critical' threshold
> > [    6.999194] nouveau  [  PTHERM][0000:00:0d.0] temperature (90 C) hit the 'shutdown' threshold
> 
> See, this is strange. If I believe the "programmed thresholds" line,
> the fanboost threshold is at 90°C, downclock is at 95°C, critical
> temperature is at 145°C and shutdown is at 135°C.
> So, from the BIOS side, things seem to be in fairly good shape
> (critical should be lower than shutdown, but that's OK).
> 
> My theory is that your temperature sensor reading is very variable,
> which would set off the shutdown alarm. So, either the sensor needs
> more settling time or the output is genuinely very variable.

You should see it when I boot it under Xen:

[    8.427789] nouveau  [  PTHERM][0000:00:0d.0] programmed thresholds [ 90(2), 95(3), 145(2), 135(5) ]
[    8.427855] nouveau  [  PTHERM][0000:00:0d.0] temperature (222 C) hit the 'fanboost' threshold
[    8.427919] nouveau  [  PTHERM][0000:00:0d.0] Thermal management: automatic
[    8.427973] nouveau  [  PTHERM][0000:00:0d.0] temperature (222 C) hit the 'downclock' threshold
[    8.428036] nouveau  [  PTHERM][0000:00:0d.0] temperature (222 C) hit the 'critical' threshold
[    8.428099] nouveau  [  PTHERM][0000:00:0d.0] temperature (222 C) hit the 'shutdown' threshold
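
For illustration only, the comparison implied by these log lines can be sketched in user-space C. Everything here is hypothetical (the struct, the flag names, and the rule that a threshold is "hit" when the reading is greater than or equal to it); it is not the nouveau code, and it ignores the values in parentheses in the "programmed thresholds" line.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types and names; these are not the nouveau data structures. */
struct therm_thresholds {
	int fanboost;   /* degrees C */
	int downclock;
	int critical;
	int shutdown;
};

enum {
	HIT_FANBOOST  = 1 << 0,
	HIT_DOWNCLOCK = 1 << 1,
	HIT_CRITICAL  = 1 << 2,
	HIT_SHUTDOWN  = 1 << 3,
};

/*
 * Return a bitmask of the thresholds that temp_c reaches or exceeds.
 * Assumption: "hit" means temp_c >= threshold, with no hysteresis.
 */
static unsigned int thresholds_hit(const struct therm_thresholds *t, int temp_c)
{
	unsigned int hit = 0;

	assert(t != NULL);
	if (temp_c >= t->fanboost)
		hit |= HIT_FANBOOST;
	if (temp_c >= t->downclock)
		hit |= HIT_DOWNCLOCK;
	if (temp_c >= t->critical)
		hit |= HIT_CRITICAL;
	if (temp_c >= t->shutdown)
		hit |= HIT_SHUTDOWN;
	return hit;
}
```

Under that assumption, with the programmed values 90, 95, 145 and 135, a reading of 222 C crosses all four thresholds, while the 88 C printed in the native boot log crosses none of them. That would fit the idea that the value printed is not the value that raised the alarm.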

> 
> In the first case, we could fix that by increasing the settling time
> (at the expense of a longer boot period). We could also force a 10s
> wait at boot time before reading the temperature.
> In the latter case, the only solution is to average the temperature
> over several samples. I would need statistics on the variability in
> order to calculate a proper low-pass filter that wouldn't be too slow
> or too RAM/wakeup-intensive.
> 
> I really hope the problem is the settling time!
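
To make the averaging idea concrete, here is a minimal user-space sketch of a moving-average filter over the last few readings. The names (temp_filter, temp_filter_update) and the window size of 8 are assumptions chosen for illustration. It is not code from the nouveau driver, and a real filter would have to be sized from the variability statistics mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not nouveau code. */
#define TEMP_SAMPLES 8 /* window size: an assumption, to be tuned from real data */

struct temp_filter {
	int samples[TEMP_SAMPLES]; /* ring buffer of recent readings, degrees C */
	size_t count;              /* number of valid entries, at most TEMP_SAMPLES */
	size_t next;               /* slot the next reading will overwrite */
	long sum;                  /* running sum of the valid entries */
};

static void temp_filter_init(struct temp_filter *f)
{
	assert(f != NULL);
	f->count = 0;
	f->next = 0;
	f->sum = 0;
}

/* Record one raw reading and return the average over the current window. */
static int temp_filter_update(struct temp_filter *f, int raw)
{
	assert(f != NULL);
	if (f->count == TEMP_SAMPLES)
		f->sum -= f->samples[f->next]; /* window full: drop the oldest reading */
	else
		f->count++;
	f->samples[f->next] = raw;
	f->sum += raw;
	f->next = (f->next + 1) % TEMP_SAMPLES;
	return (int)(f->sum / (long)f->count);
}
```

With a window of 8, a single bogus 222 C sample among 60 C readings yields an average of about 80 C once the window is full, below the 90 C fanboost threshold. The cost is the trade-off already mentioned: a larger window reacts more slowly to a real temperature rise and needs more memory and more wakeups.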
> 
> 
> Here is what you can do to test the theory:
> 
> Change the mdelay at line 41 of
> /drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c (http://cgit.freedesktop.org/nouveau/linux-2.6/tree/drivers/gpu/drm/nouveau/core/subdev/therm/nv40.c#n41)
> from 10 to 1000.
> Please also add an mdelay of 1000 between lines 44 and 45.

Let me do that tomorrow and report my findings.
> 
> If it works with this patch, then try decreasing the delay to 20ms.
> 
> In any case, I'll send some thermal patches tonight to be more
> resistant to long settling times.

Pls CC me in case you would like me also to test them with the
mdelay patch.

> 
> Thanks for reporting!

Of course.
> 
> Martin (mupuf)
> 
> 
