public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Andi Kleen <andi@firstfloor.org>
To: Russell Leidich <rml@google.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	valdis.kletnieks@vt.edu
Subject: Re: [PATCH] AMD Thermal Interrupt Support
Date: Wed, 2 Jan 2008 21:00:39 +0100	[thread overview]
Message-ID: <20080102200039.GA13784@one.firstfloor.org> (raw)
In-Reply-To: <3f1a065b0801021143y5fc15e29r4201ac19bc7c1daa@mail.gmail.com>

On Wed, Jan 02, 2008 at 11:43:08AM -0800, Russell Leidich wrote:
> > > +      */
> > > +     for (nb_num = 0; nb_num < num_k8_northbridges; nb_num++) {
> > > +             if ((k8_northbridges[nb_num]->device) == 0x1103)
> > > +                     goto out;
> > > +     }
> >
> > AFAIK that's all K8s so the code will never work on them.
> 
> Well, yes, this is Barcelona-only at the moment (and in all

Ok, but that was totally unclear to me which suggests it should
be somewhere in the description/changelog and possibly in the comments.

> likelihood, will extend to some future CPUs).  Indeed, as far as my
> testing has determined, there is no stepping of K8s which properly
> implements thermal throttling.  Even Rev F has a crippling erratum.

How about RevG?
> 
> > Same with the rename.
> 
> I disagree here.  The original code was exclusively focussed on
> setting some MCE-related "threshold".  With my changes, it's more
> generic.  "amd_" might not be the most appropriate prefix, but
> "threshold_" certainly is not.

But such changes should be separate.

> 
> > +             printk(KERN_CRIT "CPU 0x%x: Thermal monitoring not "
> > +                     "functional.\n", cpu);
> >
> > Why is that KERN_CRIT? Does not seem that critical to me.
> 
> So what this message really says is:  "The OS cannot hook the thermal
> interrupt because it has been hijacked or misconfigured by the BIOS.
> Therefore, the throttling MCEs which you might naturally expect to see
> on other Barcelona systems will not occur on this one.  Therefore, if
> your cooling policy relies on these MCEs (bad idea, but legal), it

I don't think any Linux relies on that MCE for cooling so that seems like
a highly hypothetical situation. You would need a kernel driver for it
because there is no way to get it in user space even using a mce trigger.

If anything that should be handled through ACPI thermal trip anyways.

I think my point stands that this is not critical.

> I agree, but it sounds like that should be the subject of another
> patch which touches lots of other components.

Sure.

> 
> > The erratum number/part number should be documented and the kernel ought to print
> > why it didn't initialize thermal on that hardware.
> 
> I don't think there's a need for this, because 0x1103 came before
> Barcelona; therefore, we can just consider this as a
> Barcelona-and-later feature.

Ok, but that needs to be documented somewhere. Otherwise people will
eventually find out and then require lots of research to find this out.

e.g. if you had a high level comment that says

/* AMD Fam10h supports thermal throttling. When such a event happens we do 
 * .... because of X Y Z. Implement this in the following code.
 * We don't do it on K8 due to crippling errata.
 */

it would have been all clear.

> 
> > You're technically racy against parallel cpu hotplug happening from initrd
> > (which already runs during initcall -- yes that is a deathtrap)
> > although that is typically hard to fix.
> 
> Can you elaborate?  I'm assuming that I still need to fix this in
> order to get the patch accepted, no?

initrd user space can run in parallel with __initcall and in theory
it could trigger CPU hotplug events (and do lots of other things too) 

It's quite unexpected and regularly causes bugs.

A lot of subsystems are racy this way.  Also it's not easily fixable because 
of locking problems in the cpu hotplug subsystem.

I just mentioned it for completeness; fixing it is probably beyond
scope of your patch and not a merging requirement.

> 
> > thermal_apic_init_allowed seems like a hack. A separate notifier would
> > be cleaner.
> 
> A variable is always lighter than a notifier.  I'm just trying to make

Lighter but still a hack.  I don't insist on it, it just makes
your code uglier than it could be.

> sure that if a CPU comes online before I'm ready to hook the thermal
> interrupt, that it does not attempt to do so prematurely.  I'm not
> sure what a notifier would do, other than set
> thermal_apic_init_allowed anyway.
> 
> > PCI is already initialized before normal initcall, otherwise pretty much
> > all drivers would break.
> 
> I'll change the comment.  I still want the convenience of a process
> context, so I plan to keep the late_initcall().

All __initcalls are in process context. late etc. just changes the
ordering inside them.

> 
> > mce_thermal.c seems to be just unnecessary to me. It would be cleaner
> > to just keep the separate own handlers, especially since they do quite
> > different things anyways. Don't mesh together what is quite different.
> 
> As I mentioned to Andrew Morton, this is not easily avoidable without
> some gross runtime CPUID hack.  Specifically, how do you handle a
> kernel build which supports both AMD and Intel thermal throttling,
> wherein you don't know which CPU -- if either -- is present until
> runtime?  The root of the problem is that both architectures share the
> same APIC vector, but implement throttling in incompatible ways.

You set different handlers depending on the CPU type.

> 
> > Also "cpu_specific_smp_thermal_interrupt_callback_t" is definitely too long.
> 
> I'll delete some characters to make it more obscure and Linux-like.

If you just set different handlers you don't need it at all.

-Andi

  reply	other threads:[~2008-01-02 19:58 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-17 18:54 [PATCH] AMD Thermal Interrupt Support Russell Leidich
2007-12-25 22:04 ` Andrew Morton
2007-12-27 18:57   ` Russell Leidich
2007-12-28  7:34     ` Andrew Morton
2007-12-28 20:40       ` Russell Leidich
2007-12-29  2:11         ` Andi Kleen
2007-12-29  2:30           ` Valdis.Kletnieks
2007-12-29  2:34             ` Andi Kleen
2007-12-29  2:57               ` Valdis.Kletnieks
2007-12-30 18:39         ` Andi Kleen
2008-01-02 19:43           ` Russell Leidich
2008-01-02 20:00             ` Andi Kleen [this message]
2008-01-02 21:12               ` Russell Leidich
2008-01-02 21:33                 ` Torsten Kaiser
2008-01-02 21:50                   ` Russell Leidich
2008-01-02 21:54                     ` Andi Kleen
2008-01-02 22:32                       ` Russell Leidich
2008-01-04 21:33                       ` Russell Leidich
2008-01-04 22:26                         ` Andi Kleen
2008-01-05  0:53                           ` Russell Leidich
2008-01-05 13:24                             ` Andi Kleen
2008-01-05 20:08                               ` Russell Leidich
2008-01-08 23:42                                 ` Russell Leidich
2008-01-08 23:52                                   ` Andi Kleen
2008-01-09  2:28                                     ` Russell Leidich
2008-01-09  2:37                                       ` Andi Kleen
2008-01-11  2:21                                         ` Russell Leidich
2008-01-18  1:06                                           ` Russell Leidich
2008-02-03  0:10                                             ` Andrew Morton
2008-02-03  0:27                                               ` Harvey Harrison
2008-02-03  0:39                                                 ` Andrew Morton
2008-02-03  0:50                                                   ` [PATCH] x86: Remove pt_regs arg from smp_thermal_interrupt Harvey Harrison
2008-02-03  1:01                                                     ` Harvey Harrison
2008-02-04  7:12                                                       ` Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2007-12-11 18:44 [PATCH] AMD Thermal Interrupt Support Russell Leidich
2007-12-11 19:13 ` Russell Leidich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080102200039.GA13784@one.firstfloor.org \
    --to=andi@firstfloor.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rml@google.com \
    --cc=tglx@linutronix.de \
    --cc=valdis.kletnieks@vt.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox