All of lore.kernel.org
 help / color / mirror / Atom feed
From: Andi Kleen <andi@firstfloor.org>
To: Russell Leidich <rml@google.com>
Cc: Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@elte.hu>,
	valdis.kletnieks@vt.edu
Subject: Re: [PATCH] AMD Thermal Interrupt Support
Date: Wed, 2 Jan 2008 21:00:39 +0100	[thread overview]
Message-ID: <20080102200039.GA13784@one.firstfloor.org> (raw)
In-Reply-To: <3f1a065b0801021143y5fc15e29r4201ac19bc7c1daa@mail.gmail.com>

On Wed, Jan 02, 2008 at 11:43:08AM -0800, Russell Leidich wrote:
> > > +      */
> > > +     for (nb_num = 0; nb_num < num_k8_northbridges; nb_num++) {
> > > +             if ((k8_northbridges[nb_num]->device) == 0x1103)
> > > +                     goto out;
> > > +     }
> >
> > AFAIK that's all K8s so the code will never work on them.
> 
> Well, yes, this is Barcelona-only at the moment (and in all

Ok, but that was totally unclear to me which suggests it should
be somewhere in the description/changelog and possibly in the comments.

> likelihood, will extend to some future CPUs).  Indeed, as far as my
> testing has determined, there is no stepping of K8s which properly
> implements thermal throttling.  Even Rev F has a crippling erratum.

How about RevG?
> 
> > Same with the rename.
> 
> I disagree here.  The original code was exclusively focussed on
> setting some MCE-related "threshold".  With my changes, it's more
> generic.  "amd_" might not be the most appropriate prefix, but
> "threshold_" certainly is not.

But such changes should be separate.

> 
> > +             printk(KERN_CRIT "CPU 0x%x: Thermal monitoring not "
> > +                     "functional.\n", cpu);
> >
> > Why is that KERN_CRIT? Does not seem that critical to me.
> 
> So what this message really says is:  "The OS cannot hook the thermal
> interrupt because it has been hijacked or misconfigured by the BIOS.
> Therefore, the throttling MCEs which you might naturally expect to see
> on other Barcelona systems will not occur on this one.  Therefore, if
> your cooling policy relies on these MCEs (bad idea, but legal), it

I don't think any Linux relies on that MCE for cooling so that seems like
a highly hypothetical situation. You would need a kernel driver for it
because there is no way to get it in user space even using a mce trigger.

If anything that should be handled through ACPI thermal trip anyways.

I think my point stands that this is not critical.

> I agree, but it sounds like that should be the subject of another
> patch which touches lots of other components.

Sure.

> 
> > The erratum number/part number should be documented and the kernel ought to print
> > why it didn't initialize thermal on that hardware.
> 
> I don't think there's a need for this, because 0x1103 came before
> Barcelona; therefore, we can just consider this as a
> Barcelona-and-later feature.

Ok, but that needs to be documented somewhere. Otherwise people will
eventually find out and then require lots of research to find this out.

e.g. if you had a high level comment that says

/* AMD Fam10h supports thermal throttling. When such a event happens we do 
 * .... because of X Y Z. Implement this in the following code.
 * We don't do it on K8 due to crippling errata.
 */

it would have been all clear.

> 
> > You're technically racy against parallel cpu hotplug happening from initrd
> > (which already runs during initcall -- yes that is a deathtrap)
> > although that is typically hard to fix.
> 
> Can you elaborate?  I'm assuming that I still need to fix this in
> order to get the patch accepted, no?

initrd user space can run in parallel with __initcall and in theory
it could trigger CPU hotplug events (and do lots of other things too) 

It's quite unexpected and regularly causes bugs.

A lot of subsystems are racy this way.  Also it's not easily fixable because 
of locking problems in the cpu hotplug subsystem.

I just mentioned it for completeness; fixing it is probably beyond
scope of your patch and not a merging requirement.

> 
> > thermal_apic_init_allowed seems like a hack. A separate notifier would
> > be cleaner.
> 
> A variable is always lighter than a notifier.  I'm just trying to make

Lighter but still a hack.  I don't insist on it, it just makes
your code uglier than it could be.

> sure that if a CPU comes online before I'm ready to hook the thermal
> interrupt, that it does not attempt to do so prematurely.  I'm not
> sure what a notifier would do, other than set
> thermal_apic_init_allowed anyway.
> 
> > PCI is already initialized before normal initcall, otherwise pretty much
> > all drivers would break.
> 
> I'll change the comment.  I still want the convenience of a process
> context, so I plan to keep the late_initcall().

All __initcalls are in process context. late etc. just changes the
ordering inside them.

> 
> > mce_thermal.c seems to be just unnecessary to me. It would be cleaner
> > to just keep the separate own handlers, especially since they do quite
> > different things anyways. Don't mesh together what is quite different.
> 
> As I mentioned to Andrew Morton, this is not easily avoidable without
> some gross runtime CPUID hack.  Specifically, how do you handle a
> kernel build which supports both AMD and Intel thermal throttling,
> wherein you don't know which CPU -- if either -- is present until
> runtime?  The root of the problem is that both architectures share the
> same APIC vector, but implement throttling in incompatible ways.

You set different handlers depending on the CPU type.

> 
> > Also "cpu_specific_smp_thermal_interrupt_callback_t" is definitely too long.
> 
> I'll delete some characters to make it more obscure and Linux-like.

If you just set different handlers you don't need it at all.

-Andi

  reply	other threads:[~2008-01-02 19:58 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-17 18:54 [PATCH] AMD Thermal Interrupt Support Russell Leidich
2007-12-25 22:04 ` Andrew Morton
2007-12-27 18:57   ` Russell Leidich
2007-12-28  7:34     ` Andrew Morton
2007-12-28 20:40       ` Russell Leidich
2007-12-29  2:11         ` Andi Kleen
2007-12-29  2:30           ` Valdis.Kletnieks
2007-12-29  2:34             ` Andi Kleen
2007-12-29  2:57               ` Valdis.Kletnieks
2007-12-30 18:39         ` Andi Kleen
2008-01-02 19:43           ` Russell Leidich
2008-01-02 20:00             ` Andi Kleen [this message]
2008-01-02 21:12               ` Russell Leidich
2008-01-02 21:33                 ` Torsten Kaiser
2008-01-02 21:50                   ` Russell Leidich
2008-01-02 21:54                     ` Andi Kleen
2008-01-02 22:32                       ` Russell Leidich
2008-01-04 21:33                       ` Russell Leidich
2008-01-04 22:26                         ` Andi Kleen
2008-01-05  0:53                           ` Russell Leidich
2008-01-05 13:24                             ` Andi Kleen
2008-01-05 20:08                               ` Russell Leidich
2008-01-08 23:42                                 ` Russell Leidich
2008-01-08 23:52                                   ` Andi Kleen
2008-01-09  2:28                                     ` Russell Leidich
2008-01-09  2:37                                       ` Andi Kleen
2008-01-11  2:21                                         ` Russell Leidich
2008-01-18  1:06                                           ` Russell Leidich
2008-02-03  0:10                                             ` Andrew Morton
2008-02-03  0:27                                               ` Harvey Harrison
2008-02-03  0:39                                                 ` Andrew Morton
2008-02-03  0:50                                                   ` [PATCH] x86: Remove pt_regs arg from smp_thermal_interrupt Harvey Harrison
2008-02-03  1:01                                                     ` Harvey Harrison
2008-02-04  7:12                                                       ` Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2007-12-11 18:44 [PATCH] AMD Thermal Interrupt Support Russell Leidich
2007-12-11 19:13 ` Russell Leidich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080102200039.GA13784@one.firstfloor.org \
    --to=andi@firstfloor.org \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=rml@google.com \
    --cc=tglx@linutronix.de \
    --cc=valdis.kletnieks@vt.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.