public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
To: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@elte.hu>,
	Huang Ying <ying.huang@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org
Subject: 32bit mce unification (Re: Re-implement MCE log ring buffer as per-CPU ring buffer II)
Date: Thu, 30 Apr 2009 16:38:12 +0900	[thread overview]
Message-ID: <49F95564.5020402@jp.fujitsu.com> (raw)
In-Reply-To: <49F8A734.8050106@linux.intel.com>

Hi Andi,

Andi Kleen wrote:
> H. Peter Anvin wrote:
>> Andi Kleen wrote:
>>> And then restart here? Add 32bit workarounds, add hook
>>> to initialize p5/winchip, make 64bit code 32bit clean.
>>> That can come straight from my patchkit. I can do that
>>> if there is interest.
>>>
>>
>> Could you come up with a tree we can look at?
> 
> Sorry for taking longer, had to fight some infrastructure issues.
> 
> I merged the two trees in
> git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6.git
> mce-32bit-merge
> 
> This is my old 32bit mce unification tree forward ported to mce2 and
> some fixes
> for the code in mce2. This is currently with Kconfig for old/new,
> deprecation
> of old for .32 (could be done directly too, but doing it with this
> additional
> step seems safer to me because it will allow easier debugging)
> 
> This currently includes the two error injection patches as last (used
> for testing), those can be dropped/added later of course too.
> 
> If this is ok I can respin the rest of the mce patches on top of that
> tree.

Thank you for providing updates.
They are looks good, but I have some comments:


We can get an instant numbered list from your tree by following:

$ git format-patch origin/master..origin/mce-32bit-merge
0001-x86-mce-Add-mce_threshold-option-for-intel-cmci.patch
0002-x86-mce-Cleanup-param-parser.patch
0003-x86-mce-Add-mce-nopoll-option-to-disable-timer-pol.patch
0004-x86-mce-clean-up-the-mce_64.c-code.patch
0005-x86-mce-clean-up-the-sysfs-variables-namespace.patch
0006-x86-mce-clean-up-mce_amd_64.c.patch
0007-x86-mce-clean-up-p4.c.patch
0008-x86-mce-clean-up-therm_throt.c.patch
0009-x86-mce-clean-up-p6.c.patch
0010-x86-mce-clean-up-k7.c.patch
0011-x86-mce-clean-up-non-fatal.c.patch
0012-x86-mce-clean-up-mce_32.c.patch
0013-x86-mce-clean-up-p5.c.patch
0014-x86-mce-clean-up-winchip.c.patch
0015-x86-mce-clean-up-mce.h.patch
0016-mce-unify-Intel-thermal-init-prepare.patch
0017-x86-mce-unify-the-Intel-thermal-interrupt-code.patch
0018-x86-mce-clean-up-mce_intel.c.patch
0019-x86-mce-clean-up-mce_64.c.patch
0020-x86-mce-prepare-mce.h-and-mce_64.c-for-unification.patch
0021-x86-mce-prepare-unification.patch
0022-x86-mce-unify-prepare-for-32-bit.patch
0023-x86-mce-unify.patch
0024-x86-mce-print-number-of-MCE-banks.patch
0025-x86-MCE-Make-mce_amd_64.c-compile-again.patch
0026-Revert-x86-mce-Add-mce-nopoll-option-to-disable-t.patch
0027-x86-MCE-Move-ifdef-X86_64-to-the-beginning-of-the.patch
0028-x86-MCE-Initial-steps-to-make-64bit-mce-code-32bit.patch
0029-x86-MCE-Implement-the-PPro-bank-0-quirk-in-the-64b.patch
0030-x86-MCE-Port-K7-bank-0-quirk-to-64bit-mce-code.patch
0031-x86-MCE-Use-a-call-vector-to-call-the-64bit-mce-ha.patch
0032-x86-MCE-Rename-64bit-mce_dont_init-to-mce_disabled.patch
0033-x86-MCE-Move-mce_disabled-option-into-common-64bit.patch
0034-x86-MCE-Remove-machine-check-handler-idle-notify-o.patch
0035-x86-MCE-Remove-oops_begin-use-in-64bit-machine-c.patch
0036-x86-MCE-Remove-unused-stop-restart_mce-on-32bit.patch
0037-x86-MCE-Use-64bit-machine-check-code-on-32bit.patch
0038-x86-MCE-Deprecate-old-32bit-machine-check-code.patch
0039-x86-MCE-Enable-MCE_INTEL-for-32bit-new-MCE.patch
0040-x86-MCE-Enable-MCE_AMD-for-32bit-NEW_MCE.patch
0041-x86-MCE-Document-new-32bit-mcelog-requirement-in-D.patch
0042-x86-MCE-Add-MSR-read-wrappers-for-easier-error-inj.patch
0043-x86-MCE-Add-basic-error-injection-infrastructure.patch


Let's start comment/review on this list:

0001-x86-mce-Add-mce_threshold-option-for-intel-cmci.patch
0002-x86-mce-Cleanup-param-parser.patch
0003-x86-mce-Add-mce-nopoll-option-to-disable-timer-pol.patch

  These 3 above are merged in tip/x86/mce2, but not pushed into
  mainline yet.
  You(Andi) and I agreed that 0001 and 0003 should be reverted and
  re-implemented before pushing them into .31, and I have patches
  to do it.
  One of revert patches is placed in 0026.  It's fine but I'd like to
  replace it with that I own, which have more detailed description.

0004-x86-mce-clean-up-the-mce_64.c-code.patch
0005-x86-mce-clean-up-the-sysfs-variables-namespace.patch
0006-x86-mce-clean-up-mce_amd_64.c.patch
0007-x86-mce-clean-up-p4.c.patch
0008-x86-mce-clean-up-therm_throt.c.patch
0009-x86-mce-clean-up-p6.c.patch
0010-x86-mce-clean-up-k7.c.patch
0011-x86-mce-clean-up-non-fatal.c.patch
0012-x86-mce-clean-up-mce_32.c.patch
0013-x86-mce-clean-up-p5.c.patch
0014-x86-mce-clean-up-winchip.c.patch
0015-x86-mce-clean-up-mce.h.patch
0016-mce-unify-Intel-thermal-init-prepare.patch
0017-x86-mce-unify-the-Intel-thermal-interrupt-code.patch
0018-x86-mce-clean-up-mce_intel.c.patch
0019-x86-mce-clean-up-mce_64.c.patch
0020-x86-mce-prepare-mce.h-and-mce_64.c-for-unification.patch
0021-x86-mce-prepare-unification.patch
0022-x86-mce-unify-prepare-for-32-bit.patch
0023-x86-mce-unify.patch
0024-x86-mce-print-number-of-MCE-banks.patch

  These 21 patches above are now top of tip/x86/mce2.
  Some are broken and some following patches from Andi are
  addressed to fix it.

0025-x86-MCE-Make-mce_amd_64.c-compile-again.patch

  The description of this patch is "It's device_mce, not mce_dev."
  This is for a bug of 0005, which was intended to rename device_mce
  to mce_dev, but the rename was not finished.
  I suppose "mce_" prefix is better for name space, so I think
  rework on 0005 or incremental fix to finish renaming will be more
  better solution.

0026-Revert-x86-mce-Add-mce-nopoll-option-to-disable-t.patch

  This reverts 0003.  I'd like to replace this with that I own.

0027-x86-MCE-Move-ifdef-X86_64-to-the-beginning-of-the.patch

  This is for a bug of 0022, that places ifdefs wrongly.


0028-x86-MCE-Initial-steps-to-make-64bit-mce-code-32bit.patch
0029-x86-MCE-Implement-the-PPro-bank-0-quirk-in-the-64b.patch
0030-x86-MCE-Port-K7-bank-0-quirk-to-64bit-mce-code.patch
0031-x86-MCE-Use-a-call-vector-to-call-the-64bit-mce-ha.patch
0032-x86-MCE-Rename-64bit-mce_dont_init-to-mce_disabled.patch
0033-x86-MCE-Move-mce_disabled-option-into-common-64bit.patch
0034-x86-MCE-Remove-machine-check-handler-idle-notify-o.patch
0035-x86-MCE-Remove-oops_begin-use-in-64bit-machine-c.patch
0036-x86-MCE-Remove-unused-stop-restart_mce-on-32bit.patch
0037-x86-MCE-Use-64bit-machine-check-code-on-32bit.patch
0038-x86-MCE-Deprecate-old-32bit-machine-check-code.patch
0039-x86-MCE-Enable-MCE_INTEL-for-32bit-new-MCE.patch
0040-x86-MCE-Enable-MCE_AMD-for-32bit-NEW_MCE.patch
0041-x86-MCE-Document-new-32bit-mcelog-requirement-in-D.patch
0042-x86-MCE-Add-MSR-read-wrappers-for-easier-error-inj.patch
0043-x86-MCE-Add-basic-error-injection-infrastructure.patch

  These 16 patches above is 32bit mce unification by Andi.
  0028~0036 are preparation and cleanup,
  0037~0041 are main part of unification, and
  0042~0043 are for error injection.
  My comments are as followings:


  0029,0037,0038,0041:
  have space before tab in indent / trailing whitespace.


  0038:
  The change in the first line is not required, and we need to schedule
  feature removal not so soon.
  Since the patch description is "Schedule for removal in 2.6.32",
  the "When:" should be .32 instead of .31

  >@@ -1,4 +1,4 @@
  >-The following is a list of files and features that are going to be
  >+he following is a list of files and features that are going to be
  > removed in the kernel source tree.  Every entry should contain what
  > exactly is going away, why it is happening, and who is going to be doing
  > the work.  When the feature is removed from the kernel, it should also
  >@@ -428,3 +428,13 @@
  >        After a reasonable transition period, we will remove the legacy
  >        fakephp interface.
  > Who:   Alex Chiang <achiang@hp.com>
  >+
  >+----------------------------
  >+
  >+What:  CONFIG_X86_OLD_MCE
  >+When:  2.6.31
  >+Why:   Remove the old legacy 32bit machine check code. This has been
  >+       superseded by the newer machine check code from the 64bit port,
  >+       but the old version has been kept around for easier testing. Note this
  >+        doesn't impact the old P5 and WinChip machine check handlers.
  >+Who:   Andi Kleen <andi@firstfloor.org>


  0039:
  The comment is wrongly changed, and it would be better to place vectors
  in sorted order.

  >@@ -88,12 +88,13 @@
  > #define THERMAL_APIC_VECTOR            0xfa
  >
  > #ifdef CONFIG_X86_32
  >-/* 0xf8 - 0xf9 : free */
  >+/* 0xf9 : free */
  > #else
  >-# define THRESHOLD_APIC_VECTOR         0xf9
  > # define UV_BAU_MESSAGE                        0xf8
  > #endif
  >
  >+#define THRESHOLD_APIC_VECTOR          0xf9
  >+
  > /* f0-f7 used for spreading out TLB flushes: */
  > #define INVALIDATE_TLB_VECTOR_END      0xf7
  > #define INVALIDATE_TLB_VECTOR_START    0xf0


  0041:
  Is this version correct?
  The latest (git://git.kernel.org/pub/scm/utils/cpu/mce/mcelog.git) is:
  #define MCELOG_VERSION "0.8pre2"

  >@@ -48,6 +48,7 @@ o  procps                 3.2.0                   # ps --version
  > o  oprofile               0.9                     # oprofiled --version
  > o  udev                   081                     # udevinfo -V
  > o  grub                   0.93                    # grub --version
  >+o  mcelog                0.6
  >
  > Kernel compilation
  > ==================


  0043:
  From checkpatch:
  WARNING: usage of NR_CPUS is often wrong - consider using cpu_possible(), num_possible_cpus(), for_each_possible_cpu(), etc
  #174: FILE: arch/x86/kernel/cpu/mcheck/mce-inject.c:96:
  +       if (m.cpu >= NR_CPUS || !cpu_online(m.cpu))

  Plus, with CONFIG_X86_MCE_INJECT=m, I got:
    ERROR: "add_timer_on" [arch/x86/kernel/cpu/mcheck/mce-inject.ko] undefined!
  IIRC there was a patch named "Export add_timer_on for modules" for
  error injection. (Or how about using bool instead of tristate if there
  are no strong reason to make it kernel module?)

  >@@ -823,6 +823,14 @@ config X86_MCE_THRESHOLD
  >        bool
  >        default y
  >
  >+config X86_MCE_INJECT
  >+       depends on X86_NEW_MCE
  >+       tristate "Machine check injector support"
  >+       help
  >+         Provide support for injecting machine checks for testing purposes.
  >+         If you don't know what a machine check is and you don't do kernel
  >+         QA it is safe to say n.
  >+
  > config X86_MCE_NONFATAL
  >        tristate "Check for non-fatal errors on AMD Athlon/Duron / Intel Pentium 4"
  >        depends on X86_OLD_MCE


Thanks,
H.Seto


  reply	other threads:[~2009-04-30  7:39 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-04-22  9:11 Re-implement MCE log ring buffer as per-CPU ring buffer Huang Ying
2009-04-22  9:22 ` Ingo Molnar
2009-04-22 10:16   ` Robert Richter
2009-04-22 10:19     ` Ingo Molnar
2009-04-22 11:35       ` Andi Kleen
2009-04-22 11:30     ` Andi Kleen
2009-04-24  6:06     ` Huang Ying
2009-04-24 10:09       ` Robert Richter
2009-04-24 13:36         ` Steven Rostedt
2009-04-27  7:49           ` Huang Ying
2009-04-27  7:53             ` Andi Kleen
2009-04-27 14:39               ` Steven Rostedt
2009-04-27 14:44                 ` Davide Libenzi
2009-04-27 14:57                   ` Steven Rostedt
2009-04-27 16:14                   ` Valdis.Kletnieks
2009-04-27 17:05                 ` Theodore Tso
2009-04-27 20:06                   ` H. Peter Anvin
2009-04-27 14:27             ` Steven Rostedt
2009-04-27 14:36               ` Patenting kernel patches was " Andi Kleen
2009-04-27 14:49                 ` Steven Rostedt
2009-04-27 15:33                 ` Alan Cox
2009-04-27  0:58         ` Huang Ying
2009-04-22 11:20   ` Andi Kleen
2009-04-22 15:55     ` H. Peter Anvin
2009-04-22 15:58       ` Andi Kleen
2009-04-29 19:15       ` Re-implement MCE log ring buffer as per-CPU ring buffer II Andi Kleen
2009-04-30  7:38         ` Hidetoshi Seto [this message]
2009-04-22 11:11 ` Re-implement MCE log ring buffer as per-CPU ring buffer Andi Kleen
2009-04-24  5:51   ` Huang Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=49F95564.5020402@jp.fujitsu.com \
    --to=seto.hidetoshi@jp.fujitsu.com \
    --cc=ak@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=tglx@linutronix.de \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox