Re: [GIT PULL] x86/mm changes for v4.21

From: "Luck, Tony" <tony.luck@intel.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	linux-kernel@vger.kernel.org,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Thomas Gleixner <tglx@linutronix.de>,
	Rik van Riel <riel@surriel.com>
Subject: Re: [GIT PULL] x86/mm changes for v4.21
Date: Wed, 6 Feb 2019 16:17:42 -0800
Message-ID: <20190207001737.GA32096@agluck-desk>
In-Reply-To: <20181224231106.GA27438@gmail.com>

On Tue, Dec 25, 2018 at 12:11:06AM +0100, Ingo Molnar wrote:
> Peter Zijlstra (9):
>       x86/mm/cpa: Add ARRAY and PAGES_ARRAY selftests
>       x86/mm/cpa: Add __cpa_addr() helper
>       x86/mm/cpa: Make cpa_data::vaddr invariant
>       x86/mm/cpa: Simplify the code after making cpa->vaddr invariant
>       x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation
>       x86/mm/cpa: Make cpa_data::numpages invariant
>       x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function
>       x86/mm/cpa: Better use CLFLUSHOPT
>       x86/mm/cpa: Rename @addrinarray to @numpages

Something in this series from Peter is causing problems with
machine check recovery.  The kernel dies with a #GP fault:

[   93.363295] Disabling lock debugging due to kernel taint
[   93.369700] mce: Uncorrected hardware memory error in user-access at 3fbeeab400
[   93.369709] mce: [Hardware Error]: Machine check events logged
[   93.384415] mce: [Hardware Error]: Machine check events logged
[   93.390973] EDAC MC2: 1 UE memory read error on CPU_SrcID#1_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x3fbeeab offset:0x400 grain:32 -  OVERFLOW recoverable area:DRAM err_code:0001:0090 socket:1 ha:0 channel_mask:1 rank:0)
[   93.413569] Memory failure: 0x3fbeeab: Killing einj_mem_uc:4810 due to hardware memory corruption
[   93.423501] Memory failure: 0x3fbeeab: recovery action for dirty LRU page: Recovered
[   93.432508] general protection fault: 0000 [#1] SMP PTI
[   93.438359] CPU: 11 PID: 0 Comm: swapper/11 Tainted: G   M              4.20.0-rc5+ #13
[   93.447294] Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRBDXSD1.86B.0338.V01.1603162127 03/16/2016
[   93.458869] RIP: 0010:native_flush_tlb_one_user+0x8c/0xa0
[   93.464899] Code: 02 48 8b 44 24 18 65 48 33 04 25 28 00 00 00 75 20 c9 c3 83 c0 01 48 89 7c 24 08 48 89 e1 80 cc 08 0f b7 c0 48 89 04 24 31 c0 <66> 0f 38 82 01 eb d0 e8 78 0e 05 00 0f 1f 84 00 00 00 00 00 0f 1f
[   93.485859] RSP: 0018:ffff99623f2c3f70 EFLAGS: 00010046
[   93.491692] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff99623f2c3f70
[   93.499658] RDX: 2e6b58da00000121 RSI: 0000000000000000 RDI: 7fff9981feeab000
[   93.507623] RBP: ffff99623f2c3f98 R08: 0000000000000002 R09: 0000000000021640
[   93.515587] R10: 000ecaed3e716d58 R11: 0000000000000000 R12: ffffffff84fe9920
[   93.523550] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[   93.531518] FS:  0000000000000000(0000) GS:ffff99623f2c0000(0000) knlGS:0000000000000000
[   93.540551] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   93.546966] CR2: 00005566b2cd5470 CR3: 00000049bee0a006 CR4: 00000000003606e0
[   93.554927] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   93.562892] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   93.570857] Call Trace:
[   93.573593]  <IRQ>
[   93.575846]  ? recalibrate_cpu_khz+0x10/0x10
[   93.580628]  __cpa_flush_tlb+0x2e/0x50
[   93.584830]  flush_smp_call_function_queue+0x35/0xe0
[   93.590390]  smp_call_function_interrupt+0x3a/0xd0
[   93.595740]  call_function_interrupt+0xf/0x20
[   93.600604]  </IRQ>

Because of build errors, bisection couldn't point to a single commit,
but it did narrow the candidates down to:

	There are only 'skip'ped commits left to test.
	The first bad commit could be any of:
	83b4e39146aa70913580966e0f2b78b7c3492760
	935f5839827ef54b53406e80906f7c355eb73c1b
	fe0937b24ff5d7b343b9922201e469f9a6009d9d
	We cannot bisect more!

so (more descriptively):
83b4e39146aa ("x86/mm/cpa: Make cpa_data::numpages invariant")
935f5839827e ("x86/mm/cpa: Optimize cpa_flush_array() TLB invalidation")
fe0937b24ff5 ("x86/mm/cpa: Fold cpa_flush_range() and cpa_flush_array() into a single cpa_flush() function")


If I revert those three (together with the following three from this
merge, because I didn't want to run into more build problems), then
machine check recovery starts working again.

The problem might be a non-canonical address passed down
by the machine check recovery code to switch the page with the error
to uncacheable. Perhaps the refactored code is now using that address in the

	invpcid (%rcx),%rax

instruction that gets the #GP fault?
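
The hypothesis above can be checked against the register dump with a few
lines of arithmetic. What follows is only a sketch, not kernel code: it
assumes 48-bit virtual addresses (4-level paging), and the helper names
is_canonical() and decode_modrm() are made up for illustration. It tests
whether RDI and RCX from the oops are canonical, and decodes the ModRM byte
of the faulting bytes marked <66> in the Code: line to confirm that the
operands are (%rcx) and %rax.

```python
# Sketch only. VA_BITS = 48 is an assumption (4-level paging); the helper
# names below are hypothetical and do not exist in the kernel.

VA_BITS = 48  # assumed virtual address width


def is_canonical(addr, va_bits=VA_BITS):
    """True if bits 63..(va_bits-1) of addr are all equal (sign-extended)."""
    upper = addr >> (va_bits - 1)              # bits 63 down to va_bits-1
    all_ones = (1 << (64 - va_bits + 1)) - 1   # 17 one-bits when va_bits is 48
    return upper == 0 or upper == all_ones


def decode_modrm(byte):
    """Split a ModRM byte into its (mod, reg, rm) fields."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


# Values copied from the register dump in the oops.
RDI = 0x7FFF9981FEEAB000  # address handed to native_flush_tlb_one_user()
RCX = 0xFFFF99623F2C3F70  # memory operand of the invpcid (same as RSP)

# Faulting instruction bytes, starting at the <66> marker in the Code: line.
# 66 0f 38 82 is the INVPCID opcode; the final byte is ModRM.
FAULTING = bytes.fromhex("660f388201")

if __name__ == "__main__":
    print("RDI canonical:", is_canonical(RDI))
    print("RCX canonical:", is_canonical(RCX))
    print("RDI with bit 63 set canonical:", is_canonical(RDI | (1 << 63)))
    # mod=0 means a memory operand without displacement; reg=0 is %rax and
    # rm=1 is %rcx, which matches "invpcid (%rcx),%rax".
    print("ModRM (mod, reg, rm):", decode_modrm(FAULTING[-1]))
```

Under those assumptions, RCX (the pointer to the invpcid descriptor) is a
canonical address, while RDI is not: its bit 63 is clear although bits 62..47
are all set. The same value with bit 63 set would be canonical. That is
consistent with the #GP coming from a non-canonical address in the invpcid
descriptor rather than from the descriptor pointer itself.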

-Tony
