[PATCH v3] x86,mm: print likely CPU at segfault time

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3] x86,mm: print likely CPU at segfault time
@ 2022-08-05 14:16 Rik van Riel
  2022-08-05 14:27 ` Borislav Petkov
  2022-08-24 11:03 ` [tip: x86/cpu] x86/mm: Print " tip-bot2 for Rik van Riel
  0 siblings, 2 replies; 7+ messages in thread
From: Rik van Riel @ 2022-08-05 14:16 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: x86, linux-kernel, kernel-team, Borislav Petkov, Dave Hansen

In a large enough fleet of computers, it is common to have a few bad CPUs.
Those can often be identified by seeing that some commonly run kernel code,
which runs fine everywhere else, keeps crashing on the same CPU core on one
particular bad system.

However, the failure modes in CPUs that have gone bad over the years are
often oddly specific, and the only bad behavior seen might be segfaults
in programs like bash, python, or various system daemons that run fine
everywhere else.

Add a printk() to show_signal_msg() to print the CPU, core, and socket
at segfault time. This is not perfect, since the task might get rescheduled
on another CPU between when the fault hit, and when the message is printed,
but in practice this has been good enough to help us identify several bad
CPU cores.

segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in segfault[401000+1000] on CPU 0 (core 0, socket 0)

This printk can be controlled through /proc/sys/debug/exception-trace

Signed-off-by: Rik van Riel <riel@surriel.com>
CC: Dave Jones <dsj@fb.com>
---
v3: READ_ONCE around raw_smp_processor_id() does not work, lets just omit that
    instead of making the code harder to read

 arch/x86/mm/fault.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fad8faa29d04..c7a5bbf40367 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address, struct task_struct *tsk)
 {
 	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
+	/* This is a racy snapshot, but it's better than nothing. */
+	int cpu = raw_smp_processor_id();
 
 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
@@ -782,6 +784,14 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 
 	print_vma_addr(KERN_CONT " in ", regs->ip);
 
+	/*
+	 * Dump the likely CPU where the fatal segfault happened.
+	 * This can help identify faulty hardware.
+	 */
+	printk(KERN_CONT " on CPU %d (core %d, socket %d)", cpu,
+	       topology_core_id(cpu), topology_physical_package_id(cpu));
+
+
 	printk(KERN_CONT "\n");
 
 	show_opcodes(regs, loglvl);
-- 
2.37.1



^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] x86,mm: print likely CPU at segfault time
  2022-08-05 14:16 [PATCH v3] x86,mm: print likely CPU at segfault time Rik van Riel
@ 2022-08-05 14:27 ` Borislav Petkov
  2022-08-05 14:40   ` Rik van Riel
  2022-08-05 17:09   ` Ira Weiny
  2022-08-24 11:03 ` [tip: x86/cpu] x86/mm: Print " tip-bot2 for Rik van Riel
  1 sibling, 2 replies; 7+ messages in thread
From: Borislav Petkov @ 2022-08-05 14:27 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Ingo Molnar, x86, linux-kernel, kernel-team, Dave Hansen

On Fri, Aug 05, 2022 at 10:16:44AM -0400, Rik van Riel wrote:
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index fad8faa29d04..c7a5bbf40367 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
>  		unsigned long address, struct task_struct *tsk)
>  {
>  	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
> +	/* This is a racy snapshot, but it's better than nothing. */
> +	int cpu = raw_smp_processor_id();

Please read this in exc_page_fault() and hand it down to helpers.

Alternatively, I'm being told there's a patchset in the works which
will allow for any exception handler to pass in additional information
downwards through an extended pt_regs. Then, saving the CPU number on
which the handler is running would work generically everywhere.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] x86,mm: print likely CPU at segfault time
  2022-08-05 14:27 ` Borislav Petkov
@ 2022-08-05 14:40   ` Rik van Riel
  2022-08-06  8:47     ` Ingo Molnar
  2022-08-05 17:09   ` Ira Weiny
  1 sibling, 1 reply; 7+ messages in thread
From: Rik van Riel @ 2022-08-05 14:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Ingo Molnar, x86, linux-kernel, kernel-team, Dave Hansen,
	Thomas Gleixner, Andy Lutomirski

On Fri, 5 Aug 2022 16:27:40 +0200
Borislav Petkov <bp@alien8.de> wrote:

> On Fri, Aug 05, 2022 at 10:16:44AM -0400, Rik van Riel wrote:
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index fad8faa29d04..c7a5bbf40367 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> >  		unsigned long address, struct task_struct *tsk)
> >  {
> >  	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
> > +	/* This is a racy snapshot, but it's better than nothing. */
> > +	int cpu = raw_smp_processor_id();  
> 
> Please read this in exc_page_fault() and hand it down to helpers.

Below is the change that implements your suggestion.

If there is consensus among the x86 maintainers that this is
desirable, I am more than happy to merge that change into my
patch and resubmit v4.

I don't have a strong opinion either way.

---8<---

From 444f8588f0edfd8586a86e85191ad8fa8b7c6a6c Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel@surriel.com>
Date: Fri, 5 Aug 2022 10:32:11 -0400
Subject: [PATCH 2/2] x86,mm: get CPU number for segfault printk before
 enabling preemption

Get the CPU number for the segfault printk earlier in the page fault
handler, before preemption is enabled.

Suggested-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 arch/x86/mm/fault.c | 58 +++++++++++++++++++++++----------------------
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index c7a5bbf40367..bd06b22826b2 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -766,11 +766,9 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
  */
 static inline void
 show_signal_msg(struct pt_regs *regs, unsigned long error_code,
-		unsigned long address, struct task_struct *tsk)
+		unsigned long address, struct task_struct *tsk, int cpu)
 {
 	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
-	/* This is a racy snapshot, but it's better than nothing. */
-	int cpu = raw_smp_processor_id();
 
 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
@@ -808,7 +806,7 @@ static bool is_vsyscall_vaddr(unsigned long vaddr)
 
 static void
 __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
-		       unsigned long address, u32 pkey, int si_code)
+		       unsigned long address, u32 pkey, int si_code, int cpu)
 {
 	struct task_struct *tsk = current;
 
@@ -846,7 +844,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 		return;
 
 	if (likely(show_unhandled_signals))
-		show_signal_msg(regs, error_code, address, tsk);
+		show_signal_msg(regs, error_code, address, tsk, cpu);
 
 	set_signal_archinfo(address, error_code);
 
@@ -860,14 +858,14 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
 
 static noinline void
 bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
-		     unsigned long address)
+		     unsigned long address, int cpu)
 {
-	__bad_area_nosemaphore(regs, error_code, address, 0, SEGV_MAPERR);
+	__bad_area_nosemaphore(regs, error_code, address, 0, SEGV_MAPERR, cpu);
 }
 
 static void
 __bad_area(struct pt_regs *regs, unsigned long error_code,
-	   unsigned long address, u32 pkey, int si_code)
+	   unsigned long address, u32 pkey, int si_code, int cpu)
 {
 	struct mm_struct *mm = current->mm;
 	/*
@@ -876,13 +874,14 @@ __bad_area(struct pt_regs *regs, unsigned long error_code,
 	 */
 	mmap_read_unlock(mm);
 
-	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
+	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code, cpu);
 }
 
 static noinline void
-bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
+bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address,
+	 int cpu)
 {
-	__bad_area(regs, error_code, address, 0, SEGV_MAPERR);
+	__bad_area(regs, error_code, address, 0, SEGV_MAPERR, cpu);
 }
 
 static inline bool bad_area_access_from_pkeys(unsigned long error_code,
@@ -904,7 +903,8 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code,
 
 static noinline void
 bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
-		      unsigned long address, struct vm_area_struct *vma)
+		      unsigned long address, struct vm_area_struct *vma,
+		      int cpu)
 {
 	/*
 	 * This OSPKE check is not strictly necessary at runtime.
@@ -934,9 +934,9 @@ bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
 		 */
 		u32 pkey = vma_pkey(vma);
 
-		__bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
+		__bad_area(regs, error_code, address, pkey, SEGV_PKUERR, cpu);
 	} else {
-		__bad_area(regs, error_code, address, 0, SEGV_ACCERR);
+		__bad_area(regs, error_code, address, 0, SEGV_ACCERR, cpu);
 	}
 }
 
@@ -1155,7 +1155,7 @@ bool fault_in_kernel_space(unsigned long address)
  */
 static void
 do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
-		   unsigned long address)
+		   unsigned long address, int cpu)
 {
 	/*
 	 * Protection keys exceptions only happen on user pages.  We
@@ -1214,7 +1214,7 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
 	 * Don't take the mm semaphore here. If we fixup a prefetch
 	 * fault we could otherwise deadlock:
 	 */
-	bad_area_nosemaphore(regs, hw_error_code, address);
+	bad_area_nosemaphore(regs, hw_error_code, address, cpu);
 }
 NOKPROBE_SYMBOL(do_kern_addr_fault);
 
@@ -1229,7 +1229,8 @@ NOKPROBE_SYMBOL(do_kern_addr_fault);
 static inline
 void do_user_addr_fault(struct pt_regs *regs,
 			unsigned long error_code,
-			unsigned long address)
+			unsigned long address,
+			int cpu)
 {
 	struct vm_area_struct *vma;
 	struct task_struct *tsk;
@@ -1289,7 +1290,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 * in a region with pagefaults disabled then we must not take the fault
 	 */
 	if (unlikely(faulthandler_disabled() || !mm)) {
-		bad_area_nosemaphore(regs, error_code, address);
+		bad_area_nosemaphore(regs, error_code, address, cpu);
 		return;
 	}
 
@@ -1351,7 +1352,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			 * Fault from code in kernel from
 			 * which we do not expect faults.
 			 */
-			bad_area_nosemaphore(regs, error_code, address);
+			bad_area_nosemaphore(regs, error_code, address, cpu);
 			return;
 		}
 retry:
@@ -1367,17 +1368,17 @@ void do_user_addr_fault(struct pt_regs *regs,
 
 	vma = find_vma(mm, address);
 	if (unlikely(!vma)) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, error_code, address, cpu);
 		return;
 	}
 	if (likely(vma->vm_start <= address))
 		goto good_area;
 	if (unlikely(!(vma->vm_flags & VM_GROWSDOWN))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, error_code, address, cpu);
 		return;
 	}
 	if (unlikely(expand_stack(vma, address))) {
-		bad_area(regs, error_code, address);
+		bad_area(regs, error_code, address, cpu);
 		return;
 	}
 
@@ -1387,7 +1388,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 	 */
 good_area:
 	if (unlikely(access_error(error_code, vma))) {
-		bad_area_access_error(regs, error_code, address, vma);
+		bad_area_access_error(regs, error_code, address, vma, cpu);
 		return;
 	}
 
@@ -1458,7 +1459,7 @@ void do_user_addr_fault(struct pt_regs *regs,
 			     VM_FAULT_HWPOISON_LARGE))
 			do_sigbus(regs, error_code, address, fault);
 		else if (fault & VM_FAULT_SIGSEGV)
-			bad_area_nosemaphore(regs, error_code, address);
+			bad_area_nosemaphore(regs, error_code, address, cpu);
 		else
 			BUG();
 	}
@@ -1480,7 +1481,7 @@ trace_page_fault_entries(struct pt_regs *regs, unsigned long error_code,
 
 static __always_inline void
 handle_page_fault(struct pt_regs *regs, unsigned long error_code,
-			      unsigned long address)
+			      unsigned long address, int cpu)
 {
 	trace_page_fault_entries(regs, error_code, address);
 
@@ -1489,9 +1490,9 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 
 	/* Was the fault on kernel-controlled part of the address space? */
 	if (unlikely(fault_in_kernel_space(address))) {
-		do_kern_addr_fault(regs, error_code, address);
+		do_kern_addr_fault(regs, error_code, address, cpu);
 	} else {
-		do_user_addr_fault(regs, error_code, address);
+		do_user_addr_fault(regs, error_code, address, cpu);
 		/*
 		 * User address page fault handling might have reenabled
 		 * interrupts. Fixing up all potential exit points of
@@ -1506,6 +1507,7 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code,
 DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 {
 	unsigned long address = read_cr2();
+	int cpu = raw_smp_processor_id();
 	irqentry_state_t state;
 
 	prefetchw(&current->mm->mmap_lock);
@@ -1547,7 +1549,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
 	state = irqentry_enter(regs);
 
 	instrumentation_begin();
-	handle_page_fault(regs, error_code, address);
+	handle_page_fault(regs, error_code, address, cpu);
 	instrumentation_end();
 
 	irqentry_exit(regs, state);
-- 
2.37.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] x86,mm: print likely CPU at segfault time
  2022-08-05 14:40   ` Rik van Riel
@ 2022-08-06  8:47     ` Ingo Molnar
  2022-08-06  8:59       ` Ingo Molnar
  0 siblings, 1 reply; 7+ messages in thread
From: Ingo Molnar @ 2022-08-06  8:47 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Borislav Petkov, x86, linux-kernel, kernel-team, Dave Hansen,
	Thomas Gleixner, Andy Lutomirski


* Rik van Riel <riel@surriel.com> wrote:

> On Fri, 5 Aug 2022 16:27:40 +0200
> Borislav Petkov <bp@alien8.de> wrote:
> 
> > On Fri, Aug 05, 2022 at 10:16:44AM -0400, Rik van Riel wrote:
> > > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > > index fad8faa29d04..c7a5bbf40367 100644
> > > --- a/arch/x86/mm/fault.c
> > > +++ b/arch/x86/mm/fault.c
> > > @@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> > >  		unsigned long address, struct task_struct *tsk)
> > >  {
> > >  	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
> > > +	/* This is a racy snapshot, but it's better than nothing. */
> > > +	int cpu = raw_smp_processor_id();  
> > 
> > Please read this in exc_page_fault() and hand it down to helpers.
> 
> Below is the change that implements your suggestion.
> 
> If there is consensus among the x86 maintainers that this is
> desirable, I am more than happy to merge that change into my
> patch and resubmit v4.
> 
> I don't have a strong opinion either way.
> 
> ---8<---
> 
> From 444f8588f0edfd8586a86e85191ad8fa8b7c6a6c Mon Sep 17 00:00:00 2001
> From: Rik van Riel <riel@surriel.com>
> Date: Fri, 5 Aug 2022 10:32:11 -0400
> Subject: [PATCH 2/2] x86,mm: get CPU number for segfault printk before
>  enabling preemption
> 
> Get the CPU number for the segfault printk earlier in the page fault
> handler, before preemption is enabled.
> 
> Suggested-by: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  arch/x86/mm/fault.c | 58 +++++++++++++++++++++++----------------------
>  1 file changed, 30 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index c7a5bbf40367..bd06b22826b2 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -766,11 +766,9 @@ kernelmode_fixup_or_oops(struct pt_regs *regs, unsigned long error_code,
>   */
>  static inline void
>  show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> -		unsigned long address, struct task_struct *tsk)
> +		unsigned long address, struct task_struct *tsk, int cpu)

>  __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code,
> -		       unsigned long address, u32 pkey, int si_code)
> +		       unsigned long address, u32 pkey, int si_code, int cpu)

> -		show_signal_msg(regs, error_code, address, tsk);
> +		show_signal_msg(regs, error_code, address, tsk, cpu);

> -		     unsigned long address)
> +		     unsigned long address, int cpu)

> -	   unsigned long address, u32 pkey, int si_code)
> +	   unsigned long address, u32 pkey, int si_code, int cpu)

> -	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code);
> +	__bad_area_nosemaphore(regs, error_code, address, pkey, si_code, cpu);

> -bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address)
> +bad_area(struct pt_regs *regs, unsigned long error_code, unsigned long address,
> +	 int cpu)
>  {
> -	__bad_area(regs, error_code, address, 0, SEGV_MAPERR);
> +	__bad_area(regs, error_code, address, 0, SEGV_MAPERR, cpu);

>  bad_area_access_error(struct pt_regs *regs, unsigned long error_code,
> -		      unsigned long address, struct vm_area_struct *vma)
> +		      unsigned long address, struct vm_area_struct *vma,
> +		      int cpu)

> -		__bad_area(regs, error_code, address, pkey, SEGV_PKUERR);
> +		__bad_area(regs, error_code, address, pkey, SEGV_PKUERR, cpu);
>  	} else {
> -		__bad_area(regs, error_code, address, 0, SEGV_ACCERR);
> +		__bad_area(regs, error_code, address, 0, SEGV_ACCERR, cpu);

>  static void
>  do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code,
> -		   unsigned long address)
> +		   unsigned long address, int cpu)

> -	bad_area_nosemaphore(regs, hw_error_code, address);
> +	bad_area_nosemaphore(regs, hw_error_code, address, cpu);

>  void do_user_addr_fault(struct pt_regs *regs,
>  			unsigned long error_code,
> -			unsigned long address)
> +			unsigned long address,
> +			int cpu)

> -		bad_area_nosemaphore(regs, error_code, address);
> +		bad_area_nosemaphore(regs, error_code, address, cpu);

> -			bad_area_nosemaphore(regs, error_code, address);
> +			bad_area_nosemaphore(regs, error_code, address, cpu);

> -		bad_area(regs, error_code, address);
> +		bad_area(regs, error_code, address, cpu);

> -		bad_area(regs, error_code, address);
> +		bad_area(regs, error_code, address, cpu);

> -		bad_area(regs, error_code, address);
> +		bad_area(regs, error_code, address, cpu);

> -		bad_area_access_error(regs, error_code, address, vma);
> +		bad_area_access_error(regs, error_code, address, vma, cpu);

> -			bad_area_nosemaphore(regs, error_code, address);
> +			bad_area_nosemaphore(regs, error_code, address, cpu);

> -			      unsigned long address)
> +			      unsigned long address, int cpu)

> -		do_kern_addr_fault(regs, error_code, address);
> +		do_kern_addr_fault(regs, error_code, address, cpu);

> -		do_user_addr_fault(regs, error_code, address);
> +		do_user_addr_fault(regs, error_code, address, cpu);

>  DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
>  {
>  	unsigned long address = read_cr2();
> +	int cpu = raw_smp_processor_id();
>  	irqentry_state_t state;
>  
>  	prefetchw(&current->mm->mmap_lock);
> @@ -1547,7 +1549,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
>  	state = irqentry_enter(regs);
>  
>  	instrumentation_begin();
> -	handle_page_fault(regs, error_code, address);
> +	handle_page_fault(regs, error_code, address, cpu);

Not convinced that this is a good change: this will bloat all the affected 
code by a couple of dozen instructions - for no good reason in the context 
of this patch.

Boris, why should we do this? Extracting a parameter at higher levels and 
passing it down to lower levels is almost always a bad idea from a code 
generation POV, unless the majority of lower levels needs this information 
anyway (which isn't the case here).

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] x86,mm: print likely CPU at segfault time
  2022-08-06  8:47     ` Ingo Molnar
@ 2022-08-06  8:59       ` Ingo Molnar
  0 siblings, 0 replies; 7+ messages in thread
From: Ingo Molnar @ 2022-08-06  8:59 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Borislav Petkov, x86, linux-kernel, kernel-team, Dave Hansen,
	Thomas Gleixner, Andy Lutomirski


* Ingo Molnar <mingo@kernel.org> wrote:

> >  	instrumentation_begin();
> > -	handle_page_fault(regs, error_code, address);
> > +	handle_page_fault(regs, error_code, address, cpu);
> 
> Not convinced that this is a good change: this will bloat all the 
> affected code by a couple of dozen instructions - for no good reason in 
> the context of this patch.
> 
> Boris, why should we do this? Extracting a parameter at higher levels and 
> passing it down to lower levels is almost always a bad idea from a code 
> generation POV, unless the majority of lower levels needs this 
> information anyway (which isn't the case here).

Oh, I just got to this series in my mbox:

  [RFC PATCH 0/5] Print CPU at segfault time
  ...
  [RFC PATCH 5/5] x86/entry: Store CPU info on exception entry

With that basis, printing the segfault CPU becomes a 'free' feature.

At the cost of putting ~2 new instructions into the hotpath of every 
exception though. :-/

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v3] x86,mm: print likely CPU at segfault time
  2022-08-05 14:27 ` Borislav Petkov
  2022-08-05 14:40   ` Rik van Riel
@ 2022-08-05 17:09   ` Ira Weiny
  1 sibling, 0 replies; 7+ messages in thread
From: Ira Weiny @ 2022-08-05 17:09 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rik van Riel, Ingo Molnar, x86, linux-kernel, kernel-team,
	Dave Hansen

On Fri, Aug 05, 2022 at 04:27:40PM +0200, Borislav Petkov wrote:
> On Fri, Aug 05, 2022 at 10:16:44AM -0400, Rik van Riel wrote:
> > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> > index fad8faa29d04..c7a5bbf40367 100644
> > --- a/arch/x86/mm/fault.c
> > +++ b/arch/x86/mm/fault.c
> > @@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
> >  		unsigned long address, struct task_struct *tsk)
> >  {
> >  	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
> > +	/* This is a racy snapshot, but it's better than nothing. */
> > +	int cpu = raw_smp_processor_id();
> 
> Please read this in exc_page_fault() and hand it down to helpers.
> 
> Alternatively, I'm being told there's a patchset in the works which
> will allow for any exception handler to pass in additional information
> downwards through an extended pt_regs. Then, saving the CPU number on
> which the handler is running would work generically everywhere.

Indeed that was part of the PKS series.[1]

I've thrown together a quick RFC with the relevant patches from that series and
Rik's code in show_signal_msg().

I'll post it shortly,
Ira

[1] https://lore.kernel.org/lkml/20220419170649.1022246-18-ira.weiny@intel.com/

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip: x86/cpu] x86/mm: Print likely CPU at segfault time
  2022-08-05 14:16 [PATCH v3] x86,mm: print likely CPU at segfault time Rik van Riel
  2022-08-05 14:27 ` Borislav Petkov
@ 2022-08-24 11:03 ` tip-bot2 for Rik van Riel
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot2 for Rik van Riel @ 2022-08-24 11:03 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Rik van Riel, Borislav Petkov, x86, linux-kernel

The following commit has been merged into the x86/cpu branch of tip:

Commit-ID:     c926087eb38520b268515ae1a842db6db62554cc
Gitweb:        https://git.kernel.org/tip/c926087eb38520b268515ae1a842db6db62554cc
Author:        Rik van Riel <riel@surriel.com>
AuthorDate:    Fri, 05 Aug 2022 10:16:44 -04:00
Committer:     Borislav Petkov <bp@suse.de>
CommitterDate: Wed, 24 Aug 2022 12:48:05 +02:00

x86/mm: Print likely CPU at segfault time

In a large enough fleet of computers, it is common to have a few bad CPUs.
Those can often be identified by seeing that some commonly run kernel code,
which runs fine everywhere else, keeps crashing on the same CPU core on one
particular bad system.

However, the failure modes in CPUs that have gone bad over the years are
often oddly specific, and the only bad behavior seen might be segfaults
in programs like bash, python, or various system daemons that run fine
everywhere else.

Add a printk() to show_signal_msg() to print the CPU, core, and socket
at segfault time.

This is not perfect, since the task might get rescheduled on another
CPU between when the fault hit, and when the message is printed, but in
practice this has been good enough to help people identify several bad
CPU cores.

For example:

  segfault[1349]: segfault at 0 ip 000000000040113a sp 00007ffc6d32e360 error 4 in \
	  segfault[401000+1000] likely on CPU 0 (core 0, socket 0)

This printk can be controlled through /proc/sys/debug/exception-trace.

  [ bp: Massage a bit, add "likely" to the printed line to denote that
    the CPU number is not always reliable. ]

Signed-off-by: Rik van Riel <riel@surriel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220805101644.2e674553@imladris.surriel.com
---
 arch/x86/mm/fault.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index fa71a5d..a498ae1 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -769,6 +769,8 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 		unsigned long address, struct task_struct *tsk)
 {
 	const char *loglvl = task_pid_nr(tsk) > 1 ? KERN_INFO : KERN_EMERG;
+	/* This is a racy snapshot, but it's better than nothing. */
+	int cpu = raw_smp_processor_id();
 
 	if (!unhandled_signal(tsk, SIGSEGV))
 		return;
@@ -782,6 +784,14 @@ show_signal_msg(struct pt_regs *regs, unsigned long error_code,
 
 	print_vma_addr(KERN_CONT " in ", regs->ip);
 
+	/*
+	 * Dump the likely CPU where the fatal segfault happened.
+	 * This can help identify faulty hardware.
+	 */
+	printk(KERN_CONT " likely on CPU %d (core %d, socket %d)", cpu,
+	       topology_core_id(cpu), topology_physical_package_id(cpu));
+
+
 	printk(KERN_CONT "\n");
 
 	show_opcodes(regs, loglvl);

^ permalink raw reply related	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-08-24 11:04 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-05 14:16 [PATCH v3] x86,mm: print likely CPU at segfault time Rik van Riel
2022-08-05 14:27 ` Borislav Petkov
2022-08-05 14:40   ` Rik van Riel
2022-08-06  8:47     ` Ingo Molnar
2022-08-06  8:59       ` Ingo Molnar
2022-08-05 17:09   ` Ira Weiny
2022-08-24 11:03 ` [tip: x86/cpu] x86/mm: Print " tip-bot2 for Rik van Riel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox