Re: crashme fault - Randy Dunlap

From: Randy Dunlap <randy.dunlap@oracle.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>,
	lkml <linux-kernel@vger.kernel.org>, Andi Kleen <ak@suse.de>
Subject: Re: crashme fault
Date: Sat, 15 Sep 2007 12:53:39 -0700
Message-ID: <46EC3843.8010405@oracle.com>
In-Reply-To: <alpine.LFD.0.999.0709151224400.16478@woody.linux-foundation.org>

Linus Torvalds wrote:
> 
> On Sat, 15 Sep 2007, Randy Dunlap wrote:
>> Had another on recent last night (probably not helpful):
> 
> At least the original "crashme" would write its random number seeds to a 
> logfile each time (and I made it fsync it in some versions), which meant 
> that once a crash happened, you could re-produce it immediately (if it was 
> reproducible at all, of course).
> 
> Does your crashme have something like that?

I tell it the "random" seed to use.  I can also set its debug level,
but when I did that yesterday, it never faulted, so I lowered it again,
then boom.  Could be coincidence.
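
For reference, the seed logging described in the quoted text above (write
the seed to a logfile and fsync it before running) could look roughly like
the sketch below.  This is not taken from any crashme version; the function
name log_seed(), the "seed=<n>" line format and the log path are all
assumptions made for illustration.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from crashme): append "seed=<n>\n" to the log
 * at `path` and fsync it before returning, so the seed is on stable
 * storage even if the machine crashes right afterwards.
 * Returns 0 on success, -1 on any error.
 */
int log_seed(const char *path, unsigned long seed)
{
    char buf[64];
    int len = snprintf(buf, sizeof(buf), "seed=%lu\n", seed);
    if (len < 0 || (size_t)len >= sizeof(buf))
        return -1;

    /* O_APPEND keeps earlier seeds, so the last line is the latest run. */
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    int rc = 0;
    if (write(fd, buf, (size_t)len) != (ssize_t)len)
        rc = -1;
    else if (fsync(fd) != 0)    /* force the data out before the test runs */
        rc = -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A test harness would call something like log_seed("crashme.log", seed)
just before starting each run, then read the last line of the log after a
crash to get the seed to replay.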


> All your crashes look basically identical - I don't think there is 
> anything new in this one, they're all the same issue. What CPU do you have 
> - vendor, stepping, version etc - and has something else than the kernel 
> changed in your setup lately?

Just kernel changes.  CPU is dual Pentium Xeon + HT:

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      :                   Intel(R) Xeon(TM) CPU 3.40GHz
stepping        : 4
cpu MHz         : 3400.227
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est tm2 cid xtpr
bogomips        : 6805.96
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      :                   Intel(R) Xeon(TM) CPU 3.40GHz
stepping        : 4
cpu MHz         : 3400.227
cache size      : 1024 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est tm2 cid xtpr
bogomips        : 6800.28
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      :                   Intel(R) Xeon(TM) CPU 3.40GHz
stepping        : 4
cpu MHz         : 3400.227
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est tm2 cid xtpr
bogomips        : 6800.72
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 15
model           : 3
model name      :                   Intel(R) Xeon(TM) CPU 3.40GHz
stepping        : 4
cpu MHz         : 3400.227
cache size      : 1024 KB
physical id     : 3
siblings        : 2
core id         : 0
cpu cores       : 1
fpu             : yes
fpu_exception   : yes
cpuid level     : 5
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall lm constant_tsc pni monitor ds_cpl est tm2 cid xtpr
bogomips        : 6800.57
clflush size    : 64
cache_alignment : 128
address sizes   : 36 bits physical, 48 bits virtual
power management:



> As mentioned, the crash does look like a user-level crash got reported as 
> a kernel page fault, and while a CPU bug sounds incredibly unlikely, this 
> does have the smell of something strange like a fault in the middle of an 
> "iretq" or "sysretq", where part of the CPU state has already been 
> restored - which would explain why rip/cs is user space - but some part of 
> the CPU is still in kernel mode - which would explain the incorrect page 
> fault error code.
> 
> Here's a really *stupid* patch (and untested too, btw) to see if it gets 
> easier to debug when you don't oops, just print the register state 
> instead.

Will add this patch.

> (It might be interesting to also do something like
> 
> 	force_sig_specific(SIGSTOP, current);
> 
> to then be able to more easily attach to the process that had problems, 
> and debug it in user space to see what was going on..)
> 
> 		Linus
> ---
> diff --git a/arch/x86_64/mm/fault.c b/arch/x86_64/mm/fault.c
> index 327c9f2..1b81392 100644
> --- a/arch/x86_64/mm/fault.c
> +++ b/arch/x86_64/mm/fault.c
> @@ -320,6 +320,11 @@ asmlinkage void __kprobes do_page_fault(struct pt_regs *regs,
>  
>  	info.si_code = SEGV_MAPERR;
>  
> +	if (!(error_code & PF_USER) && user_mode(regs)) {
> +		printk("kernel mode page fault from user space? Huh?\n");
> +		__show_regs(regs);
> +		error_code |= PF_USER;
> +	}
>  
>  	/*
>  	 * We fault-in kernel-space virtual memory on-demand. The

