* 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
@ 2008-03-25 23:08 Christian Kujau
2008-03-26 6:33 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: Christian Kujau @ 2008-03-25 23:08 UTC
To: LKML
Hi,
2.6.25-rc6 is a strong beast :)
Another[0] BUG is printed and the box is still alive:
BUG: unable to handle kernel NULL pointer dereference at 00000000
IP: [<c0179114>] __d_lookup+0x94/0x150
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: fuse sha256_generic xt_tcpudp ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_nat_ftp nf_nat nf_conntrack_ftp xt_conntrack nf_conntrack iptable_filter ip_tables ipt_ULOG x_tables nfsd lockd nfs_acl auth_rpcgss exportfs tun sunrpc twofish_i586 twofish_common eeprom w83l785ts asb100 hwmon_vid usb_storage zd1211rw firmware_class mac80211 snd_intel8x0 snd_ac97_codec i2c_nforce2 cfg80211 ac97_bus snd_pcm snd_timer snd soundcore snd_page_alloc i2c_core [last unloaded: fuse]
Pid: 15705, comm: imap Not tainted (2.6.25-rc6 #5)
EIP: 0060:[<c0179114>] EFLAGS: 00010286 CPU: 0
EIP is at __d_lookup+0x94/0x150
EAX: 00000000 EBX: 0006bc44 ECX: 00000001 EDX: d60634e8
ESI: c2020a00 EDI: c56ebf30 EBP: c478ad6c ESP: c56ebd7c
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process imap (pid: 15705, ti=c56eb000 task=e153c000 task.ti=c56eb000)
Stack: 00000002 00000001 c0179080 f4826be0 00000246 c56ebe08 0000000b f66a800b
d60634e8 c56ebe08 0000002f c56ebf30 c56ebe08 c016f388 c56ebe14 f7faff80
c016ee97 01eb3b48 c56ebe08 0000002f c56ebe14 f66a8017 c0170a70 c56ebf30
Call Trace:
[<c0179080>] __d_lookup+0x0/0x150
[<c016f388>] do_lookup+0x28/0x1a0
[<c016ee97>] permission+0xb7/0x120
[<c0170a70>] __link_path_walk+0x140/0xcd0
[<c043f5e4>] _spin_unlock+0x14/0x20
[<c02c3e1a>] _atomic_dec_and_lock+0x2a/0x40
[<c0179855>] dput+0x65/0xf0
[<c017163a>] link_path_walk+0x3a/0xa0
[<c043f5e4>] _spin_unlock+0x14/0x20
[<c01662bb>] get_unused_fd_flags+0xab/0xd0
[<c017189e>] do_path_lookup+0x6e/0x180
[<c0169088>] get_empty_filp+0xa8/0x120
[<c01724b1>] __path_lookup_intent_open+0x51/0xa0
[<c0172590>] path_lookup_open+0x20/0x30
[<c0172686>] open_namei+0x66/0x5f0
[<c01665ae>] do_filp_open+0x2e/0x60
[<c043f5e4>] _spin_unlock+0x14/0x20
[<c01662bb>] get_unused_fd_flags+0xab/0xd0
[<c016662c>] do_sys_open+0x4c/0xe0
[<c01666fc>] sys_open+0x1c/0x20
[<c0102dee>] sysenter_past_esp+0x5f/0xa5
=======================
Code: 53 c0 e8 20 08 fc ff c1 e3 02 8b 14 33 89 54 24 20 8b 44 24 20 85 c0 75 10 eb 51 8b 12 89 54 24 20 8b 44 24 20 85 c0 74 43 8b 02 <0f> 18 00 90 8d 5a d8 39 6b 34 75 e4 8b 7c 24 0c 39 7b 30 75 db
EIP: [<c0179114>] __d_lookup+0x94/0x150 SS:ESP 0068:c56ebd7c
---[ end trace 274145890e21aa9a ]---
I've put some more details (.config, dmesg, some sysrq printouts) on:
http://nerdbynature.de/bits/2.6.25-rc6/Oops_d_lookup/
Please tell me not to worry :)
Christian.
[0] http://lkml.org/lkml/2008/3/23/245
--
BOFH excuse #85:
Windows 95 undocumented "feature"
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Andrew Morton @ 2008-03-26 6:33 UTC
To: Christian Kujau; +Cc: LKML, Markus Rehbach, Rafael J. Wysocki

On Wed, 26 Mar 2008 00:08:48 +0100 (CET) Christian Kujau <lists@nerdbynature.de> wrote:

> Hi,
>
> 2.6.25-rc6 is a strong beast :)
> Another[0] BUG is printed and the box is still alive:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000000
> [rest of oops trace snipped]
>
> I've put some more details (.config, dmesg, some sysrq printouts) on:
> http://nerdbynature.de/bits/2.6.25-rc6/Oops_d_lookup/
>
> Please tell me not to worry :)
> Christian.
>
> [0] http://lkml.org/lkml/2008/3/23/245

Markus reported what looks to be the same thing here:
http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.

I guess you've confirmed that this wasn't a mystery
once-off-on-that-machine.

I can't think what we did to cause this. Were you doing anything unusual
on that machine? I see the fuse module was loaded - was it being used?
Were any oddball (ie: non-ext3 ;)) filesystems being used? etc.
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Rafael J. Wysocki @ 2008-03-26 21:56 UTC
To: Andrew Morton; +Cc: Christian Kujau, LKML, Markus Rehbach

On Wednesday, 26 of March 2008, Andrew Morton wrote:
> On Wed, 26 Mar 2008 00:08:48 +0100 (CET) Christian Kujau <lists@nerdbynature.de> wrote:
>
> > Hi,
> >
> > 2.6.25-rc6 is a strong beast :)
> > Another[0] BUG is printed and the box is still alive:
> >
> > BUG: unable to handle kernel NULL pointer dereference at 00000000
> > [rest of oops trace and report snipped]
>
> Markus reported what looks to be the same thing here:
> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
>
> I guess you've confirmed that this wasn't a mystery
> once-off-on-that-machine.
>
> I can't think what we did to cause this. Were you doing anything unusual
> on that machine? I see the fuse module was loaded - was it being used?
> Were any oddball (ie: non-ext3 ;)) filesystems being used? etc.

Well, we seem to get mm-related traces on x86-32 at random places.

http://www.ussg.iu.edu/hypermail/linux/kernel/0803.3/0782.html for example.

I'm starting to think there's some arch-related mm issue lurking in there.
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Christian Kujau @ 2008-03-26 23:57 UTC
To: Andrew Morton; +Cc: LKML, Markus Rehbach, Rafael J. Wysocki

On Tue, 25 Mar 2008, Andrew Morton wrote:
> Markus reported what looks to be the same thing here:
> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.

Yes, I've found 3 more reports for __d_lookup on kerneloops.org,
first seen for 2.6.25-rc5-git5.

> I can't think what we did to cause this. Were you doing anything unusual
> on that machine?

Well, I was reading mail...and suddenly alpine complained that the imap
server was gone - and indeed "imap" was in the Oops message. But apart
from that, nothing exotic going on.

> I see the fuse module was loaded - was it being used?

No, it's loaded, but it was not in use.

> Were any oddball (ie: non-ext3 ;)) filesystems being used? etc.

There's ext2/3/4, jfs, xfs, reiserfs (not reiser4) - the whole family.
The only oddball coming to mind is zd1211rw with its binary firmware.
But no SMP, no ACPI, no preempt...

Christian.
--
BOFH excuse #90:
Budget cuts
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Thomas Gleixner @ 2008-03-27 15:20 UTC
To: Andrew Morton
Cc: Christian Kujau, LKML, Markus Rehbach, Rafael J. Wysocki, Ingo Molnar

On Tue, 25 Mar 2008, Andrew Morton wrote:
> > Code: 53 c0 e8 20 08 fc ff c1 e3 02 8b 14 33 89 54 24 20 8b 44 24 20 85 c0 75 10 eb 51 8b 12 89 54 24 20 8b 44 24 20 85 c0 74 43 8b 02 <0f> 18 00 90 8d 5a d8 39 6b 34 75 e4 8b 7c 24 0c 39 7b 30 75 db

It faults in a prefetch.

> Markus reported what looks to be the same thing here:
> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.

Same here. And both are AMD X2 early stepping machines.

> I guess you've confirmed that this wasn't a mystery
> once-off-on-that-machine.
>
> I can't think what we did to cause this.

I had a lengthy bug decoding session with Ingo and we found the root
cause:

A dropped workaround for the prefetch bug in early X2s and
Opterons. Patch below.

Thanks,

tglx

--------------->
Subject: x86: fix prefetch workaround
From: Ingo Molnar <mingo@elte.hu>
Date: Thu Mar 27 15:58:28 CET 2008

some early Athlon XP's and Opterons generate bogus faults on prefetch
instructions. The workaround for this regressed over .24 - reinstate it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/mm/fault.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-x86.q/arch/x86/mm/fault.c
===================================================================
--- linux-x86.q.orig/arch/x86/mm/fault.c
+++ linux-x86.q/arch/x86/mm/fault.c
@@ -104,7 +104,8 @@ static int is_prefetch(struct pt_regs *r
 	unsigned char *max_instr;
 
 #ifdef CONFIG_X86_32
-	if (!(__supported_pte_mask & _PAGE_NX))
+	/* Catch an obscure case of prefetch inside an NX page: */
+	if ((__supported_pte_mask & _PAGE_NX) && (error_code & 16))
 		return 0;
 #endif
 
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Ingo Molnar @ 2008-03-27 15:26 UTC
To: Thomas Gleixner
Cc: Andrew Morton, Christian Kujau, LKML, Markus Rehbach, Rafael J. Wysocki

* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Tue, 25 Mar 2008, Andrew Morton wrote:
> > > Code: 53 c0 e8 20 08 fc ff c1 e3 02 8b 14 33 89 54 24 20 8b 44 24 20 85 c0 75 10 eb 51 8b 12 89 54 24 20 8b 44 24 20 85 c0 74 43 8b 02 <0f> 18 00 90 8d 5a d8 39 6b 34 75 e4 8b 7c 24 0c 39 7b 30 75 db
>
> It faults in a prefetch.
>
> > Markus reported what looks to be the same thing here:
> > http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
>
> Same here. And both are AMD X2 early stepping machines.
>
> > I guess you've confirmed that this wasn't a mystery
> > once-off-on-that-machine.
> >
> > I can't think what we did to cause this.
>
> I had a lengthy bug decoding session with Ingo and we found the root
> cause:
>
> A dropped workaround for the prefetch bug in early X2s and
> Opterons. Patch below.

can also be tested by picking up x86.git/latest, which has this patch
included:

http://people.redhat.com/mingo/x86.git/README

Ingo
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Markus Rehbach @ 2008-03-27 18:30 UTC
To: Thomas Gleixner
Cc: Andrew Morton, Christian Kujau, LKML, Rafael J. Wysocki, Ingo Molnar

Thomas Gleixner wrote:
> On Tue, 25 Mar 2008, Andrew Morton wrote:

>> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
>
> Same here. And both are AMD X2 early stepping machines.

> A dropped workaround for the prefetch bug in early X2s and
> Opterons. Patch below.

The patch cures it. Tested with rc5-git5, and it was 100%
reproducible here.

Markus
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Thomas Gleixner @ 2008-03-27 19:26 UTC
To: Markus Rehbach
Cc: Andrew Morton, Christian Kujau, LKML, Rafael J. Wysocki, Ingo Molnar

On Thu, 27 Mar 2008, Markus Rehbach wrote:

> Thomas Gleixner wrote:
> > On Tue, 25 Mar 2008, Andrew Morton wrote:
>
> >> http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
> >
> > Same here. And both are AMD X2 early stepping machines.
>
> > A dropped workaround for the prefetch bug in early X2s and
> > Opterons. Patch below.
>
> The patch cures it. Tested with rc5-git5, and it was 100%
> reproducible here.

Thanks for testing. Fix is queued for Linus.

Thanks,

tglx
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Björn Steinbrink @ 2008-03-27 23:50 UTC
To: Thomas Gleixner
Cc: Andrew Morton, Christian Kujau, LKML, Markus Rehbach, Rafael J. Wysocki, Ingo Molnar

On 2008.03.27 16:20:53 +0100, Thomas Gleixner wrote:
> On Tue, 25 Mar 2008, Andrew Morton wrote:
> > > Code: 53 c0 e8 20 08 fc ff c1 e3 02 8b 14 33 89 54 24 20 8b 44 24 20 85 c0 75 10 eb 51 8b 12 89 54 24 20 8b 44 24 20 85 c0 74 43 8b 02 <0f> 18 00 90 8d 5a d8 39 6b 34 75 e4 8b 7c 24 0c 39 7b 30 75 db
>
> It faults in a prefetch.
>
> > Markus reported what looks to be the same thing here:
> > http://lkml.org/lkml/2008/3/21/202 and it's already in the regression list.
>
> Same here. And both are AMD X2 early stepping machines.
>
> > I guess you've confirmed that this wasn't a mystery
> > once-off-on-that-machine.
> >
> > I can't think what we did to cause this.
>
> I had a lengthy bug decoding session with Ingo and we found the root
> cause:
>
> A dropped workaround for the prefetch bug in early X2s and
> Opterons. Patch below.
>
> Thanks,
>
> tglx
>
> --------------->
> Subject: x86: fix prefetch workaround
> From: Ingo Molnar <mingo@elte.hu>
> Date: Thu Mar 27 15:58:28 CET 2008
>
> some early Athlon XP's and Opterons generate bogus faults on prefetch
                    ^^

Umh, XP? Didn't you say X2 above? And looking at the patch, X2 seems
more plausible as well, I don't think that the XP supported the NX bit,
did it?

Björn

> instructions. The workaround for this regressed over .24 - reinstate it.
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Christian Kujau @ 2008-03-28 8:50 UTC
To: Björn Steinbrink
Cc: Thomas Gleixner, Andrew Morton, LKML, Markus Rehbach, Rafael J. Wysocki, Ingo Molnar

On Fri, 28 Mar 2008, Björn Steinbrink wrote:
>> Subject: x86: fix prefetch workaround
>> From: Ingo Molnar <mingo@elte.hu>
>> Date: Thu Mar 27 15:58:28 CET 2008
>>
>> some early Athlon XP's and Opterons generate bogus faults on prefetch
>                    ^^
>
> Umh, XP? Didn't you say X2 above? And looking at the patch, X2 seems
> more plausible as well, I don't think that the XP supported the NX bit,
> did it?

Hm, would be a shame because I have an XP 2600+. /proc/cpuinfo tells me:

flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow ts

...no NX in there. I wonder why this (already applied) patch should do
anything on my box at all.

Thanks,
C.
--
BOFH excuse #216:
What office are you in? Oh, that one. Did you know that your building was built over the universities first nuclear research site? And wow, aren't you the lucky one, your office is right over where the core is buried!
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Christian Kujau @ 2008-03-28 1:46 UTC
To: Thomas Gleixner
Cc: Andrew Morton, LKML, Markus Rehbach, Rafael J. Wysocki, Ingo Molnar

On Thu, 27 Mar 2008, Thomas Gleixner wrote:
> I had a lengthy bug decoding session with Ingo and we found the root
> cause:
> A dropped workaround for the prefetch bug in early X2s and
> Opterons. Patch below.

Although I reported it, I could not reproduce the bug. Anyway, I've
applied your patch to -rc7 and no BUG so far :)

Thanks!
Christian.
--
BOFH excuse #385:
Dyslexics retyping hosts file on servers
* Re: 2.6.25-rc6: BUG: unable to handle kernel NULL pointer dereference
From: Ingo Molnar @ 2008-03-28 10:25 UTC
To: Christian Kujau
Cc: Thomas Gleixner, Andrew Morton, LKML, Markus Rehbach, Rafael J. Wysocki

* Christian Kujau <lists@nerdbynature.de> wrote:

> On Thu, 27 Mar 2008, Thomas Gleixner wrote:
>> I had a lengthy bug decoding session with Ingo and we found the root
>> cause:
>> A dropped workaround for the prefetch bug in early X2s and
>> Opterons. Patch below.
>
> Although I reported it, I could not reproduce the bug. Anyway, I've
> applied your patch to -rc7 and no BUG so far :)

yeah, the condition would normally be very sporadic and it can easily
depend on a specific layout of your kernel image, etc.

the (updated) fix is in Linus' latest git tree as well, and in
x86.git/latest.

Ingo