* Re: linux-next: Tree for September 3
       [not found] ` <alpine.LFD.1.10.0809040203510.3452@nehalem.linux-foundation.org>
@ 2008-09-04 17:45 ` Andrew Morton
  2008-09-04 18:05   ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 17:45 UTC
To: Linus Torvalds
Cc: Stephen Rothwell, linux-next, LKML, Yinghai Lu, Ivan Kokshaysky,
    Jesse Barnes, netdev, Al Viro

On Thu, 4 Sep 2008 02:07:05 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Thu, 4 Sep 2008, Andrew Morton wrote:
> > >
> > > commit 5f17cfce5776c566d64430f543a289e5cfa4538b
> > >
> > >     PCI: fix pbus_size_mem() resource alignment for CardBus controllers
> >
> > Is this worth backporting into 2.6.26.x?
>
> It's certainly at least *potentially* -stable material, but because I
> suspect that the whole yenta_allocate_resources() -> pci_setup_cardbus()
> fallback code will end up resulting in a working setup, it may not be
> worth it. At least not until we've heard from more people..
>
> Can you check whether your vortex cardbus thing _works_ even without the
> fix?
>

Working on it, but got distracted by a /proc/net bug.

sony:/home/akpm> ifconfig -a
Warning: cannot open /proc/net/dev (Permission denied). Limited output.
Warning: cannot open /proc/net/dev (Permission denied). Limited output.
eth0      Link encap:Ethernet  HWaddr 00:01:4A:9F:7C:79
          inet addr:192.168.2.10  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Warning: cannot open /proc/net/dev (Permission denied). Limited output.
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1

sony:/home/akpm> ls -l /proc/net/dev
ls: /proc/net/dev: Permission denied

sony:/home/akpm> ls -l /proc/net
ls: /proc/net: Permission denied

sony:/home/akpm> ls -ld /proc/net
ls: /proc/net: Permission denied

sony:/home/akpm> ls -l /proc | grep net
?--------- ? ?    ?       ?            ? /proc/net

This is a pull of your tree from yesterday, ending at

commit fbb16e243887332dd5754e48ffe5b963378f3cd2
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 3 00:54:47 2008 +0200

    [x86] Fix TSC calibration issues

config: http://userweb.kernel.org/~akpm/config-sony.txt
dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt

* Re: linux-next: Tree for September 3
  2008-09-04 17:45 ` linux-next: Tree for September 3 Andrew Morton
@ 2008-09-04 18:05   ` Linus Torvalds
  2008-09-04 18:34     ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2008-09-04 18:05 UTC
To: Andrew Morton
Cc: Stephen Rothwell, linux-next, LKML, Yinghai Lu, Ivan Kokshaysky,
    Jesse Barnes, netdev, Al Viro, Eric W. Biederman

On Thu, 4 Sep 2008, Andrew Morton wrote:
>
> Working on it, but got distracted by a /proc/net bug.

Ok, that's just something odd.

> sony:/home/akpm> ls -l /proc/net/dev
> ls: /proc/net/dev: Permission denied
>
> sony:/home/akpm> ls -l /proc/net
> ls: /proc/net: Permission denied
>
> sony:/home/akpm> ls -ld /proc/net
> ls: /proc/net: Permission denied
>
> sony:/home/akpm> ls -l /proc | grep net
> ?--------- ? ?    ?       ?            ? /proc/net
>
> This is a pull of your tree from yesterday, ending at commit
> fbb16e243887332dd5754e48ffe5b963378f3cd2

There's been various suggested patches by Al/Eric (added to cc) for
/proc/net handling, but none of them have actually even been merged yet.
So I don't think this code has changed in a while.

Al, Eric, ideas?

> config: http://userweb.kernel.org/~akpm/config-sony.txt
> dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt

That whole thing should just be a simple symlink:

	fs/proc/proc_net.c:	proc_symlink("net", NULL, "self/net");

are you sure it's a plain tree of mine, without any of the patches
floating around between Eric/Al?

		Linus

* Re: linux-next: Tree for September 3
  2008-09-04 18:05   ` Linus Torvalds
@ 2008-09-04 18:34     ` Andrew Morton
  2008-09-04 20:31       ` Eric W. Biederman
  2008-09-04 22:45       ` Thomas Gleixner
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 18:34 UTC
To: Linus Torvalds
Cc: Stephen Rothwell, linux-next, LKML, Yinghai Lu, Ivan Kokshaysky,
    Jesse Barnes, netdev, Al Viro, Eric W. Biederman, David Woodhouse,
    Sam Ravnborg, john stultz, Thomas Gleixner

On Thu, 4 Sep 2008 11:05:21 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > This is a pull of your tree from yesterday, ending at commit
> > fbb16e243887332dd5754e48ffe5b963378f3cd2
>
> There's been various suggested patches by Al/Eric (added to cc) for
> /proc/net handling, but none of them have actually even been merged yet.
> So I don't think this code has changed in a while.
>
> Al, Eric, ideas?

I don't think I saw it on any other test machines.

This machine runs SELinux. Distro is FC5.

> > config: http://userweb.kernel.org/~akpm/config-sony.txt
> > dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt
>
> That whole thing should just be a simple symlink:
>
> 	fs/proc/proc_net.c:	proc_symlink("net", NULL, "self/net");

/proc/self/net looks fine.

> are you sure it's a plain tree of mine, without any of the patches
> floating around between Eric/Al?

yup, it's yesterday's mainline.


Found another problem. The way I install kernels on machine `sony' is:

- Build the kernel on machine `y', in /usr/src/25

- On machine sony, NFS mount y:/usr/src at /mnt/y/usr/src

- On sony, `cd /mnt/y/usr/src/25'

- <copy stuff>

- make modules_install

- depmod -a

IOW, I run the kernel's installation tools on the *target* machine,
within an nfs mount of the *build* machine.

This has worked happily for five or more years. But now:

Failed to open destination file: Permission deniedihex2fw: Convert ihex files into binary representation for use by Linux kernel
usage: ihex2fw [<options>] <src.HEX> <dst.fw>
       -w: wide records (16-bit length)
       -s: sort records by address
make[1]: *** [firmware/emi26/loader.fw] Error 1
make: *** [_modinst_post] Error 2

This is because the target machine is i386 and it is trying to execute
an x86_64 binary.


and oh dear, the clockevents code just oopsed.

firmware: requesting ipw2200-bss.fw
ipw2200: Radio Frequency Kill Switch is On:
Kill switch must be turned off for wireless networking to work.
ipw2200: Detected geography ZZA (11 802.11bg channels, 13 802.11a channels)
initcall ipw_init+0x0/0x71 [ipw2200] returned 0 after 163 msecs
ipw2200: Failed to send WEP_KEY: Aborted due to RF kill switch.
BUG: unable to handle kernel NULL pointer dereference at 00000040
IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab
*pde = 00000000
Oops: 0000 [#1] PREEMPT
Modules linked in: ipw2200 sonypi ipv6 autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables acpi_cpufreq nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ieee80211 3c59x snd_pcm ieee80211_crypt sr_mod snd_timer cdrom snd i2c_i801 soundcore snd_page_alloc button i2c_core pcspkr ext3 jbd [last unloaded: ipw2200]

Pid: 0, comm: swapper Not tainted (2.6.27-rc5 #18)
EIP: 0060:[<c0126e7f>] EFLAGS: 00010013 CPU: 0
EIP is at get_next_timer_interrupt+0xe9/0x1ab
EAX: 00000040 EBX: 00000001 ECX: 0000001d EDX: 00000040
ESI: 0000001d EDI: c05bc700 EBP: c0469f1c ESP: c0469ee4
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c0468000 task=c04343c0 task.ti=c0468000)
Stack: ffff1cef c013cc1d c05bcf28 00000000 c05bd010 c05bc798 00ffff1d c05bcf28
       c05bd128 c05bd328 c05bd528 00000000 b65eb8b3 0000000f c0469f4c c013816f
       00000000 b65c1f00 0000000f ffff1cef 00000046 00000096 c04b11c0 00000000
Call Trace:
 [<c013cc1d>] ? __lock_acquire+0x671/0x6b7
 [<c013816f>] ? tick_nohz_stop_sched_tick+0x13f/0x2ba
 [<c0123640>] ? irq_exit+0x6d/0x79
 [<c0105c6a>] ? do_IRQ+0x6d/0x7f
 [<c0104300>] ? common_interrupt+0x28/0x30
 [<c013007b>] ? set_process_cpu_timer+0x94/0xb9
 [<c0234743>] ? acpi_processor_idle+0x2a6/0x44b
 [<c010256b>] ? cpu_idle+0x5a/0x87
 [<c031e2fd>] ? rest_init+0x61/0x63
 =======================
Code: 83 e6 3f 89 f1 89 5d d0 8b 45 d0 89 d3 8d 04 c8 89 45 d8 8b 00 eb 14 8b 40 08 bb 01 00 00 00 3b 45 cc 0f 49 45 cc 89 45 cc 89 d0 <8b> 10 0f 18 02 90 3b 45 d8 75 e1 85 db 89 da 74 0c 85 f6 74 04
EIP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab SS:ESP 0068:c0469ee4
---[ end trace 2cf31fb827f3051f ]---
Kernel panic - not syncing: Attempted to kill the idle task!
BUG: NMI Watchdog detected LOCKUP on CPU0, ip c01f584e, registers:

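The "Code:" line in the oops above marks the first byte of the faulting instruction with angle brackets. As a rough illustration only (this is not how scripts/decodecode is implemented, and the helper name parse_code_line is made up for this sketch), the following Python pulls the raw bytes out of such a line and reports the offset of the marked byte, which is the position a disassembler listing of those bytes would flag as the trapping instruction.

```python
import re

# The "Code:" line copied from the oops report above.
OOPS_CODE = (
    "Code: 83 e6 3f 89 f1 89 5d d0 8b 45 d0 89 d3 8d 04 c8 89 45 d8 8b 00 "
    "eb 14 8b 40 08 bb 01 00 00 00 3b 45 cc 0f 49 45 cc 89 45 cc 89 d0 "
    "<8b> 10 0f 18 02 90 3b 45 d8 75 e1 85 db 89 da 74 0c 85 f6 74 04"
)


def parse_code_line(code_line):
    """Return (code_bytes, fault_offset) for an oops "Code:" line.

    fault_offset is the index of the byte wrapped in <..>, i.e. the first
    byte of the instruction at EIP, or None if no byte is marked.
    (Hypothetical helper written for this example.)
    """
    text = code_line.split("Code:", 1)[-1]
    data = bytearray()
    fault_offset = None
    for token in text.split():
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if marked:
            fault_offset = len(data)
            data.append(int(marked.group(1), 16))
        elif re.fullmatch(r"[0-9a-fA-F]{2}", token):
            data.append(int(token, 16))
        else:
            raise ValueError("unexpected token in Code: line: %r" % token)
    return bytes(data), fault_offset


code_bytes, fault_offset = parse_code_line(OOPS_CODE)
print("bytes dumped:", len(code_bytes))
print("faulting instruction starts at offset", hex(fault_offset))
```

Feeding the bytes to a 32-bit disassembler and looking at the reported offset is then enough to locate the instruction that took the fault.
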
* Re: linux-next: Tree for September 3
  2008-09-04 18:34     ` Andrew Morton
@ 2008-09-04 20:31       ` Eric W. Biederman
  2008-09-04 20:41         ` Andrew Morton
  2008-09-04 22:45       ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2008-09-04 20:31 UTC
To: Andrew Morton
Cc: Linus Torvalds, Stephen Rothwell, linux-next, LKML, Yinghai Lu,
    Ivan Kokshaysky, Jesse Barnes, netdev, Al Viro, David Woodhouse,
    Sam Ravnborg, john stultz, Thomas Gleixner

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 4 Sep 2008 11:05:21 -0700 (PDT) Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>
>> > This is a pull of your tree from yesterday, ending at commit
>> > fbb16e243887332dd5754e48ffe5b963378f3cd2
>>
>> There's been various suggested patches by Al/Eric (added to cc) for
>> /proc/net handling, but none of them have actually even been merged yet.
>> So I don't think this code has changed in a while.
>>
>> Al, Eric, ideas?

There aren't any issues I know of with normal configurations and the
current proc code.

> I don't think I saw it on any other test machines.
>
> This machine runs SELinux. Distro is FC5.

>> > config: http://userweb.kernel.org/~akpm/config-sony.txt
>> > dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt
>>
>> That whole thing should just be a simple symlink:
>>
>> 	fs/proc/proc_net.c:	proc_symlink("net", NULL, "self/net");
>
> /proc/self/net looks fine.
>
>> are you sure it's a plain tree of mine, without any of the patches
>> floating around between Eric/Al?
>
> yup, it's yesterday's mainline.

Does the problem happen if you disable selinux?

This feels like a case of selinux being overzealous.

Eric

* Re: linux-next: Tree for September 3
  2008-09-04 20:31       ` Eric W. Biederman
@ 2008-09-04 20:41         ` Andrew Morton
  2008-09-04 21:03           ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 20:41 UTC
To: Eric W. Biederman
Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink, jbarnes,
    netdev, viro, dwmw2, sam, johnstul, tglx

On Thu, 04 Sep 2008 13:31:01 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> >> > config: http://userweb.kernel.org/~akpm/config-sony.txt
> >> > dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt
> >>
> >> That whole thing should just be a simple symlink:
> >>
> >> 	fs/proc/proc_net.c:	proc_symlink("net", NULL, "self/net");
> >
> > /proc/self/net looks fine.
> >
> >> are you sure it's a plain tree of mine, without any of the patches
> >> floating around between Eric/Al?
> >
> > yup, it's yesterday's mainline.
>
> Does the problem happen if you disable selinux?
>
> This feels like a case of selinux being over zealous.

yeah, adding `selinux=0' to the boot command line fixes it.

* Re: linux-next: Tree for September 3
  2008-09-04 20:41         ` Andrew Morton
@ 2008-09-04 21:03           ` Eric W. Biederman
  2008-09-04 22:22             ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2008-09-04 21:03 UTC
To: Andrew Morton
Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink, jbarnes,
    netdev, viro, dwmw2, sam, johnstul, tglx

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 04 Sep 2008 13:31:01 -0700
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> >> are you sure it's a plain tree of mine, without any of the patches
>> >> floating around between Eric/Al?
>> >
>> > yup, it's yesterday's mainline.
>>
>> Does the problem happen if you disable selinux?
>>
>> This feels like a case of selinux being over zealous.
>
> yeah, adding `selinux=0' to the boot command line fixes it.

The proc generic directory back structure is the same. As requested by
the selinux folks. So I don't expect there is much more we can do on
the /proc side.

When we get the interaction bug between the VFS and /proc/net fixed I wonder
if there will be some more selinux fall out. Something to think about.

Eric

* Re: linux-next: Tree for September 3
  2008-09-04 21:03           ` Eric W. Biederman
@ 2008-09-04 22:22             ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 22:22 UTC
To: Eric W. Biederman
Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink, jbarnes,
    netdev, viro, dwmw2, sam, johnstul, tglx

On Thu, 04 Sep 2008 14:03:41 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > On Thu, 04 Sep 2008 13:31:01 -0700
> > ebiederm@xmission.com (Eric W. Biederman) wrote:
> >
> >> >> are you sure it's a plain tree of mine, without any of the patches
> >> >> floating around between Eric/Al?
> >> >
> >> > yup, it's yesterday's mainline.
> >>
> >> Does the problem happen if you disable selinux?
> >>
> >> This feels like a case of selinux being over zealous.
> >
> > yeah, adding `selinux=0' to the boot command line fixes it.
>
> The proc generic directory back structure is the same. As requested by
> the selinux folks. So I don't expect there is much more we can do on
> the /proc side.
>
> When we get the interaction bug between the VFS and /proc/net fixed I wonder
> if there will be some more selinux fall out. Something to think about.

fyi, that machine is x86_32-on-FC5. My x86_64-on-FC6 test box is also
running selinux and has the same bug.

* Re: linux-next: Tree for September 3
  2008-09-04 18:34     ` Andrew Morton
  2008-09-04 20:31       ` Eric W. Biederman
@ 2008-09-04 22:45       ` Thomas Gleixner
  2008-09-04 23:17         ` Linus Torvalds
  2008-09-04 23:17         ` Andrew Morton
  1 sibling, 2 replies; 15+ messages in thread
From: Thomas Gleixner @ 2008-09-04 22:45 UTC
To: Andrew Morton
Cc: Linus Torvalds, Stephen Rothwell, linux-next, LKML, Yinghai Lu,
    Ivan Kokshaysky, Jesse Barnes, netdev, Al Viro, Eric W. Biederman,
    David Woodhouse, Sam Ravnborg, john stultz

On Thu, 4 Sep 2008, Andrew Morton wrote:
>
> and oh dear, the clockevents code just oopsed.

Sigh.

> BUG: unable to handle kernel NULL pointer dereference at 00000040
> IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab

Cute, NULL pointer in the timer check code. Can you please addr2line
the exact code line or upload the vmlinux somewhere ?

Thanks,

	tglx

* Re: linux-next: Tree for September 3
  2008-09-04 22:45       ` Thomas Gleixner
@ 2008-09-04 23:17         ` Linus Torvalds
  2008-09-05  5:39           ` Arjan van de Ven
  2008-09-04 23:17         ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2008-09-04 23:17 UTC
To: Thomas Gleixner
Cc: Andrew Morton, Stephen Rothwell, linux-next, LKML, Yinghai Lu,
    Ivan Kokshaysky, Jesse Barnes, netdev, Al Viro, Eric W. Biederman,
    David Woodhouse, Sam Ravnborg, john stultz

On Fri, 5 Sep 2008, Thomas Gleixner wrote:
>
> > BUG: unable to handle kernel NULL pointer dereference at 00000040
> > IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab
>
> Cute, NULL pointer in the timer check code. Can you please addr2line
> the exact code line or upload the vmlinux somewhere ?

Use "scripts/decodecode" (with AFLAGS=--32 since this is x86-32). It
shows (after some cleanup and editing):

     3:  89 f1           mov    %esi,%ecx
     5:  89 5d d0        mov    %ebx,-0x30(%ebp)
     8:  8b 45 d0        mov    -0x30(%ebp),%eax
     b:  89 d3           mov    %edx,%ebx
     d:  8d 04 c8        lea    (%eax,%ecx,8),%eax
    10:  89 45 d8        mov    %eax,-0x28(%ebp)
    13:  8b 00           mov    (%eax),%eax
    15:  eb 14           jmp    0x2b ------------------------+
    17:  8b 40 08        mov    0x8(%eax),%eax <---------+   |
    1a:  bb 01 00 00 00  mov    $0x1,%ebx                |   |
    1f:  3b 45 cc        cmp    -0x34(%ebp),%eax         |   |
    22:  0f 49 45 cc     cmovns -0x34(%ebp),%eax         |   |
    26:  89 45 cc        mov    %eax,-0x34(%ebp)         |   |
    29:  89 d0           mov    %edx,%eax                |   |
*** 2b:  8b 10           mov    (%eax),%edx              | <-+
    2d:  0f 18 02        prefetchnta (%edx)              |
    30:  90              nop                             |
    31:  3b 45 d8        cmp    -0x28(%ebp),%eax         |
    34:  75 e1           jne    17 ----------------------+
    36:  85 db           test   %ebx,%ebx
    38:  89 da           mov    %ebx,%edx
    3a:  74 0c           je     0x48
    3c:  85 f6           test   %esi,%esi
    3e:  74 04           je     0x44

and that "prefetchnta" is a dead giveaway: it's a "list_for_each_entry()"
loop. And looking at the registers:

> > Pid: 0, comm: swapper Not tainted (2.6.27-rc5 #18)
> > EIP: 0060:[<c0126e7f>] EFLAGS: 00010013 CPU: 0
> > EIP is at get_next_timer_interrupt+0xe9/0x1ab
> > EAX: 00000040 EBX: 00000001 ECX: 0000001d EDX: 00000040
> > ESI: 0000001d EDI: c05bc700 EBP: c0469f1c ESP: c0469ee4
> >  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > Process swapper (pid: 0, ti=c0468000 task=c04343c0 task.ti=c0468000)
> > Stack: ffff1cef c013cc1d c05bcf28 00000000 c05bd010 c05bc798 00ffff1d c05bcf28
> >        c05bd128 c05bd328 c05bd528 00000000 b65eb8b3 0000000f c0469f4c c013816f
> >        00000000 b65c1f00 0000000f ffff1cef 00000046 00000096 c04b11c0 00000000
> > Call Trace:
> >  [<c013cc1d>] ? __lock_acquire+0x671/0x6b7
> >  [<c013816f>] ? tick_nohz_stop_sched_tick+0x13f/0x2ba

since %eax == %edx, it's not the first iteration through the loop.

IOW, it's this loop (kernel/timer.c, line 863):

	list_for_each_entry(nte, varp->vec + slot, entry) {
		found = 1;
		if (time_before(nte->expires, expires))
			expires = nte->expires;
	}

as can be seen by looking at the loop body (that "mov $0x1,%ebx" thing is
the "found = 1;" thing).

The next list entry pointer is obviously corrupt: it's 0x00000040, which
is clearly not a valid pointer. Looks like %ecx contains 'slot' (0x1d),
but that's the only other piece of info I can see in the register state.

I do wonder if there isn't some memory corruption going on here. The
SELinux thing didn't look very sane either (even if it's a SELinux
permission issue, the inode is corrupt, since the mode is crap).

		Linus

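For readers who do not have the kernel source at hand, the loop Linus quotes from kernel/timer.c can be modelled in a few lines of Python. This is a sketch under stated assumptions, not kernel code: the list is a plain Python list of expiry values rather than a linked list of timer entries, jiffies are assumed to be 32 bits wide (as on the i386 machine in this report), and the name earliest_expiry is invented for the example. The time_before() model follows the kernel macro's idea of comparing via a signed difference so that counter wraparound is tolerated.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit jiffies, as on the i386 box in this thread


def time_before(a, b):
    """Model of the kernel's time_before(a, b) for 32-bit jiffies.

    True when a is earlier than b, even if the counter has wrapped:
    the unsigned difference a - b is interpreted as a signed value
    and tested for being negative.
    """
    return ((a - b) & MASK32) >= 0x80000000


def earliest_expiry(slot_entries, expires):
    """Mimic the quoted loop over one timer-wheel slot.

    slot_entries stands in for the linked list at varp->vec + slot and
    holds only each entry's expiry value. Returns (found, expires) just
    as the C loop leaves those two variables.
    """
    found = False
    for nte_expires in slot_entries:
        found = True
        if time_before(nte_expires, expires):
            expires = nte_expires
    return found, expires


if __name__ == "__main__":
    found, soonest = earliest_expiry([0x20, 0xFFFFFFF0, 0x30], 0x40)
    print(found, hex(soonest))
```

The model also makes the failure mode easy to picture: the C loop follows each entry's next pointer, so one corrupted pointer such as 0x00000040 is dereferenced on the following iteration, which matches the faulting "mov (%eax),%edx" in the disassembly.
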
* Re: linux-next: Tree for September 3
  2008-09-04 23:17         ` Linus Torvalds
@ 2008-09-05  5:39           ` Arjan van de Ven
  0 siblings, 0 replies; 15+ messages in thread
From: Arjan van de Ven @ 2008-09-05 5:39 UTC
To: Linus Torvalds
Cc: Thomas Gleixner, Andrew Morton, Stephen Rothwell, linux-next, LKML,
    Yinghai Lu, Ivan Kokshaysky, Jesse Barnes, netdev, Al Viro,
    Eric W. Biederman, David Woodhouse, Sam Ravnborg, john stultz

On Thu, 4 Sep 2008 16:17:01 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, 5 Sep 2008, Thomas Gleixner wrote:
> >
> > > BUG: unable to handle kernel NULL pointer dereference at 00000040
> > > IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab
> >
> > Cute, NULL pointer in the timer check code. Can you please addr2line
> > the exact code line or upload the vmlinux somewhere ?
>
> Use "scripts/decodecode" (with AFLAGS=--32 since this is x86-32). It
> shows (after some cleanup and editing):

btw if the oops was on lkml like it was here you can also just look it
up on kerneloops.org via the search option

which for this case finds you
http://www.kerneloops.org/raw.php?rawid=59347&msgid=http://mid.gmane.org/20080904113408.d47c65f6.akpm@linux-foundation.org

and this has the decodecode already done

and the search output at
http://www.kerneloops.org/search.php?search=get_next_timer_interrupt
shows that there's only been very few reports of this one, but of the
ones there were at least half were slab poisoned.

what the site doesn't do for an oops like this is show the C-code
interspersed, it's not that smart for general kernels
(that only works for fedora rpm kernels like here:
http://www.kerneloops.org/raw.php?rawid=57807&msgid= )
otoh if you have the vmlinux you could do this locally
(hmm maybe I should clean that script up and submit for adding to the
scripts/ directory)

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

* Re: linux-next: Tree for September 3
  2008-09-04 22:45       ` Thomas Gleixner
  2008-09-04 23:17         ` Linus Torvalds
@ 2008-09-04 23:17         ` Andrew Morton
  2008-09-04 23:25           ` Linus Torvalds
  2008-09-04 23:27           ` Thomas Gleixner
  1 sibling, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 23:17 UTC
To: Thomas Gleixner
Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink, jbarnes,
    netdev, viro, ebiederm, dwmw2, sam, johnstul

On Fri, 5 Sep 2008 00:45:33 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 4 Sep 2008, Andrew Morton wrote:
> >
> > and oh dear, the clockevents code just oopsed.
>
> Sigh.
>
> > BUG: unable to handle kernel NULL pointer dereference at 00000040
> > IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab
>
> Cute, NULL pointer in the timer check code. Can you please addr2line
> the exact code line or upload the vmlinux somewhere ?
>

erm, I might have lost that binary, and it only happened the once. It
happened shortly after the machine had fully booted, during
establishment of the first sshd session.

It nuked the machine really well, too. I had to pull the battery to
get it back.

fwiw:

(gdb) l *0xc0126e7f
0xc0126e7f is in get_next_timer_interrupt (kernel/timer.c:863).
warning: Source file is more recent than executable.
858		for (array = 0; array < 4; array++) {
859			struct tvec *varp = varray[array];
860
861			index = slot = timer_jiffies & TVN_MASK;
862			do {
863				list_for_each_entry(nte, varp->vec + slot, entry) {
864					found = 1;
865					if (time_before(nte->expires, expires))
866						expires = nte->expires;
867				}

which looks reasonable.

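Lines 861 and 863 of the gdb listing pick a slot in a timer-wheel vector and then walk the list that hangs off it. The Python sketch below spells out that arithmetic under explicit assumptions: TVN_MASK is taken to be 0x3f, which is what the leading "83 e6 3f" (and $0x3f,%esi) in the oops Code: bytes suggests for this build, and each list head is taken to be 8 bytes because the disassembly scales the slot by 8 in "lea (%eax,%ecx,8),%eax". The function names and the example base address are hypothetical and chosen only for illustration.

```python
TVN_BITS = 6                     # assumption: matches the 0x3f mask seen in the oops bytes
TVN_MASK = (1 << TVN_BITS) - 1   # 0x3f
LIST_HEAD_SIZE = 8               # assumption: two 4-byte pointers per list head on i386


def slot_index(timer_jiffies):
    """Model of `index = slot = timer_jiffies & TVN_MASK`."""
    return timer_jiffies & TVN_MASK


def slot_head_address(vec_base, slot):
    """Model of `varp->vec + slot` as a 32-bit address.

    vec_base is the address of the first list head in the vector; the
    result is the address of the list head for the given slot.
    """
    return (vec_base + slot * LIST_HEAD_SIZE) & 0xFFFFFFFF


if __name__ == "__main__":
    example_base = 0xC0500000            # hypothetical address, not taken from the report
    slot = slot_index(0x12345D)          # low six bits give 0x1d, the slot Linus read from %ecx
    print(hex(slot), hex(slot_head_address(example_base, slot)))
```

With 64 slots of 8 bytes each, one vector spans 0x200 bytes, so any slot address computed this way stays within 0x1f8 bytes of the vector base.
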
* Re: linux-next: Tree for September 3
  2008-09-04 23:17         ` Andrew Morton
@ 2008-09-04 23:25           ` Linus Torvalds
  2008-09-04 23:27           ` Thomas Gleixner
  1 sibling, 0 replies; 15+ messages in thread
From: Linus Torvalds @ 2008-09-04 23:25 UTC
To: Andrew Morton
Cc: Thomas Gleixner, sfr, linux-next, linux-kernel, yhlu.kernel, ink,
    jbarnes, netdev, viro, ebiederm, dwmw2, sam, johnstul

On Thu, 4 Sep 2008, Andrew Morton wrote:
>
> erm, I might have lost that binary, and it only happened the once. It
> happened shortly after the machine had fully booted, during
> establishment of the first sshd session.

Considering that both this and your odd /proc/net issue look like memory
corruption, maybe CONFIG_DEBUG_SLAB and CONFIG_DEBUG_PAGEALLOC are worth
testing?

		Linus

* Re: linux-next: Tree for September 3
  2008-09-04 23:17         ` Andrew Morton
  2008-09-04 23:25           ` Linus Torvalds
@ 2008-09-04 23:27           ` Thomas Gleixner
  2008-09-05 11:04             ` Ingo Molnar
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2008-09-04 23:27 UTC
To: Andrew Morton
Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink, jbarnes,
    netdev, viro, ebiederm, dwmw2, sam, johnstul

On Thu, 4 Sep 2008, Andrew Morton wrote:
> >
> > Cute, NULL pointer in the timer check code. Can you please addr2line
> > the exact code line or upload the vmlinux somewhere ?
> >
>
> erm, I might have lost that binary, and it only happened the once. It
> happened shortly after the machine had fully booted, during
> establishment of the first sshd session.
>
> It nuked the machine really well, too. I had to pull the battery to
> get it back.

Known problem on Sonys. :(

> fwiw:
>
> (gdb) l *0xc0126e7f
> 0xc0126e7f is in get_next_timer_interrupt (kernel/timer.c:863).
> warning: Source file is more recent than executable.
> 858		for (array = 0; array < 4; array++) {
> 859			struct tvec *varp = varray[array];
> 860
> 861			index = slot = timer_jiffies & TVN_MASK;
> 862			do {
> 863				list_for_each_entry(nte, varp->vec + slot, entry) {
> 864					found = 1;
> 865					if (time_before(nte->expires, expires))
> 866						expires = nte->expires;
> 867				}
>
> which looks reasonable.

Yeah, as Linus decoded it's that loop. So we look at some corrupted
entry here.

CONFIG_DEBUG_OBJECTS (add debug_objects to the command line as well)
should catch it when this is a timer being discarded, freed or
reinitialized.

Otherwise, when it is just random corruption it won't help much.

Thanks,

	tglx

* Re: linux-next: Tree for September 3
  2008-09-04 23:27           ` Thomas Gleixner
@ 2008-09-05 11:04             ` Ingo Molnar
  2008-09-05 17:49               ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2008-09-05 11:04 UTC
To: Thomas Gleixner
Cc: Andrew Morton, torvalds, sfr, linux-next, linux-kernel, yhlu.kernel,
    ink, jbarnes, netdev, viro, ebiederm, dwmw2, sam, johnstul

* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 4 Sep 2008, Andrew Morton wrote:
> > >
> > > Cute, NULL pointer in the timer check code. Can you please addr2line
> > > the exact code line or upload the vmlinux somewhere ?
> > >
> >
> > erm, I might have lost that binary, and it only happened the once. It
> > happened shortly after the machine had fully booted, during
> > establishment of the first sshd session.
> >
> > It nuked the machine really well, too. I had to pull the battery to
> > get it back.
>
> Known problem on Sonys. :(
>
> > fwiw:
> >
> > (gdb) l *0xc0126e7f
> > 0xc0126e7f is in get_next_timer_interrupt (kernel/timer.c:863).
> > warning: Source file is more recent than executable.
> > 858		for (array = 0; array < 4; array++) {
> > 859			struct tvec *varp = varray[array];
> > 860
> > 861			index = slot = timer_jiffies & TVN_MASK;
> > 862			do {
> > 863				list_for_each_entry(nte, varp->vec + slot, entry) {
> > 864					found = 1;
> > 865					if (time_before(nte->expires, expires))
> > 866						expires = nte->expires;
> > 867				}
> >
> > which looks reasonable.
>
> Yeah, as Linus decoded it's that loop. So we look at some corrupted
> entry here.
>
> CONFIG_DEBUG_OBJECTS (add debug_objects to the command line as well)
> should catch it when this is a timer being discarded, freed or
> reinitialized.
>
> Otherwise, when it is just random corruption it wont help much.

i guess CONFIG_DEBUG_OBJECTS_TIMERS=y is practical, and
CONFIG_DEBUG_LIST=y would be nice as well - it can catch memory
corruptions rather early and is relatively light-weight.

[ and if there's any reproducibility of the corruption and if it happens
  at a stable kernel address then a small custom hack in ftrace can
  catch it the moment it happens. ]

	Ingo

* Re: linux-next: Tree for September 3
  2008-09-05 11:04             ` Ingo Molnar
@ 2008-09-05 17:49               ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2008-09-05 17:49 UTC
To: Ingo Molnar
Cc: Thomas Gleixner, torvalds, sfr, linux-next, linux-kernel,
    yhlu.kernel, ink, jbarnes, netdev, viro, ebiederm, dwmw2, sam,
    johnstul

On Fri, 5 Sep 2008 13:04:11 +0200 Ingo Molnar <mingo@elte.hu> wrote:

>
> * Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > On Thu, 4 Sep 2008, Andrew Morton wrote:
> > > >
> > > > Cute, NULL pointer in the timer check code. Can you please addr2line
> > > > the exact code line or upload the vmlinux somewhere ?
> > > >
> > >
> > > erm, I might have lost that binary, and it only happened the once. It
> > > happened shortly after the machine had fully booted, during
> > > establishment of the first sshd session.
> > >
> > > It nuked the machine really well, too. I had to pull the battery to
> > > get it back.
> >
> > Known problem on Sonys. :(
> >
> > > fwiw:
> > >
> > > (gdb) l *0xc0126e7f
> > > 0xc0126e7f is in get_next_timer_interrupt (kernel/timer.c:863).
> > > warning: Source file is more recent than executable.
> > > 858		for (array = 0; array < 4; array++) {
> > > 859			struct tvec *varp = varray[array];
> > > 860
> > > 861			index = slot = timer_jiffies & TVN_MASK;
> > > 862			do {
> > > 863				list_for_each_entry(nte, varp->vec + slot, entry) {
> > > 864					found = 1;
> > > 865					if (time_before(nte->expires, expires))
> > > 866						expires = nte->expires;
> > > 867				}
> > >
> > > which looks reasonable.
> >
> > Yeah, as Linus decoded it's that loop. So we look at some corrupted
> > entry here.
> >
> > CONFIG_DEBUG_OBJECTS (add debug_objects to the command line as well)
> > should catch it when this is a timer being discarded, freed or
> > reinitialized.
> >
> > Otherwise, when it is just random corruption it wont help much.
>
> i guess CONFIG_DEBUG_OBJECTS_TIMERS=y is practical, and
> CONFIG_DEBUG_LIST=y would be nice as well - it can catch memory
> corruptions rather early and is relatively light-weight.

I tested rc5-mm1 with all debug options except PAGEALLOC. No help.

> [ and if there's any reproducability of the corruption and if it happens
>   at a stable kernel address then a small custom hack in ftrace can
>   catch it the moment it happens. ]

It was a once-off.

end of thread, other threads:[~2008-09-05 17:49 UTC | newest]
Thread overview: 15+ messages
[not found] <20080903191619.6b6b230e.sfr@canb.auug.org.au>
[not found] ` <20080903214634.ea17ff53.akpm@linux-foundation.org>
[not found] ` <alpine.LFD.1.10.0809032201510.3378@nehalem.linux-foundation.org>
[not found] ` <20080903223318.84b6ce8b.akpm@linux-foundation.org>
[not found] ` <alpine.LFD.1.10.0809040045190.3378@nehalem.linux-foundation.org>
[not found] ` <20080904012544.cabed847.akpm@linux-foundation.org>
[not found] ` <alpine.LFD.1.10.0809040143350.3452@nehalem.linux-foundation.org>
[not found] ` <20080904015701.5959623a.akpm@linux-foundation.org>
[not found] ` <alpine.LFD.1.10.0809040203510.3452@nehalem.linux-foundation.org>
2008-09-04 17:45 ` linux-next: Tree for September 3 Andrew Morton
2008-09-04 18:05 ` Linus Torvalds
2008-09-04 18:34 ` Andrew Morton
2008-09-04 20:31 ` Eric W. Biederman
2008-09-04 20:41 ` Andrew Morton
2008-09-04 21:03 ` Eric W. Biederman
2008-09-04 22:22 ` Andrew Morton
2008-09-04 22:45 ` Thomas Gleixner
2008-09-04 23:17 ` Linus Torvalds
2008-09-05 5:39 ` Arjan van de Ven
2008-09-04 23:17 ` Andrew Morton
2008-09-04 23:25 ` Linus Torvalds
2008-09-04 23:27 ` Thomas Gleixner
2008-09-05 11:04 ` Ingo Molnar
2008-09-05 17:49 ` Andrew Morton