netdev.vger.kernel.org archive mirror
* Re: linux-next: Tree for September 3
       [not found]               ` <alpine.LFD.1.10.0809040203510.3452@nehalem.linux-foundation.org>
@ 2008-09-04 17:45                 ` Andrew Morton
  2008-09-04 18:05                   ` Linus Torvalds
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 17:45 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen Rothwell, linux-next, LKML, Yinghai Lu, Ivan Kokshaysky,
	Jesse Barnes, netdev, Al Viro

On Thu, 4 Sep 2008 02:07:05 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Thu, 4 Sep 2008, Andrew Morton wrote:
> > > 
> > > commit 5f17cfce5776c566d64430f543a289e5cfa4538b
> > > 
> > >     PCI: fix pbus_size_mem() resource alignment for CardBus controllers
> > 
> > Is this worth backporting into 2.6.26.x?
> 
> It's certainly at least *potentially* -stable material, but because I 
> suspect that the whole yenta_allocate_resources() -> pci_setup_cardbus() 
> fallback code will end up resulting in a working setup, it may not be 
> worth it.  At least not until we've heard from more people..
> 
> Can you check whether your vortex cardbus thing _works_ even without the 
> fix?
> 

Working on it, but got distracted by a /proc/net bug.



sony:/home/akpm> ifconfig -a
Warning: cannot open /proc/net/dev (Permission denied). Limited output.
Warning: cannot open /proc/net/dev (Permission denied). Limited output.
eth0      Link encap:Ethernet  HWaddr 00:01:4A:9F:7C:79  
          inet addr:192.168.2.10  Bcast:192.168.2.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Warning: cannot open /proc/net/dev (Permission denied). Limited output.
lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1

sony:/home/akpm> ls -l /proc/net/dev
ls: /proc/net/dev: Permission denied

sony:/home/akpm> ls -l /proc/net
ls: /proc/net: Permission denied

sony:/home/akpm> ls -ld /proc/net
ls: /proc/net: Permission denied

sony:/home/akpm> ls -l /proc | grep net
?---------  ? ?         ?                 ?            ? /proc/net


This is a pull of your tree from yesterday, ending at


commit fbb16e243887332dd5754e48ffe5b963378f3cd2
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 3 00:54:47 2008 +0200

    [x86] Fix TSC calibration issues

config: http://userweb.kernel.org/~akpm/config-sony.txt
dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: linux-next: Tree for September 3
  2008-09-04 17:45                 ` linux-next: Tree for September 3 Andrew Morton
@ 2008-09-04 18:05                   ` Linus Torvalds
  2008-09-04 18:34                     ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2008-09-04 18:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Stephen Rothwell, linux-next, LKML, Yinghai Lu, Ivan Kokshaysky,
	Jesse Barnes, netdev, Al Viro, Eric W. Biederman, Al Viro


On Thu, 4 Sep 2008, Andrew Morton wrote:
> 
> Working on it, but got distracted by a /proc/net bug.

Ok, that's just something odd.

> sony:/home/akpm> ls -l /proc/net/dev
> ls: /proc/net/dev: Permission denied
> 
> sony:/home/akpm> ls -l /proc/net
> ls: /proc/net: Permission denied
> 
> sony:/home/akpm> ls -ld /proc/net
> ls: /proc/net: Permission denied
> 
> sony:/home/akpm> ls -l /proc | grep net
> ?---------  ? ?         ?                 ?            ? /proc/net
> 
> This is a pull of your tree from yesterday, ending at commit 
> fbb16e243887332dd5754e48ffe5b963378f3cd2

There's been various suggested patches by Al/Eric (added to cc) for 
/proc/net handling, but none of them have actually even been merged yet. 
So I don't think this code has changed in a while. 

Al, Eric, ideas?

> config: http://userweb.kernel.org/~akpm/config-sony.txt
> dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt

That whole thing should just be a simple symlink:

	fs/proc/proc_net.c:     proc_symlink("net", NULL, "self/net");

are you sure it's a plain tree of mine, without any of the patches 
floating around between Eric/Al?
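
For anyone who wants to check the same thing programmatically, here is a minimal userspace sketch. It assumes a POSIX system; the helper name symlink_target() is made up for this illustration and is not a kernel or libc function. On a healthy system one would expect /proc/net to read back as "self/net".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): if `path` is a symlink,
 * copy its target into `target` (NUL-terminated) and return 0.
 * Return -1 if lstat() or readlink() fails, or if `path` is not a symlink.
 */
int symlink_target(const char *path, char *target, size_t len)
{
	struct stat st;
	ssize_t n;

	assert(path != NULL && target != NULL && len > 0);

	if (lstat(path, &st) != 0)	/* lstat() does not follow the link */
		return -1;
	if (!S_ISLNK(st.st_mode))
		return -1;

	n = readlink(path, target, len - 1);
	if (n < 0)
		return -1;
	target[n] = '\0';		/* readlink() does not terminate */
	return 0;
}
```

In the failing case reported above even "ls -ld /proc/net" was refused, so a call such as symlink_target("/proc/net", buf, sizeof(buf)) would presumably return -1 at the lstat() step.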

			Linus


* Re: linux-next: Tree for September 3
  2008-09-04 18:05                   ` Linus Torvalds
@ 2008-09-04 18:34                     ` Andrew Morton
  2008-09-04 20:31                       ` Eric W. Biederman
  2008-09-04 22:45                       ` Thomas Gleixner
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 18:34 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Stephen Rothwell, linux-next, LKML, Yinghai Lu, Ivan Kokshaysky,
	Jesse Barnes, netdev, Al Viro, Eric W. Biederman, David Woodhouse,
	Sam Ravnborg, john stultz, Thomas Gleixner

On Thu, 4 Sep 2008 11:05:21 -0700 (PDT) Linus Torvalds <torvalds@linux-foundation.org> wrote:

> > This is a pull of your tree from yesterday, ending at commit 
> > fbb16e243887332dd5754e48ffe5b963378f3cd2
> 
> There's been various suggested patches by Al/Eric (added to cc) for 
> /proc/net handling, but none of them have actually even been merged yet. 
> So I don't think this code has changed in a while. 
> 
> Al, Eric, ideas?

I don't think I saw it on any other test machines.

This machine runs SELinux.  Distro is FC5.

> > config: http://userweb.kernel.org/~akpm/config-sony.txt
> > dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt
> 
> That whole thing should just be a simple symlink:
> 
> 	fs/proc/proc_net.c:     proc_symlink("net", NULL, "self/net");

/proc/self/net looks fine.

> are you sure it's a plain tree of mine, without any of the patches 
> floating around between Eric/Al?

yup, it's yesterday's mainline.



Found another problem.

The way I install kernels on machine `sony' is:

- Build the kernel on machine `y', in /usr/src/25

- On machine sony, NFS mount y:/usr/src at /mnt/y/usr/src

- On sony, `cd /mnt/y/usr/src/25'

- <copy stuff>

- make modules_install

- depmod -a

IOW, I run the kernel's installation tools on the *target* machine,
within an nfs mount of the *build* machine.

This has worked happily for five or more years.  But now:

Failed to open destination file: Permission deniedihex2fw: Convert ihex files into binary representation for use by Linux kernel
usage: ihex2fw [<options>] <src.HEX> <dst.fw>
       -w: wide records (16-bit length)
       -s: sort records by address
make[1]: *** [firmware/emi26/loader.fw] Error 1
make: *** [_modinst_post] Error 2

This is because the target machine is i386 and it is trying to execute
an x86_64 binary.
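
One way to confirm that kind of mismatch is to look at the e_machine field of the helper binary's ELF header. The sketch below is an illustration only: elf_machine() and the ELF_EM_* names are invented here, and the function works on a buffer holding the first bytes of the file rather than opening the file itself.

```c
#include <assert.h>
#include <stddef.h>

/* e_machine values as defined by the ELF specification. */
enum { ELF_EM_386 = 3, ELF_EM_X86_64 = 62 };

/*
 * Hypothetical helper: return e_machine from the first bytes of an ELF
 * file, or -1 if the buffer is too short or is not an ELF image.
 */
int elf_machine(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);

	if (len < 20)			/* e_machine ends at offset 19 */
		return -1;
	if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
		return -1;

	/* e_ident[EI_DATA] (offset 5): 1 = little-endian, 2 = big-endian */
	if (buf[5] == 1)
		return buf[18] | (buf[19] << 8);
	if (buf[5] == 2)
		return (buf[18] << 8) | buf[19];
	return -1;
}
```

Feeding it the first 20 bytes of a tool built on the x86_64 build machine should give ELF_EM_X86_64, which an i386 target cannot execute.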





and oh dear, the clockevents code just oopsed.

firmware: requesting ipw2200-bss.fw
ipw2200: Radio Frequency Kill Switch is On:
Kill switch must be turned off for wireless networking to work.
ipw2200: Detected geography ZZA (11 802.11bg channels, 13 802.11a channels)
initcall ipw_init+0x0/0x71 [ipw2200] returned 0 after 163 msecs
ipw2200: Failed to send WEP_KEY: Aborted due to RF kill switch.
BUG: unable to handle kernel NULL pointer dereference at 00000040
IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab
*pde = 00000000 
Oops: 0000 [#1] PREEMPT 
Modules linked in: ipw2200 sonypi ipv6 autofs4 hidp l2cap bluetooth sunrpc nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables acpi_cpufreq nvram ohci1394 ieee1394 ehci_hcd uhci_hcd sg joydev snd_hda_intel snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss ieee80211 3c59x snd_pcm ieee80211_crypt sr_mod snd_timer cdrom snd i2c_i801 soundcore snd_page_alloc button i2c_core pcspkr ext3 jbd [last unloaded: ipw2200]

Pid: 0, comm: swapper Not tainted (2.6.27-rc5 #18)
EIP: 0060:[<c0126e7f>] EFLAGS: 00010013 CPU: 0
EIP is at get_next_timer_interrupt+0xe9/0x1ab
EAX: 00000040 EBX: 00000001 ECX: 0000001d EDX: 00000040
ESI: 0000001d EDI: c05bc700 EBP: c0469f1c ESP: c0469ee4
 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c0468000 task=c04343c0 task.ti=c0468000)
Stack: ffff1cef c013cc1d c05bcf28 00000000 c05bd010 c05bc798 00ffff1d c05bcf28 
       c05bd128 c05bd328 c05bd528 00000000 b65eb8b3 0000000f c0469f4c c013816f 
       00000000 b65c1f00 0000000f ffff1cef 00000046 00000096 c04b11c0 00000000 
Call Trace:
 [<c013cc1d>] ? __lock_acquire+0x671/0x6b7
 [<c013816f>] ? tick_nohz_stop_sched_tick+0x13f/0x2ba
 [<c0123640>] ? irq_exit+0x6d/0x79
 [<c0105c6a>] ? do_IRQ+0x6d/0x7f
 [<c0104300>] ? common_interrupt+0x28/0x30
 [<c013007b>] ? set_process_cpu_timer+0x94/0xb9
 [<c0234743>] ? acpi_processor_idle+0x2a6/0x44b
 [<c010256b>] ? cpu_idle+0x5a/0x87
 [<c031e2fd>] ? rest_init+0x61/0x63
 =======================
Code: 83 e6 3f 89 f1 89 5d d0 8b 45 d0 89 d3 8d 04 c8 89 45 d8 8b 00 eb 14 8b 40 08 bb 01 00 00 00 3b 45 cc 0f 49 45 cc 89 45 cc 89 d0 <8b> 10 0f 18 02 90 3b 45 d8 75 e1 85 db 89 da 74 0c 85 f6 74 04 
EIP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab SS:ESP 0068:c0469ee4
---[ end trace 2cf31fb827f3051f ]---
Kernel panic - not syncing: Attempted to kill the idle task!
BUG: NMI Watchdog detected LOCKUP on CPU0, ip c01f584e, registers:




* Re: linux-next: Tree for September 3
  2008-09-04 18:34                     ` Andrew Morton
@ 2008-09-04 20:31                       ` Eric W. Biederman
  2008-09-04 20:41                         ` Andrew Morton
  2008-09-04 22:45                       ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2008-09-04 20:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Stephen Rothwell, linux-next, LKML, Yinghai Lu,
	Ivan Kokshaysky, Jesse Barnes, netdev, Al Viro, David Woodhouse,
	Sam Ravnborg, john stultz, Thomas Gleixner

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 4 Sep 2008 11:05:21 -0700 (PDT) Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>
>> > This is a pull of your tree from yesterday, ending at commit 
>> > fbb16e243887332dd5754e48ffe5b963378f3cd2
>> 
>> There's been various suggested patches by Al/Eric (added to cc) for 
>> /proc/net handling, but none of them have actually even been merged yet. 
>> So I don't think this code has changed in a while. 
>> 
>> Al, Eric, ideas?

There aren't any issues I know of with normal configurations
and the current proc code.

> I don't think I saw it on any other test machines.
>
> This machine runs SELinux.  Distro is FC5.

>> > config: http://userweb.kernel.org/~akpm/config-sony.txt
>> > dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt
>> 
>> That whole thing should just be a simple symlink:
>> 
>> 	fs/proc/proc_net.c:     proc_symlink("net", NULL, "self/net");
>
> /proc/self/net looks fine.
>
>> are you sure it's a plain tree of mine, without any of the patches 
>> floating around between Eric/Al?
>
> yup, it's yesterday's mainline.

Does the problem happen if you disable selinux?

This feels like a case of selinux being overzealous.

Eric


* Re: linux-next: Tree for September 3
  2008-09-04 20:31                       ` Eric W. Biederman
@ 2008-09-04 20:41                         ` Andrew Morton
  2008-09-04 21:03                           ` Eric W. Biederman
  0 siblings, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 20:41 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink,
	jbarnes, netdev, viro, dwmw2, sam, johnstul, tglx

On Thu, 04 Sep 2008 13:31:01 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> >> > config: http://userweb.kernel.org/~akpm/config-sony.txt
> >> > dmesg: http://userweb.kernel.org/~akpm/dmesg-sony-without.txt
> >> 
> >> That whole thing should just be a simple symlink:
> >> 
> >> 	fs/proc/proc_net.c:     proc_symlink("net", NULL, "self/net");
> >
> > /proc/self/net looks fine.
> >
> >> are you sure it's a plain tree of mine, without any of the patches 
> >> floating around between Eric/Al?
> >
> > yup, it's yesterday's mainline.
> 
> Does the problem happen if you disable selinux?
> 
> This feels like a case of selinux being overzealous.

yeah, adding `selinux=0' to the boot command line fixes it.


* Re: linux-next: Tree for September 3
  2008-09-04 20:41                         ` Andrew Morton
@ 2008-09-04 21:03                           ` Eric W. Biederman
  2008-09-04 22:22                             ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Eric W. Biederman @ 2008-09-04 21:03 UTC (permalink / raw)
  To: Andrew Morton
  Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink,
	jbarnes, netdev, viro, dwmw2, sam, johnstul, tglx

Andrew Morton <akpm@linux-foundation.org> writes:

> On Thu, 04 Sep 2008 13:31:01 -0700
> ebiederm@xmission.com (Eric W. Biederman) wrote:
>
>> >> are you sure it's a plain tree of mine, without any of the patches 
>> >> floating around between Eric/Al?
>> >
>> > yup, it's yesterday's mainline.
>> 
>> Does the problem happen if you disable selinux?
>> 
>> This feels like a case of selinux being overzealous.
>
> yeah, adding `selinux=0' to the boot command line fixes it.

The proc generic directory back structure is the same.  As requested by
the selinux folks.  So I don't expect there is much more we can do on
the /proc side.

When we get the interaction bug between the VFS and /proc/net fixed I wonder
if there will be some more selinux fall out.  Something to think about.

Eric


* Re: linux-next: Tree for September 3
  2008-09-04 21:03                           ` Eric W. Biederman
@ 2008-09-04 22:22                             ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 22:22 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink,
	jbarnes, netdev, viro, dwmw2, sam, johnstul, tglx

On Thu, 04 Sep 2008 14:03:41 -0700
ebiederm@xmission.com (Eric W. Biederman) wrote:

> Andrew Morton <akpm@linux-foundation.org> writes:
> 
> > On Thu, 04 Sep 2008 13:31:01 -0700
> > ebiederm@xmission.com (Eric W. Biederman) wrote:
> >
> >> >> are you sure it's a plain tree of mine, without any of the patches 
> >> >> floating around between Eric/Al?
> >> >
> >> > yup, it's yesterday's mainline.
> >> 
> >> Does the problem happen if you disable selinux?
> >> 
> >> This feels like a case of selinux being overzealous.
> >
> > yeah, adding `selinux=0' to the boot command line fixes it.
> 
> The proc generic directory back structure is the same.  As requested by
> the selinux folks.  So I don't expect there is much more we can do on
> the /proc side.
> 
> When we get the interaction bug between the VFS and /proc/net fixed I wonder
> if there will be some more selinux fall out.  Something to think about.

fyi, that machine is x86_32-on-FC5.  My x86_64-on-FC6 test box is
also running selinux and has the same bug.


* Re: linux-next: Tree for September 3
  2008-09-04 18:34                     ` Andrew Morton
  2008-09-04 20:31                       ` Eric W. Biederman
@ 2008-09-04 22:45                       ` Thomas Gleixner
  2008-09-04 23:17                         ` Linus Torvalds
  2008-09-04 23:17                         ` Andrew Morton
  1 sibling, 2 replies; 15+ messages in thread
From: Thomas Gleixner @ 2008-09-04 22:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linus Torvalds, Stephen Rothwell, linux-next, LKML, Yinghai Lu,
	Ivan Kokshaysky, Jesse Barnes, netdev, Al Viro, Eric W. Biederman,
	David Woodhouse, Sam Ravnborg, john stultz

On Thu, 4 Sep 2008, Andrew Morton wrote:
> 
> and oh dear, the clockevents code just oopsed.

Sigh.
 
> BUG: unable to handle kernel NULL pointer dereference at 00000040
> IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab

Cute, NULL pointer in the timer check code. Can you please addr2line
the exact code line or upload the vmlinux somewhere ?

Thanks,

	tglx


* Re: linux-next: Tree for September 3
  2008-09-04 22:45                       ` Thomas Gleixner
@ 2008-09-04 23:17                         ` Linus Torvalds
  2008-09-05  5:39                           ` Arjan van de Ven
  2008-09-04 23:17                         ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2008-09-04 23:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, Stephen Rothwell, linux-next, LKML, Yinghai Lu,
	Ivan Kokshaysky, Jesse Barnes, netdev, Al Viro, Eric W. Biederman,
	David Woodhouse, Sam Ravnborg, john stultz



On Fri, 5 Sep 2008, Thomas Gleixner wrote:
>  
> > BUG: unable to handle kernel NULL pointer dereference at 00000040
> > IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab
> 
> Cute, NULL pointer in the timer check code. Can you please addr2line
> the exact code line or upload the vmlinux somewhere ?

Use "scripts/decodecode" (with AFLAGS=--32 since this is x86-32). It shows
(after some cleanup and editing):

	   3:	89 f1                	mov    %esi,%ecx
	   5:	89 5d d0             	mov    %ebx,-0x30(%ebp)
	   8:	8b 45 d0             	mov    -0x30(%ebp),%eax
	   b:	89 d3                	mov    %edx,%ebx
	   d:	8d 04 c8             	lea    (%eax,%ecx,8),%eax
	  10:	89 45 d8             	mov    %eax,-0x28(%ebp)
	  13:	8b 00                	mov    (%eax),%eax
	  15:	eb 14                	jmp    0x2b   ----------------------+
	  17:	8b 40 08             	mov    0x8(%eax),%eax  <--------+   |
	  1a:	bb 01 00 00 00       	mov    $0x1,%ebx		|   |
	  1f:	3b 45 cc             	cmp    -0x34(%ebp),%eax		|   |
	  22:	0f 49 45 cc          	cmovns -0x34(%ebp),%eax		|   |
	  26:	89 45 cc             	mov    %eax,-0x34(%ebp)		|   |
	  29:	89 d0                	mov    %edx,%eax		|   |
***	  2b:	8b 10                	mov    (%eax),%edx		| <-+
	  2d:	0f 18 02             	prefetchnta (%edx)		| 
	  30:	90                   	nop    				|
	  31:	3b 45 d8             	cmp    -0x28(%ebp),%eax		|
	  34:	75 e1                	jne    0x17  -------------------+
	  36:	85 db                	test   %ebx,%ebx
	  38:	89 da                	mov    %ebx,%edx
	  3a:	74 0c                	je     0x48
	  3c:	85 f6                	test   %esi,%esi
	  3e:	74 04                	je     0x44

and that "prefetchnta" is a dead giveaway: it's a "list_for_each_entry()" 
loop. And looking at the registers:

> > Pid: 0, comm: swapper Not tainted (2.6.27-rc5 #18)
> > EIP: 0060:[<c0126e7f>] EFLAGS: 00010013 CPU: 0
> > EIP is at get_next_timer_interrupt+0xe9/0x1ab
> > EAX: 00000040 EBX: 00000001 ECX: 0000001d EDX: 00000040
> > ESI: 0000001d EDI: c05bc700 EBP: c0469f1c ESP: c0469ee4
> >  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
> > Process swapper (pid: 0, ti=c0468000 task=c04343c0 task.ti=c0468000)
> > Stack: ffff1cef c013cc1d c05bcf28 00000000 c05bd010 c05bc798 00ffff1d c05bcf28 
> >        c05bd128 c05bd328 c05bd528 00000000 b65eb8b3 0000000f c0469f4c c013816f 
> >        00000000 b65c1f00 0000000f ffff1cef 00000046 00000096 c04b11c0 00000000 
> > Call Trace:
> >  [<c013cc1d>] ? __lock_acquire+0x671/0x6b7
> >  [<c013816f>] ? tick_nohz_stop_sched_tick+0x13f/0x2ba

since %eax == %edx, it's not the first iteration through the loop.

IOW, it's this loop (kernel/timer.c, line 863):

                        list_for_each_entry(nte, varp->vec + slot, entry) {
                                found = 1;
                                if (time_before(nte->expires, expires))
                                        expires = nte->expires;
                        }


as can be seen by looking at the loop body (that "mov $0x1,%ebx" thing
is the "found = 1;" thing).
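
The "cmp"/"cmovns" pair in the disassembly appears to correspond to the time_before() test plus the assignment: it keeps the new value only when the signed difference is negative. A simplified userspace sketch of that comparison follows. The sketch_* names are invented here, and the kernel's real macro adds type checking that is left out.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's time_before(): true when a is
 * earlier than b, even if the counter has wrapped in between.
 */
int sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* The loop body from the thread: keep the earliest expiry seen so far. */
unsigned long sketch_earliest(const unsigned long *expiries, int n,
			      unsigned long expires)
{
	int i;

	assert(expiries != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (sketch_time_before(expiries[i], expires))
			expires = expiries[i];
	return expires;
}
```

Comparing the difference rather than the raw values is what keeps the test correct across a jiffies wrap.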

The next list entry pointer is obviously corrupt: it's 0x00000040, which 
is clearly not a valid pointer. 

Looks like %ecx contains 'slot' (0x1d), but that's the only other piece
of info I can see in the register state.
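
To make the failure mode concrete, here is a small userspace sketch of a circular list walk that hits a stray "next" value such as 0x40. The names (struct node, ring_init, ring_walk_checked) are made up, and the range check is an addition for the illustration: the kernel loop has no such check and simply dereferences the pointer, which is where the oops comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct node {			/* userspace stand-in for struct list_head */
	struct node *next, *prev;
};

/* Link `count` nodes from `pool` into a circular list behind `head`. */
void ring_init(struct node *head, struct node *pool, size_t count)
{
	struct node *prev = head;
	size_t i;

	assert(head != NULL && (pool != NULL || count == 0));
	for (i = 0; i < count; i++) {
		prev->next = &pool[i];
		pool[i].prev = prev;
		prev = &pool[i];
	}
	prev->next = head;
	head->prev = prev;
}

/*
 * Walk the ring the way a list_for_each loop would, but refuse to
 * follow a ->next pointer that is neither the head nor inside the pool.
 * Returns the number of entries, or -1 if a bad pointer is found.
 */
long ring_walk_checked(const struct node *head, const struct node *pool,
		       size_t count)
{
	uintptr_t lo = (uintptr_t)pool;
	uintptr_t hi = (uintptr_t)(pool + count);
	const struct node *pos;
	long seen = 0;

	assert(head != NULL);
	for (pos = head->next; pos != head; pos = pos->next) {
		uintptr_t p = (uintptr_t)pos;

		if (p < lo || p >= hi || seen >= (long)count)
			return -1;	/* e.g. a stray value such as 0x40 */
		seen++;
	}
	return seen;
}
```

Each pointer is validated before it is followed, so the walk stops at the corrupt link instead of faulting on it.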

I do wonder if there isn't some memory corruption going on here. The 
SELinux thing didn't look very sane either (even if it's a SELinux
permission issue, the inode is corrupt, since the mode is crap).

			Linus


* Re: linux-next: Tree for September 3
  2008-09-04 22:45                       ` Thomas Gleixner
  2008-09-04 23:17                         ` Linus Torvalds
@ 2008-09-04 23:17                         ` Andrew Morton
  2008-09-04 23:25                           ` Linus Torvalds
  2008-09-04 23:27                           ` Thomas Gleixner
  1 sibling, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2008-09-04 23:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink,
	jbarnes, netdev, viro, ebiederm, dwmw2, sam, johnstul

On Fri, 5 Sep 2008 00:45:33 +0200 (CEST)
Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 4 Sep 2008, Andrew Morton wrote:
> > 
> > and oh dear, the clockevents code just oopsed.
> 
> Sigh.
>  
> > BUG: unable to handle kernel NULL pointer dereference at 00000040
> > IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab
> 
> Cute, NULL pointer in the timer check code. Can you please addr2line
> the exact code line or upload the vmlinux somewhere ?
> 

erm, I might have lost that binary, and it only happened the once.  It
happened shortly after the machine had fully booted, during
establishment of the first sshd session.

It nuked the machine really well, too.  I had to pull the battery to
get it back.

fwiw:


(gdb) l *0xc0126e7f
0xc0126e7f is in get_next_timer_interrupt (kernel/timer.c:863).
warning: Source file is more recent than executable.
858             for (array = 0; array < 4; array++) {
859                     struct tvec *varp = varray[array];
860     
861                     index = slot = timer_jiffies & TVN_MASK;
862                     do {
863                             list_for_each_entry(nte, varp->vec + slot, entry) {
864                                     found = 1;
865                                     if (time_before(nte->expires, expires))
866                                             expires = nte->expires;
867                             }

which looks reasonable.
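
For readers following the index/slot logic in that listing, here is a userspace sketch of the slot scan. Two things are assumptions: TVN_BITS is taken to be 6, which is consistent with the "83 e6 3f" (and $0x3f,%esi) bytes at the start of the oops Code: line, and the wrap-around step at the end of the do loop is not visible in the listing, so it is guessed from the index/slot pair. The SKETCH_* and slot_scan_order names are invented.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_TVN_BITS 6	/* assumption, see the note above */
#define SKETCH_TVN_SIZE (1 << SKETCH_TVN_BITS)
#define SKETCH_TVN_MASK (SKETCH_TVN_SIZE - 1)

/*
 * Record the order in which slots would be visited, starting at the
 * slot selected by timer_jiffies and wrapping around once.
 * `order` must have room for SKETCH_TVN_SIZE entries.
 * Returns the number of slots visited.
 */
int slot_scan_order(unsigned long timer_jiffies, int *order)
{
	int index, slot, n = 0;

	assert(order != NULL);
	index = slot = (int)(timer_jiffies & SKETCH_TVN_MASK);
	do {
		order[n++] = slot;
		slot = (slot + 1) & SKETCH_TVN_MASK;
	} while (slot != index);
	return n;
}
```

With the slot value 0x1d seen in %ecx/%esi, the scan would start at slot 29 and end at slot 28 after visiting all 64 slots.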


* Re: linux-next: Tree for September 3
  2008-09-04 23:17                         ` Andrew Morton
@ 2008-09-04 23:25                           ` Linus Torvalds
  2008-09-04 23:27                           ` Thomas Gleixner
  1 sibling, 0 replies; 15+ messages in thread
From: Linus Torvalds @ 2008-09-04 23:25 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Thomas Gleixner, sfr, linux-next, linux-kernel, yhlu.kernel, ink,
	jbarnes, netdev, viro, ebiederm, dwmw2, sam, johnstul



On Thu, 4 Sep 2008, Andrew Morton wrote:
> 
> erm, I might have lost that binary, and it only happened the once.  It
> happened shortly after the machine had fully booted, during
> establishment of the first sshd session.

Considering that both this and your odd /proc/net issue look like memory 
corruption, maybe CONFIG_DEBUG_SLAB and CONFIG_DEBUG_PAGEALLOC are worth 
testing?

		Linus


* Re: linux-next: Tree for September 3
  2008-09-04 23:17                         ` Andrew Morton
  2008-09-04 23:25                           ` Linus Torvalds
@ 2008-09-04 23:27                           ` Thomas Gleixner
  2008-09-05 11:04                             ` Ingo Molnar
  1 sibling, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2008-09-04 23:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: torvalds, sfr, linux-next, linux-kernel, yhlu.kernel, ink,
	jbarnes, netdev, viro, ebiederm, dwmw2, sam, johnstul

On Thu, 4 Sep 2008, Andrew Morton wrote:
> > 
> > Cute, NULL pointer in the timer check code. Can you please addr2line
> > the exact code line or upload the vmlinux somewhere ?
> > 
> 
> erm, I might have lost that binary, and it only happened the once.  It
> happened shortly after the machine had fully booted, during
> establishment of the first sshd session.
> 
> It nuked the machine really well, too.  I had to pull the battery to
> get it back.

Known problem on Sonys. :(

> fwiw:
>
> (gdb) l *0xc0126e7f
> 0xc0126e7f is in get_next_timer_interrupt (kernel/timer.c:863).
> warning: Source file is more recent than executable.
> 858             for (array = 0; array < 4; array++) {
> 859                     struct tvec *varp = varray[array];
> 860     
> 861                     index = slot = timer_jiffies & TVN_MASK;
> 862                     do {
> 863                             list_for_each_entry(nte, varp->vec + slot, entry) {
> 864                                     found = 1;
> 865                                     if (time_before(nte->expires, expires))
> 866                                             expires = nte->expires;
> 867                             }
> 
> which looks reasonable.

Yeah, as Linus decoded it's that loop. So we look at some corrupted
entry here. 

CONFIG_DEBUG_OBJECTS (add debug_objects to the command line as well)
should catch it when this is a timer being discarded, freed or
reinitialized.

Otherwise, when it is just random corruption it won't help much.
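
The idea behind that kind of lifetime tracking can be sketched in userspace as a small state machine. This is an illustration only, with invented names (struct tracked_obj, track_*); it is not the kernel's debugobjects code and ignores locking, lookup and fixups.

```c
#include <assert.h>
#include <stddef.h>

enum obj_state { OBJ_NONE, OBJ_INIT, OBJ_ACTIVE };

struct tracked_obj {		/* hypothetical shadow record for one timer */
	enum obj_state state;
};

/* Each hook returns 0 if the transition is sane, -1 if it looks like a bug. */
int track_init(struct tracked_obj *o)
{
	assert(o != NULL);
	if (o->state == OBJ_ACTIVE)
		return -1;	/* reinitialized while still queued */
	o->state = OBJ_INIT;
	return 0;
}

int track_activate(struct tracked_obj *o)
{
	assert(o != NULL);
	if (o->state != OBJ_INIT)
		return -1;	/* never initialized, or already queued */
	o->state = OBJ_ACTIVE;
	return 0;
}

int track_deactivate(struct tracked_obj *o)
{
	assert(o != NULL);
	if (o->state != OBJ_ACTIVE)
		return -1;	/* was not queued */
	o->state = OBJ_INIT;
	return 0;
}

int track_free(struct tracked_obj *o)
{
	assert(o != NULL);
	if (o->state == OBJ_ACTIVE)
		return -1;	/* freed while still queued */
	o->state = OBJ_NONE;
	return 0;
}
```

A tracker like this only sees transitions that go through its hooks, which is why a stray write from unrelated code would go unnoticed.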

Thanks,

	tglx


* Re: linux-next: Tree for September 3
  2008-09-04 23:17                         ` Linus Torvalds
@ 2008-09-05  5:39                           ` Arjan van de Ven
  0 siblings, 0 replies; 15+ messages in thread
From: Arjan van de Ven @ 2008-09-05  5:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Andrew Morton, Stephen Rothwell, linux-next,
	LKML, Yinghai Lu, Ivan Kokshaysky, Jesse Barnes, netdev, Al Viro,
	Eric W. Biederman, David Woodhouse, Sam Ravnborg, john stultz

On Thu, 4 Sep 2008 16:17:01 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> 
> 
> On Fri, 5 Sep 2008, Thomas Gleixner wrote:
> >  
> > > BUG: unable to handle kernel NULL pointer dereference at 00000040
> > > IP: [<c0126e7f>] get_next_timer_interrupt+0xe9/0x1ab
> > 
> > Cute, NULL pointer in the timer check code. Can you please addr2line
> > the exact code line or upload the vmlinux somewhere ?
> 
> Use "scripts/decodecode" (with AFLAGS=--32 since this is x86-32). It
> shows (after some cleanup and editing):


btw if the oops was on lkml like it was here you can also just look it
up on kerneloops.org via the search option

which for this case finds you
http://www.kerneloops.org/raw.php?rawid=59347&msgid=http://mid.gmane.org/20080904113408.d47c65f6.akpm@linux-foundation.org

and this has the decodecode already done 

and the search output at
http://www.kerneloops.org/search.php?search=get_next_timer_interrupt
shows that there's only been very few reports of this one, but of the
ones there were at least half were slab poisoned.

what the site doesn't do for an oops like this is show the C-code
interspersed, it's not that smart for general kernels (that only works
for fedora rpm kernels like here:
http://www.kerneloops.org/raw.php?rawid=57807&msgid=
)

otoh if you have the vmlinux you could do this locally
(hmm maybe I should clean that script up and submit for adding to
the scripts/ directory)
-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: linux-next: Tree for September 3
  2008-09-04 23:27                           ` Thomas Gleixner
@ 2008-09-05 11:04                             ` Ingo Molnar
  2008-09-05 17:49                               ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2008-09-05 11:04 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, torvalds, sfr, linux-next, linux-kernel,
	yhlu.kernel, ink, jbarnes, netdev, viro, ebiederm, dwmw2, sam,
	johnstul


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Thu, 4 Sep 2008, Andrew Morton wrote:
> > > 
> > > Cute, NULL pointer in the timer check code. Can you please addr2line
> > > the exact code line or upload the vmlinux somewhere ?
> > > 
> > 
> > erm, I might have lost that binary, and it only happened the once.  It
> > happened shortly after the machine had fully booted, during
> > establishment of the first sshd session.
> > 
> > It nuked the machine really well, too.  I had to pull the battery to
> > get it back.
> 
> Known problem on Sonys. :(
> 
> > fwiw:
> >
> > (gdb) l *0xc0126e7f
> > 0xc0126e7f is in get_next_timer_interrupt (kernel/timer.c:863).
> > warning: Source file is more recent than executable.
> > 858             for (array = 0; array < 4; array++) {
> > 859                     struct tvec *varp = varray[array];
> > 860     
> > 861                     index = slot = timer_jiffies & TVN_MASK;
> > 862                     do {
> > 863                             list_for_each_entry(nte, varp->vec + slot, entry) {
> > 864                                     found = 1;
> > 865                                     if (time_before(nte->expires, expires))
> > 866                                             expires = nte->expires;
> > 867                             }
> > 
> > which looks reasonable.
> 
> Yeah, as Linus decoded it's that loop. So we look at some corrupted
> entry here. 
> 
> CONFIG_DEBUG_OBJECTS (add debug_objects to the command line as well)
> should catch it when this is a timer being discarded, freed or
> reinitialized.
> 
> > Otherwise, when it is just random corruption it won't help much.

i guess CONFIG_DEBUG_OBJECTS_TIMERS=y is practical, and 
CONFIG_DEBUG_LIST=y would be nice as well - it can catch memory 
corruptions rather early and is relatively light-weight.
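
The kind of check meant here can be sketched in userspace as follows: before linking a new entry, verify that the two neighbours still point at each other. The names (struct lnode, lnode_init, lnode_add_checked) are invented for the illustration, and the real kernel code reports the problem rather than returning a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct lnode {			/* userspace stand-in for struct list_head */
	struct lnode *next, *prev;
};

void lnode_init(struct lnode *head)
{
	assert(head != NULL);
	head->next = head->prev = head;
}

/*
 * Insert `item` between `prev` and `next`, but only if the two
 * neighbours still point at each other.  Returns false and leaves the
 * list untouched when the links disagree.
 */
bool lnode_add_checked(struct lnode *item, struct lnode *prev,
		       struct lnode *next)
{
	assert(item != NULL && prev != NULL && next != NULL);

	if (next->prev != prev || prev->next != next)
		return false;	/* a neighbour's link was overwritten */

	next->prev = item;
	item->next = next;
	item->prev = prev;
	prev->next = item;
	return true;
}
```

The check costs two pointer comparisons per insertion, which is why it stays cheap, but it only fires on the next list operation that touches the damaged entry.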

[ and if there's any reproducibility of the corruption and if it happens 
  at a stable kernel address then a small custom hack in ftrace can 
  catch it the moment it happens. ]

	Ingo


* Re: linux-next: Tree for September 3
  2008-09-05 11:04                             ` Ingo Molnar
@ 2008-09-05 17:49                               ` Andrew Morton
  0 siblings, 0 replies; 15+ messages in thread
From: Andrew Morton @ 2008-09-05 17:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, torvalds, sfr, linux-next, linux-kernel,
	yhlu.kernel, ink, jbarnes, netdev, viro, ebiederm, dwmw2, sam,
	johnstul

On Fri, 5 Sep 2008 13:04:11 +0200 Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> > On Thu, 4 Sep 2008, Andrew Morton wrote:
> > > > 
> > > > Cute, NULL pointer in the timer check code. Can you please addr2line
> > > > the exact code line or upload the vmlinux somewhere ?
> > > > 
> > > 
> > > erm, I might have lost that binary, and it only happened the once.  It
> > > happened shortly after the machine had fully booted, during
> > > establishment of the first sshd session.
> > > 
> > > It nuked the machine really well, too.  I had to pull the battery to
> > > get it back.
> > 
> > Known problem on Sonys. :(
> > 
> > > fwiw:
> > >
> > > (gdb) l *0xc0126e7f
> > > 0xc0126e7f is in get_next_timer_interrupt (kernel/timer.c:863).
> > > warning: Source file is more recent than executable.
> > > 858             for (array = 0; array < 4; array++) {
> > > 859                     struct tvec *varp = varray[array];
> > > 860     
> > > 861                     index = slot = timer_jiffies & TVN_MASK;
> > > 862                     do {
> > > 863                             list_for_each_entry(nte, varp->vec + slot, entry) {
> > > 864                                     found = 1;
> > > 865                                     if (time_before(nte->expires, expires))
> > > 866                                             expires = nte->expires;
> > > 867                             }
> > > 
> > > which looks reasonable.
> > 
> > Yeah, as Linus decoded it's that loop. So we look at some corrupted
> > entry here. 
> > 
> > CONFIG_DEBUG_OBJECTS (add debug_objects to the command line as well)
> > should catch it when this is a timer being discarded, freed or
> > reinitialized.
> > 
> > > Otherwise, when it is just random corruption it won't help much.
> 
> i guess CONFIG_DEBUG_OBJECTS_TIMERS=y is practical, and 
> CONFIG_DEBUG_LIST=y would be nice as well - it can catch memory 
> corruptions rather early and is relatively light-weight.

I tested rc5-mm1 with all debug options except PAGEALLOC.  No help.

> [ and if there's any reproducibility of the corruption and if it happens 
>   at a stable kernel address then a small custom hack in ftrace can 
>   catch it the moment it happens. ]

It was a once-off.


end of thread, other threads:[~2008-09-05 17:49 UTC | newest]

Thread overview: 15+ messages
     [not found] <20080903191619.6b6b230e.sfr@canb.auug.org.au>
     [not found] ` <20080903214634.ea17ff53.akpm@linux-foundation.org>
     [not found]   ` <alpine.LFD.1.10.0809032201510.3378@nehalem.linux-foundation.org>
     [not found]     ` <20080903223318.84b6ce8b.akpm@linux-foundation.org>
     [not found]       ` <alpine.LFD.1.10.0809040045190.3378@nehalem.linux-foundation.org>
     [not found]         ` <20080904012544.cabed847.akpm@linux-foundation.org>
     [not found]           ` <alpine.LFD.1.10.0809040143350.3452@nehalem.linux-foundation.org>
     [not found]             ` <20080904015701.5959623a.akpm@linux-foundation.org>
     [not found]               ` <alpine.LFD.1.10.0809040203510.3452@nehalem.linux-foundation.org>
2008-09-04 17:45                 ` linux-next: Tree for September 3 Andrew Morton
2008-09-04 18:05                   ` Linus Torvalds
2008-09-04 18:34                     ` Andrew Morton
2008-09-04 20:31                       ` Eric W. Biederman
2008-09-04 20:41                         ` Andrew Morton
2008-09-04 21:03                           ` Eric W. Biederman
2008-09-04 22:22                             ` Andrew Morton
2008-09-04 22:45                       ` Thomas Gleixner
2008-09-04 23:17                         ` Linus Torvalds
2008-09-05  5:39                           ` Arjan van de Ven
2008-09-04 23:17                         ` Andrew Morton
2008-09-04 23:25                           ` Linus Torvalds
2008-09-04 23:27                           ` Thomas Gleixner
2008-09-05 11:04                             ` Ingo Molnar
2008-09-05 17:49                               ` Andrew Morton
