netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* networking oops after resume from s2ram (2.6.28-rc6)
@ 2008-11-28 21:15 Marcin Slusarz
  2008-11-30  3:36 ` Andrew Morton
  0 siblings, 1 reply; 3+ messages in thread
From: Marcin Slusarz @ 2008-11-28 21:15 UTC (permalink / raw)
  To: LKML; +Cc: netdev

Hi

Sometimes after resume from s2ram networking doesn't work, so I restart it by
/etc/init.d/net.eth1 restart. Recently it started to lock up my box completely,
but today it oopsed only (and killed my keyboard, so I had to save dmesg with
mouse :D).

It looks like it tries to use netconsole without working network interface...

[ 1621.013789] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[ 1621.013793] IP: [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
[ 1621.013802] PGD 16880067 PUD 16894067 PMD 0 
[ 1621.013806] Oops: 0000 [#1] PREEMPT 
[ 1621.013808] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
[ 1621.013812] Dumping ftrace buffer:
[ 1621.013814]    (ftrace buffer empty)
[ 1621.013816] CPU 0 
[ 1621.013819] Pid: 6725, comm: ip Not tainted 2.6.28-rc6-stack+mode #51
[ 1621.013821] RIP: 0010:[<ffffffff803f5db9>]  [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
[ 1621.013825] RSP: 0018:ffff8800168c38b8  EFLAGS: 00010006
[ 1621.013827] RAX: 000000000000007f RBX: ffff88003e9c7090 RCX: 0000000000000001
[ 1621.013829] RDX: 0000000000000001 RSI: ffff88003e8b2840 RDI: ffff88003e95d700
[ 1621.013832] RBP: ffff8800168c3918 R08: 0000000000000002 R09: 0000000000000000
[ 1621.013834] R10: 0000000000000006 R11: 0000000000000000 R12: ffff88003e95d700
[ 1621.013836] R13: ffff88003e9c7090 R14: 0000000000000000 R15: 0000000000000001
[ 1621.013838] FS:  00007f47e467f6f0(0000) GS:ffffffff80738180(0000) knlGS:0000000000000000
[ 1621.013841] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1621.013843] CR2: 0000000000000000 CR3: 000000001689a000 CR4: 00000000000006e0
[ 1621.013845] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1621.013847] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1621.013850] Process ip (pid: 6725, threadinfo ffff8800168c2000, task ffff88001688da30)
[ 1621.013851] Stack:
[ 1621.013853]  ffff880026921800 0000000080737c20 ffff88003e8b2000 ffff88003e8b2840
[ 1621.013856]  ffff88003e87cf00 ffff88003e8b2000 0000000000000000 ffff88003e9ec120
[ 1621.013860]  0000000000000082 ffff88003e8b2000 0000000000000000 0000000000000001
[ 1621.013864] Call Trace:
[ 1621.013866]  [<ffffffff804f5a28>] netpoll_send_skb+0xcd/0x196
[ 1621.013871]  [<ffffffff804f5fe7>] netpoll_send_udp+0x202/0x20e
[ 1621.013874]  [<ffffffff80400081>] write_msg+0x80/0xbf
[ 1621.013879]  [<ffffffff8023220d>] __call_console_drivers+0x58/0x69
[ 1621.013884]  [<ffffffff8023227f>] _call_console_drivers+0x61/0x66
[ 1621.013887]  [<ffffffff802323b5>] release_console_sem+0x131/0x1d4
[ 1621.013890]  [<ffffffff80232a08>] vprintk+0x389/0x3b8
[ 1621.013894]  [<ffffffff80232a9e>] printk+0x67/0x69
[ 1621.013897]  [<ffffffff804e7785>] ? dev_set_rx_mode+0x19/0x2e
[ 1621.013902]  [<ffffffff80252e95>] ? __lock_acquire+0x6dd/0x73b
[ 1621.013905]  [<ffffffff80251379>] ? mark_held_locks+0x52/0x72
[ 1621.013908]  [<ffffffff802361f5>] ? local_bh_enable_ip+0xba/0xd6
[ 1621.013912]  [<ffffffff803f2a07>] skge_up+0x7c/0x88a
[ 1621.013916]  [<ffffffff804e7795>] ? dev_set_rx_mode+0x29/0x2e
[ 1621.013920]  [<ffffffff802361f5>] ? local_bh_enable_ip+0xba/0xd6
[ 1621.013923]  [<ffffffff804eb050>] dev_open+0x72/0xa6
[ 1621.013926]  [<ffffffff804e9211>] dev_change_flags+0xa8/0x167
[ 1621.013929]  [<ffffffff80527683>] devinet_ioctl+0x26a/0x5df
[ 1621.013933]  [<ffffffff8052845a>] inet_ioctl+0x92/0xaa
[ 1621.013936]  [<ffffffff804de260>] sock_ioctl+0x18c/0x1b8
[ 1621.013939]  [<ffffffff8029ed28>] vfs_ioctl+0x2a/0x77
[ 1621.013943]  [<ffffffff8029f110>] do_vfs_ioctl+0x39b/0x3e5
[ 1621.013946]  [<ffffffff802479ad>] ? up_read+0x26/0x2b
[ 1621.013950]  [<ffffffff8020b36c>] ? sysret_check+0x27/0x62
[ 1621.013953]  [<ffffffff8029f19c>] sys_ioctl+0x42/0x65
[ 1621.013956]  [<ffffffff8020b33b>] system_call_fastpath+0x16/0x1b
[ 1621.013960] Code: 48 c1 f8 03 0f b7 52 04 69 c0 cd cc cc cc 8d 44 30 ff ff c2 39 d0 0f 8c 00 03 00 00 48 8b 75 b8 4c 8b ae 98 00 00 00 4d 8b 75 08 <41> 83 3e 00 79 04 0f 0b eb fe 4d 89 65 10 41 8b 44 24 68 45 31 
[ 1621.013988] RIP  [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
[ 1621.013991]  RSP <ffff8800168c38b8>
[ 1621.013993] CR2: 0000000000000000
[ 1621.013995] ---[ end trace 50a82573cfe89df3 ]---
[ 1621.013998] note: ip[6725] exited with preempt_count 3

Full demsg and config can be found at:
http://www.kadu.net/~joi/kernel/2008.11.28/skge-net.oops/

Marcin

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: networking oops after resume from s2ram (2.6.28-rc6)
  2008-11-28 21:15 networking oops after resume from s2ram (2.6.28-rc6) Marcin Slusarz
@ 2008-11-30  3:36 ` Andrew Morton
  2008-11-30 12:59   ` Marcin Slusarz
  0 siblings, 1 reply; 3+ messages in thread
From: Andrew Morton @ 2008-11-30  3:36 UTC (permalink / raw)
  To: Marcin Slusarz; +Cc: LKML, netdev

On Fri, 28 Nov 2008 22:15:40 +0100 Marcin Slusarz <marcin.slusarz@gmail.com> wrote:

> Hi
> 
> Sometimes after resume from s2ram networking doesn't work, so I restart it by
> /etc/init.d/net.eth1 restart. Recently it started to lock up my box completely,
> but today it oopsed only (and killed my keyboard, so I had to save dmesg with
> mouse :D).
> 
> It looks like it tries to use netconsole without working network interface...
> 
> [ 1621.013789] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> [ 1621.013793] IP: [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
> [ 1621.013802] PGD 16880067 PUD 16894067 PMD 0 
> [ 1621.013806] Oops: 0000 [#1] PREEMPT 
> [ 1621.013808] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> [ 1621.013812] Dumping ftrace buffer:
> [ 1621.013814]    (ftrace buffer empty)
> [ 1621.013816] CPU 0 
> [ 1621.013819] Pid: 6725, comm: ip Not tainted 2.6.28-rc6-stack+mode #51
> [ 1621.013821] RIP: 0010:[<ffffffff803f5db9>]  [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba

skge driver went splat, I guess.

It would be fun to try using this:


From: Arjan van de Ven <arjan@infradead.org>

We're struggling all the time to figure out where the code came from that
oopsed..  The script below (a adaption from a script used by
kerneloops.org) can help developers quite a bit, at least for non-module
cases.

It works and looks like this:

[/home/arjan/linux]$ dmesg | perl scripts/markup_oops.pl vmlinux
 {
 	struct agp_memory *memory;

 	memory = agp_allocate_memory(agp_bridge, pg_count, type);
 c055c10f:	89 c2                	mov    %eax,%edx
 	if (memory == NULL)
 c055c111:	74 19                	je     c055c12c <agp_allocate_memory_wrap+0x30>
 /* This function must only be called when current_controller != NULL */
 static void agp_insert_into_pool(struct agp_memory * temp)
 {
 	struct agp_memory *prev;

 	prev = agp_fe.current_controller->pool;
 c055c113:	a1 ec dc 8f c0       	mov    0xc08fdcec,%eax
*c055c118:	8b 40 10             	mov    0x10(%eax),%eax     <----- faulting instruction

 	if (prev != NULL) {
 c055c11b:	85 c0                	test   %eax,%eax
 c055c11d:	74 05                	je     c055c124 <agp_allocate_memory_wrap+0x28>
 		prev->prev = temp;
 c055c11f:	89 50 04             	mov    %edx,0x4(%eax)
 		temp->next = prev;
 c055c122:	89 02                	mov    %eax,(%edx)
 	}
 	agp_fe.current_controller->pool = temp;
 c055c124:	a1 ec dc 8f c0       	mov    0xc08fdcec,%eax
 c055c129:	89 50 10             	mov    %edx,0x10(%eax)
 	if (memory == NULL)
 		return NULL;

 	agp_insert_into_pool(memory);

so in this case, we faulted while dereferencing agp_fe.current_controller pointer, and we get to see
exactly which function and line it affects... Personally I find this very useful, and I can
see value for having this script in the kernel for more-than-just-me to use.

Caveats:
* It only works for oopses not-in-modules
* It only works nicely for kernels compiled with CONFIG_DEBUG_INFO
* It's not very fast.
* It only works on x86

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 scripts/markup_oops.pl |  162 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+)

diff -puN /dev/null scripts/markup_oops.pl
--- /dev/null
+++ a/scripts/markup_oops.pl
@@ -0,0 +1,162 @@
+#!/usr/bin/perl -w
+
+# Copyright 2008, Intel Corporation
+#
+# This file is part of the Linux kernel
+#
+# This program file is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License as published by the
+# Free Software Foundation; version 2 of the License.
+#
+# Authors:
+# 	Arjan van de Ven <arjan@linux.intel.com>
+
+
+my $vmlinux_name = $ARGV[0];
+
+#
+# Step 1: Parse the oops to find the EIP value
+#
+
+my $target = "0";
+while (<STDIN>) {
+	if ($_ =~ /EIP: 0060:\[\<([a-z0-9]+)\>\]/) {
+		$target = $1;
+	}
+}
+
+if ($target =~ /^f8/) {
+	print "This script does not work on modules ... \n";
+	exit;
+}
+
+if ($target eq "0") {
+	print "No oops found!\n";
+	print "Usage: \n";
+	print "    dmesg | perl scripts/markup_oops.pl vmlinux\n";
+	exit;
+}
+
+my $counter = 0;
+my $state   = 0;
+my $center  = 0;
+my @lines;
+
+sub InRange {
+	my ($address, $target) = @_;
+	my $ad = "0x".$address;
+	my $ta = "0x".$target;
+	my $delta = hex($ad) - hex($ta);
+
+	if (($delta > -4096) && ($delta < 4096)) {
+		return 1;
+	}
+	return 0;
+}
+
+
+
+# first, parse the input into the lines array, but to keep size down,
+# we only do this for 4Kb around the sweet spot
+
+my $filename;
+
+open(FILE, "objdump -dS $vmlinux_name |") || die "Cannot start objdump";
+
+while (<FILE>) {
+	my $line = $_;
+	chomp($line);
+	if ($state == 0) {
+		if ($line =~ /^([a-f0-9]+)\:/) {
+			if (InRange($1, $target)) {
+				$state = 1;
+			}
+		}
+	} else {
+		if ($line =~ /^([a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9][a-f0-9]+)\:/) {
+			my $val = $1;
+			if (!InRange($val, $target)) {
+				last;
+			}
+			if ($val eq $target) {
+				$center = $counter;
+			}
+		}
+		$lines[$counter] = $line;
+
+		$counter = $counter + 1;
+	}
+}
+
+close(FILE);
+
+if ($counter == 0) {
+	print "No matching code found \n";
+	exit;
+}
+
+if ($center == 0) {
+	print "No matching code found \n";
+	exit;
+}
+
+my $start;
+my $finish;
+my $codelines = 0;
+my $binarylines = 0;
+# now we go up and down in the array to find how much we want to print
+
+$start = $center;
+
+while ($start > 1) {
+	$start = $start - 1;
+	my $line = $lines[$start];
+	if ($line =~ /^([a-f0-9]+)\:/) {
+		$binarylines = $binarylines + 1;
+	} else {
+		$codelines = $codelines + 1;
+	}
+	if ($codelines > 10) {
+		last;
+	}
+	if ($binarylines > 20) {
+		last;
+	}
+}
+
+
+$finish = $center;
+$codelines = 0;
+$binarylines = 0;
+while ($finish < $counter) {
+	$finish = $finish + 1;
+	my $line = $lines[$finish];
+	if ($line =~ /^([a-f0-9]+)\:/) {
+		$binarylines = $binarylines + 1;
+	} else {
+		$codelines = $codelines + 1;
+	}
+	if ($codelines > 10) {
+		last;
+	}
+	if ($binarylines > 20) {
+		last;
+	}
+}
+
+
+my $i;
+
+my $fulltext = "";
+$i = $start;
+while ($i < $finish) {
+	if ($i == $center) {
+		$fulltext = $fulltext . "*$lines[$i]     <----- faulting instruction\n";
+	} else {
+		$fulltext = $fulltext .  " $lines[$i]\n";
+	}
+	$i = $i +1;
+}
+
+print $fulltext;
+
_


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: networking oops after resume from s2ram (2.6.28-rc6)
  2008-11-30  3:36 ` Andrew Morton
@ 2008-11-30 12:59   ` Marcin Slusarz
  0 siblings, 0 replies; 3+ messages in thread
From: Marcin Slusarz @ 2008-11-30 12:59 UTC (permalink / raw)
  To: Andrew Morton; +Cc: LKML, netdev, Arjan van de Ven

On Sat, Nov 29, 2008 at 07:36:56PM -0800, Andrew Morton wrote:
> On Fri, 28 Nov 2008 22:15:40 +0100 Marcin Slusarz <marcin.slusarz@gmail.com> wrote:
> 
> > Hi
> > 
> > Sometimes after resume from s2ram networking doesn't work, so I restart it by
> > /etc/init.d/net.eth1 restart. Recently it started to lock up my box completely,
> > but today it oopsed only (and killed my keyboard, so I had to save dmesg with
> > mouse :D).
> > 
> > It looks like it tries to use netconsole without working network interface...
> > 
> > [ 1621.013789] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
> > [ 1621.013793] IP: [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
> > [ 1621.013802] PGD 16880067 PUD 16894067 PMD 0 
> > [ 1621.013806] Oops: 0000 [#1] PREEMPT 
> > [ 1621.013808] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
> > [ 1621.013812] Dumping ftrace buffer:
> > [ 1621.013814]    (ftrace buffer empty)
> > [ 1621.013816] CPU 0 
> > [ 1621.013819] Pid: 6725, comm: ip Not tainted 2.6.28-rc6-stack+mode #51
> > [ 1621.013821] RIP: 0010:[<ffffffff803f5db9>]  [<ffffffff803f5db9>] skge_xmit_frame+0xb8/0x3ba
> 
> skge driver went splat, I guess.
> 
> It would be fun to try using this:

I had to "add support" for x86_64 oopses. Patch at the end of mail.
 
> 
> From: Arjan van de Ven <arjan@infradead.org>
> 
> We're struggling all the time to figure out where the code came from that
> oopsed..  The script below (a adaption from a script used by
> kerneloops.org) can help developers quite a bit, at least for non-module
> cases.
> 
> It works and looks like this:
> 
> [/home/arjan/linux]$ dmesg | perl scripts/markup_oops.pl vmlinux
> (...)
> diff -puN /dev/null scripts/markup_oops.pl
> --- /dev/null
> +++ a/scripts/markup_oops.pl
> (...)

ffffffff803f5db9
        u64 map;

        if (skb_padto(skb, ETH_ZLEN))
                return NETDEV_TX_OK;

        if (unlikely(skge_avail(&skge->tx_ring) < skb_shinfo(skb)->nr_frags + 1))
 ffffffff803f5d75:      41 8b 94 24 a8 00 00    mov    0xa8(%r12),%edx
 ffffffff803f5d7c:      00
 ffffffff803f5d7d:      48 2b 41 08             sub    0x8(%rcx),%rax
 ffffffff803f5d81:      49 03 94 24 b0 00 00    add    0xb0(%r12),%rdx
 ffffffff803f5d88:      00
 ffffffff803f5d89:      b9 01 00 00 00          mov    $0x1,%ecx
 ffffffff803f5d8e:      48 c1 f8 03             sar    $0x3,%rax
 ffffffff803f5d92:      0f b7 52 04             movzwl 0x4(%rdx),%edx
 ffffffff803f5d96:      69 c0 cd cc cc cc       imul   $0xcccccccd,%eax,%eax
 ffffffff803f5d9c:      8d 44 30 ff             lea    -0x1(%rax,%rsi,1),%eax
 ffffffff803f5da0:      ff c2                   inc    %edx
 ffffffff803f5da2:      39 d0                   cmp    %edx,%eax
 ffffffff803f5da4:      0f 8c 00 03 00 00       jl     ffffffff803f60aa <skge_xmit_frame+0x3a9>
                return NETDEV_TX_BUSY;

        e = skge->tx_ring.to_use;
 ffffffff803f5daa:      48 8b 75 b8             mov    -0x48(%rbp),%rsi
 ffffffff803f5dae:      4c 8b ae 98 00 00 00    mov    0x98(%rsi),%r13
        td = e->desc;
 ffffffff803f5db5:      4d 8b 75 08             mov    0x8(%r13),%r14
        BUG_ON(td->control & BMU_OWN);
*ffffffff803f5db9:      41 83 3e 00             cmpl   $0x0,(%r14)     <----- faulting instruction
 ffffffff803f5dbd:      79 04                   jns    ffffffff803f5dc3 <skge_xmit_frame+0xc2>
 ffffffff803f5dbf:      0f 0b                   ud2a
 ffffffff803f5dc1:      eb fe                   jmp    ffffffff803f5dc1 <skge_xmit_frame+0xc0>
        e->skb = skb;
 ffffffff803f5dc3:      4d 89 65 10             mov    %r12,0x10(%r13)
        return skb->data_len;
 }

 static inline unsigned int skb_headlen(const struct sk_buff *skb)
 {
        return skb->len - skb->data_len;
 ffffffff803f5dc7:      41 8b 44 24 68          mov    0x68(%r12),%eax
 }

 static inline dma_addr_t


---
From: Marcin Slusarz <marcin.slusarz@gmail.com>
Subject: [PATCH] markup_oops.pl: "add support" for x86_64

Find instruction pointer in x86_64 oopses.

-w removed because it spammed with:
"Hexadecimal number > 0xffffffff non-portable at scripts/markup_oops.pl line 52, <FILE> line 383."

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
---
 scripts/markup_oops.pl |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/scripts/markup_oops.pl b/scripts/markup_oops.pl
index 700a7a6..7038679 100644
--- a/scripts/markup_oops.pl
+++ b/scripts/markup_oops.pl
@@ -1,4 +1,4 @@
-#!/usr/bin/perl -w
+#!/usr/bin/perl
 
 # Copyright 2008, Intel Corporation
 #
@@ -23,6 +23,9 @@ while (<STDIN>) {
 	if ($_ =~ /EIP: 0060:\[\<([a-z0-9]+)\>\]/) {
 		$target = $1;
 	}
+	if ($_ =~ /RIP: 0010:\[\<([a-z0-9]+)\>\]/) {
+		$target = $1;
+	}
 }
 
 if ($target =~ /^f8/) {
-- 


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2008-11-30 12:59 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-28 21:15 networking oops after resume from s2ram (2.6.28-rc6) Marcin Slusarz
2008-11-30  3:36 ` Andrew Morton
2008-11-30 12:59   ` Marcin Slusarz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).