2.6.11 on AMD64 traps

public inbox for linux-kernel@vger.kernel.org

* 2.6.11 on AMD64 traps
@ 2005-03-08 18:00 Michal Vanco
  2005-03-08 18:35 ` Andre Tomt
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Vanco @ 2005-03-08 18:00 UTC
  To: linux-kernel

Hello,

I see this problem running 2.6.11 on dual AMD64:

Running the quagga routing daemon (OSPF + BGP) and issuing "netstat -rn | wc -l"
while quagga is loading more than 154000 routes from its BGP neighbours causes this trap:

Unable to handle kernel paging request at 00000000007f5c60 RIP:
<ffffffff8041be35>{fib_get_next+181}
PGD 3a112067 PUD 3a115067 PMD 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in:
Pid: 2537, comm: netstat Not tainted 2.6.11-mv
RIP: 0010:[<ffffffff8041be35>] <ffffffff8041be35>{fib_get_next+181}
RSP: 0018:ffff81003a13fe90  EFLAGS: 00010206
RAX: ffff81003a74c000 RBX: 0000000000000000 RCX: ffff81003a13ff50
RDX: 00000000007f5c60 RSI: 0000000000000000 RDI: ffff81003a004d00
RBP: ffff81003a13fed8 R08: ffff81003f3ff7c0 R09: 0000000000000800
R10: 00007fffffffefe0 R11: 0000000000000246 R12: ffff810002231480
R13: 00002aaaaab08000 R14: 0000000000000400 R15: ffff8100022314a8
FS:  00002aaaaae00620(0000) GS:ffffffff806195c0(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000000007f5c60 CR3: 000000003a12e000 CR4: 00000000000006e0
Process netstat (pid: 2537, threadinfo ffff81003a13e000, task ffff81003a66a760)
Stack: ffffffff8041bf0f ffff810002231480 ffff81003a67ac80 0000000000000000
ffffffff8019576b 0000000000000000 ffff81003a13ff50 00002aaaaab08000
00000000000006f7 00000000000006f8
Call Trace:<ffffffff8041bf0f>{fib_seq_start+63} <ffffffff8019576b>{seq_read+219}
<ffffffff8017497f>{vfs_read+191} <ffffffff80174c53>{sys_read+83}
<ffffffff8010d1ba>{system_call+126}

Code: 48 8b 0a 0f 18 09 48 8b 72 10 48 8b 06 0f 18 08 48 8d 42 10
RIP <ffffffff8041be35>{fib_get_next+181} RSP <ffff81003a13fe90>
CR2: 00000000007f5c60

I saw the same issue on 2.6.10 before. I'm not a kernel hacker, but it looks like
a locking problem. I may be totally wrong about this, though.

michal


* Re: 2.6.11 on AMD64 traps
  2005-03-08 18:00 2.6.11 on AMD64 traps Michal Vanco
@ 2005-03-08 18:35 ` Andre Tomt
  2005-03-09 19:45   ` Patrick McHardy
  0 siblings, 1 reply; 6+ messages in thread
From: Andre Tomt @ 2005-03-08 18:35 UTC
  To: Michal Vanco; +Cc: linux-kernel, Netdev

[just adding netdev to CC, from LKML]

Michal Vanco wrote:
> Hello,
> 
> I see this problem running 2.6.11 on dual AMD64:
> 
> Running the quagga routing daemon (OSPF + BGP) and issuing "netstat -rn | wc -l"
> while quagga is loading more than 154000 routes from its BGP neighbours causes this trap:
> 
> Unable to handle kernel paging request at 00000000007f5c60 RIP:
> <ffffffff8041be35>{fib_get_next+181}
> PGD 3a112067 PUD 3a115067 PMD 0
> Oops: 0000 [1] SMP
> CPU 1
> Modules linked in:
> Pid: 2537, comm: netstat Not tainted 2.6.11-mv
> RIP: 0010:[<ffffffff8041be35>] <ffffffff8041be35>{fib_get_next+181}
> RSP: 0018:ffff81003a13fe90  EFLAGS: 00010206
> RAX: ffff81003a74c000 RBX: 0000000000000000 RCX: ffff81003a13ff50
> RDX: 00000000007f5c60 RSI: 0000000000000000 RDI: ffff81003a004d00
> RBP: ffff81003a13fed8 R08: ffff81003f3ff7c0 R09: 0000000000000800
> R10: 00007fffffffefe0 R11: 0000000000000246 R12: ffff810002231480
> R13: 00002aaaaab08000 R14: 0000000000000400 R15: ffff8100022314a8
> FS:  00002aaaaae00620(0000) GS:ffffffff806195c0(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00000000007f5c60 CR3: 000000003a12e000 CR4: 00000000000006e0
> Process netstat (pid: 2537, threadinfo ffff81003a13e000, task ffff81003a66a760)
> Stack: ffffffff8041bf0f ffff810002231480 ffff81003a67ac80 0000000000000000
> ffffffff8019576b 0000000000000000 ffff81003a13ff50 00002aaaaab08000
> 00000000000006f7 00000000000006f8
> Call Trace:<ffffffff8041bf0f>{fib_seq_start+63} <ffffffff8019576b>{seq_read+219}
> <ffffffff8017497f>{vfs_read+191} <ffffffff80174c53>{sys_read+83}
> <ffffffff8010d1ba>{system_call+126}
> 
> Code: 48 8b 0a 0f 18 09 48 8b 72 10 48 8b 06 0f 18 08 48 8d 42 10
> RIP <ffffffff8041be35>{fib_get_next+181} RSP <ffff81003a13fe90>
> CR2: 00000000007f5c60
> 
> I saw the same issue on 2.6.10 before. I'm not a kernel hacker, but it looks like
> a locking problem. I may be totally wrong about this, though.
> 
> michal



* Re: 2.6.11 on AMD64 traps
  2005-03-08 18:35 ` Andre Tomt
@ 2005-03-09 19:45   ` Patrick McHardy
  2005-03-09 20:24     ` Michal Vanco
  2005-03-11  2:20     ` David S. Miller
  0 siblings, 2 replies; 6+ messages in thread
From: Patrick McHardy @ 2005-03-09 19:45 UTC
  To: Andre Tomt; +Cc: Michal Vanco, linux-kernel, Netdev, David S. Miller

> Michal Vanco wrote:
>>
>> I see this problem running 2.6.11 on dual AMD64:
>>
>> Running quagga routing daemon (ospf+bgp) and issuing "netstat -rn |wc 
>> -l" command
>> while quagga tries to load more than 154000 routes from its bgp 
>> neighbours causes this trap:

This patch should fix it. The crash is caused by stale pointers: the
pointers in fib_iter_state are not reloaded after seq->stop() followed
by seq->start(pos > 0).
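
The failure mode and the fix can be sketched in userspace C. This is a
minimal sketch under stated assumptions, not the kernel code: struct node,
struct iter_state, get_first, get_next and get_idx are hypothetical
stand-ins for struct fib_alias, struct fib_iter_state, fib_get_first,
fib_get_next and the fib_get_idx helper added by the patch, and a single
linked list replaces the FIB hash tables. It illustrates that a cursor
cached across an unlock may point at replaced data, so the restart path
rebuilds it by walking from the head.

```c
#include <stddef.h>

/* Hypothetical stand-in for one routing entry (the kernel walks
 * struct fib_alias objects). */
struct node {
	int value;
	struct node *next;
};

/* Hypothetical stand-in for the iterator state kept between reads. */
struct iter_state {
	struct node *head;	/* current list head */
	struct node *cur;	/* cached cursor; may be stale after an unlock */
};

/* Reset the cursor to the first entry of the live list. */
static struct node *get_first(struct iter_state *st)
{
	st->cur = st->head;
	return st->cur;
}

/* Advance the cursor by one entry; only safe if the cursor is fresh. */
static struct node *get_next(struct iter_state *st)
{
	if (st->cur)
		st->cur = st->cur->next;
	return st->cur;
}

/* Same shape as fib_get_idx() in the patch: rebuild the cursor by walking
 * from the head to index 'pos' instead of trusting the cached pointer.
 * Returns NULL if the list has fewer than pos + 1 entries. */
static struct node *get_idx(struct iter_state *st, long pos)
{
	struct node *n = get_first(st);

	if (n)
		while (pos && (n = get_next(st)))
			--pos;
	return pos ? NULL : n;
}
```

Calling get_idx(&st, pos) at the start of each read never dereferences the
old st.cur, so a table change between two reads cannot leave it dangling.
The trade-off is that every restart walks pos entries again.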


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1158 bytes --]

# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2005/03/09 20:41:46+01:00 kaber@coreworks.de 
#   [IPV4]: Fix crash while reading /proc/net/route caused by stale pointers
#   
#   Signed-off-by: Patrick McHardy <kaber@trash.net>
# 
# net/ipv4/fib_hash.c
#   2005/03/09 20:41:37+01:00 kaber@coreworks.de +11 -1
#   [IPV4]: Fix crash while reading /proc/net/route caused by stale pointers
#   
#   Signed-off-by: Patrick McHardy <kaber@trash.net>
# 
diff -Nru a/net/ipv4/fib_hash.c b/net/ipv4/fib_hash.c
--- a/net/ipv4/fib_hash.c	2005-03-09 20:43:55 +01:00
+++ b/net/ipv4/fib_hash.c	2005-03-09 20:43:55 +01:00
@@ -919,13 +919,23 @@
 	return fa;
 }
 
+static struct fib_alias *fib_get_idx(struct seq_file *seq, loff_t pos)
+{
+	struct fib_alias *fa = fib_get_first(seq);
+
+	if (fa)
+		while (pos && (fa = fib_get_next(seq)))
+			--pos;
+	return pos ? NULL : fa;
+}
+
 static void *fib_seq_start(struct seq_file *seq, loff_t *pos)
 {
 	void *v = NULL;
 
 	read_lock(&fib_hash_lock);
 	if (ip_fib_main_table)
-		v = *pos ? fib_get_next(seq) : SEQ_START_TOKEN;
+		v = *pos ? fib_get_idx(seq, *pos - 1) : SEQ_START_TOKEN;
 	return v;
 }
 


* Re: 2.6.11 on AMD64 traps
  2005-03-09 19:45   ` Patrick McHardy
@ 2005-03-09 20:24     ` Michal Vanco
  2005-03-09 20:34       ` Patrick McHardy
  2005-03-11  2:20     ` David S. Miller
  1 sibling, 1 reply; 6+ messages in thread
From: Michal Vanco @ 2005-03-09 20:24 UTC
  To: netdev; +Cc: Patrick McHardy, Andre Tomt, linux-kernel, David S. Miller

On Wednesday 09 March 2005 20:45, Patrick McHardy wrote:
> > Michal Vanco wrote:
> >> I see this problem running 2.6.11 on dual AMD64:
> >>
> >> Running quagga routing daemon (ospf+bgp) and issuing "netstat -rn |wc
> >> -l" command
> >> while quagga tries to load more than 154000 routes from its bgp
> >> neighbours causes this trap:
>
> This patch should fix it. The crash is caused by stale pointers,
> the pointers in fib_iter_state are not reloaded after seq->stop()
> followed by seq->start(pos > 0).

Well, the trap vanished after applying this patch, but another odd thing occurs:

# ip route show | wc -l
156033
# date; time ip route show > /dev/null; date; time netstat -rn > /dev/null
Wed Mar  9 22:15:21 CET 2005

real    0m0.656s
user    0m0.415s
sys     0m0.242s
Wed Mar  9 22:15:22 CET 2005

real    6m41.472s
user    0m1.261s
sys     6m40.143s

regards,
-- 
Ing. Michal Vančo
Network Engineer
SATRO s.r.o.
e-mail: vanco@satro.sk


* Re: 2.6.11 on AMD64 traps
  2005-03-09 20:24     ` Michal Vanco
@ 2005-03-09 20:34       ` Patrick McHardy
  0 siblings, 0 replies; 6+ messages in thread
From: Patrick McHardy @ 2005-03-09 20:34 UTC
  To: Michal Vanco; +Cc: netdev, Andre Tomt, linux-kernel, David S. Miller

Michal Vanco wrote:
> On Wednesday 09 March 2005 20:45, Patrick McHardy wrote:
>>
>>This patch should fix it. The crash is caused by stale pointers,
>>the pointers in fib_iter_state are not reloaded after seq->stop()
>>followed by seq->start(pos > 0).
> 
> Well, the trap vanished after applying this patch, but another odd thing occurs:
> 
> # ip route show | wc -l
> 156033
> # date; time ip route show > /dev/null; date; time netstat -rn > /dev/null
> Wed Mar  9 22:15:21 CET 2005
> 
> real    0m0.656s
> user    0m0.415s
> sys     0m0.242s
> Wed Mar  9 22:15:22 CET 2005
> 
> real    6m41.472s
> user    0m1.261s
> sys     6m40.143s

Yes, I know it is totally inefficient. Just use ip route, which doesn't
suffer from this problem.
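
A rough cost model suggests why the system time grows so sharply. It is
only a sketch and rests on assumptions that are not measurements from this
thread: each read() is taken to return a fixed number of entries, chunk,
and each restart is taken to walk from the first entry to the current
position, as the re-seek added by the patch does. rewalk_steps is a
hypothetical helper name.

```c
#include <assert.h>

/* Count the list steps needed to read 'n' entries when every read returns
 * up to 'chunk' entries and each read first re-walks from the head to its
 * starting position. */
static unsigned long long rewalk_steps(unsigned long long n,
				       unsigned long long chunk)
{
	unsigned long long steps = 0, pos;

	assert(chunk > 0);
	for (pos = 0; pos < n; pos += chunk) {
		unsigned long long got = (n - pos < chunk) ? n - pos : chunk;

		steps += pos;	/* re-seek from the head to 'pos' */
		steps += got;	/* then emit this read's entries */
	}
	return steps;
}
```

Under this model the total is about n * n / (2 * chunk) steps, so doubling
the table size roughly quadruples the work, whereas a traversal that
visits each entry once costs about n steps.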

Regards
Patrick


* Re: 2.6.11 on AMD64 traps
  2005-03-09 19:45   ` Patrick McHardy
  2005-03-09 20:24     ` Michal Vanco
@ 2005-03-11  2:20     ` David S. Miller
  1 sibling, 0 replies; 6+ messages in thread
From: David S. Miller @ 2005-03-11  2:20 UTC
  To: Patrick McHardy; +Cc: andre, vanco, linux-kernel, netdev

On Wed, 09 Mar 2005 20:45:35 +0100
Patrick McHardy <kaber@trash.net> wrote:

> > Michal Vanco wrote:
> >>
> >> I see this problem running 2.6.11 on dual AMD64:
> >>
> >> Running quagga routing daemon (ospf+bgp) and issuing "netstat -rn |wc 
> >> -l" command
> >> while quagga tries to load more than 154000 routes from its bgp 
> >> neighbours causes this trap:
> 
> This patch should fix it. The crash is caused by stale pointers,
> the pointers in fib_iter_state are not reloaded after seq->stop()
> followed by seq->start(pos > 0).

Applied, thanks Patrick.
