* [LARTC] 2.4.20 htb3 oops
@ 2003-01-24 12:54 Mihai RUSU
  2003-01-24 14:29 ` Catalin Bucur
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Mihai RUSU @ 2003-01-24 12:54 UTC (permalink / raw)
  To: lartc

Hello

After 4 days of uptime, the new 2.4.20 kernel with HTB3 (previously we used
HTB2 and 2.4.9-31) oopsed, and shortly afterwards I could not even ssh to the
system. Here is the ksymoops-filtered message:

ksymoops 2.4.8 on i686 2.4.20-xfs.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.20-xfs/ (default)
     -m /usr/src/linux/System.map (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Oops: 0000
CPU:    1
EIP:    0010:[<c021be25>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 000000b0   ebx: 00000000   ecx: c031ad88   edx: 000000b0
esi: edb63de0   edi: f5220880   ebp: f5220880   esp: edb63d98
ds: 0018   es: 0018   ss: 0018
Process tc (pid: 7596, stackpage=edb63000)
Stack: f5220800 00001831 ede963a0 ede963a0 f5220880 00000003 f5220860
c0214115
       f5220800 edb63de0 f69b3960 f2bc0140 ede963a0 f2bc0140 ede963b8
f28f95b0
       f757e800 00001831 00000000 000002a5 000002a6 c0214004 f69b3960
ede963a0
Call Trace:    [<c0214115>] [<c0214004>] [<c0221db7>] [<c0208bd9>]
[<c0221b9b>]
  [<c0221bc2>] [<c0203c91>] [<c020510f>] [<c0210010>] [<c0228186>]
[<c02281f0>]
  [<c0226ffa>] [<c0114515>] [<c02053ed>] [<c0106f17>]
Code: 8b 03 0f 18 00 39 fb 75 c2 83 44 24 10 08 83 c5 08 ff 44 24


>>EIP; c021be25 <htb_walk+89/ad>   <==
>>ecx; c031ad88 <irq_stat+28/400>
>>esi; edb63de0 <_end+2d81f388/384c35a8>
>>edi; f5220880 <_end+34edbe28/384c35a8>
>>ebp; f5220880 <_end+34edbe28/384c35a8>
>>esp; edb63d98 <_end+2d81f340/384c35a8>

Trace; c0214115 <tc_dump_tclass+dd/130>
Trace; c0214004 <qdisc_class_dump+0/34>
Trace; c0221db7 <netlink_dump+7f/1d4>
Trace; c0208bd9 <skb_free_datagram+1d/24>
Trace; c0221b9b <netlink_recvmsg+b7/134>
Trace; c0221bc2 <netlink_recvmsg+de/134>
Trace; c0203c91 <sock_recvmsg+3d/ac>
Trace; c020510f <sys_recvmsg+15b/204>
Trace; c0210010 <.text.lock.neighbour+24a/34a>
Trace; c0228186 <ip_forward+1a6/210>
Trace; c02281f0 <ip_forward_finish+0/60>
Trace; c0226ffa <ip_rcv+31a/3b0>
Trace; c0114515 <schedule+49d/560>
Trace; c02053ed <sys_socketcall+1f5/200>
Trace; c0106f17 <system_call+33/38>

Code;  c021be25 <htb_walk+89/ad>
00000000 <_EIP>:
Code;  c021be25 <htb_walk+89/ad>   <==
   0:   8b 03                     mov    (%ebx),%eax   <==
Code;  c021be27 <htb_walk+8b/ad>
   2:   0f 18 00                  prefetchnta (%eax)
Code;  c021be2a <htb_walk+8e/ad>
   5:   39 fb                     cmp    %edi,%ebx
Code;  c021be2c <htb_walk+90/ad>
   7:   75 c2                     jne    ffffffcb <_EIP+0xffffffcb>
c021bdf0 <htb_walk+54/ad>
Code;  c021be2e <htb_walk+92/ad>
   9:   83 44 24 10 08            addl   $0x8,0x10(%esp,1)
Code;  c021be33 <htb_walk+97/ad>
   e:   83 c5 08                  add    $0x8,%ebp
Code;  c021be36 <htb_walk+9a/ad>
  11:   ff 44 24 00               incl   0x0(%esp,1)


1 warning issued.  Results may not be reliable.

$ /usr/src/linux/scripts/ver_linux

Linux htb 2.4.20-xfs #3 SMP Fri Jan 17 17:14:53 EET 2003 i686 unknown

Gnu C                  2.95.3
Gnu make               3.79.1
util-linux             2.11r
mount                  2.11r
modutils               2.4.16
e2fsprogs              1.27
Linux C Library        2.2.5
Dynamic linker (ldd)   2.2.5
Procps                 2.0.7
Net-tools              1.60
Kbd                    1.06
Sh-utils               2.0
Modules Loaded         sch_sfq e1000

$ cat /etc/fstab
/dev/sda1        swap             swap        defaults         0   0
/dev/sda2        /                ext2        defaults         1   1
/dev/sda3        /var             xfs         defaults         0   0
/dev/cdrom       /mnt/cdrom       iso9660     noauto,owner,ro  0   0
/dev/fd0         /mnt/floppy      auto        noauto,owner     0   0
none             /dev/pts         devpts      gid=5,mode=620   0   0
none             /proc            proc        defaults         0   0

After the oops, applications that were trying to write to the /var (xfs)
partition hung in D state. We will reformat it and reboot with a
vanilla 2.4.20 that has the same config minus the XFS filesystem.

Help ?

----------------------------
Mihai RUSU

Disclaimer: Any views or opinions presented within this e-mail are solely
those of the author and do not necessarily represent those of any company,
unless otherwise specifically stated.

_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


* Re: [LARTC] 2.4.20 htb3 oops
  2003-01-24 12:54 [LARTC] 2.4.20 htb3 oops Mihai RUSU
@ 2003-01-24 14:29 ` Catalin Bucur
  2003-01-25 21:51 ` Alexey Sheshka
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Catalin Bucur @ 2003-01-24 14:29 UTC (permalink / raw)
  To: lartc

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Mihai RUSU wrote:
| After the oops applications that were trying to write to the /var (xfs)
| partition were hanging in D state. We will reformat it and reboot with a
| vanilla 2.4.20 having the same config except the XFS filesystem.

I don't know if it's related to xfs. I had the same oops with kernel
2.4.19 with the HTB patch applied when I was restarting HTB 3 or 4
times a day. I was very happy when 2.4.20 appeared with HTB
included, because all these errors were gone. Now I can restart HTB as
many times as I want and there are no problems. I am curious too about this
behavior of the kernel with HTB enabled.

- --
Catalin Bucur      mailto:cata@geniusnet.ro
NOC @ Genius Network SRL - Galati - Romania

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE+MU3PpDe20wwI9oIRApQwAJ9V+IuZYAjfeDtNYp5CNhK20PdTTwCfV/fi
FObrQbZpI6lK/seMfqESJsA=QTuE
-----END PGP SIGNATURE-----


* Re: [LARTC] 2.4.20 htb3 oops
  2003-01-24 12:54 [LARTC] 2.4.20 htb3 oops Mihai RUSU
  2003-01-24 14:29 ` Catalin Bucur
@ 2003-01-25 21:51 ` Alexey Sheshka
  2003-03-03 13:22 ` Göran Runfeldt
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Alexey Sheshka @ 2003-01-25 21:51 UTC (permalink / raw)
  To: lartc

On Fri, 24 Jan 2003 16:29:35 +0200
Catalin Bucur <cata@geniusnet.ro> wrote:


> I don't know if it's related to xfs. I had the same oops with kernel
> 2.4.19 with the HTB patch applied when I was restarting HTB 3 or 4
> times a day. I was very happy when 2.4.20 appeared with HTB
> included, because all these errors were gone. Now I can restart HTB as
> many times as I want and there are no problems. I am curious too about this
> behavior of the kernel with HTB enabled.
> 

On my SMP system (2x P3) I also had oopses (2.4.19 and 2.4.20), but on single-processor systems everything is OK.

* Re: [LARTC] 2.4.20 htb3 oops
  2003-01-24 12:54 [LARTC] 2.4.20 htb3 oops Mihai RUSU
  2003-01-24 14:29 ` Catalin Bucur
  2003-01-25 21:51 ` Alexey Sheshka
@ 2003-03-03 13:22 ` Göran Runfeldt
  2003-03-03 13:39 ` Abraham van der Merwe
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Göran Runfeldt @ 2003-03-03 13:22 UTC (permalink / raw)
  To: lartc

Hi everyone,

I have been having problems with "oopses" since I introduced HTB on my
company's PC-based routers. It seems that only routers with high
network load are affected. The average network loads on the two most
problematic routers are 10Mbps in/out and 2.5Mbps in/out.
The other machines, with less than 1Mbps average traffic, seem unaffected.

We have been getting oopses on these machines 1-3 times per week.

We have tried replacing the hardware on both machines without any
improvement. We are using the same combination of hardware and kernel in
the same physical location without any problems, so we assume that hardware,
kernel or heat is not the problem here.
Machines with high network load that do not have any HTB rules loaded
do not suffer from this problem.

Hardware info:
  Router 1 (10Mbps avg in/out):
    1 x Intel(R) Celeron(R) CPU 1.80GHz
    256MB RAM
    eth0: Intel Corp. 82801BD PRO/100 VE (CNR)
    eth1: RealTek RTL8139

  Router 2 (2.5Mbps avg in/out):
    1 x Intel(R) Celeron(R) CPU 1.70GHz
    128MB RAM
    eth0: RealTek RTL8139
    eth1: RealTek RTL8139

Both use Linux kernel 2.4.20 with patches for FreeS/WAN and connection
tracking of GRE/PPTP connections. They are both single-processor machines.
They both shape traffic from and to a VLAN interface. The kernel is compiled
for CPU type "Pentium-III/Celeron" but the machines are running on
Pentium-IV/Celeron processors, if that matters. Router 1 was using a P3 CPU
before we replaced the hardware, and we had the same problem back then.

Unfortunately I have not been able to gather any output from the consoles of
the crashed machines.

Here is the ruleset script:
#!/bin/sh
for DEV in eth0.123 eth1
do
        tc qdisc del dev $DEV root
        tc qdisc add dev $DEV root handle 1: htb
        # Total
        tc class add dev $DEV parent 1:0 classid 1:1 htb rate 12Mbit
        # Default class
        tc class add dev $DEV parent 1:1 classid 1:2 htb rate 11Mbit
        # Filesharing traffic
        tc class add dev $DEV parent 1:1 classid 1:3 htb rate 512Kbit
        # ICMP (Highest priority - on customer's request, not ours)
        tc class add dev $DEV parent 1:1 classid 1:4 htb rate 512Kbit \
prio 0
        tc qdisc add dev $DEV parent 1:2 handle 2: sfq
        tc qdisc add dev $DEV parent 1:3 handle 3: sfq
        tc qdisc add dev $DEV parent 1:4 handle 4: sfq
        for PORT in 411 412 413 4661 4662 8081 19114 6340 6341 6342 \
6343 6344 6345 6346 6347 6348 6349 1214 1215 6699 6257 7668
        do
                # Send to "crap-class"
                tc filter add dev $DEV protocol ip parent 1: prio 1 u32 \
match ip sport $PORT 0xffff flowid 1:3
                tc filter add dev $DEV protocol ip parent 1: prio 1 u32 \
match ip dport $PORT 0xffff flowid 1:3
        done
        tc filter add dev $DEV protocol ip parent 1: prio 1 u32 match ip \
protocol 1 0xff flowid 1:4 # ICMP
        tc filter add dev $DEV protocol ip parent 1: prio 2 u32 match ip \
protocol 0 0x00 flowid 1:2 # Everything else
done
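
To see which commands the loops above expand to without touching a live
router, a dry run can help. What follows is only a sketch: it shadows `tc`
with a shell function that prints its arguments instead of calling the real
binary. The `ruleset` wrapper function is an illustrative name and is not
part of the script above; the tc arguments themselves are copied from it.

```sh
#!/bin/sh
# Dry-run sketch: this function shadows the real tc and only prints the
# command line. Nothing here talks to the kernel.
tc() {
    echo "tc $*"
}

# Same ruleset as above, wrapped in a function so it can be called with
# any list of devices. The name "ruleset" is illustrative.
ruleset() {
    for DEV in "$@"
    do
        tc qdisc del dev "$DEV" root
        tc qdisc add dev "$DEV" root handle 1: htb
        # Total, default, filesharing and ICMP classes
        tc class add dev "$DEV" parent 1:0 classid 1:1 htb rate 12Mbit
        tc class add dev "$DEV" parent 1:1 classid 1:2 htb rate 11Mbit
        tc class add dev "$DEV" parent 1:1 classid 1:3 htb rate 512Kbit
        tc class add dev "$DEV" parent 1:1 classid 1:4 htb rate 512Kbit prio 0
        tc qdisc add dev "$DEV" parent 1:2 handle 2: sfq
        tc qdisc add dev "$DEV" parent 1:3 handle 3: sfq
        tc qdisc add dev "$DEV" parent 1:4 handle 4: sfq
        for PORT in 411 412 413 4661 4662 8081 19114 6340 6341 6342 \
            6343 6344 6345 6346 6347 6348 6349 1214 1215 6699 6257 7668
        do
            # Two filters per port: one on source, one on destination
            tc filter add dev "$DEV" protocol ip parent 1: prio 1 u32 \
                match ip sport "$PORT" 0xffff flowid 1:3
            tc filter add dev "$DEV" protocol ip parent 1: prio 1 u32 \
                match ip dport "$PORT" 0xffff flowid 1:3
        done
        tc filter add dev "$DEV" protocol ip parent 1: prio 1 u32 \
            match ip protocol 1 0xff flowid 1:4
        tc filter add dev "$DEV" protocol ip parent 1: prio 2 u32 \
            match ip protocol 0 0x00 flowid 1:2
    done
}

ruleset eth0.123 eth1
```

Removing the `tc` function definition turns the sketch back into a script
that issues the real commands.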

I have not tried to apply the HTB patches from the latest prepatch
version of the Linux kernel or the "htb_3.7_delay_bug" patch
(I think they do the same thing?). Maybe I should try that?

I can get more information (like kernel config etc.) if anyone needs it,
but this thing is really hard to debug since it only happens sporadically.

Thanks,
Göran

>
> On my SMP system (2x P3) I also had oopses (2.4.19 and 2.4.20), but
> on single-processor systems everything is OK.
>

* Re: [LARTC] 2.4.20 htb3 oops
  2003-01-24 12:54 [LARTC] 2.4.20 htb3 oops Mihai RUSU
                   ` (2 preceding siblings ...)
  2003-03-03 13:22 ` Göran Runfeldt
@ 2003-03-03 13:39 ` Abraham van der Merwe
  2003-03-03 14:53 ` Göran Runfeldt
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Abraham van der Merwe @ 2003-03-03 13:39 UTC (permalink / raw)
  To: lartc

Hi Göran!

Oopses or kernel panics? Could you please post the oops dumps (with decoded
symbols of course).

-- 

Regards
 Abraham

Genius may have its limitations, but stupidity is not thus handicapped.
		-- Elbert Hubbard

___________________________________________________
 Abraham vd Merwe [ZR1BBQ] - Frogfoot Networks
 P.O. Box 3472, Matieland, Stellenbosch, 7602
 Cell: +27 82 565 4451 Http: http://www.frogfoot.net/
 Email: abz@frogfoot.net



* Re: [LARTC] 2.4.20 htb3 oops
  2003-01-24 12:54 [LARTC] 2.4.20 htb3 oops Mihai RUSU
                   ` (3 preceding siblings ...)
  2003-03-03 13:39 ` Abraham van der Merwe
@ 2003-03-03 14:53 ` Göran Runfeldt
  2003-03-03 16:29 ` Abraham van der Merwe
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Göran Runfeldt @ 2003-03-03 14:53 UTC (permalink / raw)
  To: lartc

Hi Abraham,

I'm sorry for mixing up the terms. I thought "oops" and "kernel panic"
were the same thing. This is the text that our technician wrote down
from the screen after the first crash:

"...unable to handling kernel null pointer dereference at virtual address 00000
Kernel panic: aiee killing interrupt handling - in interrupt handler not syncing."

He also says that the keyboard LEDs were "blinking". We have not
been able to get any data from the other crashes, since by the time
the technician arrived the machines were "stone dead" with the
keyboard LEDs blinking. This is all the data that I have at the moment,
because the machines are physically located far from here.

I realize that this might not be enough information to be useful.
It is merely an attempt to document the problem somewhat.

Göran

* Re: [LARTC] 2.4.20 htb3 oops
  2003-01-24 12:54 [LARTC] 2.4.20 htb3 oops Mihai RUSU
                   ` (4 preceding siblings ...)
  2003-03-03 14:53 ` Göran Runfeldt
@ 2003-03-03 16:29 ` Abraham van der Merwe
  2003-05-06 16:10 ` Kabelweb
  2003-05-07 18:31 ` Martin Volf
  7 siblings, 0 replies; 9+ messages in thread
From: Abraham van der Merwe @ 2003-03-03 16:29 UTC (permalink / raw)
  To: lartc

Hi Göran!

Unfortunately that is not going to help much since you can't figure out from
the information below where in the code it crashes. Next time, please copy
the entire kernel panic (including stack trace) and run it through ksymoops
or look up the symbols in vmlinux.
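
For anyone who has to do the lookup by hand, here is a rough sketch of what
resolving one address amounts to. It assumes an address-sorted System.map
for exactly the kernel that crashed. `resolve_addr` is a made-up helper
name, and the sample map is illustrative: only the `htb_walk` start address
(c021bd9c) follows from the `htb_walk+89` reported for EIP c021be25 in the
first oops of this thread, and the two neighbouring entries are
placeholders. ksymoops does considerably more than this (modules, code
disassembly), so treat it as an explanation rather than a replacement.

```sh
#!/bin/sh
# resolve_addr ADDR MAPFILE
# Print "symbol+0xOFFSET" for the last map entry at or below ADDR.
# ADDR is hex without a 0x prefix, as printed in an oops (e.g. c021be25).
# Assumes MAPFILE is sorted by address, one "address type name" per line.
resolve_addr() {
    target=$((0x$1))
    best_name=
    best_addr=0
    while read -r addr kind name
    do
        value=$((0x$addr))
        # Stop at the first symbol that starts above the target address
        if [ "$value" -gt "$target" ]; then
            break
        fi
        best_name=$name
        best_addr=$value
    done < "$2"
    # No symbol at or below the address: report failure
    [ -n "$best_name" ] || return 1
    printf '%s+0x%x\n' "$best_name" $((target - best_addr))
}

# Illustrative map: example_prev and example_next are placeholders.
map=$(mktemp)
cat > "$map" <<'EOF'
c021bd00 t example_prev
c021bd9c t htb_walk
c021be49 t example_next
EOF

resolve_addr c021be25 "$map"
rm -f "$map"
```

With the sample map this prints `htb_walk+0x89`, matching the EIP line that
ksymoops produced.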

> I'm sorry for mixing up the terms. I thought "oops" and "kernel panic"
> were the same thing. This is the text that our technician wrote down
> from the screen after the first crash:
> 
> "...unable to handling kernel null pointer dereference at virtual address 00000
> Kernel panic: aiee killing interrupt handling - in interrupt handler not syncing."
> 
> He also says that the keyboard LEDs were "blinking". We have not
> been able to get any data from the other crashes, since by the time
> the technician arrived the machines were "stone dead" with the
> keyboard LEDs blinking. This is all the data that I have at the moment,
> because the machines are physically located far from here.
> 
> I realize that this might not be enough information to be useful.
> It is merely an attempt to document the problem somewhat.
> 
> Göran

-- 

Regards
 Abraham

Kirk to Enterprise -- beam down yeoman Rand and a six-pack.

___________________________________________________
 Abraham vd Merwe [ZR1BBQ] - Frogfoot Networks
 P.O. Box 3472, Matieland, Stellenbosch, 7602
 Cell: +27 82 565 4451 Http: http://www.frogfoot.net/
 Email: abz@frogfoot.net



* Re: [LARTC] 2.4.20 htb3 oops
  2003-01-24 12:54 [LARTC] 2.4.20 htb3 oops Mihai RUSU
                   ` (5 preceding siblings ...)
  2003-03-03 16:29 ` Abraham van der Merwe
@ 2003-05-06 16:10 ` Kabelweb
  2003-05-07 18:31 ` Martin Volf
  7 siblings, 0 replies; 9+ messages in thread
From: Kabelweb @ 2003-05-06 16:10 UTC (permalink / raw)
  To: lartc

I've also encountered machine-lockups once a day with stock 2.4.20 kernel and 
htb on SMP. I am using a setup with dynamically changing classes and have 
about 2-4mbit traffic. I was using 2.4.17, 2.4.18 and 2.4.19 with htb 3.6 
without problems before (other machine - not SMP though). Unfortunately I 
wasn't able to gather any debugging information on the crashes because 
getting the system up again was my primary concern. Since I have disabled all 
tc stuff on that machine it runs flawlessly (about a week now). I will soon 
try to reproduce the problem on another machine.

Might the changes to htb in 2.4.21 be a solution? Is there any evidence that 
SMP might cause problems with htb?

Any hints would be greatly appreciated!

        Andreas

On Tuesday 11 March 2003 09:52, Göran Runfeldt wrote:
> > Any solution to the problems described above? I'm currently looking into
> > building a new 2.4.20 kernel with HTB compiled as a module in a
> > production environment (2.5 Mbit/s average, 6-10 Mbit/s peak). I won't
> > use it if it's broken, though.
>
> Unfortunately we have not yet come to a solution when it comes to our
> problems.
>
> We have had the machines running for more than five days now, without any
> HTB qdisc/classes loaded. This confirms that this is a problem with HTB.
> I have activated the rules again in hope of getting a useful crash dump.
>
> I do not know if anyone else has or will have this problem. All I know is
> that we have the problem over here.
>
> Göran
> _______________________________________________
> LARTC mailing list / LARTC@mailman.ds9a.nl
> http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/

* Re: [LARTC] 2.4.20 htb3 oops
  2003-01-24 12:54 [LARTC] 2.4.20 htb3 oops Mihai RUSU
                   ` (6 preceding siblings ...)
  2003-05-06 16:10 ` Kabelweb
@ 2003-05-07 18:31 ` Martin Volf
  7 siblings, 0 replies; 9+ messages in thread
From: Martin Volf @ 2003-05-07 18:31 UTC (permalink / raw)
  To: lartc

On Tue, 6 May 2003 18:10:04 +0200
Kabelweb <office@kabelweb.at> wrote:

> I've also encountered machine-lockups once a day with stock 2.4.20 kernel and 
> htb on SMP. I am using a setup with dynamically changing classes and have 
> about 2-4mbit traffic. I was using 2.4.17, 2.4.18 and 2.4.19 with htb 3.6 
> without problems before (other machine - not SMP though). Unfortunately I 
> wasn't able to gather any debugging information on the crashes because 
> getting the system up again was my primary concern. Since I have disabled all 
> tc stuff on that machine it runs flawlessly (about a week now). I will soon 
> try to reproduce the problem on another machine.
> 
> Might the changes to htb in 2.4.21 be a solution? Is there any evidence that 
> SMP might cause problems with htb?
> 
> Any hints would be greatly appreciated!


Hello,

In 2.4.20 there is a bug in htb code, which was fixed in 2.4.21pre5 (or was it pre6?). You can use net/sched/sch_htb.c from 2.4.21rc1. For details see this thread in lkml:
http://marc.theaimsgroup.com/?l=linux-kernel&m=105039488700308&w=2
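
As a sketch of the file swap suggested above, assuming both kernel trees
are unpacked side by side: `swap_htb` is a hypothetical helper name and the
tree locations in the example comment are assumptions, not paths taken from
this thread. It keeps a backup of the 2.4.20 file so the change is easy to
undo.

```sh
#!/bin/sh
# swap_htb NEW_TREE OLD_TREE
# Copy net/sched/sch_htb.c from NEW_TREE into OLD_TREE, saving the file it
# replaces as sch_htb.c.orig. Both tree paths are supplied by the caller.
swap_htb() {
    src=$1/net/sched/sch_htb.c
    dst=$2/net/sched/sch_htb.c
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    [ -f "$dst" ] || { echo "missing $dst" >&2; return 1; }
    # Keep the original so the swap can be reverted
    cp -p "$dst" "$dst.orig" || return 1
    cp "$src" "$dst"
}

# Example (tree locations are assumptions, adjust to your layout):
#   swap_htb /usr/src/linux-2.4.21-rc1 /usr/src/linux-2.4.20
# Afterwards rebuild and reinstall the 2.4.20 kernel as you normally do.
```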

-- 
Martin Volf