* [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related )
@ 2001-05-03 16:16 Cabaniols, Sebastien
2001-05-03 16:46 ` Andrea Arcangeli
0 siblings, 1 reply; 6+ messages in thread
From: Cabaniols, Sebastien @ 2001-05-03 16:16 UTC (permalink / raw)
To: 'Andrew Morton', 'netdev@oss.sgi.com',
'linux-kernel@vger.kernel.org',
'davem@redhat.com'
Cc: 'kuznet@ms2.inr.ac.ru', 'andrea@suse.de'
Hello,
I have a bug on an Alpha ES40 SMP 2.4.4.ac3 modified (TCP Bug from lkml)
Platform:
Linux Version:
-----------------------
My kernel is 2.4.4-ac3 with the tcp.c file modified as suggested by the
following patch.
>I see! Dave, please, take the second Andrea's patch (appended).
>It is really the cleanest one.
>Alexey
>--- 2.4.4aa3/net/ipv4/tcp.c.~1~ Tue May 1 10:44:57 2001
>+++ 2.4.4aa3/net/ipv4/tcp.c Tue May 1 12:00:25 2001
>@@ -1183,11 +1183,8 @@
> do_fault:
> if (skb->len==0) {
>- if (tp->send_head == skb) {
>- tp->send_head = skb->next;
>- if (tp->send_head == (struct sk_buff*)&sk->write_queue)
>- tp->send_head = NULL;
>- }
>+ if (tp->send_head == skb)
>+ tp->send_head = NULL;
> __skb_unlink(skb, skb->list);
> tcp_free_skb(sk, skb);
> }
>
>-
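For readers following the quoted patch, here is a minimal toy model (not kernel code) of why the shorter branch can behave the same as the removed one. It assumes the zero-length skb freed at do_fault is always the last buffer on the write queue; that assumption is mine and is not stated in the thread. The names toy_skb, send_head_old and send_head_new are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct sk_buff: only the list links matter here. */
struct toy_skb {
    struct toy_skb *next;
    struct toy_skb *prev;
};

/* Circular queue with a sentinel head, modelled loosely on sk->write_queue. */
static void toy_queue_init(struct toy_skb *head)
{
    head->next = head;
    head->prev = head;
}

/* Append skb at the tail of the queue. */
static void toy_queue_tail(struct toy_skb *head, struct toy_skb *skb)
{
    skb->next = head;
    skb->prev = head->prev;
    head->prev->next = skb;
    head->prev = skb;
}

/* Branch removed by the patch: advance send_head, map the sentinel to NULL. */
static struct toy_skb *send_head_old(struct toy_skb *head,
                                     struct toy_skb *send_head,
                                     struct toy_skb *skb)
{
    if (send_head == skb) {
        send_head = skb->next;
        if (send_head == head)
            send_head = NULL;
    }
    return send_head;
}

/* Branch added by the patch: simply clear send_head. */
static struct toy_skb *send_head_new(struct toy_skb *send_head,
                                     struct toy_skb *skb)
{
    if (send_head == skb)
        send_head = NULL;
    return send_head;
}

/* Returns 1 if both variants agree when the freed skb is the queue tail. */
static int toy_variants_agree_on_tail(void)
{
    struct toy_skb head, a, b;

    toy_queue_init(&head);
    toy_queue_tail(&head, &a);
    toy_queue_tail(&head, &b);          /* b is the tail, the skb to free */

    /* Case 1: send_head points at the tail skb itself. */
    if (send_head_old(&head, &b, &b) != NULL)
        return 0;
    if (send_head_new(&b, &b) != NULL)
        return 0;

    /* Case 2: send_head points at an earlier buffer and is left alone. */
    if (send_head_old(&head, &a, &b) != &a)
        return 0;
    if (send_head_new(&a, &b) != &a)
        return 0;

    return 1;
}
```

Calling toy_variants_agree_on_tail() from a small main() checks both cases. If the freed skb could sit in the middle of the queue, the two variants would differ, so the model only covers the tail case.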
This time, to show that it has nothing to do with the ftp server, I used a
simple rcp:
Experiment 1:
----------------------
ES40-06                                   ES40-05
rcp es40-05:/mnt/big/mid /tmp/toto        Machine fine
With a mid-sized file (1.4 Megabytes), everything is fine.
Experiment 2:
----------------------
ES40-06                                   ES40-05
rcp es40-05:/mnt/big/1Giga /tmp/toto      Machine frozen
The ES40-06 managed to retrieve only 11 Mbytes, so I guess I can start again
with a 12 Megabyte file; it should trigger the bug.
Here is the log of the machine that crashed:
-----------------------------------------------------------------------
May 3 17:27:57 es40-05 PAM_unix[651]: (system-auth) session opened for user
root by (uid=0)
May 3 17:27:57 es40-05 in.rshd[651]: root@es40-06.idris.domain as root:
cmd='rcp -f /mnt/big/mid'
May 3 17:29:36 es40-05 PAM_unix[662]: (system-auth) session opened for user
root by (uid=0)
May 3 17:29:36 es40-05 in.rshd[662]: root@es40-06.idris.domain as root:
cmd='rcp -f /mnt/big/1Giga'
May 3 17:29:36 es40-05 kernel: <oomerang_rx(): status e001
May 3 17:29:36 es40-05 kernel: <<7>eth0: interrupt, status e401, latency 4
ticks.
May 3 17:29:36 es40-05 kernel: .
May 3 17:29:36 es40-05 kernel: <th0: interrupt, status e401, latency 3
ticks.
May 3 17:29:36 es40-05 kernel: <7
May 3 17:29:36 es40-05 kernel: <7t()
May 3 17:29:37 es40-05 kernel: <01, latency 4 ticks.
May 3 17:29:37 es40-05 kernel: <7
May 3 17:29:37 es40-05 kernel: <7
May 3 17:29:37 es40-05 kernel: th0: interrupt, status e401, latency 4
ticks.
May 3 17:29:37 es40-05 kernel: <7o send a packet, Tx index 5905.
May 3 17:29:37 es40-05 kernel: <7<7>eth0: exiting interrupt, status e000.
May 3 17:29:37 es40-05 kernel: e201.
May 3 17:29:37 es40-05 kernel: <7<7>eth0: In interrupt loop, status e401.
May 3 17:29:37 es40-05 kernel: <7omerang_start_xmit()
May 3 17:29:37 es40-05 kernel: <7omerang_start_xmit()
The next line is:
--------------------------
May 3 17:36:17 es40-05 syslogd 1.3-3: restart.
What could I do to be sure where the problem is?
I tested the machine under high CPU load, memory load, swap, and a combination
of the three.
The only thing that does not work under load is the network.... TCP/IP ?
Andrew Morton is pretty sure this has nothing to do with his driver...
Any ideas on how I could find where the problem is?
Thanks for any help.
* Re: [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related )
2001-05-03 16:16 [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related ) Cabaniols, Sebastien
@ 2001-05-03 16:46 ` Andrea Arcangeli
2001-05-03 16:58 ` Peter Rival
2001-05-03 17:23 ` Andrea Arcangeli
0 siblings, 2 replies; 6+ messages in thread
From: Andrea Arcangeli @ 2001-05-03 16:46 UTC (permalink / raw)
To: Cabaniols, Sebastien
Cc: 'Andrew Morton', 'netdev@oss.sgi.com',
'linux-kernel@vger.kernel.org',
'davem@redhat.com', 'kuznet@ms2.inr.ac.ru'
On Thu, May 03, 2001 at 06:16:02PM +0200, Cabaniols, Sebastien wrote:
> The only thing that does not work under load is the network.... TCP/IP ?
My alpha is running 2.4.4aa3 under very high load (apache beaten from ab
in a loop via a 100mbit switched network [tulip on the alpha] plus cerberus)
and I haven't had any problem so far (it only deadlocked with OOM after
one day of tux [instead of apache] + cerberus regression testing,
but that's only because of a memleak in tux that I reproduced on x86 too,
it seems).
I'm going to release soon a 2.4.5pre1aa1 that will compile with modules
as well. The only annoying thing is that UP kernel compiles seem not to
boot, but I hope that will be fixed soon too.
So I doubt the problem is the TCP stack. It may not be the driver either, but
at least it shouldn't be a generic bug in vanilla 2.4.4.
Andrea
* Re: [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related )
2001-05-03 16:46 ` Andrea Arcangeli
@ 2001-05-03 16:58 ` Peter Rival
2001-05-03 17:23 ` Andrea Arcangeli
1 sibling, 0 replies; 6+ messages in thread
From: Peter Rival @ 2001-05-03 16:58 UTC (permalink / raw)
To: Andrea Arcangeli
Cc: Cabaniols, Sebastien, 'Andrew Morton',
'netdev@oss.sgi.com',
'linux-kernel@vger.kernel.org',
'davem@redhat.com', 'kuznet@ms2.inr.ac.ru'
Andrea Arcangeli wrote:
> On Thu, May 03, 2001 at 06:16:02PM +0200, Cabaniols, Sebastien wrote:
> > The only thing that does not work under load is the network.... TCP/IP ?
>
> My alpha is running 2.4.4aa3 under very high load (apache beaten from ab
> in a loop via a 100mbit switched network [tulip on the alpha] plus cerberus)
> and I haven't had any problem so far (it only deadlocked with OOM after
> one day of tux [instead of apache] + cerberus regression testing,
> but that's only because of a memleak in tux that I reproduced on x86 too,
> it seems).
>
Silly question, Sebastien - when you do a "show config" at the console, how
is your card represented? FWIU, there have been problems with adapters under
load that aren't fully supported by SRM... Just a guess. Could you try this
with a DE600 (Intel) or a DE500 (tulip)?
- Pete
* Re: [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related )
2001-05-03 16:46 ` Andrea Arcangeli
2001-05-03 16:58 ` Peter Rival
@ 2001-05-03 17:23 ` Andrea Arcangeli
1 sibling, 0 replies; 6+ messages in thread
From: Andrea Arcangeli @ 2001-05-03 17:23 UTC (permalink / raw)
To: Cabaniols, Sebastien
Cc: 'Andrew Morton', 'netdev@oss.sgi.com',
'linux-kernel@vger.kernel.org',
'davem@redhat.com', 'kuznet@ms2.inr.ac.ru'
On Thu, May 03, 2001 at 06:46:10PM +0200, Andrea Arcangeli wrote:
> as well. The only annoying thing is that UP kernel compiles seem not to
> boot, but I hope that will be fixed soon too.
OK, I spotted and fixed the bug that prevented my tree from booting with UP
compiles on alpha. The bug is that the SCHED_YIELD handling was broken
on alpha UP; this is the fix:
--- 2.4.5pre1aa1/arch/alpha/kernel/entry.S.~1~ Thu May 3 18:22:13 2001
+++ 2.4.5pre1aa1/arch/alpha/kernel/entry.S Thu May 3 19:18:16 2001
@@ -709,16 +709,14 @@
br restore_all
.end entSys
-#ifdef CONFIG_SMP
- .globl ret_from_smp_fork
+ .globl ret_from_fork
.align 3
-.ent ret_from_smp_fork
-ret_from_smp_fork:
+.ent ret_from_fork
+ret_from_fork:
lda $26,ret_from_sys_call
mov $17,$16
jsr $31,schedule_tail
-.end ret_from_smp_fork
-#endif /* CONFIG_SMP */
+.end ret_from_fork
.align 3
.ent reschedule
--- 2.4.5pre1aa1/arch/alpha/kernel/process.c.~1~ Thu May 3 18:22:09 2001
+++ 2.4.5pre1aa1/arch/alpha/kernel/process.c Thu May 3 19:15:41 2001
@@ -306,7 +306,7 @@
struct task_struct * p, struct pt_regs * regs)
{
extern void ret_from_sys_call(void);
- extern void ret_from_smp_fork(void);
+ extern void ret_from_fork(void);
struct pt_regs * childregs;
struct switch_stack * childstack, *stack;
@@ -325,11 +325,7 @@
stack = ((struct switch_stack *) regs) - 1;
childstack = ((struct switch_stack *) childregs) - 1;
*childstack = *stack;
-#ifdef CONFIG_SMP
- childstack->r26 = (unsigned long) ret_from_smp_fork;
-#else
- childstack->r26 = (unsigned long) ret_from_sys_call;
-#endif
+ childstack->r26 = (unsigned long) ret_from_fork;
p->thread.usp = usp;
p->thread.ksp = (unsigned long) childstack;
p->thread.pal_flags = 1; /* set FEN, clear everything else */
(The SCHED_YIELD flag of the previous task is cleared by __schedule_tail. On
alpha UP it wasn't cleared, so a non-running task was left with SCHED_YIELD set
and it was deadlocking. This can explain many malfunctions of UP alpha kernels.)
I never noticed until now because I always compiled it SMP.
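The failure mode described above can be sketched with a small toy model (not kernel code). It assumes only what the message states: the yield flag of the previous task is cleared in the schedule tail, and a freshly forked child either does or does not pass through that tail on its first run. The names toy_task, toy_schedule_tail and the TOY_SCHED_YIELD value are hypothetical and chosen for illustration.

```c
#include <assert.h>

/* Hypothetical flag bit for this model; not taken from the kernel headers. */
#define TOY_SCHED_YIELD 0x10UL

struct toy_task {
    unsigned long policy;
};

/* Models the step that clears the previous task's yield flag. */
static void toy_schedule_tail(struct toy_task *prev)
{
    prev->policy &= ~TOY_SCHED_YIELD;
}

/*
 * Models the first run of a newly forked child. child_runs_tail says whether
 * the child's return path goes through the schedule tail (the patched
 * ret_from_fork path) or skips it (returning straight to the syscall exit).
 */
static void toy_switch_to_new_child(struct toy_task *prev, int child_runs_tail)
{
    if (child_runs_tail)
        toy_schedule_tail(prev);
}

/* Returns 1 if the previous task is left with the yield flag still set. */
static int toy_yield_left_set(int child_runs_tail)
{
    struct toy_task parent = { TOY_SCHED_YIELD };   /* parent just yielded */

    toy_switch_to_new_child(&parent, child_runs_tail);
    return (parent.policy & TOY_SCHED_YIELD) != 0;
}
```

In this model toy_yield_left_set(0) reports a stale flag on the non-running parent, while toy_yield_left_set(1) reports it cleared, which is the behaviour the patch aims for by always returning through ret_from_fork.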
Andrea
* RE: [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related )
@ 2001-05-03 17:41 Cabaniols, Sebastien
0 siblings, 0 replies; 6+ messages in thread
From: Cabaniols, Sebastien @ 2001-05-03 17:41 UTC (permalink / raw)
To: Rival, Frank, Andrea Arcangeli
Cc: 'Andrew Morton', 'netdev@oss.sgi.com',
'linux-kernel@vger.kernel.org',
'davem@redhat.com', 'kuznet@ms2.inr.ac.ru'
>Silly question, Sebastien - when you do a "show config" at the console, how
>is your card represented? FWIU, there have been problems with adapters under
>load that aren't fully supported by SRM... Just a guess. Could you try this
>with a DE600 (Intel) or a DE500 (tulip)?
> - Pete
Appended to this email is the output of show conf.
I can see the 3COM board in slot 2 of the first PCI bus.
I also have a DE600 board in slot 6 of the second PCI bus.
DE600 boards freeze my system.
DE504 boards freeze my system.
I have tried changing the switch, point-to-point connections... So I
changed to a 3com905b
to have a more standard board (in the linux community I mean). :(((
P00>>>show conf
Compaq Computer Corporation
Compaq AlphaServer ES40
Firmware
SRM Console: V5.9-24
ARC Console: v5.70
PALcode: OpenVMS PALcode V1.90-101, Tru64 UNIX PALcode V1.86-101
Serial ROM: V2.12-F
RMC ROM: V1.0
RMC Flash ROM: V2.6
Processors
CPU 0 Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz 8MB Bcache
CPU 1 Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz 8MB Bcache
CPU 2 Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz 8MB Bcache
CPU 3 Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz 8MB Bcache
Core Logic
Cchip DECchip 21272-CA Rev 9(C4)
Dchip DECchip 21272-DA Rev 2
Pchip 0 DECchip 21272-EA Rev 2
Pchip 1 DECchip 21272-EA Rev 2
TIG Rev 10
Memory
Array Size Base Address Intlv Mode
--------- ---------- ---------------- ----------
0 2048Mb 0000000000000000 4-Way
1 2048Mb 0000000080000000 4-Way
2 2048Mb 0000000100000000 4-Way
3 2048Mb 0000000180000000 4-Way
8192 MB of System Memory
Slot Option Hose 0, Bus 0, PCI
1 NCR 53C895 pkb0.7.0.1.0 SCSI Bus ID 7
dkb0.0.0.1.0 COMPAQ BD009635C3
dkb100.1.0.1.0 COMPAQ BF01863644
dkb200.2.0.1.0 COMPAQ BF01863644
2 905510B7/905510B7
3 804314C1/804314C1
7 Acer Labs M1543C Bridge to Bus 1, ISA
15 Acer Labs M1543C IDE dqa.0.0.15.0
dqb.0.1.15.0
dqa0.0.0.15.0 Compaq CRD-8402B
19 Acer Labs M1543C USB
Option Hose 0, Bus 1, ISA
Floppy dva0.0.0.1000.0
Slot Option Hose 1, Bus 0, PCI
4 NCR 53C895 pka0.7.0.4.1 SCSI Bus ID 7
dka0.0.0.4.1 COMPAQ BF01863644
dka100.1.0.4.1 COMPAQ BF01863644
dka200.2.0.4.1 COMPAQ BF01863644
dka300.3.0.4.1 COMPAQ BF01863644
5 QLogic QLA2200 pya0.0.0.5.1
6 DE600-AA eia0.0.0.6.1 00-50-8B-AE-DD-A0
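As a reading aid for the unnamed options in slots 2 and 3, here is a minimal sketch that splits an 8-hex-digit string such as 905510B7 into two 16-bit halves. It assumes the console prints the PCI device ID followed by the vendor ID; that field order is an assumption, although it fits the 3COM board reported in slot 2, since 0x10B7 is the PCI vendor ID assigned to 3Com. The names toy_pci_id and toy_split_pci_id are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Two 16-bit halves of the identifier shown by the console. */
struct toy_pci_id {
    unsigned int device;    /* assumed: first four hex digits */
    unsigned int vendor;    /* assumed: last four hex digits  */
};

/* Returns 0 on success, -1 if the string does not hold two hex fields. */
static int toy_split_pci_id(const char *s, struct toy_pci_id *out)
{
    unsigned int device, vendor;

    /* %4x reads at most four hex digits per field. */
    if (sscanf(s, "%4x%4x", &device, &vendor) != 2)
        return -1;

    out->device = device;
    out->vendor = vendor;
    return 0;
}
```

For the slot 2 entry, toy_split_pci_id("905510B7", &id) yields device 0x9055 and vendor 0x10B7 under the assumed field order.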
* RE: [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related )
@ 2001-05-03 17:53 Cabaniols, Sebastien
0 siblings, 0 replies; 6+ messages in thread
From: Cabaniols, Sebastien @ 2001-05-03 17:53 UTC (permalink / raw)
To: Rival, Frank, Andrea Arcangeli
Cc: 'Andrew Morton', 'netdev@oss.sgi.com',
'linux-kernel@vger.kernel.org',
'davem@redhat.com', 'kuznet@ms2.inr.ac.ru'
> Andrea Arcangeli wrote:
>
> > On Thu, May 03, 2001 at 06:16:02PM +0200, Cabaniols, Sebastien wrote:
> > > The only thing that does not work under load is the network.... TCP/IP ?
> >
> > My alpha is running 2.4.4aa3 under very high load (apache beaten from ab
> > in a loop via a 100mbit switched network [tulip on the alpha] plus cerberus)
> > and I haven't had any problem so far (it only deadlocked with OOM after
> > one day of tux [instead of apache] + cerberus regression testing,
> > but that's only because of a memleak in tux that I reproduced on x86 too,
> > it seems).
> >
Andrea,
Do you think I should install exactly the same version, 2.4.4aa3, instead
of 2.4.4.ac3 with the TCP patch?
What else can I try to find where my bug is?
I have DE600 boards too, but in the last stress tests I did a few days ago
they were freezing my system. I suspect that was another story. I then switched
to the 3com905b because it is a very well known board and I thought it could
help a lot to standardize my system.
I also used a DE504 with the de4x5 driver and it was again crashing my system.
I did not use the tulip driver though ( :( )
Again, thanks a lot for any help