public inbox for linux-kernel@vger.kernel.org
* [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related )
@ 2001-05-03 16:16 Cabaniols, Sebastien
  2001-05-03 16:46 ` Andrea Arcangeli
From: Cabaniols, Sebastien @ 2001-05-03 16:16 UTC
  To: 'Andrew Morton', 'netdev@oss.sgi.com',
	'linux-kernel@vger.kernel.org',
	'davem@redhat.com'
  Cc: 'kuznet@ms2.inr.ac.ru', 'andrea@suse.de'

Hello, 

I have hit a bug on an Alpha ES40 SMP running a modified 2.4.4-ac3 (with the TCP fix from lkml).

Platform:

Linux Version:
-----------------------

My kernel is 2.4.4-ac3 with the tcp.c file modified as suggested by the
following patch.


>I see! Dave, please, take the second Andrea's patch (appended).
>It is really the cleanest one.

>Alexey


>--- 2.4.4aa3/net/ipv4/tcp.c.~1~	Tue May  1 10:44:57 2001
>+++ 2.4.4aa3/net/ipv4/tcp.c	Tue May  1 12:00:25 2001
>@@ -1183,11 +1183,8 @@
 
> do_fault:
> 	if (skb->len==0) {
>-		if (tp->send_head == skb) {
>-			tp->send_head = skb->next;
>-			if (tp->send_head == (struct sk_buff*)&sk->write_queue)
>-				tp->send_head = NULL;
>-		}
>+		if (tp->send_head == skb)
>+			tp->send_head = NULL;
> 		__skb_unlink(skb, skb->list);
> 		tcp_free_skb(sk, skb);
> 	}
>
>-
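The change can be modelled in a few lines of C. The sketch below is a hypothetical, heavily simplified model: `struct skb` and `struct queue` are illustrative stand-ins, not the kernel's `struct sk_buff` or `sk->write_queue`. It assumes, as the simplification in the patch implies, that the zero-length skb freed at `do_fault` is the last buffer in the write queue. Under that assumption the old code's `skb->next` always lands on the list head, so both versions leave `send_head` NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel's
 * struct sk_buff and write queue; NOT the real 2.4 definitions. */
struct skb {
    struct skb *next, *prev;
    unsigned int len;
};

struct queue {              /* circular list with a sentinel head */
    struct skb head;
    struct skb *send_head;  /* first buffer not yet transmitted */
};

static void queue_init(struct queue *q)
{
    q->head.next = q->head.prev = &q->head;
    q->head.len = 0;
    q->send_head = NULL;
}

/* Append at the tail, the way new data is queued by the send path. */
static void queue_tail(struct queue *q, struct skb *s)
{
    s->prev = q->head.prev;
    s->next = &q->head;
    q->head.prev->next = s;
    q->head.prev = s;
    if (q->send_head == NULL)
        q->send_head = s;
}

static void unlink_skb(struct skb *s)
{
    assert(s->next != NULL && s->prev != NULL);
    s->prev->next = s->next;
    s->next->prev = s->prev;
    s->next = s->prev = NULL;
}

/* Old do_fault logic: advance send_head, treating the sentinel as "none". */
static void fault_old(struct queue *q, struct skb *s)
{
    if (s->len == 0) {
        if (q->send_head == s) {
            q->send_head = s->next;
            if (q->send_head == &q->head)
                q->send_head = NULL;
        }
        unlink_skb(s);
    }
}

/* Patched do_fault logic: simply clear send_head. */
static void fault_new(struct queue *q, struct skb *s)
{
    if (s->len == 0) {
        if (q->send_head == s)
            q->send_head = NULL;
        unlink_skb(s);
    }
}

/* Scenario: one full buffer already sent, then an empty buffer at the
 * tail faults during the user copy. Returns 1 if send_head ends up NULL
 * and only the sent buffer remains queued. */
static int tail_fault_clears_send_head(int use_new_logic)
{
    struct queue q;
    struct skb sent = {NULL, NULL, 1460};
    struct skb empty = {NULL, NULL, 0};

    queue_init(&q);
    queue_tail(&q, &sent);
    queue_tail(&q, &empty);
    q.send_head = &empty;   /* pretend `sent` has already gone out */

    if (use_new_logic)
        fault_new(&q, &empty);
    else
        fault_old(&q, &empty);

    return q.send_head == NULL && q.head.next == &sent && sent.next == &q.head;
}
```

In this model `tail_fault_clears_send_head(0)` and `tail_fault_clears_send_head(1)` both return 1, which is why the shorter form is equivalent when the empty skb is the queue tail.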

This time, to show that it has nothing to do with the FTP server, I used a simple rcp:

Experiment 1:
----------------------

 ES40-06					ES40-05

 rcp es40-05:/mnt/big/mid /tmp/toto		Machine fine

 With a medium-sized file (mid, 1.4 MB), everything is fine.
 

Experiment 2:
----------------------

 ES40-06					ES40-05

 rcp es40-05:/mnt/big/1Giga /tmp/toto		Machine frozen

 ES40-06 managed to retrieve only 11 MB, so I guess restarting with a 12 MB
 file should be enough to trigger the bug.
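For the 12 MB retry, any file of that size should do. A minimal sketch for creating one, assuming the file content is irrelevant to the bug; the function names, byte pattern and path are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `bytes` bytes of a repeating 0x00..0xff
 * pattern to `path`. Returns 0 on success, -1 on any I/O error. */
int make_test_file(const char *path, long bytes)
{
    static char buf[65536];
    size_t i;
    FILE *f;

    assert(bytes >= 0);
    for (i = 0; i < sizeof buf; i++)
        buf[i] = (char)(i & 0xff);

    f = fopen(path, "wb");
    if (f == NULL)
        return -1;
    while (bytes > 0) {
        /* write in chunks of at most sizeof buf bytes */
        size_t chunk = (unsigned long)bytes < sizeof buf ? (size_t)bytes : sizeof buf;
        if (fwrite(buf, 1, chunk, f) != chunk) {
            fclose(f);
            return -1;
        }
        bytes -= (long)chunk;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Return the size of `path` in bytes, or -1 if it cannot be read. */
long file_size(const char *path)
{
    long size;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    size = ftell(f);
    fclose(f);
    return size;
}
```

Calling `make_test_file(path, 12L * 1024 * 1024)` on es40-05 with a path under /mnt/big, then repeating the rcp from ES40-06, reproduces the Experiment 2 setup with the smaller file.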

Here is the log of the machine that crashed (es40-05):
-----------------------------------------------------------------------

May  3 17:27:57 es40-05 PAM_unix[651]: (system-auth) session opened for user root by (uid=0)
May  3 17:27:57 es40-05 in.rshd[651]: root@es40-06.idris.domain as root: cmd='rcp -f /mnt/big/mid'
May  3 17:29:36 es40-05 PAM_unix[662]: (system-auth) session opened for user root by (uid=0)
May  3 17:29:36 es40-05 in.rshd[662]: root@es40-06.idris.domain as root: cmd='rcp -f /mnt/big/1Giga'
May  3 17:29:36 es40-05 kernel: <oomerang_rx(): status e001
May  3 17:29:36 es40-05 kernel: <<7>eth0: interrupt, status e401, latency 4 ticks.
May  3 17:29:36 es40-05 kernel: .
May  3 17:29:36 es40-05 kernel: <th0: interrupt, status e401, latency 3 ticks.
May  3 17:29:36 es40-05 kernel: <7
May  3 17:29:36 es40-05 kernel: <7t()
May  3 17:29:37 es40-05 kernel: <01, latency 4 ticks.
May  3 17:29:37 es40-05 kernel: <7
May  3 17:29:37 es40-05 kernel: <7
May  3 17:29:37 es40-05 kernel: th0: interrupt, status e401, latency 4 ticks.
May  3 17:29:37 es40-05 kernel: <7o send a packet, Tx index 5905.
May  3 17:29:37 es40-05 kernel: <7<7>eth0: exiting interrupt, status e000.
May  3 17:29:37 es40-05 kernel:  e201.
May  3 17:29:37 es40-05 kernel: <7<7>eth0: In interrupt loop, status e401.
May  3 17:29:37 es40-05 kernel: <7omerang_start_xmit()
May  3 17:29:37 es40-05 kernel: <7omerang_start_xmit()

The next line is:
--------------------------
May  3 17:36:17 es40-05 syslogd 1.3-3: restart.



What can I do to pin down where the problem is?

I tested the machine under high CPU load, memory load, swapping, and
combinations of the three. The only thing that fails under load is the
network... TCP/IP?

Andrew Morton is pretty sure this has nothing to do with his driver...

Any ideas on how I could track it down?

Thanks for any help.

* RE: [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? (  was 2.4.4 kernel crash , possibly tcp related )
@ 2001-05-03 17:41 Cabaniols, Sebastien
From: Cabaniols, Sebastien @ 2001-05-03 17:41 UTC
  To: Rival, Frank, Andrea Arcangeli
  Cc: 'Andrew Morton', 'netdev@oss.sgi.com',
	'linux-kernel@vger.kernel.org',
	'davem@redhat.com', 'kuznet@ms2.inr.ac.ru'


>Silly question, Sebastien - when you do a "show config" at the console, how
>is your card represented?  FWIU, there have been problems with adapters under
>load that aren't fully supported by SRM...  Just a guess.  Could you try this
>with a DE600 (Intel) or a DE500 (tulip)?

> - Pete



Appended to this email is the output of show conf.

I can see the 3Com board in slot 2 of the first PCI bus (Hose 0).
I also have a DE600 board in slot 6 of the second PCI bus (Hose 1).

DE600 boards freeze my system.
DE504 boards freeze my system too.

I have tried changing the switch and using point-to-point connections, so I
switched to a 3Com 905B to have a more standard board (in the Linux
community, I mean). :(((



P00>>>show conf
                        Compaq Computer Corporation
                          Compaq AlphaServer ES40

Firmware
SRM Console:    V5.9-24
ARC Console:    v5.70
PALcode:        OpenVMS PALcode V1.90-101, Tru64 UNIX PALcode V1.86-101
Serial ROM:     V2.12-F  
RMC ROM:        V1.0
RMC Flash ROM:  V2.6

Processors
CPU 0           Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz  8MB Bcache
CPU 1           Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz  8MB Bcache
CPU 2           Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz  8MB Bcache
CPU 3           Alpha EV68A pass 2.1 or 2.1A or 3.0 833 MHz  8MB Bcache

Core Logic
Cchip           DECchip 21272-CA Rev 9(C4)
Dchip           DECchip 21272-DA Rev 2
Pchip 0         DECchip 21272-EA Rev 2
Pchip 1         DECchip 21272-EA Rev 2
TIG             Rev 10

Memory
  Array       Size       Base Address    Intlv Mode
---------  ----------  ----------------  ----------
    0       2048Mb     0000000000000000    4-Way
    1       2048Mb     0000000080000000    4-Way
    2       2048Mb     0000000100000000    4-Way
    3       2048Mb     0000000180000000    4-Way

     8192 MB of System Memory

 Slot   Option                  Hose 0, Bus 0, PCI
   1    NCR 53C895              pkb0.7.0.1.0            SCSI Bus ID 7
                                dkb0.0.0.1.0            COMPAQ BD009635C3
                                dkb100.1.0.1.0          COMPAQ BF01863644
                                dkb200.2.0.1.0          COMPAQ BF01863644
   2    905510B7/905510B7                           
   3    804314C1/804314C1                           
   7    Acer Labs M1543C                                Bridge to Bus 1, ISA
  15    Acer Labs M1543C IDE    dqa.0.0.15.0        
                                dqb.0.1.15.0        
                                dqa0.0.0.15.0           Compaq   CRD-8402B
  19    Acer Labs M1543C USB                        

        Option                  Hose 0, Bus 1, ISA
        Floppy                  dva0.0.0.1000.0     

 Slot   Option                  Hose 1, Bus 0, PCI
   4    NCR 53C895              pka0.7.0.4.1            SCSI Bus ID 7
                                dka0.0.0.4.1            COMPAQ BF01863644
                                dka100.1.0.4.1          COMPAQ BF01863644
                                dka200.2.0.4.1          COMPAQ BF01863644
                                dka300.3.0.4.1          COMPAQ BF01863644
   5    QLogic QLA2200          pya0.0.0.5.1        
   6    DE600-AA                eia0.0.0.6.1            00-50-8B-AE-DD-A0

* RE: [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? (  was 2.4.4 kernel crash , possibly tcp related )
@ 2001-05-03 17:53 Cabaniols, Sebastien
From: Cabaniols, Sebastien @ 2001-05-03 17:53 UTC
  To: Rival, Frank, Andrea Arcangeli
  Cc: 'Andrew Morton', 'netdev@oss.sgi.com',
	'linux-kernel@vger.kernel.org',
	'davem@redhat.com', 'kuznet@ms2.inr.ac.ru'

> Andrea Arcangeli wrote:
> 
> > On Thu, May 03, 2001 at 06:16:02PM +0200, Cabaniols, Sebastien wrote:
> > > The only thing that does not work under load is the network.... TCP/IP ?
> >
> > My alpha is running 2.4.4aa3 under very high load (apache beaten from ab
> > in loop via 100mbit switched network [tulip on the alpha] plus cerberus)
> > and I didn't had any problem so far (it only deadlocked with OOM after
> > one day of day of tux [instead of apache] + cerberus regression testing
> > but that's only because of a memleak in tux that I reproduced on x86 too
> > it seems)
> >

Andrea, 


Do you think I should install exactly the same version (2.4.4aa3) instead
of 2.4.4-ac3 with the TCP patch?

What else can I try to find where my bug is?

I have DE600 boards too, but in the stress tests I ran a few days ago they
were freezing my system. I suspect that was a different problem. I then
switched to a 3Com 905B because it is a very well-known board and I thought
it would help a lot to standardize my system.

I also used a DE504 with the de4x5 driver, and it was again crashing my
system.

I did not use the tulip driver, though. :(

Again, thanks a lot for any help
  



Thread overview: 6+ messages
2001-05-03 16:16 [BUG] freeze Alpha ES40 SMP 2.4.4.ac3, another TCP/IP Problem ? ( was 2.4.4 kernel crash , possibly tcp related ) Cabaniols, Sebastien
2001-05-03 16:46 ` Andrea Arcangeli
2001-05-03 16:58   ` Peter Rival
2001-05-03 17:23   ` Andrea Arcangeli
  -- strict thread matches above, loose matches on Subject: below --
2001-05-03 17:41 Cabaniols, Sebastien
2001-05-03 17:53 Cabaniols, Sebastien
