netdev.vger.kernel.org archive mirror
* Network oops
       [not found] <200205202125.BAA03545@sex.inr.ac.ru>
@ 2002-06-09 17:31 ` george anzinger
       [not found] ` <3D0390E2.1B80ADEE@mvista.com>
  1 sibling, 0 replies; 19+ messages in thread
From: george anzinger @ 2002-06-09 17:31 UTC (permalink / raw)
  To: kuznet; +Cc: netdev, linux-net, davem, ak, pekkas

Is this sort of thing expected when the NIC is out of poop?


    <4>eth1: card reports no resources.
    <4>KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at tcp_input.c(956):tcp_sacktag_write_queue
    <7>Leak l=8 4
    <1>Unable to handle kernel NULL pointer dereference at virtual address 00000049
    <4> printing eip:
    <4>c025b7aa
    <1>*pde = 00000000
    <4>Oops: 0000
    <4>CPU:    0
    <4>EIP:    0010:[<c025b7aa>]    Not tainted
    <4>EFLAGS: 00010207
    <4>eax: c40d82c8   ebx: 00000000   ecx: 00000004   edx: 00000000
    <4>esi: c40d8338   edi: c40d8260   ebp: c0437eb4   esp: c0437ea4
    <4>ds: 0018   es: 0018   ss: 0018
    <4>Process swapper (pid: 0, stackpage=c0437000)
    <4>Stack: 0000002e c40d8260 c40d8338 c40d8260 c0437ee4 c0264e8b c40d8260 00000000
    <4>       00000001 c73cd944 00000009 c0437f10 c887250f c40d8260 c40d8338 c0436000
    <4>       c0437f08 c02650a6 c40d8260 c037edb8 00000094 00000000 c0437f08 c40d8260
    <4>Call Trace: [<c0264e8b>] [<c887250f>] [<c02650a6>] [<c01296a9>] [<c0264fc0>]
    <4>   [<c01297ac>] [<c0124ebb>] [<c0109ae2>] [<c01054f0>] [<c0105530>] [<c01055c2>]
    <4>   [<c0105000>]
    <4>
    <4>Code: 0f b6 4b 49 f6 c1 82 74 0c 31 d2 89 96 78 01 00 00 0f b6 4b
STACK TRACE FOR TASK: 0xc0436000 (swapper)

 0 tcp_enter_loss+202 [0xc025b7aa]
 1 tcp_retransmit_timer+566 [0xc0264e86]
 2 tcp_write_timer+225 [0xc02650a1]
 3 timer_bh+710 [0xc01296a6]
 4 timer_softirq+39 [0xc01297a7]
 5 do_softirq+249 [0xc0124eb9]
 6 do_IRQ+525 [0xc0109add]
 7 do_IRQ+525 [0xc0109add]

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: Network oops
       [not found] ` <3D0390E2.1B80ADEE@mvista.com>
@ 2002-06-10  4:31   ` David S. Miller
  2002-06-10  4:32   ` David S. Miller
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: David S. Miller @ 2002-06-10  4:31 UTC (permalink / raw)
  To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas


No mention of what kernel version, what patches applied, etc.
so we cannot help you.


* Re: Network oops
       [not found] ` <3D0390E2.1B80ADEE@mvista.com>
  2002-06-10  4:31   ` David S. Miller
@ 2002-06-10  4:32   ` David S. Miller
       [not found]   ` <20020609.213150.32126725.davem@redhat.com>
       [not found]   ` <20020609.213224.01016187.davem@redhat.com>
  3 siblings, 0 replies; 19+ messages in thread
From: David S. Miller @ 2002-06-10  4:32 UTC (permalink / raw)
  To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas


Also we need to know what network driver, if preemption is being
used (being from mvista I assume you are using preemption, and if
so make sure you have the preemption networking patches applied).


* Re: Network oops
       [not found]   ` <20020609.213150.32126725.davem@redhat.com>
@ 2002-06-10  4:48     ` george anzinger
       [not found]     ` <3D042F8F.72764243@mvista.com>
  1 sibling, 0 replies; 19+ messages in thread
From: george anzinger @ 2002-06-10  4:48 UTC (permalink / raw)
  To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas

"David S. Miller" wrote:
> 
> No mention of what kernel version, what patches applied, etc.
> so we cannot help you.

Sorry bout that.  It is a 2.4.17 kernel and the test is to
verify that we have the preempt code right.  The question at
hand is if this is a likely preempt problem or just a pure
overload.  The stress on the network is rather high.

I would expect that the network code would recover from this
sort of thing, so we are looking for a preempt issue at the
moment.  Still, it could just be the way things work in the
2.4.17 kernel so I thought I would ask.

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: Network oops
       [not found]     ` <3D042F8F.72764243@mvista.com>
@ 2002-06-10  5:06       ` David S. Miller
  2002-06-20 17:19       ` george anzinger
       [not found]       ` <3D120EAE.5A0D365E@mvista.com>
  2 siblings, 0 replies; 19+ messages in thread
From: David S. Miller @ 2002-06-10  5:06 UTC (permalink / raw)
  To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas

   From: george anzinger <george@mvista.com>
   Date: Sun, 09 Jun 2002 21:48:15 -0700
   
   I would expect that the network code would recover from this
   sort of thing, so we are looking for a preempt issue at the
   moment.  Still, it could just be the way things work in the
   2.4.17 kernel so I thought I would ask.

Even though 2.4.17 is pretty old I still think it's a preempt
problem.


* Re: Network oops
       [not found]   ` <20020609.213224.01016187.davem@redhat.com>
@ 2002-06-10  8:11     ` george anzinger
       [not found]     ` <3D045F15.578E1DA9@mvista.com>
  1 sibling, 0 replies; 19+ messages in thread
From: george anzinger @ 2002-06-10  8:11 UTC (permalink / raw)
  To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas

"David S. Miller" wrote:
> 
> Also we need to know what network driver, if preemption is being
> used (being from mvista I assume you are using preemption, and if
> so make sure you have the preemption networking patches applied).

Uh, are you saying there is a set of network patches?  If
so, where might they be found?  And yes, we are testing
preemption, trying to wring out the "last" bug :)  (Could
you be referring to the patches we are developing?)

The driver (from the system log):
    <4>eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
    <4>eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
    <6>eth0: OEM i82557/i82558 10/100 Ethernet, 00:03:47:BD:60:86, IRQ 21.
    <6>  Board assembly fab600-000, Physical connectors present: RJ45
    <6>  Primary interface chip i82555 PHY #1.
    <6>  General self-test: passed.
    <6>  Serial sub-system self-test: passed.
    <6>  Internal registers self-test: passed.
    <6>  ROM checksum self-test: passed (0xb874c1d3).
    <6>eth1: OEM i82557/i82558 10/100 Ethernet, 00:03:47:BD:60:87, IRQ 20.
    <6>  Board assembly fab600-000, Physical connectors present: RJ45
    <6>  Primary interface chip i82555 PHY #1.
    <6>  General self-test: passed.
    <6>  Serial sub-system self-test: passed.
    <6>  Internal registers self-test: passed.
    <6>  ROM checksum self-test: passed (0xb874c1d3).

Looks like two NICs.
-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: Network oops
       [not found]     ` <3D045F15.578E1DA9@mvista.com>
@ 2002-06-10  8:31       ` David S. Miller
       [not found]       ` <20020610.013110.81671593.davem@redhat.com>
  1 sibling, 0 replies; 19+ messages in thread
From: David S. Miller @ 2002-06-10  8:31 UTC (permalink / raw)
  To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas

   From: george anzinger <george@mvista.com>
   Date: Mon, 10 Jun 2002 01:11:01 -0700

   "David S. Miller" wrote:
   > 
   > Also we need to know what network driver, if preemption is being
   > used (being from mvista I assume you are using preemption, and if
   > so make sure you have the preemption networking patches applied).
   
   Uh, are you saying there is a set of network patches?  If
   so, where might they be found?  And yes, we are testing
   preemption, trying to wring out the "last" bug :)  (Could
   you be referring to the patches we are developing?)
   
An mvista person recently put "make net preempt safe" patches
into 2.5.x.  Basically it amounted to putting a few preempt_disable
sections into net/core/skbuff.c.


* Re: Network oops
       [not found]       ` <20020610.013110.81671593.davem@redhat.com>
@ 2002-06-10 14:12         ` george anzinger
  0 siblings, 0 replies; 19+ messages in thread
From: george anzinger @ 2002-06-10 14:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas

"David S. Miller" wrote:
> 
>    From: george anzinger <george@mvista.com>
>    Date: Mon, 10 Jun 2002 01:11:01 -0700
> 
>    "David S. Miller" wrote:
>    >
>    > Also we need to know what network driver, if preemption is being
>    > used (being from mvista I assume you are using preemption, and if
>    > so make sure you have the preemption networking patches applied).
> 
>    Uh, are you saying there is a set of network patches?  If
>    so, where might they be found?  And yes, we are testing
>    preemption, trying to wring out the "last" bug :)  (Could
>    you be referring to the patches we are developing?)
> 
> An mvista person recently put "make net preempt safe" patches
> into 2.5.x  Basically it amounted to putting a few preempt_disable
> sections into net/core/skbuff.c

Ah, yes.  Those are the ones I have found so far.  Robert
Love would be the submitter.  It would appear there is at
least one more hole :(  Thus the query.
-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: Network oops
@ 2002-06-10 14:42 Mala Anand
  2002-06-12  1:35 ` Donald Becker
  0 siblings, 1 reply; 19+ messages in thread
From: Mala Anand @ 2002-06-10 14:42 UTC (permalink / raw)
  To: george anzinger
  Cc: ak, David S. Miller, kuznet, linux-net, linux-net-owner, netdev,
	pekkas

>"David S. Miller" wrote:
>>
>> Also we need to know what network driver, if preemption is being
>> used (being from mvista I assume you are using preemption, and if
>> so make sure you have the preemption networking patches applied).

>Uh, are you saying there is a set of network patches?  If
>so, where might they be found?  And yes, we are testing
>preemption, trying to wring out the "last" bug :)  (Could
>you be referring to the patches we are developing?)

>The driver (from the system log):
>    <4>eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
>    <4>eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin <saw@saw.sw.com.sg> and others
>    <6>eth0: OEM i82557/i82558 10/100 Ethernet, 00:03:47:BD:60:86, IRQ 21.
>    <6>  Board assembly fab600-000, Physical connectors present: RJ45
>    <6>  Primary interface chip i82555 PHY #1.
>    <6>  General self-test: passed.
>    <6>  Serial sub-system self-test: passed.
>    <6>  Internal registers self-test: passed.
>    <6>  ROM checksum self-test: passed (0xb874c1d3).
>    <6>eth1: OEM i82557/i82558 10/100 Ethernet, 00:03:47:BD:60:87, IRQ 20.
>    <6>  Board assembly fab600-000, Physical connectors present: RJ45
>    <6>  Primary interface chip i82555 PHY #1.
>    <6>  General self-test: passed.
>    <6>  Serial sub-system self-test: passed.
>    <6>  Internal registers self-test: passed.
>    <6>  ROM checksum self-test: passed (0xb874c1d3).

>Looks like two NICs.
--
 I have seen this problem with this NIC.  You need to increase
the number of buffers in the RX and TX rings in the eepro100.c file.
Right now it is set to 32; change it to 256 or 128 and this error
will go away.

Regards,
    Mala


   Mala Anand
   E-mail:manand@us.ibm.com
   Linux Technology Center - Performance
   Phone:838-8088; Tie-line:678-8088


* Re: Network oops
  2002-06-10 14:42 Mala Anand
@ 2002-06-12  1:35 ` Donald Becker
  0 siblings, 0 replies; 19+ messages in thread
From: Donald Becker @ 2002-06-12  1:35 UTC (permalink / raw)
  To: Mala Anand; +Cc: linux-net, linux-net-owner, netdev, pekkas

On Mon, 10 Jun 2002, Mala Anand wrote:

>  I have seen this problem with this NIC.  You need to increase
> the number of buffers in the RX and TX rings in the eepro100.c file.
> Right now it is set to 32; change it to 256 or 128 and this error
> will go away.

Changing the Tx queue size or number of Rx skbuffs waiting to be filled
will not fix a problem.  It might mask some problem e.g. a temporary
shortage of skbuffs, but it doesn't fix anything.

-- 
Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993


* Re: Network oops
@ 2002-06-12 16:58 Mala Anand
  0 siblings, 0 replies; 19+ messages in thread
From: Mala Anand @ 2002-06-12 16:58 UTC (permalink / raw)
  To: Donald Becker; +Cc: linux-net, linux-net-owner, netdev, pekkas

>> On Mon, 10 Jun 2002, Mala Anand wrote:

>>  I have seen this problem with this NIC.  You need to increase
>> the number of buffers in the RX and TX rings in the eepro100.c file.
>> Right now it is set to 32; change it to 256 or 128 and this error
>> will go away.

> Donald Becker wrote:
>Changing the Tx queue size or number of Rx skbuffs waiting to be filled
>will not fix a problem.  It might mask some problem e.g. a temporary
>shortage of skbuffs, but it doesn't fix anything.

The problem I saw was a resource problem, and as a result I had packet
drops.  The input rate was high and the driver didn't have enough
resources to handle it.  It was purely a resource problem.
--

Regards,
    Mala


   Mala Anand
   E-mail:manand@us.ibm.com
   Linux Technology Center - Performance
   Phone:838-8088; Tie-line:678-8088


* Re: Network oops
       [not found] <OF8ADD9FF7.49EC72C0-ON85256BD6.005CEBAC@raleigh.ibm.com>
@ 2002-06-12 17:42 ` Nivedita Singhvi
  0 siblings, 0 replies; 19+ messages in thread
From: Nivedita Singhvi @ 2002-06-12 17:42 UTC (permalink / raw)
  To: Mala Anand; +Cc: Donald Becker, linux-net, linux-net-owner, netdev, pekkas

On Wed, 12 Jun 2002, Mala Anand wrote:

> >> On Mon, 10 Jun 2002, Mala Anand wrote:
> 
> >>  I have seen this problem with this NIC.  You need to increase
> >> the number of buffers in the RX and TX rings in the eepro100.c file.
> >> Right now it is set to 32; change it to 256 or 128 and this error
> >> will go away.
> 
> > Donald Becker wrote:
> >Changing the Tx queue size or number of Rx skbuffs waiting to be filled
> >will not fix a problem.  It might mask some problem e.g. a temporary
> >shortage of skbuffs, but it doesn't fix anything.
> 
> The problem I saw was a resource problem, and as a result I had packet
> drops.  The input rate was high and the driver didn't have enough
> resources to handle it.  It was purely a resource problem.
> --

I think the problem Donald Becker is referring to is the
fact that it oopses when out of resources (or whatever the
trigger is), and that is neither graceful nor acceptable.
Adding resources only pushes the problem out to a point of
higher stress/load/consumption. It would be nice if that
were handled gracefully.
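
To make "handled gracefully" concrete, here is a minimal userspace sketch
of a receive ring that counts a drop when no buffer is available instead
of touching a missing one.  It is not eepro100.c code: the struct names,
the buf_pool stand-in for the allocator, and both functions are
assumptions made for illustration.  Only the ring size of 32 comes from
this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SIZE 32   /* ring size quoted in this thread */

/* Hypothetical stand-in for the kernel buffer allocator. */
struct buf_pool {
    size_t free_bufs;     /* buffers still available */
};

/* Hypothetical receive ring: one flag per slot instead of real skbuffs. */
struct rx_ring {
    bool filled[RX_RING_SIZE];
    size_t head;              /* slot the next packet will land in */
    unsigned long dropped;    /* packets dropped for lack of a buffer */
};

/* Refill empty slots starting at head.  Stops at the first allocation
 * failure and reports how many slots were filled, so the caller can
 * retry later. */
size_t rx_ring_refill(struct rx_ring *ring, struct buf_pool *pool)
{
    size_t added = 0;

    for (size_t i = 0; i < RX_RING_SIZE; i++) {
        size_t slot = (ring->head + i) % RX_RING_SIZE;

        if (ring->filled[slot])
            continue;
        if (pool->free_bufs == 0)
            break;            /* out of buffers: give up for now */
        pool->free_bufs--;
        ring->filled[slot] = true;
        added++;
    }
    return added;
}

/* Accept one packet.  With no buffer in the head slot the packet is
 * dropped and counted; nothing is dereferenced. */
bool rx_ring_receive(struct rx_ring *ring)
{
    assert(ring->head < RX_RING_SIZE);

    if (!ring->filled[ring->head]) {
        ring->dropped++;
        return false;
    }
    ring->filled[ring->head] = false;
    ring->head = (ring->head + 1) % RX_RING_SIZE;
    return true;
}
```

A larger ring only moves the point at which rx_ring_receive() starts
returning false; the drop path is what keeps exhaustion from becoming
an oops.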

thanks,
Nivedita


* Re: Network oops
       [not found]     ` <3D042F8F.72764243@mvista.com>
  2002-06-10  5:06       ` David S. Miller
@ 2002-06-20 17:19       ` george anzinger
       [not found]       ` <3D120EAE.5A0D365E@mvista.com>
  2 siblings, 0 replies; 19+ messages in thread
From: george anzinger @ 2002-06-20 17:19 UTC (permalink / raw)
  To: David S. Miller, kuznet, netdev, linux-net, ak, pekkas,
	Cress, Andrew R

We need help from someone who knows the network code.  I have tried
to give all the relevant information below.  The machine that last
failed can still be examined with kgdb to answer any further
questions.

We are working with a 2.4.17 kernel with all the latest preempt
patches as well as the high-res-timers patch (which, by the way, has
the proposed TIMER_BH conversion to softirq code).  Other patches
are applied but, when removed, do not affect the behavior described
below.  The system is a two-processor SMP machine:

    <4>CPU0: Intel(R) Pentium(R) III CPU family      1266MHz stepping 01

A breakpoint placed in deliver_to_old_ones() is never hit.

Failures under heavy network stress (4 machines, each measuring
network performance with all 4 machines in the test) occur very
seldom.  The last failure took 32 hours, the prior one 22 hours.

The failure appears to occur because a call is made to a bogus
address that is pulled from an skb.  Here is a backtrace of the
latest failure:

Program received signal SIGEMT, Emulation trap.
0x00b24d18 in ?? () at af_packet.c:1891

(gdb) bt
#0  0x00b24d18 in ?? () at af_packet.c:1891
#1  0xc0267751 in tcp_v4_destroy_sock (sk=0xc7164260)
    at /usr/src/linux-2.4.17-CLT/include/net/tcp.h:1673
#2  0xc02566f1 in tcp_destroy_sock (sk=0xc7164260) at tcp.c:1800
#3  0xc025731a in tcp_close (sk=0xc7164260, timeout=0) at tcp.c:1971
#4  0xc0274f67 in inet_release (sock=0xc2c94160) at af_inet.c:465
#5  0xc02304a2 in sock_release (sock=0xc2c94160) at socket.c:489
#6  0xc0230a50 in sock_close (inode=0xc2c94040, filp=0xc51297e0)
    at socket.c:724
#7  0xc014d833 in fput (file=0xc51297e0) at file_table.c:113
#8  0xc014bfe3 in filp_close (filp=0xc51297e0, id=0xc2eb6d60) at open.c:838
#9  0xc014c0b2 in sys_close (fd=4) at open.c:862
#10 0xc010782b in system_call () at af_packet.c:1891
#11 0x40043507 in ?? () at af_packet.c:1891
kgdb caught this in:
0xc0119b3d is in do_page_fault (fault.c:329).
324      * terminate things with extreme prejudice.
325      */
326     #ifdef CONFIG_KGDB
327             if (!user_mode(regs)){
328                     kgdb_handle_exception(14,SIGBUS, error_code, regs);
329                     return;
330             }
331     #endif
332
333             bust_spinlocks(1);

In both failures we found this in the log buffer just prior to the failure:

<5>\n<4>KERNEL: assertion ((int)tcp_packets_in_flight(tp) >= 0) failed at tcp_input.c(956):tcp_sacktag_write_queue\n

On the assumption that preemption is the root cause of this problem,
we have instrumented the preemption code to keep track of the last
100 preemptions.  What follows is an edited log of these preemptions.
Each entry consists of two words of time (sec, usec) followed by the
address (hex and symbolic) and the pid of the process.  Preemptions
clearly unrelated to the network code have been removed.  Most of
these were in idle or sched_yield.

0xc0368e60 <preempt_log>:       0x3d101941      0xc9868 0xc025f4bb <tcp_transmit_skb+747>       0x15aa
0xc0368e80 <preempt_log+32>:    0x3d101941      0xec247 0xc025d821 <tcp_copy_to_iovec+129>      0x15aa
0xc0368ea0 <preempt_log+64>:    0x3d101942      0xc1f6  0xc0167ee0 <update_atime>       0x15ae
0xc0368ec0 <preempt_log+96>:    0x3d101942      0x24440 0xc025d821 <tcp_copy_to_iovec+129>      0x15aa
0xc0368ee0 <preempt_log+128>:   0x3d101942      0x26eae 0xc025d821 <tcp_copy_to_iovec+129>      0x15a9
0xc0368f00 <preempt_log+160>:   0x3d101942      0x2a4b4 0xc025d821 <tcp_copy_to_iovec+129>      0x15aa
0xc0368f20 <preempt_log+192>:   0x3d101942      0x2c74a 0xc02558b1 <tcp_prequeue_process+305>   0x15aa
0xc0368f40 <preempt_log+224>:   0x3d101942      0x2d5b6 0xc0255741 <tcp_data_wait+705>  0x15aa
0xc0368f60 <preempt_log+256>:   0x3d101942      0x2e634 0xc02558b1 <tcp_prequeue_process+305>   0x15a9
0xc0368f80 <preempt_log+288>:   0x3d101942      0x2fcaa 0xc024db81 <ip_output+401>      0x15aa
0xc0368fc0 <preempt_log+352>:   0x3d101942      0x30c4f 0xc02558b1 <tcp_prequeue_process+305>   0x15a6
0xc0368fe0 <preempt_log+384>:   0x3d101942      0x33603 0xc02558b1 <tcp_prequeue_process+305>   0x15a6
0xc0369000 <preempt_log+416>:   0x3d101942      0xc5137 0xc011d1ec <remove_wait_queue+156>      0x15a8
0xc0369060 <preempt_log+512>:   0x3d101944      0x6f2e4 0xc025561b <tcp_data_wait+411>  0x15a8
0xc0369120 <preempt_log+704>:   0x3d101948      0xae22f 0xc024dc28 <ip_queue_xmit+24>   0x15b2
0xc0369140 <preempt_log+736>:   0x3d101949      0xbe01  0xc02558b1 <tcp_prequeue_process+305>   0x15b2
0xc03691a0 <preempt_log+832>:   0x3d10194b      0xbd567 0xc025d821 <tcp_copy_to_iovec+129>      0x15b3
0xc0369250 <preempt_log+1008>:  0x3d10194b      0xbde39 0xc01054df <default_idle+47>    0x0
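
For readers who want to reproduce this kind of instrumentation, the
record format described above (two words of time, an address, a pid,
with the last 100 entries kept) can be modeled as a small ring buffer.
This is a userspace sketch only: the struct layout, field names, and
both functions are assumptions, not the code from the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PREEMPT_LOG_SIZE 100   /* "the last 100 preemptions" */

/* One record: two words of time, the preempted address, the pid.
 * Field names are illustrative, not taken from the real patch. */
struct preempt_entry {
    uint32_t sec;
    uint32_t usec;
    uintptr_t addr;
    int32_t pid;
};

struct preempt_log {
    struct preempt_entry entries[PREEMPT_LOG_SIZE];
    size_t next;    /* slot the next record will overwrite */
    size_t count;   /* valid records, capped at PREEMPT_LOG_SIZE */
};

/* Append a record, overwriting the oldest one once the log is full. */
void preempt_log_record(struct preempt_log *log, uint32_t sec,
                        uint32_t usec, uintptr_t addr, int32_t pid)
{
    assert(log != NULL);

    struct preempt_entry *e = &log->entries[log->next];
    e->sec = sec;
    e->usec = usec;
    e->addr = addr;
    e->pid = pid;

    log->next = (log->next + 1) % PREEMPT_LOG_SIZE;
    if (log->count < PREEMPT_LOG_SIZE)
        log->count++;
}

/* Return the i-th most recent record (0 = newest), or NULL if the
 * log does not hold that many records. */
const struct preempt_entry *preempt_log_recent(const struct preempt_log *log,
                                               size_t i)
{
    if (i >= log->count)
        return NULL;
    return &log->entries[(log->next + PREEMPT_LOG_SIZE - 1 - i)
                         % PREEMPT_LOG_SIZE];
}
```

A fixed-size array with a wrapping index keeps the recording path free
of allocation and locking of its own, which matters when the call site
is the preemption path itself.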

Each of these is pinned to source below:
(gdb) l *tcp_transmit_skb+747
0xc01f4a83 is in tcp_transmit_skb (/usr/src/cvs/hhl-kernel-cambell/linux_test/include/net/tcp.h:1422).
1417						  TCPOLEN_TIMESTAMP);
1418			*ptr++ = htonl(tstamp);
1419			*ptr++ = htonl(tp->ts_recent);
1420		}
1421		if (tp->eff_sacks) {
1422			struct tcp_sack_block *sp = tp->dsack ? tp->duplicate_sack : tp->selective_acks;
1423			int this_sack;
1424	
1425			*ptr++ = __constant_htonl((TCPOPT_NOP << 24) |
1426						  (TCPOPT_NOP << 16) |
(gdb)l * tcp_copy_to_iovec+129
0xc01f2eed is in tcp_copy_to_iovec (tcp_input.c:3157).
3152		int chunk = skb->len - hlen;
3153		int err;
3154	
3155		local_bh_enable();
3156		if (skb->ip_summed==CHECKSUM_UNNECESSARY)
3157			err = skb_copy_datagram_iovec(skb, hlen, tp->ucopy.iov, chunk);
3158		else
3159			err = skb_copy_and_csum_datagram_iovec(skb, hlen, tp->ucopy.iov);
3160	
3161		if (!err) {
(gdb) l *tcp_prequeue_process+305
0xc01ebba5 is in tcp_recvmsg (tcp.c:1400).
1395		int err;
1396		int target;		/* Read at least this many bytes */
1397		long timeo;
1398		struct task_struct *user_recv = NULL;
1399	
1400		lock_sock(sk);
1401	
1402		TCP_CHECK_TIMER(sk);
1403	
1404		err = -ENOTCONN;

(gdb) l* tcp_data_wait+705
0xc01ebb49 is in tcp_prequeue_process (tcp.c:1376).
1371		while ((skb = __skb_dequeue(&tp->ucopy.prequeue)) != NULL)
1372			sk->backlog_rcv(sk, skb);
1373		local_bh_enable();
1374	
1375		/* Clear memory counter. */
1376		tp->ucopy.memory = 0;
1377	}
1378	
1379	/*
1380	 *	This routine copies from a sock struct into the user buffer. 
(gdb) l *tcp_data_wait+411
0xc01eba23 is in tcp_data_wait (tcp.c:1354).
1349		release_sock(sk);
1350	
1351		if (skb_queue_empty(&sk->receive_queue))
1352			timeo = schedule_timeout(timeo);
1353	
1354		lock_sock(sk);
1355		clear_bit(SOCK_ASYNC_WAITDATA, &sk->socket->flags);
1356	
1357		remove_wait_queue(sk->sleep, &wait);
1358		__set_current_state(TASK_RUNNING);
(gdb) l *ip_queue_xmit+24
0xc024dc28 is in ip_queue_xmit (ip_output.c:351).
346             struct iphdr *iph;
347
348             /* Skip all of this if the packet is already routed,
349              * f.e. by something like SCTP.
350              */
351             rt = (struct rtable *) skb->dst;
352             if (rt != NULL)
353                     goto packet_routed;
354
355             /* Make sure we can route this packet. */
(gdb)
(gdb) l * ip_output+401
0xc01e55d9 is in ip_queue_xmit (ip_output.c:344).
339	}
340	
341	int ip_queue_xmit(struct sk_buff *skb)
342	{
343		struct sock *sk = skb->sk;
344		struct ip_options *opt = sk->protinfo.af_inet.opt;
345		struct rtable *rt;
346		struct iphdr *iph;
347	
348		/* Skip all of this if the packet is already routed,


Any help would be greatly appreciated.  We can also probe the system
to answer any further questions.  As said above, we are assuming this
is related to preemption; however, that assumption may be wrong.

Thanks
-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


* Re: Network oops
       [not found]       ` <3D120EAE.5A0D365E@mvista.com>
@ 2002-06-21  0:38         ` David S. Miller
       [not found]         ` <20020620.173805.55219901.davem@redhat.com>
  1 sibling, 0 replies; 19+ messages in thread
From: David S. Miller @ 2002-06-21  0:38 UTC (permalink / raw)
  To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas, andrew.r.cress


I don't understand, you've completely turned the preemption model of
the kernel upside down and you want _US_ to debug this for you?

This is a lot of work, work I personally don't have time for.
It requires a full audit of the networking in the new preemption
environment.


* Re: Network oops
       [not found]         ` <20020620.173805.55219901.davem@redhat.com>
@ 2002-06-21 14:16           ` george anzinger
       [not found]           ` <3D133538.60B6810C@mvista.com>
  1 sibling, 0 replies; 19+ messages in thread
From: george anzinger @ 2002-06-21 14:16 UTC (permalink / raw)
  To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas, andrew.r.cress

"David S. Miller" wrote:
> 
> I don't understand, you've completely turned the preemption model of
> the kernel upside down and you want _US_ to debug this for you?

You could look at it this way :)

On the other hand, I prefer to think of the linux community as a
group of folks working on making one of the best OSs on the planet
even better.  Its adoption by more and more companies attests to its
place in the world.  We who work on it, IMHO, should cooperate with
each other in our efforts to make the system even better.  Sure we
will disagree about some of the things others are doing and thus the
need for some central control of the system.  For this we are
indebted to Linus.  And we may not have time to spend working on the
system when others would most like us to, but that is the nature of
life and we should respect others in this regard.

So while I would very much like your help, I understand that you may
be busy with other things at this time and not have the time.  Still,
I must ask, just as others ask of me.
> 
> This is a lot of work, work I personally don't have time for.

I can understand and accept that.

> It requires a full audit of the networking in the new preemption
> environment.

Any pointers on what to look for would be welcome.

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml


* Re: Network oops
       [not found]           ` <3D133538.60B6810C@mvista.com>
@ 2002-06-21 14:17             ` David S. Miller
       [not found]             ` <20020621.071720.07439917.davem@redhat.com>
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: David S. Miller @ 2002-06-21 14:17 UTC (permalink / raw)
  To: george; +Cc: kuznet, netdev, linux-net, ak, pekkas, andrew.r.cress

   From: george anzinger <george@mvista.com>
   Date: Fri, 21 Jun 2002 07:16:24 -0700

[ BTW please start using newlines in your emails, instead of 1,000
  character monstrosities lacking them.  Thanks ]

   On the other hand, I prefer to think of the linux community as a
   group of folks working on making one of the best OSs on the planet
   even better.

Right, and I don't think CONFIG_PREEMPT makes the planet better.
In fact, now that you mention it, I think CONFIG_PREEMPT is a pile
of crap.

   > It requires a full audit of the networking in the new preemption
   > environment.
   
   Any pointers on what to look for would be welcome.

And you go right back to asking me to do the work for you.
What is wrong with you?


* Re: Network oops
       [not found]             ` <20020621.071720.07439917.davem@redhat.com>
@ 2002-06-21 15:12               ` george anzinger
  0 siblings, 0 replies; 19+ messages in thread
From: george anzinger @ 2002-06-21 15:12 UTC (permalink / raw)
  To: David S. Miller; +Cc: kuznet, netdev, linux-net, ak, pekkas, andrew.r.cress

"David S. Miller" wrote:
> 
>    From: george anzinger <george@mvista.com>
>    Date: Fri, 21 Jun 2002 07:16:24 -0700
> 
> [ BTW please start using newlines in your emails, instead of 1,000
>   character monstrosities lacking them.  Thanks ]

Sure.  I am trying to find something that works with patches
as well as text.  Still looking I guess.
> 
>    On the other hand, I prefer to think of the linux community as a
>    group of folks working on making one of the best OSs on the planet
>    even better.
> 
> Right, and I don't think CONFIG_PREEMPT makes the planet better.
> In fact, now that you mention it, I think CONFIG_PREEMPT is a pile
> of crap.

Can we agree to disagree :)
> 
>    > It requires a full audit of the networking in the new preemption
>    > environment.
> 
>    Any pointers on what to look for would be welcome.
> 
> And you go right back to asking me to do the work for you.
> What is wrong with you?

I said they would be welcome, not that you have to reply...

-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml


* Re: Network oops
       [not found]           ` <3D133538.60B6810C@mvista.com>
  2002-06-21 14:17             ` David S. Miller
       [not found]             ` <20020621.071720.07439917.davem@redhat.com>
@ 2002-06-22  0:55             ` Andi Kleen
       [not found]             ` <20020622025551.A1919@averell>
  3 siblings, 0 replies; 19+ messages in thread
From: Andi Kleen @ 2002-06-22  0:55 UTC (permalink / raw)
  To: george anzinger
  Cc: David S. Miller, kuznet, netdev, linux-net, ak, pekkas,
	andrew.r.cress

> > It requires a full audit of the networking in the new preemption
> > environment.
> 
> Any pointers on what to look for would be welcome.

I would look at the driver, especially races in its skb handling.

-Andi


* Re: Network oops
       [not found]             ` <20020622025551.A1919@averell>
@ 2002-06-28 19:56               ` george anzinger
  0 siblings, 0 replies; 19+ messages in thread
From: george anzinger @ 2002-06-28 19:56 UTC (permalink / raw)
  To: Andi Kleen
  Cc: David S. Miller, kuznet, netdev, linux-net, pekkas,
	andrew.r.cress

We finally found the problem.  I want to thank the folks who
gave any thought and/or feedback to us, particularly Alexey
and David.

For what it's worth, we tracked it down to the kernel memory
allocation routines allocating the same buffer to two
different network requesters.  This in turn was caused by a
basic flaw in the preemption code, which preempted on
spin_unlock REGARDLESS OF THE STATE OF THE INTERRUPT
SYSTEM.
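
The flaw can be modeled in a few lines of userspace C.  This is a
sketch, not the preemption patch: cpu_state and the model_* functions
are made-up names, and the guarded variant shows one way such a check
could look rather than the fix that was actually applied.  The idea is
that code which disabled interrupts to protect its data (an allocator's
free list, say) must not be switched out when it drops a spinlock
inside that section.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state for the model; these names do not come
 * from the real preemption patch. */
struct cpu_state {
    int preempt_count;    /* > 0 while any spinlock is held */
    bool irqs_disabled;   /* true inside an interrupts-off section */
    bool need_resched;    /* another task is waiting to run */
    int preemptions;      /* times this CPU was preempted */
};

void model_spin_lock(struct cpu_state *cpu)
{
    cpu->preempt_count++;
}

/* Flawed variant: preempts as soon as the last lock is dropped,
 * without looking at the interrupt state. */
void model_spin_unlock_flawed(struct cpu_state *cpu)
{
    assert(cpu->preempt_count > 0);

    if (--cpu->preempt_count == 0 && cpu->need_resched) {
        cpu->need_resched = false;
        cpu->preemptions++;
    }
}

/* Guarded variant: leaves the reschedule request pending while
 * interrupts are off, so the interrupts-off section runs to the end. */
void model_spin_unlock_guarded(struct cpu_state *cpu)
{
    assert(cpu->preempt_count > 0);

    if (--cpu->preempt_count == 0 && cpu->need_resched &&
        !cpu->irqs_disabled) {
        cpu->need_resched = false;
        cpu->preemptions++;
    }
}
```

With interrupts off and a reschedule pending, the flawed unlock
switches tasks in the middle of the protected section, while the
guarded one defers the switch until interrupts are enabled again.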
-- 
George Anzinger   george@mvista.com
High-res-timers: 
http://sourceforge.net/projects/high-res-timers/
Real time sched:  http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml

