public inbox for linux-kernel@vger.kernel.org
* assertion (cnt <= tp->packets_out) failed
@ 2005-08-05 15:53 John Bäckstrand
  2005-08-05 16:32 ` David S. Miller
From: John Bäckstrand @ 2005-08-05 15:53 UTC
  To: linux-kernel; +Cc: netdev

I get

KERNEL: assertion (cnt <= tp->packets_out) failed at net/ipv4/tcp_input.c (1476)

with 2.6.13-rc5, plus a small netpoll patch that shouldn't affect 
these things. (Topic: "lockups with netconsole on e1000 on media 
insertion").
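
For readers unfamiliar with the message format: the line above has the shape of a non-fatal assertion that logs the failed expression, the file and the line number, then carries on. Below is a minimal user-space sketch in C of that idea. SOFT_ASSERT, tp_sketch and mark_lost_sketch are hypothetical names invented for illustration and are not kernel code; only the field name packets_out and the message text come from the report.

```c
#include <stdio.h>

/* Hypothetical counter so callers can see how many checks failed. */
static int soft_assert_failures;

/* Hypothetical non-fatal assertion: log the failed condition in the
 * same shape as the message in the report, then keep running. */
#define SOFT_ASSERT(cond)                                                 \
	do {                                                              \
		if (!(cond)) {                                            \
			soft_assert_failures++;                           \
			fprintf(stderr,                                   \
				"KERNEL: assertion (%s) failed at %s (%d)\n", \
				#cond, __FILE__, __LINE__);               \
		}                                                         \
	} while (0)

/* Simplified stand-in for per-connection TCP state. Only the field
 * named in the report is modelled. */
struct tp_sketch {
	unsigned int packets_out; /* segments sent and still in flight */
};

/* Check the invariant from the report, then clamp cnt so the caller
 * never acts on more segments than are in flight. */
static unsigned int mark_lost_sketch(const struct tp_sketch *tp,
				     unsigned int cnt)
{
	SOFT_ASSERT(cnt <= tp->packets_out);
	return cnt <= tp->packets_out ? cnt : tp->packets_out;
}
```

With packets_out set to 3, calling mark_lost_sketch with cnt 5 logs one message naming the expression cnt <= tp->packets_out and returns 3. The clamping is a choice made for this sketch only and says nothing about how the kernel recovers.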

I see a fair number of dropped packets and overruns:

eth2      Link encap:Ethernet  HWaddr 00:50:DA:E0:BB:36
           inet addr:83.233.27.60  Bcast:83.233.27.255  Mask:255.255.255.0
           inet6 addr: fe80::250:daff:fee0:bb36/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
           RX packets:9141685 errors:0 dropped:0 overruns:794 frame:0
           TX packets:10596040 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:950232746 (906.2 MiB)  TX bytes:804721505 (767.4 MiB)
           Interrupt:10 Base address:0x8800

eth3      Link encap:Ethernet  HWaddr 00:0E:0C:75:F1:2A
           inet addr:10.32.0.1  Bcast:10.255.255.255  Mask:255.255.0.0
           inet6 addr: fe80::20e:cff:fe75:f12a/64 Scope:Link
           UP BROADCAST RUNNING MULTICAST  MTU:16000  Metric:1
           RX packets:16090188 errors:2329 dropped:4658 overruns:2329 frame:0
           TX packets:34370559 errors:0 dropped:0 overruns:0 carrier:0
           collisions:0 txqueuelen:1000
           RX bytes:1148661167 (1.0 GiB)  TX bytes:4000412315 (3.7 GiB)
           Base address:0x8400 Memory:e2000000-e2020000



ethtool -S eth3
NIC statistics:
      rx_packets: 16195970
      tx_packets: 34563822
      rx_bytes: 1258213074
      tx_bytes: 4205874656
      rx_errors: 2332
      tx_errors: 0
      rx_dropped: 2332
      tx_dropped: 0
      multicast: 0
      collisions: 0
      rx_length_errors: 0
      rx_over_errors: 0
      rx_crc_errors: 0
      rx_frame_errors: 0
      rx_fifo_errors: 2332
      rx_no_buffer_count: 0
      rx_missed_errors: 2332
      tx_aborted_errors: 0
      tx_carrier_errors: 0
      tx_fifo_errors: 0
      tx_heartbeat_errors: 0
      tx_window_errors: 0
      tx_abort_late_coll: 0
      tx_deferred_ok: 0
      tx_single_coll_ok: 0
      tx_multi_coll_ok: 0
      rx_long_length_errors: 0
      rx_short_length_errors: 0
      rx_align_errors: 0
      tx_tcp_seg_good: 2981894
      tx_tcp_seg_failed: 0
      rx_flow_control_xon: 0
      rx_flow_control_xoff: 0
      tx_flow_control_xon: 0
      tx_flow_control_xoff: 0
      rx_long_byte_count: 14143114962
      rx_csum_offload_good: 16195740
      rx_csum_offload_errors: 0


ethtool -S eth2
NIC statistics:
      tx_deferred: 0
      tx_multiple_collisions: 0
      rx_bad_ssd: 0
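
To put the eth3 counters above in proportion, the miss rate can be worked out from rx_packets and rx_missed_errors. A small C sketch follows; rx_miss_ratio is a hypothetical helper, and it assumes missed frames are not already counted in rx_packets, which the output above does not confirm.

```c
/* Fraction of arriving frames that were missed.
 * Assumption: rx_missed is NOT already included in rx_packets. */
static double rx_miss_ratio(unsigned long long rx_packets,
			    unsigned long long rx_missed)
{
	unsigned long long total = rx_packets + rx_missed;

	/* Avoid dividing by zero on an idle interface. */
	return total ? (double)rx_missed / (double)total : 0.0;
}
```

With rx_packets 16195970 and rx_missed_errors 2332 from the ethtool output, this comes to roughly 0.014% of frames.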

---
John Bäckstrand

* Re: assertion (cnt <= tp->packets_out) failed
@ 2005-08-07 16:20 Heikki Orsila
From: Heikki Orsila @ 2005-08-07 16:20 UTC
  To: Linux Kernel Mailing List

David S. Miller wrote:
> I suspect this is a side effect of some changes Herbert Xu and 
> myself did to fix some other bugs.

I think I ran into this bug independently today. I did some testing and 
got kernel panics when uploading files with ftp on a gigabit LAN. The 
error always occurs at net/ipv4/tcp_output.c:918. Here's a matrix of 
combinations that crash or don't crash:

kernel & driver		result
2.6.12			no crash
2.6.13-rc1 & e1000	no crash
2.6.13-rc1 & skge	no crash
2.6.13-rc2 & e1000	crash
2.6.13-rc2 & skge	no crash
2.6.13-rc3 & e1000	crash
2.6.13-rc3 & skge	no crash
2.6.13-rc5 & e1000	crash
2.6.13-rc5 & skge	no crash

There were big changes in tcp_output.c between rc1 and rc2, and the bug 
is triggered when using e1000 with rc2 or later. Because the bug does 
not happen with skge (the new sk98 driver), my guess is that it's a race 
condition of some sort. I am surprised this bug wasn't noticed with rc2.
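
The rc1-to-rc2 window can be read off the matrix above mechanically. The C sketch below encodes the rows and finds the first crashing kernel per driver; the struct and function names are hypothetical, and the 2.6.12 row is left out because it names no driver.

```c
#include <stddef.h>
#include <string.h>

/* One row of the test matrix. */
struct test_result {
	const char *kernel; /* kernel version tested */
	const char *driver; /* NIC driver used */
	int crashed;        /* 1 = crash, 0 = no crash */
};

/* Rows copied from the matrix, oldest kernel first. */
static const struct test_result results[] = {
	{ "2.6.13-rc1", "e1000", 0 },
	{ "2.6.13-rc1", "skge",  0 },
	{ "2.6.13-rc2", "e1000", 1 },
	{ "2.6.13-rc2", "skge",  0 },
	{ "2.6.13-rc3", "e1000", 1 },
	{ "2.6.13-rc3", "skge",  0 },
	{ "2.6.13-rc5", "e1000", 1 },
	{ "2.6.13-rc5", "skge",  0 },
};

/* Return the oldest kernel that crashed with the given driver,
 * or NULL if no tested kernel crashed with it. */
static const char *first_crashing_kernel(const char *driver)
{
	size_t i;

	for (i = 0; i < sizeof(results) / sizeof(results[0]); i++) {
		if (results[i].crashed &&
		    strcmp(results[i].driver, driver) == 0)
			return results[i].kernel;
	}
	return NULL;
}
```

first_crashing_kernel("e1000") returns "2.6.13-rc2" and first_crashing_kernel("skge") returns NULL, which matches the observation that the regression appears between rc1 and rc2 and only with e1000.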

The crash happens right at the beginning of a TCP connection. Only a few 
packets are sent, then the transfer slows to a stall, and after 5 
seconds the kernel crashes. The usage pattern is basically this:

upload file a
upload file b
upload file a (replaces old file)
upload file b (replaces old file)
upload file a (replaces old file) => slowdown and then a crash in 5 secs 


Here's the kernel BUG output:

Kernel BUG at "net/ipv4/tcp_output.c":918
Invalid operand: 0000 [1]
CPU 0
Modules linked in:
Pid 0, comm: swapper Not tainted 2.6.13-rc5
RIP: 0010:[<ffffffff804d086d>] <ffffffff804d086d><__tcp_push_pending_frames+429>
...
Process swapper (pid: 0, threadinfo ffffffff807de000, task ffffffff80640ec0)
...
Call Trace: <IRQ> <ffffffff804ce30c><tcp_rcv_established+1964>
		  <...>             <tcp_v4_do_rcv+37> ... <...><ip_local_deliver_finish+0>
		  <...>             <tcp_v4_rcv+1483> ... <...><ip_local_deliver+322>
		  <...>             <ip_rcv+1107> ... <...><netif_receive_skb+426>
		  <...>             <process_backlog+154>
		  <...>             <__do_softirq+83>
		  <...>             <do_softirq+53>
		  <...>             <ret_from_intr+0>
		  <...>             <default_idle+0>
		  <...>             <cpu_idle+49>
		  <...>             <_sinittext+534>
		  <...>             <>
		  <...>             <>
		  <...>             <>
		  <...>             <>
		  <...>             <>
		  <...>             <>
...
RIP <ffffffff804d086d><__tcp_push_pending_frames+429> ...
 <0>Kernel panic - not syncing: Aiee, killing interrupt handler!


Here's system info:

Linux e275d 2.6.13-rc2 #1 Sun Aug 7 18:48:05 EEST 2005 x86_64 AMD Athlon(tm) 64 Processor 3000+ AuthenticAMD GNU/Linux
 
Gnu C                  3.4.3
Gnu make               3.80
binutils               2.15.92.0.2
util-linux             2.12i
mount                  2.12i
module-init-tools      3.0
e2fsprogs              1.38
jfsutils               1.1.4
reiserfsprogs          line
reiser4progs           line
xfsprogs               2.6.25
nfs-utils              1.0.6
Linux C Library        2.3.5
Dynamic linker (ldd)   2.3.5
Procps                 3.2.5
Net-tools              1.60
Kbd                    1.12
Sh-utils               5.2.1
udev                   058
Modules Loaded         



-- 
Heikki Orsila			Barbie's law:
heikki.orsila@iki.fi		"Math is hard, let's go shopping!"
http://www.iki.fi/shd

* Re: assertion (cnt <= tp->packets_out) failed
@ 2005-08-07 16:29 Heikki Orsila
From: Heikki Orsila @ 2005-08-07 16:29 UTC
  To: Linux Kernel Mailing List

Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Hang on a second, the original poster mentioned rc5.  Is this really
> pristine rc5 with the one netpoll patch? If so then it can't be the
> patches we're talking about because they only went in days later.

I produced a similar panic on rc2 and later (which doesn't happen on 
rc1). See my other post in this thread.

-- 
Heikki Orsila			Barbie's law:
heikki.orsila@iki.fi		"Math is hard, let's go shopping!"
http://www.iki.fi/shd

* Re: assertion (cnt <= tp->packets_out) failed
@ 2005-08-07 16:37 Heikki Orsila
From: Heikki Orsila @ 2005-08-07 16:37 UTC
  To: Linux Kernel Mailing List

Heikki Orsila wrote:
> There were big changes in tcp_output.c between rc1 and rc2, and the 
> bug is triggered when using e1000 with rc2 or later. And because the 
> bug does not happen on skge (new sk98 driver) it makes me guess it's a 
> race condition of sorts.. I am surprised this bug wasn't noticed with 
> rc2.

One more bit of info: there were no e1000 driver changes between rc1 and 
rc2, which strengthens the evidence that the error was introduced by the 
changes in tcp_output.c.

-- 
Heikki Orsila			Barbie's law:
heikki.orsila@iki.fi		"Math is hard, let's go shopping!"
http://www.iki.fi/shd

Thread overview: 13+ messages
2005-08-05 15:53 assertion (cnt <= tp->packets_out) failed John Bäckstrand
2005-08-05 16:32 ` David S. Miller
2005-08-06  2:24   ` Herbert Xu
2005-08-06  7:57     ` Herbert Xu
2005-08-06 12:06       ` John Bäckstrand
2005-08-07 12:02         ` Herbert Xu
2005-08-06 13:30       ` David S. Miller
2005-08-07 13:25       ` John Bäckstrand
2005-08-07 17:11         ` Andrew Morton
2005-08-07 21:30         ` Herbert Xu
  -- strict thread matches above, loose matches on Subject: below --
2005-08-07 16:20 Heikki Orsila
2005-08-07 16:29 Heikki Orsila
2005-08-07 16:37 Heikki Orsila