netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* kernel BUG at include/net/tcp.h:739
@ 2007-05-02 20:15 Michal Piotrowski
  2007-05-03  6:51 ` Ilpo Järvinen
  0 siblings, 1 reply; 4+ messages in thread
From: Michal Piotrowski @ 2007-05-02 20:15 UTC (permalink / raw)
  To: Ilpo Järvinen, David S. Miller, netdev, LKML

Hi Ilpo,

Please take a look at this bug

[15236.638092] kernel BUG at /mnt/md0/devel/linux-git/include/net/tcp.h:739!
[15236.644860] invalid opcode: 0000 [#1]
[15236.648514] PREEMPT SMP 
[15236.651075] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat autofs4 af_packet nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 binfmt_misc thermal processor fan container nvram snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm evdev snd_timer snd intel_agp i2c_i801 soundcore agpgart snd_page_alloc ide_cd cdrom rtc unix
[15236.698898] CPU:    0
[15236.698899] EIP:    0060:[<c02f798b>]    Not tainted VLI
[15236.698900] EFLAGS: 00010206   (2.6.21-gdc87c398 #169)
[15236.711580] EIP is at tcp_ack+0xc54/0x16a0
[15236.715664] eax: 00000017   ebx: 00000000   ecx: 00000003   edx: 0000010e
[15236.722433] esi: d5bc1254   edi: 0000010e   ebp: c0462e18   esp: c0462da8
[15236.729202] ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
[15236.735019] Process swapper (pid: 0, ti=c0462000 task=c03f14e0 task.ti=c0427000)
[15236.742219] Stack: 00000000 00000000 00000000 00000198 00000100 00000000 00000000 00000018 
[15236.750698]        3b4d5775 0ffef1c0 3b4d79ad 00000001 00000018 00000006 3b4d79ad 003f14e0 
[15236.759178]        00000006 0000082a d4b9cde0 00e4059a 0000000c 00000000 00000000 0130204a 
[15236.767649] Call Trace:
[15236.770286]  [<c0105039>] show_trace_log_lvl+0x1a/0x2f
[15236.775438]  [<c01050eb>] show_stack_log_lvl+0x9d/0xa5
[15236.780590]  [<c01052e0>] show_registers+0x1ed/0x32c
[15236.785569]  [<c0105537>] die+0x118/0x22f
[15236.789588]  [<c01056c7>] do_trap+0x79/0x91
[15236.793781]  [<c0105f57>] do_invalid_op+0x97/0xa1
[15236.798501]  [<c031f38c>] error_code+0x7c/0x84
[15236.802960]  [<c02fafb7>] tcp_rcv_established+0x568/0x645
[15236.808354]  [<c03000bd>] tcp_v4_do_rcv+0x2b/0x32c
[15236.813144]  [<c030243a>] tcp_v4_rcv+0x7f9/0x86b
[15236.817777]  [<c02e9dd3>] ip_local_deliver+0x170/0x235
[15236.822928]  [<c02e9c2a>] ip_rcv+0x4f3/0x52c
[15236.827199]  [<c02d4909>] netif_receive_skb+0x1b9/0x252
[15236.832437]  [<c02635c2>] skge_poll+0x47a/0x545
[15236.836967]  [<c02d68df>] net_rx_action+0x9f/0x192
[15236.841772]  [<c0126408>] __do_softirq+0x6d/0xea
[15236.846407]  [<c01069b5>] do_softirq+0x64/0xd1
[15236.850867]  =======================
[15236.854437] Code: 69 42 c0 f7 d0 64 8b 15 04 00 00 00 8b 04 90 ff 80 a4 00 00 00 8b 9e 70 05 00 00 89 d8 03 86 74 05 00 00 3b 86 8c 04 00 00 76 04 <0f> 0b eb fe 89 86 90 04 00 00 8a 86 88 03 00 00 84 c0 75 3c 83 
[15236.874343] EIP: [<c02f798b>] tcp_ack+0xc54/0x16a0 SS:ESP 0068:c0462da8

l *0xc02f798b
0xc02f798b is in tcp_ack (/mnt/md0/devel/linux-git/include/net/tcp.h:739).
734                                 (tp->snd_cwnd >> 2)));
735     }
736
737     static inline void tcp_sync_left_out(struct tcp_sock *tp)
738     {
739             BUG_ON(tp->sacked_out + tp->lost_out > tp->packets_out);
740             tp->left_out = tp->sacked_out + tp->lost_out;
741     }
742
743     extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);

Caused by commit 34588b4c046c34773e5a1a962da7b78b05c4d1bd 

http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-git4/serial-console.log
http://www.stardust.webpages.pl/files/tbf/bitis-gabonica/2.6.21-git4/git-config

Regards,
Michal

-- 
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kernel BUG at include/net/tcp.h:739
  2007-05-02 20:15 kernel BUG at include/net/tcp.h:739 Michal Piotrowski
@ 2007-05-03  6:51 ` Ilpo Järvinen
  2007-05-03 10:31   ` David Miller
  2007-05-03 14:07   ` Michal Piotrowski
  0 siblings, 2 replies; 4+ messages in thread
From: Ilpo Järvinen @ 2007-05-03  6:51 UTC (permalink / raw)
  To: Michal Piotrowski, David Miller; +Cc: Netdev, LKML

[-- Attachment #1: Type: TEXT/PLAIN, Size: 5227 bytes --]

On Wed, 2 May 2007, Michal Piotrowski wrote:

> Please take a look at this bug
> 
> [15236.638092] kernel BUG at /mnt/md0/devel/linux-git/include/net/tcp.h:739!
> [15236.644860] invalid opcode: 0000 [#1]
> [15236.648514] PREEMPT SMP 
> [15236.651075] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat autofs4 af_packet nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 binfmt_misc thermal processor fan container nvram snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm evdev snd_timer snd intel_agp i2c_i801 soundcore agpgart snd_page_alloc ide_cd cdrom rtc unix
> [15236.698898] CPU:    0
> [15236.698899] EIP:    0060:[<c02f798b>]    Not tainted VLI
> [15236.698900] EFLAGS: 00010206   (2.6.21-gdc87c398 #169)
> [15236.711580] EIP is at tcp_ack+0xc54/0x16a0
> [15236.715664] eax: 00000017   ebx: 00000000   ecx: 00000003   edx: 0000010e
> [15236.722433] esi: d5bc1254   edi: 0000010e   ebp: c0462e18   esp: c0462da8
> [15236.729202] ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
> [15236.735019] Process swapper (pid: 0, ti=c0462000 task=c03f14e0 task.ti=c0427000)
> [15236.742219] Stack: 00000000 00000000 00000000 00000198 00000100 00000000 00000000 00000018 
> [15236.750698]        3b4d5775 0ffef1c0 3b4d79ad 00000001 00000018 00000006 3b4d79ad 003f14e0 
> [15236.759178]        00000006 0000082a d4b9cde0 00e4059a 0000000c 00000000 00000000 0130204a 
> [15236.767649] Call Trace:
> [15236.770286]  [<c0105039>] show_trace_log_lvl+0x1a/0x2f
> [15236.775438]  [<c01050eb>] show_stack_log_lvl+0x9d/0xa5
> [15236.780590]  [<c01052e0>] show_registers+0x1ed/0x32c
> [15236.785569]  [<c0105537>] die+0x118/0x22f
> [15236.789588]  [<c01056c7>] do_trap+0x79/0x91
> [15236.793781]  [<c0105f57>] do_invalid_op+0x97/0xa1
> [15236.798501]  [<c031f38c>] error_code+0x7c/0x84
> [15236.802960]  [<c02fafb7>] tcp_rcv_established+0x568/0x645
> [15236.808354]  [<c03000bd>] tcp_v4_do_rcv+0x2b/0x32c
> [15236.813144]  [<c030243a>] tcp_v4_rcv+0x7f9/0x86b
> [15236.817777]  [<c02e9dd3>] ip_local_deliver+0x170/0x235
> [15236.822928]  [<c02e9c2a>] ip_rcv+0x4f3/0x52c
> [15236.827199]  [<c02d4909>] netif_receive_skb+0x1b9/0x252
> [15236.832437]  [<c02635c2>] skge_poll+0x47a/0x545
> [15236.836967]  [<c02d68df>] net_rx_action+0x9f/0x192
> [15236.841772]  [<c0126408>] __do_softirq+0x6d/0xea
> [15236.846407]  [<c01069b5>] do_softirq+0x64/0xd1
> [15236.850867]  =======================
> [15236.854437] Code: 69 42 c0 f7 d0 64 8b 15 04 00 00 00 8b 04 90 ff 80 a4 00 00 00 8b 9e 70 05 00 00 89 d8 03 86 74 05 00 00 3b 86 8c 04 00 00 76 04 <0f> 0b eb fe 89 86 90 04 00 00 8a 86 88 03 00 00 84 c0 75 3c 83 
> [15236.874343] EIP: [<c02f798b>] tcp_ack+0xc54/0x16a0 SS:ESP 0068:c0462da8
> 
> l *0xc02f798b
> 0xc02f798b is in tcp_ack (/mnt/md0/devel/linux-git/include/net/tcp.h:739).
> 734                                 (tp->snd_cwnd >> 2)));
> 735     }
> 736
> 737     static inline void tcp_sync_left_out(struct tcp_sock *tp)
> 738     {
> 739             BUG_ON(tp->sacked_out + tp->lost_out > tp->packets_out);
> 740             tp->left_out = tp->sacked_out + tp->lost_out;
> 741     }
> 742
> 743     extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
> 
> Caused by commit 34588b4c046c34773e5a1a962da7b78b05c4d1bd 

I think I found the reason:

tcp_clean_rtx_queue without SACK it does not decrement sacked_out but
tcp_reset_reno_sack/remove_reno_sack is being called later in
tcp_fastretrans_alert. Before that, tcp_sync_left_out is being called, 
at least by tcp_fastretrans_alert itself, which sees sacked_out that 
includes also segments that are no longer in window (this also explains 
why the original code did not do the reduction in the non-SACK case). Also 
tcp_process_frto calls tcp_sync_left_out, so it would also lead to the 
same problem.

Here is a change that ignores this trap without SACK. However, it would be 
useful to trap this without SACK too as S+L skb causes potentially a 
negative packets in flight (= large one) disturbing cwnd compares. 


[PATCH] [TCP]: Use S+L catcher only with SACK for now

TCP has a transitional state when SACK is not in use during
which this invariant is temporarily broken. Without SACK,
tcp_clean_rtx_queue does not decrement sacked_out. Therefore
calls to tcp_sync_left_out before sacked_out is again
corrected by tcp_fastretrans_alert can trigger this trap as
sacked_out still has couple of segments that are already out
of window.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/net/tcp.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index ef8f9d4..e22b4f0 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -736,7 +736,8 @@ static inline __u32 tcp_current_ssthresh
 
 static inline void tcp_sync_left_out(struct tcp_sock *tp)
 {
-	BUG_ON(tp->sacked_out + tp->lost_out > tp->packets_out);
+	BUG_ON(tp->rx_opt.sack_ok &&
+	       (tp->sacked_out + tp->lost_out > tp->packets_out));
 	tp->left_out = tp->sacked_out + tp->lost_out;
 }
 
-- 
1.4.2



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: kernel BUG at include/net/tcp.h:739
  2007-05-03  6:51 ` Ilpo Järvinen
@ 2007-05-03 10:31   ` David Miller
  2007-05-03 14:07   ` Michal Piotrowski
  1 sibling, 0 replies; 4+ messages in thread
From: David Miller @ 2007-05-03 10:31 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: michal.k.k.piotrowski, netdev, linux-kernel

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 3 May 2007 09:51:36 +0300 (EEST)

> [PATCH] [TCP]: Use S+L catcher only with SACK for now
> 
> TCP has a transitional state when SACK is not in use during
> which this invariant is temporarily broken. Without SACK,
> tcp_clean_rtx_queue does not decrement sacked_out. Therefore
> calls to tcp_sync_left_out before sacked_out is again
> corrected by tcp_fastretrans_alert can trigger this trap as
> sacked_out still has couple of segments that are already out
> of window.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

It would be nice to fix non-SACK as you say, but this is good
enough for now.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: kernel BUG at include/net/tcp.h:739
  2007-05-03  6:51 ` Ilpo Järvinen
  2007-05-03 10:31   ` David Miller
@ 2007-05-03 14:07   ` Michal Piotrowski
  1 sibling, 0 replies; 4+ messages in thread
From: Michal Piotrowski @ 2007-05-03 14:07 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: Michal Piotrowski, David Miller, Netdev, LKML

Ilpo Järvinen napisał(a):
> On Wed, 2 May 2007, Michal Piotrowski wrote:
> 
>> Please take a look at this bug
>>
>> [15236.638092] kernel BUG at /mnt/md0/devel/linux-git/include/net/tcp.h:739!
>> [15236.644860] invalid opcode: 0000 [#1]
>> [15236.648514] PREEMPT SMP 
>> [15236.651075] Modules linked in: ipt_MASQUERADE iptable_nat nf_nat autofs4 af_packet nf_conntrack_netbios_ns ipt_REJECT nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 binfmt_misc thermal processor fan container nvram snd_intel8x0 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm evdev snd_timer snd intel_agp i2c_i801 soundcore agpgart snd_page_alloc ide_cd cdrom rtc unix
>> [15236.698898] CPU:    0
>> [15236.698899] EIP:    0060:[<c02f798b>]    Not tainted VLI
>> [15236.698900] EFLAGS: 00010206   (2.6.21-gdc87c398 #169)
>> [15236.711580] EIP is at tcp_ack+0xc54/0x16a0
>> [15236.715664] eax: 00000017   ebx: 00000000   ecx: 00000003   edx: 0000010e
>> [15236.722433] esi: d5bc1254   edi: 0000010e   ebp: c0462e18   esp: c0462da8
>> [15236.729202] ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0068
>> [15236.735019] Process swapper (pid: 0, ti=c0462000 task=c03f14e0 task.ti=c0427000)
>> [15236.742219] Stack: 00000000 00000000 00000000 00000198 00000100 00000000 00000000 00000018 
>> [15236.750698]        3b4d5775 0ffef1c0 3b4d79ad 00000001 00000018 00000006 3b4d79ad 003f14e0 
>> [15236.759178]        00000006 0000082a d4b9cde0 00e4059a 0000000c 00000000 00000000 0130204a 
>> [15236.767649] Call Trace:
>> [15236.770286]  [<c0105039>] show_trace_log_lvl+0x1a/0x2f
>> [15236.775438]  [<c01050eb>] show_stack_log_lvl+0x9d/0xa5
>> [15236.780590]  [<c01052e0>] show_registers+0x1ed/0x32c
>> [15236.785569]  [<c0105537>] die+0x118/0x22f
>> [15236.789588]  [<c01056c7>] do_trap+0x79/0x91
>> [15236.793781]  [<c0105f57>] do_invalid_op+0x97/0xa1
>> [15236.798501]  [<c031f38c>] error_code+0x7c/0x84
>> [15236.802960]  [<c02fafb7>] tcp_rcv_established+0x568/0x645
>> [15236.808354]  [<c03000bd>] tcp_v4_do_rcv+0x2b/0x32c
>> [15236.813144]  [<c030243a>] tcp_v4_rcv+0x7f9/0x86b
>> [15236.817777]  [<c02e9dd3>] ip_local_deliver+0x170/0x235
>> [15236.822928]  [<c02e9c2a>] ip_rcv+0x4f3/0x52c
>> [15236.827199]  [<c02d4909>] netif_receive_skb+0x1b9/0x252
>> [15236.832437]  [<c02635c2>] skge_poll+0x47a/0x545
>> [15236.836967]  [<c02d68df>] net_rx_action+0x9f/0x192
>> [15236.841772]  [<c0126408>] __do_softirq+0x6d/0xea
>> [15236.846407]  [<c01069b5>] do_softirq+0x64/0xd1
>> [15236.850867]  =======================
>> [15236.854437] Code: 69 42 c0 f7 d0 64 8b 15 04 00 00 00 8b 04 90 ff 80 a4 00 00 00 8b 9e 70 05 00 00 89 d8 03 86 74 05 00 00 3b 86 8c 04 00 00 76 04 <0f> 0b eb fe 89 86 90 04 00 00 8a 86 88 03 00 00 84 c0 75 3c 83 
>> [15236.874343] EIP: [<c02f798b>] tcp_ack+0xc54/0x16a0 SS:ESP 0068:c0462da8
>>
>> l *0xc02f798b
>> 0xc02f798b is in tcp_ack (/mnt/md0/devel/linux-git/include/net/tcp.h:739).
>> 734                                 (tp->snd_cwnd >> 2)));
>> 735     }
>> 736
>> 737     static inline void tcp_sync_left_out(struct tcp_sock *tp)
>> 738     {
>> 739             BUG_ON(tp->sacked_out + tp->lost_out > tp->packets_out);
>> 740             tp->left_out = tp->sacked_out + tp->lost_out;
>> 741     }
>> 742
>> 743     extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
>>
>> Caused by commit 34588b4c046c34773e5a1a962da7b78b05c4d1bd 
> 
> I think I found the reason:
> 
> tcp_clean_rtx_queue without SACK it does not decrement sacked_out but
> tcp_reset_reno_sack/remove_reno_sack is being called later in
> tcp_fastretrans_alert. Before that, tcp_sync_left_out is being called, 
> at least by tcp_fastretrans_alert itself, which sees sacked_out that 
> includes also segments that are no longer in window (this also explains 
> why the original code did not do the reduction in the non-SACK case). Also 
> tcp_process_frto calls tcp_sync_left_out, so it would also lead to the 
> same problem.
> 
> Here is a change that ignores this trap without SACK. However, it would be 
> useful to trap this without SACK too as S+L skb causes potentially a 
> negative packets in flight (= large one) disturbing cwnd compares. 
> 
> 

Thanks Ilpo. I'm testing the patch.


Regards,
Michal

-- 
Michal K. K. Piotrowski
Kernel Monkeys
(http://kernel.wikidot.com/start)

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-05-03 14:07 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-05-02 20:15 kernel BUG at include/net/tcp.h:739 Michal Piotrowski
2007-05-03  6:51 ` Ilpo Järvinen
2007-05-03 10:31   ` David Miller
2007-05-03 14:07   ` Michal Piotrowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).