netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: 2.6.9 failed assertion in tcp_timer.c
       [not found]           ` <20041021165224.GA25399@linuxace.com>
@ 2004-10-21 21:57             ` Herbert Xu
  2004-10-22  5:53               ` David S. Miller
  2004-10-22 18:54               ` Phil Oester
  0 siblings, 2 replies; 9+ messages in thread
From: Herbert Xu @ 2004-10-21 21:57 UTC (permalink / raw)
  To: Phil Oester; +Cc: linux-net, netdev, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 633 bytes --]

On Thu, Oct 21, 2004 at 09:52:24AM -0700, Phil Oester wrote:
> 
> KERNEL: assertion (!tcp_get_pcount(&tp->packets_out) || !skb_
> 	queue_empty(&sk->sk_write_queue)) failed at net/ipv4/tcp_input.c (2496)

Perhaps the MTU is growing upwards? Does this patch help?

Dave, we need to use the skb's MSS in trim_head as otherwise the
counters will be screwed up.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 603 bytes --]

===== net/ipv4/tcp_output.c 1.68 vs edited =====
--- 1.68/net/ipv4/tcp_output.c	2004-10-20 15:13:52 +10:00
+++ edited/net/ipv4/tcp_output.c	2004-10-22 07:56:45 +10:00
@@ -566,8 +566,6 @@
 
 int tcp_trim_head(struct sock *sk, struct sk_buff *skb, u32 len)
 {
-	struct tcp_opt *tp = tcp_sk(sk);
-
 	if (skb_cloned(skb) &&
 	    pskb_expand_head(skb, 0, 0, GFP_ATOMIC))
 		return -ENOMEM;
@@ -590,7 +588,7 @@
 	/* Any change of skb->len requires recalculation of tso
 	 * factor and mss.
 	 */
-	tcp_set_skb_tso_segs(skb, tp->mss_cache_std);
+	tcp_set_skb_tso_segs(skb, tcp_skb_mss(skb));
 
 	return 0;
 }

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.9 failed assertion in tcp_timer.c
  2004-10-21 21:57             ` 2.6.9 failed assertion in tcp_timer.c Herbert Xu
@ 2004-10-22  5:53               ` David S. Miller
  2004-10-22 18:54               ` Phil Oester
  1 sibling, 0 replies; 9+ messages in thread
From: David S. Miller @ 2004-10-22  5:53 UTC (permalink / raw)
  To: Herbert Xu; +Cc: kernel, linux-net, netdev

On Fri, 22 Oct 2004 07:57:43 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Dave, we need to use the skb's MSS in trim_head as otherwise the
> counters will be screwed up.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Excellent catch.  Patch applied, thanks Herbert.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.9 failed assertion in tcp_timer.c
  2004-10-21 21:57             ` 2.6.9 failed assertion in tcp_timer.c Herbert Xu
  2004-10-22  5:53               ` David S. Miller
@ 2004-10-22 18:54               ` Phil Oester
  2004-10-22 21:50                 ` Herbert Xu
  1 sibling, 1 reply; 9+ messages in thread
From: Phil Oester @ 2004-10-22 18:54 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-net, netdev, David S. Miller

On Fri, Oct 22, 2004 at 07:57:43AM +1000, Herbert Xu wrote:
> On Thu, Oct 21, 2004 at 09:52:24AM -0700, Phil Oester wrote:
> > 
> > KERNEL: assertion (!tcp_get_pcount(&tp->packets_out) || !skb_
> > 	queue_empty(&sk->sk_write_queue)) failed at net/ipv4/tcp_input.c (2496)
> 
> Perhaps the MTU is growing upwards? Does this patch help?
> 
> Dave, we need to use the skb's MSS in trim_head as otherwise the
> counters will be screwed up.

This patch gets it a bit further, but still panics (divide by 0?):


divide error: 0000 [#1]
SMP 
CPU:    0
EIP:    0060:[<c029e36c>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9) 
EIP is at tcp_set_skb_tso_segs+0x2c/0x50
eax: 000000f3   ebx: d75f5b00   ecx: 00000000   edx: 00000000
esi: 00000001   edi: d6c22460   ebp: d6c22650   esp: c035decc
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c035c000 task=c030eac0)
Stack: d75f5b00 c029e858 d75f5b00 d75f5b00 d6c22460 fffffff5 c029f3dd c02971a6 
       00000002 000005b4 d6c22460 d6c22650 d6c224c4 c035df58 c02a148d 00000002 
       00004100 00000080 f7c52260 c0393bec c038bda0 f710f130 00000010 c035df4c 
Call Trace:
 [<c029e858>] tcp_trim_head+0x88/0xc0
 [<c029f3dd>] tcp_retransmit_skb+0x8d/0x320
 [<c02971a6>] tcp_enter_loss+0x76/0x240
 [<c02a148d>] tcp_retransmit_timer+0xfd/0x430
 [<c02a1889>] tcp_write_timer+0xc9/0x100
 [<c02a17c0>] tcp_write_timer+0x0/0x100
 [<c011ee49>] run_timer_softirq+0xd9/0x170
 [<c011b0f6>] __do_softirq+0xb6/0xd0
 [<c011b13d>] do_softirq+0x2d/0x30
 [<c010de35>] smp_apic_timer_interrupt+0x85/0xf0
 [<c010492e>] apic_timer_interrupt+0x1a/0x20
 [<c0101f40>] default_idle+0x0/0x40
 [<c0101f6c>] default_idle+0x2c/0x40
 [<c0101ffb>] cpu_idle+0x3b/0x50
 [<c035ea66>] start_kernel+0x156/0x180
 [<c035e530>] unknown_bootoption+0x0/0x180

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.9 failed assertion in tcp_timer.c
  2004-10-22 18:54               ` Phil Oester
@ 2004-10-22 21:50                 ` Herbert Xu
  2004-10-22 22:12                   ` David S. Miller
  2004-10-22 23:24                   ` Herbert Xu
  0 siblings, 2 replies; 9+ messages in thread
From: Herbert Xu @ 2004-10-22 21:50 UTC (permalink / raw)
  To: Phil Oester; +Cc: linux-net, netdev, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 603 bytes --]

On Fri, Oct 22, 2004 at 11:54:35AM -0700, Phil Oester wrote:
> 
> This patch gets it a bit further, but still panics (divide by 0?):

Oops, I forgot that this gets called on non-TSO packets as well.
So we don't really know whether it went further or not :)

We should only set the TSO size if the packet was TSO to start with.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 390 bytes --]

--- linux-2.6/net/ipv4/tcp_output.c.orig	2004-10-22 07:56:45.000000000 +1000
+++ linux-2.6/net/ipv4/tcp_output.c	2004-10-23 07:48:15.000000000 +1000
@@ -588,7 +588,8 @@
 	/* Any change of skb->len requires recalculation of tso
 	 * factor and mss.
 	 */
-	tcp_set_skb_tso_segs(skb, tcp_skb_mss(skb));
+	if (tcp_skb_mss(skb))
+		tcp_set_skb_tso_segs(skb, tcp_skb_mss(skb));
 
 	return 0;
 }

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.9 failed assertion in tcp_timer.c
  2004-10-22 21:50                 ` Herbert Xu
@ 2004-10-22 22:12                   ` David S. Miller
  2004-10-22 23:24                   ` Herbert Xu
  1 sibling, 0 replies; 9+ messages in thread
From: David S. Miller @ 2004-10-22 22:12 UTC (permalink / raw)
  To: Herbert Xu; +Cc: kernel, linux-net, netdev

On Sat, 23 Oct 2004 07:50:40 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Fri, Oct 22, 2004 at 11:54:35AM -0700, Phil Oester wrote:
> > 
> > This patch gets it a bit further, but still panics (divide by 0?):
> 
> Oops, I forgot that this gets called on non-TSO packets as well.
> So we don't really know whether it went further or not :)
> 
> We should only set the TSO size if the packet was TSO to start with.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.9 failed assertion in tcp_timer.c
  2004-10-22 21:50                 ` Herbert Xu
  2004-10-22 22:12                   ` David S. Miller
@ 2004-10-22 23:24                   ` Herbert Xu
  2004-10-25 19:58                     ` Phil Oester
  2004-10-26  4:09                     ` David S. Miller
  1 sibling, 2 replies; 9+ messages in thread
From: Herbert Xu @ 2004-10-22 23:24 UTC (permalink / raw)
  To: Phil Oester; +Cc: linux-net, netdev, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 1079 bytes --]

On Sat, Oct 23, 2004 at 07:50:40AM +1000, herbert wrote:
>
> So we don't really know whether it went further or not :)

Actually, I think we've caught your crash now.  If that code path
is triggering at all, then it'll trigger with TSO packets too.  If
we get a truly partial ack on a TSO packet, then tcp_tso_acked will
not trim it off.  So we will fall through to this last-ditch trim
call, which doesn't update packets_out.

There are two solutions to this problem.  I've taken the simpler
approach for now.  We simply trim off the partial bits in tcp_tso_acked
and live with the fact that the packet counters may differ from
what's on the netwrok by one.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Later on we can "fix" this by remembering where the original TSO
packet started from, perhaps in skb->h or somewhere.  Dave, is this
worth it?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 1327 bytes --]

===== net/ipv4/tcp_input.c 1.81 vs edited =====
--- 1.81/net/ipv4/tcp_input.c	2004-10-04 07:31:39 +10:00
+++ edited/net/ipv4/tcp_input.c	2004-10-23 09:12:42 +10:00
@@ -2369,25 +2369,19 @@
 {
 	struct tcp_opt *tp = tcp_sk(sk);
 	struct tcp_skb_cb *scb = TCP_SKB_CB(skb); 
-	__u32 mss = tcp_skb_mss(skb);
-	__u32 snd_una = tp->snd_una;
-	__u32 orig_seq, seq;
-	__u32 packets_acked = 0;
+	__u32 seq = tp->snd_una;
+	__u32 packets_acked;
 	int acked = 0;
 
 	/* If we get here, the whole TSO packet has not been
 	 * acked.
 	 */
-	BUG_ON(!after(scb->end_seq, snd_una));
+	BUG_ON(!after(scb->end_seq, seq));
 
-	seq = orig_seq = scb->seq;
-	while (!after(seq + mss, snd_una)) {
-		packets_acked++;
-		seq += mss;
-	}
-
-	if (tcp_trim_head(sk, skb, (seq - orig_seq)))
+	packets_acked = tcp_skb_pcount(skb);
+	if (tcp_trim_head(sk, skb, seq - scb->seq))
 		return 0;
+	packets_acked -= tcp_skb_pcount(skb);
 
 	if (packets_acked) {
 		__u8 sacked = scb->sacked;
--- linux-2.6/net/ipv4/tcp_output.c.orig	2004-10-23 08:44:16.000000000 +1000
+++ linux-2.6/net/ipv4/tcp_output.c	2004-10-23 09:10:13.000000000 +1000
@@ -588,7 +588,7 @@
 	/* Any change of skb->len requires recalculation of tso
 	 * factor and mss.
 	 */
-	if (tcp_skb_mss(skb))
+	if (tcp_skb_pcount(skb) > 1)
 		tcp_set_skb_tso_segs(skb, tcp_skb_mss(skb));
 
 	return 0;

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.9 failed assertion in tcp_timer.c
  2004-10-22 23:24                   ` Herbert Xu
@ 2004-10-25 19:58                     ` Phil Oester
  2004-10-25 20:25                       ` David S. Miller
  2004-10-26  4:09                     ` David S. Miller
  1 sibling, 1 reply; 9+ messages in thread
From: Phil Oester @ 2004-10-25 19:58 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-net, netdev, David S. Miller

On Sat, Oct 23, 2004 at 09:24:03AM +1000, Herbert Xu wrote:
> Actually, I think we've caught your crash now.  If that code path
> is triggering at all, then it'll trigger with TSO packets too.  If
> we get a truly partial ack on a TSO packet, then tcp_tso_acked will
> not trim it off.  So we will fall through to this last-ditch trim
> call, which doesn't update packets_out.
> 
> There are two solutions to this problem.  I've taken the simpler
> approach for now.  We simply trim off the partial bits in tcp_tso_acked
> and live with the fact that the packet counters may differ from
> what's on the netwrok by one.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Later on we can "fix" this by remembering where the original TSO
> packet started from, perhaps in skb->h or somewhere.  Dave, is this
> worth it?

Looks like your most recent 2 patches did the trick.  Uptime on the
test box is over 4 hours, whereas before I could count on a panic
within about an hour.  I'll continue running 2.6.9 + your patches
and let you know if this changes...

Thanks,
Phil

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.9 failed assertion in tcp_timer.c
  2004-10-25 19:58                     ` Phil Oester
@ 2004-10-25 20:25                       ` David S. Miller
  0 siblings, 0 replies; 9+ messages in thread
From: David S. Miller @ 2004-10-25 20:25 UTC (permalink / raw)
  To: Phil Oester; +Cc: herbert, linux-net, netdev

On Mon, 25 Oct 2004 12:58:15 -0700
Phil Oester <kernel@linuxace.com> wrote:

> > Later on we can "fix" this by remembering where the original TSO
> > packet started from, perhaps in skb->h or somewhere.  Dave, is this
> > worth it?
> 
> Looks like your most recent 2 patches did the trick.  Uptime on the
> test box is over 4 hours, whereas before I could count on a panic
> within about an hour.  I'll continue running 2.6.9 + your patches
> and let you know if this changes...

Thanks for testing Phil.

Herbert I'll ponder over this patch today.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: 2.6.9 failed assertion in tcp_timer.c
  2004-10-22 23:24                   ` Herbert Xu
  2004-10-25 19:58                     ` Phil Oester
@ 2004-10-26  4:09                     ` David S. Miller
  1 sibling, 0 replies; 9+ messages in thread
From: David S. Miller @ 2004-10-26  4:09 UTC (permalink / raw)
  To: Herbert Xu; +Cc: kernel, linux-net, netdev

On Sat, 23 Oct 2004 09:24:03 +1000
Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Actually, I think we've caught your crash now.  If that code path
> is triggering at all, then it'll trigger with TSO packets too.  If
> we get a truly partial ack on a TSO packet, then tcp_tso_acked will
> not trim it off.  So we will fall through to this last-ditch trim
> call, which doesn't update packets_out.
> 
> There are two solutions to this problem.  I've taken the simpler
> approach for now.  We simply trim off the partial bits in tcp_tso_acked
> and live with the fact that the packet counters may differ from
> what's on the netwrok by one.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> Later on we can "fix" this by remembering where the original TSO
> packet started from, perhaps in skb->h or somewhere.  Dave, is this
> worth it?

I'm going to apply this patch for now.

I think at this point we can generalize TSO processing a bit
better in this code now.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2004-10-26  4:09 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20041019215054.GA17565@linuxace.com>
     [not found] ` <E1CK3Az-0004P2-00@gondolin.me.apana.org.au>
     [not found]   ` <20041020164748.GA20905@linuxace.com>
     [not found]     ` <20041020212651.GA8534@gondor.apana.org.au>
     [not found]       ` <20041020225536.GA22179@linuxace.com>
     [not found]         ` <20041021130339.GA19345@gondor.apana.org.au>
     [not found]           ` <20041021165224.GA25399@linuxace.com>
2004-10-21 21:57             ` 2.6.9 failed assertion in tcp_timer.c Herbert Xu
2004-10-22  5:53               ` David S. Miller
2004-10-22 18:54               ` Phil Oester
2004-10-22 21:50                 ` Herbert Xu
2004-10-22 22:12                   ` David S. Miller
2004-10-22 23:24                   ` Herbert Xu
2004-10-25 19:58                     ` Phil Oester
2004-10-25 20:25                       ` David S. Miller
2004-10-26  4:09                     ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).