Re: 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp

public inbox for linux-kernel@vger.kernel.org

* Re: 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp_output.c:918
@ 2005-08-01  8:33 Guillaume Pelat
  2005-08-04  3:33 ` Herbert Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Guillaume Pelat @ 2005-08-01  8:33 UTC (permalink / raw)
  To: davem; +Cc: akpm, netdev, linux-kernel

Hi,

 > [TCP]: Fix two TSO sizing bugs
 >
 > MSS changes can be lost since we preemptively initialize
 > the tso_segs count for an SKB before we %100 commit
 > to sending it out.
 > So, by the time we send it out, the tso_size information
 > can be stale due to PMTU events.  This mucks up all of the
 > logic in our send engine, and can even result in the BUG()
 > triggering in tcp_tso_should_defer().
 > Another problem we have is that we're storing the tp->mss_cache,
 > not the SACK block normalized MSS, as the tso_size.  That's wrong
 > too.
 >
 > Signed-off-by: David S. Miller <davem@davemloft.net>

I just tried the patch attached. :)

The bug is still here (same symptoms), with a slightly different backtrace :
------------[ cut here ]------------
kernel BUG at net/ipv4/tcp_output.c:918!
invalid operand: 0000 [#1]
CPU:    0
EIP:    0060:[<c027dd66>]    Not tainted VLI
EFLAGS: 00010297   (2.6.13-rc4-endy)
EIP is at tcp_tso_should_defer+0xd6/0xf0
eax: 00000005   ebx: f5032e80   ecx: 00000007   edx: f3b2fc00
esi: 00000006   edi: 00000006   ebp: c031fd78   esp: c031fd68
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c031e000 task=c02dbb80)
Stack: f129b2b8 f5032e80 00000006 f3b2fc00 c031fdb0 c027de7b f3b2fc00 f3b2fc00
        f5032e80 0000000b f3b2fc00 b7773884 00000000 00000005 00000002 f3b2fc00
        f3b2fc00 f5a26034 c031fdd4 c027e1b2 f3b2fc00 00000564 00000001 f5a26034
Call Trace:
  [<c0102e5f>] show_stack+0x7f/0xa0
  [<c0103002>] show_registers+0x152/0x1c0
  [<c01031f8>] die+0xc8/0x140
  [<c0103325>] do_trap+0xb5/0xc0
  [<c010366c>] do_invalid_op+0xbc/0xd0
  [<c0102aa3>] error_code+0x4f/0x54
  [<c027de7b>] tcp_write_xmit+0xfb/0x400
  [<c027e1b2>] __tcp_push_pending_frames+0x32/0xd0
  [<c027b1cc>] tcp_rcv_established+0x27c/0x860 (was tcp_rcv_state_process before).
  [<c0283f8a>] tcp_v4_do_rcv+0x11a/0x120
  [<c0284502>] tcp_v4_rcv+0x572/0x750
  [<c026a62b>] ip_local_deliver+0xcb/0x1d0
  [<c026aa52>] ip_rcv+0x322/0x4a0
  [<c0256a97>] netif_receive_skb+0x137/0x1a0
  [<c0256b8f>] process_backlog+0x8f/0x110
  [<c0256c82>] net_rx_action+0x72/0x100
  [<c01172dc>] __do_softirq+0x8c/0xa0
  [<c011731a>] do_softirq+0x2a/0x30
  [<c01173d5>] irq_exit+0x35/0x40
  [<c01044fc>] do_IRQ+0x3c/0x70
  [<c0102a46>] common_interrupt+0x1a/0x20
  [<c0100997>] cpu_idle+0x57/0x60
  [<c010024b>] _stext+0x2b/0x30
  [<c0320847>] start_kernel+0x147/0x170
  [<c0100199>] 0xc0100199
Code: 89 f8 0f af c2 3b 45 f0 0f 47 45 f0 31 d2 89 45 f0 f7 f3 31 d2 39 c1 73 ce ba 01 00 00 00 eb c7 6b c2 03 31 d2 39 c1 77 be eb ee <0f> 0b 96 03 ce 54 2d c0 e9 76 ff ff ff 8b ba 78 02 00 00 eb eb
  <0>Kernel panic - not syncing: Fatal exception in interrupt

I guess it's the same bug :)

A few more details about the problem:
- Turning off TSO with ethtool solves the problem.
- I tried 2.6.13-rc4 on another server (with the same configuration) and 
the bug occurred there too (so I guess it's not due to some weird memory 
problem :) )
- The problem doesn't seem to be present in 2.6.12.3 (but I only ran 
2.6.12.3 for 2 days).


Best Regards,

Guillaume Pelat

* Re: 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp_output.c:918
  2005-08-01  8:33 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp_output.c:918 Guillaume Pelat
@ 2005-08-04  3:33 ` Herbert Xu
  2005-08-04 10:35   ` Herbert Xu
  0 siblings, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2005-08-04  3:33 UTC (permalink / raw)
  To: Guillaume Pelat; +Cc: davem, akpm, netdev, linux-kernel

On Mon, Aug 01, 2005 at 08:33:20AM +0000, Guillaume Pelat wrote:
> 
> I just tried the patch attached. :)
> 
> The bug is still here (same symptoms), with a slightly different backtrace :
> ------------[ cut here ]------------
> kernel BUG at net/ipv4/tcp_output.c:918!

OK, let's try again :)

I bet it's the tcp_enter_cwr() call in tcp_transmit_skb().  So
the sequence is:

tcp_write_xmit
	cwnd_quota = tcp_cwnd_test
	tcp_transmit_skb
		tcp_enter_cwr
			tp->snd_cwnd = min(tp->snd_cwnd, in_flight + 1)

At this point cwnd_quota is out-of-sync with tp->snd_cwnd.

	cwnd_quota -= tcp_skb_pcount(skb)
	cwnd_quota > 0
	tcp_tso_should_defer
		BUG since tp->snd_cwnd is smaller than what
		cwnd_quota indicated.

So I suppose we should reset cwnd_quota after tcp_transmit_skb?

Perhaps we should only transmit one MSS in this case?
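
The sequence above can be reproduced in a small userspace model. This is
only a sketch under stated assumptions: struct toy_tp, toy_cwnd_test() and
toy_transmit() are hypothetical stand-ins, not the kernel's code, and the
model ignores everything except snd_cwnd and the in-flight count.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the relevant tcp_sock fields. */
struct toy_tp {
	unsigned int snd_cwnd;    /* congestion window, in packets */
	unsigned int packets_out; /* packets currently in flight */
};

/* Plays the role of tcp_cwnd_test(): how many more packets fit in cwnd. */
static unsigned int toy_cwnd_test(const struct toy_tp *tp)
{
	return tp->packets_out < tp->snd_cwnd ?
	       tp->snd_cwnd - tp->packets_out : 0;
}

/*
 * Plays the role of tcp_transmit_skb(). When enter_cwr is nonzero it
 * mimics the tcp_enter_cwr() side effect described above:
 *     snd_cwnd = min(snd_cwnd, in_flight + 1)
 * and then accounts for the pcount packets that were just sent.
 */
static void toy_transmit(struct toy_tp *tp, unsigned int pcount, int enter_cwr)
{
	assert(pcount > 0);

	if (enter_cwr) {
		unsigned int clamp = tp->packets_out + 1;

		if (tp->snd_cwnd > clamp)
			tp->snd_cwnd = clamp;
	}
	tp->packets_out += pcount;
}
```

For example, with snd_cwnd = 10 and 4 packets in flight, toy_cwnd_test()
gives a quota of 6. After toy_transmit(&tp, 2, 1) the window is clamped to
5 with 6 packets in flight, so a caller that only subtracts 2 from its
cached quota still believes it may send 4 more packets, while a fresh
toy_cwnd_test() returns 0.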

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp_output.c:918
  2005-08-04  3:33 ` Herbert Xu
@ 2005-08-04 10:35   ` Herbert Xu
  2005-08-04 17:41     ` Guillaume Pelat
  0 siblings, 1 reply; 9+ messages in thread
From: Herbert Xu @ 2005-08-04 10:35 UTC (permalink / raw)
  To: Guillaume Pelat; +Cc: davem, akpm, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 413 bytes --]

On Thu, Aug 04, 2005 at 01:33:29PM +1000, herbert wrote:
> 
> So I suppose we should reset cwnd_quota after tcp_transmit_skb?

Please try this patch to see if this is really the problem or not.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 674 bytes --]

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1027,19 +1027,14 @@ static int tcp_write_xmit(struct sock *s
 		tcp_minshall_update(tp, mss_now, skb);
 		sent_pkts++;
 
-		/* Do not optimize this to use tso_segs. If we chopped up
-		 * the packet above, tso_segs will no longer be valid.
-		 */
-		cwnd_quota -= tcp_skb_pcount(skb);
-
-		BUG_ON(cwnd_quota < 0);
-		if (!cwnd_quota)
-			break;
-
 		skb = sk->sk_send_head;
 		if (!skb)
 			break;
+
 		tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
+		cwnd_quota = tcp_cwnd_test(tp, skb);
+		if (!cwnd_quota)
+			break;
 	}
 
 	if (likely(sent_pkts)) {

* Re: 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp_output.c:918
  2005-08-04 10:35   ` Herbert Xu
@ 2005-08-04 17:41     ` Guillaume Pelat
  2005-08-04 18:23       ` [RFC] : SLAB : Could we have a process context only versions of kmem_cache_alloc(), and kmem_cache_free() Eric Dumazet
  2005-08-04 23:58       ` 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp_output.c:918 Andrew Morton
  0 siblings, 2 replies; 9+ messages in thread
From: Guillaume Pelat @ 2005-08-04 17:41 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, akpm, netdev, linux-kernel

Hi,

Herbert Xu wrote:
> On Thu, Aug 04, 2005 at 01:33:29PM +1000, herbert wrote:
> 
>>So I suppose we should reset cwnd_quota after tcp_transmit_skb?
> 
> Please try this patch to see if this is really the problem or not.
> 
> Thanks,

I just applied your patch, and it seems to work :)
2 hours of uptime and no crash yet (without the patch, it was crashing 
only a few minutes after booting).
So I think the bug is crushed :)

Thanks a lot !

-- 
Guillaume Pelat

* [RFC] : SLAB : Could we have a process context only versions of  kmem_cache_alloc(), and kmem_cache_free()
  2005-08-04 17:41     ` Guillaume Pelat
@ 2005-08-04 18:23       ` Eric Dumazet
  2005-08-04 23:58       ` 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp_output.c:918 Andrew Morton
  1 sibling, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2005-08-04 18:23 UTC (permalink / raw)
  To: linux-kernel

Hi

The cost of local_irq_save(flags)/local_irq_restore(flags) in slab functions is very high:
popf, cli, and pushf stress modern processors.

Maybe we could provide special functions for caches that are known to be used only from process context ?


These functions would use local_irq_save(flags)/local_irq_restore(flags) only when needed, that is, around cache_alloc_refill() or cache_flusharray().

Something like :

void *kmem_cache_alloc_noirq(kmem_cache_t *cachep, unsigned int __nocast flags)
{
	unsigned long save_flags;
	void *objp;
	struct array_cache *ac;

	cache_alloc_debugcheck_before(cachep, flags);
	check_irq_on();
	preempt_disable();
	ac = ac_data(cachep);
	if (likely(ac->avail)) {
		STATS_INC_ALLOCHIT(cachep);
		ac->touched = 1;
		objp = ac_entry(ac)[--ac->avail];
	} else {
		STATS_INC_ALLOCMISS(cachep);
		local_irq_save(save_flags);
		objp = cache_alloc_refill(cachep, flags);
		local_irq_restore(save_flags);
	}
	preempt_enable();
	objp = cache_alloc_debugcheck_after(cachep, flags, objp, __builtin_return_address(0));
	prefetchw(objp);
	return objp;
}


void kmem_cache_free_noirq(kmem_cache_t *cachep, void *objp)
{
	struct array_cache *ac;

	check_irq_on();
	preempt_disable();
	ac = ac_data(cachep);

	objp = cache_free_debugcheck(cachep, objp, __builtin_return_address(0));

	if (likely(ac->avail < ac->limit)) {
		STATS_INC_FREEHIT(cachep);
	} else {
		unsigned long flags;

		STATS_INC_FREEMISS(cachep);
		local_irq_save(flags);
		cache_flusharray(cachep, ac);
		local_irq_restore(flags);
	}
	ac_entry(ac)[ac->avail++] = objp;
	/* Balance the preempt_disable() taken at the top. */
	preempt_enable();
}

Thank you

Eric Dumazet


* Re: 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp_output.c:918
  2005-08-04 17:41     ` Guillaume Pelat
  2005-08-04 18:23       ` [RFC] : SLAB : Could we have a process context only versions of kmem_cache_alloc(), and kmem_cache_free() Eric Dumazet
@ 2005-08-04 23:58       ` Andrew Morton
  2005-08-05  0:57         ` [TCP]: Fix TSO cwnd caching bug Herbert Xu
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2005-08-04 23:58 UTC (permalink / raw)
  To: Guillaume Pelat; +Cc: herbert, davem, netdev, linux-kernel

Guillaume Pelat <guillaume.pelat@winch-hebergement.net> wrote:
>
> Hi,
> 
> Herbert Xu wrote:
> > On Thu, Aug 04, 2005 at 01:33:29PM +1000, herbert wrote:
> > 
> >>So I suppose we should reset cwnd_quota after tcp_transmit_skb?
> > 
> > Please try this patch to see if this is really the problem or not.
> > 
> > Thanks,
> 
> I just applied your patch, and it seems to work :)
> 2 hours of uptime and no crash yet (without the patch, it was crashing 
> only a few minutes after booting).
> So I think the bug is crushed :)
> 

Thanks, Guillaume.  Herbert, David is travelling and not able to do a lot
of patchmonkeying.  Could you please prepare and submit a final patch?

Thanks.


* [TCP]: Fix TSO cwnd caching bug
  2005-08-04 23:58       ` 2.6.13-rc4 - kernel panic - BUG at net/ipv4/tcp_output.c:918 Andrew Morton
@ 2005-08-05  0:57         ` Herbert Xu
  2005-08-05  1:08           ` Andrew Morton
  2005-08-05  8:33           ` David S. Miller
  0 siblings, 2 replies; 9+ messages in thread
From: Herbert Xu @ 2005-08-05  0:57 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Guillaume Pelat, davem, netdev, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 928 bytes --]

On Thu, Aug 04, 2005 at 04:58:42PM -0700, Andrew Morton wrote:
> 
> Thanks, Guillaume.  Herbert, David is travelling and not able to do a lot
> of patchmonkeying.  Could you please prepare and submit a final patch?

OK, here is the final version.  It depends on the patch that David
posted earlier on in this thread.  Please let me know if you need a
copy of that.

[TCP]: Fix TSO cwnd caching bug

tcp_write_xmit caches the cwnd value indirectly in cwnd_quota.
When tcp_transmit_skb reduces the cwnd because of tcp_enter_cwr,
the cached value becomes invalid.

This patch ensures that the cwnd value is always reread after
each tcp_transmit_skb call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
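
The loop shape this patch moves to can be sketched in userspace as follows.
This is an illustration under stated assumptions, not the kernel code:
struct toy_tp, toy_cwnd_test() and toy_write_xmit() are hypothetical names,
the send queue is reduced to an array of per-segment packet counts, and
segment splitting, Nagle and send-window checks are left out. The cwr_at
parameter is an invented knob that says which transmission triggers the
tcp_enter_cwr()-style clamp.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the relevant tcp_sock fields. */
struct toy_tp {
	unsigned int snd_cwnd;    /* congestion window, in packets */
	unsigned int packets_out; /* packets currently in flight */
};

/* Plays the role of tcp_cwnd_test(): how many more packets fit in cwnd. */
static unsigned int toy_cwnd_test(const struct toy_tp *tp)
{
	return tp->packets_out < tp->snd_cwnd ?
	       tp->snd_cwnd - tp->packets_out : 0;
}

/*
 * Send segments until the queue is empty or the window is full.
 * pcounts[i] is the packet count of segment i. Sending segment cwr_at
 * clamps snd_cwnd to in_flight + 1; pass cwr_at == n for "never".
 * Returns the number of segments sent.
 */
static size_t toy_write_xmit(struct toy_tp *tp, const unsigned int *pcounts,
			     size_t n, size_t cwr_at)
{
	size_t sent = 0;

	assert(pcounts != NULL || n == 0);

	while (sent < n) {
		/* Re-read the window on every pass; nothing is cached. */
		unsigned int quota = toy_cwnd_test(tp);

		if (!quota)
			break;

		/* Side effect of transmitting: the window may shrink here. */
		if (sent == cwr_at && tp->snd_cwnd > tp->packets_out + 1)
			tp->snd_cwnd = tp->packets_out + 1;

		tp->packets_out += pcounts[sent];
		sent++;
	}
	return sent;
}
```

With snd_cwnd = 10, 4 packets in flight and three 2-packet segments queued,
a clamp on the first transmission stops the loop after one segment, because
the next pass recomputes the quota as 0. Without a clamp all three segments
go out.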

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: tso-1 --]
[-- Type: text/plain, Size: 1464 bytes --]

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -972,19 +972,18 @@ static int tcp_write_xmit(struct sock *s
 	if (unlikely(sk->sk_state == TCP_CLOSE))
 		return 0;
 
-	skb = sk->sk_send_head;
-	if (unlikely(!skb))
-		return 0;
-
-	tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
-	cwnd_quota = tcp_cwnd_test(tp, skb);
-	if (unlikely(!cwnd_quota))
-		goto out;
-
 	sent_pkts = 0;
-	while (likely(tcp_snd_wnd_test(tp, skb, mss_now))) {
+	while ((skb = sk->sk_send_head)) {
+		tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
 		BUG_ON(!tso_segs);
 
+		cwnd_quota = tcp_cwnd_test(tp, skb);
+		if (!cwnd_quota)
+			break;
+
+		if (unlikely(!tcp_snd_wnd_test(tp, skb, mss_now)))
+			break;
+
 		if (tso_segs == 1) {
 			if (unlikely(!tcp_nagle_test(tp, skb, mss_now,
 						     (tcp_skb_is_last(sk, skb) ?
@@ -1026,27 +1025,12 @@ static int tcp_write_xmit(struct sock *s
 
 		tcp_minshall_update(tp, mss_now, skb);
 		sent_pkts++;
-
-		/* Do not optimize this to use tso_segs. If we chopped up
-		 * the packet above, tso_segs will no longer be valid.
-		 */
-		cwnd_quota -= tcp_skb_pcount(skb);
-
-		BUG_ON(cwnd_quota < 0);
-		if (!cwnd_quota)
-			break;
-
-		skb = sk->sk_send_head;
-		if (!skb)
-			break;
-		tso_segs = tcp_init_tso_segs(sk, skb, mss_now);
 	}
 
 	if (likely(sent_pkts)) {
 		tcp_cwnd_validate(sk, tp);
 		return 0;
 	}
-out:
 	return !tp->packets_out && sk->sk_send_head;
 }
 

* Re: [TCP]: Fix TSO cwnd caching bug
  2005-08-05  0:57         ` [TCP]: Fix TSO cwnd caching bug Herbert Xu
@ 2005-08-05  1:08           ` Andrew Morton
  2005-08-05  8:33           ` David S. Miller
  1 sibling, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2005-08-05  1:08 UTC (permalink / raw)
  To: Herbert Xu; +Cc: guillaume.pelat, davem, netdev, linux-kernel

Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> On Thu, Aug 04, 2005 at 04:58:42PM -0700, Andrew Morton wrote:
>  > 
>  > Thanks, Guillaume.  Herbert, David is travelling and not able to do a lot
>  > of patchmonkeying.  Could you please prepare and submit a final patch?
> 
>  OK, here is the final version.

Thanks.

>  It depends on the patch that David
>  posted earlier on in this thread.  Please let me know if you need a
>  copy of that.

Yes please.

* Re: [TCP]: Fix TSO cwnd caching bug
  2005-08-05  0:57         ` [TCP]: Fix TSO cwnd caching bug Herbert Xu
  2005-08-05  1:08           ` Andrew Morton
@ 2005-08-05  8:33           ` David S. Miller
  1 sibling, 0 replies; 9+ messages in thread
From: David S. Miller @ 2005-08-05  8:33 UTC (permalink / raw)
  To: herbert; +Cc: akpm, guillaume.pelat, netdev, linux-kernel

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Fri, 5 Aug 2005 10:57:41 +1000

> OK, here is the final version.  It depends on the patch that David
> posted earlier on in this thread.  Please let me know if you need a
> copy of that.
> 
> [TCP]: Fix TSO cwnd caching bug

Good catch Herbert :)
