netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-2.6.24 0/3]: More TCP fixes
@ 2007-10-03 11:00 Ilpo Järvinen
  2007-10-03 11:00 ` [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic Ilpo Järvinen
  2007-10-03 11:58 ` [PATCH net-2.6.24 0/3]: More TCP fixes Cedric Le Goater
  0 siblings, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-10-03 11:00 UTC (permalink / raw)
  To: David Miller, Cedric Le Goater; +Cc: netdev

Hi Dave,

Sacktag fastpath_cnt_hint seems to be very tricky to get right...
I suppose this one fixes Cedric's case. I cannot say for sure    
until there is something more definite indication of
tcp_retrans_try_collapse origin than what the simple late WARN_ON
gave for us. ...Especially since it's non-trivial to have skb
hint "correctly" positioned in the write_queue while still ending
up calling that function. However, considering how difficult it
seems to be for Cedric to reproduce, it might well be this one.

In addition, I noticed another reset which wasn't previously   
converted to WARN_ON, so doing that now. Boot + simple xfer
tested. Please apply to net-2.6.24.

--
 i.



^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic
  2007-10-03 11:00 [PATCH net-2.6.24 0/3]: More TCP fixes Ilpo Järvinen
@ 2007-10-03 11:00 ` Ilpo Järvinen
  2007-10-03 11:00   ` [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap Ilpo Järvinen
  2007-10-08  6:36   ` [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic David Miller
  2007-10-03 11:58 ` [PATCH net-2.6.24 0/3]: More TCP fixes Cedric Le Goater
  1 sibling, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-10-03 11:00 UTC (permalink / raw)
  To: David Miller, Cedric Le Goater; +Cc: netdev

1) Passing wrong skb to tcp_adjust_fackets_out could corrupt
fastpath_cnt_hint as tcp_skb_pcount(next_skb) is not included
to it if hint points exactly to the next_skb (it's lagging
behind, see sacktag).

2) When fastpath_skb_hint is put backwards to avoid dangling
skb reference, the skb's pcount must also be removed from count
(not included like above).

Reported by Cedric Le Goater <legoater@free.fr>

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_output.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6199abe..5329675 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1755,14 +1755,16 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		if (tcp_is_reno(tp) && tp->sacked_out)
 			tcp_dec_pcount_approx(&tp->sacked_out, next_skb);
 
-		tcp_adjust_fackets_out(tp, skb, tcp_skb_pcount(next_skb));
+		tcp_adjust_fackets_out(tp, next_skb, tcp_skb_pcount(next_skb));
 		tp->packets_out -= tcp_skb_pcount(next_skb);
 
 		/* changed transmit queue under us so clear hints */
 		tcp_clear_retrans_hints_partial(tp);
 		/* manually tune sacktag skb hint */
-		if (tp->fastpath_skb_hint == next_skb)
+		if (tp->fastpath_skb_hint == next_skb) {
 			tp->fastpath_skb_hint = skb;
+			tp->fastpath_cnt_hint -= tcp_skb_pcount(skb);
+		}
 
 		sk_stream_free_skb(sk, next_skb);
 	}
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap
  2007-10-03 11:00 ` [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic Ilpo Järvinen
@ 2007-10-03 11:00   ` Ilpo Järvinen
  2007-10-03 11:00     ` [PATCH 3/3] [TCP]: "Annotate" another fackets_out state reset Ilpo Järvinen
  2007-10-08  6:37     ` [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap David Miller
  2007-10-08  6:36   ` [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic David Miller
  1 sibling, 2 replies; 20+ messages in thread
From: Ilpo Järvinen @ 2007-10-03 11:00 UTC (permalink / raw)
  To: David Miller, Cedric Le Goater; +Cc: netdev

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 include/linux/tcp.h |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index f8cf090..9ff456e 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -343,7 +343,8 @@ struct tcp_sock {
 	struct sk_buff *forward_skb_hint;
 	struct sk_buff *fastpath_skb_hint;
 
-	int     fastpath_cnt_hint;
+	int     fastpath_cnt_hint;	/* Lags behind by current skb's pcount
+					 * compared to respective fackets_out */
 	int     lost_cnt_hint;
 	int     retransmit_cnt_hint;
 
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH 3/3] [TCP]: "Annotate" another fackets_out state reset
  2007-10-03 11:00   ` [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap Ilpo Järvinen
@ 2007-10-03 11:00     ` Ilpo Järvinen
  2007-10-08  6:38       ` David Miller
  2007-10-08  6:37     ` [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap David Miller
  1 sibling, 1 reply; 20+ messages in thread
From: Ilpo Järvinen @ 2007-10-03 11:00 UTC (permalink / raw)
  To: David Miller, Cedric Le Goater; +Cc: netdev

This should no longer be necessary because fackets_out is
accurate. It indicates bugs elsewhere, thus report it.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index e22ffe7..87c9ef5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1160,7 +1160,8 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	int first_sack_index;
 
 	if (!tp->sacked_out) {
-		tp->fackets_out = 0;
+		if (WARN_ON(tp->fackets_out))
+			tp->fackets_out = 0;
 		tp->highest_sack = tp->snd_una;
 	}
 	prior_fackets = tp->fackets_out;
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 11:00 [PATCH net-2.6.24 0/3]: More TCP fixes Ilpo Järvinen
  2007-10-03 11:00 ` [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic Ilpo Järvinen
@ 2007-10-03 11:58 ` Cedric Le Goater
  2007-10-03 12:34   ` Ilpo Järvinen
  1 sibling, 1 reply; 20+ messages in thread
From: Cedric Le Goater @ 2007-10-03 11:58 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, netdev

Hello Ilpo !

Ilpo Järvinen wrote:
> Hi Dave,
> 
> Sacktag fastpath_cnt_hint seems to be very tricky to get right...
> I suppose this one fixes Cedric's case. I cannot say for sure    
> until there is something more definite indication of
> tcp_retrans_try_collapse origin than what the simple late WARN_ON
> gave for us. ...Especially since it's non-trivial to have skb
> hint "correctly" positioned in the write_queue while still ending
> up calling that function. However, considering how difficult it
> seems to be for Cedric to reproduce, it might well be this one.
> 
> In addition, I noticed another reset which wasn't previously   
> converted to WARN_ON, so doing that now. Boot + simple xfer
> tested. Please apply to net-2.6.24.

I'm dropping the previous patches you sent me and switching to this patchset. 
right ?

Thanks,

C.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 11:58 ` [PATCH net-2.6.24 0/3]: More TCP fixes Cedric Le Goater
@ 2007-10-03 12:34   ` Ilpo Järvinen
  2007-10-03 12:48     ` Cedric Le Goater
  0 siblings, 1 reply; 20+ messages in thread
From: Ilpo Järvinen @ 2007-10-03 12:34 UTC (permalink / raw)
  To: Cedric Le Goater; +Cc: David Miller, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 10192 bytes --]

On Wed, 3 Oct 2007, Cedric Le Goater wrote:

> Ilpo Järvinen wrote:
> > Sacktag fastpath_cnt_hint seems to be very tricky to get right...
> > I suppose this one fixes Cedric's case. I cannot say for sure    
> > until there is something more definite indication of
> > tcp_retrans_try_collapse origin than what the simple late WARN_ON
> > gave for us. ...Especially since it's non-trivial to have skb
> > hint "correctly" positioned in the write_queue while still ending
> > up calling that function. However, considering how difficult it
> > seems to be for Cedric to reproduce, it might well be this one.
> > 
> > In addition, I noticed another reset which wasn't previously   
> > converted to WARN_ON, so doing that now. Boot + simple xfer
> > tested. Please apply to net-2.6.24.
> 
> I'm dropping the previous patches you sent me and switching to this patchset. 
> right ?

Yes you can do that... However, there are two ways forward:

1) Drop and test with this patchset long enough to verify it's gone...
2) No dropping and get the more exact trace by reproducing, which can 
   point out to tcp_retrans_try_collapse confirming the source of the
   bug or revealing yet another bug...

The first one has one drawback, it cannot prove the fix very well since 
the bug could just not occur by chance... Path 2 would clearly show the 
place from where the problem originates because we will know that it got 
triggered! I personally would prefer path 2 but whether you want to go for 
that depends on the time you want to invest in it...

...I rediffed the tcp_verify_fackets patch too (below) just in case it 
would be something else in you case and you choose path 1 (put it on top 
of this patchset, applies with some offsets). In case the problem is gone, 
it shouldn't trigger and if it does, we'll have another bug caught.

Anyway, thanks for ccing right persons and netdev right from the 
beginning.


-- 
 i.

 include/net/tcp.h     |    3 +
 net/ipv4/tcp_input.c  |   25 +++++++++---
 net/ipv4/tcp_ipv4.c   |  103 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_output.c |    6 ++-
 4 files changed, 130 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 991ccdc..54a0d91 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -43,6 +43,9 @@
 
 #include <linux/seq_file.h>
 
+extern void tcp_verify_fackets(struct sock *sk);
+extern void tcp_print_queue(struct sock *sk);
+
 extern struct inet_hashinfo tcp_hashinfo;
 
 extern atomic_t tcp_orphan_count;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 87c9ef5..93bdc20 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1140,7 +1140,7 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 	return dup_sack;
 }
 
-static int
+int
 tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_una)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1160,8 +1160,10 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	int first_sack_index;
 
 	if (!tp->sacked_out) {
-		if (WARN_ON(tp->fackets_out))
+		if (WARN_ON(tp->fackets_out)) {
 			tp->fackets_out = 0;
+			tcp_print_queue(sk);
+		}
 		tp->highest_sack = tp->snd_una;
 	}
 	prior_fackets = tp->fackets_out;
@@ -1421,6 +1423,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 			}
 		}
 	}
+	tcp_verify_fackets(sk);
 
 	/* Check for lost retransmit. This superb idea is
 	 * borrowed from "ratehalving". Event "C".
@@ -1633,13 +1636,14 @@ void tcp_enter_frto(struct sock *sk)
 	tcp_set_ca_state(sk, TCP_CA_Disorder);
 	tp->high_seq = tp->snd_nxt;
 	tp->frto_counter = 1;
+	tcp_verify_fackets(sk);
 }
 
 /* Enter Loss state after F-RTO was applied. Dupack arrived after RTO,
  * which indicates that we should follow the traditional RTO recovery,
  * i.e. mark everything lost and do go-back-N retransmission.
  */
-static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
+void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
@@ -1676,6 +1680,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_fackets(sk);
 
 	tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
 	tp->snd_cwnd_cnt = 0;
@@ -1754,6 +1759,7 @@ void tcp_enter_loss(struct sock *sk, int how)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_fackets(sk);
 
 	tp->reordering = min_t(unsigned int, tp->reordering,
 					     sysctl_tcp_reordering);
@@ -2309,7 +2315,7 @@ static void tcp_mtup_probe_success(struct sock *sk, struct sk_buff *skb)
  * It does _not_ decide what to send, it is made in function
  * tcp_xmit_retransmit_queue().
  */
-static void
+void
 tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2323,8 +2329,11 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	if (!tp->packets_out)
 		tp->sacked_out = 0;
 
-	if (WARN_ON(!tp->sacked_out && tp->fackets_out))
+	if (WARN_ON(!tp->sacked_out && tp->fackets_out)) {
+		printk(KERN_ERR "TCP %d\n", tcp_is_reno(tp));
+		tcp_print_queue(sk);
 		tp->fackets_out = 0;
+	}
 
 	/* Now state machine starts.
 	 * A. ECE, hence prohibit cwnd undoing, the reduction is required. */
@@ -2334,6 +2343,8 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	/* B. In all the states check for reneging SACKs. */
 	if (tp->sacked_out && tcp_check_sack_reneging(sk))
 		return;
+	
+	tcp_verify_fackets(sk);
 
 	/* C. Process data loss notification, provided it is valid. */
 	if ((flag&FLAG_DATA_LOST) &&
@@ -2573,7 +2584,7 @@ static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
  * is before the ack sequence we can discard it as it's confirmed to have
  * arrived at the other end.
  */
-static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
+int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2695,6 +2706,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 			ca_ops->pkts_acked(sk, pkts_acked, rtt_us);
 		}
 	}
+	tcp_verify_fackets(sk);
+
 
 #if FASTRETRANS_DEBUG > 0
 	BUG_TRAP((int)tp->sacked_out >= 0);
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7fed0a6..b5877d0 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -108,6 +108,109 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
 	.lhash_wait  = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
 
+void tcp_print_queue(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	char s[50+1];
+	char i[50+1];
+	int idx = 0;
+	u32 hs = tp->highest_sack;
+	
+	if (!tp->sacked_out)
+		hs = tp->snd_una;
+	
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+		
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+			if (skb->len < tp->mss_cache)
+				s[idx] = 's';
+			else
+				s[idx] = 'S';
+		} else {
+			s[idx] = '-';
+		}
+		if ((TCP_SKB_CB(skb)->seq == hs) && (tp->fastpath_skb_hint == skb))
+			i[idx] = 'x';
+		else if (tp->fastpath_skb_hint == skb)
+			i[idx] = 'f';
+		else if (TCP_SKB_CB(skb)->seq == hs)
+			i[idx] = 'h';
+		else
+			i[idx] = ' ';
+			
+		if (++idx >= 50) {
+			s[idx] = 0;
+			i[idx] = 0;
+			printk(KERN_ERR "TCP wq(s) %s\n", s);
+			printk(KERN_ERR "TCP wq(i) %s\n", i);
+			idx = 0;
+		}
+	}
+	if (idx) {
+		s[idx] = '<';
+		s[idx+1] = 0;
+		i[idx] = '<';
+		i[idx+1] = 0;
+		printk(KERN_ERR "TCP wq(s) %s\n", s);
+		printk(KERN_ERR "TCP wq(i) %s\n", i);
+	}
+	printk(KERN_ERR "s%u f%u (%u) p%u seq: su%u hs%u sn%u (%u)\n",
+		tp->sacked_out, tp->fackets_out, tp->fastpath_cnt_hint,
+		tp->packets_out,
+		tp->snd_una, tp->highest_sack, tp->snd_nxt,
+		((tp->fastpath_skb_hint == NULL) ? 0 :
+			TCP_SKB_CB(tp->fastpath_skb_hint)->seq));
+}
+
+void tcp_verify_fackets(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	u32 fackets = 0;
+	int hisack_valid = 0;
+	int err = 0;
+	
+	if (tcp_is_reno(tp))
+		return;
+	
+	if (!tp->sacked_out) {
+		if (WARN_ON(tp->fackets_out))
+			err = 1;
+		else if (tp->fastpath_skb_hint == NULL)
+			return;
+	}
+	
+	/* ...expensive processing here... */
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		if (tp->sacked_out && (TCP_SKB_CB(skb)->seq == tp->highest_sack)) {
+			hisack_valid = 1;
+			if (WARN_ON(tp->fackets_out != fackets + tcp_skb_pcount(skb)))
+				err = 1;
+		}
+
+		if (skb == tp->fastpath_skb_hint)
+			if (WARN_ON(fackets != tp->fastpath_cnt_hint))
+				err = 1;
+
+		if (WARN_ON((fackets > tp->fackets_out) && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)))
+			err = 1;
+
+		fackets += tcp_skb_pcount(skb);
+	}
+	
+	if (WARN_ON(tp->sacked_out && !hisack_valid))
+		err = 1;
+	
+	if (err)
+		tcp_print_queue(sk);
+}
+
 static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
 {
 	return inet_csk_get_port(&tcp_hashinfo, sk, snum,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5329675..3aba96a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -773,6 +773,8 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 			tcp_verify_left_out(tp);
 		}
 		tcp_adjust_fackets_out(tp, skb, diff);
+		
+		tcp_verify_fackets(sk);
 	}
 
 	/* Link BUFF into the send queue. */
@@ -1688,7 +1690,7 @@ u32 __tcp_select_window(struct sock *sk)
 }
 
 /* Attempt to collapse two adjacent SKB's during retransmission. */
-static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int mss_now)
+void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int mss_now)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *next_skb = tcp_write_queue_next(sk, skb);
@@ -1766,6 +1768,8 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 			tp->fastpath_cnt_hint -= tcp_skb_pcount(skb);
 		}
 
+		tcp_verify_fackets(sk);
+
 		sk_stream_free_skb(sk, next_skb);
 	}
 }
-- 
1.5.0.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 12:34   ` Ilpo Järvinen
@ 2007-10-03 12:48     ` Cedric Le Goater
  2007-10-03 13:19       ` Ilpo Järvinen
  0 siblings, 1 reply; 20+ messages in thread
From: Cedric Le Goater @ 2007-10-03 12:48 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, Netdev

Ilpo Järvinen wrote:
> On Wed, 3 Oct 2007, Cedric Le Goater wrote:
> 
>> Ilpo Järvinen wrote:
>>> Sacktag fastpath_cnt_hint seems to be very tricky to get right...
>>> I suppose this one fixes Cedric's case. I cannot say for sure    
>>> until there is something more definite indication of
>>> tcp_retrans_try_collapse origin than what the simple late WARN_ON
>>> gave for us. ...Especially since it's non-trivial to have skb
>>> hint "correctly" positioned in the write_queue while still ending
>>> up calling that function. However, considering how difficult it
>>> seems to be for Cedric to reproduce, it might well be this one.
>>>
>>> In addition, I noticed another reset which wasn't previously   
>>> converted to WARN_ON, so doing that now. Boot + simple xfer
>>> tested. Please apply to net-2.6.24.
>> I'm dropping the previous patches you sent me and switching to this patchset. 
>> right ?
> 
> Yes you can do that... However, there are two ways forward:
> 
> 1) Drop and test with this patchset long enough to verify it's gone...
> 2) No dropping and get the more exact trace by reproducing, which can 
>    point out to tcp_retrans_try_collapse confirming the source of the
>    bug or revealing yet another bug...
> 
> The first one has one drawback, it cannot prove the fix very well since 
> the bug could just not occur by chance... Path 2 would clearly show the 
> place from where the problem originates because we will know that it got 
> triggered! I personally would prefer path 2 but whether you want to go for 
> that depends on the time you want to invest in it...
> 
> ...I rediffed the tcp_verify_fackets patch too (below) just in case it 
> would be something else in you case and you choose path 1 (put it on top 
> of this patchset, applies with some offsets). In case the problem is gone, 
> it shouldn't trigger and if it does, we'll have another bug caught.

I have a spare node so I'm starting 2) with the 3 patches you sent and that
last one which applied fine. all of them on a fresh git pull of net-2.6.24

> Anyway, thanks for ccing right persons and netdev right from the 
> beginning.

thanks to git ! :) 

C.
 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 12:48     ` Cedric Le Goater
@ 2007-10-03 13:19       ` Ilpo Järvinen
  2007-10-03 14:05         ` Cedric Le Goater
  0 siblings, 1 reply; 20+ messages in thread
From: Ilpo Järvinen @ 2007-10-03 13:19 UTC (permalink / raw)
  To: Cedric Le Goater; +Cc: David Miller, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2003 bytes --]

On Wed, 3 Oct 2007, Cedric Le Goater wrote:

> Ilpo Järvinen wrote:
> > On Wed, 3 Oct 2007, Cedric Le Goater wrote:
> > 
> >> I'm dropping the previous patches you sent me and switching to this patchset. 
> >> right ?
> > 
> > Yes you can do that... However, there are two ways forward:
> > 
> > 1) Drop and test with this patchset long enough to verify it's gone...
> > 2) No dropping and get the more exact trace by reproducing, which can 
> >    point out to tcp_retrans_try_collapse confirming the source of the
> >    bug or revealing yet another bug...
> > 
> > The first one has one drawback, it cannot prove the fix very well since 
> > the bug could just not occur by chance... Path 2 would clearly show the 
> > place from where the problem originates because we will know that it got 
> > triggered! I personally would prefer path 2 but whether you want to go for 
> > that depends on the time you want to invest in it...
> > 
> > ...I rediffed the tcp_verify_fackets patch too (below) just in case it 
> > would be something else in you case and you choose path 1 (put it on top 
> > of this patchset, applies with some offsets). In case the problem is gone, 
> > it shouldn't trigger and if it does, we'll have another bug caught.
> 
> I have a spare node so I'm starting 2) with the 3 patches you sent and that
> last one which applied fine.

Ah, that's path 1) then... Since you seem to have enough time, I would say 
that the path 1 is good as well and bugs unrelated to the fix will show up 
there too...

I should have stated it explicitly that with path 2 those 3 patches should 
not be applied because the aim is not a fix but reproducal. Path 2 was 
intentionally left without the potentional fix as then nice backtrace 
informs when we can stop trying (which would hopefully occurred 
pretty soon) :-). But lets discard that path 2...

> all of them on a fresh git pull of net-2.6.24

That's fine, they're pretty well in sync (mm and net-2.6.24, and 
soon 2.6.24-rcs too).

-- 
 i.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 13:19       ` Ilpo Järvinen
@ 2007-10-03 14:05         ` Cedric Le Goater
  2007-10-03 14:22           ` Ilpo Järvinen
  0 siblings, 1 reply; 20+ messages in thread
From: Cedric Le Goater @ 2007-10-03 14:05 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, Netdev

Ilpo Järvinen wrote:
> On Wed, 3 Oct 2007, Cedric Le Goater wrote:
> 
>> Ilpo Järvinen wrote:
>>> On Wed, 3 Oct 2007, Cedric Le Goater wrote:
>>>
>>>> I'm dropping the previous patches you sent me and switching to this patchset. 
>>>> right ?
>>> Yes you can do that... However, there are two ways forward:
>>>
>>> 1) Drop and test with this patchset long enough to verify it's gone...
>>> 2) No dropping and get the more exact trace by reproducing, which can 
>>>    point out to tcp_retrans_try_collapse confirming the source of the
>>>    bug or revealing yet another bug...
>>>
>>> The first one has one drawback, it cannot prove the fix very well since 
>>> the bug could just not occur by chance... Path 2 would clearly show the 
>>> place from where the problem originates because we will know that it got 
>>> triggered! I personally would prefer path 2 but whether you want to go for 
>>> that depends on the time you want to invest in it...
>>>
>>> ...I rediffed the tcp_verify_fackets patch too (below) just in case it 
>>> would be something else in you case and you choose path 1 (put it on top 
>>> of this patchset, applies with some offsets). In case the problem is gone, 
>>> it shouldn't trigger and if it does, we'll have another bug caught.
>> I have a spare node so I'm starting 2) with the 3 patches you sent and that
>> last one which applied fine.
> 
> Ah, that's path 1) then... Since you seem to have enough time, I would say 
> that the path 1 is good as well and bugs unrelated to the fix will show up 
> there too...

arg. yes. sorry for the confusion.

> I should have stated it explicitly that with path 2 those 3 patches should 
> not be applied because the aim is not a fix but reproducal. Path 2 was 
> intentionally left without the potentional fix as then nice backtrace 
> informs when we can stop trying (which would hopefully occurred 
> pretty soon) :-).  But lets discard that path 2...

I have 2 spare nodes so i'll run both. 1) is on already without any issues
i'm just compiling 2)

I usually work on -mm, so what would be interesting for me is to have what you 
need in net-2.6.24 which is getting pulled in -mm by andrew. then, if you need 
an extra patch for verbosity, that's fine, i'll include it in my usual patchset.

Cheers,

C.
   
>> all of them on a fresh git pull of net-2.6.24
> 
> That's fine, they're pretty well in sync (mm and net-2.6.24, and 
> soon 2.6.24-rcs too).
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 14:05         ` Cedric Le Goater
@ 2007-10-03 14:22           ` Ilpo Järvinen
  2007-10-03 14:58             ` Cedric Le Goater
  0 siblings, 1 reply; 20+ messages in thread
From: Ilpo Järvinen @ 2007-10-03 14:22 UTC (permalink / raw)
  To: Cedric Le Goater; +Cc: David Miller, Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1480 bytes --]

On Wed, 3 Oct 2007, Cedric Le Goater wrote:

> Ilpo Järvinen wrote:
> > 
> > Ah, that's path 1) then... Since you seem to have enough time, I would say 
> > that the path 1 is good as well and bugs unrelated to the fix will show up 
> > there too...
> 
> arg. yes. sorry for the confusion.
> 
> > I should have stated it explicitly that with path 2 those 3 patches should 
> > not be applied because the aim is not a fix but reproducal. Path 2 was 
> > intentionally left without the potentional fix as then nice backtrace 
> > informs when we can stop trying (which would hopefully occurred 
> > pretty soon) :-).  But lets discard that path 2...
> 
> I have 2 spare nodes so i'll run both. 1) is on already without any issues
> i'm just compiling 2)

Thanks a lot. :-)

> I usually work on -mm, so what would be interesting for me is to have what you 
> need in net-2.6.24 which is getting pulled in -mm by andrew. then, if 
> you need an extra patch for verbosity, that's fine, i'll include it in 
> my usual patchset.

Ah, I'm sorry about the subject and the extra work it caused, it was 
meant for DaveM only, didn't realize at that time it would be 
meaningful to you as well, thus couldn't warn you back then... Testing on 
top of mm would be (/ have been) fine as well... From my point of view 
both mm and net-2.6.24 are pretty much the same (I even verified that 
those patches apply fine on top of rc8-mm2 since I thought that you might 
want to use that one).

-- 
 i.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 14:22           ` Ilpo Järvinen
@ 2007-10-03 14:58             ` Cedric Le Goater
  2007-10-03 14:59               ` Cedric Le Goater
  2007-10-03 15:12               ` Cedric Le Goater
  0 siblings, 2 replies; 20+ messages in thread
From: Cedric Le Goater @ 2007-10-03 14:58 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, Netdev

Ilpo Järvinen wrote:
> On Wed, 3 Oct 2007, Cedric Le Goater wrote:
> 
>> Ilpo Järvinen wrote:
>>> Ah, that's path 1) then... Since you seem to have enough time, I would say 
>>> that the path 1 is good as well and bugs unrelated to the fix will show up 
>>> there too...
>> arg. yes. sorry for the confusion.
>>
>>> I should have stated it explicitly that with path 2 those 3 patches should 
>>> not be applied because the aim is not a fix but reproducal. Path 2 was 
>>> intentionally left without the potentional fix as then nice backtrace 
>>> informs when we can stop trying (which would hopefully occurred 
>>> pretty soon) :-).  But lets discard that path 2...
>> I have 2 spare nodes so i'll run both. 1) is on already without any issues
>> i'm just compiling 2)

Below are the messages I got on 2) right after running ketchup (which does 
a wget www.kernel.org) 

not a warning on 1) with your extra verbose patch.

>> I usually work on -mm, so what would be interesting for me is to have what you 
>> need in net-2.6.24 which is getting pulled in -mm by andrew. then, if 
>> you need an extra patch for verbosity, that's fine, i'll include it in 
>> my usual patchset.
> 
> Ah, I'm sorry about the subject and the extra work it caused, 

no problem, that was a comment for the futur patchset.
 
> it was meant for DaveM only, didn't realize at that time it would be 
> meaningful to you as well, thus couldn't warn you back then... Testing on 
> top of mm would be (/ have been) fine as well... From my point of view 
> both mm and net-2.6.24 are pretty much the same (I even verified that 
> those patches apply fine on top of rc8-mm2 since I thought that you might 
> want to use that one).

He, you might have solved it with 1). If not, I'm keeping the hardware for
you.

Cheers,

C.

WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff8041aa86>] tcp_verify_fackets+0x119/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:198 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff80323bf6>] vgacon_set_cursor_size+0x39/0xd5
 [<ffffffff8041aad0>] tcp_verify_fackets+0x163/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

TCP wq(s) -S--SSS--------------------<
TCP wq(i)       hf                   <
s4 f9 (47) p9 seq: su3460595874 hs3460607374 sn3460659962 (3460608822)
WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff80323bf6>] vgacon_set_cursor_size+0x39/0xd5
 [<ffffffff8041aa86>] tcp_verify_fackets+0x119/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:198 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff80323bf6>] vgacon_set_cursor_size+0x39/0xd5
 [<ffffffff8041aad0>] tcp_verify_fackets+0x163/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

TCP wq(s) -S---SSSSSS------------------<
TCP wq(i)           hf                 <
s7 f12 (47) p12 seq: su3460595874 hs3460611718 sn3460659962 (3460613166)
WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff8041aa86>] tcp_verify_fackets+0x119/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

TCP wq(s) --S--------<
TCP wq(i)   h        <
s1 f7 (19) p8 seq: su3633343602 hs3633352290 sn3633370354 (0)
WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff80323bf6>] vgacon_set_cursor_size+0x39/0xd5
 [<ffffffff8041aa86>] tcp_verify_fackets+0x119/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:198 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff80323bf6>] vgacon_set_cursor_size+0x39/0xd5
 [<ffffffff8041aad0>] tcp_verify_fackets+0x163/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

TCP wq(s) ---S-S-------<
TCP wq(i)      hf      <
s2 f9 (19) p9 seq: su3633343602 hs3633355102 sn3633370354 (3633356550)
WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff80323bf6>] vgacon_set_cursor_size+0x39/0xd5
 [<ffffffff8041aa86>] tcp_verify_fackets+0x119/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:198 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff80323bf6>] vgacon_set_cursor_size+0x39/0xd5
 [<ffffffff8041aad0>] tcp_verify_fackets+0x163/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

TCP wq(s) ----S-SSS-----<
TCP wq(i)         hf    <
s4 f11 (19) p11 seq: su3633343602 hs3633357998 sn3633370354 (3633359446)
WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff80323bf6>] vgacon_set_cursor_size+0x39/0xd5
 [<ffffffff8041aa86>] tcp_verify_fackets+0x119/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:198 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff80323bf6>] vgacon_set_cursor_size+0x39/0xd5
 [<ffffffff8041aad0>] tcp_verify_fackets+0x163/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

TCP wq(s) -----S-SSSS-----<
TCP wq(i)           hf    <
s5 f12 (19) p12 seq: su3633343602 hs3633359446 sn3633370354 (3633360894)



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 14:58             ` Cedric Le Goater
@ 2007-10-03 14:59               ` Cedric Le Goater
  2007-10-03 15:12               ` Cedric Le Goater
  1 sibling, 0 replies; 20+ messages in thread
From: Cedric Le Goater @ 2007-10-03 14:59 UTC (permalink / raw)
  To: Cedric Le Goater; +Cc: Ilpo Järvinen, David Miller, Netdev

Cedric Le Goater wrote:
> Ilpo Järvinen wrote:
>> On Wed, 3 Oct 2007, Cedric Le Goater wrote:
>>
>>> Ilpo Järvinen wrote:
>>>> Ah, that's path 1) then... Since you seem to have enough time, I would say 
>>>> that the path 1 is good as well and bugs unrelated to the fix will show up 
>>>> there too...
>>> arg. yes. sorry for the confusion.
>>>
>>>> I should have stated it explicitly that with path 2 those 3 patches should 
>>>> not be applied because the aim is not a fix but reproducal. Path 2 was 
>>>> intentionally left without the potentional fix as then nice backtrace 
>>>> informs when we can stop trying (which would hopefully occurred 
>>>> pretty soon) :-).  But lets discard that path 2...
>>> I have 2 spare nodes so i'll run both. 1) is on already without any issues
>>> i'm just compiling 2)
> 
> Below are the messages I got on 2) right after running ketchup (which does 
> a wget www.kernel.org) 

and a second run of ketchup gave the following.

cheers,

C.

WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff8041aa86>] tcp_verify_fackets+0x119/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

TCP wq(s) --S------<
TCP wq(i)   h      <
s1 f5 (14) p6 seq: su110259658 hs110265450 sn110278722 (0)
WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:193 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff803250aa>] vgacon_scroll+0x188/0x1dd
 [<ffffffff8041aa86>] tcp_verify_fackets+0x119/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

WARNING: at /home/legoater/linux/2.6.23-rc8-mm2-tcp_fastretrans/net/ipv4/tcp_ipv4.c:198 tcp_verify_fackets()

Call Trace:
 <IRQ>  [<ffffffff803250aa>] vgacon_scroll+0x188/0x1dd
 [<ffffffff8041aad0>] tcp_verify_fackets+0x163/0x237
 [<ffffffff80416e57>] tcp_fragment+0x468/0x4b8
 [<ffffffff804184a5>] tcp_retransmit_skb+0xcf/0x2f4
 [<ffffffff8041878d>] tcp_xmit_retransmit_queue+0xc3/0x31e
 [<ffffffff8041220a>] tcp_fastretrans_alert+0xb36/0xb43
 [<ffffffff80412f0f>] tcp_ack+0x5d3/0x71b
 [<ffffffff80415229>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8025419a>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8041c7ff>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff8041d171>] tcp_v4_rcv+0x61c/0x9a9
 [<ffffffff804017e3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff8040214e>] ip_rcv+0x583/0x5c9
 [<ffffffff8046fe43>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803e2e1c>] netif_receive_skb+0x2cf/0x2f5
 [<ffffffff88042505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803e2fe8>] net_rx_action+0xb8/0x191
 [<ffffffff8023a9b7>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c98c>] call_softirq+0x1c/0x28
 [<ffffffff8020e9c3>] do_softirq+0x3b/0xb8
 [<ffffffff8023aaae>] irq_exit+0x4e/0x50
 [<ffffffff8020e7df>] do_IRQ+0xbd/0xd7
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bce6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80479591>] rest_init+0x55/0x57
 [<ffffffff8063681f>] start_kernel+0x2e0/0x2ec
 [<ffffffff80636134>] _sinittext+0x134/0x13b

TCP wq(s) ---SSS-----<
TCP wq(i)      hf    <
s3 f7 (14) p7 seq: su110259658 hs110268346 sn110278722 (110269794)




C.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 14:58             ` Cedric Le Goater
  2007-10-03 14:59               ` Cedric Le Goater
@ 2007-10-03 15:12               ` Cedric Le Goater
  2007-10-04 10:13                 ` Ilpo Järvinen
  1 sibling, 1 reply; 20+ messages in thread
From: Cedric Le Goater @ 2007-10-03 15:12 UTC (permalink / raw)
  To: Cedric Le Goater; +Cc: Ilpo Järvinen, David Miller, Netdev

Cedric Le Goater wrote:
> Ilpo Järvinen wrote:
>> On Wed, 3 Oct 2007, Cedric Le Goater wrote:
>>
>>> Ilpo Järvinen wrote:
>>>> Ah, that's path 1) then... Since you seem to have enough time, I would say 
>>>> that the path 1 is good as well and bugs unrelated to the fix will show up 
>>>> there too...
>>> arg. yes. sorry for the confusion.
>>>
>>>> I should have stated it explicitly that with path 2 those 3 patches should 
>>>> not be applied because the aim is not a fix but reproducal. Path 2 was 
>>>> intentionally left without the potentional fix as then nice backtrace 
>>>> informs when we can stop trying (which would hopefully occurred 
>>>> pretty soon) :-).  But lets discard that path 2...
>>> I have 2 spare nodes so i'll run both. 1) is on already without any issues
>>> i'm just compiling 2)
> 
> Below are the messages I got on 2) right after running ketchup (which does 
> a wget www.kernel.org) 
> 
> not a warning on 1) with your extra verbose patch.

bummer, I got this one on 1) :(

C.

WARNING: at /home/legoater/linux/net-2.6.24.git/net/ipv4/tcp_input.c:2325 tcp_fastretrans_alert()

Call Trace:
 <IRQ>  [<ffffffff8022ddb6>] __wake_up+0x1f/0x4c
 [<ffffffff803fd9d3>] tcp_ack+0xcee/0x18ac
 [<ffffffff80400764>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8024e8d8>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8040795b>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff804082d5>] tcp_v4_rcv+0x624/0x9b1
 [<ffffffff803ecfa3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff803ed900>] ip_rcv+0x57c/0x5c4
 [<ffffffff8045ae53>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803ce78e>] netif_receive_skb+0x2ba/0x2de
 [<ffffffff88044505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803ce958>] net_rx_action+0xb8/0x191
 [<ffffffff802385cb>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c97c>] call_softirq+0x1c/0x28
 [<ffffffff8020e672>] do_softirq+0x3b/0xb9
 [<ffffffff802386c2>] irq_exit+0x4e/0x50
 [<ffffffff8020e48e>] do_IRQ+0xbe/0xd8
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bcc6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80464e10>] __sched_text_start+0x5f0/0x62b
 [<ffffffff80464e10>] __sched_text_start+0x5f0/0x62b
 [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80464513>] rest_init+0x57/0x59
 [<ffffffff8060c82a>] start_kernel+0x2d1/0x2dd
 [<ffffffff8060c14e>] _sinittext+0x14e/0x155

WARNING: at /home/legoater/linux/net-2.6.24.git/net/ipv4/tcp_input.c:2325 tcp_fastretrans_alert()

Call Trace:
 <IRQ>  [<ffffffff8022ddb6>] __wake_up+0x1f/0x4c
 [<ffffffff803fd9d3>] tcp_ack+0xcee/0x18ac
 [<ffffffff80400764>] tcp_rcv_established+0x61f/0x6df
 [<ffffffff8024e8d8>] __lock_acquire+0x8a1/0xf1b
 [<ffffffff8040795b>] tcp_v4_do_rcv+0x3e/0x394
 [<ffffffff804082d5>] tcp_v4_rcv+0x624/0x9b1
 [<ffffffff803ecfa3>] ip_local_deliver+0x1da/0x2a4
 [<ffffffff803ed900>] ip_rcv+0x57c/0x5c4
 [<ffffffff8045ae53>] packet_rcv_spkt+0x19a/0x1a8
 [<ffffffff803ce78e>] netif_receive_skb+0x2ba/0x2de
 [<ffffffff88044505>] :tg3:tg3_poll+0x65d/0x8a4
 [<ffffffff803ce958>] net_rx_action+0xb8/0x191
 [<ffffffff802385cb>] __do_softirq+0x5f/0xe0
 [<ffffffff8020c97c>] call_softirq+0x1c/0x28
 [<ffffffff8020e672>] do_softirq+0x3b/0xb9
 [<ffffffff802386c2>] irq_exit+0x4e/0x50
 [<ffffffff8020e48e>] do_IRQ+0xbe/0xd8
 [<ffffffff80209cb9>] mwait_idle+0x0/0x4d
 [<ffffffff8020bcc6>] ret_from_intr+0x0/0xf
 <EOI>  [<ffffffff80464e10>] __sched_text_start+0x5f0/0x62b
 [<ffffffff80464e10>] __sched_text_start+0x5f0/0x62b
 [<ffffffff80209cfc>] mwait_idle+0x43/0x4d
 [<ffffffff802099fb>] enter_idle+0x22/0x24
 [<ffffffff80209c4f>] cpu_idle+0x9d/0xc0
 [<ffffffff80464513>] rest_init+0x57/0x59
 [<ffffffff8060c82a>] start_kernel+0x2d1/0x2dd



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-03 15:12               ` Cedric Le Goater
@ 2007-10-04 10:13                 ` Ilpo Järvinen
  2007-10-04 14:53                   ` Cedric Le Goater
  0 siblings, 1 reply; 20+ messages in thread
From: Ilpo Järvinen @ 2007-10-04 10:13 UTC (permalink / raw)
  To: Cedric Le Goater; +Cc: David Miller, Netdev

On Wed, 3 Oct 2007, Cedric Le Goater wrote:

> Cedric Le Goater wrote:
> > 
> > Below are the messages I got on 2) right after running ketchup (which does 
> > a wget www.kernel.org) 

Oops, those tcp_fragment WARNINGs in the other mail were due to bug in 
the debug patch as it called verify too early in there (before queue was 
adjusted, no wonder it finds state inconsistent at that point, fixed that)...

...So please discard all old debug patches, they're all broken in this 
respect... :-(

> > not a warning on 1) with your extra verbose patch.
> 
> bummer, I got this one on 1) :(
>
> WARNING: at /home/legoater/linux/net-2.6.24.git/net/ipv4/tcp_input.c:2325 tcp_fastretrans_alert()
> Call Trace:
>  <IRQ>  [<ffffffff8022ddb6>] __wake_up+0x1f/0x4c
>  [<ffffffff803fd9d3>] tcp_ack+0xcee/0x18ac
>  [<ffffffff80400764>] tcp_rcv_established+0x61f/0x6df

...I just wonder why that's the first place where it occurs... Can you try 
the debug patch below (fixed verify place in tcp_fragment/collapse, added 
some of them to narrow it down, and handled GSO more user friendly way in 
the printout). Put it on top of those three patches (mm should be fine :-)).
...I wish the verify triggers way before the fastretrans trap (for some 
reason it didn't do that in the quoted trace, maybe I had some verifys 
missing in that old patch or something)...


-- 
 i.

 include/net/tcp.h     |    3 +
 net/ipv4/tcp_input.c  |   41 +++++++++++++++---
 net/ipv4/tcp_ipv4.c   |  113 +++++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp_output.c |    4 +-
 4 files changed, 154 insertions(+), 7 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 991ccdc..54a0d91 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -43,6 +43,9 @@
 
 #include <linux/seq_file.h>
 
+extern void tcp_verify_fackets(struct sock *sk);
+extern void tcp_print_queue(struct sock *sk);
+
 extern struct inet_hashinfo tcp_hashinfo;
 
 extern atomic_t tcp_orphan_count;
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 87c9ef5..7707b1d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1140,7 +1140,7 @@ static int tcp_check_dsack(struct tcp_sock *tp, struct sk_buff *ack_skb,
 	return dup_sack;
 }
 
-static int
+int
 tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_una)
 {
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -1159,6 +1159,9 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 	int i;
 	int first_sack_index;
 
+	if (WARN_ON(!tp->sacked_out && tp->fackets_out))
+		tcp_print_queue(sk);
+
 	if (!tp->sacked_out) {
 		if (WARN_ON(tp->fackets_out))
 			tp->fackets_out = 0;
@@ -1421,6 +1424,7 @@ tcp_sacktag_write_queue(struct sock *sk, struct sk_buff *ack_skb, u32 prior_snd_
 			}
 		}
 	}
+	tcp_verify_fackets(sk);
 
 	/* Check for lost retransmit. This superb idea is
 	 * borrowed from "ratehalving". Event "C".
@@ -1633,13 +1637,14 @@ void tcp_enter_frto(struct sock *sk)
 	tcp_set_ca_state(sk, TCP_CA_Disorder);
 	tp->high_seq = tp->snd_nxt;
 	tp->frto_counter = 1;
+	tcp_verify_fackets(sk);
 }
 
 /* Enter Loss state after F-RTO was applied. Dupack arrived after RTO,
  * which indicates that we should follow the traditional RTO recovery,
  * i.e. mark everything lost and do go-back-N retransmission.
  */
-static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
+void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *skb;
@@ -1676,6 +1681,7 @@ static void tcp_enter_frto_loss(struct sock *sk, int allowed_segments, int flag)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_fackets(sk);
 
 	tp->snd_cwnd = tcp_packets_in_flight(tp) + allowed_segments;
 	tp->snd_cwnd_cnt = 0;
@@ -1754,6 +1760,7 @@ void tcp_enter_loss(struct sock *sk, int how)
 		}
 	}
 	tcp_verify_left_out(tp);
+	tcp_verify_fackets(sk);
 
 	tp->reordering = min_t(unsigned int, tp->reordering,
 					     sysctl_tcp_reordering);
@@ -2309,7 +2316,7 @@ static void tcp_mtup_probe_success(struct sock *sk, struct sk_buff *skb)
  * It does _not_ decide what to send, it is made in function
  * tcp_xmit_retransmit_queue().
  */
-static void
+void
 tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2318,13 +2325,20 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	int do_lost = is_dupack || ((flag&FLAG_DATA_SACKED) &&
 				    (tp->fackets_out > tp->reordering));
 
+	tcp_verify_fackets(sk);
+
 	/* Some technical things:
 	 * 1. Reno does not count dupacks (sacked_out) automatically. */
-	if (!tp->packets_out)
+	if (!tp->packets_out) {
+		WARN_ON(tcp_is_sack(tp) && tp->sacked_out);
 		tp->sacked_out = 0;
+	}
 
-	if (WARN_ON(!tp->sacked_out && tp->fackets_out))
+	if (WARN_ON(!tp->sacked_out && tp->fackets_out)) {
+		printk(KERN_ERR "TCP %d\n", tcp_is_reno(tp));
+		tcp_print_queue(sk);
 		tp->fackets_out = 0;
+	}
 
 	/* Now state machine starts.
 	 * A. ECE, hence prohibit cwnd undoing, the reduction is required. */
@@ -2334,6 +2348,8 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 	/* B. In all the states check for reneging SACKs. */
 	if (tp->sacked_out && tcp_check_sack_reneging(sk))
 		return;
+	
+	tcp_verify_fackets(sk);
 
 	/* C. Process data loss notification, provided it is valid. */
 	if ((flag&FLAG_DATA_LOST) &&
@@ -2390,6 +2406,8 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 		}
 	}
 
+	tcp_verify_fackets(sk);
+
 	/* F. Process state. */
 	switch (icsk->icsk_ca_state) {
 	case TCP_CA_Recovery:
@@ -2405,6 +2423,7 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 		if (!tcp_try_undo_loss(sk)) {
 			tcp_moderate_cwnd(tp);
 			tcp_xmit_retransmit_queue(sk);
+			tcp_verify_fackets(sk);
 			return;
 		}
 		if (icsk->icsk_ca_state != TCP_CA_Open)
@@ -2423,6 +2442,7 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 
 		if (!tcp_time_to_recover(sk)) {
 			tcp_try_to_open(sk, flag);
+			tcp_verify_fackets(sk);
 			return;
 		}
 
@@ -2434,6 +2454,7 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 			/* Restores the reduction we did in tcp_mtup_probe() */
 			tp->snd_cwnd++;
 			tcp_simple_retransmit(sk);
+			tcp_verify_fackets(sk);
 			return;
 		}
 
@@ -2460,11 +2481,13 @@ tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 		tp->snd_cwnd_cnt = 0;
 		tcp_set_ca_state(sk, TCP_CA_Recovery);
 	}
+	tcp_verify_fackets(sk);
 
 	if (do_lost || tcp_head_timedout(sk))
 		tcp_update_scoreboard(sk);
 	tcp_cwnd_down(sk, flag);
 	tcp_xmit_retransmit_queue(sk);
+	tcp_verify_fackets(sk);
 }
 
 /* Read draft-ietf-tcplw-high-performance before mucking
@@ -2573,7 +2596,7 @@ static u32 tcp_tso_acked(struct sock *sk, struct sk_buff *skb)
  * is before the ack sequence we can discard it as it's confirmed to have
  * arrived at the other end.
  */
-static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
+int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	const struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2695,6 +2718,8 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 			ca_ops->pkts_acked(sk, pkts_acked, rtt_us);
 		}
 	}
+	tcp_verify_fackets(sk);
+
 
 #if FASTRETRANS_DEBUG > 0
 	BUG_TRAP((int)tp->sacked_out >= 0);
@@ -3011,6 +3036,7 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 		tcp_ca_event(sk, CA_EVENT_SLOW_ACK);
 	}
 
+	tcp_verify_fackets(sk);
 	/* We passed data and got it acked, remove any soft error
 	 * log. Something worked...
 	 */
@@ -3031,6 +3057,8 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	if (tp->frto_counter)
 		frto_cwnd = tcp_process_frto(sk, flag);
 
+	tcp_verify_fackets(sk);
+
 	if (tcp_ack_is_dubious(sk, flag)) {
 		/* Advance CWND, if state allows this. */
 		if ((flag & FLAG_DATA_ACKED) && !frto_cwnd &&
@@ -3040,6 +3068,7 @@ static int tcp_ack(struct sock *sk, struct sk_buff *skb, int flag)
 	} else {
 		if ((flag & FLAG_DATA_ACKED) && !frto_cwnd)
 			tcp_cong_avoid(sk, ack, prior_in_flight, 1);
+		tcp_verify_fackets(sk);
 	}
 
 	if ((flag & FLAG_FORWARD_PROGRESS) || !(flag&FLAG_NOT_DUP))
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 7fed0a6..6a450a7 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -108,6 +108,119 @@ struct inet_hashinfo __cacheline_aligned tcp_hashinfo = {
 	.lhash_wait  = __WAIT_QUEUE_HEAD_INITIALIZER(tcp_hashinfo.lhash_wait),
 };
 
+void tcp_print_queue(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	char s[50+1];
+	char p[50+1];
+	int idx = 0, i;
+	u32 hs = tp->highest_sack;
+	
+	if (!tp->sacked_out)
+		hs = tp->snd_una;
+	
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+		
+		for (i = 0; i < tcp_skb_pcount(skb); i++) {
+			if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) {
+				if (i)
+					s[idx] = 'g';
+				else if (tcp_skb_pcount(skb) > 1)
+					s[idx] = 'G';
+				else if (skb->len < tp->mss_cache)
+					s[idx] = 's';
+				else
+					s[idx] = 'S';
+			} else {
+				if (i)
+					s[idx] = '-';
+				else
+					s[idx] = '+';
+			}
+			p[idx] = ' ';
+			if (!i) {
+				if ((TCP_SKB_CB(skb)->seq == hs) && (tp->fastpath_skb_hint == skb))
+					p[idx] = 'x';
+				else if (tp->fastpath_skb_hint == skb)
+					p[idx] = 'f';
+				else if (TCP_SKB_CB(skb)->seq == hs)
+					p[idx] = 'h';
+			}
+			
+			if (++idx >= 50) {
+				s[idx] = 0;
+				p[idx] = 0;
+				printk(KERN_ERR "TCP wq(s) %s\n", s);
+				printk(KERN_ERR "TCP wq(i) %s\n", p);
+				idx = 0;
+			}
+		}
+	}
+	if (idx) {
+		s[idx] = '<';
+		s[idx+1] = 0;
+		p[idx] = '<';
+		p[idx+1] = 0;
+		printk(KERN_ERR "TCP wq(s) %s\n", s);
+		printk(KERN_ERR "TCP wq(i) %s\n", p);
+	}
+	printk(KERN_ERR "s%u f%u (%u) p%u seq: su%u hs%u sn%u (%u)\n",
+		tp->sacked_out, tp->fackets_out, tp->fastpath_cnt_hint,
+		tp->packets_out,
+		tp->snd_una, tp->highest_sack, tp->snd_nxt,
+		((tp->fastpath_skb_hint == NULL) ? 0 :
+			TCP_SKB_CB(tp->fastpath_skb_hint)->seq));
+}
+
+void tcp_verify_fackets(struct sock *sk)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	struct sk_buff *skb;
+	u32 fackets = 0;
+	int hisack_valid = 0;
+	int err = 0;
+	
+	if (tcp_is_reno(tp))
+		return;
+	
+	if (!tp->sacked_out) {
+		if (WARN_ON(tp->fackets_out))
+			err = 1;
+		else if (tp->fastpath_skb_hint == NULL)
+			return;
+	}
+	
+	/* ...expensive processing here... */
+	tcp_for_write_queue(skb, sk) {
+		if (skb == tcp_send_head(sk))
+			break;
+
+		if (tp->sacked_out && (TCP_SKB_CB(skb)->seq == tp->highest_sack)) {
+			hisack_valid = 1;
+			if (WARN_ON(tp->fackets_out != fackets + tcp_skb_pcount(skb)))
+				err = 1;
+		}
+
+		if (skb == tp->fastpath_skb_hint)
+			if (WARN_ON(fackets != tp->fastpath_cnt_hint))
+				err = 1;
+
+		if (WARN_ON((fackets > tp->fackets_out) && (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)))
+			err = 1;
+
+		fackets += tcp_skb_pcount(skb);
+	}
+	
+	if (WARN_ON(tp->sacked_out && !hisack_valid))
+		err = 1;
+	
+	if (err)
+		tcp_print_queue(sk);
+}
+
 static int tcp_v4_get_port(struct sock *sk, unsigned short snum)
 {
 	return inet_csk_get_port(&tcp_hashinfo, sk, snum,
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5329675..8a2917f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -778,6 +778,7 @@ int tcp_fragment(struct sock *sk, struct sk_buff *skb, u32 len, unsigned int mss
 	/* Link BUFF into the send queue. */
 	skb_header_release(buff);
 	tcp_insert_write_queue_after(skb, buff, sk);
+	tcp_verify_fackets(sk);
 
 	return 0;
 }
@@ -1688,7 +1689,7 @@ u32 __tcp_select_window(struct sock *sk)
 }
 
 /* Attempt to collapse two adjacent SKB's during retransmission. */
-static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int mss_now)
+void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int mss_now)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct sk_buff *next_skb = tcp_write_queue_next(sk, skb);
@@ -1767,6 +1768,7 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *skb, int m
 		}
 
 		sk_stream_free_skb(sk, next_skb);
+		tcp_verify_fackets(sk);
 	}
 }
 
-- 
1.5.0.6


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24 0/3]: More TCP fixes
  2007-10-04 10:13                 ` Ilpo Järvinen
@ 2007-10-04 14:53                   ` Cedric Le Goater
  2007-10-04 20:20                     ` [PATCH net-2.6.24] [TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed Ilpo Järvinen
  0 siblings, 1 reply; 20+ messages in thread
From: Cedric Le Goater @ 2007-10-04 14:53 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, Netdev

Ilpo Järvinen wrote:
> On Wed, 3 Oct 2007, Cedric Le Goater wrote:
> 
>> Cedric Le Goater wrote:
>>> Below are the messages I got on 2) right after running ketchup (which does 
>>> a wget www.kernel.org) 
> 
> Oops, those tcp_fragment WARNINGs in the other mail were due to bug in 
> the debug patch as it called verify too early in there (before queue was 
> adjusted, no wonder it finds state inconsistent at that point, fixed that)...
> 
> ...So please discard all old debug patches, they're all broken in this 
> respect... :-(
> 
>>> not a warning on 1) with your extra verbose patch.
>> bummer, I got this one on 1) :(
>>
>> WARNING: at /home/legoater/linux/net-2.6.24.git/net/ipv4/tcp_input.c:2325 tcp_fastretrans_alert()
>> Call Trace:
>>  <IRQ>  [<ffffffff8022ddb6>] __wake_up+0x1f/0x4c
>>  [<ffffffff803fd9d3>] tcp_ack+0xcee/0x18ac
>>  [<ffffffff80400764>] tcp_rcv_established+0x61f/0x6df
> 
> ...I just wonder why that's the first place where it occurs... Can you try 
> the debug patch below (fixed verify place in tcp_fragment/collapse, added 
> some of them to narrow it down, and handled GSO more user friendly way in 
> the printout). Put it on top of those three patches (mm should be fine :-)).
> ...I wish the verify triggers way before the fastretrans trap (for some 
> reason it didn't do that in the quoted trace, maybe I had some verifys 
> missing in that old patch or something)...

so here are the results on a net-2.6.24 kernel. 

I've put the patchset here to make sure it's correct: 

	http://legoater.free.fr/patches/2.6.23/net-2.6.24.git-tcp_fastretrans/

and plenty of logs :

	http://legoater.free.fr/patches/2.6.23/net-2.6.24.git-tcp_fastretrans.messages

FYI, config is here :

	http://legoater.free.fr/patches/2.6.23/net-2.6.24.git-tcp_fastretrans.config

C.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-2.6.24] [TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed
  2007-10-04 14:53                   ` Cedric Le Goater
@ 2007-10-04 20:20                     ` Ilpo Järvinen
  2007-10-08  6:39                       ` David Miller
  0 siblings, 1 reply; 20+ messages in thread
From: Ilpo Järvinen @ 2007-10-04 20:20 UTC (permalink / raw)
  To: Cedric Le Goater, David Miller; +Cc: Netdev

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2784 bytes --]

On Thu, 4 Oct 2007, Cedric Le Goater wrote:

> so here are the results on a net-2.6.24 kernel. 
> 
> I've put the patchset here to make sure it's correct: 
> 
> 	http://legoater.free.fr/patches/2.6.23/net-2.6.24.git-tcp_fastretrans/
> 
> and plenty of logs :
> 
> 	http://legoater.free.fr/patches/2.6.23/net-2.6.24.git-tcp_fastretrans.messages

Thanks a lot, the bug itself seems to occur far more common than I 
thought... It just very rarely manifests in significant behavior (requires 
non-trivial ACK/SACK pattern to occur). This same bug is present in the 
soon to be released 2.6.23 and in stable too though it's nearly impossible 
to ever see that by human eye and there's a silent fackets_out "fix up" in 
place due to known (and intentional) fackets_out inaccuracy (and nobody is 
really looking that large number of TCP traces)...

The fix below, you can test it with the debug patch quite easily, all 
those messages should of course vanish :-). In case you want to 
test, please mix those three patches (fixes a real bug too), this one and 
the debug patch together...

Dave should apply both the three patch series fix and this one as well to 
net-2.6.24 (Dave, in case you want me to resubmit, just ask... :-)). I'll 
send one against .23-rc9 soon after this (sadly enough, it will cause a 
conflict...).

-- 
 i.


[PATCH net-2.6.24] [TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed

I used to be under impression that all hints are cleared in
clean_rtx_queue whenever cumulative ACKs occur. Yet, when only
GSO skb was partially ACKed, no hints were reset, therefore
fastpath_cnt_hint must be tweaked too or else it can corrupt
fackets_out. In case there was also a skb that got fully ACKed,
the fastpath_skb_hint is set to NULL which causes a recount for 
fastpath_cnt_hint (the old value won't be accessed anymore),
thus it can safely be decremented without additional checking.

Reported by Cedric Le Goater <clg@fr.ibm.com>, two other bugs
have been already found and patched due to that report, and
third one (unrelated to this fix) is under evaluation
currently :-)...

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
---
 net/ipv4/tcp_input.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 87c9ef5..4268cd1 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2674,6 +2674,9 @@ static int tcp_clean_rtx_queue(struct sock *sk, s32 *seq_rtt_p)
 		tcp_rearm_rto(sk);
 
 		tp->fackets_out -= min(pkts_acked, tp->fackets_out);
+		/* hint's skb might be NULL but we don't need to care */
+		tp->fastpath_cnt_hint -= min_t(u32, pkts_acked,
+					       tp->fastpath_cnt_hint);
 		if (tcp_is_reno(tp))
 			tcp_remove_reno_sacks(sk, pkts_acked);
 
-- 
1.5.0.6

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic
  2007-10-03 11:00 ` [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic Ilpo Järvinen
  2007-10-03 11:00   ` [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap Ilpo Järvinen
@ 2007-10-08  6:36   ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-10-08  6:36 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: legoater, netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed,  3 Oct 2007 14:00:16 +0300

> 1) Passing wrong skb to tcp_adjust_fackets_out could corrupt
> fastpath_cnt_hint as tcp_skb_pcount(next_skb) is not included
> to it if hint points exactly to the next_skb (it's lagging
> behind, see sacktag).
> 
> 2) When fastpath_skb_hint is put backwards to avoid dangling
> skb reference, the skb's pcount must also be removed from count
> (not included like above).
> 
> Reported by Cedric Le Goater <legoater@free.fr>
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied, thanks Ilpo.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap
  2007-10-03 11:00   ` [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap Ilpo Järvinen
  2007-10-03 11:00     ` [PATCH 3/3] [TCP]: "Annotate" another fackets_out state reset Ilpo Järvinen
@ 2007-10-08  6:37     ` David Miller
  1 sibling, 0 replies; 20+ messages in thread
From: David Miller @ 2007-10-08  6:37 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: legoater, netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed,  3 Oct 2007 14:00:17 +0300

> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH 3/3] [TCP]: "Annotate" another fackets_out state reset
  2007-10-03 11:00     ` [PATCH 3/3] [TCP]: "Annotate" another fackets_out state reset Ilpo Järvinen
@ 2007-10-08  6:38       ` David Miller
  0 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2007-10-08  6:38 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: legoater, netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Wed,  3 Oct 2007 14:00:18 +0300

> This should no longer be necessary because fackets_out is
> accurate. It indicates bugs elsewhere, thus report it.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>

Applied, thanks.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-2.6.24] [TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed
  2007-10-04 20:20                     ` [PATCH net-2.6.24] [TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed Ilpo Järvinen
@ 2007-10-08  6:39                       ` David Miller
  0 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2007-10-08  6:39 UTC (permalink / raw)
  To: ilpo.jarvinen; +Cc: clg, netdev

From: "Ilpo_Järvinen" <ilpo.jarvinen@helsinki.fi>
Date: Thu, 4 Oct 2007 23:20:34 +0300 (EEST)

> Dave should apply both the three patch series fix and this one as well to 
> net-2.6.24 (Dave, in case you want me to resubmit, just ask... :-)). I'll 
> send one against .23-rc9 soon after this (sadly enough, it will cause a 
> conflict...).

Ok, all the net-2.6.24 bits are in, I'll take care of -stable...

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2007-10-08  6:39 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-03 11:00 [PATCH net-2.6.24 0/3]: More TCP fixes Ilpo Järvinen
2007-10-03 11:00 ` [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic Ilpo Järvinen
2007-10-03 11:00   ` [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap Ilpo Järvinen
2007-10-03 11:00     ` [PATCH 3/3] [TCP]: "Annotate" another fackets_out state reset Ilpo Järvinen
2007-10-08  6:38       ` David Miller
2007-10-08  6:37     ` [PATCH 2/3] [TCP]: Comment fastpath_cnt_hint off-by-one trap David Miller
2007-10-08  6:36   ` [PATCH 1/3] [TCP]: Fix two off-by-one errors in fackets_out adjusting logic David Miller
2007-10-03 11:58 ` [PATCH net-2.6.24 0/3]: More TCP fixes Cedric Le Goater
2007-10-03 12:34   ` Ilpo Järvinen
2007-10-03 12:48     ` Cedric Le Goater
2007-10-03 13:19       ` Ilpo Järvinen
2007-10-03 14:05         ` Cedric Le Goater
2007-10-03 14:22           ` Ilpo Järvinen
2007-10-03 14:58             ` Cedric Le Goater
2007-10-03 14:59               ` Cedric Le Goater
2007-10-03 15:12               ` Cedric Le Goater
2007-10-04 10:13                 ` Ilpo Järvinen
2007-10-04 14:53                   ` Cedric Le Goater
2007-10-04 20:20                     ` [PATCH net-2.6.24] [TCP]: Fix fastpath_cnt_hint when GSO skb is partially ACKed Ilpo Järvinen
2007-10-08  6:39                       ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).