netdev.vger.kernel.org archive mirror
* Two TCP/IP warnings in 2.6.25-rc7-git1
@ 2008-03-31 21:05 Arjan van de Ven
  2008-04-03  5:21 ` Three " Arjan van de Ven
  0 siblings, 1 reply; 8+ messages in thread
From: Arjan van de Ven @ 2008-03-31 21:05 UTC (permalink / raw)
  To: NetDev; +Cc: ilpo.jarvinen, David Miller

Hi,

kerneloops.org still has 3 TCP/IP-related warnings in the top 15 of oopses/warnings
that users (mostly of Fedora 9 alpha/beta, but also LKML and netdev) see.
All three have been seen as recently as 2.6.25-rc7-git1, but seem to date back to around
the 2.6.24-rc4 era.

Number 1 is a WARN_ON in tcp_ack(), there are several backtraces at
http://www.kerneloops.org/search.php?search=tcp_ack but the core
comes down to the following:

tcp_ack
tcp_rcv_established
tcp_v4_do_rcv
tcp_v4_rcv
ip_local_deliver_finish
ip_local_deliver
ip_rcv_finish
ip_rcv
netif_receive_skb
tg3_poll or nv_napi_poll or the e1000 equivalent
net_rx_action
__do_softirq
do_softirq
irq_exit
do_IRQ


Number 2 is a WARN_ON in tcp_enter_frto(), backtraces at
http://www.kerneloops.org/search.php?search=tcp_enter_frto
the core backtrace comes down to

tcp_enter_frto
tcp_write_timer
run_timer_softirq
__do_softirq
do_softirq


Number 3 is a WARN_ON in tcp_mark_head_lost(), backtraces at
http://www.kerneloops.org/search.php?search=tcp_mark_head_lost
the core backtrace comes down to

tcp_mark_head_lost
tcp_ack
tcp_rcv_established
tcp_v4_do_rcv
tcp_v4_rcv
ip_local_deliver_finish
ip_local_deliver
ip_rcv_finish
ip_rcv
netif_receive_skb
tg3_poll or nv_napi_poll
net_rx_action
__do_softirq


The URLs link to several mailing list/bugzilla postings for these.
Note that not all backtraces have such a posting; many of them
are collected by an automated daemon and submitted directly from testers'/users'
systems to the database.


* Re: Three TCP/IP warnings in 2.6.25-rc7-git1
  2008-03-31 21:05 Two TCP/IP warnings in 2.6.25-rc7-git1 Arjan van de Ven
@ 2008-04-03  5:21 ` Arjan van de Ven
  2008-04-03  5:28   ` David Miller
  0 siblings, 1 reply; 8+ messages in thread
From: Arjan van de Ven @ 2008-04-03  5:21 UTC (permalink / raw)
  To: NetDev; +Cc: ilpo.jarvinen, David Miller

Arjan van de Ven wrote:
> Hi,
> 
> kerneloops.org still has 3 TCP/IP-related warnings in the top 15 of
> oopses/warnings
> that users (mostly of Fedora 9 alpha/beta, but also LKML and netdev) see.
> All three have been seen as recently as 2.6.25-rc7-git1, but seem to date
> back to around
> the 2.6.24-rc4 era.


the warnings below are still present in 2.6.25-rc8.....

It would be not so nice to ship a 2.6.25 that triggers these frequently,
it has the risk of causing many user bugreports to this list and other
places...

> 
> Number 1 is a WARN_ON in tcp_ack(), there are several backtraces at
> http://www.kerneloops.org/search.php?search=tcp_ack but the core
> comes down to the following:
> 
> tcp_ack
> tcp_rcv_established
> tcp_v4_do_rcv
> tcp_v4_rcv
> ip_local_deliver_finish
> ip_local_deliver
> ip_rcv_finish
> ip_rcv
> netif_receive_skb
> tg3_poll or nv_napi_poll or the e1000 equivalent
> net_rx_action
> __do_softirq
> do_softirq
> irq_exit
> do_IRQ
> 
> 
> Number 2 is a WARN_ON in tcp_enter_frto(), backtraces at
> http://www.kerneloops.org/search.php?search=tcp_enter_frto
> the core backtrace comes down to
> 
> tcp_enter_frto
> tcp_write_timer
> run_timer_softirq
> __do_softirq
> do_softirq
> 
> 
> Number 3 is a WARN_ON in tcp_mark_head_lost(), backtraces at
> http://www.kerneloops.org/search.php?search=tcp_mark_head_lost
> the core backtrace comes down to
> 
> tcp_mark_head_lost
> tcp_ack
> tcp_rcv_established
> tcp_v4_do_rcv
> tcp_v4_rcv
> ip_local_deliver_finish
> ip_local_deliver
> ip_rcv_finish
> ip_rcv
> netif_receive_skb
> tg3_poll or nv_napi_poll
> net_rx_action
> __do_softirq
> 
> 
> The URLs link to several mailing list/bugzilla postings for these.
> Note that not all backtraces have such a posting; many of them
> are collected by an automated daemon and submitted directly from
> testers'/users'
> systems to the database.
> 



* Re: Three TCP/IP warnings in 2.6.25-rc7-git1
  2008-04-03  5:21 ` Three " Arjan van de Ven
@ 2008-04-03  5:28   ` David Miller
  2008-04-03  5:40     ` Arjan van de Ven
  2008-04-03  5:56     ` Denys Fedoryshchenko
  0 siblings, 2 replies; 8+ messages in thread
From: David Miller @ 2008-04-03  5:28 UTC (permalink / raw)
  To: arjan; +Cc: netdev, ilpo.jarvinen

From: Arjan van de Ven <arjan@linux.intel.com>
Date: Wed, 02 Apr 2008 22:21:09 -0700

> It would be not so nice to ship a 2.6.25 that triggers these frequently,
> it has the risk of causing many user bugreports to this list and other
> places...

I wouldn't call it frequently, on at least an individual basis,
because nobody we try to get feedback from can reproduce this
reliably.  That's the only reason this isn't fixed yet, despite
all of Ilpo's efforts.



* Re: Three TCP/IP warnings in 2.6.25-rc7-git1
  2008-04-03  5:28   ` David Miller
@ 2008-04-03  5:40     ` Arjan van de Ven
  2008-04-03  5:56     ` Denys Fedoryshchenko
  1 sibling, 0 replies; 8+ messages in thread
From: Arjan van de Ven @ 2008-04-03  5:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ilpo.jarvinen

David Miller wrote:
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Wed, 02 Apr 2008 22:21:09 -0700
> 
>> It would be not so nice to ship a 2.6.25 that triggers these frequently,
>> it has the risk of causing many user bugreports to this list and other
>> places...
> 
> I wouldn't call it frequently, on at least an individual basis,
> because nobody we try to get feedback from can reproduce this
> reliably.  That's the only reason this isn't fixed yet, despite
> all of Ilpo's efforts.

(first of all I'm not trying to say you guys aren't trying; I know chasing such stuff is
really hard; I'm just trying to help by giving information on when and where it happens, and noting that it hasn't been
reported prior to 2.6.24-rc4, for example)

What I meant by frequently is that the fedora rawhide users have seen this 24 times
in the last 7 days. The Fedora rawhide userbase is a lot smaller than the total userbase of
what will be the 2.6.25 kernel, so I suspect that the total number of people who'll find
this at least once when 2.6.25 goes out is likely to be much higher...
(to counter that.. it's a WARN_ON so it's a good question if those people will actually see it and
then report it)

I didn't mean it in the sense that one person sees it all the time.
(Which would obviously be MUCH nicer debugging-wise)

If there's a way to add more WARN_ONs to suspect places, that would work; via kerneloops.org
there'll be at least statistical information on whether those places hit (and possibly they can be correlated
to these warnings happening). This doesn't mean you'll have someone who can try a fix and say the problem goes away,
but it does mean the test base for this is a lot wider... e.g. when the fish are rare, cast a wider net.


* Re: Three TCP/IP warnings in 2.6.25-rc7-git1
  2008-04-03  5:28   ` David Miller
  2008-04-03  5:40     ` Arjan van de Ven
@ 2008-04-03  5:56     ` Denys Fedoryshchenko
  2008-04-03  9:53       ` Ilpo Järvinen
  1 sibling, 1 reply; 8+ messages in thread
From: Denys Fedoryshchenko @ 2008-04-03  5:56 UTC (permalink / raw)
  To: David Miller, arjan; +Cc: netdev, ilpo.jarvinen

I can reproduce it within a day on my loaded proxies. I have:

[223178.376188] WARNING: at net/ipv4/tcp_input.c:2532 tcp_ack+0xd83/0x17bd()
and
[273139.505308] WARNING: at net/ipv4/tcp_input.c:2173
tcp_mark_head_lost+0x11e/0x126()

These two... maybe the third as well, but I haven't noticed it yet.
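
For anyone tallying such reports by hand, here is a minimal sketch in userspace C of pulling the source file, line and function out of a WARNING line like the two above, so that repeated hits can be counted per site. The struct warn_site type and the parse_warning()/same_site() helpers are hypothetical names made up for this illustration; they are not part of the kernel or of kerneloops.org, and the format string assumes exactly the layout shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one parsed warning site (not a kernel type). */
struct warn_site {
    char file[128];      /* e.g. "net/ipv4/tcp_input.c" */
    int line;            /* source line of the WARN_ON */
    char func[64];       /* function name from "func+0xoff/0xsize" */
    unsigned int offset; /* offset into the function */
    unsigned int size;   /* size of the function */
};

/*
 * Parse "[timestamp] WARNING: at file:line func+0xoff/0xsize()".
 * A space in the format matches any run of whitespace, so a report
 * that was wrapped before the function name is accepted as well.
 * Returns 1 on success, 0 if the text does not have this layout.
 */
static int parse_warning(const char *text, struct warn_site *w)
{
    assert(text != NULL && w != NULL);
    return sscanf(text, " [%*[^]]] WARNING: at %127[^:]:%d %63[^+]+0x%x/0x%x",
                  w->file, &w->line, w->func, &w->offset, &w->size) == 5;
}

/* Two reports hit the same site if file and line agree. */
static int same_site(const struct warn_site *a, const struct warn_site *b)
{
    return a->line == b->line && strcmp(a->file, b->file) == 0;
}
```

Calling parse_warning() on each log line and grouping the results with same_site() would give a per-site hit count. Matching on file and line rather than on the function offset is a design choice of this sketch: the offset changes with compiler and config, while file:line is stable for a given source tree.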

On Wed, 02 Apr 2008 22:28:46 -0700 (PDT), David Miller wrote
> From: Arjan van de Ven <arjan@linux.intel.com>
> Date: Wed, 02 Apr 2008 22:21:09 -0700
> 
> > It would be not so nice to ship a 2.6.25 that triggers these frequently,
> > it has the risk of causing many user bugreports to this list and other
> > places...
> 
> I wouldn't call it frequently, on at least an individual basis,
> because nobody we try to get feedback from can reproduce this
> reliably.  That's the only reason this isn't fixed yet, despite
> all of Ilpo's efforts.


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



* Re: Three TCP/IP warnings in 2.6.25-rc7-git1
  2008-04-03  5:56     ` Denys Fedoryshchenko
@ 2008-04-03  9:53       ` Ilpo Järvinen
  2008-04-03 12:14         ` Denys Fedoryshchenko
  0 siblings, 1 reply; 8+ messages in thread
From: Ilpo Järvinen @ 2008-04-03  9:53 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: David Miller, arjan, Netdev

On Thu, 3 Apr 2008, Denys Fedoryshchenko wrote:

> I can reproduce it within a day on my loaded proxies. I have:
> 
> [223178.376188] WARNING: at net/ipv4/tcp_input.c:2532 tcp_ack+0xd83/0x17bd()
> and
> [273139.505308] WARNING: at net/ipv4/tcp_input.c:2173
> tcp_mark_head_lost+0x11e/0x126()
> 
> These two... maybe the third as well, but I haven't noticed it yet.

Here's a low-cost patch to gather at least some info about it. I suppose
you're not able to run a processing-expensive verification?

--
 i.


---
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 723b368..db641ae 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -776,7 +776,7 @@ static inline __u32 tcp_current_ssthresh(const struct sock *sk)
 }
 
 /* Use define here intentionally to get WARN_ON location shown at the caller */
-#define tcp_verify_left_out(tp)	WARN_ON(tcp_left_out(tp) > tp->packets_out)
+extern void tcp_verify_left_out(struct tcp_sock *tp);
 
 extern void tcp_enter_cwr(struct sock *sk, const int set_ssthresh);
 extern __u32 tcp_init_cwnd(struct tcp_sock *tp, struct dst_entry *dst);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6e46b4c..bc4cd30 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -167,6 +167,14 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb)
 	}
 }
 
+void tcp_verify_left_out(struct tcp_sock *tp)
+{
+	if (WARN_ON(tcp_left_out(tp) > tp->packets_out)) {
+		pr_err("TCP debug %u+%u/%u %d, please report to ilpo.jarvinen@helsinki.fi\n",
+			tp->lost_out, tp->sacked_out, tp->packets_out, tp->rx_opt.sack_ok);
+	}
+}
+
 static void tcp_incr_quickack(struct sock *sk)
 {
 	struct inet_connection_sock *icsk = inet_csk(sk);
@@ -2141,6 +2149,11 @@ static void tcp_mark_head_lost(struct sock *sk, int packets, int fast_rexmit)
 	int cnt;
 
 	BUG_TRAP(packets <= tp->packets_out);
+	if (packets > tp->packets_out) {
+		pr_err("TCP debug %u-%u/%u %d, please report to ilpo.jarvinen@helsinki.fi\n",
+			tp->fackets_out, tp->reordering, tp->packets_out,
+			tp->rx_opt.sack_ok);
+	}
 	if (tp->lost_skb_hint) {
 		skb = tp->lost_skb_hint;
 		cnt = tp->lost_cnt_hint;
@@ -2507,8 +2520,11 @@ static void tcp_fastretrans_alert(struct sock *sk, int pkts_acked, int flag)
 
 	if (WARN_ON(!tp->packets_out && tp->sacked_out))
 		tp->sacked_out = 0;
-	if (WARN_ON(!tp->sacked_out && tp->fackets_out))
+	if (WARN_ON(!tp->sacked_out && tp->fackets_out)) {
+		pr_err("TCP debug %u/%u %d, please report to ilpo.jarvinen@helsinki.fi\n",
+			tp->fackets_out, tp->packets_out, tp->rx_opt.sack_ok);
 		tp->fackets_out = 0;
+	}
 
 	/* Now state machine starts.
 	 * A. ECE, hence prohibit cwnd undoing, the reduction is required. */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6e25540..36a12f3 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1892,6 +1892,7 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	    (TCP_SKB_CB(skb)->flags & TCPCB_FLAG_FIN) &&
 	    tp->snd_una == (TCP_SKB_CB(skb)->end_seq - 1)) {
 		if (!pskb_trim(skb, 0)) {
+			WARN_ON(TCP_SKB_CB(skb)->sacked & TCPCB_LOST);
 			/* Reuse, even though it does some unnecessary work */
 			tcp_init_nondata_skb(skb, TCP_SKB_CB(skb)->end_seq - 1,
 					     TCP_SKB_CB(skb)->flags);
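
The conditions this patch instruments can be tried outside the kernel. What follows is a minimal userspace sketch, not kernel code: struct tcp_counters is a hypothetical stand-in holding only the struct tcp_sock fields the debug messages print, left_out() mirrors tcp_left_out() (sacked_out + lost_out), and check_counters() is an invented helper that applies the same three conditions the WARN_ONs in tcp_verify_left_out() and tcp_fastretrans_alert() test.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct tcp_sock counters used by the patch. */
struct tcp_counters {
    unsigned int packets_out; /* segments currently in flight */
    unsigned int sacked_out;  /* segments SACKed (or dupacks, non-SACK) */
    unsigned int lost_out;    /* segments marked lost */
    unsigned int fackets_out; /* segments up to the highest SACKed one */
    int sack_ok;              /* non-zero if SACK was negotiated */
};

/* Same sum as the kernel's tcp_left_out(). */
static unsigned int left_out(const struct tcp_counters *tp)
{
    return tp->sacked_out + tp->lost_out;
}

/*
 * Apply the three conditions the patch warns on and print a line in
 * the patch's format for the two that it instruments.
 * Returns how many of the conditions were violated.
 */
static int check_counters(const struct tcp_counters *tp)
{
    int bad = 0;

    assert(tp != NULL);
    if (left_out(tp) > tp->packets_out) {
        fprintf(stderr, "TCP debug %u+%u/%u %d\n", tp->lost_out,
                tp->sacked_out, tp->packets_out, tp->sack_ok);
        bad++;
    }
    if (!tp->packets_out && tp->sacked_out)
        bad++;
    if (!tp->sacked_out && tp->fackets_out) {
        fprintf(stderr, "TCP debug %u/%u %d\n", tp->fackets_out,
                tp->packets_out, tp->sack_ok);
        bad++;
    }
    return bad;
}
```

A state with 10 packets out, 3 SACKed and 2 lost passes all three checks, while 5 packets out with sacked_out at 0 and fackets_out at 2 trips only the last one, which is the tcp_fastretrans_alert() case. Printing sack_ok alongside the counters is what lets a report be classified as SACK or non-SACK afterwards.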


* Re: Three TCP/IP warnings in 2.6.25-rc7-git1
  2008-04-03  9:53       ` Ilpo Järvinen
@ 2008-04-03 12:14         ` Denys Fedoryshchenko
  2008-04-03 13:11           ` Ilpo Järvinen
  0 siblings, 1 reply; 8+ messages in thread
From: Denys Fedoryshchenko @ 2008-04-03 12:14 UTC (permalink / raw)
  To: Ilpo Järvinen; +Cc: David Miller, arjan, Netdev


I will apply this patch tonight and will most probably send all the output in the
morning. At the moment I probably cannot try expensive things, because
it is a loaded proxy. But in 2-3 days, when I am a bit freer, we can try
them; maybe it will withstand the load.


--
Denys Fedoryshchenko
Technical Manager
Virtual ISP S.A.L.



* Re: Three TCP/IP warnings in 2.6.25-rc7-git1
  2008-04-03 12:14         ` Denys Fedoryshchenko
@ 2008-04-03 13:11           ` Ilpo Järvinen
  0 siblings, 0 replies; 8+ messages in thread
From: Ilpo Järvinen @ 2008-04-03 13:11 UTC (permalink / raw)
  To: Denys Fedoryshchenko; +Cc: David Miller, arjan, Netdev

On Thu, 3 Apr 2008, Denys Fedoryshchenko wrote:

> I will apply this patch tonight and will most probably send all the output in the
> morning. At the moment I probably cannot try expensive things, because
> it is a loaded proxy. But in 2-3 days, when I am a bit freer, we can try
> them; maybe it will withstand the load.

IMHO there's no need to take risks currently; let's see what the debug
info reveals. It might even show that some (maybe even all, because
sacktag bugs haven't been seen for a while) fackets_out problems occur
for non-SACK TCP, which wouldn't be a problem at all, and that one could
safely be silenced if so desired...


-- 
 i.
