* oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 @ 2010-07-08 8:22 Tejun Heo 2010-07-11 2:36 ` David Miller 2010-07-11 16:09 ` Ilpo Järvinen 0 siblings, 2 replies; 29+ messages in thread From: Tejun Heo @ 2010-07-08 8:22 UTC (permalink / raw) To: David S. Miller, lkml, netdev@vger.kernel.org Cc: Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: text/plain, Size: 1970 bytes --] Hello, We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15. Please see the attached photoshoot. This is happening on a HPC cluster and very interestingly caused by one particular job. How long it takes isn't clear yet (at least more than a day) but when it happens it happens on a lot of machines in relatively short time. With a bit of disassemblying, I've found that the oops is happening during tcp_for_write_queue_from() because the skb->next points to NULL. void tcp_xmit_retransmit_queue(struct sock *sk) { ... if (tp->retransmit_skb_hint) { skb = tp->retransmit_skb_hint; last_lost = TCP_SKB_CB(skb)->end_seq; if (after(last_lost, tp->retransmit_high)) last_lost = tp->retransmit_high; } else { skb = tcp_write_queue_head(sk); last_lost = tp->snd_una; } => tcp_for_write_queue_from(skb, sk) { __u8 sacked = TCP_SKB_CB(skb)->sacked; if (skb == tcp_send_head(sk)) break; /* we could do better than to assign each time */ if (hole == NULL) This can happen for one of the following reasons, 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL too. ie. tcp_xmit_retransmit_queue() is called on an empty write queue for some reason. 2. tp->retransmit_skb_hint is pointing to a skb which is not on the write_queue. ie. somebody forgot to update hint while removing the skb from the write queue. 3. The hint is pointing to a skb on the list but the list itself is corrupt. I added some debug code and the crash is happening when tp->retransmit_skb_hint is not NULL but tp->retransmit_skb_hint->next is NULL. So, #1 is out; unfortunately, I didn't have debug code in place to discern between #2 and #3. Does anything ring a bell? This is a production system and debugging affects quite a number of people. I can put debug code in to discern between #2 and #3 but I'm basically shooting in the dark and it would be great if someone has a better idea. Thanks. -- tejun [-- Attachment #2: oops.jpg --] [-- Type: image/jpeg, Size: 92585 bytes --] ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-08 8:22 oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 Tejun Heo @ 2010-07-11 2:36 ` David Miller 2010-07-11 16:09 ` Ilpo Järvinen 1 sibling, 0 replies; 29+ messages in thread From: David Miller @ 2010-07-11 2:36 UTC (permalink / raw) To: tj; +Cc: linux-kernel, netdev, henning.fehrmann, carsten.aulbert From: Tejun Heo <tj@kernel.org> Date: Thu, 08 Jul 2010 10:22:02 +0200 > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15. ... > Does anything ring a bell? A long time ago we had a packet scheduler bug that could corrupt the TCP socket queues, but that was fixed in 2.6.27 so would definitely be fixed in your kernel. -------------------- commit 69747650c814a8a79fef412c7416adf823293a3e Author: David S. Miller <davem@davemloft.net> Date: Sun Aug 17 23:55:36 2008 -0700 pkt_sched: Fix return value corruption in HTB and TBF. Based upon a bug report by Josip Rodin. Packet schedulers should only return NET_XMIT_DROP iff the packet really was dropped. If the packet does reach the device after we return NET_XMIT_DROP then TCP can crash because it depends upon the enqueue path return values being accurate. Signed-off-by: David S. Miller <davem@davemloft.net> -------------------- Nothing else jumps to mind, sorry. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-08 8:22 oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 Tejun Heo 2010-07-11 2:36 ` David Miller @ 2010-07-11 16:09 ` Ilpo Järvinen 2010-07-11 17:06 ` Eric Dumazet 2010-07-15 11:58 ` Lennart Schulte 1 sibling, 2 replies; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-11 16:09 UTC (permalink / raw) To: Tejun Heo Cc: David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert, Eric Dumazet On Thu, 8 Jul 2010, Tejun Heo wrote: > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15. > Please see the attached photoshoot. This is happening on a HPC > cluster and very interestingly caused by one particular job. How long > it takes isn't clear yet (at least more than a day) but when it > happens it happens on a lot of machines in relatively short time. > > With a bit of disassemblying, I've found that the oops is happening > during tcp_for_write_queue_from() because the skb->next points to > NULL. > > void tcp_xmit_retransmit_queue(struct sock *sk) > { > ... > if (tp->retransmit_skb_hint) { > skb = tp->retransmit_skb_hint; > last_lost = TCP_SKB_CB(skb)->end_seq; > if (after(last_lost, tp->retransmit_high)) > last_lost = tp->retransmit_high; > } else { > skb = tcp_write_queue_head(sk); > last_lost = tp->snd_una; > } > > => tcp_for_write_queue_from(skb, sk) { > __u8 sacked = TCP_SKB_CB(skb)->sacked; > > if (skb == tcp_send_head(sk)) > break; > /* we could do better than to assign each time */ > if (hole == NULL) > > This can happen for one of the following reasons, > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL > too. ie. tcp_xmit_retransmit_queue() is called on an empty write > queue for some reason. > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the > write_queue. ie. somebody forgot to update hint while removing the > skb from the write queue. Once again I've read the unlinkers through, and only thing that could cause this is tcp_send_synack (others do deal with the hints) but I think Eric already proposed a patch to that but we never got anywhere due to some counterargument why it wouldn't take place (too far away for me to remember, see archives about the discussions). ...But if you want be dead sure some WARN_ON there might not hurt. Also the purging of the whole queue was a similar suspect I then came across (but that would only materialize with sk reuse happening e.g., with nfs which the other guys weren't using). > 3. The hint is pointing to a skb on the list but the list itself is > corrupt. > > I added some debug code and the crash is happening when > tp->retransmit_skb_hint is not NULL but tp->retransmit_skb_hint->next > is NULL. So, #1 is out; unfortunately, I didn't have debug code in > place to discern between #2 and #3. > > Does anything ring a bell? This is a production system and debugging > affects quite a number of people. I can put debug code in to discern > between #2 and #3 but I'm basically shooting in the dark and it would > be great if someone has a better idea. Thanks for taking this up. I've been kind of waiting somebody to show up who actually has some way of reproducing it. Once I had one guy in the hook but his ability to reproduce was for some reason lost when he tried with a debug patch [1]. I now realize that the debug patch should probably also print the write queue too when the problem is caught in order to discern the cases you mention. Something along these lines: tcp_for_write_queue(skb, sk) { printk("skb %p (%u-%u) next %p prev %p sacked %u\n", ...); } Anyway, my debugging patch should be such that in a lucky case it avoids crashing the system too, though price to pay might then be a stuck connection. In case #3 I'd expect the box to die elsewhere in TCP code pretty soon anyway so it depends whether avoiding oops is really so useful, but if you're lucky other mechanism in TCP will recover the lost one for you (basically RTO driven retransmission). -- i. [1] http://marc.info/?l=linux-kernel&m=126624014117610&w=2 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-11 16:09 ` Ilpo Järvinen @ 2010-07-11 17:06 ` Eric Dumazet 2010-07-11 17:46 ` Eric Dumazet 2010-07-15 11:58 ` Lennart Schulte 1 sibling, 1 reply; 29+ messages in thread From: Eric Dumazet @ 2010-07-11 17:06 UTC (permalink / raw) To: Ilpo Järvinen Cc: Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit : > On Thu, 8 Jul 2010, Tejun Heo wrote: > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15. > > Please see the attached photoshoot. This is happening on a HPC > > cluster and very interestingly caused by one particular job. How long > > it takes isn't clear yet (at least more than a day) but when it > > happens it happens on a lot of machines in relatively short time. > > > > With a bit of disassemblying, I've found that the oops is happening > > during tcp_for_write_queue_from() because the skb->next points to > > NULL. > > > > void tcp_xmit_retransmit_queue(struct sock *sk) > > { > > ... > > if (tp->retransmit_skb_hint) { > > skb = tp->retransmit_skb_hint; > > last_lost = TCP_SKB_CB(skb)->end_seq; > > if (after(last_lost, tp->retransmit_high)) > > last_lost = tp->retransmit_high; > > } else { > > skb = tcp_write_queue_head(sk); > > last_lost = tp->snd_una; > > } > > > > => tcp_for_write_queue_from(skb, sk) { > > __u8 sacked = TCP_SKB_CB(skb)->sacked; > > > > if (skb == tcp_send_head(sk)) > > break; > > /* we could do better than to assign each time */ > > if (hole == NULL) > > > > This can happen for one of the following reasons, > > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write > > queue for some reason. > > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the > > write_queue. ie. somebody forgot to update hint while removing the > > skb from the write queue. > > Once again I've read the unlinkers through, and only thing that could > cause this is tcp_send_synack (others do deal with the hints) but I think > Eric already proposed a patch to that but we never got anywhere due to > some counterargument why it wouldn't take place (too far away for me to > remember, see archives about the discussions). ...But if you want be dead > sure some WARN_ON there might not hurt. Also the purging of the whole > queue was a similar suspect I then came across (but that would only > materialize with sk reuse happening e.g., with nfs which the other guys > weren't using). > Hmm. This sounds familiar to me, but I cannot remember the discussion you mention or the patch. Or maybe it was the TCP transaction thing ? (including data in SYN or SYN-ACK packet) > > 3. The hint is pointing to a skb on the list but the list itself is > > corrupt. > > > > I added some debug code and the crash is happening when > > tp->retransmit_skb_hint is not NULL but tp->retransmit_skb_hint->next > > is NULL. So, #1 is out; unfortunately, I didn't have debug code in > > place to discern between #2 and #3. > > > > Does anything ring a bell? This is a production system and debugging > > affects quite a number of people. I can put debug code in to discern > > between #2 and #3 but I'm basically shooting in the dark and it would > > be great if someone has a better idea. > > Thanks for taking this up. I've been kind of waiting somebody to show up > who actually has some way of reproducing it. Once I had one guy in the > hook but his ability to reproduce was for some reason lost when he tried > with a debug patch [1]. > > I now realize that the debug patch should probably also print the write > queue too when the problem is caught in order to discern the cases you > mention. > > Something along these lines: > > tcp_for_write_queue(skb, sk) { > printk("skb %p (%u-%u) next %p prev %p sacked %u\n", ...); > } > > Anyway, my debugging patch should be such that in a lucky case it avoids > crashing the system too, though price to pay might then be a stuck > connection. In case #3 I'd expect the box to die elsewhere in TCP code > pretty soon anyway so it depends whether avoiding oops is really so > useful, but if you're lucky other mechanism in TCP will recover > the lost one for you (basically RTO driven retransmission). > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-11 17:06 ` Eric Dumazet @ 2010-07-11 17:46 ` Eric Dumazet 2010-07-11 18:29 ` Eric Dumazet 0 siblings, 1 reply; 29+ messages in thread From: Eric Dumazet @ 2010-07-11 17:46 UTC (permalink / raw) To: Ilpo Järvinen Cc: Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit : > Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit : > > On Thu, 8 Jul 2010, Tejun Heo wrote: > > > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15. > > > Please see the attached photoshoot. This is happening on a HPC > > > cluster and very interestingly caused by one particular job. How long > > > it takes isn't clear yet (at least more than a day) but when it > > > happens it happens on a lot of machines in relatively short time. > > > > > > With a bit of disassemblying, I've found that the oops is happening > > > during tcp_for_write_queue_from() because the skb->next points to > > > NULL. > > > > > > void tcp_xmit_retransmit_queue(struct sock *sk) > > > { > > > ... > > > if (tp->retransmit_skb_hint) { > > > skb = tp->retransmit_skb_hint; > > > last_lost = TCP_SKB_CB(skb)->end_seq; > > > if (after(last_lost, tp->retransmit_high)) > > > last_lost = tp->retransmit_high; > > > } else { > > > skb = tcp_write_queue_head(sk); > > > last_lost = tp->snd_una; > > > } > > > > > > => tcp_for_write_queue_from(skb, sk) { > > > __u8 sacked = TCP_SKB_CB(skb)->sacked; > > > > > > if (skb == tcp_send_head(sk)) > > > break; > > > /* we could do better than to assign each time */ > > > if (hole == NULL) > > > > > > This can happen for one of the following reasons, > > > > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL > > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write > > > queue for some reason. > > > > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the > > > write_queue. ie. somebody forgot to update hint while removing the > > > skb from the write queue. > > > > Once again I've read the unlinkers through, and only thing that could > > cause this is tcp_send_synack (others do deal with the hints) but I think > > Eric already proposed a patch to that but we never got anywhere due to > > some counterargument why it wouldn't take place (too far away for me to > > remember, see archives about the discussions). ...But if you want be dead > > sure some WARN_ON there might not hurt. Also the purging of the whole > > queue was a similar suspect I then came across (but that would only > > materialize with sk reuse happening e.g., with nfs which the other guys > > weren't using). > > > > Hmm. > > This sounds familiar to me, but I cannot remember the discussion you > mention or the patch. > > Or maybe it was the TCP transaction thing ? (including data in SYN or > SYN-ACK packet) Hmm, I cannot find where we reset restransmit_skb_hint in tcp_mtu_probe(), if we call tcp_unlink_write_queue(). if (skb->len <= copy) { /* We've eaten all the data from this skb. * Throw it away. */ TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags; <<>> tcp_unlink_write_queue(skb, sk); sk_wmem_free_skb(sk, skb); } else { Sorry if this was already discussed. We might add a comment here in anycase ;) ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-11 17:46 ` Eric Dumazet @ 2010-07-11 18:29 ` Eric Dumazet 2010-07-11 19:22 ` Ilpo Järvinen 0 siblings, 1 reply; 29+ messages in thread From: Eric Dumazet @ 2010-07-11 18:29 UTC (permalink / raw) To: Ilpo Järvinen Cc: Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Le dimanche 11 juillet 2010 à 19:46 +0200, Eric Dumazet a écrit : > Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit : > > Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit : > > > On Thu, 8 Jul 2010, Tejun Heo wrote: > > > > > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15. > > > > Please see the attached photoshoot. This is happening on a HPC > > > > cluster and very interestingly caused by one particular job. How long > > > > it takes isn't clear yet (at least more than a day) but when it > > > > happens it happens on a lot of machines in relatively short time. > > > > > > > > With a bit of disassemblying, I've found that the oops is happening > > > > during tcp_for_write_queue_from() because the skb->next points to > > > > NULL. > > > > > > > > void tcp_xmit_retransmit_queue(struct sock *sk) > > > > { > > > > ... > > > > if (tp->retransmit_skb_hint) { > > > > skb = tp->retransmit_skb_hint; > > > > last_lost = TCP_SKB_CB(skb)->end_seq; > > > > if (after(last_lost, tp->retransmit_high)) > > > > last_lost = tp->retransmit_high; > > > > } else { > > > > skb = tcp_write_queue_head(sk); > > > > last_lost = tp->snd_una; > > > > } > > > > > > > > => tcp_for_write_queue_from(skb, sk) { > > > > __u8 sacked = TCP_SKB_CB(skb)->sacked; > > > > > > > > if (skb == tcp_send_head(sk)) > > > > break; > > > > /* we could do better than to assign each time */ > > > > if (hole == NULL) > > > > > > > > This can happen for one of the following reasons, > > > > > > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL > > > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write > > > > queue for some reason. > > > > > > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the > > > > write_queue. ie. somebody forgot to update hint while removing the > > > > skb from the write queue. > > > > > > Once again I've read the unlinkers through, and only thing that could > > > cause this is tcp_send_synack (others do deal with the hints) but I think > > > Eric already proposed a patch to that but we never got anywhere due to > > > some counterargument why it wouldn't take place (too far away for me to > > > remember, see archives about the discussions). ...But if you want be dead > > > sure some WARN_ON there might not hurt. Also the purging of the whole > > > queue was a similar suspect I then came across (but that would only > > > materialize with sk reuse happening e.g., with nfs which the other guys > > > weren't using). > > > > > > > Hmm. > > > > This sounds familiar to me, but I cannot remember the discussion you > > mention or the patch. > > > > Or maybe it was the TCP transaction thing ? (including data in SYN or > > SYN-ACK packet) > > Hmm, I cannot find where we reset restransmit_skb_hint in > tcp_mtu_probe(), if we call tcp_unlink_write_queue(). > > if (skb->len <= copy) { > /* We've eaten all the data from this skb. > * Throw it away. */ > TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags; > <<>> tcp_unlink_write_queue(skb, sk); > sk_wmem_free_skb(sk, skb); > } else { > > > Sorry if this was already discussed. We might add a comment here in anycase ;) > Just in case, here is a patch for this issue, if Tejun wants to try it. Thanks [PATCH] tcp: tcp_mtu_probe() and retransmit hints When removing an skb from tcp write queue, we must take care of various hints that could be kept on this skb. tcp_mtu_probe() misses this cleanup. lkml reference : http://lkml.org/lkml/2010/7/8/63 Reported-by: Tejun Heo <tj@kernel.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> --- diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b4ed957..187453f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -1666,6 +1666,9 @@ static int tcp_mtu_probe(struct sock *sk) * Throw it away. */ TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags; tcp_unlink_write_queue(skb, sk); + tcp_clear_retrans_hints_partial(tp); + if (skb == tp->retransmit_skb_hint) + tp->retransmit_skb_hint = nskb; sk_wmem_free_skb(sk, skb); } else { TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags & ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-11 18:29 ` Eric Dumazet @ 2010-07-11 19:22 ` Ilpo Järvinen 2010-07-11 19:25 ` Ilpo Järvinen 0 siblings, 1 reply; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-11 19:22 UTC (permalink / raw) To: Eric Dumazet Cc: Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 5020 bytes --] On Sun, 11 Jul 2010, Eric Dumazet wrote: > Le dimanche 11 juillet 2010 à 19:46 +0200, Eric Dumazet a écrit : > > Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit : > > > Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit : > > > > On Thu, 8 Jul 2010, Tejun Heo wrote: > > > > > > > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15. > > > > > Please see the attached photoshoot. This is happening on a HPC > > > > > cluster and very interestingly caused by one particular job. How long > > > > > it takes isn't clear yet (at least more than a day) but when it > > > > > happens it happens on a lot of machines in relatively short time. > > > > > > > > > > With a bit of disassemblying, I've found that the oops is happening > > > > > during tcp_for_write_queue_from() because the skb->next points to > > > > > NULL. > > > > > > > > > > void tcp_xmit_retransmit_queue(struct sock *sk) > > > > > { > > > > > ... > > > > > if (tp->retransmit_skb_hint) { > > > > > skb = tp->retransmit_skb_hint; > > > > > last_lost = TCP_SKB_CB(skb)->end_seq; > > > > > if (after(last_lost, tp->retransmit_high)) > > > > > last_lost = tp->retransmit_high; > > > > > } else { > > > > > skb = tcp_write_queue_head(sk); > > > > > last_lost = tp->snd_una; > > > > > } > > > > > > > > > > => tcp_for_write_queue_from(skb, sk) { > > > > > __u8 sacked = TCP_SKB_CB(skb)->sacked; > > > > > > > > > > if (skb == tcp_send_head(sk)) > > > > > break; > > > > > /* we could do better than to assign each time */ > > > > > if (hole == NULL) > > > > > > > > > > This can happen for one of the following reasons, > > > > > > > > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL > > > > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write > > > > > queue for some reason. > > > > > > > > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the > > > > > write_queue. ie. somebody forgot to update hint while removing the > > > > > skb from the write queue. > > > > > > > > Once again I've read the unlinkers through, and only thing that could > > > > cause this is tcp_send_synack (others do deal with the hints) but I think > > > > Eric already proposed a patch to that but we never got anywhere due to > > > > some counterargument why it wouldn't take place (too far away for me to > > > > remember, see archives about the discussions). ...But if you want be dead > > > > sure some WARN_ON there might not hurt. Also the purging of the whole > > > > queue was a similar suspect I then came across (but that would only > > > > materialize with sk reuse happening e.g., with nfs which the other guys > > > > weren't using). > > > > > > > > > > Hmm. > > > > > > This sounds familiar to me, but I cannot remember the discussion you > > > mention or the patch. > > > > > > Or maybe it was the TCP transaction thing ? (including data in SYN or > > > SYN-ACK packet) No. That's another thing. ...I've already found it with google today but cannot seem to find it again. I thought I used tcp_make_synack eric but for some reason I only get these transaction fix hits. I'll keep looking. > > Hmm, I cannot find where we reset restransmit_skb_hint in > > tcp_mtu_probe(), if we call tcp_unlink_write_queue(). > > > > if (skb->len <= copy) { > > /* We've eaten all the data from this skb. > > * Throw it away. */ > > TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags; > > <<>> tcp_unlink_write_queue(skb, sk); > > sk_wmem_free_skb(sk, skb); > > } else { > > > > > > Sorry if this was already discussed. We might add a comment here in anycase ;) > > > > Just in case, here is a patch for this issue, if Tejun wants to try it. > > Thanks > > [PATCH] tcp: tcp_mtu_probe() and retransmit hints > > When removing an skb from tcp write queue, we must take care of various > hints that could be kept on this skb. tcp_mtu_probe() misses this > cleanup. > > lkml reference : http://lkml.org/lkml/2010/7/8/63 > > Reported-by: Tejun Heo <tj@kernel.org> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > --- > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index b4ed957..187453f 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -1666,6 +1666,9 @@ static int tcp_mtu_probe(struct sock *sk) > * Throw it away. */ > TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags; > tcp_unlink_write_queue(skb, sk); > + tcp_clear_retrans_hints_partial(tp); > + if (skb == tp->retransmit_skb_hint) > + tp->retransmit_skb_hint = nskb; ...I think we've not sent that skb just yet, so this'll never be true (and the rexmit loop terminates at send_head without setting it so we should be safe, I'll still need to check that no other code can accidently move it to the send_head but I doubt it happens). > sk_wmem_free_skb(sk, skb); > } else { > TCP_SKB_CB(nskb)->flags |= TCP_SKB_CB(skb)->flags & > > > -- i. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-11 19:22 ` Ilpo Järvinen @ 2010-07-11 19:25 ` Ilpo Järvinen 2010-07-11 19:44 ` Ilpo Järvinen 0 siblings, 1 reply; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-11 19:25 UTC (permalink / raw) To: Eric Dumazet Cc: Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 3511 bytes --] On Sun, 11 Jul 2010, Ilpo Järvinen wrote: > On Sun, 11 Jul 2010, Eric Dumazet wrote: > > > Le dimanche 11 juillet 2010 à 19:46 +0200, Eric Dumazet a écrit : > > > Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit : > > > > Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit : > > > > > On Thu, 8 Jul 2010, Tejun Heo wrote: > > > > > > > > > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15. > > > > > > Please see the attached photoshoot. This is happening on a HPC > > > > > > cluster and very interestingly caused by one particular job. How long > > > > > > it takes isn't clear yet (at least more than a day) but when it > > > > > > happens it happens on a lot of machines in relatively short time. > > > > > > > > > > > > With a bit of disassemblying, I've found that the oops is happening > > > > > > during tcp_for_write_queue_from() because the skb->next points to > > > > > > NULL. > > > > > > > > > > > > void tcp_xmit_retransmit_queue(struct sock *sk) > > > > > > { > > > > > > ... > > > > > > if (tp->retransmit_skb_hint) { > > > > > > skb = tp->retransmit_skb_hint; > > > > > > last_lost = TCP_SKB_CB(skb)->end_seq; > > > > > > if (after(last_lost, tp->retransmit_high)) > > > > > > last_lost = tp->retransmit_high; > > > > > > } else { > > > > > > skb = tcp_write_queue_head(sk); > > > > > > last_lost = tp->snd_una; > > > > > > } > > > > > > > > > > > > => tcp_for_write_queue_from(skb, sk) { > > > > > > __u8 sacked = TCP_SKB_CB(skb)->sacked; > > > > > > > > > > > > if (skb == tcp_send_head(sk)) > > > > > > break; > > > > > > /* we could do better than to assign each time */ > > > > > > if (hole == NULL) > > > > > > > > > > > > This can happen for one of the following reasons, > > > > > > > > > > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL > > > > > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write > > > > > > queue for some reason. > > > > > > > > > > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the > > > > > > write_queue. ie. somebody forgot to update hint while removing the > > > > > > skb from the write queue. > > > > > > > > > > Once again I've read the unlinkers through, and only thing that could > > > > > cause this is tcp_send_synack (others do deal with the hints) but I think > > > > > Eric already proposed a patch to that but we never got anywhere due to > > > > > some counterargument why it wouldn't take place (too far away for me to > > > > > remember, see archives about the discussions). ...But if you want be dead > > > > > sure some WARN_ON there might not hurt. Also the purging of the whole > > > > > queue was a similar suspect I then came across (but that would only > > > > > materialize with sk reuse happening e.g., with nfs which the other guys > > > > > weren't using). > > > > > > > > > > > > > Hmm. > > > > > > > > This sounds familiar to me, but I cannot remember the discussion you > > > > mention or the patch. > > > > > > > > Or maybe it was the TCP transaction thing ? (including data in SYN or > > > > SYN-ACK packet) > > No. That's another thing. ...I've already found it with google today but > cannot seem to find it again. I thought I used tcp_make_synack eric but > for some reason I only get these transaction fix hits. I'll keep looking. Right, this one: http://kerneltrap.org/mailarchive/linux-netdev/2009/10/29/6259073 -- i. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-11 19:25 ` Ilpo Järvinen @ 2010-07-11 19:44 ` Ilpo Järvinen 0 siblings, 0 replies; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-11 19:44 UTC (permalink / raw) To: Tejun Heo Cc: Eric Dumazet, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 4203 bytes --] On Sun, 11 Jul 2010, Ilpo Järvinen wrote: > On Sun, 11 Jul 2010, Ilpo Järvinen wrote: > > > On Sun, 11 Jul 2010, Eric Dumazet wrote: > > > > > Le dimanche 11 juillet 2010 à 19:46 +0200, Eric Dumazet a écrit : > > > > Le dimanche 11 juillet 2010 à 19:06 +0200, Eric Dumazet a écrit : > > > > > Le dimanche 11 juillet 2010 à 19:09 +0300, Ilpo Järvinen a écrit : > > > > > > On Thu, 8 Jul 2010, Tejun Heo wrote: > > > > > > > > > > > > > We've been seeing oops in tcp_xmit_retransmit_queue() w/ 2.6.32.15. > > > > > > > Please see the attached photoshoot. This is happening on a HPC > > > > > > > cluster and very interestingly caused by one particular job. How long > > > > > > > it takes isn't clear yet (at least more than a day) but when it > > > > > > > happens it happens on a lot of machines in relatively short time. > > > > > > > > > > > > > > With a bit of disassemblying, I've found that the oops is happening > > > > > > > during tcp_for_write_queue_from() because the skb->next points to > > > > > > > NULL. > > > > > > > > > > > > > > void tcp_xmit_retransmit_queue(struct sock *sk) > > > > > > > { > > > > > > > ... > > > > > > > if (tp->retransmit_skb_hint) { > > > > > > > skb = tp->retransmit_skb_hint; > > > > > > > last_lost = TCP_SKB_CB(skb)->end_seq; > > > > > > > if (after(last_lost, tp->retransmit_high)) > > > > > > > last_lost = tp->retransmit_high; > > > > > > > } else { > > > > > > > skb = tcp_write_queue_head(sk); > > > > > > > last_lost = tp->snd_una; > > > > > > > } > > > > > > > > > > > > > > => tcp_for_write_queue_from(skb, sk) { > > > > > > > __u8 sacked = TCP_SKB_CB(skb)->sacked; > > > > > > > > > > > > > > if (skb == tcp_send_head(sk)) > > > > > > > break; > > > > > > > /* we could do better than to assign each time */ > > > > > > > if (hole == NULL) > > > > > > > > > > > > > > This can happen for one of the following reasons, > > > > > > > > > > > > > > 1. tp->retransmit_skb_hint is NULL and tcp_write_queue_head() is NULL > > > > > > > too. ie. tcp_xmit_retransmit_queue() is called on an empty write > > > > > > > queue for some reason. > > > > > > > > > > > > > > 2. tp->retransmit_skb_hint is pointing to a skb which is not on the > > > > > > > write_queue. ie. somebody forgot to update hint while removing the > > > > > > > skb from the write queue. > > > > > > > > > > > > Once again I've read the unlinkers through, and only thing that could > > > > > > cause this is tcp_send_synack (others do deal with the hints) but I think > > > > > > Eric already proposed a patch to that but we never got anywhere due to > > > > > > some counterargument why it wouldn't take place (too far away for me to > > > > > > remember, see archives about the discussions). ...But if you want be dead > > > > > > sure some WARN_ON there might not hurt. Also the purging of the whole > > > > > > queue was a similar suspect I then came across (but that would only > > > > > > materialize with sk reuse happening e.g., with nfs which the other guys > > > > > > weren't using). > > > > > > > > > > > > > > > > Hmm. > > > > > > > > > > This sounds familiar to me, but I cannot remember the discussion you > > > > > mention or the patch. > > > > > > > > > > Or maybe it was the TCP transaction thing ? (including data in SYN or > > > > > SYN-ACK packet) > > > > No. That's another thing. ...I've already found it with google today but > > cannot seem to find it again. I thought I used tcp_make_synack eric but > > for some reason I only get these transaction fix hits. I'll keep looking. > > Right, this one: > > http://kerneltrap.org/mailarchive/linux-netdev/2009/10/29/6259073 Hmm, another idea... It might be useful to try to disable tcp_retrans_try_collapse in tcp_retransmit_skb as a test. I think that it might be possible that tcp_xmit_retransmit_queue might end up holding a stale reference either in hole or skb. Kind of shot into the dark still, no actual theory on how that could happen but that tcp_xmit_retransmit_queue logic is rather tricky because of returning to the first hole if such exists so that I couldn't immediately rule out the possibility. -- i. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-11 16:09 ` Ilpo Järvinen 2010-07-11 17:06 ` Eric Dumazet @ 2010-07-15 11:58 ` Lennart Schulte 2010-07-15 12:05 ` Eric Dumazet 1 sibling, 1 reply; 29+ messages in thread From: Lennart Schulte @ 2010-07-15 11:58 UTC (permalink / raw) To: Ilpo Järvinen Cc: Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert, Eric Dumazet I'm testing new reordering algorithms in a virtual testbed, that is the nodes are emulated with xen and all the network parameters can be tuned with queues. With one of the algorithms I also got tracebacks which include tcp_xmit_retransmit_queue. It only happens with ACK loss. The kernel version however is 2.6.31. When I read this thread I tried the debug patch and got the following: [ 2754.413150] NULL head, pkts 0 [ 2754.413156] Errors caught so far 1 Hope that is of any help. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-15 11:58 ` Lennart Schulte @ 2010-07-15 12:05 ` Eric Dumazet 2010-07-15 12:55 ` Lennart Schulte 0 siblings, 1 reply; 29+ messages in thread From: Eric Dumazet @ 2010-07-15 12:05 UTC (permalink / raw) To: Lennart Schulte Cc: Ilpo Järvinen, Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Le jeudi 15 juillet 2010 à 13:58 +0200, Lennart Schulte a écrit : > I'm testing new reordering algorithms in a virtual testbed, that is the > nodes are emulated with xen and all the network parameters can be tuned > with queues. > With one of the algorithms I also got tracebacks which include > tcp_xmit_retransmit_queue. It only happens with ACK loss. The kernel > version however is 2.6.31. > When I read this thread I tried the debug patch and got the following: > > [ 2754.413150] NULL head, pkts 0 > [ 2754.413156] Errors caught so far 1 > > Hope that is of any help. Not sure I understand. Are you saying you reproduce same tcp_xmit_retransmit_queue() bug, with a special tc qdisc/class droppping some ACK frames ? Could it be some sched problem and incorrect return codes in case of congestion ? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-15 12:05 ` Eric Dumazet @ 2010-07-15 12:55 ` Lennart Schulte 2010-07-16 12:02 ` Ilpo Järvinen 0 siblings, 1 reply; 29+ messages in thread From: Lennart Schulte @ 2010-07-15 12:55 UTC (permalink / raw) To: Eric Dumazet Cc: Ilpo Järvinen, Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Since tcp_xmit_retransmit_queue also gets skb == NULL I'm pretty sure it is the same bug. Up to now I only experienced the problem with ACK loss (without ACK loss the test ran about 30min without problems, with ACK loss it had paniced within 10min). The data sender only has a HTB queue for traffic shaping (set to 20 Mbit/s). The ACK loss is done by another router. The setup looks like this. This way it seems to be the most realistic. o sender with HTB | | o netem queue for forward path delay | o netem queue for a queue limit | o netem queue for backward path delay | o netem queue for ACK loss | | o receiver with HTB Perhaps now it is a little big clearer. On 15.07.2010 14:05, Eric Dumazet wrote: > Le jeudi 15 juillet 2010 à 13:58 +0200, Lennart Schulte a écrit : > >> I'm testing new reordering algorithms in a virtual testbed, that is the >> nodes are emulated with xen and all the network parameters can be tuned >> with queues. >> With one of the algorithms I also got tracebacks which include >> tcp_xmit_retransmit_queue. It only happens with ACK loss. The kernel >> version however is 2.6.31. >> When I read this thread I tried the debug patch and got the following: >> >> [ 2754.413150] NULL head, pkts 0 >> [ 2754.413156] Errors caught so far 1 >> >> Hope that is of any help. >> > Not sure I understand. > > Are you saying you reproduce same tcp_xmit_retransmit_queue() bug, with > a special tc qdisc/class droppping some ACK frames ? > > Could it be some sched problem and incorrect return codes in case of > congestion ? > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-15 12:55 ` Lennart Schulte @ 2010-07-16 12:02 ` Ilpo Järvinen 2010-07-16 12:25 ` Lennart Schulte 2010-07-19 14:57 ` oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 Tejun Heo 0 siblings, 2 replies; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-16 12:02 UTC (permalink / raw) To: Lennart Schulte Cc: Eric Dumazet, Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert On Thu, 15 Jul 2010, Lennart Schulte wrote: > Since tcp_xmit_retransmit_queue also gets skb == NULL I'm pretty sure it is > the same bug. > Up to now I only experienced the problem with ACK loss (without ACK loss the > test ran about 30min without problems, with ACK loss it had paniced within > 10min). > The data sender only has a HTB queue for traffic shaping (set to 20 Mbit/s). > The ACK loss is done by another router. > The setup looks like this. This way it seems to be the most realistic. > > o sender with HTB > | > | > o netem queue for forward path delay > | > o netem queue for a queue limit > | > o netem queue for backward path delay > | > o netem queue for ACK loss > | > | > o receiver with HTB > > Perhaps now it is a little big clearer. > > > [ 2754.413150] NULL head, pkts 0 > > > [ 2754.413156] Errors caught so far 1 Thanks for reporting the results. Could you post the oops too or double check do the timestamps really match (and there wasn't more "Errors caught" prints in between)? Since this condition doesn't seem to crash the kernel as also send_head should be NULL, which saves the day here exiting the loop (unless send head would too be corrupt). ...However, I don't like too much anyway that we can end up into tcp_xmit_retransmit_queue loop with packets_out being zero and only send_head check side-effect causes proper action. Besides, Tejun has also found that it's hint->next ptr which is NULL in his case so this won't solve his case anyway. Tejun, can you confirm whether it was retransmit_skb_hint->next being NULL on _entry time_ to tcp_xmit_retransmit_queue() or later on in the loop after the updates done by the loop itself to the hint (or that your testing didn't conclude either)? -- i. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-16 12:02 ` Ilpo Järvinen @ 2010-07-16 12:25 ` Lennart Schulte 2010-07-16 13:19 ` Ilpo Järvinen 2010-07-19 14:57 ` oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 Tejun Heo 1 sibling, 1 reply; 29+ messages in thread From: Lennart Schulte @ 2010-07-16 12:25 UTC (permalink / raw) To: Ilpo Järvinen Cc: Eric Dumazet, Tejun Heo, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert On 16.07.2010 14:02, Ilpo Järvinen wrote: > >>>> [ 2754.413150] NULL head, pkts 0 >>>> [ 2754.413156] Errors caught so far 1 >>>> > Thanks for reporting the results. > > Could you post the oops too or double check do the timestamps really match > (and there wasn't more "Errors caught" prints in between)? Since this > condition doesn't seem to crash the kernel as also send_head should be > NULL, which saves the day here exiting the loop (unless send head would > too be corrupt). > I can try to do some more testing, perhaps then I will get other results. But until now I've always gotten something like above. With the debug patch the kernel doesn't crash, but I have an oops from a run before the patch: [ 3214.498061] BUG: unable to handle kernel NULL pointer dereference at (null) [ 3214.498085] IP: [<c12657dc>] tcp_xmit_retransmit_queue+0x4c/0x2b0 [ 3214.498121] *pdpt = 00000002cf6fa001 [ 3214.498130] Thread overran stack, or stack corrupted [ 3214.498138] Oops: 0000 [#1] SMP [ 3214.498154] last sysfs file: /sys/kernel/uevent_seqnum [ 3214.498161] Modules linked in: tcp_ancr tcp_ncr [ 3214.498174] [ 3214.498180] Pid: 0, comm: swapper Not tainted (2.6.31.9-pae-um-wolff #79) [ 3214.498188] EIP: 0061:[<c12657dc>] EFLAGS: 00010246 CPU: 0 [ 3214.498196] EIP is at tcp_xmit_retransmit_queue+0x4c/0x2b0 [ 3214.498203] EAX: c6da2900 EBX: c6da2880 ECX: 00000000 EDX: e50c512e [ 3214.498211] ESI: 00000000 EDI: 0000051b EBP: c6da2900 ESP: c13d5cf0 [ 3214.498219] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0069 [ 3214.498227] Process swapper (pid: 0, ti=c13d4000 task=c13e7a20 task.ti=c13d4000) [ 3214.498236] Stack: [ 3214.498240] c1005a0b 00000001 00000000 e50c512e c7804300 00000013 c6da2880 0000051b [ 3214.498264] <0> e50c512e c1260709 c6cbf840 c6d42000 c1031826 c1288bbd c6da2900 c6e09320 [ 3214.498290] <0> c6e09300 00000000 00000000 00000001 e50c512d e521a346 e50c512e 00000000 [ 3214.498318] Call Trace: [ 3214.498329] [<c1005a0b>] ? xen_restore_fl_direct_end+0x0/0x1 [ 3214.498339] [<c1260709>] ? tcp_ack+0x7f9/0x10d0 [ 3214.498350] [<c1031826>] ? local_bh_enable+0x56/0x80 [ 3214.498359] [<c1288bbd>] ? ipt_do_table+0x2dd/0x590 [ 3214.498369] [<c1261eef>] ? tcp_rcv_state_process+0x41f/0x970 [ 3214.498378] [<c1267e7f>] ? tcp_v4_do_rcv+0x8f/0x1e0 [ 3214.498387] [<c1269ccd>] ? tcp_v4_rcv+0x68d/0x7d0 [ 3214.498397] [<c124d5f0>] ? ip_local_deliver_finish+0x0/0x1e0 [ 3214.498406] [<c124d687>] ? ip_local_deliver_finish+0x97/0x1e0 [ 3214.498416] [<c124d5f0>] ? ip_local_deliver_finish+0x0/0x1e0 [ 3214.498425] [<c124d3eb>] ? ip_rcv_finish+0x13b/0x340 [ 3214.498434] [<c124d2b0>] ? ip_rcv_finish+0x0/0x340 [ 3214.498442] [<c124d8c0>] ? ip_rcv+0x0/0x2e0 [ 3214.498452] [<c1211037>] ? netif_receive_skb+0x2f7/0x4c0 [ 3214.498468] [<c1213fe0>] ? process_backlog+0x70/0xb0 [ 3214.498476] [<c1213d98>] ? net_rx_action+0xe8/0x1a0 [ 3214.498486] [<c103116d>] ? __do_softirq+0x8d/0x120 [ 3214.498494] [<c100360d>] ? xen_mc_flush+0xed/0x1a0 [ 3214.498504] [<c1057891>] ? move_native_irq+0x11/0x50 [ 3214.498513] [<c1031238>] ? do_softirq+0x38/0x40 [ 3214.498523] [<c11930e2>] ? xen_evtchn_do_upcall+0x142/0x160 [ 3214.498534] [<c10087d7>] ? xen_do_upcall+0x7/0xc [ 3214.498543] [<c10013a7>] ? hypercall_page+0x3a7/0x1010 [ 3214.498552] [<c100520f>] ? xen_safe_halt+0xf/0x20 [ 3214.498560] [<c100349c>] ? xen_idle+0x1c/0x30 [ 3214.498569] [<c1006bba>] ? cpu_idle+0x3a/0x60 [ 3214.498578] [<c141692a>] ? start_kernel+0x26a/0x300 [ 3214.498616] [<c1416280>] ? unknown_bootoption+0x0/0x1c0 [ 3214.498630] [<c141938e>] ? xen_start_kernel+0x3be/0x3e0 [ 3214.498637] Code: 00 00 8b b3 a0 03 00 00 85 f6 0f 84 53 02 00 00 8b 46 3c 8d ab 80 00 00 00 8b 93 04 04 00 00 39 c2 89 54 24 0c 0f 89 1c 02 00 00 <8b> 06 0f 18 00 90 39 ee 0f 84 30 01 00 00 39 b3 28 01 00 00 8d [ 3214.498820] EIP: [<c12657dc>] tcp_xmit_retransmit_queue+0x4c/0x2b0 SS:ESP 0069:c13d5cf0 [ 3214.498836] CR2: 0000000000000000 [ 3214.498846] ---[ end trace 709a97adf87834a7 ]--- [ 3214.498852] Kernel panic - not syncing: Fatal exception in interrupt [ 3214.498862] Pid: 0, comm: swapper Tainted: G D 2.6.31.9-pae-um-wolff #79 [ 3214.498870] Call Trace: [ 3214.498878] [<c102c896>] ? panic+0x46/0x100 [ 3214.498904] [<c100b428>] ? oops_end+0x98/0xa0 [ 3214.498922] [<c1019eff>] ? no_context+0x11f/0x1b0 [ 3214.498930] [<c101a516>] ? do_page_fault+0x66/0x240 [ 3214.498939] [<c101a4b0>] ? do_page_fault+0x0/0x240 [ 3214.498947] [<c101a23f>] ? bad_area_nosemaphore+0xf/0x20 [ 3214.498955] [<c1308b7e>] ? error_code+0x66/0x6c [ 3214.498963] [<c101a4b0>] ? do_page_fault+0x0/0x240 [ 3214.498972] [<c12657dc>] ? tcp_xmit_retransmit_queue+0x4c/0x2b0 [ 3214.498982] [<c1005a0b>] ? xen_restore_fl_direct_end+0x0/0x1 [ 3214.498991] [<c1260709>] ? tcp_ack+0x7f9/0x10d0 [ 3214.498999] [<c1031826>] ? local_bh_enable+0x56/0x80 [ 3214.499009] [<c1288bbd>] ? ipt_do_table+0x2dd/0x590 [ 3214.499017] [<c1261eef>] ? tcp_rcv_state_process+0x41f/0x970 [ 3214.499025] [<c1267e7f>] ? tcp_v4_do_rcv+0x8f/0x1e0 [ 3214.499034] [<c1269ccd>] ? tcp_v4_rcv+0x68d/0x7d0 [ 3214.499044] [<c124d5f0>] ? ip_local_deliver_finish+0x0/0x1e0 [ 3214.499053] [<c124d687>] ? ip_local_deliver_finish+0x97/0x1e0 [ 3214.499063] [<c124d5f0>] ? ip_local_deliver_finish+0x0/0x1e0 [ 3214.499072] [<c124d3eb>] ? ip_rcv_finish+0x13b/0x340 [ 3214.499079] [<c124d2b0>] ? ip_rcv_finish+0x0/0x340 [ 3214.499087] [<c124d8c0>] ? ip_rcv+0x0/0x2e0 [ 3214.499101] [<c1211037>] ? netif_receive_skb+0x2f7/0x4c0 [ 3214.499115] [<c1213fe0>] ? process_backlog+0x70/0xb0 [ 3214.499123] [<c1213d98>] ? net_rx_action+0xe8/0x1a0 [ 3214.499131] [<c103116d>] ? __do_softirq+0x8d/0x120 [ 3214.499143] [<c100360d>] ? xen_mc_flush+0xed/0x1a0 [ 3214.499152] [<c1057891>] ? move_native_irq+0x11/0x50 [ 3214.499160] [<c1031238>] ? do_softirq+0x38/0x40 [ 3214.499174] [<c11930e2>] ? xen_evtchn_do_upcall+0x142/0x160 [ 3214.499188] [<c10087d7>] ? xen_do_upcall+0x7/0xc [ 3214.499195] [<c10013a7>] ? hypercall_page+0x3a7/0x1010 [ 3214.499203] [<c100520f>] ? xen_safe_halt+0xf/0x20 [ 3214.499214] [<c100349c>] ? xen_idle+0x1c/0x30 [ 3214.499223] [<c1006bba>] ? cpu_idle+0x3a/0x60 [ 3214.499231] [<c141692a>] ? start_kernel+0x26a/0x300 [ 3214.499239] [<c1416280>] ? unknown_bootoption+0x0/0x1c0 [ 3214.499247] [<c141938e>] ? xen_start_kernel+0x3be/0x3e0 ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-16 12:25 ` Lennart Schulte @ 2010-07-16 13:19 ` Ilpo Järvinen 2010-07-19 8:06 ` Lennart Schulte 0 siblings, 1 reply; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-16 13:19 UTC (permalink / raw) To: Lennart Schulte, David S. Miller Cc: Eric Dumazet, Tejun Heo, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 2300 bytes --] On Fri, 16 Jul 2010, Lennart Schulte wrote: > On 16.07.2010 14:02, Ilpo Järvinen wrote: > > > > > > > [ 2754.413150] NULL head, pkts 0 > > > > > [ 2754.413156] Errors caught so far 1 > > > > > > > Thanks for reporting the results. > > > > Could you post the oops too or double check do the timestamps really match > > (and there wasn't more "Errors caught" prints in between)? Since this > > condition doesn't seem to crash the kernel as also send_head should be > > NULL, which saves the day here exiting the loop (unless send head would > > too be corrupt). Doh, I think we'll deref skb already to get the sacked (wouldn't be absolutely necessary but better to not trust side-effects) so it certainly is bad even with the send_head exit. > I can try to do some more testing, perhaps then I will get other results. But > until now I've always gotten something like above. It might then be useful to remove if (!caught_it) which was to prevent infinite printout if the problem is such that it would have persisted forever (now w/o the crash), but since there's no evidence of that. > With the debug patch the kernel doesn't crash, but I have an oops from a run > before the patch: Right, no crash of course, stupid me :-). Lets start with this (I'm not sure if this helps Tejun's case but much doubt it does): -- [PATCH] tcp: fix crash in tcp_xmit_retransmit_queue It can happen that there are no packets in queue while calling tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns NULL and that gets deref'ed to get sacked into a local var. There is no work to do if no packets are outstanding so we just exit early. There may still be another bug affecting this same function. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> --- net/ipv4/tcp_output.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b4ed957..7ed9dc1 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk) int mib_idx; int fwd_rexmitting = 0; + if (!tp->packets_out) + return; + if (!tp->lost_out) tp->retransmit_high = tp->snd_una; -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-16 13:19 ` Ilpo Järvinen @ 2010-07-19 8:06 ` Lennart Schulte 2010-07-19 11:16 ` [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue Ilpo Järvinen 0 siblings, 1 reply; 29+ messages in thread From: Lennart Schulte @ 2010-07-19 8:06 UTC (permalink / raw) To: Ilpo Järvinen Cc: David S. Miller, Eric Dumazet, Tejun Heo, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert I ran tests for about 2 hours with this patch and I got no output from the debug patch. This seems to have solved at least my problem :) Thanks! > [PATCH] tcp: fix crash in tcp_xmit_retransmit_queue > > It can happen that there are no packets in queue while calling > tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns > NULL and that gets deref'ed to get sacked into a local var. > > There is no work to do if no packets are outstanding so we just > exit early. > > There may still be another bug affecting this same function. > > Signed-off-by: Ilpo Järvinen<ilpo.jarvinen@helsinki.fi> > Reported-by: Lennart Schulte<lennart.schulte@nets.rwth-aachen.de> > --- > net/ipv4/tcp_output.c | 3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index b4ed957..7ed9dc1 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk) > int mib_idx; > int fwd_rexmitting = 0; > > + if (!tp->packets_out) > + return; > + > if (!tp->lost_out) > tp->retransmit_high = tp->snd_una; > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue 2010-07-19 8:06 ` Lennart Schulte @ 2010-07-19 11:16 ` Ilpo Järvinen 2010-07-19 14:09 ` Eric Dumazet 0 siblings, 1 reply; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-19 11:16 UTC (permalink / raw) To: Lennart Schulte, David Miller Cc: Eric Dumazet, Tejun Heo, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 1951 bytes --] On Mon, 19 Jul 2010, Lennart Schulte wrote: > I ran tests for about 2 hours with this patch and I got no output from the > debug patch. This seems to have solved at least my problem :) > > Thanks! > > [PATCH] tcp: fix crash in tcp_xmit_retransmit_queue > > > > It can happen that there are no packets in queue while calling > > tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns > > NULL and that gets deref'ed to get sacked into a local var. > > > > There is no work to do if no packets are outstanding so we just > > exit early. > > > > There may still be another bug affecting this same function. Thanks for testing. DaveM, I think this oops was introduced for 2.6.28 (in 08ebd1721ab8fd362e90ae17b461c07b23fa2824 it seems, to be exact) so to stables it should go too please. I've only tweaked the message (so no need for Lennart to retest v2 :-)). -- [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue It can happen that there are no packets in queue while calling tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns NULL and that gets deref'ed to get sacked into a local var. There is no work to do if no packets are outstanding so we just exit early. This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out guard to make joining diff nicer). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> Tested-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> --- net/ipv4/tcp_output.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b4ed957..7ed9dc1 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk) int mib_idx; int fwd_rexmitting = 0; + if (!tp->packets_out) + return; + if (!tp->lost_out) tp->retransmit_high = tp->snd_una; -- 1.5.6.5 ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue 2010-07-19 11:16 ` [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue Ilpo Järvinen @ 2010-07-19 14:09 ` Eric Dumazet 2010-07-19 17:25 ` Ilpo Järvinen 0 siblings, 1 reply; 29+ messages in thread From: Eric Dumazet @ 2010-07-19 14:09 UTC (permalink / raw) To: Ilpo Järvinen Cc: Lennart Schulte, David Miller, Tejun Heo, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Le lundi 19 juillet 2010 à 14:16 +0300, Ilpo Järvinen a écrit : > Thanks for testing. > > DaveM, I think this oops was introduced for 2.6.28 (in > 08ebd1721ab8fd362e90ae17b461c07b23fa2824 it seems, to be exact) so to > stables it should go too please. I've only tweaked the message (so no need > for Lennart to retest v2 :-)). > > -- > [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue > > It can happen that there are no packets in queue while calling > tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns > NULL and that gets deref'ed to get sacked into a local var. > > There is no work to do if no packets are outstanding so we just > exit early. > > This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out > guard to make joining diff nicer). > But prior to commit 08ebd1721ab8fd3, we were not testing tp->packets_out, but tp->lost_out if it was 0, we were not doing the tcp_for_write_queue_from() loop. Not sure it makes a difference ? > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> > Reported-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> > Tested-by: Lennart Schulte <lennart.schulte@nets.rwth-aachen.de> > --- > net/ipv4/tcp_output.c | 3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index b4ed957..7ed9dc1 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2208,6 +2208,9 @@ void tcp_xmit_retransmit_queue(struct sock *sk) > int mib_idx; > int fwd_rexmitting = 0; > > + if (!tp->packets_out) > + return; > + > if (!tp->lost_out) > tp->retransmit_high = tp->snd_una; > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue 2010-07-19 14:09 ` Eric Dumazet @ 2010-07-19 17:25 ` Ilpo Järvinen 2010-07-19 17:39 ` Eric Dumazet 0 siblings, 1 reply; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-19 17:25 UTC (permalink / raw) To: Eric Dumazet Cc: Lennart Schulte, David Miller, Tejun Heo, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 2153 bytes --] On Mon, 19 Jul 2010, Eric Dumazet wrote: > Le lundi 19 juillet 2010 à 14:16 +0300, Ilpo Järvinen a écrit : > > > Thanks for testing. > > > > DaveM, I think this oops was introduced for 2.6.28 (in > > 08ebd1721ab8fd362e90ae17b461c07b23fa2824 it seems, to be exact) so to > > stables it should go too please. I've only tweaked the message (so no need > > for Lennart to retest v2 :-)). > > > > -- > > [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue > > > > It can happen that there are no packets in queue while calling > > tcp_xmit_retransmit_queue(). tcp_write_queue_head() then returns > > NULL and that gets deref'ed to get sacked into a local var. > > > > There is no work to do if no packets are outstanding so we just > > exit early. > > > > This oops was introduced by 08ebd1721ab8fd (tcp: remove tp->lost_out > > guard to make joining diff nicer). > > > > But prior to commit 08ebd1721ab8fd3, we were not testing > tp->packets_out, but tp->lost_out That's right, but back then we were not testing it for the same purpose. > if it was 0, we were not doing the tcp_for_write_queue_from() loop. This invariant _should_ be true all the time: lost_out <= packets_out ...and if it's not we would get Leak printouts every now and then. Thus is packets_out is zero no NULL defer with the if lost_out either. The other loop too (in pre 08eb kernels) will work because of earlier mentioned send_head check side-effects. > Not sure it makes a difference ? This difference is well thought and intentional, I didn't use different one by accident. We want to make sure we won't use NULL from tcp_write_queue_head() while the pre 08ebd1721ab8fd3 kernels was interested mainly whether the first loop should run or not (and of course ends up avoid the null deref too but it's more optimization like thing in there, ie., if there's no lost packets no work to-do). The deref could have been fixed by moving TCP_SKB_CB(skb)->sacked a bit later but that would again make us depend on the side-effect of the send_head check (in the case of packets_out being zero and wq empty) which is something I don't like too much. -- i. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue 2010-07-19 17:25 ` Ilpo Järvinen @ 2010-07-19 17:39 ` Eric Dumazet 2010-07-19 19:55 ` David Miller 0 siblings, 1 reply; 29+ messages in thread From: Eric Dumazet @ 2010-07-19 17:39 UTC (permalink / raw) To: Ilpo Järvinen Cc: Lennart Schulte, David Miller, Tejun Heo, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Le lundi 19 juillet 2010 à 20:25 +0300, Ilpo Järvinen a écrit : > This difference is well thought and intentional, I didn't use different > one by accident. We want to make sure we won't use NULL from > tcp_write_queue_head() while the pre 08ebd1721ab8fd3 kernels was > interested mainly whether the first loop should run or not (and of course > ends up avoid the null deref too but it's more optimization like > thing in there, ie., if there's no lost packets no work to-do). The deref > could have been fixed by moving TCP_SKB_CB(skb)->sacked a bit later but > that would again make us depend on the side-effect of the send_head check > (in the case of packets_out being zero and wq empty) which is something I > don't like too much. > Thanks Ilpo. Do you know in what exact circumstance the bug triggers ? It's hard to believe thousand of machines on the Internet never hit it :( Maybe another problem in congestion control ? ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue 2010-07-19 17:39 ` Eric Dumazet @ 2010-07-19 19:55 ` David Miller 2010-07-20 8:33 ` Ilpo Järvinen 0 siblings, 1 reply; 29+ messages in thread From: David Miller @ 2010-07-19 19:55 UTC (permalink / raw) To: eric.dumazet Cc: ilpo.jarvinen, lennart.schulte, tj, linux-kernel, netdev, henning.fehrmann, carsten.aulbert From: Eric Dumazet <eric.dumazet@gmail.com> Date: Mon, 19 Jul 2010 19:39:08 +0200 > Do you know in what exact circumstance the bug triggers ? > > It's hard to believe thousand of machines on the Internet never hit > it :( > > Maybe another problem in congestion control ? This is something to investigate, but the conditions under which tcp_fastretrans_alert() (the main invoker of tcp_xmit_retransmit_queue()) does it's thing are complicated enough that I'm going to add this fix for the time being and push it out to stable too. Thanks everyone. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue 2010-07-19 19:55 ` David Miller @ 2010-07-20 8:33 ` Ilpo Järvinen 0 siblings, 0 replies; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-20 8:33 UTC (permalink / raw) To: David Miller Cc: eric.dumazet, lennart.schulte, tj, LKML, Netdev, henning.fehrmann, carsten.aulbert On Mon, 19 Jul 2010, David Miller wrote: > From: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon, 19 Jul 2010 19:39:08 +0200 > > > Do you know in what exact circumstance the bug triggers ? > > > > It's hard to believe thousand of machines on the Internet never hit > > it :( > > > > Maybe another problem in congestion control ? > > This is something to investigate, but the conditions under which > tcp_fastretrans_alert() (the main invoker of tcp_xmit_retransmit_queue()) > does it's thing are complicated enough that I'm going to add this fix > for the time being and push it out to stable too. This is so true. ...So far I've managed to twice rule out of the possibility of this being really triggerable (ie., it would mean Lennart's out of tree changes broke it), and once in the middle came into opposite conclusion. Thus by majority voting we can deduce that it won't happen - how reassuring :-/. It seems that tcp_try_undo_recovery causes return if TCP remained in CA_Loss/CA_Recovery and that tcp_time_to_recover won't really let past return either under normal circumstances (more details below), and tcp_simple_retransmit requires lost_out to change; seems safe in mainline to me. Hmm... It seems that I've just solved another report too. ...Somebody a while back found out that setting reordering sysctl to zero (ie. to a value which does not make too much sense) crashed the kernel. It seems that at least then tcp_time_to_recover() would return true and trigger this bug (though I'm not sure if that's the only breakage to happen). Also worth to keep in mind is the bugzilla entry ("New freez in TCP" or something like that) so I'm not really sure I could say for sure nobody never hit it. The bugzilla one goes away by disable SACK (at least for some) but it might mix two different issues. It seems that there really are two different issues, the other may have something to do with SACK though there are other variables then involved, e.g., the changes in retransmission logic/timing, so it's impossible to say if the SACK disable really "fixed" the bugzilla one or not. Also Tejun's ->next == NULL finding points out to a different bug than this Lennart's one. -- i. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-16 12:02 ` Ilpo Järvinen 2010-07-16 12:25 ` Lennart Schulte @ 2010-07-19 14:57 ` Tejun Heo 2010-07-20 8:41 ` Ilpo Järvinen 2010-09-08 9:32 ` Ilpo Järvinen 1 sibling, 2 replies; 29+ messages in thread From: Tejun Heo @ 2010-07-19 14:57 UTC (permalink / raw) To: Ilpo Järvinen Cc: Lennart Schulte, Eric Dumazet, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Hello, On 07/16/2010 02:02 PM, Ilpo Järvinen wrote: > Besides, Tejun has also found that it's hint->next ptr which is NULL in > his case so this won't solve his case anyway. Tejun, can you confirm > whether it was retransmit_skb_hint->next being NULL on _entry time_ to > tcp_xmit_retransmit_queue() or later on in the loop after the updates done > by the loop itself to the hint (or that your testing didn't conclude > either)? Sorry about the delay. I was traveling last week. Unfortunately, I don't know whether ->next was NULL on entry or not. I hacked up the following ugly patch for the next test run. It should have everything which has come up till now + list and hint sanity checking before starting processing them. I'm planning on deploying it w/ crashdump enabled in several days. If I've missed something, please let me know. Thanks. diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index b4ed957..1c8b1e0 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2190,6 +2190,53 @@ static int tcp_can_forward_retransmit(struct sock *sk) return 1; } +static void print_queue(struct sock *sk, struct sk_buff *old, struct sk_buff *hole) +{ + struct tcp_sock *tp = tcp_sk(sk); + struct sk_buff *skb, *prev; + bool do_panic = false; + + skb = tcp_write_queue_head(sk); + prev = (struct sk_buff *)(&sk->sk_write_queue); + + if (skb == NULL) { + printk("XXX NULL head, pkts %u\n", tp->packets_out); + do_panic = true; + } + + printk("XXX head %p tail %p sendhead %p oldhint %p now %p hole %p high %u\n", + tcp_write_queue_head(sk), tcp_write_queue_tail(sk), + tcp_send_head(sk), old, tp->retransmit_skb_hint, hole, + tp->retransmit_high); + + while (skb) { + printk("XXX skb %p (%u-%u) next %p prev %p sacked %u\n", + skb, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq, + skb->next, skb->prev, TCP_SKB_CB(skb)->sacked); + if (prev != skb->prev) { + printk("XXX Inconsistent prev\n"); + do_panic = true; + } + + if (skb == tcp_write_queue_tail(sk)) { + if (skb->next != (struct sk_buff *)(&sk->sk_write_queue)) { + printk("XXX Improper next at tail\n"); + do_panic = true; + } + break; + } + + prev = skb; + skb = skb->next; + } + if (!skb) { + printk("XXX Encountered unexpected NULL\n"); + do_panic = true; + } + if (do_panic) + panic("XXX panicking"); +} + /* This gets called after a retransmit timeout, and the initially * retransmitted data is acknowledged. It tries to continue * resending the rest of the retransmit queue, until either @@ -2198,19 +2245,53 @@ static int tcp_can_forward_retransmit(struct sock *sk) * based retransmit packet might feed us FACK information again. * If so, we use it to avoid unnecessarily retransmissions. */ +static unsigned int caught_it; + void tcp_xmit_retransmit_queue(struct sock *sk) { const struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); - struct sk_buff *skb; + struct sk_buff *skb, *prev; struct sk_buff *hole = NULL; + struct sk_buff *old = tp->retransmit_skb_hint; u32 last_lost; int mib_idx; int fwd_rexmitting = 0; + bool saw_hint = false; + + if (!tp->packets_out) { + if (net_ratelimit()) + printk("XXX !tp->packets_out, retransmit_skb_hint=%p, write_queue_head=%p\n", + tp->retransmit_skb_hint, tcp_write_queue_head(sk)); + return; + } if (!tp->lost_out) tp->retransmit_high = tp->snd_una; + for (skb = tcp_write_queue_head(sk), + prev = (struct sk_buff *)&sk->sk_write_queue; + skb != (struct sk_buff *)&sk->sk_write_queue; + prev = skb, skb = skb->next) { + if (prev != skb->prev) { + printk("XXX sanity check: prev corrupt\n"); + print_queue(sk, old, hole); + } + if (skb == tp->retransmit_skb_hint) + saw_hint = true; + if (skb == tcp_write_queue_tail(sk) && + skb->next != (struct sk_buff *)(&sk->sk_write_queue)) { + printk("XXX sanity check: end corrupt\n"); + print_queue(sk, old, hole); + } + } + if (tp->retransmit_skb_hint && !saw_hint) { + printk("XXX sanity check: retransmit_skb_hint=%p is not on list, claring hint\n", + tp->retransmit_skb_hint); + print_queue(sk, old, hole); + tp->retransmit_skb_hint = NULL; + } + if (tp->retransmit_skb_hint) { skb = tp->retransmit_skb_hint; last_lost = TCP_SKB_CB(skb)->end_seq; @@ -2218,7 +2299,17 @@ void tcp_xmit_retransmit_queue(struct sock *sk) last_lost = tp->retransmit_high; } else { skb = tcp_write_queue_head(sk); - last_lost = tp->snd_una; + if (skb) + last_lost = tp->snd_una; + } + +checknull: + if (skb == NULL) { + print_queue(sk, old, hole); + caught_it++; + if (net_ratelimit()) + printk("XXX Errors caught so far %u\n", caught_it); + return; } tcp_for_write_queue_from(skb, sk) { @@ -2261,7 +2352,7 @@ begin_fwd: } else if (!(sacked & TCPCB_LOST)) { if (hole == NULL && !(sacked & (TCPCB_SACKED_RETRANS|TCPCB_SACKED_ACKED))) hole = skb; - continue; + goto cont; } else { last_lost = TCP_SKB_CB(skb)->end_seq; @@ -2272,7 +2363,7 @@ begin_fwd: } if (sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS)) - continue; + goto cont; if (tcp_retransmit_skb(sk, skb)) return; @@ -2282,6 +2373,9 @@ begin_fwd: inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, inet_csk(sk)->icsk_rto, TCP_RTO_MAX); +cont: + skb = skb->next; + goto checknull; } } -- tejun ^ permalink raw reply related [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-19 14:57 ` oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 Tejun Heo @ 2010-07-20 8:41 ` Ilpo Järvinen 2010-09-08 9:32 ` Ilpo Järvinen 1 sibling, 0 replies; 29+ messages in thread From: Ilpo Järvinen @ 2010-07-20 8:41 UTC (permalink / raw) To: Tejun Heo Cc: Lennart Schulte, Eric Dumazet, David S. Miller, lkml, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 6095 bytes --] On Mon, 19 Jul 2010, Tejun Heo wrote: > Hello, > > On 07/16/2010 02:02 PM, Ilpo Järvinen wrote: > > Besides, Tejun has also found that it's hint->next ptr which is NULL in > > his case so this won't solve his case anyway. Tejun, can you confirm > > whether it was retransmit_skb_hint->next being NULL on _entry time_ to > > tcp_xmit_retransmit_queue() or later on in the loop after the updates done > > by the loop itself to the hint (or that your testing didn't conclude > > either)? > > Sorry about the delay. I was traveling last week. Unfortunately, I > don't know whether ->next was NULL on entry or not. I hacked up the > following ugly patch for the next test run. It should have everything > which has come up till now + list and hint sanity checking before > starting processing them. I'm planning on deploying it w/ crashdump > enabled in several days. If I've missed something, please let me > know. One thing that complicates things further is the fact that the write queue can change beneath us (ie., in tcp_retrans_try_collapse and tcp_fragment). I've read them multiple times through and always found them innocent so this might be just for-the-record type of a note (at least I hope so). -- i. > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index b4ed957..1c8b1e0 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -2190,6 +2190,53 @@ static int tcp_can_forward_retransmit(struct sock *sk) > return 1; > } > > +static void print_queue(struct sock *sk, struct sk_buff *old, struct sk_buff *hole) > +{ > + struct tcp_sock *tp = tcp_sk(sk); > + struct sk_buff *skb, *prev; > + bool do_panic = false; > + > + skb = tcp_write_queue_head(sk); > + prev = (struct sk_buff *)(&sk->sk_write_queue); > + > + if (skb == NULL) { > + printk("XXX NULL head, pkts %u\n", tp->packets_out); > + do_panic = true; > + } > + > + printk("XXX head %p tail %p sendhead %p oldhint %p now %p hole %p high %u\n", > + tcp_write_queue_head(sk), tcp_write_queue_tail(sk), > + tcp_send_head(sk), old, tp->retransmit_skb_hint, hole, > + tp->retransmit_high); > + > + while (skb) { > + printk("XXX skb %p (%u-%u) next %p prev %p sacked %u\n", > + skb, TCP_SKB_CB(skb)->seq, TCP_SKB_CB(skb)->end_seq, > + skb->next, skb->prev, TCP_SKB_CB(skb)->sacked); > + if (prev != skb->prev) { > + printk("XXX Inconsistent prev\n"); > + do_panic = true; > + } > + > + if (skb == tcp_write_queue_tail(sk)) { > + if (skb->next != (struct sk_buff *)(&sk->sk_write_queue)) { > + printk("XXX Improper next at tail\n"); > + do_panic = true; > + } > + break; > + } > + > + prev = skb; > + skb = skb->next; > + } > + if (!skb) { > + printk("XXX Encountered unexpected NULL\n"); > + do_panic = true; > + } > + if (do_panic) > + panic("XXX panicking"); > +} > + > /* This gets called after a retransmit timeout, and the initially > * retransmitted data is acknowledged. It tries to continue > * resending the rest of the retransmit queue, until either > @@ -2198,19 +2245,53 @@ static int tcp_can_forward_retransmit(struct sock *sk) > * based retransmit packet might feed us FACK information again. > * If so, we use it to avoid unnecessarily retransmissions. > */ > +static unsigned int caught_it; > + > void tcp_xmit_retransmit_queue(struct sock *sk) > { > const struct inet_connection_sock *icsk = inet_csk(sk); > struct tcp_sock *tp = tcp_sk(sk); > - struct sk_buff *skb; > + struct sk_buff *skb, *prev; > struct sk_buff *hole = NULL; > + struct sk_buff *old = tp->retransmit_skb_hint; > u32 last_lost; > int mib_idx; > int fwd_rexmitting = 0; > + bool saw_hint = false; > + > + if (!tp->packets_out) { > + if (net_ratelimit()) > + printk("XXX !tp->packets_out, retransmit_skb_hint=%p, write_queue_head=%p\n", > + tp->retransmit_skb_hint, tcp_write_queue_head(sk)); > + return; > + } > > if (!tp->lost_out) > tp->retransmit_high = tp->snd_una; > > + for (skb = tcp_write_queue_head(sk), > + prev = (struct sk_buff *)&sk->sk_write_queue; > + skb != (struct sk_buff *)&sk->sk_write_queue; > + prev = skb, skb = skb->next) { > + if (prev != skb->prev) { > + printk("XXX sanity check: prev corrupt\n"); > + print_queue(sk, old, hole); > + } > + if (skb == tp->retransmit_skb_hint) > + saw_hint = true; > + if (skb == tcp_write_queue_tail(sk) && > + skb->next != (struct sk_buff *)(&sk->sk_write_queue)) { > + printk("XXX sanity check: end corrupt\n"); > + print_queue(sk, old, hole); > + } > + } > + if (tp->retransmit_skb_hint && !saw_hint) { > + printk("XXX sanity check: retransmit_skb_hint=%p is not on list, claring hint\n", > + tp->retransmit_skb_hint); > + print_queue(sk, old, hole); > + tp->retransmit_skb_hint = NULL; > + } > + > if (tp->retransmit_skb_hint) { > skb = tp->retransmit_skb_hint; > last_lost = TCP_SKB_CB(skb)->end_seq; > @@ -2218,7 +2299,17 @@ void tcp_xmit_retransmit_queue(struct sock *sk) > last_lost = tp->retransmit_high; > } else { > skb = tcp_write_queue_head(sk); > - last_lost = tp->snd_una; > + if (skb) > + last_lost = tp->snd_una; > + } > + > +checknull: > + if (skb == NULL) { > + print_queue(sk, old, hole); > + caught_it++; > + if (net_ratelimit()) > + printk("XXX Errors caught so far %u\n", caught_it); > + return; > } > > tcp_for_write_queue_from(skb, sk) { > @@ -2261,7 +2352,7 @@ begin_fwd: > } else if (!(sacked & TCPCB_LOST)) { > if (hole == NULL && !(sacked & (TCPCB_SACKED_RETRANS|TCPCB_SACKED_ACKED))) > hole = skb; > - continue; > + goto cont; > > } else { > last_lost = TCP_SKB_CB(skb)->end_seq; > @@ -2272,7 +2363,7 @@ begin_fwd: > } > > if (sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS)) > - continue; > + goto cont; > > if (tcp_retransmit_skb(sk, skb)) > return; > @@ -2282,6 +2373,9 @@ begin_fwd: > inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, > inet_csk(sk)->icsk_rto, > TCP_RTO_MAX); > +cont: > + skb = skb->next; > + goto checknull; > } > } > > ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-07-19 14:57 ` oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 Tejun Heo 2010-07-20 8:41 ` Ilpo Järvinen @ 2010-09-08 9:32 ` Ilpo Järvinen 2010-09-08 10:25 ` Tejun Heo 1 sibling, 1 reply; 29+ messages in thread From: Ilpo Järvinen @ 2010-09-08 9:32 UTC (permalink / raw) To: Tejun Heo Cc: Lennart Schulte, Eric Dumazet, David S. Miller, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 956 bytes --] On Mon, 19 Jul 2010, Tejun Heo wrote: > Hello, > > On 07/16/2010 02:02 PM, Ilpo Järvinen wrote: > > Besides, Tejun has also found that it's hint->next ptr which is NULL in > > his case so this won't solve his case anyway. Tejun, can you confirm > > whether it was retransmit_skb_hint->next being NULL on _entry time_ to > > tcp_xmit_retransmit_queue() or later on in the loop after the updates done > > by the loop itself to the hint (or that your testing didn't conclude > > either)? > > Sorry about the delay. I was traveling last week. Unfortunately, I > don't know whether ->next was NULL on entry or not. I hacked up the > following ugly patch for the next test run. It should have everything > which has come up till now + list and hint sanity checking before > starting processing them. I'm planning on deploying it w/ crashdump > enabled in several days. If I've missed something, please let me > know. Any news on this one? -- i. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-09-08 9:32 ` Ilpo Järvinen @ 2010-09-08 10:25 ` Tejun Heo 2010-09-08 10:34 ` Ilpo Järvinen 0 siblings, 1 reply; 29+ messages in thread From: Tejun Heo @ 2010-09-08 10:25 UTC (permalink / raw) To: Ilpo Järvinen Cc: Lennart Schulte, Eric Dumazet, David S. Miller, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Hello, On 09/08/2010 11:32 AM, Ilpo Järvinen wrote: >> Sorry about the delay. I was traveling last week. Unfortunately, I >> don't know whether ->next was NULL on entry or not. I hacked up the >> following ugly patch for the next test run. It should have everything >> which has come up till now + list and hint sanity checking before >> starting processing them. I'm planning on deploying it w/ crashdump >> enabled in several days. If I've missed something, please let me >> know. > > Any news on this one? Unfortunately, we haven't been able to reproduce the problem anymore. It could be (but not likely given that none of the debugging messages is triggering) that I was mistaken and the previously posted fixed the issue. The network used by the cluster went through some changes at the time and there have been issues with packet losses. Given that the problem needs packet losses to trigger, it's likely that packet loss pattern here changed such that the patterns of packet losses which trigger the problem aren't happening anymore. (Carsten, Henning, please feel free to fill in if I'm missing something). Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-09-08 10:25 ` Tejun Heo @ 2010-09-08 10:34 ` Ilpo Järvinen 2010-09-09 10:27 ` Tejun Heo 0 siblings, 1 reply; 29+ messages in thread From: Ilpo Järvinen @ 2010-09-08 10:34 UTC (permalink / raw) To: Tejun Heo Cc: Lennart Schulte, Eric Dumazet, David S. Miller, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 1460 bytes --] On Wed, 8 Sep 2010, Tejun Heo wrote: > On 09/08/2010 11:32 AM, Ilpo Järvinen wrote: > >> Sorry about the delay. I was traveling last week. Unfortunately, I > >> don't know whether ->next was NULL on entry or not. I hacked up the > >> following ugly patch for the next test run. It should have everything > >> which has come up till now + list and hint sanity checking before > >> starting processing them. I'm planning on deploying it w/ crashdump > >> enabled in several days. If I've missed something, please let me > >> know. > > > > Any news on this one? > > Unfortunately, we haven't been able to reproduce the problem anymore. With my debug patch or not at all? > It could be (but not likely given that none of the debugging messages > is triggering) that I was mistaken and the previously posted fixed the > issue. The network used by the cluster went through some changes at > the time and there have been issues with packet losses. Given that > the problem needs packet losses to trigger, it's likely that packet > loss pattern here changed such that the patterns of packet losses > which trigger the problem aren't happening anymore. (Carsten, > Henning, please feel free to fill in if I'm missing something). That might well be true, however, you're already a second guy who cannot reproduce it with the debug patch so I would not rule out other possibilities unless you've tried without debug patch too since the changes? -- i. ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-09-08 10:34 ` Ilpo Järvinen @ 2010-09-09 10:27 ` Tejun Heo 2010-09-09 10:45 ` Ilpo Järvinen 0 siblings, 1 reply; 29+ messages in thread From: Tejun Heo @ 2010-09-09 10:27 UTC (permalink / raw) To: Ilpo Järvinen Cc: Lennart Schulte, Eric Dumazet, David S. Miller, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert Hello, On 09/08/2010 12:34 PM, Ilpo Järvinen wrote: >> Unfortunately, we haven't been able to reproduce the problem anymore. > > With my debug patch or not at all? With the ugly merged patch I posted previously in this thread which contained debug messages if any of the worked around condition triggers. >> It could be (but not likely given that none of the debugging messages >> is triggering) that I was mistaken and the previously posted fixed the >> issue. The network used by the cluster went through some changes at >> the time and there have been issues with packet losses. Given that >> the problem needs packet losses to trigger, it's likely that packet >> loss pattern here changed such that the patterns of packet losses >> which trigger the problem aren't happening anymore. (Carsten, >> Henning, please feel free to fill in if I'm missing something). > > That might well be true, however, you're already a second guy who > cannot reproduce it with the debug patch so I would not rule out other > possibilities unless you've tried without debug patch too since the > changes? Unfortunately, I can't really tell one way or the other at this point. Carsten will be back in a few days. I'll ask him for more details. Thanks. -- tejun ^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 2010-09-09 10:27 ` Tejun Heo @ 2010-09-09 10:45 ` Ilpo Järvinen 0 siblings, 0 replies; 29+ messages in thread From: Ilpo Järvinen @ 2010-09-09 10:45 UTC (permalink / raw) To: Tejun Heo Cc: Lennart Schulte, Eric Dumazet, David S. Miller, netdev@vger.kernel.org, Fehrmann, Henning, Carsten Aulbert [-- Attachment #1: Type: TEXT/PLAIN, Size: 1609 bytes --] On Thu, 9 Sep 2010, Tejun Heo wrote: > On 09/08/2010 12:34 PM, Ilpo Järvinen wrote: > >> Unfortunately, we haven't been able to reproduce the problem anymore. > > > > With my debug patch or not at all? > > With the ugly merged patch I posted previously in this thread which > contained debug messages if any of the worked around condition > triggers. > > >> It could be (but not likely given that none of the debugging messages > >> is triggering) that I was mistaken and the previously posted fixed the > >> issue. The network used by the cluster went through some changes at > >> the time and there have been issues with packet losses. Given that > >> the problem needs packet losses to trigger, it's likely that packet > >> loss pattern here changed such that the patterns of packet losses > >> which trigger the problem aren't happening anymore. (Carsten, > >> Henning, please feel free to fill in if I'm missing something). > > > > That might well be true, however, you're already a second guy who > > cannot reproduce it with the debug patch so I would not rule out other > > possibilities unless you've tried without debug patch too since the > > changes? > > Unfortunately, I can't really tell one way or the other at this point. > Carsten will be back in a few days. I'll ask him for more details. Once you get the info, if not yet done, I'd recommend you try without the debug patch (assuming a possible crash isn't too devasting for the actual stuff you're doing with the machines :-)). ...If it crashes without, then it's time to start looking into compiler versions, etc. -- i. ^ permalink raw reply [flat|nested] 29+ messages in thread
end of thread, other threads:[~2010-09-09 10:45 UTC | newest] Thread overview: 29+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-07-08 8:22 oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 Tejun Heo 2010-07-11 2:36 ` David Miller 2010-07-11 16:09 ` Ilpo Järvinen 2010-07-11 17:06 ` Eric Dumazet 2010-07-11 17:46 ` Eric Dumazet 2010-07-11 18:29 ` Eric Dumazet 2010-07-11 19:22 ` Ilpo Järvinen 2010-07-11 19:25 ` Ilpo Järvinen 2010-07-11 19:44 ` Ilpo Järvinen 2010-07-15 11:58 ` Lennart Schulte 2010-07-15 12:05 ` Eric Dumazet 2010-07-15 12:55 ` Lennart Schulte 2010-07-16 12:02 ` Ilpo Järvinen 2010-07-16 12:25 ` Lennart Schulte 2010-07-16 13:19 ` Ilpo Järvinen 2010-07-19 8:06 ` Lennart Schulte 2010-07-19 11:16 ` [PATCHv2] tcp: fix crash in tcp_xmit_retransmit_queue Ilpo Järvinen 2010-07-19 14:09 ` Eric Dumazet 2010-07-19 17:25 ` Ilpo Järvinen 2010-07-19 17:39 ` Eric Dumazet 2010-07-19 19:55 ` David Miller 2010-07-20 8:33 ` Ilpo Järvinen 2010-07-19 14:57 ` oops in tcp_xmit_retransmit_queue() w/ v2.6.32.15 Tejun Heo 2010-07-20 8:41 ` Ilpo Järvinen 2010-09-08 9:32 ` Ilpo Järvinen 2010-09-08 10:25 ` Tejun Heo 2010-09-08 10:34 ` Ilpo Järvinen 2010-09-09 10:27 ` Tejun Heo 2010-09-09 10:45 ` Ilpo Järvinen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).