[PATCH] NULL pointer deref in tcp_do_twkill

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] NULL pointer deref in tcp_do_twkill_work()
@ 2004-02-27 22:09 olof
  2004-02-27 22:34 ` David S. Miller
  0 siblings, 1 reply; 2+ messages in thread
From: olof @ 2004-02-27 22:09 UTC (permalink / raw)
  To: davem; +Cc: linux-kernel, netdev, niv, anton, milliner

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2033 bytes --]

This bug has been encountered before (ref http://bugme.osdl.org/show_bug.cgi?id=1627),
but that time it went away by itself. We're now seeing it again with
another heavy network load on 2.6.3.

Recap:

We're crashing in tcp_do_twkill_work():

        tw_for_each_inmate(tw, node, safe,
                           &tcp_tw_death_row[slot]) {
                __tw_del_dead_node(tw);
                spin_unlock(&tw_death_lock);
                tcp_timewait_kill(tw);
                tcp_tw_put(tw);
                killed++;
                spin_lock(&tw_death_lock);
                if (killed > quota) {
                        ret = 1;
                        break;
                }
        }

The crash is in __tw_del_dead_node, where tw is taken off of the
tw_death_node list. At the time of the crash, the pprev list pointer is
NULL, which indicates that tw is no longer on the list.

I'm suspicious of the fact that the tw_death_lock is released, and since
it's released, the "safe" inmate could be descheduled on another CPU. Once
the lock is reaquired and the loop is redone, the new tw (old safe) is
now already taken off the list so we go astray.

Shouldn't the loop always restart from the beginning instead of using the
(not thread-)"safe" list iteration? Since it keeps taking the entries off
the list, the behaviour should be identical but it wouldn't be racy.

The alternative is to not drop the lock, but I'm guessing we need to do
that to avoid deadlocks. I don't know the TCP code well enough to tell for
sure.

Proposed patch is attached. We've given it a bit of runtime here but the
crashes happen randomly so it'll take a while to gain full confidence that
it's fixed. But the above reasoning around the patch seems to make sense.

Thanks,

Olof

Olof Johansson                                        Office: 4E002/905
Linux on Power Development                            IBM Systems Group
Email: olof@austin.ibm.com                          Phone: 512-838-9858
All opinions are my own and not those of IBM

[-- Attachment #2: Type: TEXT/PLAIN, Size: 1229 bytes --]

--- net/ipv4/tcp_minisocks.c.orig	2004-02-27 13:43:11.000000000 -0600
+++ net/ipv4/tcp_minisocks.c	2004-02-27 15:21:18.318890672 -0600
@@ -427,9 +427,7 @@ static u32 twkill_thread_slots;
 static int tcp_do_twkill_work(int slot, unsigned int quota)
 {
 	struct tcp_tw_bucket *tw;
-	struct hlist_node *node, *safe;
 	unsigned int killed;
-	int ret;

 	/* NOTE: compare this to previous version where lock
 	 * was released after detaching chain. It was racy,
@@ -438,25 +436,21 @@ static int tcp_do_twkill_work(int slot, 
 	 * soft irqs are not sequenced.
 	 */
 	killed = 0;
-	ret = 0;
-	tw_for_each_inmate(tw, node, safe,
-			   &tcp_tw_death_row[slot]) {
+	while (killed <= quota &&
+	       !hlist_empty(&tcp_tw_death_row[slot])) {
+		tw = hlist_entry(tcp_tw_death_row[slot].first, struct tcp_tw_bucket, tw_death_node);
 		__tw_del_dead_node(tw);
 		spin_unlock(&tw_death_lock);
 		tcp_timewait_kill(tw);
 		tcp_tw_put(tw);
 		killed++;
 		spin_lock(&tw_death_lock);
-		if (killed > quota) {
-			ret = 1;
-			break;
-		}
 	}

 	tcp_tw_count -= killed;
 	NET_ADD_STATS_BH(TimeWaited, killed);

-	return ret;
+	return killed > quota;
 }

 static void tcp_twkill(unsigned long dummy)

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH] NULL pointer deref in tcp_do_twkill_work()
  2004-02-27 22:09 [PATCH] NULL pointer deref in tcp_do_twkill_work() olof
@ 2004-02-27 22:34 ` David S. Miller
  0 siblings, 0 replies; 2+ messages in thread
From: David S. Miller @ 2004-02-27 22:34 UTC (permalink / raw)
  To: olof, davidm, akpm; +Cc: linux-kernel, netdev, niv, anton, milliner

On Fri, 27 Feb 2004 16:09:21 -0600 (CST)
olof@austin.ibm.com wrote:

> This bug has been encountered before (ref http://bugme.osdl.org/show_bug.cgi?id=1627),
> but that time it went away by itself. We're now seeing it again with
> another heavy network load on 2.6.3.

Olaf, you are a life saver.

David Mosberger just recently told me that ia64 is hitting this  problem
again too and I believe your analysis is perfect.

Yes, we cannot cache the 'safe' pointer in tcp_do_twkill_work() while the lock
is not held, and YES another cpu can kill off the bucket that 'safe' points to.

This patch should fix everything up.  Let me know how it goes.

===== include/net/tcp.h 1.56 vs edited =====
--- 1.56/include/net/tcp.h	Sun Feb  1 11:17:41 2004
+++ edited/include/net/tcp.h	Fri Feb 27 14:31:03 2004
@@ -263,7 +263,10 @@
 #define tw_for_each(tw, node, head) \
 	hlist_for_each_entry(tw, node, head, tw_node)
 
-#define tw_for_each_inmate(tw, node, safe, jail) \
+#define tw_for_each_inmate(tw, node, jail) \
+	hlist_for_each_entry(tw, node, jail, tw_death_node)
+
+#define tw_for_each_inmate_safe(tw, node, safe, jail) \
 	hlist_for_each_entry_safe(tw, node, safe, jail, tw_death_node)
 
 #define tcptw_sk(__sk)	((struct tcp_tw_bucket *)(__sk))
===== net/ipv4/tcp_minisocks.c 1.43 vs edited =====
--- 1.43/net/ipv4/tcp_minisocks.c	Tue Oct 28 03:10:47 2003
+++ edited/net/ipv4/tcp_minisocks.c	Fri Feb 27 14:30:01 2004
@@ -427,7 +427,7 @@
 static int tcp_do_twkill_work(int slot, unsigned int quota)
 {
 	struct tcp_tw_bucket *tw;
-	struct hlist_node *node, *safe;
+	struct hlist_node *node;
 	unsigned int killed;
 	int ret;
 
@@ -439,8 +439,8 @@
 	 */
 	killed = 0;
 	ret = 0;
-	tw_for_each_inmate(tw, node, safe,
-			   &tcp_tw_death_row[slot]) {
+rescan:
+	tw_for_each_inmate(tw, node, &tcp_tw_death_row[slot]) {
 		__tw_del_dead_node(tw);
 		spin_unlock(&tw_death_lock);
 		tcp_timewait_kill(tw);
@@ -451,6 +451,14 @@
 			ret = 1;
 			break;
 		}
+
+		/* While we dropped tw_death_lock, another cpu may have
+		 * killed off the next TW bucket in the list, therefore
+		 * do a fresh re-read of the hlist head node with the
+		 * lock reacquired.  We still use the hlist traversal
+		 * macro in order to get the prefetches.
+		 */
+		goto rescan;
 	}
 
 	tcp_tw_count -= killed;
@@ -637,7 +645,7 @@
 			struct hlist_node *node, *safe;
 			struct tcp_tw_bucket *tw;
 
-			tw_for_each_inmate(tw, node, safe,
+			tw_for_each_inmate_safe(tw, node, safe,
 					   &tcp_twcal_row[slot]) {
 				__tw_del_dead_node(tw);
 				tcp_timewait_kill(tw);

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2004-02-27 22:34 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-02-27 22:09 [PATCH] NULL pointer deref in tcp_do_twkill_work() olof
2004-02-27 22:34 ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).