[patch] net: avoid race between netpoll and network fast path

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [patch] net: avoid race between netpoll and network fast path
@ 2007-10-17  3:45 Tina Yang
  2007-10-17  4:06 ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Tina Yang @ 2007-10-17  3:45 UTC (permalink / raw)
  To: Matt Mackall; +Cc: netdev

The current netpoll design and implementation has serveral race issues with the
network fast path that panics/hangs the system or causes interface timeout/reset
but the fix is likely to have impact on the overall system performance and could
involve a large number of drivers.  The proposal is to disable the problem code
for normal operations but only to enable it at the time of crash in case polling
is necessary.  Tests that have been done included the bug fix verification
as well as regression check on the netlog results in various crash modes.

Signed-off-by: Tina Yang <tina.yang@oracle.com>

---

--- linux-2.6.23.1/include/linux/kernel.h.orig	2007-10-12 09:43:44.000000000 -0700
+++ linux-2.6.23.1/include/linux/kernel.h	2007-10-15 22:05:27.000000000 -0700
@@ -184,6 +184,8 @@ static inline void console_verbose(void)
 		console_loglevel = 15;
 }

+extern int netpoll_enable;
+
 extern void bust_spinlocks(int yes);
 extern void wake_up_klogd(void);
 extern int oops_in_progress;		/* If set, an oops, panic(), BUG() or die() is in progress */
--- linux-2.6.23.1/net/core/netpoll.c.orig	2007-10-12 09:43:44.000000000 -0700
+++ linux-2.6.23.1/net/core/netpoll.c	2007-10-15 22:20:15.000000000 -0700
@@ -150,15 +150,19 @@ static void service_arp_queue(struct net
 	}
 }

+int netpoll_enable;
+EXPORT_SYMBOL(netpoll_enable);
 void netpoll_poll(struct netpoll *np)
 {
 	if (!np->dev || !netif_running(np->dev) || !np->dev->poll_controller)
 		return;

 	/* Process pending work on NIC */
-	np->dev->poll_controller(np->dev);
-	if (np->dev->poll)
-		poll_napi(np);
+	if (unlikely(netpoll_enable)) {
+		np->dev->poll_controller(np->dev);
+		if (np->dev->poll)
+			poll_napi(np);
+	}

 	service_arp_queue(np->dev->npinfo);

--- linux-2.6.23.1/kernel/panic.c.orig	2007-10-12 09:43:44.000000000 -0700
+++ linux-2.6.23.1/kernel/panic.c	2007-10-15 22:07:25.000000000 -0700
@@ -66,6 +66,7 @@ NORET_TYPE void panic(const char * fmt,
         unsigned long caller = (unsigned long) __builtin_return_address(0);
 #endif

+	netpoll_enable = 1;
 	/*
 	 * It's possible to come here directly from a panic-assertion and not
 	 * have preempt disabled. Some functions called from here want
--- linux-2.6.23.1/arch/x86_64/kernel/traps.c.orig	2007-10-12 09:43:44.000000000 -0700
+++ linux-2.6.23.1/arch/x86_64/kernel/traps.c	2007-10-15 22:06:29.000000000 -0700
@@ -522,6 +522,8 @@ void __kprobes __die(const char * str, s
 #endif
 	printk("\n");
 	notify_die(DIE_OOPS, str, regs, err, current->thread.trap_no, SIGSEGV);
+	if (kexec_should_crash(current))
+		netpoll_enable = 1;
 	show_registers(regs);
 	add_taint(TAINT_DIE);
 	/* Executive summary in case the oops scrolled away */
--- linux-2.6.23.1/arch/i386/kernel/traps.c.orig	2007-10-12 09:43:44.000000000 -0700
+++ linux-2.6.23.1/arch/i386/kernel/traps.c	2007-10-15 22:06:14.000000000 -0700
@@ -428,6 +428,8 @@ void die(const char * str, struct pt_reg
 		if (notify_die(DIE_OOPS, str, regs, err,
 					current->thread.trap_no, SIGSEGV) !=
 				NOTIFY_STOP) {
+			if (kexec_should_crash(current))
+				netpoll_enable = 1;
 			show_registers(regs);
 			/* Executive summary in case the oops scrolled away */
 			esp = (unsigned long) (&regs->esp);

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [patch] net: avoid race between netpoll and network fast path
  2007-10-17  3:45 [patch] net: avoid race between netpoll and network fast path Tina Yang
@ 2007-10-17  4:06 ` David Miller
  2007-10-17  5:46   ` Tina Yang
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2007-10-17  4:06 UTC (permalink / raw)
  To: tina.yang; +Cc: mpm, netdev

From: Tina Yang <tina.yang@oracle.com>
Date: Tue, 16 Oct 2007 20:45:04 -0700

> The current netpoll design and implementation has serveral race issues with the
> network fast path that panics/hangs the system or causes interface timeout/reset
> but the fix is likely to have impact on the overall system performance and could
> involve a large number of drivers.  The proposal is to disable the problem code
> for normal operations but only to enable it at the time of crash in case polling
> is necessary.  Tests that have been done included the bug fix verification
> as well as regression check on the netlog results in various crash modes.
> 
> Signed-off-by: Tina Yang <tina.yang@oracle.com>

This is at best a kludge, and it's the wrong way to approach this problem.

Fix the bug, and fix it right.

If you disable that stretch of code, what you've done is make the
netpoll code hang and/or drop console messages if the TX queue is full
in the driver and the only way to liberate TX space is to call into
->poll().

You haven't shown the precise race that leads to corruption so that someone
so motivated can guide you towards a more correct fix if you are not
capable of implementing it properly on your own.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [patch] net: avoid race between netpoll and network fast path
  2007-10-17  4:06 ` David Miller
@ 2007-10-17  5:46   ` Tina Yang
  2007-10-30  4:26     ` David Miller
  0 siblings, 1 reply; 5+ messages in thread
From: Tina Yang @ 2007-10-17  5:46 UTC (permalink / raw)
  To: David Miller; +Cc: mpm, netdev

David Miller wrote:
> From: Tina Yang <tina.yang@oracle.com>
> Date: Tue, 16 Oct 2007 20:45:04 -0700
> 
>> The current netpoll design and implementation has serveral race issues with the
>> network fast path that panics/hangs the system or causes interface timeout/reset
>> but the fix is likely to have impact on the overall system performance and could
>> involve a large number of drivers.  The proposal is to disable the problem code
>> for normal operations but only to enable it at the time of crash in case polling
>> is necessary.  Tests that have been done included the bug fix verification
>> as well as regression check on the netlog results in various crash modes.
>>
>> Signed-off-by: Tina Yang <tina.yang@oracle.com>
> 
> This is at best a kludge, and it's the wrong way to approach this problem.
> 
> Fix the bug, and fix it right.
> 
	
> If you disable that stretch of code, what you've done is make the
> netpoll code hang and/or drop console messages if the TX queue is full
> in the driver and the only way to liberate TX space is to call into
> ->poll().


	Isn't net_rx_action() calling ->poll() to free the TX space ?
	TX queue full can only be emptied when the device is done transmitting
	not because of netpoll ->poll() it.  The softirq (net_rx_action)
	is the purpose for such an event.  Netconsole messages will be
	dropped if the device can't keep up with it regardless of netpoll
	->poll() or not.  If no dropping can be tolerated, then the
	netpoll upper layer probably should be redesigned to buffer the data.

	The poll_list currently is in a per_cpu structure, not being
	protected globally that netpoll thread from any cpu can
	trash it.

> 
> You haven't shown the precise race that leads to corruption so that someone
> so motivated can guide you towards a more correct fix if you are not
> capable of implementing it properly on your own.


	The precise race is
	1) net_rx_action get the dev from poll_list
	2) at the same time, netpoll poll_napi() get a hold of the poll lock
	   and calls ->poll(), remove dev from the poll list
	3) after it finishes, net_rx_action get the poll lock, and calls
	   ->poll() the second time, and panic when trying to remove (again)
	   the dev from the poll list.
	and I had logged all the crash info from the crash scenes into the
	bug database.

	As Matt Mackall had acknowledged, the network fast path went to great
	length to reduce locking overhead, should that be undone because of
	netpoll if that's what it takes to fix it more correctly ?

	

	


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [patch] net: avoid race between netpoll and network fast path
  2007-10-17  5:46   ` Tina Yang
@ 2007-10-30  4:26     ` David Miller
  2007-10-30  5:08       ` Matt Mackall
  0 siblings, 1 reply; 5+ messages in thread
From: David Miller @ 2007-10-30  4:26 UTC (permalink / raw)
  To: tina.yang; +Cc: mpm, netdev

From: Tina Yang <tina.yang@oracle.com>
Date: Tue, 16 Oct 2007 22:46:30 -0700

> 	The precise race is
> 	1) net_rx_action get the dev from poll_list
> 	2) at the same time, netpoll poll_napi() get a hold of the poll lock
> 	   and calls ->poll(), remove dev from the poll list
> 	3) after it finishes, net_rx_action get the poll lock, and calls
> 	   ->poll() the second time, and panic when trying to remove (again)
> 	   the dev from the poll list.

This is trivial to fix.

I'll check the following into 2.6.14 and backport it to
the -stable trees.

[NET]: Fix race between poll_napi() and net_rx_action()

netpoll_poll_lock() synchronizes the ->poll() invocation
code paths, but once we have the lock we have to make
sure that NAPI_STATE_SCHED is still set.  Otherwise we
get:

	cpu 0			cpu 1

	net_rx_action()		poll_napi()
	netpoll_poll_lock()	... spin on ->poll_lock
	->poll()
	  netif_rx_complete
	netpoll_poll_unlock()	acquire ->poll_lock()
				->poll()
				 netif_rx_complete()
				 CRASH

Based upon a bug report from Tina Yang.

Signed-off-by: David S. Miller <davem@davemloft.net>

diff --git a/net/core/dev.c b/net/core/dev.c
index 853c8b5..02e7d83 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2172,7 +2172,15 @@ static void net_rx_action(struct softirq_action *h)
 
 		weight = n->weight;
 
-		work = n->poll(n, weight);
+		/* This NAPI_STATE_SCHED test is for avoiding a race
+		 * with netpoll's poll_napi().  Only the entity which
+		 * obtains the lock and sees NAPI_STATE_SCHED set will
+		 * actually make the ->poll() call.  Therefore we avoid
+		 * accidently calling ->poll() when NAPI is not scheduled.
+		 */
+		work = 0;
+		if (test_bit(NAPI_STATE_SCHED, &n->state))
+			work = n->poll(n, weight);
 
 		WARN_ON_ONCE(work > weight);
 
diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index bf8d18f..c499b5c 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -116,6 +116,29 @@ static __sum16 checksum_udp(struct sk_buff *skb, struct udphdr *uh,
  * network adapter, forcing superfluous retries and possibly timeouts.
  * Thus, we set our budget to greater than 1.
  */
+static int poll_one_napi(struct netpoll_info *npinfo,
+			 struct napi_struct *napi, int budget)
+{
+	int work;
+
+	/* net_rx_action's ->poll() invocations and our's are
+	 * synchronized by this test which is only made while
+	 * holding the napi->poll_lock.
+	 */
+	if (!test_bit(NAPI_STATE_SCHED, &napi->state))
+		return budget;
+
+	npinfo->rx_flags |= NETPOLL_RX_DROP;
+	atomic_inc(&trapped);
+
+	work = napi->poll(napi, budget);
+
+	atomic_dec(&trapped);
+	npinfo->rx_flags &= ~NETPOLL_RX_DROP;
+
+	return budget - work;
+}
+
 static void poll_napi(struct netpoll *np)
 {
 	struct netpoll_info *npinfo = np->dev->npinfo;
@@ -123,17 +146,13 @@ static void poll_napi(struct netpoll *np)
 	int budget = 16;
 
 	list_for_each_entry(napi, &np->dev->napi_list, dev_list) {
-		if (test_bit(NAPI_STATE_SCHED, &napi->state) &&
-		    napi->poll_owner != smp_processor_id() &&
+		if (napi->poll_owner != smp_processor_id() &&
 		    spin_trylock(&napi->poll_lock)) {
-			npinfo->rx_flags |= NETPOLL_RX_DROP;
-			atomic_inc(&trapped);
-
-			napi->poll(napi, budget);
-
-			atomic_dec(&trapped);
-			npinfo->rx_flags &= ~NETPOLL_RX_DROP;
+			budget = poll_one_napi(npinfo, napi, budget);
 			spin_unlock(&napi->poll_lock);
+
+			if (!budget)
+				break;
 		}
 	}
 }


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [patch] net: avoid race between netpoll and network fast path
  2007-10-30  4:26     ` David Miller
@ 2007-10-30  5:08       ` Matt Mackall
  0 siblings, 0 replies; 5+ messages in thread
From: Matt Mackall @ 2007-10-30  5:08 UTC (permalink / raw)
  To: David Miller; +Cc: tina.yang, netdev

On Mon, Oct 29, 2007 at 09:26:11PM -0700, David Miller wrote:
> From: Tina Yang <tina.yang@oracle.com>
> Date: Tue, 16 Oct 2007 22:46:30 -0700
> 
> > 	The precise race is
> > 	1) net_rx_action get the dev from poll_list
> > 	2) at the same time, netpoll poll_napi() get a hold of the poll lock
> > 	   and calls ->poll(), remove dev from the poll list
> > 	3) after it finishes, net_rx_action get the poll lock, and calls
> > 	   ->poll() the second time, and panic when trying to remove (again)
> > 	   the dev from the poll list.
> 
> This is trivial to fix.
> 
> I'll check the following into 2.6.14 and backport it to
> the -stable trees.
> 
> [NET]: Fix race between poll_napi() and net_rx_action()
> 
> netpoll_poll_lock() synchronizes the ->poll() invocation
> code paths, but once we have the lock we have to make
> sure that NAPI_STATE_SCHED is still set.  Otherwise we
> get:
> 
> 	cpu 0			cpu 1
> 
> 	net_rx_action()		poll_napi()
> 	netpoll_poll_lock()	... spin on ->poll_lock
> 	->poll()
> 	  netif_rx_complete
> 	netpoll_poll_unlock()	acquire ->poll_lock()
> 				->poll()
> 				 netif_rx_complete()
> 				 CRASH
> 
> Based upon a bug report from Tina Yang.
> 
> Signed-off-by: David S. Miller <davem@davemloft.net>

Thanks, Dave.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-10-30  5:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-17  3:45 [patch] net: avoid race between netpoll and network fast path Tina Yang
2007-10-17  4:06 ` David Miller
2007-10-17  5:46   ` Tina Yang
2007-10-30  4:26     ` David Miller
2007-10-30  5:08       ` Matt Mackall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).