netdev.vger.kernel.org archive mirror
From: Tina Yang <tina.yang@oracle.com>
To: David Miller <davem@davemloft.net>
Cc: mpm@selenic.com, netdev@vger.kernel.org
Subject: Re: [patch] net: avoid race between netpoll and network fast path
Date: Tue, 16 Oct 2007 22:46:30 -0700
Message-ID: <4715A1B6.5020300@oracle.com>
In-Reply-To: <20071016.210613.71104656.davem@davemloft.net>

David Miller wrote:
> From: Tina Yang <tina.yang@oracle.com>
> Date: Tue, 16 Oct 2007 20:45:04 -0700
> 
>> The current netpoll design and implementation has several race issues with the
>> network fast path that panics/hangs the system or causes interface timeout/reset
>> but the fix is likely to have impact on the overall system performance and could
>> involve a large number of drivers.  The proposal is to disable the problem code
>> for normal operations but only to enable it at the time of crash in case polling
>> is necessary.  Tests that have been done included the bug fix verification
>> as well as regression check on the netlog results in various crash modes.
>>
>> Signed-off-by: Tina Yang <tina.yang@oracle.com>
> 
> This is at best a kludge, and it's the wrong way to approach this problem.
> 
> Fix the bug, and fix it right.
> 
	
> If you disable that stretch of code, what you've done is make the
> netpoll code hang and/or drop console messages if the TX queue is full
> in the driver and the only way to liberate TX space is to call into
> ->poll().


	Isn't net_rx_action() already calling ->poll() to free the TX space?
	A full TX queue can only drain once the device has finished
	transmitting, not because netpoll calls ->poll() on it.  Handling
	that event is what the softirq (net_rx_action) is for.  Netconsole
	messages will be dropped whenever the device can't keep up,
	whether or not netpoll calls ->poll().  If no dropping can be
	tolerated, then the netpoll upper layer should probably be
	redesigned to buffer the data.

	The poll_list currently lives in a per-CPU structure with no
	global protection, so a netpoll thread running on any CPU can
	corrupt it.

> 
> You haven't shown the precise race that leads to corruption so that someone
> so motivated can guide you towards a more correct fix if you are not
> capable of implementing it properly on your own.


	The precise race is:
	1) net_rx_action gets the dev from the poll_list.
	2) At the same time, netpoll's poll_napi() takes the poll lock,
	   calls ->poll(), and the dev is removed from the poll list.
	3) After that finishes, net_rx_action takes the poll lock and calls
	   ->poll() a second time, which panics when it tries to remove
	   the dev from the poll list again.
	I have logged all the crash info from the crash scenes in the
	bug database.
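
	To make the interleaving concrete, here is a minimal userspace
	sketch that replays the three steps in order.  It is a
	single-threaded model, not kernel code: struct fake_dev,
	fake_poll() and list_del_checked() are hypothetical stand-ins,
	and the locking is implied by the ordering of the calls.  Where
	the model returns false, the kernel would, roughly, fault on
	the pointers of an entry that has already been unlinked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal doubly linked list, loosely modeled on the kernel's list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink n; report failure instead of crashing if n is already unlinked. */
static bool list_del_checked(struct list_node *n)
{
    if (n->prev == NULL || n->next == NULL)
        return false;          /* second removal: the kernel would fault here */
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = NULL;  /* mark the entry as unlinked */
    return true;
}

/* Hypothetical stand-in for a net device queued on a per-CPU poll list. */
struct fake_dev {
    struct list_node poll_entry;  /* first member, so the cast below is valid */
};

/* Stand-in for a driver ->poll() that completes and dequeues the device. */
static bool fake_poll(struct fake_dev *dev)
{
    return list_del_checked(&dev->poll_entry);
}

/* Replays steps 1-3; returns true if the second removal is the one that fails. */
static bool replay_race(void)
{
    struct list_node poll_list;
    struct fake_dev dev;

    list_init(&poll_list);
    list_add_tail(&dev.poll_entry, &poll_list);

    /* 1) net_rx_action reads the first entry off the poll list. */
    struct fake_dev *seen_by_softirq = (struct fake_dev *)poll_list.next;

    /* 2) netpoll wins the poll lock and polls; the device is dequeued. */
    bool first_ok = fake_poll(&dev);

    /* 3) net_rx_action gets the lock and polls through its stale pointer. */
    bool second_ok = fake_poll(seen_by_softirq);

    return first_ok && !second_ok;
}
```

	Calling replay_race() from a small main() returns true, i.e. the
	first ->poll() dequeues the device and the second one finds it
	already gone.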

	As Matt Mackall has acknowledged, the network fast path went to great
	lengths to reduce locking overhead.  Should that be undone because of
	netpoll, if that's what it takes to fix this more correctly?
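
	For the sake of discussion, one possible shape of a fix that adds
	no new lock is sketched below: since both paths already take the
	poll lock, whoever holds it re-checks a per-device "scheduled"
	flag before calling ->poll().  This is only an assumption about
	what a narrower fix could look like, not the patch under
	discussion; struct fake_dev, its fields and poll_if_scheduled()
	are hypothetical names, and the lock itself is elided in this
	single-threaded model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; the names are illustrative, not kernel API. */
struct fake_dev {
    bool scheduled;   /* set when queued for polling, cleared on completion */
    int  poll_calls;  /* how many times the ->poll() stand-in actually ran */
};

/* Stand-in for ->poll(): does the work and marks the device complete. */
static void fake_poll(struct fake_dev *dev)
{
    dev->poll_calls++;
    dev->scheduled = false;
}

/*
 * Both the softirq path and the netpoll path would call this with the
 * poll lock held.  Only the caller that still sees the flag set polls.
 */
static bool poll_if_scheduled(struct fake_dev *dev)
{
    if (!dev->scheduled)
        return false;  /* the other path already polled; nothing to do */
    fake_poll(dev);
    return true;
}

/* Replays the same ordering as the race; returns the number of polls run. */
static int replay_with_recheck(void)
{
    struct fake_dev dev = { .scheduled = true, .poll_calls = 0 };

    bool netpoll_ran = poll_if_scheduled(&dev);  /* netpoll wins the lock */
    bool softirq_ran = poll_if_scheduled(&dev);  /* softirq arrives second */

    assert(netpoll_ran && !softirq_ran);
    return dev.poll_calls;
}
```

	In this model replay_with_recheck() returns 1: the second caller
	skips ->poll() instead of dequeuing the device twice.  The cost on
	the fast path is one flag test under a lock it already holds.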

	

	



Thread overview: 5+ messages
2007-10-17  3:45 [patch] net: avoid race between netpoll and network fast path Tina Yang
2007-10-17  4:06 ` David Miller
2007-10-17  5:46   ` Tina Yang [this message]
2007-10-30  4:26     ` David Miller
2007-10-30  5:08       ` Matt Mackall
