From: Mike McCormack <mikem@ring3k.org>
To: Jarek Poplawski <jarkao2@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>,
netdev@vger.kernel.org, flyboy@gmail.com, dhazelton@enter.net,
mbreuer@majjas.com
Subject: Re: [PATCH] sky2: Lock transmit queue while disabling device
Date: Fri, 01 Jan 2010 08:51:23 +0900
Message-ID: <4B3D38FB.40105@ring3k.org>
In-Reply-To: <4B3CF2C4.5070203@gmail.com>
Hi Jarek,
This is based on my analysis of the oops at:
http://bugzilla.kernel.org/show_bug.cgi?id=14925
Specifically:
>>> [ 8673.345873] sky2 eth0: receiver hang detected
>>> [ 8673.350368] sky2 eth0: disabling interface
>>> [ 8673.354749] BUG: unable to handle kernel NULL pointer dereference at
>>> 0000000000000010
>>> [ 8673.359748] IP: [<ffffffffa00373d3>] sky2_xmit_frame+0x321/0x5d8
>>> [sky2]
netif_device_detach() does not guarantee that all transmits have completed
after it returns.
CPU 1 stack will look like:
  dev_queue_xmit()
    HARD_TX_LOCK() -> __netif_tx_lock()
    ...
    dev_hard_start_xmit()
      ops->ndo_start_xmit() -> sky2_xmit_frame()
        sky2_xmit_frame() pushing skb to hardware
        use NULL tx_ring here
CPU 2 stack will look like:
  sky2_restart()
    rtnl_lock()
    sky2_detach()
      netif_device_detach()
      sky2_down()
        printk("sky2 eth0: disabling interface")
        ...
        sky2_free_buffers(sky2);
          sky2->tx_ring = NULL;
        ...
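To make the window in the two stacks above concrete, here is a small user-space replay of that interleaving. It is only a sketch: struct model_port and every function in it are hypothetical stand-ins rather than the real sky2 or netdev code, and the "CPUs" are just function calls made in the problematic order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver state; not the real sky2 structures. */
struct model_port {
	bool queue_stopped;	/* set by the detach step */
	int *tx_ring;		/* NULL once the buffers are freed */
};

static int ring_storage[4];

/* CPU 1, first half: the stack checks the queue state before calling the driver. */
static bool cpu1_queue_check(const struct model_port *p)
{
	return !p->queue_stopped;
}

/* CPU 1, second half: the driver touches the ring.
 * Returns false at the point where the real driver would oops. */
static bool cpu1_xmit(struct model_port *p)
{
	if (p->tx_ring == NULL)
		return false;
	p->tx_ring[0] = 1;
	return true;
}

/* CPU 2: detach, then free the ring, without waiting for CPU 1. */
static void cpu2_restart(struct model_port *p)
{
	p->queue_stopped = true;	/* models netif_device_detach() */
	p->tx_ring = NULL;		/* models sky2_free_buffers() */
}

/* Replays the interleaving; returns true if the transmit saw a NULL ring. */
static bool replay_race(void)
{
	struct model_port p = { .queue_stopped = false, .tx_ring = ring_storage };
	bool passed_check = cpu1_queue_check(&p);	/* CPU 1 passes the check */

	cpu2_restart(&p);				/* CPU 2 runs to completion */
	return passed_check && !cpu1_xmit(&p);		/* CPU 1 resumes too late */
}
```

In this model replay_race() reports that the transmit reached the ring after it was cleared, which is the situation I believe the oops shows.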
Another way to solve the problem would be to take the transmit lock in
netif_device_detach() to make sure that any in-progress transmits have
completed before it returns.
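As a rough illustration of why holding the transmit lock across the detach should close the window, here is a user-space model. Everything in it is an assumption made for illustration: struct locked_port and the helpers are hypothetical, and the lock is a plain flag where a failed trylock stands for "would have to wait".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a port guarded by a transmit lock. */
struct locked_port {
	bool tx_locked;		/* models the transmit lock */
	bool queue_stopped;	/* set by the detach step */
	int *tx_ring;		/* NULL once the buffers are freed */
};

static int locked_ring[4];

/* Take the modeled lock; a real spinlock would spin instead of failing. */
static bool model_trylock(struct locked_port *p)
{
	if (p->tx_locked)
		return false;
	p->tx_locked = true;
	return true;
}

static void model_unlock(struct locked_port *p)
{
	p->tx_locked = false;
}

/* Detach under the lock: cannot proceed while a transmit holds it. */
static bool detach_locked(struct locked_port *p)
{
	if (!model_trylock(p))
		return false;		/* transmit in progress, must wait */
	p->queue_stopped = true;
	model_unlock(p);
	return true;
}

/* Transmit, first half: take the lock, then check the queue under it. */
static bool xmit_begin(struct locked_port *p)
{
	if (!model_trylock(p))
		return false;
	if (p->queue_stopped) {
		model_unlock(p);
		return false;
	}
	return true;
}

/* Transmit, second half: touch the ring, then drop the lock. */
static bool xmit_finish(struct locked_port *p)
{
	bool ok = p->tx_ring != NULL;

	if (ok)
		p->tx_ring[0] = 1;
	model_unlock(p);
	return ok;
}

/* Returns true if no transmit ever reached a NULL ring. */
static bool replay_with_lock(void)
{
	struct locked_port p = { false, false, locked_ring };
	bool started = xmit_begin(&p);			/* CPU 1 holds the lock */
	bool detached_early = detach_locked(&p);	/* CPU 2 has to wait */
	bool xmit_ok = xmit_finish(&p);			/* CPU 1 finishes, ring valid */
	bool detached = detach_locked(&p);		/* CPU 2 now gets the lock */

	p.tx_ring = NULL;				/* ring freed after detach */
	bool late = xmit_begin(&p);			/* later transmit is refused */

	return started && !detached_early && xmit_ok && detached && !late;
}
```

The model only holds because the transmit side checks the stopped state while it holds the same lock; whether every path into the real sky2_xmit_frame() does so is something I have not verified here.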
Note that most of these backtraces are using the nvidia binary-only
module. This may change the timings and make the sky2 race more likely,
or be involved in the "tx timeout" condition that triggers a sky2_restart().
Will test with netif_tx_lock_bh and resubmit.
thanks,
Mike
Jarek Poplawski wrote:
> Mike McCormack wrote, On 12/31/2009 11:55 AM:
>
>> netif_device_detach() does not take the tx_lock, so it's
>> possible that a call to sky2_xmit_frame is still in
>> progress after netif_device_detach() is complete.
>>
>> Take netif_tx_lock() to make sure all transmits have
>> stopped while we're disabling the device and that
>> no other CPU is still transmitting a frame after
>> we've disabled the device.
>>
>> Proposed fix for "sky2 panic under load" reported by Berck E. Nash.
>
> Could you give some scenario of the oops/fix?
> Btw, even if it worked, you should use netif_tx_lock_bh
> version considering sky2_detach use contexts, I guess.
>
> Jarek P.
>
>> Signed-off-by: Mike McCormack <mikem@ring3k.org>
>> ---
>> drivers/net/sky2.c | 2 ++
>> 1 files changed, 2 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
>> index faa4841..8ae8520 100644
>> --- a/drivers/net/sky2.c
>> +++ b/drivers/net/sky2.c
>> @@ -3176,7 +3176,9 @@ static void sky2_reset(struct sky2_hw *hw)
>>  static void sky2_detach(struct net_device *dev)
>>  {
>>  	if (netif_running(dev)) {
>> +		netif_tx_lock(dev);
>>  		netif_device_detach(dev);	/* stop txq */
>> +		netif_tx_unlock(dev);
>>  		sky2_down(dev);
>>  	}
>>  }
>
>
Thread overview: 20+ messages
2009-12-31 10:55 [PATCH] sky2: Lock transmit queue while disabling device Mike McCormack
2009-12-31 15:58 ` Michael Breuer
2009-12-31 16:15 ` Daniel Hazelton
2009-12-31 16:33 ` Berck Nash
2009-12-31 18:51 ` Jarek Poplawski
2009-12-31 23:51 ` Mike McCormack [this message]
2010-01-01 3:06 ` Berck E. Nash
2010-01-01 6:42 ` Stephen Hemminger
2010-01-01 18:31 ` Jarek Poplawski
2010-01-04 2:44 ` Berck E. Nash
2010-01-04 13:49 ` [PATCH] sky2: Fix oops in sky2_xmit_frame() after TX timeout Jarek Poplawski
2010-01-04 18:26 ` Stephen Hemminger
2010-01-04 18:48 ` [PATCH v2] " Jarek Poplawski
2010-01-07 4:27 ` [PATCH] sky2: Lock transmit queue while disabling device David Miller
2010-01-07 6:35 ` Jarek Poplawski
2010-01-07 8:01 ` David Miller
2010-01-07 8:15 ` Jarek Poplawski
2010-01-07 8:19 ` David Miller
2010-01-07 13:48 ` Stephen Hemminger
2010-01-07 18:08 ` Stephen Hemminger