netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Berck E. Nash" <flyboy@gmail.com>
To: Mike McCormack <mikem@ring3k.org>
Cc: Jarek Poplawski <jarkao2@gmail.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	netdev@vger.kernel.org, dhazelton@enter.net, mbreuer@majjas.com
Subject: Re: [PATCH] sky2: Lock transmit queue while disabling device
Date: Thu, 31 Dec 2009 20:06:18 -0700	[thread overview]
Message-ID: <4B3D66AA.3030709@gmail.com> (raw)
In-Reply-To: <4B3D38FB.40105@ring3k.org>

[-- Attachment #1: Type: text/plain, Size: 3059 bytes --]

Well, that didn't fix it.  Oops attached, looks pretty much the same to me.

Mike McCormack wrote:
> Hi Jarek,
> 
> This is based on my analysis of the oops at:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=14925
> 
> Specifically:
> 
>>>> [ 8673.345873] sky2 eth0: receiver hang detected
>>>> [ 8673.350368] sky2 eth0: disabling interface
>>>> [ 8673.354749] BUG: unable to handle kernel NULL pointer dereference at
>>>> 0000000000000010
>>>> [ 8673.359748] IP: [<ffffffffa00373d3>] sky2_xmit_frame+0x321/0x5d8 
>>>> [sky2]
> 
> netif_device_detach() does not guarantee that all transmits have completed 
> after it returns.
> 
> CPU 1 stack will look like:
> 
>   dev_queue_xmit()
>      HARD_TX_LOCK() -> __netif_tx_lock()
>      ...
>      dev_hard_start_xmit()
>         ops->ndo_start_xmit()  -> sky2_xmit_frame()
>         sky2_xmit_frame() pushing skb to hardware
>           use NULL tx_ring here
> 
> 
> CPU 2 stack will look like:
>            
>   sky2_restart()
>      rtnl_lock()
>      sky2_detach()
>         netif_device_detach()
>         sky2_down()
>           printk("sky2 eth0: disabling interface")
>           ...
>           sky2_free_buffers(sky2);
>             sky2->tx_ring = NULL;
>           ...
> 
> Another way to solve the problem would be to take the transmit lock in 
> netif_device_detach() to make sure that any in progress transmits have
> completed before returning.
> 
> Note that most of these backtraces are using the nvidia binary only 
> module.  This may change the timings and make the sky2 race more likely,
> or be involved in the "tx timeout" condition that triggers a sky2_restart().
> 
> Will test with netif_tx_lock_bh and resubmit.
> 
> thanks,
> 
> Mike
>      
>   
>    
> 
> Jarek Poplawski wrote:
>> Mike McCormack wrote, On 12/31/2009 11:55 AM:
>>
>>> netif_device_detach() does not take the tx_lock, so it's
>>>  possible that a call to sky2_xmit_frame is still in
>>>  progress after netif_device_detach() is complete.
>>>
>>> Take netif_tx_lock() to make sure all transmits have
>>>  stopped while we're disabling the devices and that
>>>  no other CPU is still transmitting a frame after
>>>  we've disabling the device.
>>>
>>> Proposed fix for "sky2 panic under load" reported by Berck E. Nash.
>> Could you give some scenario of the oops/fix?
>> Btw, even if it worked, you should use netif_tx_lock_bh
>> version considering sky2_detach use contexts, I guess.
>>
>> Jarek P.
>>
>>> Signed-off-by: Mike McCormack <mikem@ring3k.org>
>>> ---
>>>  drivers/net/sky2.c |    2 ++
>>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
>>> index faa4841..8ae8520 100644
>>> --- a/drivers/net/sky2.c
>>> +++ b/drivers/net/sky2.c
>>> @@ -3176,7 +3176,9 @@ static void sky2_reset(struct sky2_hw *hw)
>>>  static void sky2_detach(struct net_device *dev)
>>>  {
>>>  	if (netif_running(dev)) {
>>> +		netif_tx_lock(dev);
>>>  		netif_device_detach(dev);	/* stop txq */
>>> +		netif_tx_unlock(dev);
>>>  		sky2_down(dev);
>>>  	}
>>>  }
>>
> 


[-- Attachment #2: sky2crash2.txt --]
[-- Type: text/plain, Size: 5379 bytes --]

[ 5768.704033] sky2 eth0: receiver hang detected
[ 5768.708579] sky2 eth0: disabling interface
[ 5768.712928] BUG: unable to handle kernel NULL pointer dereference at 0000000000000ad0
[ 5768.717776] IP: [<ffffffffa003d46f>] sky2_xmit_frame+0x321/0x5d8 [sky2]
[ 5768.726935] PGD beaa3067 PUD ba837067 PMD 0 
[ 5768.731121] Oops: 0002 [#1] SMP 
[ 5768.731121] last sysfs file: /sys/devices/platform/coretemp.0/temp1_label
[ 5768.740188] CPU 0 
[ 5768.742247] Modules linked in: nvidia(P) nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc nls_cp437 msdos fat kvm_intel kvm fuse snd_rtctimer usbhid hwmon_vid tuner_simple tuner_types wm8775 snd_hda_codec_realtek tda9887 tda8290 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm tuner snd_seq_dummy cx25840 ivtv i2c_algo_bit cx2341x snd_seq_oss uhci_hcd snd_seq_midi_event ehci_hcd v4l2_common i2c_i801 snd_seq videodev snd_timer v4l1_compat v4l2_compat_ioctl32 snd_seq_device tveeprom snd floppy sky2 usbcore soundcore snd_page_alloc [last unloaded: nvidia]
[ 5768.794811] Pid: 4, comm: ksoftirqd/0 Tainted: P           2.6.32.2 #9 P5W DH Deluxe
[ 5768.801019] RIP: 0010:[<ffffffffa003d46f>]  [<ffffffffa003d46f>] sky2_xmit_frame+0x321/0x5d8 [sky2]
[ 5768.808600] RSP: 0018:ffff880001603df8  EFLAGS: 00010206
[ 5768.817679] RAX: 00000000000002b0 RBX: ffff8800bd184540 RCX: 0000000000000ac0
[ 5768.822147] RDX: 0000000000000000 RSI: 000000000000008c RDI: 0000000000000ac0
[ 5768.831325] RBP: ffff880001603e48 R08: 0000000000000001 R09: 0000000000000000
[ 5768.835840] R10: 000000000000001e R11: 0000000000000d7f R12: ffff880006a40ec8
[ 5768.844917] R13: ffff8800be922e00 R14: 0000000000560056 R15: 000000009553807e
[ 5768.853995] FS:  0000000000000000(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
[ 5768.859584] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 5768.867708] CR2: 0000000000000ad0 CR3: 00000000ba8d8000 CR4: 00000000000026f0
[ 5768.872155] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5768.881235] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5768.888900] Process ksoftirqd/0 (pid: 4, threadinfo ffff8800bf8b4000, task ffff8800bf8a8650)
[ 5768.895027] Stack:
[ 5768.899363]  ffff88004c485ec0 ffff88009553807e ffff8800bd184000 0000004281229811
[ 5768.904093] <0> ffff880001603e48 ffff880006a40ec8 ffff88004c485ec0 ffffffff813ead30
[ 5768.913174] <0> ffff8800bd184000 ffff8800beb1dec0 ffff880001603e98 ffffffff81230fa0
[ 5768.922327] Call Trace:
[ 5768.922327]  <IRQ> 
[ 5768.926596]  [<ffffffff81230fa0>] dev_hard_start_xmit+0x21c/0x2b7
[ 5768.931393]  [<ffffffff8123f422>] sch_direct_xmit+0x5e/0x154
[ 5768.935678]  [<ffffffff8123f5d4>] __qdisc_run+0xbc/0xd5
[ 5768.944755]  [<ffffffff8122ecd1>] net_tx_action+0xbb/0x10e
[ 5768.949643]  [<ffffffff8103d292>] __do_softirq+0x91/0x11b
[ 5768.956054]  [<ffffffff8100be9c>] call_softirq+0x1c/0x28
[ 5768.958733]  <EOI> 
[ 5768.963566]  [<ffffffff8100d907>] do_softirq+0x33/0x6b
[ 5768.967800]  [<ffffffff8103cd9a>] ksoftirqd+0x60/0xd7
[ 5768.972644]  [<ffffffff8103cd3a>] ? ksoftirqd+0x0/0xd7
[ 5768.976939]  [<ffffffff8104a563>] kthread+0x7a/0x82
[ 5768.981727]  [<ffffffff8100bd9a>] child_rip+0xa/0x20
[ 5768.986004]  [<ffffffff8104a4e9>] ? kthread+0x0/0x82
[ 5768.990805]  [<ffffffff8100bd90>] ? child_rip+0x0/0x20
[ 5768.995081] Code: 06 00 00 00 00 89 08 66 c7 40 04 00 00 c6 40 06 01 c6 40 07 9f 41 0f b7 c6 48 89 c7 48 c1 e0 03 48 c1 e7 05 48 89 f9 48 03 4b 20 <4c> 89 79 10 48 c7 41 08 01 00 00 00 8b 75 cc 89 71 18 48 03 7b 
[ 5769.018044] RIP  [<ffffffffa003d46f>] sky2_xmit_frame+0x321/0x5d8 [sky2]
[ 5769.025816]  RSP <ffff880001603df8>
[ 5769.027123] CR2: 0000000000000ad0
[ 5769.033031] ---[ end trace 90bf20a10331c8d8 ]---
[ 5769.037702] Kernel panic - not syncing: Fatal exception in interrupt
[ 5769.044106] Pid: 4, comm: ksoftirqd/0 Tainted: P      D    2.6.32.2 #9
[ 5769.050724] Call Trace:
[ 5769.053213]  <IRQ>  [<ffffffff812975ed>] panic+0x75/0x11c
[ 5769.058677]  [<ffffffff8100e9a7>] oops_end+0x81/0x8e
[ 5769.063712]  [<ffffffff81026413>] no_context+0x1ee/0x1fd
[ 5769.069067]  [<ffffffff8102ee15>] ? walk_tg_tree+0x5e/0x74
[ 5769.074605]  [<ffffffff81026594>] __bad_area_nosemaphore+0x172/0x195
[ 5769.081044]  [<ffffffff810265c5>] bad_area_nosemaphore+0xe/0x10
[ 5769.087032]  [<ffffffff810267ff>] do_page_fault+0x114/0x252
[ 5769.092659]  [<ffffffff810307de>] ? update_shares+0x26/0x57
[ 5769.098291]  [<ffffffff81299bff>] page_fault+0x1f/0x30
[ 5769.103489]  [<ffffffffa003d46f>] ? sky2_xmit_frame+0x321/0x5d8 [sky2]
[ 5769.110097]  [<ffffffffa003d254>] ? sky2_xmit_frame+0x106/0x5d8 [sky2]
[ 5769.116706]  [<ffffffff81230fa0>] dev_hard_start_xmit+0x21c/0x2b7
[ 5769.122863]  [<ffffffff8123f422>] sch_direct_xmit+0x5e/0x154
[ 5769.128600]  [<ffffffff8123f5d4>] __qdisc_run+0xbc/0xd5
[ 5769.133933]  [<ffffffff8122ecd1>] net_tx_action+0xbb/0x10e
[ 5769.139468]  [<ffffffff8103d292>] __do_softirq+0x91/0x11b
[ 5769.144929]  [<ffffffff8100be9c>] call_softirq+0x1c/0x28
[ 5769.150291]  <EOI>  [<ffffffff8100d907>] do_softirq+0x33/0x6b
[ 5769.156125]  [<ffffffff8103cd9a>] ksoftirqd+0x60/0xd7
[ 5769.161240]  [<ffffffff8103cd3a>] ? ksoftirqd+0x0/0xd7
[ 5769.166430]  [<ffffffff8104a563>] kthread+0x7a/0x82
[ 5769.171370]  [<ffffffff8100bd9a>] child_rip+0xa/0x20
[ 5769.176398]  [<ffffffff8104a4e9>] ? kthread+0x0/0x82
[ 5769.181416]  [<ffffffff8100bd90>] ? child_rip+0x0/0x20

  reply	other threads:[~2010-01-01  3:06 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-12-31 10:55 [PATCH] sky2: Lock transmit queue while disabling device Mike McCormack
2009-12-31 15:58 ` Michael Breuer
2009-12-31 16:15   ` Daniel Hazelton
2009-12-31 16:33 ` Berck Nash
2009-12-31 18:51 ` Jarek Poplawski
2009-12-31 23:51   ` Mike McCormack
2010-01-01  3:06     ` Berck E. Nash [this message]
2010-01-01  6:42       ` Stephen Hemminger
2010-01-01 18:31     ` Jarek Poplawski
2010-01-04  2:44       ` Berck E. Nash
2010-01-04 13:49         ` [PATCH] sky2: Fix oops in sky2_xmit_frame() after TX timeout Jarek Poplawski
2010-01-04 18:26           ` Stephen Hemminger
2010-01-04 18:48             ` [PATCH v2] " Jarek Poplawski
2010-01-07  4:27 ` [PATCH] sky2: Lock transmit queue while disabling device David Miller
2010-01-07  6:35   ` Jarek Poplawski
2010-01-07  8:01     ` David Miller
2010-01-07  8:15       ` Jarek Poplawski
2010-01-07  8:19         ` David Miller
2010-01-07 13:48           ` Stephen Hemminger
2010-01-07 18:08           ` Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B3D66AA.3030709@gmail.com \
    --to=flyboy@gmail.com \
    --cc=dhazelton@enter.net \
    --cc=jarkao2@gmail.com \
    --cc=mbreuer@majjas.com \
    --cc=mikem@ring3k.org \
    --cc=netdev@vger.kernel.org \
    --cc=shemminger@vyatta.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).