From: "Berck E. Nash" <flyboy@gmail.com>
To: Mike McCormack <mikem@ring3k.org>
Cc: Jarek Poplawski <jarkao2@gmail.com>,
	Stephen Hemminger <shemminger@vyatta.com>,
	netdev@vger.kernel.org, dhazelton@enter.net, mbreuer@majjas.com
Subject: Re: [PATCH] sky2: Lock transmit queue while disabling device
Date: Thu, 31 Dec 2009 20:06:18 -0700
Message-ID: <4B3D66AA.3030709@gmail.com>
In-Reply-To: <4B3D38FB.40105@ring3k.org>

[-- Attachment #1: Type: text/plain, Size: 3059 bytes --]

Well, that didn't fix it.  Oops attached; it looks pretty much the same to me.

Mike McCormack wrote:
> Hi Jarek,
> 
> This is based on my analysis of the oops at:
> 
> http://bugzilla.kernel.org/show_bug.cgi?id=14925
> 
> Specifically:
> 
>>>> [ 8673.345873] sky2 eth0: receiver hang detected
>>>> [ 8673.350368] sky2 eth0: disabling interface
>>>> [ 8673.354749] BUG: unable to handle kernel NULL pointer dereference at
>>>> 0000000000000010
>>>> [ 8673.359748] IP: [<ffffffffa00373d3>] sky2_xmit_frame+0x321/0x5d8 
>>>> [sky2]
> 
> netif_device_detach() does not guarantee that all transmits have completed 
> after it returns.
> 
> CPU 1 stack will look like:
> 
>   dev_queue_xmit()
>      HARD_TX_LOCK() -> __netif_tx_lock()
>      ...
>      dev_hard_start_xmit()
>         ops->ndo_start_xmit()  -> sky2_xmit_frame()
>         sky2_xmit_frame() pushing skb to hardware
>           use NULL tx_ring here
> 
> 
> CPU 2 stack will look like:
>            
>   sky2_restart()
>      rtnl_lock()
>      sky2_detach()
>         netif_device_detach()
>         sky2_down()
>           printk("sky2 eth0: disabling interface")
>           ...
>           sky2_free_buffers(sky2);
>             sky2->tx_ring = NULL;
>           ...
> 
> Another way to solve the problem would be to take the transmit lock in 
> netif_device_detach() to make sure that any in progress transmits have
> completed before returning.
> 
> Note that most of these backtraces are from systems running the binary-only
> nvidia module.  This may change the timings and make the sky2 race more
> likely, or be involved in the "tx timeout" condition that triggers a
> sky2_restart().
> 
> Will test with netif_tx_lock_bh and resubmit.
> 
> thanks,
> 
> Mike
> 
> Jarek Poplawski wrote:
>> Mike McCormack wrote, On 12/31/2009 11:55 AM:
>>
>>> netif_device_detach() does not take the tx_lock, so it's
>>>  possible that a call to sky2_xmit_frame is still in
>>>  progress after netif_device_detach() is complete.
>>>
>>> Take netif_tx_lock() to make sure all transmits have
>>>  stopped while we're disabling the device and that
>>>  no other CPU is still transmitting a frame after
>>>  we've disabled the device.
>>>
>>> Proposed fix for "sky2 panic under load" reported by Berck E. Nash.
>> Could you give some scenario of the oops/fix?
>> Btw, even if it worked, you should use the netif_tx_lock_bh()
>> version, considering the contexts sky2_detach() is used in, I guess.
>>
>> Jarek P.
>>
>>> Signed-off-by: Mike McCormack <mikem@ring3k.org>
>>> ---
>>>  drivers/net/sky2.c |    2 ++
>>>  1 files changed, 2 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/net/sky2.c b/drivers/net/sky2.c
>>> index faa4841..8ae8520 100644
>>> --- a/drivers/net/sky2.c
>>> +++ b/drivers/net/sky2.c
>>> @@ -3176,7 +3176,9 @@ static void sky2_reset(struct sky2_hw *hw)
>>>  static void sky2_detach(struct net_device *dev)
>>>  {
>>>  	if (netif_running(dev)) {
>>> +		netif_tx_lock(dev);
>>>  		netif_device_detach(dev);	/* stop txq */
>>> +		netif_tx_unlock(dev);
>>>  		sky2_down(dev);
>>>  	}
>>>  }
>>
> 
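The locking argument in the patch description can be modeled in userspace. The sketch below is a hypothetical pthread model, not kernel code: `model_port`, `model_xmit_loop()`, `model_detach()` and `run_detach_model()` are illustrative names, a mutex stands in for the netdev TX lock, and a `stopped` flag stands in for the stopped TX queue. It assumes that the transmit side checks the queue state under the same lock before touching the ring. It shows only why stopping the queue under the lock keeps a transmit from overlapping the ring teardown; as reported at the top of this message, the actual patch did not make the oops go away, so treat this as an illustration of the reasoning, not a verified fix.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define RING_SIZE 16  /* hypothetical ring size for the model */

/* Hypothetical userspace stand-in for the driver state; not kernel code. */
struct model_port {
    pthread_mutex_t tx_lock;  /* plays the role of the netdev TX lock */
    bool stopped;             /* plays the role of a stopped TX queue */
    long *tx_ring;            /* plays the role of sky2->tx_ring */
    long sent;                /* frames "pushed to hardware" */
};

/* Transmit side: check the queue state under the TX lock, and only then
 * touch the ring (loosely modeled on the path into sky2_xmit_frame()). */
static void *model_xmit_loop(void *arg)
{
    struct model_port *p = arg;

    for (;;) {
        pthread_mutex_lock(&p->tx_lock);
        if (p->stopped) {
            pthread_mutex_unlock(&p->tx_lock);
            return NULL;
        }
        assert(p->tx_ring != NULL);  /* the oops corresponds to this failing */
        p->tx_ring[p->sent % RING_SIZE] = p->sent;
        p->sent++;
        pthread_mutex_unlock(&p->tx_lock);
    }
}

/* Detach side modeled on the patch: stop the queue while holding the TX
 * lock, so an in-progress transmit finishes first, then tear down the
 * ring (loosely modeled on sky2_down() -> sky2_free_buffers()). */
static void model_detach(struct model_port *p)
{
    pthread_mutex_lock(&p->tx_lock);
    p->stopped = true;
    pthread_mutex_unlock(&p->tx_lock);

    free(p->tx_ring);
    p->tx_ring = NULL;
}

/* Let one transmitter send at least min_sent frames, detach, join it,
 * and return the total number sent (-1 on setup failure). */
static long run_detach_model(long min_sent)
{
    struct model_port p = { .stopped = false, .sent = 0 };
    pthread_t tx;
    long sent;

    pthread_mutex_init(&p.tx_lock, NULL);
    p.tx_ring = calloc(RING_SIZE, sizeof(*p.tx_ring));
    if (p.tx_ring == NULL)
        return -1;
    if (pthread_create(&tx, NULL, model_xmit_loop, &p) != 0) {
        free(p.tx_ring);
        return -1;
    }

    /* Wait until the transmitter has made some progress. */
    for (;;) {
        pthread_mutex_lock(&p.tx_lock);
        sent = p.sent;
        pthread_mutex_unlock(&p.tx_lock);
        if (sent >= min_sent)
            break;
    }

    model_detach(&p);
    pthread_join(tx, NULL);
    pthread_mutex_destroy(&p.tx_lock);
    return p.sent;
}
```

Calling `run_detach_model(1000)` lets the transmitter send at least 1000 frames, detaches, and returns the final count without tripping the `assert()`. Because `model_detach()` frees the ring only after releasing the lock, the model relies entirely on the transmitter re-checking `stopped` under the lock; a transmit path that skipped that check, or a later path that re-enabled the queue before the ring was rebuilt, would still race with the teardown.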


[-- Attachment #2: sky2crash2.txt --]
[-- Type: text/plain, Size: 5379 bytes --]

[ 5768.704033] sky2 eth0: receiver hang detected
[ 5768.708579] sky2 eth0: disabling interface
[ 5768.712928] BUG: unable to handle kernel NULL pointer dereference at 0000000000000ad0
[ 5768.717776] IP: [<ffffffffa003d46f>] sky2_xmit_frame+0x321/0x5d8 [sky2]
[ 5768.726935] PGD beaa3067 PUD ba837067 PMD 0 
[ 5768.731121] Oops: 0002 [#1] SMP 
[ 5768.731121] last sysfs file: /sys/devices/platform/coretemp.0/temp1_label
[ 5768.740188] CPU 0 
[ 5768.742247] Modules linked in: nvidia(P) nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc nls_cp437 msdos fat kvm_intel kvm fuse snd_rtctimer usbhid hwmon_vid tuner_simple tuner_types wm8775 snd_hda_codec_realtek tda9887 tda8290 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm tuner snd_seq_dummy cx25840 ivtv i2c_algo_bit cx2341x snd_seq_oss uhci_hcd snd_seq_midi_event ehci_hcd v4l2_common i2c_i801 snd_seq videodev snd_timer v4l1_compat v4l2_compat_ioctl32 snd_seq_device tveeprom snd floppy sky2 usbcore soundcore snd_page_alloc [last unloaded: nvidia]
[ 5768.794811] Pid: 4, comm: ksoftirqd/0 Tainted: P           2.6.32.2 #9 P5W DH Deluxe
[ 5768.801019] RIP: 0010:[<ffffffffa003d46f>]  [<ffffffffa003d46f>] sky2_xmit_frame+0x321/0x5d8 [sky2]
[ 5768.808600] RSP: 0018:ffff880001603df8  EFLAGS: 00010206
[ 5768.817679] RAX: 00000000000002b0 RBX: ffff8800bd184540 RCX: 0000000000000ac0
[ 5768.822147] RDX: 0000000000000000 RSI: 000000000000008c RDI: 0000000000000ac0
[ 5768.831325] RBP: ffff880001603e48 R08: 0000000000000001 R09: 0000000000000000
[ 5768.835840] R10: 000000000000001e R11: 0000000000000d7f R12: ffff880006a40ec8
[ 5768.844917] R13: ffff8800be922e00 R14: 0000000000560056 R15: 000000009553807e
[ 5768.853995] FS:  0000000000000000(0000) GS:ffff880001600000(0000) knlGS:0000000000000000
[ 5768.859584] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
[ 5768.867708] CR2: 0000000000000ad0 CR3: 00000000ba8d8000 CR4: 00000000000026f0
[ 5768.872155] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 5768.881235] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 5768.888900] Process ksoftirqd/0 (pid: 4, threadinfo ffff8800bf8b4000, task ffff8800bf8a8650)
[ 5768.895027] Stack:
[ 5768.899363]  ffff88004c485ec0 ffff88009553807e ffff8800bd184000 0000004281229811
[ 5768.904093] <0> ffff880001603e48 ffff880006a40ec8 ffff88004c485ec0 ffffffff813ead30
[ 5768.913174] <0> ffff8800bd184000 ffff8800beb1dec0 ffff880001603e98 ffffffff81230fa0
[ 5768.922327] Call Trace:
[ 5768.922327]  <IRQ> 
[ 5768.926596]  [<ffffffff81230fa0>] dev_hard_start_xmit+0x21c/0x2b7
[ 5768.931393]  [<ffffffff8123f422>] sch_direct_xmit+0x5e/0x154
[ 5768.935678]  [<ffffffff8123f5d4>] __qdisc_run+0xbc/0xd5
[ 5768.944755]  [<ffffffff8122ecd1>] net_tx_action+0xbb/0x10e
[ 5768.949643]  [<ffffffff8103d292>] __do_softirq+0x91/0x11b
[ 5768.956054]  [<ffffffff8100be9c>] call_softirq+0x1c/0x28
[ 5768.958733]  <EOI> 
[ 5768.963566]  [<ffffffff8100d907>] do_softirq+0x33/0x6b
[ 5768.967800]  [<ffffffff8103cd9a>] ksoftirqd+0x60/0xd7
[ 5768.972644]  [<ffffffff8103cd3a>] ? ksoftirqd+0x0/0xd7
[ 5768.976939]  [<ffffffff8104a563>] kthread+0x7a/0x82
[ 5768.981727]  [<ffffffff8100bd9a>] child_rip+0xa/0x20
[ 5768.986004]  [<ffffffff8104a4e9>] ? kthread+0x0/0x82
[ 5768.990805]  [<ffffffff8100bd90>] ? child_rip+0x0/0x20
[ 5768.995081] Code: 06 00 00 00 00 89 08 66 c7 40 04 00 00 c6 40 06 01 c6 40 07 9f 41 0f b7 c6 48 89 c7 48 c1 e0 03 48 c1 e7 05 48 89 f9 48 03 4b 20 <4c> 89 79 10 48 c7 41 08 01 00 00 00 8b 75 cc 89 71 18 48 03 7b 
[ 5769.018044] RIP  [<ffffffffa003d46f>] sky2_xmit_frame+0x321/0x5d8 [sky2]
[ 5769.025816]  RSP <ffff880001603df8>
[ 5769.027123] CR2: 0000000000000ad0
[ 5769.033031] ---[ end trace 90bf20a10331c8d8 ]---
[ 5769.037702] Kernel panic - not syncing: Fatal exception in interrupt
[ 5769.044106] Pid: 4, comm: ksoftirqd/0 Tainted: P      D    2.6.32.2 #9
[ 5769.050724] Call Trace:
[ 5769.053213]  <IRQ>  [<ffffffff812975ed>] panic+0x75/0x11c
[ 5769.058677]  [<ffffffff8100e9a7>] oops_end+0x81/0x8e
[ 5769.063712]  [<ffffffff81026413>] no_context+0x1ee/0x1fd
[ 5769.069067]  [<ffffffff8102ee15>] ? walk_tg_tree+0x5e/0x74
[ 5769.074605]  [<ffffffff81026594>] __bad_area_nosemaphore+0x172/0x195
[ 5769.081044]  [<ffffffff810265c5>] bad_area_nosemaphore+0xe/0x10
[ 5769.087032]  [<ffffffff810267ff>] do_page_fault+0x114/0x252
[ 5769.092659]  [<ffffffff810307de>] ? update_shares+0x26/0x57
[ 5769.098291]  [<ffffffff81299bff>] page_fault+0x1f/0x30
[ 5769.103489]  [<ffffffffa003d46f>] ? sky2_xmit_frame+0x321/0x5d8 [sky2]
[ 5769.110097]  [<ffffffffa003d254>] ? sky2_xmit_frame+0x106/0x5d8 [sky2]
[ 5769.116706]  [<ffffffff81230fa0>] dev_hard_start_xmit+0x21c/0x2b7
[ 5769.122863]  [<ffffffff8123f422>] sch_direct_xmit+0x5e/0x154
[ 5769.128600]  [<ffffffff8123f5d4>] __qdisc_run+0xbc/0xd5
[ 5769.133933]  [<ffffffff8122ecd1>] net_tx_action+0xbb/0x10e
[ 5769.139468]  [<ffffffff8103d292>] __do_softirq+0x91/0x11b
[ 5769.144929]  [<ffffffff8100be9c>] call_softirq+0x1c/0x28
[ 5769.150291]  <EOI>  [<ffffffff8100d907>] do_softirq+0x33/0x6b
[ 5769.156125]  [<ffffffff8103cd9a>] ksoftirqd+0x60/0xd7
[ 5769.161240]  [<ffffffff8103cd3a>] ? ksoftirqd+0x0/0xd7
[ 5769.166430]  [<ffffffff8104a563>] kthread+0x7a/0x82
[ 5769.171370]  [<ffffffff8100bd9a>] child_rip+0xa/0x20
[ 5769.176398]  [<ffffffff8104a4e9>] ? kthread+0x0/0x82
[ 5769.181416]  [<ffffffff8100bd90>] ? child_rip+0x0/0x20
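The register dump in the attachment can be checked against the `Code:` bytes. Hand-decoding the bytes just before and at the `<4c>` marker gives the instruction sequence in the comment below; treat that decoding as an assumption to be confirmed with a disassembler. The sketch recomputes the fault address from R14 under one further assumption: that the 64-bit value loaded from `0x20(%rbx)` was NULL, which Mike's analysis suggests is the TX ring pointer. `decode_fault()`, `check_against_dump()` and `struct decoded_fault` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Register values copied from the oops in the attachment. */
#define OOPS_R14 0x0000000000560056ULL
#define OOPS_RAX 0x00000000000002b0ULL
#define OOPS_RDI 0x0000000000000ac0ULL
#define OOPS_RCX 0x0000000000000ac0ULL
#define OOPS_CR2 0x0000000000000ad0ULL

struct decoded_fault {
    uint64_t index;       /* low 16 bits of R14 */
    uint64_t rax;         /* index * 8 */
    uint64_t rdi;         /* index * 32 */
    uint64_t rcx;         /* base + index * 32 */
    uint64_t fault_addr;  /* target of the faulting store */
};

/* Hand-decoded instructions around the <4c> marker (AT&T syntax):
 *   movzwl %r14w,%eax       index = R14 & 0xffff
 *   mov    %rax,%rdi
 *   shl    $3,%rax          RAX = index << 3
 *   shl    $5,%rdi          RDI = index << 5
 *   mov    %rdi,%rcx
 *   add    0x20(%rbx),%rcx  RCX = RDI + base (base assumed NULL)
 *   mov    %r15,0x10(%rcx)  faulting store to RCX + 0x10
 */
static struct decoded_fault decode_fault(uint64_t r14, uint64_t base)
{
    struct decoded_fault d;

    d.index = r14 & 0xffff;
    d.rax = d.index << 3;
    d.rdi = d.index << 5;
    d.rcx = d.rdi + base;
    d.fault_addr = d.rcx + 0x10;
    return d;
}

/* Check that a NULL base reproduces every related value in the dump. */
static void check_against_dump(void)
{
    struct decoded_fault d = decode_fault(OOPS_R14, 0);

    assert(d.rax == OOPS_RAX);
    assert(d.rdi == OOPS_RDI);
    assert(d.rcx == OOPS_RCX);
    assert(d.fault_addr == OOPS_CR2);
}
```

With R14 = 0x0000000000560056 and a NULL base, the index is 0x56 and the recomputed RAX (0x2b0), RDI (0xac0), RCX (0xac0) and fault address (0xad0) match RAX, RDI, RCX and CR2 in the dump. Since RCX equals RDI, the value added from `0x20(%rbx)` must have been zero, and the `Oops: 0002` error code (a kernel-mode write to a not-present page) fits a store. For index 0 the same store would fault at 0x10, the address in the earlier oops quoted above, although that oops' registers are not shown here to confirm it.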

Thread overview: 20+ messages
2009-12-31 10:55 [PATCH] sky2: Lock transmit queue while disabling device Mike McCormack
2009-12-31 15:58 ` Michael Breuer
2009-12-31 16:15   ` Daniel Hazelton
2009-12-31 16:33 ` Berck Nash
2009-12-31 18:51 ` Jarek Poplawski
2009-12-31 23:51   ` Mike McCormack
2010-01-01  3:06     ` Berck E. Nash [this message]
2010-01-01  6:42       ` Stephen Hemminger
2010-01-01 18:31     ` Jarek Poplawski
2010-01-04  2:44       ` Berck E. Nash
2010-01-04 13:49         ` [PATCH] sky2: Fix oops in sky2_xmit_frame() after TX timeout Jarek Poplawski
2010-01-04 18:26           ` Stephen Hemminger
2010-01-04 18:48             ` [PATCH v2] " Jarek Poplawski
2010-01-07  4:27 ` [PATCH] sky2: Lock transmit queue while disabling device David Miller
2010-01-07  6:35   ` Jarek Poplawski
2010-01-07  8:01     ` David Miller
2010-01-07  8:15       ` Jarek Poplawski
2010-01-07  8:19         ` David Miller
2010-01-07 13:48           ` Stephen Hemminger
2010-01-07 18:08           ` Stephen Hemminger
