netdev.vger.kernel.org archive mirror
From: Cong Wang <amwang@redhat.com>
To: nhorman@tuxdriver.com
Cc: netdev@vger.kernel.org, bonding-devel@lists.sourceforge.net,
	fubar@us.ibm.com, davem@davemloft.net, andy@greyhouse.net
Subject: Re: [PATCH 1/2] Remove netpoll blocking from uninit path
Date: Wed, 20 Oct 2010 15:47:11 +0800
Message-ID: <4CBE9E7F.60107@redhat.com>
In-Reply-To: <1287507866-25156-2-git-send-email-nhorman@tuxdriver.com>

On 10/20/10 01:04, nhorman@tuxdriver.com wrote:
> From: Neil Horman <nhorman@tuxdriver.com>
>
> Some recent testing in netpoll with bonding showed this backtrace
>
>   ------------[ cut here ]------------
>   kernel BUG at drivers/net/bonding/bonding.h:134!
>   invalid opcode: 0000 [#1] SMP
>   last sysfs file: /sys/devices/pci0000:00/0000:00:1d.2/usb7/devnum
>   CPU 0
>   Pid: 1876, comm: rmmod Not tainted 2.6.36-rc3+ #10 D26928/
>   RIP: 0010:[<ffffffffa0514ba4>]  [<ffffffffa0514ba4>] bond_uninit+0x6f4/0x7a0
>   RSP: 0018:ffff88003b1b5d58  EFLAGS: 00010296
>   RAX: ffff88003b9b6200 RBX: ffff8800373e8e00 RCX: 00000000000f4240
>   RDX: 00000000ffffffff RSI: 0000000000000286 RDI: 0000000000000286
>   RBP: ffff88003b1b5dc8 R08: 0000000000000000 R09: 00000001af7de920
>   R10: 0000000000000000 R11: ffff880002495e98 R12: ffff880037922700
>   R13: ffff880038c31000 R14: ffff880037922730 R15: 0000000000000286
>   FS:  00007f90e6d72700(0000) GS:ffff880002400000(0000) knlGS:0000000000000000
>   CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>   CR2: 000000346f0d9ad0 CR3: 000000003b263000 CR4: 00000000000006f0
>   DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>   DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>   Process rmmod (pid: 1876, threadinfo ffff88003b1b4000, task ffff88003b36aa80)
>   Stack:
>   00000000ffffffff ffff88003b1b5d7a ffff8800379221e8 ffff880037922000
>   <0>  ffff88003b1b5dc8 ffffffff813eb5fb ffff88003b1b5da8 0000000031b177a3
>   <0>  ffff88003b1b5da8 ffff880037922000 ffff88003b1b5e48 ffff88003b1b5e48
>   Call Trace:
>   [<ffffffff813eb5fb>] ? rtmsg_ifinfo+0xcb/0xf0
>   [<ffffffff813daad8>] rollback_registered_many+0x168/0x280
>   [<ffffffff813dac09>] unregister_netdevice_many+0x19/0x80
>   [<ffffffff813e97b3>] __rtnl_kill_links+0x63/0x90
>   [<ffffffff813e980b>] __rtnl_link_unregister+0x2b/0x60
>   [<ffffffff813e9bde>] rtnl_link_unregister+0x1e/0x30
>   [<ffffffffa052124b>] bonding_exit+0x37/0x51 [bonding]
>   [<ffffffff81098b2e>] sys_delete_module+0x19e/0x270
>   [<ffffffff810bb2b2>] ? audit_syscall_entry+0x252/0x280
>   [<ffffffff8100b0b2>] system_call_fastpath+0x16/0x1b
>   RIP  [<ffffffffa0514ba4>] bond_uninit+0x6f4/0x7a0 [bonding]
>   RSP<ffff88003b1b5d58>
>   ---[ end trace 1395ad691cea24d1 ]---
>
> It occurs because of my recent netpoll blocking patches, which I added to avoid
> recursive deadlock in the bonding driver.  They rely on per-CPU state, but the
> shutdown path forces rescheduling as we cancel the driver's workqueues and
> wait for device refcounts to drop.  If, after the forced reschedule, we wind up
> on a different CPU, we trigger the BUG() in unblock_netpoll_tx.
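
A minimal user-space model of the failure described above, offered only as a sketch: the per-CPU "tx blocked" state is simulated by an array indexed by a CPU number, and every name here (tx_blocked, model_block_netpoll_tx, model_unblock_netpoll_tx, model_uninit_path) is hypothetical, not the actual bonding.h helpers. A false return stands in for the BUG() the kernel would hit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one "netpoll tx blocked" flag per simulated CPU,
 * standing in for a per-CPU variable in the real driver. */
#define NR_CPUS_MODEL 4

static bool tx_blocked[NR_CPUS_MODEL];

/* Returns false where the kernel helper would BUG(): already blocked here. */
static bool model_block_netpoll_tx(int cpu)
{
	if (tx_blocked[cpu])
		return false;
	tx_blocked[cpu] = true;
	return true;
}

/* Returns false where the kernel helper would BUG(): this CPU's flag was
 * never set, e.g. because the block happened on a different CPU. */
static bool model_unblock_netpoll_tx(int cpu)
{
	if (!tx_blocked[cpu])
		return false;
	tx_blocked[cpu] = false;
	return true;
}

/* Old uninit path: block on one CPU, sleep (workqueue cancel, refcount
 * wait), then unblock on whichever CPU the task resumed on. */
static bool model_uninit_path(int start_cpu, int resume_cpu)
{
	bool ok = model_block_netpoll_tx(start_cpu);

	/* ... the task may be rescheduled onto another CPU here ... */
	ok = ok && model_unblock_netpoll_tx(resume_cpu);
	return ok;
}
```

If the task resumes on the same CPU the pair balances; if it migrates, the unblock sees an unset flag (the BUG case) and the original CPU's flag is left set.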
>
> The fix is to remove the netpoll block/unblock calls from bond_release_all.
> This is safe to do because bond_uninit, which is called via ndo_uninit in
> rollback_registered_many, doesn't occur until we send a NETDEV_UNREGISTER event,
> which triggers netconsole to remove us as a netpoll client, so we are guaranteed
> not to recurse into our own tx path here.

Also, bond_release_all() is called after bond_netpoll_cleanup()
in bond_uninit().
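
The ordering argument can be sketched as a small model, assuming only what is stated in this thread: bond_uninit() detaches the bond from netpoll before releasing slaves, so release_all no longer needs a block/unblock pair. The struct and every function below are hypothetical illustrations, not the real bonding code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a bond device; field names are illustrative. */
struct model_bond {
	bool netpoll_attached;	/* still registered as a netpoll client? */
	int slave_count;
};

/* Netpoll tx can only recurse into the bond while it is attached. */
static bool model_netpoll_can_recurse(const struct model_bond *bond)
{
	return bond->netpoll_attached;
}

static void model_netpoll_cleanup(struct model_bond *bond)
{
	bond->netpoll_attached = false;
}

/* After the fix: no block/unblock around slave release. Returns false if
 * a netpoll recursion into our tx path were still possible here. */
static bool model_release_all(struct model_bond *bond)
{
	bool safe = !model_netpoll_can_recurse(bond);

	bond->slave_count = 0;
	return safe;
}

/* Order described above: netpoll cleanup first, then release all slaves. */
static bool model_uninit(struct model_bond *bond)
{
	model_netpoll_cleanup(bond);
	return model_release_all(bond);
}
```

Because nothing sleeps between a block and an unblock any more, CPU migration during shutdown has no per-CPU state to corrupt.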

>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Reviewed-by: WANG Cong <amwang@redhat.com>

Thanks.

Thread overview: 9+ messages
2010-10-19 17:04 [PATCH] bonding: minor cleanups to bond + netpoll nhorman
2010-10-19 17:04 ` [PATCH 1/2] Remove netpoll blocking from uninit path nhorman
2010-10-20  7:47   ` Cong Wang [this message]
2010-10-20  8:45     ` David Miller
2010-10-20 10:51       ` Neil Horman
2010-10-19 17:04 ` [PATCH 2/2] Revert napi_poll fix for bonding driver nhorman
2010-10-20  7:52   ` Cong Wang
2010-10-20  8:45     ` David Miller
2010-10-19 20:29 ` [PATCH] bonding: minor cleanups to bond + netpoll Andy Gospodarek