From: Flavio Leitner <fleitner@redhat.com>
To: nhorman@tuxdriver.com
Cc: netdev@vger.kernel.org, bonding-devel@lists.sourceforge.net,
fubar@us.ibm.com, davem@davemloft.net, andy@greyhouse.net,
amwang@redhat.com
Subject: Re: [PATCH] bonding: various fixes for bonding, netpoll & netconsole (v2)
Date: Fri, 15 Oct 2010 20:41:15 -0300
Message-ID: <20101015234115.GB2747@redhat.com>
In-Reply-To: <1286973334-4339-1-git-send-email-nhorman@tuxdriver.com>
On Wed, Oct 13, 2010 at 08:35:29AM -0400, nhorman@tuxdriver.com wrote:
> Version 2, taking the following changes into account:
>
> 1) Moved tx blocking/checking macros to netpoll.h as suggested by amwang
>
> 2) Added tx blocking macro calls to sysfs paths, as they can deadlock in the
> same way that the link monitoring paths can.
>
> Summary:
> A while ago we tried to enable netpoll on the bonding driver to enable
> netconsole. That worked well in a steady state, but deadlocked frequently in
> failover conditions due to some recursive lock-taking (as well as a few other
> problems). I've gone through the driver, netconsole and netpoll code, fixed up
> those deadlocks, and confirmed that, with this patch series, we can use
> netconsole on bonding without deadlock in all bonding modes with all slaves,
> even across failovers. I've also fixed up some incidental bugs that I ran
> across while looking through this code, as described in individual patches.
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
I've tested this patch series and found this:
netconsole: network logging started
bonding: bond0: making interface eth0 the new active one.
------------[ cut here ]------------
WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x4e/0xd7()
Hardware name: Precision WorkStation 490
Modules linked in: netconsole configfs sunrpc bonding ip6t_REJECT
nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 p4_clockmod freq_table
speedstep_lib dm_multipath uinput snd_hda_codec_idt snd_hda_intel
snd_hda_codec snd_hwdep snd_seq snd_seq_device i5k_amb snd_pcm hwmon
i5000_edac snd_timer edac_core e1000 snd ppdev parport_pc iTCO_wdt
parport iTCO_vendor_support soundcore tg3 dcdbas pcspkr shpchp i2c_i801
serio_raw snd_page_alloc nouveau ttm drm_kms_helper drm i2c_algo_bit
video output i2c_core [last unloaded: netconsole]
Pid: 8, comm: kworker/1:0 Not tainted 2.6.36-rc7+ #26
Call Trace:
[<ffffffff810510c5>] warn_slowpath_common+0x85/0x9d
[<ffffffff813cfcf2>] ? rcu_read_unlock_bh+0x26/0x28
[<ffffffff810510f7>] warn_slowpath_null+0x1a/0x1c
[<ffffffff810574fa>] _local_bh_enable_ip+0x4e/0xd7
[<ffffffff810575a5>] local_bh_enable+0x12/0x14 <-- enabling again
[<ffffffff813cfcf2>] rcu_read_unlock_bh+0x26/0x28
[<ffffffff813d08a1>] dev_queue_xmit+0x363/0x375
[<ffffffff813d053e>] ? dev_queue_xmit+0x0/0x375
[<ffffffffa028c1e0>] bond_dev_queue_xmit+0xbe/0xdb [bonding]
[<ffffffffa028c46e>] bond_start_xmit+0x271/0x4df [bonding]
[<ffffffff813e0a15>] queue_process+0xcd/0x18a <- interrupts disabled
[<ffffffff813e0948>] ? queue_process+0x0/0x18a
[<ffffffff810673cf>] process_one_work+0x216/0x37d
[<ffffffff81067344>] ? process_one_work+0x18b/0x37d
[<ffffffff8106920d>] ? manage_workers+0x10b/0x195
[<ffffffff810693d8>] worker_thread+0x141/0x21e
[<ffffffff81069297>] ? worker_thread+0x0/0x21e
[<ffffffff8106c988>] kthread+0x9d/0xa5
[<ffffffff8100aaa4>] kernel_thread_helper+0x4/0x10
[<ffffffff8147f950>] ? restore_args+0x0/0x30
[<ffffffff8106c8eb>] ? kthread+0x0/0xa5
[<ffffffff8100aaa0>] ? kernel_thread_helper+0x0/0x10
---[ end trace 55688f5173e9b393 ]---
e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
bonding: bond0: link status definitely up for interface eth1.
It happens because queue_process() disables local interrupts
before calling ->ndo_start_xmit(), and then dev_queue_xmit()
re-enables bottom halves (local_bh_enable() via
rcu_read_unlock_bh()) while interrupts are still off.
I have CONFIG_TRACE_IRQFLAGS=y on my .config.
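
To make the sequence in the trace easier to follow, here is a minimal
user-space sketch of it. It is a toy model only, not kernel code: every
model_* name is hypothetical, the flag and counter merely stand in for the
real IRQ and softirq state, and the check is an assumed simplification of
the one that fires at kernel/softirq.c:143.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy state; stands in for the real per-CPU IRQ/softirq state. */
static bool model_irqs_off;   /* true while "local interrupts" are disabled */
static int model_bh_depth;    /* nesting depth of "bottom halves disabled" */
static int model_warnings;    /* how many times the check below fired */

static void model_local_bh_disable(void)
{
    model_bh_depth++;
}

static void model_local_bh_enable(void)
{
    assert(model_bh_depth > 0);
    /* Assumed simplification of the check behind the WARNING in the trace:
     * re-enabling bottom halves while interrupts are off is flagged. */
    if (model_irqs_off) {
        model_warnings++;
        fprintf(stderr, "WARNING: bh enabled with irqs disabled\n");
    }
    model_bh_depth--;
}

/* Stand-in for dev_queue_xmit(): its work is bracketed by a BH-disabled
 * section (rcu_read_unlock_bh() -> local_bh_enable() in the trace). */
static void model_dev_queue_xmit(void)
{
    model_local_bh_disable();
    /* ... the packet would be queued to the slave device here ... */
    model_local_bh_enable();
}

/* Stand-in for bond_start_xmit(): hands the packet to the slave. */
static void model_bond_start_xmit(void)
{
    model_dev_queue_xmit();
}

/* Stand-in for queue_process(): calls the xmit hook with interrupts off.
 * Returns the number of warnings raised during this call. */
static int model_queue_process(void)
{
    int before = model_warnings;

    model_irqs_off = true;    /* interrupts disabled before ->ndo_start_xmit() */
    model_bond_start_xmit();
    model_irqs_off = false;   /* interrupts restored afterwards */

    return model_warnings - before;
}
```

Calling model_queue_process() trips the check once, while calling
model_dev_queue_xmit() directly (interrupts on) does not, which matches
the behaviour described above.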
--
Flavio