From mboxrd@z Thu Jan 1 00:00:00 1970
From: Flavio Leitner
Subject: Re: [PATCH] bonding: various fixes for bonding, netpoll & netconsole (v2)
Date: Fri, 15 Oct 2010 20:41:15 -0300
Message-ID: <20101015234115.GB2747@redhat.com>
References: <1286973334-4339-1-git-send-email-nhorman@tuxdriver.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: netdev@vger.kernel.org, bonding-devel@lists.sourceforge.net, fubar@us.ibm.com, davem@davemloft.net, andy@greyhouse.net, amwang@redhat.com
To: nhorman@tuxdriver.com
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:35258 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751509Ab0JOXlg (ORCPT ); Fri, 15 Oct 2010 19:41:36 -0400
Content-Disposition: inline
In-Reply-To: <1286973334-4339-1-git-send-email-nhorman@tuxdriver.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Wed, Oct 13, 2010 at 08:35:29AM -0400, nhorman@tuxdriver.com wrote:
> Version 2, taking the following changes into account:
>
> 1) Moved tx blocking/checking macros to netpoll.h as suggested by amwang
>
> 2) Added tx blocking macro calls to sysfs paths, as they can deadlock in the
> same way that the link monitoring paths can.
>
> Summary:
> A while ago we tried to enable netpoll on the bonding driver to enable
> netconsole. That worked well in a steady state, but deadlocked frequently in
> failover conditions due to some recursive lock-taking (as well as a few other
> problems). I've gone through the driver, netconsole and netpoll code, fixed up
> those deadlocks, and confirmed that, with this patch series, we can use
> netconsole on bonding without deadlock in all bonding modes with all slaves,
> even across failovers.
> I've also fixed up some incidental bugs that I ran
> across while looking through this code, as described in individual patches
>
> Signed-off-by: Neil Horman

I've tested this patch series and found this:

netconsole: network logging started
bonding: bond0: making interface eth0 the new active one.
------------[ cut here ]------------
WARNING: at kernel/softirq.c:143 _local_bh_enable_ip+0x4e/0xd7()
Hardware name: Precision WorkStation 490
Modules linked in: netconsole configfs sunrpc bonding ip6t_REJECT nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 p4_clockmod freq_table speedstep_lib dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device i5k_amb snd_pcm hwmon i5000_edac snd_timer edac_core e1000 snd ppdev parport_pc iTCO_wdt parport iTCO_vendor_support soundcore tg3 dcdbas pcspkr shpchp i2c_i801 serio_raw snd_page_alloc nouveau ttm drm_kms_helper drm i2c_algo_bit video output i2c_core [last unloaded: netconsole]
Pid: 8, comm: kworker/1:0 Not tainted 2.6.36-rc7+ #26
Call Trace:
 [] warn_slowpath_common+0x85/0x9d
 [] ? rcu_read_unlock_bh+0x26/0x28
 [] warn_slowpath_null+0x1a/0x1c
 [] _local_bh_enable_ip+0x4e/0xd7
 [] local_bh_enable+0x12/0x14  <-- enabling again
 [] rcu_read_unlock_bh+0x26/0x28
 [] dev_queue_xmit+0x363/0x375
 [] ? dev_queue_xmit+0x0/0x375
 [] bond_dev_queue_xmit+0xbe/0xdb [bonding]
 [] bond_start_xmit+0x271/0x4df [bonding]
 [] queue_process+0xcd/0x18a  <- interrupts disabled
 [] ? queue_process+0x0/0x18a
 [] process_one_work+0x216/0x37d
 [] ? process_one_work+0x18b/0x37d
 [] ? manage_workers+0x10b/0x195
 [] worker_thread+0x141/0x21e
 [] ? worker_thread+0x0/0x21e
 [] kthread+0x9d/0xa5
 [] kernel_thread_helper+0x4/0x10
 [] ? restore_args+0x0/0x30
 [] ? kthread+0x0/0xa5
 [] ? kernel_thread_helper+0x0/0x10
---[ end trace 55688f5173e9b393 ]---
e1000: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
bonding: bond0: link status definitely up for interface eth1.

It happens because queue_process() disables local interrupts before calling
->ndo_start_xmit(), and dev_queue_xmit() then re-enables bottom halves
(rcu_read_unlock_bh() -> local_bh_enable()) while interrupts are still
disabled.

I have CONFIG_TRACE_IRQFLAGS=y in my .config.

--
Flavio
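
To make the ordering in the trace easier to follow, here is a rough user-space
sketch of it. This is not kernel code: every sim_* name is a hypothetical
stand-in I made up for illustration, and the check in sim_local_bh_enable()
only assumes that the warning fires when bottom halves are re-enabled while
interrupts are disabled, which is what the trace suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical user-space stand-ins for kernel state; not kernel APIs. */
static bool sim_irqs_disabled;
static int sim_bh_disable_depth;
static int sim_warnings;

static void sim_local_irq_save(void)    { sim_irqs_disabled = true; }
static void sim_local_irq_restore(void) { sim_irqs_disabled = false; }
static void sim_local_bh_disable(void)  { sim_bh_disable_depth++; }

/* Assumed model of the check behind the warning in the trace. */
static void sim_local_bh_enable(void)
{
	assert(sim_bh_disable_depth > 0);	/* calls must be balanced */
	if (sim_irqs_disabled) {
		sim_warnings++;
		fprintf(stderr, "WARNING: bh enabled with irqs disabled\n");
	}
	sim_bh_disable_depth--;
}

/* Stand-in for dev_queue_xmit(): brackets the work with bh off/on. */
static void sim_dev_queue_xmit(void)
{
	sim_local_bh_disable();		/* models rcu_read_lock_bh() */
	/* ... the frame would be queued here ... */
	sim_local_bh_enable();		/* models rcu_read_unlock_bh() */
}

/* Stand-in for bond_start_xmit() -> bond_dev_queue_xmit(). */
static void sim_bond_start_xmit(void)
{
	sim_dev_queue_xmit();
}

/* Stand-in for queue_process(): calls the xmit hook with irqs off. */
static void sim_queue_process(void)
{
	sim_local_irq_save();
	sim_bond_start_xmit();
	sim_local_irq_restore();
}

/* Returns how many warnings the netpoll-style path raises. */
int sim_run_netpoll_path(void)
{
	sim_warnings = 0;
	sim_queue_process();
	return sim_warnings;
}

/* Returns how many warnings a plain transmit raises. */
int sim_run_normal_path(void)
{
	sim_warnings = 0;
	sim_dev_queue_xmit();
	return sim_warnings;
}
```

In this model sim_run_normal_path() raises no warning, while
sim_run_netpoll_path() raises one, because the bottom-half enable runs inside
the interrupts-disabled section opened by sim_queue_process().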