From mboxrd@z Thu Jan 1 00:00:00 1970 From: Andy Gospodarek Subject: Re: [PATCH 2/4] bonding: make sure tx and rx hash tables stay in sync when using alb mode Date: Wed, 16 Sep 2009 19:44:51 -0400 Message-ID: <20090916234451.GW8515@gospo.rdu.redhat.com> References: <20090911211112.GR8515@gospo.rdu.redhat.com> <27763.1253144169@death.nxdomain.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Andy Gospodarek , netdev@vger.kernel.org, bonding-devel@lists.sourceforge.net To: Jay Vosburgh Return-path: Received: from mx1.redhat.com ([209.132.183.28]:29987 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755444AbZIPXpS (ORCPT ); Wed, 16 Sep 2009 19:45:18 -0400 Content-Disposition: inline In-Reply-To: <27763.1253144169@death.nxdomain.ibm.com> Sender: netdev-owner@vger.kernel.org List-ID: On Wed, Sep 16, 2009 at 04:36:09PM -0700, Jay Vosburgh wrote: > Andy Gospodarek wrote: > > > > >Subject: [PATCH] bonding: make sure tx and rx hash tables stay in sync when using alb mode > > When testing this, I'm getting a lockdep warning. It appears to > be unhappy that tlb_choose_channel acquires the tx / rx hash table locks > in the order tx then rx, but rlb_choose_channel -> alb_get_best_slave > acquires the locks in the other order. I applied all four patches, but > it looks like the change that trips lockdep is in this patch (#2). Interesting. I specifically enabled lockdep (or so I thought) because I wanted to be sure by more than my inspection that there were no deadlock possibilities. > > I haven't gotten an actual deadlock from this, although it seems > plausible if there are two cpus in bond_alb_xmit at the same time, and > one of them is sending an ARP. > > One fairly straightforward fix would be to combine the rx and tx > hash table locks into a single lock. I suspect that wouldn't have any > real performance penalty, since the rx hash table lock is generally not > acquired very often (unlike the tx lock, which is taken for every packet > that goes out). > That will probably work. I'll take a look at this right away and see how feasible that is. > Also, FYI, two of the four patches had trailing whitespace. I > believe it was #2 and #4. Grrr, I can't believe I did that. :-/ > > Thoughts? > > Here's the lockdep warning: > > Ethernet Channel Bonding Driver: v3.5.0 (November 4, 2008) > bonding: In ALB mode you might experience client disconnections upon reconnection of a link if the bonding module updelay parameter (0 msec) is incompatible with the forwarding delay time of the switch > bonding: MII link monitoring set to 10 ms > ADDRCONF(NETDEV_UP): bond0: link is not ready > tg3 0000:01:07.1: PME# disabled > bonding: bond0: enslaving eth0 as an active interface with a down link. > tg3 0000:01:07.0: PME# enabled > tg3 0000:01:07.0: PME# disabled > bonding: bond0: enslaving eth1 as an active interface with a down link. > tg3: eth0: Link is up at 1000 Mbps, full duplex. > tg3: eth0: Flow control is off for TX and off for RX. > bonding: bond0: link status definitely up for interface eth0. > bonding: bond0: making interface eth0 the new active one. > bonding: bond0: first active interface up! > ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready > tg3: eth1: Link is up at 1000 Mbps, full duplex. > tg3: eth1: Flow control is off for TX and off for RX. > bonding: bond0: link status definitely up for interface eth1. > bonding: bond0: enslaving eth2 as an active interface with a down link. > e1000: eth2 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX > bonding: bond0: link status definitely up for interface eth2. > > ======================================================= > [ INFO: possible circular locking dependency detected ] > 2.6.31-locking #10 > ------------------------------------------------------- > swapper/0 is trying to acquire lock: > (&(bond_info->tx_hashtbl_lock)){+.-...}, at: [] alb_get_best_slave+0x1b/0x58 [bonding] > > but task is already holding lock: > (&(bond_info->rx_hashtbl_lock)){+.-...}, at: [] bond_alb_xmit+0x1b7/0x60b [bonding] > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #1 (&(bond_info->rx_hashtbl_lock)){+.-...}: > [] __lock_acquire+0x109f/0x13a0 > [] lock_acquire+0xa8/0xbf > [] _spin_lock_bh+0x2a/0x39 > [] bond_alb_xmit+0x501/0x60b [bonding] > [] bond_start_xmit+0x2e9/0x32e [bonding] > [] dev_hard_start_xmit+0x281/0x314 > [] dev_queue_xmit+0x338/0x41b > [] neigh_resolve_output+0x260/0x28d > [] ip6_output2+0x2dc/0x32a [ipv6] > [] ip6_output+0xe93/0xea0 [ipv6] > [] ndisc_send_skb+0x19d/0x320 [ipv6] > [] __ndisc_send+0x3a/0x45 [ipv6] > [] ndisc_send_rs+0x34/0x3c [ipv6] > [] addrconf_dad_completed+0x5e/0x99 [ipv6] > [] addrconf_dad_timer+0x5d/0xe1 [ipv6] > [] run_timer_softirq+0x1a0/0x219 > [] __do_softirq+0xd6/0x1a3 > [] do_softirq+0x2b/0x43 > [] irq_exit+0x38/0x74 > [] smp_apic_timer_interrupt+0x6e/0x7c > [] apic_timer_interrupt+0x2f/0x34 > [] cpu_idle+0x49/0x76 > [] rest_init+0x67/0x69 > [] start_kernel+0x2c1/0x2c6 > [] __init_begin+0x6a/0x6f > > -> #0 (&(bond_info->tx_hashtbl_lock)){+.-...}: > [] __lock_acquire+0xe2a/0x13a0 > [] lock_acquire+0xa8/0xbf > [] _spin_lock_bh+0x2a/0x39 > [] alb_get_best_slave+0x1b/0x58 [bonding] > [] bond_alb_xmit+0x252/0x60b [bonding] > [] bond_start_xmit+0x2e9/0x32e [bonding] > [] dev_hard_start_xmit+0x281/0x314 > [] dev_queue_xmit+0x338/0x41b > [] arp_send+0x32/0x37 > [] arp_solicit+0x18a/0x1a1 > [] neigh_timer_handler+0x1c9/0x20a > [] run_timer_softirq+0x1a0/0x219 > [] __do_softirq+0xd6/0x1a3 > [] do_softirq+0x2b/0x43 > [] irq_exit+0x38/0x74 > [] smp_apic_timer_interrupt+0x6e/0x7c > [] apic_timer_interrupt+0x2f/0x34 > [] cpu_idle+0x49/0x76 > [] start_secondary+0x1ab/0x1b2 > > other info that might help us debug this: > > 5 locks held by swapper/0: > #0: (&n->timer){+.-...}, at: [] run_timer_softirq+0x146/0x219 > #1: (rcu_read_lock){.+.+..}, at: [] dev_queue_xmit+0x1b4/0x41b > #2: (&bond->lock){++.?..}, at: [] bond_alb_xmit+0x3d/0x60b [bonding] > #3: (&bond->curr_slave_lock){++.?..}, at: [] bond_alb_xmit+0x50/0x60b [bonding] > #4: (&(bond_info->rx_hashtbl_lock)){+.-...}, at: [] bond_alb_xmit+0x1b7/0x60b [bonding] > > stack backtrace: > Pid: 0, comm: swapper Not tainted 2.6.31-locking #10 > Call Trace: > [] ? printk+0xf/0x11 > [] print_circular_bug+0x90/0x9c > [] __lock_acquire+0xe2a/0x13a0 > [] ? bond_alb_xmit+0x1b7/0x60b [bonding] > [] lock_acquire+0xa8/0xbf > [] ? alb_get_best_slave+0x1b/0x58 [bonding] > [] _spin_lock_bh+0x2a/0x39 > [] ? alb_get_best_slave+0x1b/0x58 [bonding] > [] alb_get_best_slave+0x1b/0x58 [bonding] > [] bond_alb_xmit+0x252/0x60b [bonding] > [] ? mark_held_locks+0x43/0x5b > [] ? trace_hardirqs_on_caller+0xe6/0x120 > [] bond_start_xmit+0x2e9/0x32e [bonding] > [] dev_hard_start_xmit+0x281/0x314 > [] dev_queue_xmit+0x338/0x41b > [] ? arp_send+0x26/0x37 > [] arp_send+0x32/0x37 > [] arp_solicit+0x18a/0x1a1 > [] neigh_timer_handler+0x1c9/0x20a > [] run_timer_softirq+0x1a0/0x219 > [] ? run_timer_softirq+0x146/0x219 > [] ? neigh_timer_handler+0x0/0x20a > [] __do_softirq+0xd6/0x1a3 > [] do_softirq+0x2b/0x43 > [] irq_exit+0x38/0x74 > [] smp_apic_timer_interrupt+0x6e/0x7c > [] apic_timer_interrupt+0x2f/0x34 > [] ? cpu_idle+0x43/0x76 > [] ? sys_timer_create+0x205/0x304 > [] ? default_idle+0x8a/0xef > [] ? tick_nohz_restart_sched_tick+0x12f/0x138 > [] cpu_idle+0x49/0x76 > [] start_secondary+0x1ab/0x1b2 > > > -J > > --- > -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html