* [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 @ 2007-04-26 18:58 Vincent ETIENNE 2007-04-26 20:27 ` Chris Snook 2007-04-27 4:11 ` Andrew Morton 0 siblings, 2 replies; 26+ messages in thread From: Vincent ETIENNE @ 2007-04-26 18:58 UTC (permalink / raw) To: Linux Kernel; +Cc: fubar, bonding-devel Hi, Summary : Got this trace when one network interface come down or up in a 2 interfaces bonding. So far, system seems to survive to this problem and works fine. Full description During testing of bonding of 2 interfaces, i have seen this from time to time in my log file ( the problem doesn't arrive each time but one in 3 or 4 try ). SYSTEM : 2 NIC card bond on interface bond0 : intel PRO/1000 (e1000 ) Broadcomm ( tg3 ) I have also try a 2.6.20 and 2.6.19 vanilla kernel ( identical problem but in onecase the system doesn't survive : that the reason the problem catch my attention ) Keywords ; network, bonding Version : Linux version 2.6.21-rc6-mm1 (root@jupiter2) (gcc version 4.1.1 (Gentoo 4.1.1-r3)) #3 SMP Thu Apr 26 08:45:06 CEST 2007 Output of /var/log/messages Apr 26 11:09:34 jupiter2 e1000: eth0: e1000_watchdog_task: NIC Link is Down Apr 26 11:09:34 jupiter2 bonding: bond0: link status definitely down for interface eth0, disabling it Apr 26 11:09:34 jupiter2 bonding: bond0: making interface eth1 the new active one. Apr 26 11:09:34 jupiter2 RTNL: assertion failed at net/ipv4/devinet.c (1055) Apr 26 11:09:34 jupiter2 Apr 26 11:09:34 jupiter2 Call Trace: Apr 26 11:09:34 jupiter2 <IRQ> [<ffffffff8049b49e>] inetdev_event+0x48/0x283 Apr 26 11:09:34 jupiter2 [<ffffffff804c8731>] _spin_lock_bh+0x9/0x19 Apr 26 11:09:34 jupiter2 [<ffffffff80473df1>] rt_run_flush+0x7e/0xaf Apr 26 11:09:34 jupiter2 [<ffffffff8022bdd0>] notifier_call_chain+0x29/0x56 Apr 26 11:09:34 jupiter2 [<ffffffff804560cc>] dev_set_mac_address+0x53/0x59 Apr 26 11:09:34 jupiter2 [<ffffffff88006d8d>] bonding:alb_set_slave_mac_addr+0x41/0x6c Apr 26 11:09:34 jupiter2 [<ffffffff88007215>] bonding:alb_swap_mac_addr+0x91/0x165 Apr 26 11:09:34 jupiter2 [<ffffffff88002029>] bonding:bond_change_active_slave+0x227/0x382 Apr 26 11:09:34 jupiter2 [<ffffffff880024c9>] bonding:bond_select_active_slave+0xb7/0xe5 Apr 26 11:09:34 jupiter2 [<ffffffff88004182>] bonding:bond_mii_monitor+0x3cd/0x41e Apr 26 11:09:34 jupiter2 [<ffffffff88003db5>] bonding:bond_mii_monitor+0x0/0x41e Apr 26 11:09:34 jupiter2 [<ffffffff80228714>] run_timer_softirq+0x130/0x19f Apr 26 11:09:34 jupiter2[<ffffffff8022618a>] __do_softirq+0x55/0xc4 Apr 26 11:09:34 jupiter2 [<ffffffff8020a5ac>] call_softirq+0x1c/0x28 Apr 26 11:09:34 jupiter2 [<ffffffff8020bead>] do_softirq+0x2c/0x7d Apr 26 11:09:34 jupiter2 [<ffffffff80213b2a>] smp_apic_timer_interrupt+0x49/0x5f Apr 26 11:09:34 jupiter2 [<ffffffff802088a3>] mwait_idle+0x0/0x45 Apr 26 11:09:34 jupiter2 [<ffffffff8020a056>] apic_timer_interrupt+0x66/0x70 Apr 26 11:09:34 jupiter2 <EOI> [<ffffffff802088e5>] mwait_idle+0x42/0x45 Apr 26 11:09:34 jupiter2 [<ffffffff8020883f>] cpu_idle+0x51/0x70 Apr 26 11:09:34 jupiter2 [<ffffffff806369bd>] start_kernel+0x242/0x24e Apr 26 11:09:34 jupiter2 [<ffffffff80636146>] _sinittext+0x146/0x14a other informations (ver_linux, lspci, ... ) available at http://mail1.vetienne.net/linux I'm a bit worried by the message so any help will be greatly appreciated Vincent ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-26 18:58 [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 Vincent ETIENNE @ 2007-04-26 20:27 ` Chris Snook 2007-04-26 20:44 ` [Bonding-devel] " Jay Vosburgh 2007-04-27 4:11 ` Andrew Morton 1 sibling, 1 reply; 26+ messages in thread From: Chris Snook @ 2007-04-26 20:27 UTC (permalink / raw) To: Vincent ETIENNE; +Cc: Linux Kernel, fubar, bonding-devel Vincent ETIENNE wrote: > Hi, > > Summary : > Got this trace when one network interface come down or up in a 2 > interfaces bonding. So far, system seems to survive to this problem > and works fine. I'm investigating a similar/possibly identical bug. Do you experience packet loss or throughput stalls, beyond just the loss of the interface that went down, when this happens? -- Chris > Full description > > During testing of bonding of 2 interfaces, i have seen this from > time to time in my log file ( the problem doesn't arrive each > time but one in 3 or 4 try ). > > SYSTEM : 2 NIC card bond on interface bond0 : > intel PRO/1000 (e1000 ) > Broadcomm ( tg3 ) > I have also try a 2.6.20 and 2.6.19 vanilla kernel ( identical problem but in > onecase the system doesn't survive : that the reason the problem catch my > attention ) > > Keywords ; network, bonding > > Version : Linux version 2.6.21-rc6-mm1 (root@jupiter2) (gcc version 4.1.1 > (Gentoo 4.1.1-r3)) #3 SMP Thu Apr 26 08:45:06 CEST 2007 > > Output of /var/log/messages > > Apr 26 11:09:34 jupiter2 e1000: eth0: e1000_watchdog_task: NIC Link > is Down Apr 26 11:09:34 jupiter2 bonding: bond0: link status > definitely down for interface eth0, disabling it > Apr 26 11:09:34 jupiter2 bonding: bond0: making interface eth1 the new > active one. > Apr 26 11:09:34 jupiter2 RTNL: assertion failed at net/ipv4/devinet.c > (1055) Apr 26 11:09:34 jupiter2 > Apr 26 11:09:34 jupiter2 Call Trace: > Apr 26 11:09:34 jupiter2 <IRQ> [<ffffffff8049b49e>] > inetdev_event+0x48/0x283 > Apr 26 11:09:34 jupiter2 [<ffffffff804c8731>] _spin_lock_bh+0x9/0x19 > Apr 26 11:09:34 jupiter2 [<ffffffff80473df1>] rt_run_flush+0x7e/0xaf > Apr 26 11:09:34 jupiter2 [<ffffffff8022bdd0>] notifier_call_chain+0x29/0x56 > Apr 26 11:09:34 jupiter2 [<ffffffff804560cc>] dev_set_mac_address+0x53/0x59 > Apr 26 11:09:34 jupiter2 [<ffffffff88006d8d>] > bonding:alb_set_slave_mac_addr+0x41/0x6c > Apr 26 11:09:34 jupiter2 [<ffffffff88007215>] > bonding:alb_swap_mac_addr+0x91/0x165 > Apr 26 11:09:34 jupiter2 [<ffffffff88002029>] > bonding:bond_change_active_slave+0x227/0x382 > Apr 26 11:09:34 jupiter2 [<ffffffff880024c9>] > bonding:bond_select_active_slave+0xb7/0xe5 > Apr 26 11:09:34 jupiter2 [<ffffffff88004182>] > bonding:bond_mii_monitor+0x3cd/0x41e > Apr 26 11:09:34 jupiter2 [<ffffffff88003db5>] > bonding:bond_mii_monitor+0x0/0x41e > Apr 26 11:09:34 jupiter2 [<ffffffff80228714>] > run_timer_softirq+0x130/0x19f > Apr 26 11:09:34 jupiter2[<ffffffff8022618a>] __do_softirq+0x55/0xc4 > Apr 26 11:09:34 jupiter2 [<ffffffff8020a5ac>] call_softirq+0x1c/0x28 > Apr 26 11:09:34 jupiter2 [<ffffffff8020bead>] do_softirq+0x2c/0x7d > Apr 26 11:09:34 jupiter2 [<ffffffff80213b2a>] > smp_apic_timer_interrupt+0x49/0x5f > Apr 26 11:09:34 jupiter2 [<ffffffff802088a3>] mwait_idle+0x0/0x45 > Apr 26 11:09:34 jupiter2 [<ffffffff8020a056>] apic_timer_interrupt+0x66/0x70 > Apr 26 11:09:34 jupiter2 <EOI> [<ffffffff802088e5>] mwait_idle+0x42/0x45 > Apr 26 11:09:34 jupiter2 [<ffffffff8020883f>] cpu_idle+0x51/0x70 > Apr 26 11:09:34 jupiter2 [<ffffffff806369bd>] start_kernel+0x242/0x24e > Apr 26 11:09:34 jupiter2 [<ffffffff80636146>] _sinittext+0x146/0x14a > > > other informations (ver_linux, lspci, ... ) available at > http://mail1.vetienne.net/linux > > I'm a bit worried by the message so any help will be greatly appreciated > > Vincent ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Bonding-devel] [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-26 20:27 ` Chris Snook @ 2007-04-26 20:44 ` Jay Vosburgh 2007-04-26 21:43 ` Vincent ETIENNE 2007-05-09 18:47 ` Vincent ETIENNE 0 siblings, 2 replies; 26+ messages in thread From: Jay Vosburgh @ 2007-04-26 20:44 UTC (permalink / raw) To: Chris Snook; +Cc: Vincent ETIENNE, Linux Kernel, bonding-devel, Andy Gospodarek Chris Snook <csnook@redhat.com> wrote: >Vincent ETIENNE wrote: >> Hi, >> >> Summary : >> Got this trace when one network interface come down or up in a 2 >> interfaces bonding. So far, system seems to survive to this problem >> and works fine. > >I'm investigating a similar/possibly identical bug. Do you experience >packet loss or throughput stalls, beyond just the loss of the interface >that went down, when this happens? This problem looks to be one of the known locking issues with bonding. Andy Gospodarek <andy@greyhouse.net> and I have been working offline on the locking issues in bonding over the last several weeks. At the moment, we have a generally stable (but ugly with debug fluff and other yuckies) patch that seems to resolve at least the majority of the various issues. I'm thinking to clean it up for general posting early next week, and address additional problems from there (since it's hopefully at least a big step forward). -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Bonding-devel] [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-26 20:44 ` [Bonding-devel] " Jay Vosburgh @ 2007-04-26 21:43 ` Vincent ETIENNE 2007-05-09 18:47 ` Vincent ETIENNE 1 sibling, 0 replies; 26+ messages in thread From: Vincent ETIENNE @ 2007-04-26 21:43 UTC (permalink / raw) To: Jay Vosburgh; +Cc: Chris Snook, Linux Kernel, bonding-devel, Andy Gospodarek Le Thursday 26 April 2007 22:44:59 Jay Vosburgh, vous avez écrit : > Chris Snook <csnook@redhat.com> wrote: > >Vincent ETIENNE wrote: > >> Hi, > >> > >> Summary : > >> Got this trace when one network interface come down or up in a 2 > >> interfaces bonding. So far, system seems to survive to this problem > >> and works fine. > > > >I'm investigating a similar/possibly identical bug. Do you experience > >packet loss or throughput stalls, beyond just the loss of the interface > >that went down, when this happens? > > This problem looks to be one of the known locking issues with > bonding. > > Andy Gospodarek <andy@greyhouse.net> and I have been working > offline on the locking issues in bonding over the last several weeks. > At the moment, we have a generally stable (but ugly with debug fluff and > other yuckies) patch that seems to resolve at least the majority of the > various issues. I'm thinking to clean it up for general posting early > next week, and address additional problems from there (since it's > hopefully at least a big step forward). > > -J > > --- > -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ Will be more than happy to test it and report the result with the new patch. Thanks for your work, ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Bonding-devel] [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-26 20:44 ` [Bonding-devel] " Jay Vosburgh 2007-04-26 21:43 ` Vincent ETIENNE @ 2007-05-09 18:47 ` Vincent ETIENNE 2007-05-10 19:57 ` Jay Vosburgh 1 sibling, 1 reply; 26+ messages in thread From: Vincent ETIENNE @ 2007-05-09 18:47 UTC (permalink / raw) To: Jay Vosburgh; +Cc: Chris Snook, Linux Kernel, bonding-devel, Andy Gospodarek Le Thursday 26 April 2007 22:44:59 Jay Vosburgh, vous avez écrit : > Chris Snook <csnook@redhat.com> wrote: > >Vincent ETIENNE wrote: > >> Hi, > >> > >> Summary : > >> Got this trace when one network interface come down or up in a 2 > >> interfaces bonding. So far, system seems to survive to this problem > >> and works fine. > > > This problem looks to be one of the known locking issues with > bonding. > > Andy Gospodarek <andy@greyhouse.net> and I have been working > offline on the locking issues in bonding over the last several weeks. > At the moment, we have a generally stable (but ugly with debug fluff and > other yuckies) patch that seems to resolve at least the majority of the > various issues. I'm thinking to clean it up for general posting early > next week, and address additional problems from there (since it's > hopefully at least a big step forward). > > -J Any news and/or patch concerning this problem ?. Not intended to stress you but time goes on... Do you think a solution could be found in a relatively short timeframe (and i will delay a bit final installation) or is it better to go without bonding and do an upgrade later ?. Vincent > > --- > -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [Bonding-devel] [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-05-09 18:47 ` Vincent ETIENNE @ 2007-05-10 19:57 ` Jay Vosburgh 0 siblings, 0 replies; 26+ messages in thread From: Jay Vosburgh @ 2007-05-10 19:57 UTC (permalink / raw) To: Vincent ETIENNE; +Cc: Chris Snook, netdev, bonding-devel Vincent ETIENNE <ve@vetienne.net> wrote: >Any news and/or patch concerning this problem ?. Not intended to stress you >but time goes on... Do you think a solution could be found in a relatively >short timeframe (and i will delay a bit final installation) or is it better >to go without bonding and do an upgrade later ?. Our working patch is appended; I'm getting a bunch of warnings from the lockdep stuff that I'm sorting through. I'm not sure if they're real concerns or just misconfiguration of lockdep itself. This doesn't fix everything (e.g., enslaving a VLAN device), but covers most of the usual trouble cases. -J --- -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 7e03f41..c8394a0 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -2097,8 +2097,10 @@ void bond_3ad_unbind_slave(struct slave *slave) * times out, and it selects an aggregator for the ports that are yet not * related to any aggregator, and selects the active aggregator for a bond. */ -void bond_3ad_state_machine_handler(struct bonding *bond) +void bond_3ad_state_machine_handler(struct work_struct *work) { + struct bonding *bond = container_of(work, struct bonding, + ad_work.work); struct port *port; struct aggregator *aggregator; @@ -2149,7 +2151,7 @@ void bond_3ad_state_machine_handler(struct bonding *bond) } re_arm: - mod_timer(&(BOND_AD_INFO(bond).ad_timer), jiffies + ad_delta_in_ticks); + queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks); out: read_unlock(&bond->lock); } diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h index 6ad5ad6..cf5142d 100644 --- a/drivers/net/bonding/bond_3ad.h +++ b/drivers/net/bonding/bond_3ad.h @@ -276,7 +276,7 @@ struct ad_slave_info { void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fast); int bond_3ad_bind_slave(struct slave *slave); void bond_3ad_unbind_slave(struct slave *slave); -void bond_3ad_state_machine_handler(struct bonding *bond); +void bond_3ad_state_machine_handler(struct work_struct *); void bond_3ad_adapter_speed_changed(struct slave *slave); void bond_3ad_adapter_duplex_changed(struct slave *slave); void bond_3ad_handle_link_change(struct slave *slave, char link); diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c index 92c3b6f..4cdea61 100644 --- a/drivers/net/bonding/bond_alb.c +++ b/drivers/net/bonding/bond_alb.c @@ -42,7 +42,6 @@ #include "bonding.h" #include "bond_alb.h" - #define ALB_TIMER_TICKS_PER_SEC 10 /* should be a divisor of HZ */ #define BOND_TLB_REBALANCE_INTERVAL 10 /* In seconds, periodic re-balancing. * Used for division - never set @@ -469,13 +468,13 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave) _unlock_rx_hashtbl(bond); - write_lock(&bond->curr_slave_lock); + write_lock_bh(&bond->curr_slave_lock); if (slave != bond->curr_active_slave) { rlb_teach_disabled_mac_on_primary(bond, slave->dev->dev_addr); } - write_unlock(&bond->curr_slave_lock); + write_unlock_bh(&bond->curr_slave_lock); } static void rlb_update_client(struct rlb_client_info *client_info) @@ -956,19 +955,33 @@ static int alb_set_slave_mac_addr(struct slave *slave, u8 addr[], int hw) return 0; } -/* Caller must hold bond lock for write or curr_slave_lock for write*/ +/* + * Swap MAC addresses between two slaves. + * + * Called with RTNL held, and no other locks. + */ + static void alb_swap_mac_addr(struct bonding *bond, struct slave *slave1, struct slave *slave2) { - struct slave *disabled_slave = NULL; u8 tmp_mac_addr[ETH_ALEN]; - int slaves_state_differ; - - slaves_state_differ = (SLAVE_IS_OK(slave1) != SLAVE_IS_OK(slave2)); memcpy(tmp_mac_addr, slave1->dev->dev_addr, ETH_ALEN); alb_set_slave_mac_addr(slave1, slave2->dev->dev_addr, bond->alb_info.rlb_enabled); alb_set_slave_mac_addr(slave2, tmp_mac_addr, bond->alb_info.rlb_enabled); +} + +/* + * Send learning packets after MAC address swap. + * + * Called with RTNL and bond->lock held for read. + */ +static void alb_fasten_mac_swap(struct bonding *bond, struct slave *slave1, + struct slave *slave2) +{ + int slaves_state_differ = (SLAVE_IS_OK(slave1) != SLAVE_IS_OK(slave2)); + struct slave *disabled_slave = NULL; + /* fasten the change in the switch */ if (SLAVE_IS_OK(slave1)) { alb_send_learning_packets(slave1, slave1->dev->dev_addr); @@ -1041,7 +1054,9 @@ static void alb_change_hw_addr_on_detach(struct bonding *bond, struct slave *sla } if (found) { + /* XXX locking: needs RTNL and nothing else */ alb_swap_mac_addr(bond, slave, tmp_slave); + alb_fasten_mac_swap(bond, slave, tmp_slave); } } } @@ -1373,8 +1388,10 @@ out: return 0; } -void bond_alb_monitor(struct bonding *bond) +void bond_alb_monitor(struct work_struct *work) { + struct bonding *bond = container_of(work, struct bonding, + alb_work.work); struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond)); struct slave *slave; int i; @@ -1477,7 +1494,7 @@ void bond_alb_monitor(struct bonding *bond) } re_arm: - mod_timer(&(bond_info->alb_timer), jiffies + alb_delta_in_ticks); + queue_delayed_work(bond->wq, &bond->alb_work, alb_delta_in_ticks); out: read_unlock(&bond->lock); } @@ -1498,11 +1515,11 @@ int bond_alb_init_slave(struct bonding *bond, struct slave *slave) /* caller must hold the bond lock for write since the mac addresses * are compared and may be swapped. */ - write_lock_bh(&bond->lock); + read_lock(&bond->lock); res = alb_handle_addr_collision_on_attach(bond, slave); - write_unlock_bh(&bond->lock); + read_unlock(&bond->lock); if (res) { return res; @@ -1567,7 +1584,12 @@ void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char * Set the bond->curr_active_slave to @new_slave and handle * mac address swapping and promiscuity changes as needed. * - * Caller must hold bond curr_slave_lock for write (or bond lock for write) + * If new_slave is NULL, caller must hold curr_slave_lock or + * bond->lock for write. + * + * If new_slave is not NULL, caller must hold RTNL, bond->lock for + * read and curr_slave_lock for write_bh. Processing here may sleep, so + * no other locks may be held. */ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave) { @@ -1606,19 +1628,30 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave } } + write_unlock_bh(&bond->curr_slave_lock); + read_unlock(&bond->lock); + /* curr_active_slave must be set before calling alb_swap_mac_addr */ - if (swap_slave) { - /* swap mac address */ + if (swap_slave) alb_swap_mac_addr(bond, swap_slave, new_slave); - } else { + else /* set the new_slave to the bond mac address */ alb_set_slave_mac_addr(new_slave, bond->dev->dev_addr, bond->alb_info.rlb_enabled); + + read_lock(&bond->lock); + write_lock_bh(&bond->curr_slave_lock); + + if (swap_slave) + alb_fasten_mac_swap(bond, swap_slave, new_slave); + else /* fasten bond mac on new current slave */ alb_send_learning_packets(new_slave, bond->dev->dev_addr); - } } +/* + * Called with RTNL + */ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) { struct bonding *bond = bond_dev->priv; @@ -1655,8 +1688,12 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) } } + write_unlock_bh(&bond->curr_slave_lock); + read_unlock(&bond->lock); + if (swap_slave) { alb_swap_mac_addr(bond, swap_slave, bond->curr_active_slave); + alb_fasten_mac_swap(bond, swap_slave, bond->curr_active_slave); } else { alb_set_slave_mac_addr(bond->curr_active_slave, bond_dev->dev_addr, bond->alb_info.rlb_enabled); @@ -1668,6 +1705,9 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr) } } + read_lock(&bond->lock); + write_lock_bh(&bond->curr_slave_lock); + return 0; } diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h index 28f2a2f..5a37e43 100644 --- a/drivers/net/bonding/bond_alb.h +++ b/drivers/net/bonding/bond_alb.h @@ -125,7 +125,7 @@ void bond_alb_deinit_slave(struct bonding *bond, struct slave *slave); void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char link); void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave); int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev); -void bond_alb_monitor(struct bonding *bond); +void bond_alb_monitor(struct work_struct *); int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr); void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id); #endif /* __BOND_ALB_H__ */ diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 724bce5..7c003fc 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1,4 +1,4 @@ -/* + /* * originally based on the dummy device. * * Copyright 1999, Thomas Davis, tadavis@lbl.gov. @@ -1567,7 +1567,15 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) /* first slave or no active slave yet, and this link * is OK, so make this interface the active one */ + write_unlock_bh(&bond->lock); + read_lock(&bond->lock); + write_lock_bh(&bond->curr_slave_lock); + bond_change_active_slave(bond, new_slave); + + write_unlock_bh(&bond->curr_slave_lock); + read_unlock(&bond->lock); + write_lock_bh(&bond->lock); } else { bond_set_slave_inactive_flags(new_slave); } @@ -1729,9 +1737,23 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev) bond_alb_deinit_slave(bond, slave); } - if (oldcurrent == slave) + if (oldcurrent == slave) { + /* + * Note that we hold RTNL over this sequence, so there + * is no concern that another slave add/remove event + * will interfere. + */ + write_unlock_bh(&bond->lock); + read_lock(&bond->lock); + write_lock_bh(&bond->curr_slave_lock); + bond_select_active_slave(bond); + write_unlock_bh(&bond->curr_slave_lock); + read_unlock(&bond->lock); + write_lock_bh(&bond->lock); + } + if (bond->slave_cnt == 0) { bond_set_carrier(bond); @@ -1954,16 +1976,19 @@ static int bond_ioctl_change_active(struct net_device *bond_dev, struct net_devi return -EINVAL; } - write_lock_bh(&bond->lock); + read_lock(&bond->lock); + read_lock(&bond->curr_slave_lock); old_active = bond->curr_active_slave; + read_unlock(&bond->curr_slave_lock); + new_active = bond_get_slave_by_dev(bond, slave_dev); /* * Changing to the current active: do nothing; return success. */ if (new_active && (new_active == old_active)) { - write_unlock_bh(&bond->lock); + read_unlock(&bond->lock); return 0; } @@ -1971,12 +1996,14 @@ static int bond_ioctl_change_active(struct net_device *bond_dev, struct net_devi (old_active) && (new_active->link == BOND_LINK_UP) && IS_UP(new_active->dev)) { + write_lock_bh(&bond->curr_slave_lock); bond_change_active_slave(bond, new_active); + write_unlock_bh(&bond->curr_slave_lock); } else { res = -EINVAL; } - write_unlock_bh(&bond->lock); + read_unlock(&bond->lock); return res; } @@ -1988,9 +2015,9 @@ static int bond_info_query(struct net_device *bond_dev, struct ifbond *info) info->bond_mode = bond->params.mode; info->miimon = bond->params.miimon; - read_lock_bh(&bond->lock); + read_lock(&bond->lock); info->num_slaves = bond->slave_cnt; - read_unlock_bh(&bond->lock); + read_unlock(&bond->lock); return 0; } @@ -2005,7 +2032,7 @@ static int bond_slave_info_query(struct net_device *bond_dev, struct ifslave *in return -ENODEV; } - read_lock_bh(&bond->lock); + read_lock(&bond->lock); bond_for_each_slave(bond, slave, i) { if (i == (int)info->slave_id) { @@ -2014,7 +2041,7 @@ static int bond_slave_info_query(struct net_device *bond_dev, struct ifslave *in } } - read_unlock_bh(&bond->lock); + read_unlock(&bond->lock); if (found) { strcpy(info->slave_name, slave->dev->name); @@ -2028,28 +2055,52 @@ static int bond_slave_info_query(struct net_device *bond_dev, struct ifslave *in return 0; } + +/*-------------------------------- Workqueues -------------------------------*/ + +void bond_work_cancel_all(struct bonding *bond) +{ + write_lock_bh(&bond->lock); + bond->kill_timers = 1; + write_unlock_bh(&bond->lock); + + if (bond->params.miimon && delayed_work_pending(&bond->mii_work)) + cancel_delayed_work(&bond->mii_work); + + if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work)) + cancel_delayed_work(&bond->arp_work); + + if (bond->params.mode == BOND_MODE_ALB && + delayed_work_pending(&bond->alb_work)) + cancel_delayed_work(&bond->alb_work); + + if (bond->params.mode == BOND_MODE_8023AD && + delayed_work_pending(&bond->ad_work)) + cancel_delayed_work(&bond->ad_work); +} + /*-------------------------------- Monitoring -------------------------------*/ -/* this function is called regularly to monitor each slave's link. */ -void bond_mii_monitor(struct net_device *bond_dev) +/* + * if !have_locks, return nonzero if a failover is necessary. + * if have_locks, do whatever failover activities are needed. + * + * This is to separate the inspection and failover steps for locking + * purposes; failover requires rtnl, but acquiring it for every + * inspection is undesirable, so an initial call does inspection, and + * the wrapper acquires the necessary locks and calls again to perform + * failover if needed. Since all locks are dropped, a complete + * restart is needed between calls. + */ +static int __bond_mii_monitor(struct bonding *bond, int have_locks) { - struct bonding *bond = bond_dev->priv; + struct net_device *bond_dev = bond->dev; struct slave *slave, *oldcurrent; int do_failover = 0; - int delta_in_ticks; int i; - read_lock(&bond->lock); - - delta_in_ticks = (bond->params.miimon * HZ) / 1000; - - if (bond->kill_timers) { + if (bond->slave_cnt == 0) goto out; - } - - if (bond->slave_cnt == 0) { - goto re_arm; - } /* we will try to read the link status of each of our slaves, and * set their IFF_RUNNING flag appropriately. For each slave not @@ -2105,6 +2156,9 @@ void bond_mii_monitor(struct net_device *bond_dev) if (link_state != BMSR_LSTATUS) { /* link stays down */ if (slave->delay <= 0) { + if (!have_locks) + return 1; + /* link down for too long time */ slave->link = BOND_LINK_DOWN; @@ -2188,6 +2242,9 @@ void bond_mii_monitor(struct net_device *bond_dev) } else { /* link stays up */ if (slave->delay == 0) { + if (!have_locks) + return 1; + /* now the link has been up for long time enough */ slave->link = BOND_LINK_UP; slave->jiffies = jiffies; @@ -2253,22 +2310,39 @@ void bond_mii_monitor(struct net_device *bond_dev) } /* end of for */ if (do_failover) { - write_lock(&bond->curr_slave_lock); - + write_lock_bh(&bond->curr_slave_lock); bond_select_active_slave(bond); - - write_unlock(&bond->curr_slave_lock); + write_unlock_bh(&bond->curr_slave_lock); } else bond_set_carrier(bond); -re_arm: - if (bond->params.miimon) { - mod_timer(&bond->mii_timer, jiffies + delta_in_ticks); - } out: - read_unlock(&bond->lock); + return 0; } +void bond_mii_monitor(struct work_struct *work) +{ + struct bonding *bond = container_of(work, struct bonding, + mii_work.work); + unsigned long delay; + + read_lock(&bond->lock); + if (bond->kill_timers) { + read_unlock(&bond->lock); + return; + } + if (__bond_mii_monitor(bond, 0)) { + read_unlock(&bond->lock); + rtnl_lock(); + read_lock(&bond->lock); + __bond_mii_monitor(bond, 1); + rtnl_unlock(); + } + + delay = ((bond->params.miimon * HZ) / 1000) ? : 1; + read_unlock(&bond->lock); + queue_delayed_work(bond->wq, &bond->mii_work, delay); +} static u32 bond_glean_dev_ip(struct net_device *dev) { @@ -2564,9 +2638,11 @@ out: * arp is transmitted to generate traffic. see activebackup_arp_monitor for * arp monitoring in active backup mode. */ -void bond_loadbalance_arp_mon(struct net_device *bond_dev) +void bond_loadbalance_arp_mon(struct work_struct *work) { - struct bonding *bond = bond_dev->priv; + struct bonding *bond = container_of(work, struct bonding, + arp_work.work); + struct net_device *bond_dev = bond->dev; struct slave *slave, *oldcurrent; int do_failover = 0; int delta_in_ticks; @@ -2665,17 +2741,16 @@ void bond_loadbalance_arp_mon(struct net_device *bond_dev) } if (do_failover) { - write_lock(&bond->curr_slave_lock); + write_lock_bh(&bond->curr_slave_lock); bond_select_active_slave(bond); - write_unlock(&bond->curr_slave_lock); + write_unlock_bh(&bond->curr_slave_lock); } re_arm: - if (bond->params.arp_interval) { - mod_timer(&bond->arp_timer, jiffies + delta_in_ticks); - } + if (bond->params.arp_interval) + queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks); out: read_unlock(&bond->lock); } @@ -2695,9 +2770,11 @@ out: * may have received. * see loadbalance_arp_monitor for arp monitoring in load balancing mode */ -void bond_activebackup_arp_mon(struct net_device *bond_dev) +void bond_activebackup_arp_mon(struct work_struct *work) { - struct bonding *bond = bond_dev->priv; + struct bonding *bond = container_of(work, struct bonding, + arp_work.work); + struct net_device *bond_dev = bond->dev; struct slave *slave; int delta_in_ticks; int i; @@ -2726,7 +2803,7 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev) slave->link = BOND_LINK_UP; - write_lock(&bond->curr_slave_lock); + write_lock_bh(&bond->curr_slave_lock); if ((!bond->curr_active_slave) && ((jiffies - slave->dev->trans_start) <= delta_in_ticks)) { @@ -2760,7 +2837,7 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev) slave->dev->name); } - write_unlock(&bond->curr_slave_lock); + write_unlock_bh(&bond->curr_slave_lock); } } else { read_lock(&bond->curr_slave_lock); @@ -2830,12 +2907,12 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev) bond_dev->name, slave->dev->name); - write_lock(&bond->curr_slave_lock); + write_lock_bh(&bond->curr_slave_lock); bond_select_active_slave(bond); slave = bond->curr_active_slave; - write_unlock(&bond->curr_slave_lock); + write_unlock_bh(&bond->curr_slave_lock); bond->current_arp_slave = slave; @@ -2854,9 +2931,9 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev) bond->primary_slave->dev->name); /* primary is up so switch to it */ - write_lock(&bond->curr_slave_lock); + write_lock_bh(&bond->curr_slave_lock); bond_change_active_slave(bond, bond->primary_slave); - write_unlock(&bond->curr_slave_lock); + write_unlock_bh(&bond->curr_slave_lock); slave = bond->primary_slave; slave->jiffies = jiffies; @@ -2921,9 +2998,8 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev) } re_arm: - if (bond->params.arp_interval) { - mod_timer(&bond->arp_timer, jiffies + delta_in_ticks); - } + if (bond->params.arp_interval) + queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks); out: read_unlock(&bond->lock); } @@ -2943,7 +3019,7 @@ static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos) /* make sure the bond won't be taken away */ read_lock(&dev_base_lock); - read_lock_bh(&bond->lock); + read_lock(&bond->lock); if (*pos == 0) { return SEQ_START_TOKEN; @@ -2977,7 +3053,7 @@ static void bond_info_seq_stop(struct seq_file *seq, void *v) { struct bonding *bond = seq->private; - read_unlock_bh(&bond->lock); + read_unlock(&bond->lock); read_unlock(&dev_base_lock); } @@ -3503,14 +3579,11 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb, static int bond_open(struct net_device *bond_dev) { struct bonding *bond = bond_dev->priv; - struct timer_list *mii_timer = &bond->mii_timer; - struct timer_list *arp_timer = &bond->arp_timer; bond->kill_timers = 0; if ((bond->params.mode == BOND_MODE_TLB) || (bond->params.mode == BOND_MODE_ALB)) { - struct timer_list *alb_timer = &(BOND_ALB_INFO(bond).alb_timer); /* bond_alb_initialize must be called before the timer * is started. @@ -3520,44 +3593,33 @@ static int bond_open(struct net_device *bond_dev) return -1; } - init_timer(alb_timer); - alb_timer->expires = jiffies + 1; - alb_timer->data = (unsigned long)bond; - alb_timer->function = (void *)&bond_alb_monitor; - add_timer(alb_timer); + INIT_DELAYED_WORK(&bond->alb_work, bond_alb_monitor); + queue_delayed_work(bond->wq, &bond->alb_work, 0); } if (bond->params.miimon) { /* link check interval, in milliseconds. */ - init_timer(mii_timer); - mii_timer->expires = jiffies + 1; - mii_timer->data = (unsigned long)bond_dev; - mii_timer->function = (void *)&bond_mii_monitor; - add_timer(mii_timer); + INIT_DELAYED_WORK(&bond->mii_work, bond_mii_monitor); + queue_delayed_work(bond->wq, &bond->mii_work, 0); } if (bond->params.arp_interval) { /* arp interval, in milliseconds. */ - init_timer(arp_timer); - arp_timer->expires = jiffies + 1; - arp_timer->data = (unsigned long)bond_dev; - if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) { - arp_timer->function = (void *)&bond_activebackup_arp_mon; - } else { - arp_timer->function = (void *)&bond_loadbalance_arp_mon; - } + + if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) + INIT_DELAYED_WORK(&bond->arp_work, + bond_activebackup_arp_mon); + else + INIT_DELAYED_WORK(&bond->arp_work, + bond_loadbalance_arp_mon); + + queue_delayed_work(bond->wq, &bond->arp_work, 0); if (bond->params.arp_validate) bond_register_arp(bond); - - add_timer(arp_timer); } if (bond->params.mode == BOND_MODE_8023AD) { - struct timer_list *ad_timer = &(BOND_AD_INFO(bond).ad_timer); - init_timer(ad_timer); - ad_timer->expires = jiffies + 1; - ad_timer->data = (unsigned long)bond; - ad_timer->function = (void *)&bond_3ad_state_machine_handler; - add_timer(ad_timer); - + INIT_DELAYED_WORK(&bond->ad_work, + bond_3ad_state_machine_handler); + queue_delayed_work(bond->wq, &bond->ad_work, 0); /* register to receive LACPDUs */ bond_register_lacpdu(bond); } @@ -3579,36 +3641,36 @@ static int bond_close(struct net_device *bond_dev) write_lock_bh(&bond->lock); - /* signal timers not to re-arm */ bond->kill_timers = 1; write_unlock_bh(&bond->lock); - /* del_timer_sync must run without holding the bond->lock - * because a running timer might be trying to hold it too + /* + * Release the lock here since cancel_delayed_work will take + * it again and releasing it will give scheduled work a chance + * to run. */ - if (bond->params.miimon) { /* link check interval, in milliseconds. */ - del_timer_sync(&bond->mii_timer); - } + if (bond->params.miimon && delayed_work_pending(&bond->mii_work)) + cancel_delayed_work(&bond->mii_work); - if (bond->params.arp_interval) { /* arp interval, in milliseconds. */ - del_timer_sync(&bond->arp_timer); - } + if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work)) + cancel_delayed_work(&bond->arp_work); switch (bond->params.mode) { case BOND_MODE_8023AD: - del_timer_sync(&(BOND_AD_INFO(bond).ad_timer)); + if (delayed_work_pending(&bond->ad_work)) + cancel_delayed_work(&bond->ad_work); break; case BOND_MODE_TLB: case BOND_MODE_ALB: - del_timer_sync(&(BOND_ALB_INFO(bond).alb_timer)); + if (delayed_work_pending(&bond->alb_work)) + cancel_delayed_work(&bond->alb_work); break; default: break; - } - + } if ((bond->params.mode == BOND_MODE_TLB) || (bond->params.mode == BOND_MODE_ALB)) { @@ -3630,7 +3692,7 @@ static struct net_device_stats *bond_get_stats(struct net_device *bond_dev) memset(stats, 0, sizeof(struct net_device_stats)); - read_lock_bh(&bond->lock); + read_lock(&bond->lock); bond_for_each_slave(bond, slave, i) { sstats = slave->dev->get_stats(slave->dev); @@ -3661,7 +3723,7 @@ static struct net_device_stats *bond_get_stats(struct net_device *bond_dev) stats->tx_window_errors += sstats->tx_window_errors; } - read_unlock_bh(&bond->lock); + read_unlock(&bond->lock); return stats; } @@ -3700,13 +3762,13 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd if (mii->reg_num == 1) { struct bonding *bond = bond_dev->priv; mii->val_out = 0; - read_lock_bh(&bond->lock); + read_lock(&bond->lock); read_lock(&bond->curr_slave_lock); if (netif_carrier_ok(bond->dev)) { mii->val_out = BMSR_LSTATUS; } read_unlock(&bond->curr_slave_lock); - read_unlock_bh(&bond->lock); + read_unlock(&bond->lock); } return 0; @@ -3991,7 +4053,7 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev { struct bonding *bond = bond_dev->priv; struct slave *slave, *start_at; - int i; + int i, slave_no; int res = 1; read_lock(&bond->lock); @@ -4000,29 +4062,30 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev goto out; } - read_lock(&bond->curr_slave_lock); - slave = start_at = bond->curr_active_slave; - read_unlock(&bond->curr_slave_lock); + /* + * Concurrent TX may collide on rr_tx_counter; we accept that + * as being rare enough to not justify using an atomic op here + */ + slave_no = bond->rr_tx_counter++ % bond->slave_cnt; - if (!slave) { - goto out; + bond_for_each_slave(bond, slave, i) { + slave_no--; + if (slave_no < 0) { + break; + } } + start_at = slave; + bond_for_each_slave_from(bond, slave, i, start_at) { if (IS_UP(slave->dev) && (slave->link == BOND_LINK_UP) && (slave->state == BOND_STATE_ACTIVE)) { res = bond_dev_queue_xmit(bond, skb, slave->dev); - - write_lock(&bond->curr_slave_lock); - bond->curr_active_slave = slave->next; - write_unlock(&bond->curr_slave_lock); - break; } } - out: if (res) { /* no suitable interface, frame not sent */ @@ -4258,6 +4321,12 @@ static int bond_init(struct net_device *bond_dev, struct bond_params *params) bond->params = *params; /* copy params struct */ + /* initialize individual workqueue */ + bond->wq = create_singlethread_workqueue(bond_dev->name); + + if (!bond->wq) + return -ENOMEM; + /* Initialize pointers */ bond->first_slave = NULL; bond->curr_active_slave = NULL; @@ -4743,6 +4812,7 @@ static int __init bonding_init(void) { int i; int res; + struct bonding *bond, *nxt; printk(KERN_INFO "%s", version); @@ -4769,10 +4839,16 @@ static int __init bonding_init(void) goto out; err: + list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) { + bond_work_cancel_all(bond); + destroy_workqueue(bond->wq); + } + rtnl_lock(); bond_free_all(); bond_destroy_sysfs(); rtnl_unlock(); + out: return res; @@ -4780,9 +4856,18 @@ out: static void __exit bonding_exit(void) { + struct bonding *bond, *nxt; + unregister_netdevice_notifier(&bond_netdev_notifier); unregister_inetaddr_notifier(&bond_inetaddr_notifier); + list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) { + if (bond && bond->dev) { + bond_work_cancel_all(bond); + destroy_workqueue(bond->wq); + } + } + rtnl_lock(); bond_free_all(); bond_destroy_sysfs(); diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c index a122baa..ee17fd4 100644 --- a/drivers/net/bonding/bond_sysfs.c +++ b/drivers/net/bonding/bond_sysfs.c @@ -231,7 +231,7 @@ static ssize_t bonding_show_slaves(struct device *d, int i, res = 0; struct bonding *bond = to_bond(d); - read_lock_bh(&bond->lock); + read_lock(&bond->lock); bond_for_each_slave(bond, slave, i) { if (res > (PAGE_SIZE - IFNAMSIZ)) { /* not enough space for another interface name */ @@ -242,7 +242,7 @@ static ssize_t bonding_show_slaves(struct device *d, } res += sprintf(buf + res, "%s ", slave->dev->name); } - read_unlock_bh(&bond->lock); + read_unlock(&bond->lock); res += sprintf(buf + res, "\n"); res++; return res; @@ -285,18 +285,18 @@ static ssize_t bonding_store_slaves(struct device *d, /* Got a slave name in ifname. Is it already in the list? */ found = 0; - read_lock_bh(&bond->lock); + read_lock(&bond->lock); bond_for_each_slave(bond, slave, i) if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) { printk(KERN_ERR DRV_NAME ": %s: Interface %s is already enslaved!\n", bond->dev->name, ifname); ret = -EPERM; - read_unlock_bh(&bond->lock); + read_unlock(&bond->lock); goto out; } - read_unlock_bh(&bond->lock); + read_unlock(&bond->lock); printk(KERN_INFO DRV_NAME ": %s: Adding slave %s.\n", bond->dev->name, ifname); dev = dev_get_by_name(ifname); @@ -610,11 +610,9 @@ static ssize_t bonding_store_arp_interval(struct device *d, bond->dev->name, bond->dev->name); bond->params.miimon = 0; /* Kill MII timer, else it brings bond's link down */ - if (bond->arp_timer.function) { - printk(KERN_INFO DRV_NAME - ": %s: Kill MII timer, else it brings bond's link down...\n", - bond->dev->name); - del_timer_sync(&bond->mii_timer); + if (delayed_work_pending(&bond->mii_work)) { + cancel_delayed_work(&bond->mii_work); + flush_workqueue(bond->wq); } } if (!bond->params.arp_targets[0]) { @@ -629,25 +627,15 @@ static ssize_t bonding_store_arp_interval(struct device *d, * timer will get fired off when the open function * is called. */ - if (bond->arp_timer.function) { - /* The timer's already set up, so fire it off */ - mod_timer(&bond->arp_timer, jiffies + 1); - } else { - /* Set up the timer. */ - init_timer(&bond->arp_timer); - bond->arp_timer.expires = jiffies + 1; - bond->arp_timer.data = - (unsigned long) bond->dev; - if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) { - bond->arp_timer.function = - (void *) - &bond_activebackup_arp_mon; - } else { - bond->arp_timer.function = - (void *) - &bond_loadbalance_arp_mon; - } - add_timer(&bond->arp_timer); + if (!delayed_work_pending(&bond->arp_work)) { + if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) + INIT_DELAYED_WORK(&bond->arp_work, + bond_activebackup_arp_mon); + else + INIT_DELAYED_WORK(&bond->arp_work, + bond_loadbalance_arp_mon); + + queue_delayed_work(bond->wq, &bond->arp_work, 0); } } @@ -1004,11 +992,12 @@ static ssize_t bonding_store_miimon(struct device *d, BOND_ARP_VALIDATE_NONE; } /* Kill ARP timer, else it brings bond's link down */ - if (bond->mii_timer.function) { + if (delayed_work_pending(&bond->arp_work)) { printk(KERN_INFO DRV_NAME ": %s: Kill ARP timer, else it brings bond's link down...\n", bond->dev->name); - del_timer_sync(&bond->arp_timer); + cancel_delayed_work(&bond->arp_work); + flush_workqueue(bond->wq); } } @@ -1018,18 +1007,11 @@ static ssize_t bonding_store_miimon(struct device *d, * timer will get fired off when the open function * is called. */ - if (bond->mii_timer.function) { - /* The timer's already set up, so fire it off */ - mod_timer(&bond->mii_timer, jiffies + 1); - } else { - /* Set up the timer. */ - init_timer(&bond->mii_timer); - bond->mii_timer.expires = jiffies + 1; - bond->mii_timer.data = - (unsigned long) bond->dev; - bond->mii_timer.function = - (void *) &bond_mii_monitor; - add_timer(&bond->mii_timer); + if (!delayed_work_pending(&bond->mii_work)) { + INIT_DELAYED_WORK(&bond->mii_work, + bond_mii_monitor); + queue_delayed_work(bond->wq, + &bond->mii_work, 0); } } } @@ -1068,6 +1050,7 @@ static ssize_t bonding_store_primary(struct device *d, struct slave *slave; struct bonding *bond = to_bond(d); + rtnl_lock(); write_lock_bh(&bond->lock); if (!USES_PRIMARY(bond->params.mode)) { printk(KERN_INFO DRV_NAME @@ -1103,6 +1086,8 @@ static ssize_t bonding_store_primary(struct device *d, } out: write_unlock_bh(&bond->lock); + rtnl_unlock(); + return count; } static DEVICE_ATTR(primary, S_IRUGO | S_IWUSR, bonding_show_primary, bonding_store_primary); @@ -1182,6 +1167,7 @@ static ssize_t bonding_store_active_slave(struct device *d, struct slave *new_active = NULL; struct bonding *bond = to_bond(d); + rtnl_lock(); write_lock_bh(&bond->lock); if (!USES_PRIMARY(bond->params.mode)) { printk(KERN_INFO DRV_NAME @@ -1239,6 +1225,8 @@ static ssize_t bonding_store_active_slave(struct device *d, } out: write_unlock_bh(&bond->lock); + rtnl_unlock(); + return count; } diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h index 41aa78b..d8a23b0 100644 --- a/drivers/net/bonding/bonding.h +++ b/drivers/net/bonding/bonding.h @@ -182,9 +182,8 @@ struct bonding { s32 slave_cnt; /* never change this value outside the attach/detach wrappers */ rwlock_t lock; rwlock_t curr_slave_lock; - struct timer_list mii_timer; - struct timer_list arp_timer; s8 kill_timers; + u16 rr_tx_counter; struct net_device_stats stats; #ifdef CONFIG_PROC_FS struct proc_dir_entry *proc_entry; @@ -201,6 +200,11 @@ struct bonding { struct list_head vlan_list; struct vlan_group *vlgrp; struct packet_type arp_mon_pt; + struct workqueue_struct *wq; + struct delayed_work mii_work; + struct delayed_work arp_work; + struct delayed_work alb_work; + struct delayed_work ad_work; }; /** @@ -302,9 +306,9 @@ void bond_destroy_slave_symlinks(struct net_device *master, struct net_device *s int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev); int bond_release(struct net_device *bond_dev, struct net_device *slave_dev); int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev); -void bond_mii_monitor(struct net_device *bond_dev); -void bond_loadbalance_arp_mon(struct net_device *bond_dev); -void bond_activebackup_arp_mon(struct net_device *bond_dev); +void bond_mii_monitor(struct work_struct *); +void bond_loadbalance_arp_mon(struct work_struct *); +void bond_activebackup_arp_mon(struct work_struct *); void bond_set_mode_ops(struct bonding *bond, int mode); int bond_parse_parm(char *mode_arg, struct bond_parm_tbl *tbl); const char *bond_mode_name(int mode); @@ -312,6 +316,8 @@ void bond_select_active_slave(struct bonding *bond); void bond_change_active_slave(struct bonding *bond, struct slave *new_active); void bond_register_arp(struct bonding *); void bond_unregister_arp(struct bonding *); +void bond_wq_destroy(struct net_device *bond_dev); + #endif /* _LINUX_BONDING_H */ ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-26 18:58 [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 Vincent ETIENNE 2007-04-26 20:27 ` Chris Snook @ 2007-04-27 4:11 ` Andrew Morton 2007-04-27 9:25 ` VE (HOME) 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2007-04-27 4:11 UTC (permalink / raw) To: Vincent ETIENNE; +Cc: Linux Kernel, fubar, bonding-devel On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net> wrote: > Apr 26 11:09:34 jupiter2 RTNL: assertion failed at net/ipv4/devinet.c > (1055) Apr 26 11:09:34 jupiter2 > Apr 26 11:09:34 jupiter2 Call Trace: > Apr 26 11:09:34 jupiter2 <IRQ> [<ffffffff8049b49e>] > inetdev_event+0x48/0x283 > Apr 26 11:09:34 jupiter2 [<ffffffff804c8731>] _spin_lock_bh+0x9/0x19 > Apr 26 11:09:34 jupiter2 [<ffffffff80473df1>] rt_run_flush+0x7e/0xaf > Apr 26 11:09:34 jupiter2 [<ffffffff8022bdd0>] notifier_call_chain+0x29/0x56 > Apr 26 11:09:34 jupiter2 [<ffffffff804560cc>] dev_set_mac_address+0x53/0x59 > Apr 26 11:09:34 jupiter2 [<ffffffff88006d8d>] > bo This was due to locking bustage in the net tree. It should be fixed in 2.6.21-rc7-mm2. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-27 4:11 ` Andrew Morton @ 2007-04-27 9:25 ` VE (HOME) 2007-04-27 19:20 ` Andrew Morton 2007-04-27 21:18 ` USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) Greg KH 0 siblings, 2 replies; 26+ messages in thread From: VE (HOME) @ 2007-04-27 9:25 UTC (permalink / raw) To: Andrew Morton; +Cc: Linux Kernel, fubar, bonding-devel Andrew Morton wrote: > On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net> > wrote: > > > This was due to locking bustage in the net tree. It should be fixed > in 2.6.21-rc7-mm2. I have tried this version. Same problem ( see http://mail1.vetienne.net/linux/dmesg-2.6.21-rc7-mm2.log ) But also a new problem with USB keyboard/mouse BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk() Call Trace: [<ffffffff8044bbcb>] usbhid_lookup_quirk+0x63/0xea [<ffffffff8044ab12>] hid_probe+0x4d/0xbad [<ffffffff8021d7cd>] default_wake_function+0x0/0xe [<ffffffff8029d0b6>] sysfs_new_dirent+0x79/0xaa [<ffffffff80403411>] usb_match_one_id+0x26/0x82 [<ffffffff80403fe6>] usb_probe_interface+0x7d/0xa5 [<ffffffff80391cb9>] driver_probe_device+0xf6/0x17f [<ffffffff80391de4>] __driver_attach+0x0/0x93 [<ffffffff80391e3e>] __driver_attach+0x5a/0x93 [<ffffffff8039110c>] bus_for_each_dev+0x43/0x6e [<ffffffff80391433>] bus_add_driver+0x7b/0x19d [<ffffffff80403b06>] usb_register_driver+0x7e/0xe1 [<ffffffff8064d2d3>] hid_init+0x28/0x4e [<ffffffff8063460b>] kernel_init+0x162/0x2d2 [<ffffffff8020a338>] child_rip+0xa/0x12 [<ffffffff80343d5c>] acpi_ds_init_one_object+0x0/0x7c [<ffffffff806344a9>] kernel_init+0x0/0x2d2 [<ffffffff8020a32e>] child_rip+0x0/0x12 input: ServerEngines SE USB Device as /devices/pci0000:00/0000:00:1d.2/usb4/4-2/4-2:1.0/input/input2 input: USB HID v1.11 Keyboard [ServerEngines SE USB Device] on usb-0000:00:1d.2-2 BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk() Call Trace: [<ffffffff8044bbcb>] usbhid_lookup_quirk+0x63/0xea [<ffffffff8044ab12>] hid_probe+0x4d/0xbad [<ffffffff8029d0b6>] sysfs_new_dirent+0x79/0xaa [<ffffffff80403411>] usb_match_one_id+0x26/0x82 [<ffffffff80403fe6>] usb_probe_interface+0x7d/0xa5 [<ffffffff80391cb9>] driver_probe_device+0xf6/0x17f [<ffffffff80391de4>] __driver_attach+0x0/0x93 [<ffffffff80391e3e>] __driver_attach+0x5a/0x93 [<ffffffff8039110c>] bus_for_each_dev+0x43/0x6e [<ffffffff80391433>] bus_add_driver+0x7b/0x19d [<ffffffff80403b06>] usb_register_driver+0x7e/0xe1 [<ffffffff8064d2d3>] hid_init+0x28/0x4e [<ffffffff8063460b>] kernel_init+0x162/0x2d2 [<ffffffff8020a338>] child_rip+0xa/0x12 [<ffffffff80343d5c>] acpi_ds_init_one_object+0x0/0x7c [<ffffffff806344a9>] kernel_init+0x0/0x2d2 [<ffffffff8020a32e>] child_rip+0x0/0x12 And also a strange problem : dhcp server and dns server was started before bond0 was completly up ( eth0 and eth1 was declared down at this time and up a few times later : was not the case with later 2.6.21-rc6-mm1 ) Vincent ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-27 9:25 ` VE (HOME) @ 2007-04-27 19:20 ` Andrew Morton [not found] ` <200704272205.29131.ve@vetienne.net> 2007-04-27 21:18 ` USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) Greg KH 1 sibling, 1 reply; 26+ messages in thread From: Andrew Morton @ 2007-04-27 19:20 UTC (permalink / raw) To: VE (HOME); +Cc: Linux Kernel, fubar, bonding-devel, netdev On Fri, 27 Apr 2007 11:25:46 +0200 "VE \(HOME\)" <ve@vetienne.net> wrote: > Andrew Morton wrote: > > On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net> > > wrote: > > > > > > This was due to locking bustage in the net tree. It should be fixed > > in 2.6.21-rc7-mm2. > > I have tried this version. Same problem ( see > http://mail1.vetienne.net/linux/dmesg-2.6.21-rc7-mm2.log ) That file has disappeared. > But also a new problem with USB keyboard/mouse > USB problem - let's handle that separately. > > And also a strange problem : dhcp server and dns server was started before > bond0 was completly up ( eth0 and eth1 was declared down at this time and > up a few times later : was not the case with later 2.6.21-rc6-mm1 ) > ^ permalink raw reply [flat|nested] 26+ messages in thread
[parent not found: <200704272205.29131.ve@vetienne.net>]
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 [not found] ` <200704272205.29131.ve@vetienne.net> @ 2007-04-27 22:32 ` Andrew Morton 2007-04-28 20:37 ` Vincent ETIENNE 2007-04-28 21:10 ` Vincent ETIENNE 0 siblings, 2 replies; 26+ messages in thread From: Andrew Morton @ 2007-04-27 22:32 UTC (permalink / raw) To: Vincent ETIENNE; +Cc: Stephen Hemminger, netdev On Fri, 27 Apr 2007 22:05:28 +0200 Vincent ETIENNE <ve@vetienne.net> wrote: > Le Friday 27 April 2007 21:20:39, vous avez __crit__: > > On Fri, 27 Apr 2007 11:25:46 +0200 "VE \(HOME\)" <ve@vetienne.net> wrote: > > > Andrew Morton wrote: > > > > On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net> > > > > wrote: > > > > > > > > > > > > This was due to locking bustage in the net tree. It should be fixed > > > > in 2.6.21-rc7-mm2. > > > > > > I have tried this version. Same problem ( see > > > http://mail1.vetienne.net/linux/dmesg-2.6.21-rc7-mm2.log ) > > > > That file has disappeared. > > Sorry wrong right on the file. Should be ok now Please don't go off-list. Now the people who work on this code don't know that the log is available. I've reestablished a few cc's. The troublesome part is here: e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO bonding: bond0: link status definitely up for interface eth0. bonding: bond0: making interface eth0 the new active one. RTNL: assertion failed at net/ipv4/devinet.c (1055) Call Trace: <IRQ> [<ffffffff8049b9a1>] inetdev_event+0x48/0x283 [<ffffffff804c85d1>] _spin_lock_bh+0x9/0x19 [<ffffffff804753d7>] rt_run_flush+0x7e/0xaf [<ffffffff8022d388>] notifier_call_chain+0x29/0x56 [<ffffffff80457994>] dev_set_mac_address+0x53/0x59 [<ffffffff88006d6d>] :bonding:alb_set_slave_mac_addr+0x41/0x6c [<ffffffff880071e9>] :bonding:alb_swap_mac_addr+0x91/0x165 [<ffffffff88002022>] :bonding:bond_change_active_slave+0x227/0x382 [<ffffffff880024c2>] :bonding:bond_select_active_slave+0xb7/0xe5 [<ffffffff88004172>] :bonding:bond_mii_monitor+0x3cd/0x41e [<ffffffff88003da5>] :bonding:bond_mii_monitor+0x0/0x41e [<ffffffff802299a0>] run_timer_softirq+0x130/0x19f [<ffffffff80226b64>] __do_softirq+0x55/0xc4 [<ffffffff8020a6ac>] call_softirq+0x1c/0x28 [<ffffffff8020bfb9>] do_softirq+0x2c/0x7d [<ffffffff802145a7>] smp_apic_timer_interrupt+0x49/0x5e [<ffffffff80208989>] mwait_idle+0x0/0x47 [<ffffffff8020a156>] apic_timer_interrupt+0x66/0x70 <EOI> [<ffffffff802089cb>] mwait_idle+0x42/0x47 [<ffffffff80208921>] cpu_idle+0x7f/0xa2 [<ffffffff806349bd>] start_kernel+0x242/0x24e [<ffffffff80634146>] _sinittext+0x146/0x14a because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2. Are you sure that log is from 2.6.21-rc7-mm2? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-27 22:32 ` Andrew Morton @ 2007-04-28 20:37 ` Vincent ETIENNE 2007-04-28 21:31 ` Andrew Morton 2007-04-28 21:10 ` Vincent ETIENNE 1 sibling, 1 reply; 26+ messages in thread From: Vincent ETIENNE @ 2007-04-28 20:37 UTC (permalink / raw) To: Andrew Morton; +Cc: Stephen Hemminger, netdev Le Saturday 28 April 2007 00:32:37 Andrew Morton, vous avez écrit : > On Fri, 27 Apr 2007 22:05:28 +0200 > > because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2. > Are you sure that log is from 2.6.21-rc7-mm2? Yes. I have retested it another time ( for adding the small usb debug message ) and get the same message a new time. Maybe a tiny difference : only eth0 was setup during boot ( bond0 and eth1 was not setup - volontary - to avoid problem that could interfere with usb ). Ony after having done my usb test, i have stopped the eth0, setup bond interface and restart bond interface and got the same problem with the same back trace. Is there anything i can do to get you more precise information ? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-28 20:37 ` Vincent ETIENNE @ 2007-04-28 21:31 ` Andrew Morton 2007-04-28 21:55 ` Patrick McHardy 2007-04-28 22:04 ` Vincent ETIENNE 0 siblings, 2 replies; 26+ messages in thread From: Andrew Morton @ 2007-04-28 21:31 UTC (permalink / raw) To: Vincent ETIENNE; +Cc: Stephen Hemminger, netdev On Sat, 28 Apr 2007 22:37:42 +0200 Vincent ETIENNE <ve@vetienne.net> wrote: > Le Saturday 28 April 2007 00:32:37 Andrew Morton, vous avez écrit : > > On Fri, 27 Apr 2007 22:05:28 +0200 > > > > > because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2. > > Are you sure that log is from 2.6.21-rc7-mm2? > > > Yes. I have retested it another time ( for adding the small usb debug > message ) and get the same message a new time. > > Maybe a tiny difference : only eth0 was setup during boot ( bond0 and eth1 was > not setup - volontary - to avoid problem that could interfere with usb ). > Ony after having done my usb test, i have stopped the eth0, setup bond > interface and restart bond interface and got the same problem with the same > back trace. OK, thanks. > Is there anything i can do to get you more precise information ? I guess if you could provide us with your .config and a step-by-step recipe which developers should use to reproduce the problem then we should be able to fix this pretty easily when someone finds the time to do so. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-28 21:31 ` Andrew Morton @ 2007-04-28 21:55 ` Patrick McHardy 2007-04-28 22:04 ` Vincent ETIENNE 1 sibling, 0 replies; 26+ messages in thread From: Patrick McHardy @ 2007-04-28 21:55 UTC (permalink / raw) To: Andrew Morton; +Cc: Vincent ETIENNE, Stephen Hemminger, netdev Andrew Morton wrote: > On Sat, 28 Apr 2007 22:37:42 +0200 Vincent ETIENNE <ve@vetienne.net> wrote: > >>>because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2. >>>Are you sure that log is from 2.6.21-rc7-mm2? >> >> >>Yes. I have retested it another time ( for adding the small usb debug >>message ) and get the same message a new time. >> >>Maybe a tiny difference : only eth0 was setup during boot ( bond0 and eth1 was >>not setup - volontary - to avoid problem that could interfere with usb ). >>Ony after having done my usb test, i have stopped the eth0, setup bond >>interface and restart bond interface and got the same problem with the same >>back trace. > > > OK, thanks. The failed RTNL assertion here is unrelated to the mutex initialization bug in 2.6.1-rc7-mm1. The bonding driver is not holding the rtnl_mutex while calling dev_set_mac_address (and can not since its called from a timer). I believe this is a known problem to the bonding maintainers. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-28 21:31 ` Andrew Morton 2007-04-28 21:55 ` Patrick McHardy @ 2007-04-28 22:04 ` Vincent ETIENNE 1 sibling, 0 replies; 26+ messages in thread From: Vincent ETIENNE @ 2007-04-28 22:04 UTC (permalink / raw) To: Andrew Morton; +Cc: Stephen Hemminger, netdev Le Saturday 28 April 2007 23:31:02 Andrew Morton, vous avez écrit : > > > Is there anything i can do to get you more precise information ? > > I guess if you could provide us with your .config and a step-by-step recipe > which developers should use to reproduce the problem then we should be able > to fix this pretty easily when someone finds the time to do so. For the config, i have uploaded it to http://mail1.vetienne.net/linux/config-2.6.21-rc7-mm2. For reproducing the problem, it 's simple (for me at least) : - setup a bonding interface ( IP 192.168.1.5/255.255.255.0 so nothing fancy ) bond (mode 6 : alb ) on two NIC card (in my case E1000 and TG3 ). Don't know if it's important or not but the two cards are connected to the same cheap 100Mb switch ( So no gigabit ) - Reboot : when the bond interface went up ( after the two physical slaves ) the bug is triggered ( for the moment always, but i have only rebooted 2 or 3 times with 2.6.21-rc7-mm2 ). - You could also play with networks cable and put down/up one interface or another during normal work : it would often triggered the same trace ( not always although ). You don't need to have load on the network. The interface that went down or up doesn't seem to be important as far as i can see. - No lockup of the kernel, network still work fine after the problem ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 2007-04-27 22:32 ` Andrew Morton 2007-04-28 20:37 ` Vincent ETIENNE @ 2007-04-28 21:10 ` Vincent ETIENNE 1 sibling, 0 replies; 26+ messages in thread From: Vincent ETIENNE @ 2007-04-28 21:10 UTC (permalink / raw) To: Linux Kernel Le Saturday 28 April 2007 00:32:37 Andrew Morton, vous avez écrit : > On Fri, 27 Apr 2007 22:05:28 +0200 > > because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2. > Are you sure that log is from 2.6.21-rc7-mm2? Yes. I have retested it another time ( for adding the small usb debug message ) and get the same message a new time. Maybe a tiny difference : only eth0 was setup during boot ( bond0 and eth1 was not setup - volontary - to avoid problem that could interfere with usb ). Ony after having done my usb test, i have stopped the eth0, setup bond interface and restart bond interface and got the same problem with the same back trace. Is there anything i can do to get you more precise information ? Resend to the list : i have stripped lklm from the recipients...... ^ permalink raw reply [flat|nested] 26+ messages in thread
* USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-27 9:25 ` VE (HOME) 2007-04-27 19:20 ` Andrew Morton @ 2007-04-27 21:18 ` Greg KH 2007-04-27 22:42 ` Jiri Kosina 1 sibling, 1 reply; 26+ messages in thread From: Greg KH @ 2007-04-27 21:18 UTC (permalink / raw) To: VE (HOME), jkosina; +Cc: Andrew Morton, Linux Kernel, fubar On Fri, Apr 27, 2007 at 11:25:46AM +0200, VE (HOME) wrote: > Andrew Morton wrote: > > On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net> > > wrote: > > > > > > This was due to locking bustage in the net tree. It should be fixed > > in 2.6.21-rc7-mm2. > > I have tried this version. Same problem ( see > http://mail1.vetienne.net/linux/dmesg-2.6.21-rc7-mm2.log ) > But also a new problem with USB keyboard/mouse > > BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk() > > Call Trace: > [<ffffffff8044bbcb>] usbhid_lookup_quirk+0x63/0xea > [<ffffffff8044ab12>] hid_probe+0x4d/0xbad > [<ffffffff8021d7cd>] default_wake_function+0x0/0xe > [<ffffffff8029d0b6>] sysfs_new_dirent+0x79/0xaa > [<ffffffff80403411>] usb_match_one_id+0x26/0x82 > [<ffffffff80403fe6>] usb_probe_interface+0x7d/0xa5 > [<ffffffff80391cb9>] driver_probe_device+0xf6/0x17f > [<ffffffff80391de4>] __driver_attach+0x0/0x93 > [<ffffffff80391e3e>] __driver_attach+0x5a/0x93 > [<ffffffff8039110c>] bus_for_each_dev+0x43/0x6e > [<ffffffff80391433>] bus_add_driver+0x7b/0x19d > [<ffffffff80403b06>] usb_register_driver+0x7e/0xe1 > [<ffffffff8064d2d3>] hid_init+0x28/0x4e > [<ffffffff8063460b>] kernel_init+0x162/0x2d2 > [<ffffffff8020a338>] child_rip+0xa/0x12 > [<ffffffff80343d5c>] acpi_ds_init_one_object+0x0/0x7c > [<ffffffff806344a9>] kernel_init+0x0/0x2d2 > [<ffffffff8020a32e>] child_rip+0x0/0x12 > > input: ServerEngines SE USB Device as > /devices/pci0000:00/0000:00:1d.2/usb4/4-2/4-2:1.0/input/input2 > input: USB HID v1.11 Keyboard [ServerEngines SE USB Device] on > usb-0000:00:1d.2-2 > BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk() > > Call Trace: > [<ffffffff8044bbcb>] usbhid_lookup_quirk+0x63/0xea > [<ffffffff8044ab12>] hid_probe+0x4d/0xbad > [<ffffffff8029d0b6>] sysfs_new_dirent+0x79/0xaa > [<ffffffff80403411>] usb_match_one_id+0x26/0x82 > [<ffffffff80403fe6>] usb_probe_interface+0x7d/0xa5 > [<ffffffff80391cb9>] driver_probe_device+0xf6/0x17f > [<ffffffff80391de4>] __driver_attach+0x0/0x93 > [<ffffffff80391e3e>] __driver_attach+0x5a/0x93 > [<ffffffff8039110c>] bus_for_each_dev+0x43/0x6e > [<ffffffff80391433>] bus_add_driver+0x7b/0x19d > [<ffffffff80403b06>] usb_register_driver+0x7e/0xe1 > [<ffffffff8064d2d3>] hid_init+0x28/0x4e > [<ffffffff8063460b>] kernel_init+0x162/0x2d2 > [<ffffffff8020a338>] child_rip+0xa/0x12 > [<ffffffff80343d5c>] acpi_ds_init_one_object+0x0/0x7c > [<ffffffff806344a9>] kernel_init+0x0/0x2d2 > [<ffffffff8020a32e>] child_rip+0x0/0x12 Jiri, any thoughts about this? This looks like it comes from your tree as 2.6.21 doesn't have the drivers/hid/usbhid/ directory... thanks, greg k-h ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-27 21:18 ` USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) Greg KH @ 2007-04-27 22:42 ` Jiri Kosina 2007-04-27 22:55 ` Jiri Kosina ` (2 more replies) 0 siblings, 3 replies; 26+ messages in thread From: Jiri Kosina @ 2007-04-27 22:42 UTC (permalink / raw) To: Greg KH; +Cc: VE (HOME), Andrew Morton, Linux Kernel, fubar, Paul Walmsley On Fri, 27 Apr 2007, Greg KH wrote: > > BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk() [ .. stacktraces stripped .. ] > Jiri, any thoughts about this? This looks like it comes from your tree > as 2.6.21 doesn't have the drivers/hid/usbhid/ directory... Paul (author of the corresponding patch) added to CC. This BUG (it's in fact a warning) is this one: WARN_ON(idVendor == 0); I now don't immediately see how this could happen - the vendor ID seems to be propagated properly from hid_probe() (nothing has been changed in this codepath), so this would mean that hid_probe() has been passed usb_interface for which le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is equal to zero ... and this definitely shouldn't happen for any sane device (could the original poster please verify with lsusb, just to be 100% sure?). It would also help if the original poster, who is able to reproduce this WARN_ON, would verify whether hid_probe() is really being passed such strange usb_interface from upper USB layer ... please see the patch below, and send me the output if possible. Paul, do you have any idea? In fact, what was your reason for putting this WARN_ON() there? Did you ever meet any condition when idVendor == 0 appears there? Thanks. diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c index f929dee..2a77d8b 100644 --- a/drivers/hid/usbhid/hid-core.c +++ b/drivers/hid/usbhid/hid-core.c @@ -984,6 +984,8 @@ static int hid_probe(struct usb_interfac dbg("HID probe called for ifnum %d", intf->altsetting->desc.bInterfaceNumber); + printk(KERN_DEBUG "DEBUG - hid_probe: idVendor: %x\n", + le16_to_cpu(interface_to_usbdev(intf)->descriptor.idVendor)); if (!(hid = usb_hid_configure(intf))) return -ENODEV; ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-27 22:42 ` Jiri Kosina @ 2007-04-27 22:55 ` Jiri Kosina 2007-04-27 23:24 ` Paul Walmsley 2007-04-28 7:21 ` Vincent ETIENNE 2 siblings, 0 replies; 26+ messages in thread From: Jiri Kosina @ 2007-04-27 22:55 UTC (permalink / raw) To: Greg KH, VE (HOME); +Cc: Andrew Morton, Linux Kernel, fubar, Paul Walmsley On Sat, 28 Apr 2007, Jiri Kosina wrote: > This BUG (it's in fact a warning) is this one: > WARN_ON(idVendor == 0); > I now don't immediately see how this could happen - the vendor ID seems > to be propagated properly from hid_probe() (nothing has been changed in > this codepath), so this would mean that hid_probe() has been passed > usb_interface for which > le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is equal > to zero ... and this definitely shouldn't happen for any sane device > (could the original poster please verify with lsusb, just to be 100% > sure?). BTW do I guess correctly that your keyboard is useful without problems after that, you only see this BUG and stacktraces in your logs, right? Thanks, -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-27 22:42 ` Jiri Kosina 2007-04-27 22:55 ` Jiri Kosina @ 2007-04-27 23:24 ` Paul Walmsley 2007-04-28 7:21 ` Vincent ETIENNE 2 siblings, 0 replies; 26+ messages in thread From: Paul Walmsley @ 2007-04-27 23:24 UTC (permalink / raw) To: Jiri Kosina; +Cc: Greg KH, VE (HOME), Andrew Morton, Linux Kernel, fubar Hi Jiri, On Sat, 28 Apr 2007, Jiri Kosina wrote: > Paul, do you have any idea? In fact, what was your reason for putting this > WARN_ON() there? The static quirk list uses idVendor == 0 to mark the end of hid_blacklist[], so we don't expect any device to have idVendor == 0. If a device is correctly presenting with idVendor == 0, we need a different way to terminate that blacklist. Either that or there's an upper-layer bug, as you write. Regarding its placement: that WARN_ON() belongs in the static quirk lookup code, rather than where it is now. Its current location must be a relic of an earlier patchset. > Did you ever meet any condition when idVendor == 0 appears there? No. There shouldn't be any functional problem with removing that WARN_ON(), and also removing the initial if() in usbhid_lookup_dquirk(). - Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-27 22:42 ` Jiri Kosina 2007-04-27 22:55 ` Jiri Kosina 2007-04-27 23:24 ` Paul Walmsley @ 2007-04-28 7:21 ` Vincent ETIENNE 2007-04-28 14:07 ` Paul Walmsley 2007-04-28 15:06 ` Jiri Kosina 2 siblings, 2 replies; 26+ messages in thread From: Vincent ETIENNE @ 2007-04-28 7:21 UTC (permalink / raw) To: Jiri Kosina; +Cc: Greg KH, Andrew Morton, Linux Kernel, Paul Walmsley Le Saturday 28 April 2007 00:42:45 Jiri Kosina, vous avez écrit : > On Fri, 27 Apr 2007, Greg KH wrote: > > > BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk() > > [ .. stacktraces stripped .. ] > > > Jiri, any thoughts about this? This looks like it comes from your tree > > as 2.6.21 doesn't have the drivers/hid/usbhid/ directory... > > Paul (author of the corresponding patch) added to CC. > > This BUG (it's in fact a warning) is this one: > > WARN_ON(idVendor == 0); > > I now don't immediately see how this could happen - the vendor ID seems to > be propagated properly from hid_probe() (nothing has been changed in this > codepath), so this would mean that hid_probe() has been passed > usb_interface for which > le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is equal > to zero ... and this definitely shouldn't happen for any sane device > (could the original poster please verify with lsusb, just to be 100% > sure?). You could download the result of lsusb -vvv from http://mail1.vetienne.net/linux/lsusb.log > > It would also help if the original poster, who is able to reproduce this > WARN_ON, would verify whether hid_probe() is really being passed such > strange usb_interface from upper USB layer ... please see the patch below, > and send me the output if possible. > > Paul, do you have any idea? In fact, what was your reason for putting this > WARN_ON() there? Did you ever meet any condition when idVendor == 0 > appears there? > > Thanks. > > diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c > index f929dee..2a77d8b 100644 > --- a/drivers/hid/usbhid/hid-core.c > +++ b/drivers/hid/usbhid/hid-core.c > @@ -984,6 +984,8 @@ static int hid_probe(struct usb_interfac > dbg("HID probe called for ifnum %d", > intf->altsetting->desc.bInterfaceNumber); > > + printk(KERN_DEBUG "DEBUG - hid_probe: idVendor: %x\n", > + le16_to_cpu(interface_to_usbdev(intf)->descriptor.idVendor)); > if (!(hid = usb_hid_configure(intf))) > return -ENODEV; With your patch it seems like idVendor passed is 0. You could see it there ; http://mail1.vetienne.net/linux/dmesg.2.6.21-rc7-mm2+patch-usbhid I could confirm that the keyboard is working ( yesterday i was just behind the box due to test on the network ). But i don't remember if the keyboard is USB or PS2 (i only have ssh access to the box for the next 3/4 days) Vincent ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-28 7:21 ` Vincent ETIENNE @ 2007-04-28 14:07 ` Paul Walmsley 2007-04-28 15:06 ` Jiri Kosina 1 sibling, 0 replies; 26+ messages in thread From: Paul Walmsley @ 2007-04-28 14:07 UTC (permalink / raw) To: Vincent ETIENNE; +Cc: Jiri Kosina, Greg KH, Andrew Morton, Linux Kernel On Sat, 28 Apr 2007, Vincent ETIENNE wrote: > With your patch it seems like idVendor passed is 0. You could see it there ; > > http://mail1.vetienne.net/linux/dmesg.2.6.21-rc7-mm2+patch-usbhid > > I could confirm that the keyboard is working ( yesterday i was just behind the > box due to test on the network ). But i don't remember if the keyboard is USB > or PS2 (i only have ssh access to the box for the next 3/4 days) It seems that OpenBSD users are reporting similar problems: http://article.gmane.org/gmane.os.openbsd.misc/122526 http://www.mail-archive.com/misc@openbsd.org/msg38200.html - Paul ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-28 7:21 ` Vincent ETIENNE 2007-04-28 14:07 ` Paul Walmsley @ 2007-04-28 15:06 ` Jiri Kosina 2007-04-28 19:50 ` [linux-usb-devel] " Alan Stern 1 sibling, 1 reply; 26+ messages in thread From: Jiri Kosina @ 2007-04-28 15:06 UTC (permalink / raw) To: Vincent ETIENNE Cc: Greg KH, Andrew Morton, Linux Kernel, Paul Walmsley, linux-usb-devel On Sat, 28 Apr 2007, Vincent ETIENNE wrote: > > I now don't immediately see how this could happen - the vendor ID > > seems to be propagated properly from hid_probe() (nothing has been > > changed in this codepath), so this would mean that hid_probe() has > > been passed usb_interface for which > > le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is > > equal to zero ... and this definitely shouldn't happen for any sane > > device (could the original poster please verify with lsusb, just to be > > 100% sure?). > You could download the result of lsusb -vvv from > http://mail1.vetienne.net/linux/lsusb.log Hi Vincent, thanks for the report. It's pretty awesome though - all the USB devices seem to have vendor and product id set to 0x0000. Greg, have you ever met this? linux-usb-devel added to CC (full thread here: http://lkml.org/lkml/2007/4/27/496) So this definitely isn't a HID-specific problem, something is confusing the USB VID/PIDs. -- Jiri Kosina SUSE Labs ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [linux-usb-devel] USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-28 15:06 ` Jiri Kosina @ 2007-04-28 19:50 ` Alan Stern 2007-04-28 20:43 ` Vincent ETIENNE 0 siblings, 1 reply; 26+ messages in thread From: Alan Stern @ 2007-04-28 19:50 UTC (permalink / raw) To: Jiri Kosina Cc: Vincent ETIENNE, Greg KH, Andrew Morton, Linux Kernel, linux-usb-devel, Paul Walmsley On Sat, 28 Apr 2007, Jiri Kosina wrote: > On Sat, 28 Apr 2007, Vincent ETIENNE wrote: > > > > I now don't immediately see how this could happen - the vendor ID > > > seems to be propagated properly from hid_probe() (nothing has been > > > changed in this codepath), so this would mean that hid_probe() has > > > been passed usb_interface for which > > > le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is > > > equal to zero ... and this definitely shouldn't happen for any sane > > > device (could the original poster please verify with lsusb, just to be > > > 100% sure?). > > You could download the result of lsusb -vvv from > > http://mail1.vetienne.net/linux/lsusb.log > > Hi Vincent, > > thanks for the report. It's pretty awesome though - all the USB devices > seem to have vendor and product id set to 0x0000. Greg, have you ever met > this? > > linux-usb-devel added to CC (full thread here: > http://lkml.org/lkml/2007/4/27/496) > > So this definitely isn't a HID-specific problem, something is confusing > the USB VID/PIDs. No, it isn't a problem in the USB core. The device itself is messed up; it really does report idVendor and idProduct both equal to 0. Jiri, don't worry about all those other devices in the listing that also have idVendor and idProduct set to 0. They aren't real devices at all; they are virtual root hubs. You'll see what I mean if you look at their iManufacturer, iProduct, and iSerial strings. Alan Stern ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [linux-usb-devel] USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-28 19:50 ` [linux-usb-devel] " Alan Stern @ 2007-04-28 20:43 ` Vincent ETIENNE 2007-04-29 11:18 ` Jiri Kosina 0 siblings, 1 reply; 26+ messages in thread From: Vincent ETIENNE @ 2007-04-28 20:43 UTC (permalink / raw) To: Alan Stern Cc: Jiri Kosina, Greg KH, Andrew Morton, Linux Kernel, linux-usb-devel, Paul Walmsley Le Saturday 28 April 2007 21:50:30 Alan Stern, vous avez écrit : > No, it isn't a problem in the USB core. The device itself is messed up; > it really does report idVendor and idProduct both equal to 0. > > Jiri, don't worry about all those other devices in the listing that also > have idVendor and idProduct set to 0. They aren't real devices at all; > they are virtual root hubs. You'll see what I mean if you look at their > iManufacturer, iProduct, and iSerial strings. > > Alan Stern So is it just a scary trace but without consequence that i could ignored ? May i ask you what is the device which is messed up ? ( Maybe should i change it ? ) ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [linux-usb-devel] USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-28 20:43 ` Vincent ETIENNE @ 2007-04-29 11:18 ` Jiri Kosina 2007-04-29 13:40 ` Vincent ETIENNE 0 siblings, 1 reply; 26+ messages in thread From: Jiri Kosina @ 2007-04-29 11:18 UTC (permalink / raw) To: Vincent ETIENNE Cc: Alan Stern, Greg KH, Andrew Morton, Linux Kernel, linux-usb-devel, Paul Walmsley On Sat, 28 Apr 2007, Vincent ETIENNE wrote: > > No, it isn't a problem in the USB core. The device itself is messed up; > > it really does report idVendor and idProduct both equal to 0. > So is it just a scary trace but without consequence that i could ignored > ? May i ask you what is the device which is messed up ? ( Maybe should i > change it ? ) Hi Vincent, yes, the device is messed up, but it shouldn't have any consequences for you - the HID driver is able to correctly handle that, so as soon as we don't need to add any extra quirks for such device, everything should be fine. I have removed the WARN_ON from the code in my tree. I think we still don't want users to add quirks for such broken devices (as it would collide with hid_blakclist[] terminator), so I have left the initial condition in usbhid_modify_dquirk() untouched. From: Jiri Kosina <jkosina@suse.cz> USB HID: don't warn on idVendor == 0 It turns out that there are broken devices out there that incorrectly report VID/PID as 0x000, see http://lkml.org/lkml/2007/4/27/496 Therefore we should not confuse users by dumping warnings and stacktraces in such situation. It is not possible to add quirks for such horribly broken devices, but currently that's not needed. Signed-off-by: Jiri Kosina <jkosina@suse.cz> diff --git a/drivers/hid/usbhid/hid-quirks.c b/drivers/hid/usbhid/hid-quirks.c index 27188bd..17a8755 100644 --- a/drivers/hid/usbhid/hid-quirks.c +++ b/drivers/hid/usbhid/hid-quirks.c @@ -477,8 +477,6 @@ static struct hid_blacklist *usbhid_exis struct quirks_list_struct *q; struct hid_blacklist *bl_entry = NULL; - WARN_ON(idVendor == 0); - list_for_each_entry(q, &dquirks_list, node) { if (q->hid_bl_item.idVendor == idVendor && q->hid_bl_item.idProduct == idProduct) { ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [linux-usb-devel] USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) 2007-04-29 11:18 ` Jiri Kosina @ 2007-04-29 13:40 ` Vincent ETIENNE 0 siblings, 0 replies; 26+ messages in thread From: Vincent ETIENNE @ 2007-04-29 13:40 UTC (permalink / raw) To: Jiri Kosina Cc: Alan Stern, Greg KH, Andrew Morton, Linux Kernel, linux-usb-devel, Paul Walmsley Le Sunday 29 April 2007 13:18:31 Jiri Kosina, vous avez écrit : > > Hi Vincent, Hi Jiri, > > yes, the device is messed up, but it shouldn't have any consequences for > you - the HID driver is able to correctly handle that, so as soon as we > don't need to add any extra quirks for such device, everything should be > fine. I have removed the WARN_ON from the code in my tree. I think we > still don't want users to add quirks for such broken devices (as it would > collide with hid_blakclist[] terminator), so I have left the initial > condition in usbhid_modify_dquirk() untouched. > Thanks a lot for the explanation and the patch, now i better understand the "problem". Sorry to have bother you with just noise. The patch will certainly be of some help with dumb user (who have dump hardware ) like me. I have very appreciate the attention you ( and all other from this feed ) have take with me. So a big thanks for your work and the next time (if any), i will try to better analyse the problem to avoid unnecessary work. Vincent ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2007-05-10 19:57 UTC | newest]
Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-26 18:58 [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 Vincent ETIENNE
2007-04-26 20:27 ` Chris Snook
2007-04-26 20:44 ` [Bonding-devel] " Jay Vosburgh
2007-04-26 21:43 ` Vincent ETIENNE
2007-05-09 18:47 ` Vincent ETIENNE
2007-05-10 19:57 ` Jay Vosburgh
2007-04-27 4:11 ` Andrew Morton
2007-04-27 9:25 ` VE (HOME)
2007-04-27 19:20 ` Andrew Morton
[not found] ` <200704272205.29131.ve@vetienne.net>
2007-04-27 22:32 ` Andrew Morton
2007-04-28 20:37 ` Vincent ETIENNE
2007-04-28 21:31 ` Andrew Morton
2007-04-28 21:55 ` Patrick McHardy
2007-04-28 22:04 ` Vincent ETIENNE
2007-04-28 21:10 ` Vincent ETIENNE
2007-04-27 21:18 ` USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) Greg KH
2007-04-27 22:42 ` Jiri Kosina
2007-04-27 22:55 ` Jiri Kosina
2007-04-27 23:24 ` Paul Walmsley
2007-04-28 7:21 ` Vincent ETIENNE
2007-04-28 14:07 ` Paul Walmsley
2007-04-28 15:06 ` Jiri Kosina
2007-04-28 19:50 ` [linux-usb-devel] " Alan Stern
2007-04-28 20:43 ` Vincent ETIENNE
2007-04-29 11:18 ` Jiri Kosina
2007-04-29 13:40 ` Vincent ETIENNE
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.