All of lore.kernel.org
 help / color / mirror / Atom feed
* [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
@ 2007-04-26 18:58 Vincent ETIENNE
  2007-04-26 20:27 ` Chris Snook
  2007-04-27  4:11 ` Andrew Morton
  0 siblings, 2 replies; 26+ messages in thread
From: Vincent ETIENNE @ 2007-04-26 18:58 UTC (permalink / raw)
  To: Linux Kernel; +Cc: fubar, bonding-devel

Hi,
 
 Summary :
	Got this trace when one network interface come down or up in a 2
	interfaces bonding.	So far, system seems to survive to this problem
	and works fine.

 Full description
 
	During testing of bonding of 2 interfaces, i have seen this from
	time to time in my log file ( the problem doesn't arrive each
	time but one in 3 or 4 try ). 

	SYSTEM :	 2 NIC card bond on interface bond0 : 
				intel PRO/1000 (e1000 )
				Broadcomm ( tg3 )
	I have also try a 2.6.20 and 2.6.19 vanilla kernel ( identical problem but in 
	onecase the system doesn't survive : that the reason the problem catch my 
	attention )
 
	Keywords	; network, bonding
 
	Version : Linux version 2.6.21-rc6-mm1 (root@jupiter2) (gcc version 4.1.1
	(Gentoo 4.1.1-r3)) #3 SMP Thu Apr 26 08:45:06 CEST 2007
 
	Output of /var/log/messages
 
		Apr 26 11:09:34 jupiter2 e1000: eth0: e1000_watchdog_task: NIC Link
		is Down Apr 26 11:09:34 jupiter2 bonding: bond0: link status
		definitely down for interface eth0, disabling it
		Apr 26 11:09:34 jupiter2 bonding: bond0: making interface eth1 the new
		active one.
		Apr 26 11:09:34 jupiter2 RTNL: assertion failed at net/ipv4/devinet.c
		(1055) Apr 26 11:09:34 jupiter2
		Apr 26 11:09:34 jupiter2 Call Trace:
		Apr 26 11:09:34 jupiter2 <IRQ>  [<ffffffff8049b49e>] 
inetdev_event+0x48/0x283
		Apr 26 11:09:34 jupiter2 [<ffffffff804c8731>] _spin_lock_bh+0x9/0x19
		Apr 26 11:09:34 jupiter2 [<ffffffff80473df1>] rt_run_flush+0x7e/0xaf
		Apr 26 11:09:34 jupiter2 [<ffffffff8022bdd0>] notifier_call_chain+0x29/0x56 
		Apr 26 11:09:34 jupiter2 [<ffffffff804560cc>] dev_set_mac_address+0x53/0x59 			
		Apr 26 11:09:34 jupiter2 [<ffffffff88006d8d>] 
bonding:alb_set_slave_mac_addr+0x41/0x6c
		Apr 26 11:09:34 jupiter2 [<ffffffff88007215>] 
bonding:alb_swap_mac_addr+0x91/0x165
		Apr 26 11:09:34 jupiter2 [<ffffffff88002029>] 
bonding:bond_change_active_slave+0x227/0x382
		Apr 26 11:09:34 jupiter2 [<ffffffff880024c9>]
		bonding:bond_select_active_slave+0xb7/0xe5
		Apr 26 11:09:34 jupiter2 [<ffffffff88004182>] 
bonding:bond_mii_monitor+0x3cd/0x41e
		Apr 26 11:09:34 jupiter2 [<ffffffff88003db5>] 
bonding:bond_mii_monitor+0x0/0x41e
		Apr 26 11:09:34 jupiter2 [<ffffffff80228714>]
 run_timer_softirq+0x130/0x19f
		Apr 26 11:09:34 jupiter2[<ffffffff8022618a>] __do_softirq+0x55/0xc4 
		Apr 26 11:09:34 jupiter2 [<ffffffff8020a5ac>] call_softirq+0x1c/0x28
		Apr 26 11:09:34 jupiter2 [<ffffffff8020bead>] do_softirq+0x2c/0x7d
		Apr 26 11:09:34 jupiter2 [<ffffffff80213b2a>] 
smp_apic_timer_interrupt+0x49/0x5f
		 Apr 26 11:09:34 jupiter2 [<ffffffff802088a3>] mwait_idle+0x0/0x45
		Apr 26 11:09:34 jupiter2 [<ffffffff8020a056>] apic_timer_interrupt+0x66/0x70
		 Apr 26 11:09:34 jupiter2 <EOI>  [<ffffffff802088e5>] mwait_idle+0x42/0x45 
		Apr 26 11:09:34 jupiter2 [<ffffffff8020883f>] cpu_idle+0x51/0x70 
		Apr 26 11:09:34 jupiter2 [<ffffffff806369bd>] start_kernel+0x242/0x24e
		Apr 26 11:09:34 jupiter2 [<ffffffff80636146>] _sinittext+0x146/0x14a
 
 
other  informations (ver_linux, lspci, ...  ) available at 
http://mail1.vetienne.net/linux

I'm a bit worried by the message so any help will be greatly appreciated

	Vincent

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-26 18:58 [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 Vincent ETIENNE
@ 2007-04-26 20:27 ` Chris Snook
  2007-04-26 20:44   ` [Bonding-devel] " Jay Vosburgh
  2007-04-27  4:11 ` Andrew Morton
  1 sibling, 1 reply; 26+ messages in thread
From: Chris Snook @ 2007-04-26 20:27 UTC (permalink / raw)
  To: Vincent ETIENNE; +Cc: Linux Kernel, fubar, bonding-devel

Vincent ETIENNE wrote:
> Hi,
>  
>  Summary :
> 	Got this trace when one network interface come down or up in a 2
> 	interfaces bonding.	So far, system seems to survive to this problem
> 	and works fine.

I'm investigating a similar/possibly identical bug.  Do you experience 
packet loss or throughput stalls, beyond just the loss of the interface 
that went down, when this happens?

	-- Chris

>  Full description
>  
> 	During testing of bonding of 2 interfaces, i have seen this from
> 	time to time in my log file ( the problem doesn't arrive each
> 	time but one in 3 or 4 try ). 
> 
> 	SYSTEM :	 2 NIC card bond on interface bond0 : 
> 				intel PRO/1000 (e1000 )
> 				Broadcomm ( tg3 )
> 	I have also try a 2.6.20 and 2.6.19 vanilla kernel ( identical problem but in 
> 	onecase the system doesn't survive : that the reason the problem catch my 
> 	attention )
>  
> 	Keywords	; network, bonding
>  
> 	Version : Linux version 2.6.21-rc6-mm1 (root@jupiter2) (gcc version 4.1.1
> 	(Gentoo 4.1.1-r3)) #3 SMP Thu Apr 26 08:45:06 CEST 2007
>  
> 	Output of /var/log/messages
>  
> 		Apr 26 11:09:34 jupiter2 e1000: eth0: e1000_watchdog_task: NIC Link
> 		is Down Apr 26 11:09:34 jupiter2 bonding: bond0: link status
> 		definitely down for interface eth0, disabling it
> 		Apr 26 11:09:34 jupiter2 bonding: bond0: making interface eth1 the new
> 		active one.
> 		Apr 26 11:09:34 jupiter2 RTNL: assertion failed at net/ipv4/devinet.c
> 		(1055) Apr 26 11:09:34 jupiter2
> 		Apr 26 11:09:34 jupiter2 Call Trace:
> 		Apr 26 11:09:34 jupiter2 <IRQ>  [<ffffffff8049b49e>] 
> inetdev_event+0x48/0x283
> 		Apr 26 11:09:34 jupiter2 [<ffffffff804c8731>] _spin_lock_bh+0x9/0x19
> 		Apr 26 11:09:34 jupiter2 [<ffffffff80473df1>] rt_run_flush+0x7e/0xaf
> 		Apr 26 11:09:34 jupiter2 [<ffffffff8022bdd0>] notifier_call_chain+0x29/0x56 
> 		Apr 26 11:09:34 jupiter2 [<ffffffff804560cc>] dev_set_mac_address+0x53/0x59 			
> 		Apr 26 11:09:34 jupiter2 [<ffffffff88006d8d>] 
> bonding:alb_set_slave_mac_addr+0x41/0x6c
> 		Apr 26 11:09:34 jupiter2 [<ffffffff88007215>] 
> bonding:alb_swap_mac_addr+0x91/0x165
> 		Apr 26 11:09:34 jupiter2 [<ffffffff88002029>] 
> bonding:bond_change_active_slave+0x227/0x382
> 		Apr 26 11:09:34 jupiter2 [<ffffffff880024c9>]
> 		bonding:bond_select_active_slave+0xb7/0xe5
> 		Apr 26 11:09:34 jupiter2 [<ffffffff88004182>] 
> bonding:bond_mii_monitor+0x3cd/0x41e
> 		Apr 26 11:09:34 jupiter2 [<ffffffff88003db5>] 
> bonding:bond_mii_monitor+0x0/0x41e
> 		Apr 26 11:09:34 jupiter2 [<ffffffff80228714>]
>  run_timer_softirq+0x130/0x19f
> 		Apr 26 11:09:34 jupiter2[<ffffffff8022618a>] __do_softirq+0x55/0xc4 
> 		Apr 26 11:09:34 jupiter2 [<ffffffff8020a5ac>] call_softirq+0x1c/0x28
> 		Apr 26 11:09:34 jupiter2 [<ffffffff8020bead>] do_softirq+0x2c/0x7d
> 		Apr 26 11:09:34 jupiter2 [<ffffffff80213b2a>] 
> smp_apic_timer_interrupt+0x49/0x5f
> 		 Apr 26 11:09:34 jupiter2 [<ffffffff802088a3>] mwait_idle+0x0/0x45
> 		Apr 26 11:09:34 jupiter2 [<ffffffff8020a056>] apic_timer_interrupt+0x66/0x70
> 		 Apr 26 11:09:34 jupiter2 <EOI>  [<ffffffff802088e5>] mwait_idle+0x42/0x45 
> 		Apr 26 11:09:34 jupiter2 [<ffffffff8020883f>] cpu_idle+0x51/0x70 
> 		Apr 26 11:09:34 jupiter2 [<ffffffff806369bd>] start_kernel+0x242/0x24e
> 		Apr 26 11:09:34 jupiter2 [<ffffffff80636146>] _sinittext+0x146/0x14a
>  
>  
> other  informations (ver_linux, lspci, ...  ) available at 
> http://mail1.vetienne.net/linux
> 
> I'm a bit worried by the message so any help will be greatly appreciated
> 
> 	Vincent


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Bonding-devel] [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-26 20:27 ` Chris Snook
@ 2007-04-26 20:44   ` Jay Vosburgh
  2007-04-26 21:43     ` Vincent ETIENNE
  2007-05-09 18:47     ` Vincent ETIENNE
  0 siblings, 2 replies; 26+ messages in thread
From: Jay Vosburgh @ 2007-04-26 20:44 UTC (permalink / raw)
  To: Chris Snook; +Cc: Vincent ETIENNE, Linux Kernel, bonding-devel, Andy Gospodarek

Chris Snook <csnook@redhat.com> wrote:

>Vincent ETIENNE wrote:
>> Hi,
>>  
>>  Summary :
>> 	Got this trace when one network interface come down or up in a 2
>> 	interfaces bonding.	So far, system seems to survive to this problem
>> 	and works fine.
>
>I'm investigating a similar/possibly identical bug.  Do you experience 
>packet loss or throughput stalls, beyond just the loss of the interface 
>that went down, when this happens?

	This problem looks to be one of the known locking issues with
bonding.

	Andy Gospodarek <andy@greyhouse.net> and I have been working
offline on the locking issues in bonding over the last several weeks.
At the moment, we have a generally stable (but ugly with debug fluff and
other yuckies) patch that seems to resolve at least the majority of the
various issues.  I'm thinking to clean it up for general posting early
next week, and address additional problems from there (since it's
hopefully at least a big step forward).

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Bonding-devel] [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-26 20:44   ` [Bonding-devel] " Jay Vosburgh
@ 2007-04-26 21:43     ` Vincent ETIENNE
  2007-05-09 18:47     ` Vincent ETIENNE
  1 sibling, 0 replies; 26+ messages in thread
From: Vincent ETIENNE @ 2007-04-26 21:43 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Chris Snook, Linux Kernel, bonding-devel, Andy Gospodarek

Le Thursday 26 April 2007 22:44:59 Jay Vosburgh, vous avez écrit :
> Chris Snook <csnook@redhat.com> wrote:
> >Vincent ETIENNE wrote:
> >> Hi,
> >>
> >>  Summary :
> >> 	Got this trace when one network interface come down or up in a 2
> >> 	interfaces bonding.	So far, system seems to survive to this problem
> >> 	and works fine.
> >
> >I'm investigating a similar/possibly identical bug.  Do you experience
> >packet loss or throughput stalls, beyond just the loss of the interface
> >that went down, when this happens?
>
> 	This problem looks to be one of the known locking issues with
> bonding.
>
> 	Andy Gospodarek <andy@greyhouse.net> and I have been working
> offline on the locking issues in bonding over the last several weeks.
> At the moment, we have a generally stable (but ugly with debug fluff and
> other yuckies) patch that seems to resolve at least the majority of the
> various issues.  I'm thinking to clean it up for general posting early
> next week, and address additional problems from there (since it's
> hopefully at least a big step forward).
>
> 	-J
>
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

Will be more than happy to test it and report the result with the new patch.

Thanks for your work,

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-26 18:58 [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 Vincent ETIENNE
  2007-04-26 20:27 ` Chris Snook
@ 2007-04-27  4:11 ` Andrew Morton
  2007-04-27  9:25   ` VE (HOME)
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2007-04-27  4:11 UTC (permalink / raw)
  To: Vincent ETIENNE; +Cc: Linux Kernel, fubar, bonding-devel

On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net> wrote:

> 		Apr 26 11:09:34 jupiter2 RTNL: assertion failed at net/ipv4/devinet.c
> 		(1055) Apr 26 11:09:34 jupiter2
> 		Apr 26 11:09:34 jupiter2 Call Trace:
> 		Apr 26 11:09:34 jupiter2 <IRQ>  [<ffffffff8049b49e>] 
> inetdev_event+0x48/0x283
> 		Apr 26 11:09:34 jupiter2 [<ffffffff804c8731>] _spin_lock_bh+0x9/0x19
> 		Apr 26 11:09:34 jupiter2 [<ffffffff80473df1>] rt_run_flush+0x7e/0xaf
> 		Apr 26 11:09:34 jupiter2 [<ffffffff8022bdd0>] notifier_call_chain+0x29/0x56 
> 		Apr 26 11:09:34 jupiter2 [<ffffffff804560cc>] dev_set_mac_address+0x53/0x59 			
> 		Apr 26 11:09:34 jupiter2 [<ffffffff88006d8d>] 
> bo

This was due to locking bustage in the net tree.  It should be fixed in 2.6.21-rc7-mm2.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-27  4:11 ` Andrew Morton
@ 2007-04-27  9:25   ` VE (HOME)
  2007-04-27 19:20     ` Andrew Morton
  2007-04-27 21:18     ` USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) Greg KH
  0 siblings, 2 replies; 26+ messages in thread
From: VE (HOME) @ 2007-04-27  9:25 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel, fubar, bonding-devel

Andrew Morton wrote:
> On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net>
> wrote:
>
>
> This was due to locking bustage in the net tree.  It should be fixed
> in 2.6.21-rc7-mm2.

I have tried this version. Same problem ( see 
http://mail1.vetienne.net/linux/dmesg-2.6.21-rc7-mm2.log )
But also a new problem with USB keyboard/mouse

BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk()

Call Trace:
[<ffffffff8044bbcb>] usbhid_lookup_quirk+0x63/0xea
[<ffffffff8044ab12>] hid_probe+0x4d/0xbad
[<ffffffff8021d7cd>] default_wake_function+0x0/0xe
[<ffffffff8029d0b6>] sysfs_new_dirent+0x79/0xaa
[<ffffffff80403411>] usb_match_one_id+0x26/0x82
[<ffffffff80403fe6>] usb_probe_interface+0x7d/0xa5
[<ffffffff80391cb9>] driver_probe_device+0xf6/0x17f
[<ffffffff80391de4>] __driver_attach+0x0/0x93
[<ffffffff80391e3e>] __driver_attach+0x5a/0x93
[<ffffffff8039110c>] bus_for_each_dev+0x43/0x6e
[<ffffffff80391433>] bus_add_driver+0x7b/0x19d
[<ffffffff80403b06>] usb_register_driver+0x7e/0xe1
[<ffffffff8064d2d3>] hid_init+0x28/0x4e
[<ffffffff8063460b>] kernel_init+0x162/0x2d2
[<ffffffff8020a338>] child_rip+0xa/0x12
[<ffffffff80343d5c>] acpi_ds_init_one_object+0x0/0x7c
[<ffffffff806344a9>] kernel_init+0x0/0x2d2
[<ffffffff8020a32e>] child_rip+0x0/0x12

input: ServerEngines SE USB Device as 
/devices/pci0000:00/0000:00:1d.2/usb4/4-2/4-2:1.0/input/input2
input: USB HID v1.11 Keyboard [ServerEngines SE USB Device] on 
usb-0000:00:1d.2-2
BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk()

Call Trace:
[<ffffffff8044bbcb>] usbhid_lookup_quirk+0x63/0xea
[<ffffffff8044ab12>] hid_probe+0x4d/0xbad
[<ffffffff8029d0b6>] sysfs_new_dirent+0x79/0xaa
[<ffffffff80403411>] usb_match_one_id+0x26/0x82
[<ffffffff80403fe6>] usb_probe_interface+0x7d/0xa5
[<ffffffff80391cb9>] driver_probe_device+0xf6/0x17f
[<ffffffff80391de4>] __driver_attach+0x0/0x93
[<ffffffff80391e3e>] __driver_attach+0x5a/0x93
[<ffffffff8039110c>] bus_for_each_dev+0x43/0x6e
[<ffffffff80391433>] bus_add_driver+0x7b/0x19d
[<ffffffff80403b06>] usb_register_driver+0x7e/0xe1
[<ffffffff8064d2d3>] hid_init+0x28/0x4e
[<ffffffff8063460b>] kernel_init+0x162/0x2d2
[<ffffffff8020a338>] child_rip+0xa/0x12
[<ffffffff80343d5c>] acpi_ds_init_one_object+0x0/0x7c
[<ffffffff806344a9>] kernel_init+0x0/0x2d2
[<ffffffff8020a32e>] child_rip+0x0/0x12


And also a strange problem : dhcp server and dns server was started before 
bond0 was completly up ( eth0 and eth1 was declared down at this time and 
up a few times later : was not the case with later 2.6.21-rc6-mm1 )

Vincent 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-27  9:25   ` VE (HOME)
@ 2007-04-27 19:20     ` Andrew Morton
       [not found]       ` <200704272205.29131.ve@vetienne.net>
  2007-04-27 21:18     ` USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) Greg KH
  1 sibling, 1 reply; 26+ messages in thread
From: Andrew Morton @ 2007-04-27 19:20 UTC (permalink / raw)
  To: VE (HOME); +Cc: Linux Kernel, fubar, bonding-devel, netdev

On Fri, 27 Apr 2007 11:25:46 +0200 "VE \(HOME\)" <ve@vetienne.net> wrote:

> Andrew Morton wrote:
> > On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net>
> > wrote:
> >
> >
> > This was due to locking bustage in the net tree.  It should be fixed
> > in 2.6.21-rc7-mm2.
> 
> I have tried this version. Same problem ( see 
> http://mail1.vetienne.net/linux/dmesg-2.6.21-rc7-mm2.log )

That file has disappeared.

> But also a new problem with USB keyboard/mouse
> 

USB problem - let's handle that separately.

> 
> And also a strange problem : dhcp server and dns server was started before 
> bond0 was completly up ( eth0 and eth1 was declared down at this time and 
> up a few times later : was not the case with later 2.6.21-rc6-mm1 )
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-27  9:25   ` VE (HOME)
  2007-04-27 19:20     ` Andrew Morton
@ 2007-04-27 21:18     ` Greg KH
  2007-04-27 22:42       ` Jiri Kosina
  1 sibling, 1 reply; 26+ messages in thread
From: Greg KH @ 2007-04-27 21:18 UTC (permalink / raw)
  To: VE (HOME), jkosina; +Cc: Andrew Morton, Linux Kernel, fubar

On Fri, Apr 27, 2007 at 11:25:46AM +0200, VE (HOME) wrote:
>  Andrew Morton wrote:
> > On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net>
> > wrote:
> >
> >
> > This was due to locking bustage in the net tree.  It should be fixed
> > in 2.6.21-rc7-mm2.
> 
>  I have tried this version. Same problem ( see 
>  http://mail1.vetienne.net/linux/dmesg-2.6.21-rc7-mm2.log )
>  But also a new problem with USB keyboard/mouse
> 
>  BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk()
> 
>  Call Trace:
>  [<ffffffff8044bbcb>] usbhid_lookup_quirk+0x63/0xea
>  [<ffffffff8044ab12>] hid_probe+0x4d/0xbad
>  [<ffffffff8021d7cd>] default_wake_function+0x0/0xe
>  [<ffffffff8029d0b6>] sysfs_new_dirent+0x79/0xaa
>  [<ffffffff80403411>] usb_match_one_id+0x26/0x82
>  [<ffffffff80403fe6>] usb_probe_interface+0x7d/0xa5
>  [<ffffffff80391cb9>] driver_probe_device+0xf6/0x17f
>  [<ffffffff80391de4>] __driver_attach+0x0/0x93
>  [<ffffffff80391e3e>] __driver_attach+0x5a/0x93
>  [<ffffffff8039110c>] bus_for_each_dev+0x43/0x6e
>  [<ffffffff80391433>] bus_add_driver+0x7b/0x19d
>  [<ffffffff80403b06>] usb_register_driver+0x7e/0xe1
>  [<ffffffff8064d2d3>] hid_init+0x28/0x4e
>  [<ffffffff8063460b>] kernel_init+0x162/0x2d2
>  [<ffffffff8020a338>] child_rip+0xa/0x12
>  [<ffffffff80343d5c>] acpi_ds_init_one_object+0x0/0x7c
>  [<ffffffff806344a9>] kernel_init+0x0/0x2d2
>  [<ffffffff8020a32e>] child_rip+0x0/0x12
> 
>  input: ServerEngines SE USB Device as 
>  /devices/pci0000:00/0000:00:1d.2/usb4/4-2/4-2:1.0/input/input2
>  input: USB HID v1.11 Keyboard [ServerEngines SE USB Device] on 
>  usb-0000:00:1d.2-2
>  BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk()
> 
>  Call Trace:
>  [<ffffffff8044bbcb>] usbhid_lookup_quirk+0x63/0xea
>  [<ffffffff8044ab12>] hid_probe+0x4d/0xbad
>  [<ffffffff8029d0b6>] sysfs_new_dirent+0x79/0xaa
>  [<ffffffff80403411>] usb_match_one_id+0x26/0x82
>  [<ffffffff80403fe6>] usb_probe_interface+0x7d/0xa5
>  [<ffffffff80391cb9>] driver_probe_device+0xf6/0x17f
>  [<ffffffff80391de4>] __driver_attach+0x0/0x93
>  [<ffffffff80391e3e>] __driver_attach+0x5a/0x93
>  [<ffffffff8039110c>] bus_for_each_dev+0x43/0x6e
>  [<ffffffff80391433>] bus_add_driver+0x7b/0x19d
>  [<ffffffff80403b06>] usb_register_driver+0x7e/0xe1
>  [<ffffffff8064d2d3>] hid_init+0x28/0x4e
>  [<ffffffff8063460b>] kernel_init+0x162/0x2d2
>  [<ffffffff8020a338>] child_rip+0xa/0x12
>  [<ffffffff80343d5c>] acpi_ds_init_one_object+0x0/0x7c
>  [<ffffffff806344a9>] kernel_init+0x0/0x2d2
>  [<ffffffff8020a32e>] child_rip+0x0/0x12

Jiri, any thoughts about this?  This looks like it comes from your tree
as 2.6.21 doesn't have the drivers/hid/usbhid/ directory...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
       [not found]       ` <200704272205.29131.ve@vetienne.net>
@ 2007-04-27 22:32         ` Andrew Morton
  2007-04-28 20:37           ` Vincent ETIENNE
  2007-04-28 21:10           ` Vincent ETIENNE
  0 siblings, 2 replies; 26+ messages in thread
From: Andrew Morton @ 2007-04-27 22:32 UTC (permalink / raw)
  To: Vincent ETIENNE; +Cc: Stephen Hemminger, netdev

On Fri, 27 Apr 2007 22:05:28 +0200
Vincent ETIENNE <ve@vetienne.net> wrote:

> Le Friday 27 April 2007 21:20:39, vous avez __crit__:
> > On Fri, 27 Apr 2007 11:25:46 +0200 "VE \(HOME\)" <ve@vetienne.net> wrote:
> > > Andrew Morton wrote:
> > > > On Thu, 26 Apr 2007 20:58:32 +0200 Vincent ETIENNE <ve@vetienne.net>
> > > > wrote:
> > > >
> > > >
> > > > This was due to locking bustage in the net tree.  It should be fixed
> > > > in 2.6.21-rc7-mm2.
> > >
> > > I have tried this version. Same problem ( see
> > > http://mail1.vetienne.net/linux/dmesg-2.6.21-rc7-mm2.log )
> >
> > That file has disappeared.
> 
> Sorry wrong right on the file. Should be ok now

Please don't go off-list.  Now the people who work on this code don't know
that the log is available.  I've reestablished a few cc's.


The troublesome part is here:

e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth0: e1000_watchdog_task: 10/100 speed: disabling TSO
bonding: bond0: link status definitely up for interface eth0.
bonding: bond0: making interface eth0 the new active one.
RTNL: assertion failed at net/ipv4/devinet.c (1055)

Call Trace:
 <IRQ>  [<ffffffff8049b9a1>] inetdev_event+0x48/0x283
 [<ffffffff804c85d1>] _spin_lock_bh+0x9/0x19
 [<ffffffff804753d7>] rt_run_flush+0x7e/0xaf
 [<ffffffff8022d388>] notifier_call_chain+0x29/0x56
 [<ffffffff80457994>] dev_set_mac_address+0x53/0x59
 [<ffffffff88006d6d>] :bonding:alb_set_slave_mac_addr+0x41/0x6c
 [<ffffffff880071e9>] :bonding:alb_swap_mac_addr+0x91/0x165
 [<ffffffff88002022>] :bonding:bond_change_active_slave+0x227/0x382
 [<ffffffff880024c2>] :bonding:bond_select_active_slave+0xb7/0xe5
 [<ffffffff88004172>] :bonding:bond_mii_monitor+0x3cd/0x41e
 [<ffffffff88003da5>] :bonding:bond_mii_monitor+0x0/0x41e
 [<ffffffff802299a0>] run_timer_softirq+0x130/0x19f
 [<ffffffff80226b64>] __do_softirq+0x55/0xc4
 [<ffffffff8020a6ac>] call_softirq+0x1c/0x28
 [<ffffffff8020bfb9>] do_softirq+0x2c/0x7d
 [<ffffffff802145a7>] smp_apic_timer_interrupt+0x49/0x5e
 [<ffffffff80208989>] mwait_idle+0x0/0x47
 [<ffffffff8020a156>] apic_timer_interrupt+0x66/0x70
 <EOI>  [<ffffffff802089cb>] mwait_idle+0x42/0x47
 [<ffffffff80208921>] cpu_idle+0x7f/0xa2
 [<ffffffff806349bd>] start_kernel+0x242/0x24e
 [<ffffffff80634146>] _sinittext+0x146/0x14a


because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2. 
Are you sure that log is from 2.6.21-rc7-mm2?


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-27 21:18     ` USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) Greg KH
@ 2007-04-27 22:42       ` Jiri Kosina
  2007-04-27 22:55         ` Jiri Kosina
                           ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Jiri Kosina @ 2007-04-27 22:42 UTC (permalink / raw)
  To: Greg KH; +Cc: VE (HOME), Andrew Morton, Linux Kernel, fubar, Paul Walmsley

On Fri, 27 Apr 2007, Greg KH wrote:

> >  BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk()
[ .. stacktraces stripped .. ]
> Jiri, any thoughts about this?  This looks like it comes from your tree
> as 2.6.21 doesn't have the drivers/hid/usbhid/ directory...

Paul (author of the corresponding patch) added to CC. 

This BUG (it's in fact a warning) is this one:

       WARN_ON(idVendor == 0);

I now don't immediately see how this could happen - the vendor ID seems to 
be propagated properly from hid_probe() (nothing has been changed in this 
codepath), so this would mean that hid_probe() has been passed 
usb_interface for which 
le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is equal 
to zero ...  and this definitely shouldn't happen for any sane device 
(could the original poster please verify with lsusb, just to be 100% 
sure?).

It would also help if the original poster, who is able to reproduce this 
WARN_ON, would verify whether hid_probe() is really being passed such 
strange usb_interface from upper USB layer ... please see the patch below, 
and send me the output if possible.

Paul, do you have any idea? In fact, what was your reason for putting this 
WARN_ON() there? Did you ever meet any condition when idVendor == 0 
appears there?

Thanks.

diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
index f929dee..2a77d8b 100644
--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -984,6 +984,8 @@ static int hid_probe(struct usb_interfac
 	dbg("HID probe called for ifnum %d",
 			intf->altsetting->desc.bInterfaceNumber);
 
+	printk(KERN_DEBUG "DEBUG - hid_probe: idVendor: %x\n", 
+			le16_to_cpu(interface_to_usbdev(intf)->descriptor.idVendor));
 	if (!(hid = usb_hid_configure(intf)))
 		return -ENODEV;
 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-27 22:42       ` Jiri Kosina
@ 2007-04-27 22:55         ` Jiri Kosina
  2007-04-27 23:24         ` Paul Walmsley
  2007-04-28  7:21         ` Vincent ETIENNE
  2 siblings, 0 replies; 26+ messages in thread
From: Jiri Kosina @ 2007-04-27 22:55 UTC (permalink / raw)
  To: Greg KH, VE (HOME); +Cc: Andrew Morton, Linux Kernel, fubar, Paul Walmsley

On Sat, 28 Apr 2007, Jiri Kosina wrote:

> This BUG (it's in fact a warning) is this one:
>        WARN_ON(idVendor == 0);
> I now don't immediately see how this could happen - the vendor ID seems 
> to be propagated properly from hid_probe() (nothing has been changed in 
> this codepath), so this would mean that hid_probe() has been passed 
> usb_interface for which 
> le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is equal 
> to zero ...  and this definitely shouldn't happen for any sane device 
> (could the original poster please verify with lsusb, just to be 100% 
> sure?).

BTW do I guess correctly that your keyboard is useful without problems 
after that, you only see this BUG and stacktraces in your logs, right?

Thanks,

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-27 22:42       ` Jiri Kosina
  2007-04-27 22:55         ` Jiri Kosina
@ 2007-04-27 23:24         ` Paul Walmsley
  2007-04-28  7:21         ` Vincent ETIENNE
  2 siblings, 0 replies; 26+ messages in thread
From: Paul Walmsley @ 2007-04-27 23:24 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Greg KH, VE (HOME), Andrew Morton, Linux Kernel, fubar

Hi Jiri,

On Sat, 28 Apr 2007, Jiri Kosina wrote:

> Paul, do you have any idea? In fact, what was your reason for putting this
> WARN_ON() there?

The static quirk list uses idVendor == 0 to mark the end of 
hid_blacklist[], so we don't expect any device to have idVendor == 0.  If 
a device is correctly presenting with idVendor == 0, we need a different 
way to terminate that blacklist.  Either that or there's an upper-layer 
bug, as you write.

Regarding its placement: that WARN_ON() belongs in the static quirk lookup 
code, rather than where it is now.  Its current location must be a relic 
of an earlier patchset.

> Did you ever meet any condition when idVendor == 0 appears there?

No.  There shouldn't be any functional problem with removing that 
WARN_ON(), and also removing the initial if() in usbhid_lookup_dquirk().


- Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-27 22:42       ` Jiri Kosina
  2007-04-27 22:55         ` Jiri Kosina
  2007-04-27 23:24         ` Paul Walmsley
@ 2007-04-28  7:21         ` Vincent ETIENNE
  2007-04-28 14:07           ` Paul Walmsley
  2007-04-28 15:06           ` Jiri Kosina
  2 siblings, 2 replies; 26+ messages in thread
From: Vincent ETIENNE @ 2007-04-28  7:21 UTC (permalink / raw)
  To: Jiri Kosina; +Cc: Greg KH, Andrew Morton, Linux Kernel, Paul Walmsley

Le Saturday 28 April 2007 00:42:45 Jiri Kosina, vous avez écrit :
> On Fri, 27 Apr 2007, Greg KH wrote:
> > >  BUG: at drivers/hid/usbhid/hid-quirks.c:480 usbhid_exists_dquirk()
>
> [ .. stacktraces stripped .. ]
>
> > Jiri, any thoughts about this?  This looks like it comes from your tree
> > as 2.6.21 doesn't have the drivers/hid/usbhid/ directory...
>
> Paul (author of the corresponding patch) added to CC.
>
> This BUG (it's in fact a warning) is this one:
>
>        WARN_ON(idVendor == 0);
>
> I now don't immediately see how this could happen - the vendor ID seems to
> be propagated properly from hid_probe() (nothing has been changed in this
> codepath), so this would mean that hid_probe() has been passed
> usb_interface for which
> le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is equal
> to zero ...  and this definitely shouldn't happen for any sane device
> (could the original poster please verify with lsusb, just to be 100%
> sure?).

You could download the result of lsusb -vvv from 
http://mail1.vetienne.net/linux/lsusb.log

>
> It would also help if the original poster, who is able to reproduce this
> WARN_ON, would verify whether hid_probe() is really being passed such
> strange usb_interface from upper USB layer ... please see the patch below,
> and send me the output if possible.
>
> Paul, do you have any idea? In fact, what was your reason for putting this
> WARN_ON() there? Did you ever meet any condition when idVendor == 0
> appears there?
>
> Thanks.
>
> diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
> index f929dee..2a77d8b 100644
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -984,6 +984,8 @@ static int hid_probe(struct usb_interfac
>  	dbg("HID probe called for ifnum %d",
>  			intf->altsetting->desc.bInterfaceNumber);
>
> +	printk(KERN_DEBUG "DEBUG - hid_probe: idVendor: %x\n",
> +			le16_to_cpu(interface_to_usbdev(intf)->descriptor.idVendor));
>  	if (!(hid = usb_hid_configure(intf)))
>  		return -ENODEV;

With your patch it seems like idVendor passed is 0. You could see it there ;

http://mail1.vetienne.net/linux/dmesg.2.6.21-rc7-mm2+patch-usbhid

I could confirm that the keyboard is working ( yesterday i was just behind the 
box due to test on the network ). But i don't remember if the keyboard is USB 
or PS2 (i only have ssh access to the box for the next 3/4 days)

  Vincent

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-28  7:21         ` Vincent ETIENNE
@ 2007-04-28 14:07           ` Paul Walmsley
  2007-04-28 15:06           ` Jiri Kosina
  1 sibling, 0 replies; 26+ messages in thread
From: Paul Walmsley @ 2007-04-28 14:07 UTC (permalink / raw)
  To: Vincent ETIENNE; +Cc: Jiri Kosina, Greg KH, Andrew Morton, Linux Kernel

On Sat, 28 Apr 2007, Vincent ETIENNE wrote:

> With your patch it seems like idVendor passed is 0. You could see it there ;
>
> http://mail1.vetienne.net/linux/dmesg.2.6.21-rc7-mm2+patch-usbhid
>
> I could confirm that the keyboard is working ( yesterday i was just behind the
> box due to test on the network ). But i don't remember if the keyboard is USB
> or PS2 (i only have ssh access to the box for the next 3/4 days)

It seems that OpenBSD users are reporting similar problems:

http://article.gmane.org/gmane.os.openbsd.misc/122526

http://www.mail-archive.com/misc@openbsd.org/msg38200.html


- Paul

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-28  7:21         ` Vincent ETIENNE
  2007-04-28 14:07           ` Paul Walmsley
@ 2007-04-28 15:06           ` Jiri Kosina
  2007-04-28 19:50             ` [linux-usb-devel] " Alan Stern
  1 sibling, 1 reply; 26+ messages in thread
From: Jiri Kosina @ 2007-04-28 15:06 UTC (permalink / raw)
  To: Vincent ETIENNE
  Cc: Greg KH, Andrew Morton, Linux Kernel, Paul Walmsley,
	linux-usb-devel

On Sat, 28 Apr 2007, Vincent ETIENNE wrote:

> > I now don't immediately see how this could happen - the vendor ID 
> > seems to be propagated properly from hid_probe() (nothing has been 
> > changed in this codepath), so this would mean that hid_probe() has 
> > been passed usb_interface for which 
> > le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is 
> > equal to zero ...  and this definitely shouldn't happen for any sane 
> > device (could the original poster please verify with lsusb, just to be 
> > 100% sure?).
> You could download the result of lsusb -vvv from 
> http://mail1.vetienne.net/linux/lsusb.log

Hi Vincent,

thanks for the report. It's pretty awesome though - all the USB devices 
seem to have vendor and product id set to 0x0000. Greg, have you ever met 
this? 

linux-usb-devel added to CC (full thread here: 
http://lkml.org/lkml/2007/4/27/496)

So this definitely isn't a HID-specific problem, something is confusing 
the USB VID/PIDs.

-- 
Jiri Kosina
SUSE Labs

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-usb-devel] USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-28 15:06           ` Jiri Kosina
@ 2007-04-28 19:50             ` Alan Stern
  2007-04-28 20:43               ` Vincent ETIENNE
  0 siblings, 1 reply; 26+ messages in thread
From: Alan Stern @ 2007-04-28 19:50 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Vincent ETIENNE, Greg KH, Andrew Morton, Linux Kernel,
	linux-usb-devel, Paul Walmsley

On Sat, 28 Apr 2007, Jiri Kosina wrote:

> On Sat, 28 Apr 2007, Vincent ETIENNE wrote:
> 
> > > I now don't immediately see how this could happen - the vendor ID 
> > > seems to be propagated properly from hid_probe() (nothing has been 
> > > changed in this codepath), so this would mean that hid_probe() has 
> > > been passed usb_interface for which 
> > > le16_to_cpu(interface_to_usbdev(intf).dev->descriptor.idVendor) is 
> > > equal to zero ...  and this definitely shouldn't happen for any sane 
> > > device (could the original poster please verify with lsusb, just to be 
> > > 100% sure?).
> > You could download the result of lsusb -vvv from 
> > http://mail1.vetienne.net/linux/lsusb.log
> 
> Hi Vincent,
> 
> thanks for the report. It's pretty awesome though - all the USB devices 
> seem to have vendor and product id set to 0x0000. Greg, have you ever met 
> this? 
> 
> linux-usb-devel added to CC (full thread here: 
> http://lkml.org/lkml/2007/4/27/496)
> 
> So this definitely isn't a HID-specific problem, something is confusing 
> the USB VID/PIDs.

No, it isn't a problem in the USB core.  The device itself is messed up; 
it really does report idVendor and idProduct both equal to 0.

Jiri, don't worry about all those other devices in the listing that also 
have idVendor and idProduct set to 0.  They aren't real devices at all; 
they are virtual root hubs.  You'll see what I mean if you look at their 
iManufacturer, iProduct, and iSerial strings.

Alan Stern


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-27 22:32         ` Andrew Morton
@ 2007-04-28 20:37           ` Vincent ETIENNE
  2007-04-28 21:31             ` Andrew Morton
  2007-04-28 21:10           ` Vincent ETIENNE
  1 sibling, 1 reply; 26+ messages in thread
From: Vincent ETIENNE @ 2007-04-28 20:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Stephen Hemminger, netdev

Le Saturday 28 April 2007 00:32:37 Andrew Morton, vous avez écrit :
> On Fri, 27 Apr 2007 22:05:28 +0200
>

> because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2.
> Are you sure that log is from 2.6.21-rc7-mm2?


Yes. I have retested it another time ( for adding the small usb debug 
message ) and get the same message a new time.

Maybe a tiny difference : only eth0 was setup during boot ( bond0 and eth1 was 
not setup - volontary  - to avoid problem that could interfere with usb ). 
Ony after having done my usb test, i have stopped the eth0, setup bond 
interface and restart bond interface and got the same problem with the same 
back trace.

Is there anything i can do to get you more precise information ?





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-usb-devel] USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-28 19:50             ` [linux-usb-devel] " Alan Stern
@ 2007-04-28 20:43               ` Vincent ETIENNE
  2007-04-29 11:18                 ` Jiri Kosina
  0 siblings, 1 reply; 26+ messages in thread
From: Vincent ETIENNE @ 2007-04-28 20:43 UTC (permalink / raw)
  To: Alan Stern
  Cc: Jiri Kosina, Greg KH, Andrew Morton, Linux Kernel,
	linux-usb-devel, Paul Walmsley

Le Saturday 28 April 2007 21:50:30 Alan Stern, vous avez écrit :
> No, it isn't a problem in the USB core.  The device itself is messed up;
> it really does report idVendor and idProduct both equal to 0.
>
> Jiri, don't worry about all those other devices in the listing that also
> have idVendor and idProduct set to 0.  They aren't real devices at all;
> they are virtual root hubs.  You'll see what I mean if you look at their
> iManufacturer, iProduct, and iSerial strings.
>
> Alan Stern


So is it just a scary trace but without consequence that i could ignored ? 
May i ask you what is the device which is messed up ? ( Maybe should i change 
it ? )

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-27 22:32         ` Andrew Morton
  2007-04-28 20:37           ` Vincent ETIENNE
@ 2007-04-28 21:10           ` Vincent ETIENNE
  1 sibling, 0 replies; 26+ messages in thread
From: Vincent ETIENNE @ 2007-04-28 21:10 UTC (permalink / raw)
  To: Linux Kernel

Le Saturday 28 April 2007 00:32:37 Andrew Morton, vous avez écrit :
> On Fri, 27 Apr 2007 22:05:28 +0200
>

> because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2.
> Are you sure that log is from 2.6.21-rc7-mm2?


Yes. I have retested it another time ( for adding the small usb debug 
message ) and get the same message a new time.

Maybe a tiny difference : only eth0 was setup during boot ( bond0 and eth1 was 
not setup - volontary  - to avoid problem that could interfere with usb ). 
Ony after having done my usb test, i have stopped the eth0, setup bond 
interface and restart bond interface and got the same problem with the same 
back trace.

Is there anything i can do to get you more precise information ?


Resend to the list : i have stripped lklm from the recipients......


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-28 20:37           ` Vincent ETIENNE
@ 2007-04-28 21:31             ` Andrew Morton
  2007-04-28 21:55               ` Patrick McHardy
  2007-04-28 22:04               ` Vincent ETIENNE
  0 siblings, 2 replies; 26+ messages in thread
From: Andrew Morton @ 2007-04-28 21:31 UTC (permalink / raw)
  To: Vincent ETIENNE; +Cc: Stephen Hemminger, netdev

On Sat, 28 Apr 2007 22:37:42 +0200 Vincent ETIENNE <ve@vetienne.net> wrote:

> Le Saturday 28 April 2007 00:32:37 Andrew Morton, vous avez écrit :
> > On Fri, 27 Apr 2007 22:05:28 +0200
> >
> 
> > because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2.
> > Are you sure that log is from 2.6.21-rc7-mm2?
> 
> 
> Yes. I have retested it another time ( for adding the small usb debug 
> message ) and get the same message a new time.
> 
> Maybe a tiny difference : only eth0 was setup during boot ( bond0 and eth1 was 
> not setup - volontary  - to avoid problem that could interfere with usb ). 
> Ony after having done my usb test, i have stopped the eth0, setup bond 
> interface and restart bond interface and got the same problem with the same 
> back trace.

OK, thanks.

> Is there anything i can do to get you more precise information ?

I guess if you could provide us with your .config and a step-by-step recipe
which developers should use to reproduce the problem then we should be able
to fix this pretty easily when someone finds the time to do so.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-28 21:31             ` Andrew Morton
@ 2007-04-28 21:55               ` Patrick McHardy
  2007-04-28 22:04               ` Vincent ETIENNE
  1 sibling, 0 replies; 26+ messages in thread
From: Patrick McHardy @ 2007-04-28 21:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Vincent ETIENNE, Stephen Hemminger, netdev

Andrew Morton wrote:
> On Sat, 28 Apr 2007 22:37:42 +0200 Vincent ETIENNE <ve@vetienne.net> wrote:
> 
>>>because we thought we'd fixed the rtnl_lock() problems in 2.6.21-rc7-mm2.
>>>Are you sure that log is from 2.6.21-rc7-mm2?
>>
>>
>>Yes. I have retested it another time ( for adding the small usb debug 
>>message ) and get the same message a new time.
>>
>>Maybe a tiny difference : only eth0 was setup during boot ( bond0 and eth1 was 
>>not setup - volontary  - to avoid problem that could interfere with usb ). 
>>Ony after having done my usb test, i have stopped the eth0, setup bond 
>>interface and restart bond interface and got the same problem with the same 
>>back trace.
> 
> 
> OK, thanks.


The failed RTNL assertion here is unrelated to the mutex initialization
bug in 2.6.1-rc7-mm1. The bonding driver is not holding the rtnl_mutex
while calling dev_set_mac_address (and can not since its called from
a timer). I believe this is a known problem to the bonding maintainers.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-28 21:31             ` Andrew Morton
  2007-04-28 21:55               ` Patrick McHardy
@ 2007-04-28 22:04               ` Vincent ETIENNE
  1 sibling, 0 replies; 26+ messages in thread
From: Vincent ETIENNE @ 2007-04-28 22:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Stephen Hemminger, netdev

Le Saturday 28 April 2007 23:31:02 Andrew Morton, vous avez écrit :
>
> > Is there anything i can do to get you more precise information ?
>
> I guess if you could provide us with your .config and a step-by-step recipe
> which developers should use to reproduce the problem then we should be able
> to fix this pretty easily when someone finds the time to do so.

For the config, i have uploaded it to 
http://mail1.vetienne.net/linux/config-2.6.21-rc7-mm2. 

For reproducing the problem, it 's simple (for me at least) :

	-  setup a bonding interface ( IP 192.168.1.5/255.255.255.0 so nothing 
fancy ) bond (mode 6 : alb ) on two NIC card (in my case  E1000 and TG3 ). 
Don't know if it's important or not but the two cards are connected to the 
same cheap 100Mb switch ( So no gigabit )

	- Reboot  : when the bond interface went up ( after the two physical slaves ) 
the bug is triggered ( for the moment always, but i have only rebooted 2 or 3 
times with 2.6.21-rc7-mm2  ).

	- You could also play with networks cable and put down/up one interface or 
another  during normal work : it would often triggered the same trace ( not 
always although ). You don't need to have load on the network. The interface 
that went down or up doesn't seem to be important as far as i can see.

	- No lockup of the kernel, network still work fine after the problem








^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [linux-usb-devel] USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-28 20:43               ` Vincent ETIENNE
@ 2007-04-29 11:18                 ` Jiri Kosina
  2007-04-29 13:40                   ` Vincent ETIENNE
  0 siblings, 1 reply; 26+ messages in thread
From: Jiri Kosina @ 2007-04-29 11:18 UTC (permalink / raw)
  To: Vincent ETIENNE
  Cc: Alan Stern, Greg KH, Andrew Morton, Linux Kernel, linux-usb-devel,
	Paul Walmsley

On Sat, 28 Apr 2007, Vincent ETIENNE wrote:

> > No, it isn't a problem in the USB core.  The device itself is messed up;
> > it really does report idVendor and idProduct both equal to 0.
> So is it just a scary trace but without consequence that i could ignored 
> ? May i ask you what is the device which is messed up ? ( Maybe should i 
> change it ? )

Hi Vincent,

yes, the device is messed up, but it shouldn't have any consequences for 
you - the HID driver is able to correctly handle that, so as soon as we 
don't need to add any extra quirks for such device, everything should be 
fine. I have removed the WARN_ON from the code in my tree. I think we 
still don't want users to add quirks for such broken devices (as it would 
collide with hid_blakclist[] terminator), so I have left the initial 
condition in usbhid_modify_dquirk() untouched.


From: Jiri Kosina <jkosina@suse.cz>

USB HID: don't warn on idVendor == 0

It turns out that there are broken devices out there that incorrectly
report VID/PID as 0x000, see http://lkml.org/lkml/2007/4/27/496

Therefore we should not confuse users by dumping warnings and stacktraces
in such situation. It is not possible to add quirks for such horribly
broken devices, but currently that's not needed.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>

diff --git a/drivers/hid/usbhid/hid-quirks.c b/drivers/hid/usbhid/hid-quirks.c
index 27188bd..17a8755 100644
--- a/drivers/hid/usbhid/hid-quirks.c
+++ b/drivers/hid/usbhid/hid-quirks.c
@@ -477,8 +477,6 @@ static struct hid_blacklist *usbhid_exis
 	struct quirks_list_struct *q;
 	struct hid_blacklist *bl_entry = NULL;
 
-	WARN_ON(idVendor == 0);
-
 	list_for_each_entry(q, &dquirks_list, node) {
 		if (q->hid_bl_item.idVendor == idVendor &&
 				q->hid_bl_item.idProduct == idProduct) {

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [linux-usb-devel] USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1)
  2007-04-29 11:18                 ` Jiri Kosina
@ 2007-04-29 13:40                   ` Vincent ETIENNE
  0 siblings, 0 replies; 26+ messages in thread
From: Vincent ETIENNE @ 2007-04-29 13:40 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Alan Stern, Greg KH, Andrew Morton, Linux Kernel, linux-usb-devel,
	Paul Walmsley

Le Sunday 29 April 2007 13:18:31 Jiri Kosina, vous avez écrit :
>
> Hi Vincent,

	Hi Jiri,

>
> yes, the device is messed up, but it shouldn't have any consequences for
> you - the HID driver is able to correctly handle that, so as soon as we
> don't need to add any extra quirks for such device, everything should be
> fine. I have removed the WARN_ON from the code in my tree. I think we
> still don't want users to add quirks for such broken devices (as it would
> collide with hid_blakclist[] terminator), so I have left the initial
> condition in usbhid_modify_dquirk() untouched.
>

Thanks a lot for the explanation and the patch, now i better understand 
the "problem". Sorry to have bother you with just noise. The patch will 
certainly be of some help with dumb user (who have dump hardware ) 
like me.

I have very appreciate the attention you ( and all other from this feed ) have 
take with me. So a big thanks for your work and the next time (if any), i 
will try to better analyse the problem to avoid unnecessary work.

Vincent

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Bonding-devel] [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-04-26 20:44   ` [Bonding-devel] " Jay Vosburgh
  2007-04-26 21:43     ` Vincent ETIENNE
@ 2007-05-09 18:47     ` Vincent ETIENNE
  2007-05-10 19:57       ` Jay Vosburgh
  1 sibling, 1 reply; 26+ messages in thread
From: Vincent ETIENNE @ 2007-05-09 18:47 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Chris Snook, Linux Kernel, bonding-devel, Andy Gospodarek

Le Thursday 26 April 2007 22:44:59 Jay Vosburgh, vous avez écrit :
> Chris Snook <csnook@redhat.com> wrote:
> >Vincent ETIENNE wrote:
> >> Hi,
> >>
> >>  Summary :
> >> 	Got this trace when one network interface come down or up in a 2
> >> 	interfaces bonding.	So far, system seems to survive to this problem
> >> 	and works fine.
> >

> 	This problem looks to be one of the known locking issues with
> bonding.
>
> 	Andy Gospodarek <andy@greyhouse.net> and I have been working
> offline on the locking issues in bonding over the last several weeks.
> At the moment, we have a generally stable (but ugly with debug fluff and
> other yuckies) patch that seems to resolve at least the majority of the
> various issues.  I'm thinking to clean it up for general posting early
> next week, and address additional problems from there (since it's
> hopefully at least a big step forward).
>
> 	-J


Any news and/or patch concerning this problem ?. Not intended to stress you 
but time goes on... Do you think a solution could be found in a relatively 
short timeframe (and i will delay a bit final installation) or is it better 
to go without bonding and do an upgrade later ?.

	Vincent

>
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Bonding-devel] [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1
  2007-05-09 18:47     ` Vincent ETIENNE
@ 2007-05-10 19:57       ` Jay Vosburgh
  0 siblings, 0 replies; 26+ messages in thread
From: Jay Vosburgh @ 2007-05-10 19:57 UTC (permalink / raw)
  To: Vincent ETIENNE; +Cc: Chris Snook, netdev, bonding-devel

Vincent ETIENNE <ve@vetienne.net> wrote:

>Any news and/or patch concerning this problem ?. Not intended to stress you 
>but time goes on... Do you think a solution could be found in a relatively 
>short timeframe (and i will delay a bit final installation) or is it better 
>to go without bonding and do an upgrade later ?.

	Our working patch is appended; I'm getting a bunch of warnings
from the lockdep stuff that I'm sorting through.  I'm not sure if
they're real concerns or just misconfiguration of lockdep itself.  This
doesn't fix everything (e.g., enslaving a VLAN device), but covers most
of the usual trouble cases.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 7e03f41..c8394a0 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2097,8 +2097,10 @@ void bond_3ad_unbind_slave(struct slave *slave)
  * times out, and it selects an aggregator for the ports that are yet not
  * related to any aggregator, and selects the active aggregator for a bond.
  */
-void bond_3ad_state_machine_handler(struct bonding *bond)
+void bond_3ad_state_machine_handler(struct work_struct *work)
 {
+	struct bonding *bond = container_of(work, struct bonding,
+					    ad_work.work);
 	struct port *port;
 	struct aggregator *aggregator;
 
@@ -2149,7 +2151,7 @@ void bond_3ad_state_machine_handler(struct bonding *bond)
 	}
 
 re_arm:
-	mod_timer(&(BOND_AD_INFO(bond).ad_timer), jiffies + ad_delta_in_ticks);
+	queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
 out:
 	read_unlock(&bond->lock);
 }
diff --git a/drivers/net/bonding/bond_3ad.h b/drivers/net/bonding/bond_3ad.h
index 6ad5ad6..cf5142d 100644
--- a/drivers/net/bonding/bond_3ad.h
+++ b/drivers/net/bonding/bond_3ad.h
@@ -276,7 +276,7 @@ struct ad_slave_info {
 void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution, int lacp_fast);
 int  bond_3ad_bind_slave(struct slave *slave);
 void bond_3ad_unbind_slave(struct slave *slave);
-void bond_3ad_state_machine_handler(struct bonding *bond);
+void bond_3ad_state_machine_handler(struct work_struct *);
 void bond_3ad_adapter_speed_changed(struct slave *slave);
 void bond_3ad_adapter_duplex_changed(struct slave *slave);
 void bond_3ad_handle_link_change(struct slave *slave, char link);
diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 92c3b6f..4cdea61 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -42,7 +42,6 @@
 #include "bonding.h"
 #include "bond_alb.h"
 
-
 #define ALB_TIMER_TICKS_PER_SEC	    10	/* should be a divisor of HZ */
 #define BOND_TLB_REBALANCE_INTERVAL 10	/* In seconds, periodic re-balancing.
 					 * Used for division - never set
@@ -469,13 +468,13 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave)
 
 	_unlock_rx_hashtbl(bond);
 
-	write_lock(&bond->curr_slave_lock);
+	write_lock_bh(&bond->curr_slave_lock);
 
 	if (slave != bond->curr_active_slave) {
 		rlb_teach_disabled_mac_on_primary(bond, slave->dev->dev_addr);
 	}
 
-	write_unlock(&bond->curr_slave_lock);
+	write_unlock_bh(&bond->curr_slave_lock);
 }
 
 static void rlb_update_client(struct rlb_client_info *client_info)
@@ -956,19 +955,33 @@ static int alb_set_slave_mac_addr(struct slave *slave, u8 addr[], int hw)
 	return 0;
 }
 
-/* Caller must hold bond lock for write or curr_slave_lock for write*/
+/*
+ * Swap MAC addresses between two slaves.
+ *
+ * Called with RTNL held, and no other locks.
+ */
+
 static void alb_swap_mac_addr(struct bonding *bond, struct slave *slave1, struct slave *slave2)
 {
-	struct slave *disabled_slave = NULL;
 	u8 tmp_mac_addr[ETH_ALEN];
-	int slaves_state_differ;
-
-	slaves_state_differ = (SLAVE_IS_OK(slave1) != SLAVE_IS_OK(slave2));
 
 	memcpy(tmp_mac_addr, slave1->dev->dev_addr, ETH_ALEN);
 	alb_set_slave_mac_addr(slave1, slave2->dev->dev_addr, bond->alb_info.rlb_enabled);
 	alb_set_slave_mac_addr(slave2, tmp_mac_addr, bond->alb_info.rlb_enabled);
 
+}
+
+/*
+ * Send learning packets after MAC address swap.
+ *
+ * Called with RTNL and bond->lock held for read.
+ */
+static void alb_fasten_mac_swap(struct bonding *bond, struct slave *slave1,
+				struct slave *slave2)
+{
+	int slaves_state_differ = (SLAVE_IS_OK(slave1) != SLAVE_IS_OK(slave2));
+	struct slave *disabled_slave = NULL;
+
 	/* fasten the change in the switch */
 	if (SLAVE_IS_OK(slave1)) {
 		alb_send_learning_packets(slave1, slave1->dev->dev_addr);
@@ -1041,7 +1054,9 @@ static void alb_change_hw_addr_on_detach(struct bonding *bond, struct slave *sla
 		}
 
 		if (found) {
+			/* XXX locking: needs RTNL and nothing else */
 			alb_swap_mac_addr(bond, slave, tmp_slave);
+			alb_fasten_mac_swap(bond, slave, tmp_slave);
 		}
 	}
 }
@@ -1373,8 +1388,10 @@ out:
 	return 0;
 }
 
-void bond_alb_monitor(struct bonding *bond)
+void bond_alb_monitor(struct work_struct *work)
 {
+	struct bonding *bond = container_of(work, struct bonding,
+					    alb_work.work);
 	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
 	struct slave *slave;
 	int i;
@@ -1477,7 +1494,7 @@ void bond_alb_monitor(struct bonding *bond)
 	}
 
 re_arm:
-	mod_timer(&(bond_info->alb_timer), jiffies + alb_delta_in_ticks);
+	queue_delayed_work(bond->wq, &bond->alb_work, alb_delta_in_ticks);
 out:
 	read_unlock(&bond->lock);
 }
@@ -1498,11 +1515,11 @@ int bond_alb_init_slave(struct bonding *bond, struct slave *slave)
 	/* caller must hold the bond lock for write since the mac addresses
 	 * are compared and may be swapped.
 	 */
-	write_lock_bh(&bond->lock);
+	read_lock(&bond->lock);
 
 	res = alb_handle_addr_collision_on_attach(bond, slave);
 
-	write_unlock_bh(&bond->lock);
+	read_unlock(&bond->lock);
 
 	if (res) {
 		return res;
@@ -1567,7 +1584,12 @@ void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char
  * Set the bond->curr_active_slave to @new_slave and handle
  * mac address swapping and promiscuity changes as needed.
  *
- * Caller must hold bond curr_slave_lock for write (or bond lock for write)
+ * If new_slave is NULL, caller must hold curr_slave_lock or
+ * bond->lock for write.
+ *
+ * If new_slave is not NULL, caller must hold RTNL, bond->lock for
+ * read and curr_slave_lock for write_bh.  Processing here may sleep, so
+ * no other locks may be held.
  */
 void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave)
 {
@@ -1606,19 +1628,30 @@ void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave
 		}
 	}
 
+	write_unlock_bh(&bond->curr_slave_lock);
+	read_unlock(&bond->lock);
+
 	/* curr_active_slave must be set before calling alb_swap_mac_addr */
-	if (swap_slave) {
-		/* swap mac address */
+	if (swap_slave)
 		alb_swap_mac_addr(bond, swap_slave, new_slave);
-	} else {
+	else
 		/* set the new_slave to the bond mac address */
 		alb_set_slave_mac_addr(new_slave, bond->dev->dev_addr,
 				       bond->alb_info.rlb_enabled);
+
+	read_lock(&bond->lock);
+	write_lock_bh(&bond->curr_slave_lock);
+
+	if (swap_slave)
+		alb_fasten_mac_swap(bond, swap_slave, new_slave);
+	else
 		/* fasten bond mac on new current slave */
 		alb_send_learning_packets(new_slave, bond->dev->dev_addr);
-	}
 }
 
+/*
+ * Called with RTNL
+ */
 int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 {
 	struct bonding *bond = bond_dev->priv;
@@ -1655,8 +1688,12 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 		}
 	}
 
+	write_unlock_bh(&bond->curr_slave_lock);
+	read_unlock(&bond->lock);
+
 	if (swap_slave) {
 		alb_swap_mac_addr(bond, swap_slave, bond->curr_active_slave);
+		alb_fasten_mac_swap(bond, swap_slave, bond->curr_active_slave);
 	} else {
 		alb_set_slave_mac_addr(bond->curr_active_slave, bond_dev->dev_addr,
 				       bond->alb_info.rlb_enabled);
@@ -1668,6 +1705,9 @@ int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr)
 		}
 	}
 
+	read_lock(&bond->lock);
+	write_lock_bh(&bond->curr_slave_lock);
+
 	return 0;
 }
 
diff --git a/drivers/net/bonding/bond_alb.h b/drivers/net/bonding/bond_alb.h
index 28f2a2f..5a37e43 100644
--- a/drivers/net/bonding/bond_alb.h
+++ b/drivers/net/bonding/bond_alb.h
@@ -125,7 +125,7 @@ void bond_alb_deinit_slave(struct bonding *bond, struct slave *slave);
 void bond_alb_handle_link_change(struct bonding *bond, struct slave *slave, char link);
 void bond_alb_handle_active_change(struct bonding *bond, struct slave *new_slave);
 int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev);
-void bond_alb_monitor(struct bonding *bond);
+void bond_alb_monitor(struct work_struct *);
 int bond_alb_set_mac_address(struct net_device *bond_dev, void *addr);
 void bond_alb_clear_vlan(struct bonding *bond, unsigned short vlan_id);
 #endif /* __BOND_ALB_H__ */
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 724bce5..7c003fc 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1,4 +1,4 @@
-/*
+ /*
  * originally based on the dummy device.
  *
  * Copyright 1999, Thomas Davis, tadavis@lbl.gov.
@@ -1567,7 +1567,15 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 			/* first slave or no active slave yet, and this link
 			 * is OK, so make this interface the active one
 			 */
+			write_unlock_bh(&bond->lock);
+			read_lock(&bond->lock);
+			write_lock_bh(&bond->curr_slave_lock);
+
 			bond_change_active_slave(bond, new_slave);
+
+			write_unlock_bh(&bond->curr_slave_lock);
+			read_unlock(&bond->lock);
+			write_lock_bh(&bond->lock);
 		} else {
 			bond_set_slave_inactive_flags(new_slave);
 		}
@@ -1729,9 +1737,23 @@ int bond_release(struct net_device *bond_dev, struct net_device *slave_dev)
 		bond_alb_deinit_slave(bond, slave);
 	}
 
-	if (oldcurrent == slave)
+	if (oldcurrent == slave) {
+		/*
+		 * Note that we hold RTNL over this sequence, so there
+		 * is no concern that another slave add/remove event
+		 * will interfere.
+		 */
+		write_unlock_bh(&bond->lock);
+		read_lock(&bond->lock);
+		write_lock_bh(&bond->curr_slave_lock);
+
 		bond_select_active_slave(bond);
 
+		write_unlock_bh(&bond->curr_slave_lock);
+		read_unlock(&bond->lock);
+		write_lock_bh(&bond->lock);
+	}
+
 	if (bond->slave_cnt == 0) {
 		bond_set_carrier(bond);
 
@@ -1954,16 +1976,19 @@ static int bond_ioctl_change_active(struct net_device *bond_dev, struct net_devi
 		return -EINVAL;
 	}
 
-	write_lock_bh(&bond->lock);
+	read_lock(&bond->lock);
 
+	read_lock(&bond->curr_slave_lock);
 	old_active = bond->curr_active_slave;
+	read_unlock(&bond->curr_slave_lock);
+
 	new_active = bond_get_slave_by_dev(bond, slave_dev);
 
 	/*
 	 * Changing to the current active: do nothing; return success.
 	 */
 	if (new_active && (new_active == old_active)) {
-		write_unlock_bh(&bond->lock);
+		read_unlock(&bond->lock);
 		return 0;
 	}
 
@@ -1971,12 +1996,14 @@ static int bond_ioctl_change_active(struct net_device *bond_dev, struct net_devi
 	    (old_active) &&
 	    (new_active->link == BOND_LINK_UP) &&
 	    IS_UP(new_active->dev)) {
+		write_lock_bh(&bond->curr_slave_lock);
 		bond_change_active_slave(bond, new_active);
+		write_unlock_bh(&bond->curr_slave_lock);
 	} else {
 		res = -EINVAL;
 	}
 
-	write_unlock_bh(&bond->lock);
+	read_unlock(&bond->lock);
 
 	return res;
 }
@@ -1988,9 +2015,9 @@ static int bond_info_query(struct net_device *bond_dev, struct ifbond *info)
 	info->bond_mode = bond->params.mode;
 	info->miimon = bond->params.miimon;
 
-	read_lock_bh(&bond->lock);
+	read_lock(&bond->lock);
 	info->num_slaves = bond->slave_cnt;
-	read_unlock_bh(&bond->lock);
+	read_unlock(&bond->lock);
 
 	return 0;
 }
@@ -2005,7 +2032,7 @@ static int bond_slave_info_query(struct net_device *bond_dev, struct ifslave *in
 		return -ENODEV;
 	}
 
-	read_lock_bh(&bond->lock);
+	read_lock(&bond->lock);
 
 	bond_for_each_slave(bond, slave, i) {
 		if (i == (int)info->slave_id) {
@@ -2014,7 +2041,7 @@ static int bond_slave_info_query(struct net_device *bond_dev, struct ifslave *in
 		}
 	}
 
-	read_unlock_bh(&bond->lock);
+	read_unlock(&bond->lock);
 
 	if (found) {
 		strcpy(info->slave_name, slave->dev->name);
@@ -2028,28 +2055,52 @@ static int bond_slave_info_query(struct net_device *bond_dev, struct ifslave *in
 	return 0;
 }
 
+
+/*-------------------------------- Workqueues -------------------------------*/
+
+void bond_work_cancel_all(struct bonding *bond)
+{
+	write_lock_bh(&bond->lock);
+	bond->kill_timers = 1;
+	write_unlock_bh(&bond->lock);
+
+	if (bond->params.miimon && delayed_work_pending(&bond->mii_work))
+		cancel_delayed_work(&bond->mii_work);
+
+	if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work))
+		cancel_delayed_work(&bond->arp_work);
+
+	if (bond->params.mode == BOND_MODE_ALB &&
+	    delayed_work_pending(&bond->alb_work))
+		cancel_delayed_work(&bond->alb_work);
+
+	if (bond->params.mode == BOND_MODE_8023AD &&
+	    delayed_work_pending(&bond->ad_work))
+		cancel_delayed_work(&bond->ad_work);
+}
+
 /*-------------------------------- Monitoring -------------------------------*/
 
-/* this function is called regularly to monitor each slave's link. */
-void bond_mii_monitor(struct net_device *bond_dev)
+/*
+ * if !have_locks, return nonzero if a failover is necessary.
+ * if have_locks, do whatever failover activities are needed.
+ *
+ * This is to separate the inspection and failover steps for locking
+ * purposes; failover requires rtnl, but acquiring it for every
+ * inspection is undesirable, so an initial call does inspection, and
+ * the wrapper acquires the necessary locks and calls again to perform
+ * failover if needed.  Since all locks are dropped, a complete
+ * restart is needed between calls.
+ */
+static int __bond_mii_monitor(struct bonding *bond, int have_locks)
 {
-	struct bonding *bond = bond_dev->priv;
+	struct net_device *bond_dev = bond->dev;
 	struct slave *slave, *oldcurrent;
 	int do_failover = 0;
-	int delta_in_ticks;
 	int i;
 
-	read_lock(&bond->lock);
-
-	delta_in_ticks = (bond->params.miimon * HZ) / 1000;
-
-	if (bond->kill_timers) {
+	if (bond->slave_cnt == 0)
 		goto out;
-	}
-
-	if (bond->slave_cnt == 0) {
-		goto re_arm;
-	}
 
 	/* we will try to read the link status of each of our slaves, and
 	 * set their IFF_RUNNING flag appropriately. For each slave not
@@ -2105,6 +2156,9 @@ void bond_mii_monitor(struct net_device *bond_dev)
 			if (link_state != BMSR_LSTATUS) {
 				/* link stays down */
 				if (slave->delay <= 0) {
+					if (!have_locks)
+						return 1;
+
 					/* link down for too long time */
 					slave->link = BOND_LINK_DOWN;
 
@@ -2188,6 +2242,9 @@ void bond_mii_monitor(struct net_device *bond_dev)
 			} else {
 				/* link stays up */
 				if (slave->delay == 0) {
+					if (!have_locks)
+						return 1;
+
 					/* now the link has been up for long time enough */
 					slave->link = BOND_LINK_UP;
 					slave->jiffies = jiffies;
@@ -2253,22 +2310,39 @@ void bond_mii_monitor(struct net_device *bond_dev)
 	} /* end of for */
 
 	if (do_failover) {
-		write_lock(&bond->curr_slave_lock);
-
+		write_lock_bh(&bond->curr_slave_lock);
 		bond_select_active_slave(bond);
-
-		write_unlock(&bond->curr_slave_lock);
+		write_unlock_bh(&bond->curr_slave_lock);
 	} else
 		bond_set_carrier(bond);
 
-re_arm:
-	if (bond->params.miimon) {
-		mod_timer(&bond->mii_timer, jiffies + delta_in_ticks);
-	}
 out:
-	read_unlock(&bond->lock);
+	return 0;
 }
 
+void bond_mii_monitor(struct work_struct *work)
+{
+	struct bonding *bond = container_of(work, struct bonding,
+					    mii_work.work);
+	unsigned long delay;
+
+	read_lock(&bond->lock);
+	if (bond->kill_timers) {
+		read_unlock(&bond->lock);
+		return;
+	}
+	if (__bond_mii_monitor(bond, 0)) {
+		read_unlock(&bond->lock);
+		rtnl_lock();
+		read_lock(&bond->lock);
+		__bond_mii_monitor(bond, 1);
+		rtnl_unlock();
+	}
+
+	delay = ((bond->params.miimon * HZ) / 1000) ? : 1;
+	read_unlock(&bond->lock);
+	queue_delayed_work(bond->wq, &bond->mii_work, delay);
+}
 
 static u32 bond_glean_dev_ip(struct net_device *dev)
 {
@@ -2564,9 +2638,11 @@ out:
  * arp is transmitted to generate traffic. see activebackup_arp_monitor for
  * arp monitoring in active backup mode.
  */
-void bond_loadbalance_arp_mon(struct net_device *bond_dev)
+void bond_loadbalance_arp_mon(struct work_struct *work)
 {
-	struct bonding *bond = bond_dev->priv;
+	struct bonding *bond = container_of(work, struct bonding,
+					    arp_work.work);
+	struct net_device *bond_dev = bond->dev;
 	struct slave *slave, *oldcurrent;
 	int do_failover = 0;
 	int delta_in_ticks;
@@ -2665,17 +2741,16 @@ void bond_loadbalance_arp_mon(struct net_device *bond_dev)
 	}
 
 	if (do_failover) {
-		write_lock(&bond->curr_slave_lock);
+		write_lock_bh(&bond->curr_slave_lock);
 
 		bond_select_active_slave(bond);
 
-		write_unlock(&bond->curr_slave_lock);
+		write_unlock_bh(&bond->curr_slave_lock);
 	}
 
 re_arm:
-	if (bond->params.arp_interval) {
-		mod_timer(&bond->arp_timer, jiffies + delta_in_ticks);
-	}
+	if (bond->params.arp_interval)
+		queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
 out:
 	read_unlock(&bond->lock);
 }
@@ -2695,9 +2770,11 @@ out:
  * may have received.
  * see loadbalance_arp_monitor for arp monitoring in load balancing mode
  */
-void bond_activebackup_arp_mon(struct net_device *bond_dev)
+void bond_activebackup_arp_mon(struct work_struct *work)
 {
-	struct bonding *bond = bond_dev->priv;
+	struct bonding *bond = container_of(work, struct bonding,
+					    arp_work.work);
+	struct net_device *bond_dev = bond->dev;
 	struct slave *slave;
 	int delta_in_ticks;
 	int i;
@@ -2726,7 +2803,7 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev)
 
 				slave->link = BOND_LINK_UP;
 
-				write_lock(&bond->curr_slave_lock);
+				write_lock_bh(&bond->curr_slave_lock);
 
 				if ((!bond->curr_active_slave) &&
 				    ((jiffies - slave->dev->trans_start) <= delta_in_ticks)) {
@@ -2760,7 +2837,7 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev)
 					       slave->dev->name);
 				}
 
-				write_unlock(&bond->curr_slave_lock);
+				write_unlock_bh(&bond->curr_slave_lock);
 			}
 		} else {
 			read_lock(&bond->curr_slave_lock);
@@ -2830,12 +2907,12 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev)
 			       bond_dev->name,
 			       slave->dev->name);
 
-			write_lock(&bond->curr_slave_lock);
+			write_lock_bh(&bond->curr_slave_lock);
 
 			bond_select_active_slave(bond);
 			slave = bond->curr_active_slave;
 
-			write_unlock(&bond->curr_slave_lock);
+			write_unlock_bh(&bond->curr_slave_lock);
 
 			bond->current_arp_slave = slave;
 
@@ -2854,9 +2931,9 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev)
 			       bond->primary_slave->dev->name);
 
 			/* primary is up so switch to it */
-			write_lock(&bond->curr_slave_lock);
+			write_lock_bh(&bond->curr_slave_lock);
 			bond_change_active_slave(bond, bond->primary_slave);
-			write_unlock(&bond->curr_slave_lock);
+			write_unlock_bh(&bond->curr_slave_lock);
 
 			slave = bond->primary_slave;
 			slave->jiffies = jiffies;
@@ -2921,9 +2998,8 @@ void bond_activebackup_arp_mon(struct net_device *bond_dev)
 	}
 
 re_arm:
-	if (bond->params.arp_interval) {
-		mod_timer(&bond->arp_timer, jiffies + delta_in_ticks);
-	}
+	if (bond->params.arp_interval)
+		queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
 out:
 	read_unlock(&bond->lock);
 }
@@ -2943,7 +3019,7 @@ static void *bond_info_seq_start(struct seq_file *seq, loff_t *pos)
 
 	/* make sure the bond won't be taken away */
 	read_lock(&dev_base_lock);
-	read_lock_bh(&bond->lock);
+	read_lock(&bond->lock);
 
 	if (*pos == 0) {
 		return SEQ_START_TOKEN;
@@ -2977,7 +3053,7 @@ static void bond_info_seq_stop(struct seq_file *seq, void *v)
 {
 	struct bonding *bond = seq->private;
 
-	read_unlock_bh(&bond->lock);
+	read_unlock(&bond->lock);
 	read_unlock(&dev_base_lock);
 }
 
@@ -3503,14 +3579,11 @@ static int bond_xmit_hash_policy_l2(struct sk_buff *skb,
 static int bond_open(struct net_device *bond_dev)
 {
 	struct bonding *bond = bond_dev->priv;
-	struct timer_list *mii_timer = &bond->mii_timer;
-	struct timer_list *arp_timer = &bond->arp_timer;
 
 	bond->kill_timers = 0;
 
 	if ((bond->params.mode == BOND_MODE_TLB) ||
 	    (bond->params.mode == BOND_MODE_ALB)) {
-		struct timer_list *alb_timer = &(BOND_ALB_INFO(bond).alb_timer);
 
 		/* bond_alb_initialize must be called before the timer
 		 * is started.
@@ -3520,44 +3593,33 @@ static int bond_open(struct net_device *bond_dev)
 			return -1;
 		}
 
-		init_timer(alb_timer);
-		alb_timer->expires  = jiffies + 1;
-		alb_timer->data     = (unsigned long)bond;
-		alb_timer->function = (void *)&bond_alb_monitor;
-		add_timer(alb_timer);
+		INIT_DELAYED_WORK(&bond->alb_work, bond_alb_monitor);
+		queue_delayed_work(bond->wq, &bond->alb_work, 0);
 	}
 
 	if (bond->params.miimon) {  /* link check interval, in milliseconds. */
-		init_timer(mii_timer);
-		mii_timer->expires  = jiffies + 1;
-		mii_timer->data     = (unsigned long)bond_dev;
-		mii_timer->function = (void *)&bond_mii_monitor;
-		add_timer(mii_timer);
+		INIT_DELAYED_WORK(&bond->mii_work, bond_mii_monitor);
+		queue_delayed_work(bond->wq, &bond->mii_work, 0);
 	}
 
 	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
-		init_timer(arp_timer);
-		arp_timer->expires  = jiffies + 1;
-		arp_timer->data     = (unsigned long)bond_dev;
-		if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) {
-			arp_timer->function = (void *)&bond_activebackup_arp_mon;
-		} else {
-			arp_timer->function = (void *)&bond_loadbalance_arp_mon;
-		}
+
+		if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+			INIT_DELAYED_WORK(&bond->arp_work,
+					  bond_activebackup_arp_mon);
+		else
+			INIT_DELAYED_WORK(&bond->arp_work,
+					  bond_loadbalance_arp_mon);
+
+		queue_delayed_work(bond->wq, &bond->arp_work, 0);
 		if (bond->params.arp_validate)
 			bond_register_arp(bond);
-
-		add_timer(arp_timer);
 	}
 
 	if (bond->params.mode == BOND_MODE_8023AD) {
-		struct timer_list *ad_timer = &(BOND_AD_INFO(bond).ad_timer);
-		init_timer(ad_timer);
-		ad_timer->expires  = jiffies + 1;
-		ad_timer->data     = (unsigned long)bond;
-		ad_timer->function = (void *)&bond_3ad_state_machine_handler;
-		add_timer(ad_timer);
-
+		INIT_DELAYED_WORK(&bond->ad_work,
+				  bond_3ad_state_machine_handler);
+		queue_delayed_work(bond->wq, &bond->ad_work, 0);
 		/* register to receive LACPDUs */
 		bond_register_lacpdu(bond);
 	}
@@ -3579,36 +3641,36 @@ static int bond_close(struct net_device *bond_dev)
 
 	write_lock_bh(&bond->lock);
 
-
 	/* signal timers not to re-arm */
 	bond->kill_timers = 1;
 
 	write_unlock_bh(&bond->lock);
 
-	/* del_timer_sync must run without holding the bond->lock
-	 * because a running timer might be trying to hold it too
+	/*
+	 * Release the lock here since cancel_delayed_work will take
+	 * it again and releasing it will give scheduled work a chance
+	 * to run.
 	 */
 
-	if (bond->params.miimon) {  /* link check interval, in milliseconds. */
-		del_timer_sync(&bond->mii_timer);
-	}
+	if (bond->params.miimon && delayed_work_pending(&bond->mii_work))
+		cancel_delayed_work(&bond->mii_work);
 
-	if (bond->params.arp_interval) {  /* arp interval, in milliseconds. */
-		del_timer_sync(&bond->arp_timer);
-	}
+	if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work))
+		cancel_delayed_work(&bond->arp_work);
 
 	switch (bond->params.mode) {
 	case BOND_MODE_8023AD:
-		del_timer_sync(&(BOND_AD_INFO(bond).ad_timer));
+		if (delayed_work_pending(&bond->ad_work))
+			cancel_delayed_work(&bond->ad_work);
 		break;
 	case BOND_MODE_TLB:
 	case BOND_MODE_ALB:
-		del_timer_sync(&(BOND_ALB_INFO(bond).alb_timer));
+		if (delayed_work_pending(&bond->alb_work))
+			cancel_delayed_work(&bond->alb_work);
 		break;
 	default:
 		break;
-	}
-
+        }
 
 	if ((bond->params.mode == BOND_MODE_TLB) ||
 	    (bond->params.mode == BOND_MODE_ALB)) {
@@ -3630,7 +3692,7 @@ static struct net_device_stats *bond_get_stats(struct net_device *bond_dev)
 
 	memset(stats, 0, sizeof(struct net_device_stats));
 
-	read_lock_bh(&bond->lock);
+	read_lock(&bond->lock);
 
 	bond_for_each_slave(bond, slave, i) {
 		sstats = slave->dev->get_stats(slave->dev);
@@ -3661,7 +3723,7 @@ static struct net_device_stats *bond_get_stats(struct net_device *bond_dev)
 		stats->tx_window_errors += sstats->tx_window_errors;
 	}
 
-	read_unlock_bh(&bond->lock);
+	read_unlock(&bond->lock);
 
 	return stats;
 }
@@ -3700,13 +3762,13 @@ static int bond_do_ioctl(struct net_device *bond_dev, struct ifreq *ifr, int cmd
 		if (mii->reg_num == 1) {
 			struct bonding *bond = bond_dev->priv;
 			mii->val_out = 0;
-			read_lock_bh(&bond->lock);
+			read_lock(&bond->lock);
 			read_lock(&bond->curr_slave_lock);
 			if (netif_carrier_ok(bond->dev)) {
 				mii->val_out = BMSR_LSTATUS;
 			}
 			read_unlock(&bond->curr_slave_lock);
-			read_unlock_bh(&bond->lock);
+			read_unlock(&bond->lock);
 		}
 
 		return 0;
@@ -3991,7 +4053,7 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
 {
 	struct bonding *bond = bond_dev->priv;
 	struct slave *slave, *start_at;
-	int i;
+	int i, slave_no;
 	int res = 1;
 
 	read_lock(&bond->lock);
@@ -4000,29 +4062,30 @@ static int bond_xmit_roundrobin(struct sk_buff *skb, struct net_device *bond_dev
 		goto out;
 	}
 
-	read_lock(&bond->curr_slave_lock);
-	slave = start_at = bond->curr_active_slave;
-	read_unlock(&bond->curr_slave_lock);
+	/*
+	 * Concurrent TX may collide on rr_tx_counter; we accept that
+	 * as being rare enough to not justify using an atomic op here
+	 */
+	slave_no = bond->rr_tx_counter++ % bond->slave_cnt;
 
-	if (!slave) {
-		goto out;
+	bond_for_each_slave(bond, slave, i) {
+		slave_no--;
+		if (slave_no < 0) {
+			break;
+		}
 	}
 
+	start_at = slave;
+
 	bond_for_each_slave_from(bond, slave, i, start_at) {
 		if (IS_UP(slave->dev) &&
 		    (slave->link == BOND_LINK_UP) &&
 		    (slave->state == BOND_STATE_ACTIVE)) {
 			res = bond_dev_queue_xmit(bond, skb, slave->dev);
-
-			write_lock(&bond->curr_slave_lock);
-			bond->curr_active_slave = slave->next;
-			write_unlock(&bond->curr_slave_lock);
-
 			break;
 		}
 	}
 
-
 out:
 	if (res) {
 		/* no suitable interface, frame not sent */
@@ -4258,6 +4321,12 @@ static int bond_init(struct net_device *bond_dev, struct bond_params *params)
 
 	bond->params = *params; /* copy params struct */
 
+	/* initialize individual workqueue */
+        bond->wq = create_singlethread_workqueue(bond_dev->name);
+
+        if (!bond->wq)
+		return -ENOMEM;
+
 	/* Initialize pointers */
 	bond->first_slave = NULL;
 	bond->curr_active_slave = NULL;
@@ -4743,6 +4812,7 @@ static int __init bonding_init(void)
 {
 	int i;
 	int res;
+	struct bonding *bond, *nxt;
 
 	printk(KERN_INFO "%s", version);
 
@@ -4769,10 +4839,16 @@ static int __init bonding_init(void)
 
 	goto out;
 err:
+	list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) {
+		bond_work_cancel_all(bond);
+		destroy_workqueue(bond->wq);
+	}
+
 	rtnl_lock();
 	bond_free_all();
 	bond_destroy_sysfs();
 	rtnl_unlock();
+
 out:
 	return res;
 
@@ -4780,9 +4856,18 @@ out:
 
 static void __exit bonding_exit(void)
 {
+	struct bonding *bond, *nxt;
+
 	unregister_netdevice_notifier(&bond_netdev_notifier);
 	unregister_inetaddr_notifier(&bond_inetaddr_notifier);
 
+	list_for_each_entry_safe(bond, nxt, &bond_dev_list, bond_list) {
+		if (bond && bond->dev) {
+			bond_work_cancel_all(bond);
+			destroy_workqueue(bond->wq);
+		}
+	}
+
 	rtnl_lock();
 	bond_free_all();
 	bond_destroy_sysfs();
diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index a122baa..ee17fd4 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -231,7 +231,7 @@ static ssize_t bonding_show_slaves(struct device *d,
 	int i, res = 0;
 	struct bonding *bond = to_bond(d);
 
-	read_lock_bh(&bond->lock);
+	read_lock(&bond->lock);
 	bond_for_each_slave(bond, slave, i) {
 		if (res > (PAGE_SIZE - IFNAMSIZ)) {
 			/* not enough space for another interface name */
@@ -242,7 +242,7 @@ static ssize_t bonding_show_slaves(struct device *d,
 		}
 		res += sprintf(buf + res, "%s ", slave->dev->name);
 	}
-	read_unlock_bh(&bond->lock);
+	read_unlock(&bond->lock);
 	res += sprintf(buf + res, "\n");
 	res++;
 	return res;
@@ -285,18 +285,18 @@ static ssize_t bonding_store_slaves(struct device *d,
 
 		/* Got a slave name in ifname.  Is it already in the list? */
 		found = 0;
-		read_lock_bh(&bond->lock);
+		read_lock(&bond->lock);
 		bond_for_each_slave(bond, slave, i)
 			if (strnicmp(slave->dev->name, ifname, IFNAMSIZ) == 0) {
 				printk(KERN_ERR DRV_NAME
 				       ": %s: Interface %s is already enslaved!\n",
 				       bond->dev->name, ifname);
 				ret = -EPERM;
-				read_unlock_bh(&bond->lock);
+				read_unlock(&bond->lock);
 				goto out;
 			}
 
-		read_unlock_bh(&bond->lock);
+		read_unlock(&bond->lock);
 		printk(KERN_INFO DRV_NAME ": %s: Adding slave %s.\n",
 		       bond->dev->name, ifname);
 		dev = dev_get_by_name(ifname);
@@ -610,11 +610,9 @@ static ssize_t bonding_store_arp_interval(struct device *d,
 		       bond->dev->name, bond->dev->name);
 		bond->params.miimon = 0;
 		/* Kill MII timer, else it brings bond's link down */
-		if (bond->arp_timer.function) {
-			printk(KERN_INFO DRV_NAME
-			": %s: Kill MII timer, else it brings bond's link down...\n",
-		       bond->dev->name);
-			del_timer_sync(&bond->mii_timer);
+		if (delayed_work_pending(&bond->mii_work)) {
+			cancel_delayed_work(&bond->mii_work);
+			flush_workqueue(bond->wq);
 		}
 	}
 	if (!bond->params.arp_targets[0]) {
@@ -629,25 +627,15 @@ static ssize_t bonding_store_arp_interval(struct device *d,
 		 * timer will get fired off when the open function
 		 * is called.
 		 */
-		if (bond->arp_timer.function) {
-			/* The timer's already set up, so fire it off */
-			mod_timer(&bond->arp_timer, jiffies + 1);
-		} else {
-			/* Set up the timer. */
-			init_timer(&bond->arp_timer);
-			bond->arp_timer.expires = jiffies + 1;
-			bond->arp_timer.data =
-				(unsigned long) bond->dev;
-			if (bond->params.mode == BOND_MODE_ACTIVEBACKUP) {
-				bond->arp_timer.function =
-					(void *)
-					&bond_activebackup_arp_mon;
-			} else {
-				bond->arp_timer.function =
-					(void *)
-					&bond_loadbalance_arp_mon;
-			}
-			add_timer(&bond->arp_timer);
+		if (!delayed_work_pending(&bond->arp_work)) {
+			if (bond->params.mode == BOND_MODE_ACTIVEBACKUP)
+				INIT_DELAYED_WORK(&bond->arp_work,
+						  bond_activebackup_arp_mon);
+			else
+				INIT_DELAYED_WORK(&bond->arp_work,
+						  bond_loadbalance_arp_mon);
+
+			queue_delayed_work(bond->wq, &bond->arp_work, 0);
 		}
 	}
 
@@ -1004,11 +992,12 @@ static ssize_t bonding_store_miimon(struct device *d,
 					BOND_ARP_VALIDATE_NONE;
 			}
 			/* Kill ARP timer, else it brings bond's link down */
-			if (bond->mii_timer.function) {
+			if (delayed_work_pending(&bond->arp_work)) {
 				printk(KERN_INFO DRV_NAME
 				": %s: Kill ARP timer, else it brings bond's link down...\n",
 			       bond->dev->name);
-				del_timer_sync(&bond->arp_timer);
+				cancel_delayed_work(&bond->arp_work);
+				flush_workqueue(bond->wq);
 			}
 		}
 
@@ -1018,18 +1007,11 @@ static ssize_t bonding_store_miimon(struct device *d,
 			 * timer will get fired off when the open function
 			 * is called.
 			 */
-			if (bond->mii_timer.function) {
-				/* The timer's already set up, so fire it off */
-				mod_timer(&bond->mii_timer, jiffies + 1);
-			} else {
-				/* Set up the timer. */
-				init_timer(&bond->mii_timer);
-				bond->mii_timer.expires = jiffies + 1;
-				bond->mii_timer.data =
-					(unsigned long) bond->dev;
-				bond->mii_timer.function =
-					(void *) &bond_mii_monitor;
-				add_timer(&bond->mii_timer);
+			if (!delayed_work_pending(&bond->mii_work)) {
+				INIT_DELAYED_WORK(&bond->mii_work,
+						  bond_mii_monitor);
+				queue_delayed_work(bond->wq,
+						   &bond->mii_work, 0);
 			}
 		}
 	}
@@ -1068,6 +1050,7 @@ static ssize_t bonding_store_primary(struct device *d,
 	struct slave *slave;
 	struct bonding *bond = to_bond(d);
 
+	rtnl_lock();
 	write_lock_bh(&bond->lock);
 	if (!USES_PRIMARY(bond->params.mode)) {
 		printk(KERN_INFO DRV_NAME
@@ -1103,6 +1086,8 @@ static ssize_t bonding_store_primary(struct device *d,
 	}
 out:
 	write_unlock_bh(&bond->lock);
+	rtnl_unlock();
+
 	return count;
 }
 static DEVICE_ATTR(primary, S_IRUGO | S_IWUSR, bonding_show_primary, bonding_store_primary);
@@ -1182,6 +1167,7 @@ static ssize_t bonding_store_active_slave(struct device *d,
         struct slave *new_active = NULL;
 	struct bonding *bond = to_bond(d);
 
+	rtnl_lock();
 	write_lock_bh(&bond->lock);
 	if (!USES_PRIMARY(bond->params.mode)) {
 		printk(KERN_INFO DRV_NAME
@@ -1239,6 +1225,8 @@ static ssize_t bonding_store_active_slave(struct device *d,
 	}
 out:
 	write_unlock_bh(&bond->lock);
+	rtnl_unlock();
+
 	return count;
 
 }
diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
index 41aa78b..d8a23b0 100644
--- a/drivers/net/bonding/bonding.h
+++ b/drivers/net/bonding/bonding.h
@@ -182,9 +182,8 @@ struct bonding {
 	s32      slave_cnt; /* never change this value outside the attach/detach wrappers */
 	rwlock_t lock;
 	rwlock_t curr_slave_lock;
-	struct   timer_list mii_timer;
-	struct   timer_list arp_timer;
 	s8       kill_timers;
+	u16	 rr_tx_counter;
 	struct   net_device_stats stats;
 #ifdef CONFIG_PROC_FS
 	struct   proc_dir_entry *proc_entry;
@@ -201,6 +200,11 @@ struct bonding {
 	struct   list_head vlan_list;
 	struct   vlan_group *vlgrp;
 	struct   packet_type arp_mon_pt;
+	struct   workqueue_struct *wq;
+	struct   delayed_work mii_work;
+	struct   delayed_work arp_work;
+	struct   delayed_work alb_work;
+	struct   delayed_work ad_work;
 };
 
 /**
@@ -302,9 +306,9 @@ void bond_destroy_slave_symlinks(struct net_device *master, struct net_device *s
 int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev);
 int bond_release(struct net_device *bond_dev, struct net_device *slave_dev);
 int bond_sethwaddr(struct net_device *bond_dev, struct net_device *slave_dev);
-void bond_mii_monitor(struct net_device *bond_dev);
-void bond_loadbalance_arp_mon(struct net_device *bond_dev);
-void bond_activebackup_arp_mon(struct net_device *bond_dev);
+void bond_mii_monitor(struct work_struct *);
+void bond_loadbalance_arp_mon(struct work_struct *);
+void bond_activebackup_arp_mon(struct work_struct *);
 void bond_set_mode_ops(struct bonding *bond, int mode);
 int bond_parse_parm(char *mode_arg, struct bond_parm_tbl *tbl);
 const char *bond_mode_name(int mode);
@@ -312,6 +316,8 @@ void bond_select_active_slave(struct bonding *bond);
 void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
 void bond_register_arp(struct bonding *);
 void bond_unregister_arp(struct bonding *);
+void bond_wq_destroy(struct net_device *bond_dev);
+
 
 #endif /* _LINUX_BONDING_H */
 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2007-05-10 19:57 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-04-26 18:58 [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1 Vincent ETIENNE
2007-04-26 20:27 ` Chris Snook
2007-04-26 20:44   ` [Bonding-devel] " Jay Vosburgh
2007-04-26 21:43     ` Vincent ETIENNE
2007-05-09 18:47     ` Vincent ETIENNE
2007-05-10 19:57       ` Jay Vosburgh
2007-04-27  4:11 ` Andrew Morton
2007-04-27  9:25   ` VE (HOME)
2007-04-27 19:20     ` Andrew Morton
     [not found]       ` <200704272205.29131.ve@vetienne.net>
2007-04-27 22:32         ` Andrew Morton
2007-04-28 20:37           ` Vincent ETIENNE
2007-04-28 21:31             ` Andrew Morton
2007-04-28 21:55               ` Patrick McHardy
2007-04-28 22:04               ` Vincent ETIENNE
2007-04-28 21:10           ` Vincent ETIENNE
2007-04-27 21:18     ` USB HID bug (was [PROBLEM] Bonding driver in linux-2.6.21-rc6-mm1) Greg KH
2007-04-27 22:42       ` Jiri Kosina
2007-04-27 22:55         ` Jiri Kosina
2007-04-27 23:24         ` Paul Walmsley
2007-04-28  7:21         ` Vincent ETIENNE
2007-04-28 14:07           ` Paul Walmsley
2007-04-28 15:06           ` Jiri Kosina
2007-04-28 19:50             ` [linux-usb-devel] " Alan Stern
2007-04-28 20:43               ` Vincent ETIENNE
2007-04-29 11:18                 ` Jiri Kosina
2007-04-29 13:40                   ` Vincent ETIENNE

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.