* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: Jeff Garzik @ 2007-10-10 0:56 UTC
To: Jay Vosburgh; +Cc: Moni Shoua, Or Gerlitz, netdev

Jeff Garzik wrote:
> applied patches 1-9
>
> the only thing that was a hiccup during submission is that your email
> subject lines did not contain a notion of ordering "[PATCH 1/9] ...".
> But other than that, the git-send-email went flawlessly.

unfortunately it does not seem to build flawlessly:

drivers/net/bonding/bond_main.c: In function ‘bond_setup_by_slave’:
drivers/net/bonding/bond_main.c:1264: error: ‘struct net_device’ has no member named ‘hard_header’
drivers/net/bonding/bond_main.c:1264: error: ‘struct net_device’ has no member named ‘hard_header’
drivers/net/bonding/bond_main.c:1265: error: ‘struct net_device’ has no member named ‘rebuild_header’
drivers/net/bonding/bond_main.c:1265: error: ‘struct net_device’ has no member named ‘rebuild_header’
drivers/net/bonding/bond_main.c:1266: error: ‘struct net_device’ has no member named ‘hard_header_cache’
drivers/net/bonding/bond_main.c:1266: error: ‘struct net_device’ has no member named ‘hard_header_cache’
drivers/net/bonding/bond_main.c:1267: error: ‘struct net_device’ has no member named ‘header_cache_update’
drivers/net/bonding/bond_main.c:1267: error: ‘struct net_device’ has no member named ‘header_cache_update’
drivers/net/bonding/bond_main.c:1268: error: ‘struct net_device’ has no member named ‘hard_header_parse’
drivers/net/bonding/bond_main.c:1268: error: ‘struct net_device’ has no member named ‘hard_header_parse’
drivers/net/bonding/bond_main.c: In function ‘bond_release_and_destroy’:
drivers/net/bonding/bond_main.c:1864: warning: too few arguments for format
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: David Miller @ 2007-10-10 1:12 UTC
To: jeff; +Cc: fubar, monis, ogerlitz, netdev

From: Jeff Garzik <jeff@garzik.org>
Date: Tue, 09 Oct 2007 20:56:35 -0400

> Jeff Garzik wrote:
> > applied patches 1-9
> >
> > the only thing that was a hiccup during submission is that your email
> > subject lines did not contain a notion of ordering "[PATCH 1/9] ...".
> > But other than that, the git-send-email went flawlessly.
>
> unfortunately it does not seem to build flawlessly:

Yeah, it doesn't handle Stephen Hemminger's headerops change
in net-2.6.24
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: Jay Vosburgh @ 2007-10-10 1:18 UTC
To: David Miller; +Cc: jeff, monis, ogerlitz, netdev

David Miller <davem@davemloft.net> wrote:

>From: Jeff Garzik <jeff@garzik.org>
>Date: Tue, 09 Oct 2007 20:56:35 -0400
>
>> Jeff Garzik wrote:
>> > applied patches 1-9
>> >
>> > the only thing that was a hiccup during submission is that your email
>> > subject lines did not contain a notion of ordering "[PATCH 1/9] ...".
>> > But other than that, the git-send-email went flawlessly.
>>
>> unfortunately it does not seem to build flawlessly:
>
>Yeah, it doesn't handle Stephen Hemminger's headerops change
>in net-2.6.24

	Gaah. I'll sort it out and repost.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: Moni Shoua @ 2007-10-10 16:03 UTC
To: Jay Vosburgh, jeff; +Cc: David Miller, ogerlitz, netdev

Jay Vosburgh wrote:
> David Miller <davem@davemloft.net> wrote:
>
>> From: Jeff Garzik <jeff@garzik.org>
>> Date: Tue, 09 Oct 2007 20:56:35 -0400
>>
>>> Jeff Garzik wrote:
>>>> applied patches 1-9
>>>>
>>>> the only thing that was a hiccup during submission is that your email
>>>> subject lines did not contain a notion of ordering "[PATCH 1/9] ...".
>>>> But other than that, the git-send-email went flawlessly.
>>> unfortunately it does not seem to build flawlessly:
>> Yeah, it doesn't handle Stephen Hemminger's headerops change
>> in net-2.6.24
>
> 	Gaah. I'll sort it out and repost.
>
> 	-J
>
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

Hi Jay, Jeff

Thanks for the help with making the patch compile under 2.6.24.
However, patch #3 has a missing line in bond_setup_by_slave that should look like this

	bond_dev->header_ops = slave_dev->header_ops;

I rewrote the patch and also fixed patch #8, which became broken.
I would send the new patches now but there is more....

I also ran a test for the code in the branch of 2.6.24 and found a problem.
I see that ifconfig down doesn't return (for IPoIB interfaces) and it's stuck in napi_disable() in the kernel (any idea why?)
I am trying to solve it now, so I'd like to wait a short time before applying these patches. I guess that I'll need to add something.

thanks
  MoniS
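To illustrate why the missing header_ops line matters, here is a minimal user-space sketch, not kernel code. It assumes the headerops change groups the old per-device header callbacks (hard_header, hard_header_parse and so on, the members the build errors complain about) behind a single table pointer. All model_* and ipoib_model_* names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a header-ops table: the callbacks that used
 * to be individual members of the device now sit behind one pointer. */
struct model_header_ops {
    int (*create)(void);
    int (*parse)(void);
};

/* Hypothetical, heavily reduced model of a net device. */
struct model_net_device {
    const char *name;
    const struct model_header_ops *header_ops;
};

static int ipoib_model_create(void) { return 1; }
static int ipoib_model_parse(void)  { return 2; }

static const struct model_header_ops ipoib_model_header_ops = {
    .create = ipoib_model_create,
    .parse  = ipoib_model_parse,
};

/* Mirrors the one-line fix quoted above: the bond master inherits the
 * slave's table instead of copying callbacks one by one. */
static void model_bond_setup_by_slave(struct model_net_device *bond_dev,
                                      const struct model_net_device *slave_dev)
{
    assert(bond_dev != NULL && slave_dev != NULL);
    bond_dev->header_ops = slave_dev->header_ops;
}

/* Building a header fails (-1) when the device has no table attached. */
static int model_build_header(const struct model_net_device *dev)
{
    if (dev->header_ops == NULL || dev->header_ops->create == NULL)
        return -1;
    return dev->header_ops->create();
}
```

In this model, a bond device that never receives the pointer cannot build headers at all, which is the kind of gap the missing line would leave.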
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: Roland Dreier @ 2007-10-10 18:31 UTC
To: Moni Shoua; +Cc: Jay Vosburgh, jeff, David Miller, ogerlitz, netdev

 > I also ran a test for the code in the branch of 2.6.24 and found a problem.
 > I see that ifconfig down doesn't return (for IPoIB interfaces) and it's stuck in napi_disable() in the kernel (any idea why?)

For what it's worth, I took the upstream 2.6.23 git tree and merged in
Dave's latest net-2.6.24 tree and my latest for-2.6.24 tree and tried
that. I brought up an IPoIB interface, sent a few pings, and did
ifconfig down, and it worked fine.

Can you try the same thing without the bonding patches to see if your
setup works OK too?

Also can you give more details about what you do to get ifconfig down stuck?

 - R.
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: Moni Shoua @ 2007-10-11 14:48 UTC
To: Roland Dreier, Jay Vosburgh, jeff
Cc: David Miller, ogerlitz, netdev, Moni Levy

Roland Dreier wrote:
>  > I also ran a test for the code in the branch of 2.6.24 and found a problem.
>  > I see that ifconfig down doesn't return (for IPoIB interfaces) and it's stuck in napi_disable() in the kernel (any idea why?)
>
> For what it's worth, I took the upstream 2.6.23 git tree and merged in
> Dave's latest net-2.6.24 tree and my latest for-2.6.24 tree and tried
> that. I brought up an IPoIB interface, sent a few pings, and did
> ifconfig down, and it worked fine.
>
> Can you try the same thing without the bonding patches to see if your
> setup works OK too?
>
> Also can you give more details about what you do to get ifconfig down stuck?
>
>  - R.

Without bonding, ifconfig down works fine.

It happens only when ib interfaces are slaves of a bonding device.
I thought before that it was stuck in napi_disable(), and that is almost right.
I put prints before and after the call to napi_disable and see that it is called twice.
I'll try to investigate in this direction.

ib0: stopping interface
ib0: before napi_disable
ib0: after napi_disable
ib0: downing ib_dev
ib0: All sends and receives done.
ib0: stopping interface
ib0: before napi_disable

There is also a dump of the kernel log after 'echo t > /proc/sysrq-trigger' (for ifconfig)

SysRq : Show State
ifconfig      S 0000000000000000     0  6311   6099
 ffff810034f49d18 0000000000000086 0000000000000000 ffffffffffffffff
 ffff810037e747c0 ffff810037e747c0 000000013481e000 ffff81003a851a78
 ffff81003a851840 000000003b0c8c00 0000000000000000 00000000802358ee
Call Trace:
 [<ffffffff8023cc89>] lock_timer_base+0x24/0x49
 [<ffffffff80403754>] schedule_timeout+0x8a/0xad
 [<ffffffff8023d241>] process_timeout+0x0/0x5
 [<ffffffff8023d6ec>] msleep_interruptible+0x11/0x39
 [<ffffffff884081a7>] :ib_ipoib:ipoib_stop+0x64/0x12c
 [<ffffffff8039fc07>] dev_close+0x3e/0x56
 [<ffffffff803a1c31>] dev_change_flags+0xa7/0x15f
 [<ffffffff803e5bee>] devinet_ioctl+0x293/0x5ed
 [<ffffffff803e775b>] inet_ioctl+0x7f/0x9d
 [<ffffffff80395b2e>] sock_ioctl+0x0/0x1fe
 [<ffffffff80395d08>] sock_ioctl+0x1da/0x1fe
 [<ffffffff802947d9>] do_ioctl+0x29/0x6f
 [<ffffffff80294a75>] vfs_ioctl+0x256/0x267
 [<ffffffff80294adf>] sys_ioctl+0x59/0x7a
 [<ffffffff8020bc0e>] system_call+0x7e/0x83
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: Roland Dreier @ 2007-10-11 20:17 UTC
To: Moni Shoua; +Cc: Jay Vosburgh, jeff, David Miller, ogerlitz, netdev, Moni Levy

 > It happens only when ib interfaces are slaves of a bonding device.
 > I thought before that it was stuck in napi_disable(), and that is almost right.
 > I put prints before and after the call to napi_disable and see that it is called twice.
 > I'll try to investigate in this direction.
 >
 > ib0: stopping interface
 > ib0: before napi_disable
 > ib0: after napi_disable
 > ib0: downing ib_dev
 > ib0: All sends and receives done.
 > ib0: stopping interface
 > ib0: before napi_disable

Yes, two napi_disable()s in a row without a matching napi_enable()
will deadlock. I guess the question is why the ipoib interface is
being stopped twice.

If you just take the net-2.6.24 tree (without bonding patches), does
bonding for ethernet interfaces work OK, or is there a similar problem
with double napi_disable()? How about bonding of ethernet after this
batch of bonding patches?

 - R.
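The deadlock described above can be sketched in user space. This is a simplified model, not the kernel implementation: it assumes napi_disable() waits until it can take a "scheduled" flag and that only napi_enable() gives that flag back. The model_* names and the max_tries bound are hypothetical; the bound exists only so the example terminates, whereas the real call would sleep and retry without limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the NAPI state: one flag that the disabler owns
 * until enable is called. */
struct model_napi {
    bool sched;
};

/* Try to take the flag. Returns true on success, false if the flag was
 * still held after max_tries attempts. */
static bool model_napi_disable(struct model_napi *n, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        if (!n->sched) {
            n->sched = true;   /* flag taken, polling is now off */
            return true;
        }
        /* the kernel would sleep here and then retry */
    }
    return false;              /* stands in for "never returns" */
}

/* Give the flag back so a later disable can succeed. */
static void model_napi_enable(struct model_napi *n)
{
    assert(n->sched);          /* enable only makes sense after disable */
    n->sched = false;
}
```

Calling model_napi_disable() twice without model_napi_enable() in between makes the second call exhaust its retries, which corresponds to the second "before napi_disable" print that never reaches "after napi_disable".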
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: Jay Vosburgh @ 2007-10-11 22:01 UTC
To: Roland Dreier; +Cc: Moni Shoua, jeff, David Miller, ogerlitz, netdev, Moni Levy

Roland Dreier <rdreier@cisco.com> wrote:
[...]
>Yes, two napi_disable()s in a row without a matching napi_enable()
>will deadlock. I guess the question is why the ipoib interface is
>being stopped twice.
>
>If you just take the net-2.6.24 tree (without bonding patches), does
>bonding for ethernet interfaces work OK, or is there a similar problem
>with double napi_disable()? How about bonding of ethernet after this
>batch of bonding patches?

	I just checked this on an x86 box. The bonding in stock
net-2.6 pulled this morning or last night works ok (I did some basic
tests, including ifconfig down / up, with e100). This remains true with
the IPoIB bonding patches applied. I do not have hardware available to
test IPoIB.

	I did get a whammy from tg3, but I think this is unrelated to
bonding (as it happens when tg3 comes up, before bonding is involved):

BUG: unable to handle kernel paging request at virtual address 00004214
 printing eip:
e0828017
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: thermal processor fan button loop e1000 sg evdev tg3 e100 rtb
CPU:    0
EIP:    0060:[<e0828017>]    Not tainted VLI
EFLAGS: 00010206   (2.6.23-ipv6 #1)
EIP is at tg3_ape_write32+0x7/0x10 [tg3]
eax: de9304c0   ebx: dde8fe18   ecx: 00000000   edx: 00004214
esi: de9304c0   edi: 00000000   ebp: dde8fe28   esp: dde8fdd4
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process ip (pid: 2817, ti=dde8e000 task=dff4e0b0 task.ti=dde8e000)
Stack: e082fb2e 00000000 dde8fdf4 c01ece3e dde8fdf8 000003fe 00000000 00005400
       08000000 00001aa0 e083b340 08001aa0 00000060 e083ce00 08001b20 00000030
       e083ce80 00000101 de9304c0 00000001 dde56800 dde8fe38 e0830178 dff69000
Call Trace:
 [<c010536a>] show_trace_log_lvl+0x1a/0x30
 [<c0105429>] show_stack_log_lvl+0xa9/0xd0
 [<c0105639>] show_registers+0x1e9/0x2f0
 [<c0105851>] die+0x111/0x260
 [<c011c5dc>] do_page_fault+0x18c/0x6a0
 [<c0319bea>] error_code+0x72/0x78
 [<e0830178>] tg3_init_hw+0x38/0x50 [tg3]
 [<e0838886>] tg3_open+0x276/0x5d0 [tg3]
 [<c02aead8>] dev_open+0x38/0x80
 [<c02ad5cd>] dev_change_flags+0x7d/0x1a0
 [<c02f63d8>] devinet_ioctl+0x4c8/0x660
 [<c02f698b>] inet_ioctl+0x6b/0x90
 [<c02a0e5a>] sock_ioctl+0x5a/0x210
 [<c017cd98>] do_ioctl+0x28/0x80
 [<c017ce47>] vfs_ioctl+0x57/0x290
 [<c017d0b9>] sys_ioctl+0x39/0x60
 [<c01042a2>] sysenter_past_esp+0x5f/0x99
 =======================
Code: <89> 0a c3 8d b6 00 00 00 00 55 8b 48 50 89 e5 5d 01 ca 8b 02 c3 8d
EIP: [<e0828017>] tg3_ape_write32+0x7/0x10 [tg3] SS:ESP 0068:dde8fdd4
Kernel panic - not syncing: Fatal exception in interrupt

	I haven't investigated this further. I'm using a BCM5704 card;
if this isn't a known problem and anyone is curious, I can supply
additional info.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: Moni Shoua @ 2007-10-13 15:24 UTC
To: Roland Dreier
Cc: Moni Shoua, Jay Vosburgh, jeff, David Miller, ogerlitz, netdev, Moni Levy

I will be near my lab only tomorrow...
I will check this and let you know.

On 10/11/07, Roland Dreier <rdreier@cisco.com> wrote:
>  > It happens only when ib interfaces are slaves of a bonding device.
>  > I thought before that it was stuck in napi_disable(), and that is almost right.
>  > I put prints before and after the call to napi_disable and see that it is called twice.
>  > I'll try to investigate in this direction.
>  >
>  > ib0: stopping interface
>  > ib0: before napi_disable
>  > ib0: after napi_disable
>  > ib0: downing ib_dev
>  > ib0: All sends and receives done.
>  > ib0: stopping interface
>  > ib0: before napi_disable
>
> Yes, two napi_disable()s in a row without a matching napi_enable()
> will deadlock. I guess the question is why the ipoib interface is
> being stopped twice.
>
> If you just take the net-2.6.24 tree (without bonding patches), does
> bonding for ethernet interfaces work OK, or is there a similar problem
> with double napi_disable()? How about bonding of ethernet after this
> batch of bonding patches?
>
>  - R.
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
From: Moni Shoua @ 2007-10-14 15:51 UTC
To: Roland Dreier, Jay Vosburgh
Cc: jeff, David Miller, ogerlitz, netdev, Moni Levy

Roland Dreier wrote:
>  > It happens only when ib interfaces are slaves of a bonding device.
>  > I thought before that it was stuck in napi_disable(), and that is almost right.
>  > I put prints before and after the call to napi_disable and see that it is called twice.
>  > I'll try to investigate in this direction.
>  >
>  > ib0: stopping interface
>  > ib0: before napi_disable
>  > ib0: after napi_disable
>  > ib0: downing ib_dev
>  > ib0: All sends and receives done.
>  > ib0: stopping interface
>  > ib0: before napi_disable
>
> Yes, two napi_disable()s in a row without a matching napi_enable()
> will deadlock. I guess the question is why the ipoib interface is
> being stopped twice.
>
> If you just take the net-2.6.24 tree (without bonding patches), does
> bonding for ethernet interfaces work OK, or is there a similar problem
> with double napi_disable()? How about bonding of ethernet after this
> batch of bonding patches?
>
>  - R.

Ok, I think I know what happens here.
When bonding gets a NETDEV_GOING_DOWN event it releases the slave and, as a side effect, closes
the slave device (this is new code). ifconfig, on the other hand, closes the device one more time,
and this is why we see two napi_disable() calls in a row.
The fix, in my opinion, belongs in bonding: it should react to NETDEV_UNREGISTER and not to NETDEV_GOING_DOWN.
I want to test this point and if it's good I'll submit new patches.

thanks
   MoniS
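The double close described in the message above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: that closing a device first sends a going-down notification and then calls the driver's stop routine, and that the bonding handler, when it releases a slave, closes that slave itself. Every model_* name is hypothetical; none of this is the actual bonding or netdev code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event names standing in for the two netdev events. */
enum model_event { MODEL_GOING_DOWN, MODEL_UNREGISTER };

/* Hypothetical slave device: counts how often its stop routine runs. */
struct model_dev {
    bool up;
    bool enslaved;
    int stop_calls;
    enum model_event release_on;  /* event on which bonding releases it */
};

static void model_dev_close(struct model_dev *d);

/* Bonding side: on the chosen event, release the slave and close it. */
static void model_bond_notifier(struct model_dev *d, enum model_event ev)
{
    if (!d->enslaved || ev != d->release_on)
        return;
    d->enslaved = false;
    model_dev_close(d);           /* the extra close from the release path */
}

/* Core side: notify first, then stop the driver (assumed ordering). */
static void model_dev_close(struct model_dev *d)
{
    assert(d != NULL);
    if (!d->up)
        return;
    model_bond_notifier(d, MODEL_GOING_DOWN);
    d->stop_calls++;              /* the driver stop would call napi_disable here */
    d->up = false;
}
```

With release_on set to MODEL_GOING_DOWN, a single model_dev_close() runs the stop routine twice; with MODEL_UNREGISTER it runs once, which is the behaviour the proposed fix is after.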
Thread overview: 10+ messages
[not found] <11916151232222-git-send-email-fubar@us.ibm.com>
[not found] ` <470C200D.4010705@pobox.com>
2007-10-10 0:56 ` [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue Jeff Garzik
2007-10-10 1:12 ` David Miller
2007-10-10 1:18 ` Jay Vosburgh
2007-10-10 16:03 ` Moni Shoua
2007-10-10 18:31 ` Roland Dreier
2007-10-11 14:48 ` Moni Shoua
2007-10-11 20:17 ` Roland Dreier
2007-10-11 22:01 ` Jay Vosburgh
2007-10-13 15:24 ` Moni Shoua
2007-10-14 15:51 ` Moni Shoua