* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
[not found] ` <470C200D.4010705@pobox.com>
@ 2007-10-10 0:56 ` Jeff Garzik
2007-10-10 1:12 ` David Miller
0 siblings, 1 reply; 10+ messages in thread
From: Jeff Garzik @ 2007-10-10 0:56 UTC (permalink / raw)
To: Jay Vosburgh; +Cc: Moni Shoua, Or Gerlitz, netdev
Jeff Garzik wrote:
> applied patches 1-9
>
> the only thing that was a hiccup during submission is that your email
> subject lines did not contain a notion of ordering "[PATCH 1/9] ...".
> But other than that, the git-send-email went flawlessly.
unfortunately it does not seem to build flawlessly:
drivers/net/bonding/bond_main.c: In function ‘bond_setup_by_slave’:
drivers/net/bonding/bond_main.c:1264: error: ‘struct net_device’ has no
member named ‘hard_header’
drivers/net/bonding/bond_main.c:1264: error: ‘struct net_device’ has no
member named ‘hard_header’
drivers/net/bonding/bond_main.c:1265: error: ‘struct net_device’ has no
member named ‘rebuild_header’
drivers/net/bonding/bond_main.c:1265: error: ‘struct net_device’ has no
member named ‘rebuild_header’
drivers/net/bonding/bond_main.c:1266: error: ‘struct net_device’ has no
member named ‘hard_header_cache’
drivers/net/bonding/bond_main.c:1266: error: ‘struct net_device’ has no
member named ‘hard_header_cache’
drivers/net/bonding/bond_main.c:1267: error: ‘struct net_device’ has no
member named ‘header_cache_update’
drivers/net/bonding/bond_main.c:1267: error: ‘struct net_device’ has no
member named ‘header_cache_update’
drivers/net/bonding/bond_main.c:1268: error: ‘struct net_device’ has no
member named ‘hard_header_parse’
drivers/net/bonding/bond_main.c:1268: error: ‘struct net_device’ has no
member named ‘hard_header_parse’
drivers/net/bonding/bond_main.c: In function ‘bond_release_and_destroy’:
drivers/net/bonding/bond_main.c:1864: warning: too few arguments for format
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-10-10 0:56 ` [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue Jeff Garzik
@ 2007-10-10 1:12 ` David Miller
2007-10-10 1:18 ` Jay Vosburgh
0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2007-10-10 1:12 UTC (permalink / raw)
To: jeff; +Cc: fubar, monis, ogerlitz, netdev
From: Jeff Garzik <jeff@garzik.org>
Date: Tue, 09 Oct 2007 20:56:35 -0400
> Jeff Garzik wrote:
> > applied patches 1-9
> >
> > the only thing that was a hiccup during submission is that your email
> > subject lines did not contain a notion of ordering "[PATCH 1/9] ...".
> > But other than that, the git-send-email went flawlessly.
>
> unfortunately it does not seem to build flawlessly:
Yeah, it doesn't handle Stephen Hemminger's header_ops change
in net-2.6.24.
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-10-10 1:12 ` David Miller
@ 2007-10-10 1:18 ` Jay Vosburgh
2007-10-10 16:03 ` Moni Shoua
0 siblings, 1 reply; 10+ messages in thread
From: Jay Vosburgh @ 2007-10-10 1:18 UTC (permalink / raw)
To: David Miller; +Cc: jeff, monis, ogerlitz, netdev
David Miller <davem@davemloft.net> wrote:
>From: Jeff Garzik <jeff@garzik.org>
>Date: Tue, 09 Oct 2007 20:56:35 -0400
>
>> Jeff Garzik wrote:
>> > applied patches 1-9
>> >
>> > the only thing that was a hiccup during submission is that your email
>> > subject lines did not contain a notion of ordering "[PATCH 1/9] ...".
>> > But other than that, the git-send-email went flawlessly.
>>
>> unfortunately it does not seem to build flawlessly:
>
>Yeah, it doesn't handle Stephen Hemminger's header_ops change
>in net-2.6.24.
Gaah. I'll sort it out and repost.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-10-10 1:18 ` Jay Vosburgh
@ 2007-10-10 16:03 ` Moni Shoua
2007-10-10 18:31 ` Roland Dreier
0 siblings, 1 reply; 10+ messages in thread
From: Moni Shoua @ 2007-10-10 16:03 UTC (permalink / raw)
To: Jay Vosburgh, jeff; +Cc: David Miller, ogerlitz, netdev
Jay Vosburgh wrote:
> David Miller <davem@davemloft.net> wrote:
>
>> From: Jeff Garzik <jeff@garzik.org>
>> Date: Tue, 09 Oct 2007 20:56:35 -0400
>>
>>> Jeff Garzik wrote:
>>>> applied patches 1-9
>>>>
>>>> the only thing that was a hiccup during submission is that your email
>>>> subject lines did not contain a notion of ordering "[PATCH 1/9] ...".
>>>> But other than that, the git-send-email went flawlessly.
>>> unfortunately it does not seem to build flawlessly:
>> Yeah, it doesn't handle Stephen Hemminger's header_ops change
>> in net-2.6.24.
>
> Gaah. I'll sort it out and repost.
>
> -J
>
> ---
> -Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
Hi Jay, Jeff
Thanks for the help with making the patch compile under 2.6.24.
However, patch #3 is missing a line in bond_setup_by_slave, which should look like this:
bond_dev->header_ops = slave_dev->header_ops;
I rewrote the patch and also fixed patch #8, which had become broken.
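For readers following along, the missing assignment can be sketched as a small userspace model. This is not the actual patch: the struct layouts below are hypothetical stand-ins for the kernel types, and the extra fields copied alongside header_ops are an assumption about what the function carries over from the slave.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct header_ops {
	int (*create)(void);	/* placeholder for the real callbacks */
};

struct net_device {
	const struct header_ops *header_ops;
	unsigned short type;
	unsigned short hard_header_len;
	unsigned char addr_len;
};

/* Copy link-layer properties from the slave to the bonding master. */
void bond_setup_by_slave(struct net_device *bond_dev,
			 const struct net_device *slave_dev)
{
	assert(bond_dev != NULL && slave_dev != NULL);

	/* One pointer replaces the former per-callback members
	 * (hard_header, rebuild_header, and so on). */
	bond_dev->header_ops = slave_dev->header_ops;

	/* Assumed for illustration: other link-layer fields follow the slave. */
	bond_dev->type = slave_dev->type;
	bond_dev->hard_header_len = slave_dev->hard_header_len;
	bond_dev->addr_len = slave_dev->addr_len;
}
```

The point of the sketch is that code written against the old per-callback members no longer compiles once those members are gone, which matches the build errors reported earlier in the thread.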
I would send the new patches now, but there is more.
I also tested the code on the 2.6.24 branch and found a problem.
ifconfig down doesn't return for IPoIB interfaces; it is stuck in napi_disable() in the kernel (any idea why?).
I am trying to solve it now, so I'd like to wait a short time before these patches are applied.
I guess that I'll need to add something.
thanks
MoniS
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-10-10 16:03 ` Moni Shoua
@ 2007-10-10 18:31 ` Roland Dreier
2007-10-11 14:48 ` Moni Shoua
0 siblings, 1 reply; 10+ messages in thread
From: Roland Dreier @ 2007-10-10 18:31 UTC (permalink / raw)
To: Moni Shoua; +Cc: Jay Vosburgh, jeff, David Miller, ogerlitz, netdev
> I also tested the code on the 2.6.24 branch and found a problem.
> ifconfig down doesn't return for IPoIB interfaces; it is stuck in napi_disable() in the kernel (any idea why?).
For what it's worth, I took the upstream 2.6.23 git tree and merged in
Dave's latest net-2.6.24 tree and my latest for-2.6.24 tree and tried
that. I brought up an IPoIB interface, sent a few pings, and did
ifconfig down, and it worked fine.
Can you try the same thing without the bonding patches to see if your
setup works OK too?
Also can you give more details about what you do to get ifconfig down stuck?
- R.
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-10-10 18:31 ` Roland Dreier
@ 2007-10-11 14:48 ` Moni Shoua
2007-10-11 20:17 ` Roland Dreier
0 siblings, 1 reply; 10+ messages in thread
From: Moni Shoua @ 2007-10-11 14:48 UTC (permalink / raw)
To: Roland Dreier, Jay Vosburgh, jeff
Cc: David Miller, ogerlitz, netdev, Moni Levy
Roland Dreier wrote:
> > I also tested the code on the 2.6.24 branch and found a problem.
> > ifconfig down doesn't return for IPoIB interfaces; it is stuck in napi_disable() in the kernel (any idea why?).
>
> For what it's worth, I took the upstream 2.6.23 git tree and merged in
> Dave's latest net-2.6.24 tree and my latest for-2.6.24 tree and tried
> that. I brought up an IPoIB interface, sent a few pings, and did
> ifconfig down, and it worked fine.
>
> Can you try the same thing without the bonding patches to see if your
> setup works OK too?
>
> Also can you give more details about what you do to get ifconfig down stuck?
>
> - R.
Without bonding, ifconfig down works fine.
It happens only when ib interfaces are slaves of a bonding device.
I thought earlier that it was stuck in napi_disable(), and that is almost right.
I put prints before and after the call to napi_disable() and saw that it is called twice.
I'll investigate in this direction.
ib0: stopping interface
ib0: before napi_disable
ib0: after napi_disable
ib0: downing ib_dev
ib0: All sends and receives done.
ib0: stopping interface
ib0: before napi_disable
Here is also the kernel log dump for ifconfig after 'echo t > /proc/sysrq-trigger':
SysRq : Show State
ifconfig S 0000000000000000 0 6311 6099
ffff810034f49d18 0000000000000086 0000000000000000 ffffffffffffffff
ffff810037e747c0 ffff810037e747c0 000000013481e000 ffff81003a851a78
ffff81003a851840 000000003b0c8c00 0000000000000000 00000000802358ee
Call Trace:
[<ffffffff8023cc89>] lock_timer_base+0x24/0x49
[<ffffffff80403754>] schedule_timeout+0x8a/0xad
[<ffffffff8023d241>] process_timeout+0x0/0x5
[<ffffffff8023d6ec>] msleep_interruptible+0x11/0x39
[<ffffffff884081a7>] :ib_ipoib:ipoib_stop+0x64/0x12c
[<ffffffff8039fc07>] dev_close+0x3e/0x56
[<ffffffff803a1c31>] dev_change_flags+0xa7/0x15f
[<ffffffff803e5bee>] devinet_ioctl+0x293/0x5ed
[<ffffffff803e775b>] inet_ioctl+0x7f/0x9d
[<ffffffff80395b2e>] sock_ioctl+0x0/0x1fe
[<ffffffff80395d08>] sock_ioctl+0x1da/0x1fe
[<ffffffff802947d9>] do_ioctl+0x29/0x6f
[<ffffffff80294a75>] vfs_ioctl+0x256/0x267
[<ffffffff80294adf>] sys_ioctl+0x59/0x7a
[<ffffffff8020bc0e>] system_call+0x7e/0x83
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-10-11 14:48 ` Moni Shoua
@ 2007-10-11 20:17 ` Roland Dreier
2007-10-11 22:01 ` Jay Vosburgh
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Roland Dreier @ 2007-10-11 20:17 UTC (permalink / raw)
To: Moni Shoua; +Cc: Jay Vosburgh, jeff, David Miller, ogerlitz, netdev, Moni Levy
> It happens only when ib interfaces are slaves of a bonding device.
> I thought earlier that it was stuck in napi_disable(), and that is almost right.
> I put prints before and after call to napi_disable and see that it is called twice.
> I'll try to investigate in this direction.
>
> ib0: stopping interface
> ib0: before napi_disable
> ib0: after napi_disable
> ib0: downing ib_dev
> ib0: All sends and receives done.
> ib0: stopping interface
> ib0: before napi_disable
Yes, two napi_disable()s in a row without a matching napi_enable()
will deadlock. I guess the question is why the ipoib interface is
being stopped twice.
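The deadlock can be illustrated with a small userspace model. It assumes that the disable call waits until it can take a single "scheduled" flag and that only the enable call clears it, which is consistent with the sleep loop visible in the ifconfig trace earlier in the thread. All names below are hypothetical, and max_tries exists only so that the sketch terminates where the real call would keep sleeping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a single "scheduled" flag stands in for NAPI state. */
struct napi_model {
	bool sched;
};

/* Test-and-set in spirit (not actually atomic): returns the previous value. */
bool test_and_set_sched(struct napi_model *n)
{
	bool old = n->sched;

	n->sched = true;
	return old;
}

/*
 * Model of a disable that waits until it owns the flag. The real call
 * would sleep and retry indefinitely; max_tries bounds the loop here.
 * Returns true if the disable completed, false if it would still be waiting.
 */
bool napi_disable_model(struct napi_model *n, int max_tries)
{
	assert(n != NULL);
	for (int i = 0; i < max_tries; i++) {
		if (!test_and_set_sched(n))
			return true;
		/* the kernel would sleep briefly here before retrying */
	}
	return false;
}

/* Only enable releases the flag taken by a completed disable. */
void napi_enable_model(struct napi_model *n)
{
	assert(n != NULL);
	n->sched = false;
}
```

In this model the first disable takes the flag, and a second disable can never succeed because nothing but an enable clears it. That is the situation the "before napi_disable" line without a matching "after" line points to.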
If you just take the net-2.6.24 tree (without bonding patches), does
bonding for ethernet interfaces work OK, or is there a similar problem
with double napi_disable()? How about bonding of ethernet after this
batch of bonding patches?
- R.
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-10-11 20:17 ` Roland Dreier
@ 2007-10-11 22:01 ` Jay Vosburgh
2007-10-13 15:24 ` Moni Shoua
2007-10-14 15:51 ` Moni Shoua
2 siblings, 0 replies; 10+ messages in thread
From: Jay Vosburgh @ 2007-10-11 22:01 UTC (permalink / raw)
To: Roland Dreier; +Cc: Moni Shoua, jeff, David Miller, ogerlitz, netdev, Moni Levy
Roland Dreier <rdreier@cisco.com> wrote:
[...]
>Yes, two napi_disable()s in a row without a matching napi_enable()
>will deadlock. I guess the question is why the ipoib interface is
>being stopped twice.
>
>If you just take the net-2.6.24 tree (without bonding patches), does
>bonding for ethernet interfaces work OK, or is there a similar problem
>with double napi_disable()? How about bonding of ethernet after this
>batch of bonding patches?
I just checked this on an x86 box. The bonding in stock net-2.6
pulled this morning or last night works ok (I did some basic tests,
including ifconfig down / up, with e100). This remains true with the
IPoIB bonding patches applied. I do not have hardware available to test
IPoIB.
I did get a whammy from tg3, but I think this is unrelated to
bonding (as it happens when tg3 comes up, before bonding is involved):
BUG: unable to handle kernel paging request at virtual address 00004214
printing eip:
e0828017
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: thermal processor fan button loop e1000 sg evdev tg3 e100 rtb
CPU: 0
EIP: 0060:[<e0828017>] Not tainted VLI
EFLAGS: 00010206 (2.6.23-ipv6 #1)
EIP is at tg3_ape_write32+0x7/0x10 [tg3]
eax: de9304c0 ebx: dde8fe18 ecx: 00000000 edx: 00004214
esi: de9304c0 edi: 00000000 ebp: dde8fe28 esp: dde8fdd4
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process ip (pid: 2817, ti=dde8e000 task=dff4e0b0 task.ti=dde8e000)
Stack: e082fb2e 00000000 dde8fdf4 c01ece3e dde8fdf8 000003fe 00000000 00005400
08000000 00001aa0 e083b340 08001aa0 00000060 e083ce00 08001b20 00000030
e083ce80 00000101 de9304c0 00000001 dde56800 dde8fe38 e0830178 dff69000
Call Trace:
[<c010536a>] show_trace_log_lvl+0x1a/0x30
[<c0105429>] show_stack_log_lvl+0xa9/0xd0
[<c0105639>] show_registers+0x1e9/0x2f0
[<c0105851>] die+0x111/0x260
[<c011c5dc>] do_page_fault+0x18c/0x6a0
[<c0319bea>] error_code+0x72/0x78
[<e0830178>] tg3_init_hw+0x38/0x50 [tg3]
[<e0838886>] tg3_open+0x276/0x5d0 [tg3]
[<c02aead8>] dev_open+0x38/0x80
[<c02ad5cd>] dev_change_flags+0x7d/0x1a0
[<c02f63d8>] devinet_ioctl+0x4c8/0x660
[<c02f698b>] inet_ioctl+0x6b/0x90
[<c02a0e5a>] sock_ioctl+0x5a/0x210
[<c017cd98>] do_ioctl+0x28/0x80
[<c017ce47>] vfs_ioctl+0x57/0x290
[<c017d0b9>] sys_ioctl+0x39/0x60
[<c01042a2>] sysenter_past_esp+0x5f/0x99
=======================
Code: <89> 0a c3 8d b6 00 00 00 00 55 8b 48 50 89 e5 5d 01 ca 8b 02 c3 8d
EIP: [<e0828017>] tg3_ape_write32+0x7/0x10 [tg3] SS:ESP 0068:dde8fdd4
Kernel panic - not syncing: Fatal exception in interrupt
I haven't investigated this further. I'm using a BCM5704 card;
if this isn't a known problem and anyone is curious, I can supply
additional info.
-J
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-10-11 20:17 ` Roland Dreier
2007-10-11 22:01 ` Jay Vosburgh
@ 2007-10-13 15:24 ` Moni Shoua
2007-10-14 15:51 ` Moni Shoua
2 siblings, 0 replies; 10+ messages in thread
From: Moni Shoua @ 2007-10-13 15:24 UTC (permalink / raw)
To: Roland Dreier
Cc: Moni Shoua, Jay Vosburgh, jeff, David Miller, ogerlitz, netdev,
Moni Levy
I won't be near my lab until tomorrow.
I will check this then and let you know.
On 10/11/07, Roland Dreier <rdreier@cisco.com> wrote:
> > It happens only when ib interfaces are slaves of a bonding device.
> > I thought earlier that it was stuck in napi_disable(), and that is almost right.
> > I put prints before and after call to napi_disable and see that it is called twice.
> > I'll try to investigate in this direction.
> >
> > ib0: stopping interface
> > ib0: before napi_disable
> > ib0: after napi_disable
> > ib0: downing ib_dev
> > ib0: All sends and receives done.
> > ib0: stopping interface
> > ib0: before napi_disable
>
> Yes, two napi_disable()s in a row without a matching napi_enable()
> will deadlock. I guess the question is why the ipoib interface is
> being stopped twice.
>
> If you just take the net-2.6.24 tree (without bonding patches), does
> bonding for ethernet interfaces work OK, or is there a similar problem
> with double napi_disable()? How about bonding of ethernet after this
> batch of bonding patches?
>
> - R.
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
2007-10-11 20:17 ` Roland Dreier
2007-10-11 22:01 ` Jay Vosburgh
2007-10-13 15:24 ` Moni Shoua
@ 2007-10-14 15:51 ` Moni Shoua
2 siblings, 0 replies; 10+ messages in thread
From: Moni Shoua @ 2007-10-14 15:51 UTC (permalink / raw)
To: Roland Dreier, Jay Vosburgh
Cc: jeff, David Miller, ogerlitz, netdev, Moni Levy
Roland Dreier wrote:
> > It happens only when ib interfaces are slaves of a bonding device.
> > I thought earlier that it was stuck in napi_disable(), and that is almost right.
> > I put prints before and after call to napi_disable and see that it is called twice.
> > I'll try to investigate in this direction.
> >
> > ib0: stopping interface
> > ib0: before napi_disable
> > ib0: after napi_disable
> > ib0: downing ib_dev
> > ib0: All sends and receives done.
> > ib0: stopping interface
> > ib0: before napi_disable
>
> Yes, two napi_disable()s in a row without a matching napi_enable()
> will deadlock. I guess the question is why the ipoib interface is
> being stopped twice.
>
> If you just take the net-2.6.24 tree (without bonding patches), does
> bonding for ethernet interfaces work OK, or is there a similar problem
> with double napi_disable()? How about bonding of ethernet after this
> batch of bonding patches?
>
> - R.
Ok, I think I know what happens here.
When bonding gets a NETDEV_GOING_DOWN event, it releases the slave and,
in the process, closes the slave device (this is new code). ifconfig, on the other hand,
closes the device one more time, and this is why we see two napi_disable() calls in a row.
The fix, in my opinion, is in bonding: it should react to NETDEV_UNREGISTER and not to NETDEV_GOING_DOWN.
I want to test this, and if it works I'll submit new patches.
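The sequence described above can be modelled in userspace as a rough sketch. The event names, the slave structure, and the ordering inside ifconfig_down_model (going-down notification, then the stop path, then the down notification) are assumptions made for illustration; this is not the bonding driver's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes; the kernel's numeric values are not used. */
enum netdev_event { EV_GOING_DOWN, EV_DOWN, EV_UNREGISTER };

struct slave_model {
	bool enslaved;
	int stop_calls;	/* how many times the slave's stop path ran */
};

/* Releasing a slave also closes it, which runs its stop path. */
void bond_release_model(struct slave_model *s)
{
	assert(s != NULL);
	if (!s->enslaved)
		return;
	s->enslaved = false;
	s->stop_calls++;
}

/* Bonding's reaction to an event on a slave; release_on selects the trigger. */
void bond_slave_event_model(struct slave_model *s, enum netdev_event ev,
			    enum netdev_event release_on)
{
	if (ev == release_on)
		bond_release_model(s);
}

/* "ifconfig down": notify going-down, run the stop path, notify down. */
int ifconfig_down_model(struct slave_model *s, enum netdev_event release_on)
{
	bond_slave_event_model(s, EV_GOING_DOWN, release_on);
	s->stop_calls++;	/* the close requested by ifconfig itself */
	bond_slave_event_model(s, EV_DOWN, release_on);
	return s->stop_calls;
}
```

With release_on set to EV_GOING_DOWN the stop path runs twice for one ifconfig down, which corresponds to the two "stopping interface" lines in the log. With EV_UNREGISTER it runs once and the slave stays enslaved.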
thanks
MoniS
end of thread, other threads:[~2007-10-14 15:51 UTC | newest]
Thread overview: 10+ messages
[not found] <11916151232222-git-send-email-fubar@us.ibm.com>
[not found] ` <470C200D.4010705@pobox.com>
2007-10-10 0:56 ` [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue Jeff Garzik
2007-10-10 1:12 ` David Miller
2007-10-10 1:18 ` Jay Vosburgh
2007-10-10 16:03 ` Moni Shoua
2007-10-10 18:31 ` Roland Dreier
2007-10-11 14:48 ` Moni Shoua
2007-10-11 20:17 ` Roland Dreier
2007-10-11 22:01 ` Jay Vosburgh
2007-10-13 15:24 ` Moni Shoua
2007-10-14 15:51 ` Moni Shoua