netdev.vger.kernel.org archive mirror
* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
       [not found] ` <470C200D.4010705@pobox.com>
@ 2007-10-10  0:56   ` Jeff Garzik
  2007-10-10  1:12     ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff Garzik @ 2007-10-10  0:56 UTC (permalink / raw)
  To: Jay Vosburgh; +Cc: Moni Shoua, Or Gerlitz, netdev

Jeff Garzik wrote:
> applied patches 1-9
> 
> the only thing that was a hiccup during submission is that your email 
> subject lines did not contain a notion of ordering "[PATCH 1/9] ...". 
> But other than that, the git-send-email went flawlessly.

unfortunately it does not seem to build flawlessly:


drivers/net/bonding/bond_main.c: In function ‘bond_setup_by_slave’:
drivers/net/bonding/bond_main.c:1264: error: ‘struct net_device’ has no 
member named ‘hard_header’
drivers/net/bonding/bond_main.c:1264: error: ‘struct net_device’ has no 
member named ‘hard_header’
drivers/net/bonding/bond_main.c:1265: error: ‘struct net_device’ has no 
member named ‘rebuild_header’
drivers/net/bonding/bond_main.c:1265: error: ‘struct net_device’ has no 
member named ‘rebuild_header’
drivers/net/bonding/bond_main.c:1266: error: ‘struct net_device’ has no 
member named ‘hard_header_cache’
drivers/net/bonding/bond_main.c:1266: error: ‘struct net_device’ has no 
member named ‘hard_header_cache’
drivers/net/bonding/bond_main.c:1267: error: ‘struct net_device’ has no 
member named ‘header_cache_update’
drivers/net/bonding/bond_main.c:1267: error: ‘struct net_device’ has no 
member named ‘header_cache_update’
drivers/net/bonding/bond_main.c:1268: error: ‘struct net_device’ has no 
member named ‘hard_header_parse’
drivers/net/bonding/bond_main.c:1268: error: ‘struct net_device’ has no 
member named ‘hard_header_parse’
drivers/net/bonding/bond_main.c: In function ‘bond_release_and_destroy’:
drivers/net/bonding/bond_main.c:1864: warning: too few arguments for format


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
  2007-10-10  0:56   ` [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue Jeff Garzik
@ 2007-10-10  1:12     ` David Miller
  2007-10-10  1:18       ` Jay Vosburgh
  0 siblings, 1 reply; 10+ messages in thread
From: David Miller @ 2007-10-10  1:12 UTC (permalink / raw)
  To: jeff; +Cc: fubar, monis, ogerlitz, netdev

From: Jeff Garzik <jeff@garzik.org>
Date: Tue, 09 Oct 2007 20:56:35 -0400

> Jeff Garzik wrote:
> > applied patches 1-9
> > 
> > the only thing that was a hiccup during submission is that your email 
> > subject lines did not contain a notion of ordering "[PATCH 1/9] ...". 
> > But other than that, the git-send-email went flawlessly.
> 
> unfortunately it does not seem to build flawlessly:

Yeah, it doesn't handle Stephen Hemminger's header_ops change
in net-2.6.24.



* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
  2007-10-10  1:12     ` David Miller
@ 2007-10-10  1:18       ` Jay Vosburgh
  2007-10-10 16:03         ` Moni Shoua
  0 siblings, 1 reply; 10+ messages in thread
From: Jay Vosburgh @ 2007-10-10  1:18 UTC (permalink / raw)
  To: David Miller; +Cc: jeff, monis, ogerlitz, netdev

David Miller <davem@davemloft.net> wrote:

>From: Jeff Garzik <jeff@garzik.org>
>Date: Tue, 09 Oct 2007 20:56:35 -0400
>
>> Jeff Garzik wrote:
>> > applied patches 1-9
>> > 
>> > the only thing that was a hiccup during submission is that your email 
>> > subject lines did not contain a notion of ordering "[PATCH 1/9] ...". 
>> > But other than that, the git-send-email went flawlessly.
>> 
>> unfortunately it does not seem to build flawlessly:
>
>Yeah, it doesn't handle Stephen Hemminger's header_ops change
>in net-2.6.24.

	Gaah.  I'll sort it out and repost.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com


* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
  2007-10-10  1:18       ` Jay Vosburgh
@ 2007-10-10 16:03         ` Moni Shoua
  2007-10-10 18:31           ` Roland Dreier
  0 siblings, 1 reply; 10+ messages in thread
From: Moni Shoua @ 2007-10-10 16:03 UTC (permalink / raw)
  To: Jay Vosburgh, jeff; +Cc: David Miller, ogerlitz, netdev

Jay Vosburgh wrote:
> David Miller <davem@davemloft.net> wrote:
> 
>> From: Jeff Garzik <jeff@garzik.org>
>> Date: Tue, 09 Oct 2007 20:56:35 -0400
>>
>>> Jeff Garzik wrote:
>>>> applied patches 1-9
>>>>
>>>> the only thing that was a hiccup during submission is that your email 
>>>> subject lines did not contain a notion of ordering "[PATCH 1/9] ...". 
>>>> But other than that, the git-send-email went flawlessly.
>>> unfortunately it does not seem to build flawlessly:
>> Yeah, it doesn't handle Stephen Hemminger's header_ops change
>> in net-2.6.24.
> 
> 	Gaah.  I'll sort it out and repost.
> 
> 	-J
> 
> ---
> 	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com

Hi Jay, Jeff
Thanks for the help with making the patches compile under 2.6.24.
However, patch #3 is missing a line in bond_setup_by_slave(); it should look like this:

	bond_dev->header_ops        = slave_dev->header_ops;

I rewrote the patch and also fixed patch #8, which broke as a result.

I would send the new patches now, but there is more.
I also tested the code on the 2.6.24 branch and found a problem:
ifconfig down doesn't return for IPoIB interfaces, and it's stuck in napi_disable() in the kernel (any idea why?).

I am trying to solve it now, so I'd like to wait a short time before these patches are applied.
I guess I'll need to add something.



thanks
   MoniS



* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
  2007-10-10 16:03         ` Moni Shoua
@ 2007-10-10 18:31           ` Roland Dreier
  2007-10-11 14:48             ` Moni Shoua
  0 siblings, 1 reply; 10+ messages in thread
From: Roland Dreier @ 2007-10-10 18:31 UTC (permalink / raw)
  To: Moni Shoua; +Cc: Jay Vosburgh, jeff, David Miller, ogerlitz, netdev

 > I also ran a test for the code in the branch of 2.6.24 and found a problem.
 > I see that ifconfig down doesn't return (for IPoIB interfaces) and it's stuck in napi_disable() in the kernel (any idea why?)

For what it's worth, I took the upstream 2.6.23 git tree and merged in
Dave's latest net-2.6.24 tree and my latest for-2.6.24 tree and tried
that.  I brought up an IPoIB interface, sent a few pings, and did
ifconfig down, and it worked fine.

Can you try the same thing without the bonding patches to see if your
setup works OK too?

Also can you give more details about what you do to get ifconfig down stuck?

 - R.


* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
  2007-10-10 18:31           ` Roland Dreier
@ 2007-10-11 14:48             ` Moni Shoua
  2007-10-11 20:17               ` Roland Dreier
  0 siblings, 1 reply; 10+ messages in thread
From: Moni Shoua @ 2007-10-11 14:48 UTC (permalink / raw)
  To: Roland Dreier, Jay Vosburgh, jeff
  Cc: David Miller, ogerlitz, netdev, Moni Levy

Roland Dreier wrote:
>  > I also ran a test for the code in the branch of 2.6.24 and found a problem.
>  > I see that ifconfig down doesn't return (for IPoIB interfaces) and it's stuck in napi_disable() in the kernel (any idea why?)
> 
> For what it's worth, I took the upstream 2.6.23 git tree and merged in
> Dave's latest net-2.6.24 tree and my latest for-2.6.24 tree and tried
> that.  I brought up an IPoIB interface, sent a few pings, and did
> ifconfig down, and it worked fine.
> 
> Can you try the same thing without the bonding patches to see if your
> setup works OK too?
> 
> Also can you give more details about what you do to get ifconfig down stuck?
> 
>  - R.

Without bonding, ifconfig down works fine.
It happens only when IB interfaces are slaves of a bonding device.
I thought before that it was stuck in napi_disable(), and that is almost right:
I put prints before and after the call to napi_disable() and see that it is called twice.
I'll investigate in this direction.

ib0: stopping interface
ib0: before napi_disable
ib0: after napi_disable
ib0: downing ib_dev
ib0: All sends and receives done.
ib0: stopping interface
ib0: before napi_disable



Here is also the ifconfig task from the kernel log after 'echo t > /proc/sysrq-trigger':

SysRq : Show State

ifconfig      S 0000000000000000     0  6311   6099
 ffff810034f49d18 0000000000000086 0000000000000000 ffffffffffffffff
 ffff810037e747c0 ffff810037e747c0 000000013481e000 ffff81003a851a78
 ffff81003a851840 000000003b0c8c00 0000000000000000 00000000802358ee
Call Trace:
 [<ffffffff8023cc89>] lock_timer_base+0x24/0x49
 [<ffffffff80403754>] schedule_timeout+0x8a/0xad
 [<ffffffff8023d241>] process_timeout+0x0/0x5
 [<ffffffff8023d6ec>] msleep_interruptible+0x11/0x39
 [<ffffffff884081a7>] :ib_ipoib:ipoib_stop+0x64/0x12c
 [<ffffffff8039fc07>] dev_close+0x3e/0x56
 [<ffffffff803a1c31>] dev_change_flags+0xa7/0x15f
 [<ffffffff803e5bee>] devinet_ioctl+0x293/0x5ed
 [<ffffffff803e775b>] inet_ioctl+0x7f/0x9d
 [<ffffffff80395b2e>] sock_ioctl+0x0/0x1fe
 [<ffffffff80395d08>] sock_ioctl+0x1da/0x1fe
 [<ffffffff802947d9>] do_ioctl+0x29/0x6f
 [<ffffffff80294a75>] vfs_ioctl+0x256/0x267
 [<ffffffff80294adf>] sys_ioctl+0x59/0x7a
 [<ffffffff8020bc0e>] system_call+0x7e/0x83




* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
  2007-10-11 14:48             ` Moni Shoua
@ 2007-10-11 20:17               ` Roland Dreier
  2007-10-11 22:01                 ` Jay Vosburgh
                                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Roland Dreier @ 2007-10-11 20:17 UTC (permalink / raw)
  To: Moni Shoua; +Cc: Jay Vosburgh, jeff, David Miller, ogerlitz, netdev, Moni Levy

 > It happens only when ib interfaces are slaves of a bonding device.
 > I thought before that the stuck is in napi_disable() but it's almost right.
 > I put prints before and after call to napi_disable and see that it is called twice.
 > I'll try to investigate in this direction.
 > 
 > ib0: stopping interface
 > ib0: before napi_disable
 > ib0: after napi_disable
 > ib0: downing ib_dev
 > ib0: All sends and receives done.
 > ib0: stopping interface
 > ib0: before napi_disable

Yes, two napi_disable()s in a row without a matching napi_enable()
will deadlock.  I guess the question is why the ipoib interface is
being stopped twice.

If you just take the net-2.6.24 tree (without bonding patches), does
bonding for ethernet interfaces work OK, or is there a similar problem
with double napi_disable()?  How about bonding of ethernet after this
batch of bonding patches?

 - R.


* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
  2007-10-11 20:17               ` Roland Dreier
@ 2007-10-11 22:01                 ` Jay Vosburgh
  2007-10-13 15:24                 ` Moni Shoua
  2007-10-14 15:51                 ` Moni Shoua
  2 siblings, 0 replies; 10+ messages in thread
From: Jay Vosburgh @ 2007-10-11 22:01 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Moni Shoua, jeff, David Miller, ogerlitz, netdev, Moni Levy

Roland Dreier <rdreier@cisco.com> wrote:
[...]
>Yes, two napi_disable()s in a row without a matching napi_enable()
>will deadlock.  I guess the question is why the ipoib interface is
>being stopped twice.
>
>If you just take the net-2.6.24 tree (without bonding patches), does
>bonding for ethernet interfaces work OK, or is there a similar problem
>with double napi_disable()?  How about bonding of ethernet after this
>batch of bonding patches?

	I just checked this on an x86 box.  The bonding in stock net-2.6
pulled this morning or last night works ok (I did some basic tests,
including ifconfig down / up, with e100).  This remains true with the
IPoIB bonding patches applied.  I do not have hardware available to test
IPoIB.

	I did get a whammy from tg3, but I think this is unrelated to
bonding (as it happens when tg3 comes up, before bonding is involved):

BUG: unable to handle kernel paging request at virtual address 00004214
 printing eip:
e0828017
*pde = 00000000
Oops: 0002 [#1]
SMP
Modules linked in: thermal processor fan button loop e1000 sg evdev tg3 e100 rtb
CPU:    0
EIP:    0060:[<e0828017>]    Not tainted VLI
EFLAGS: 00010206   (2.6.23-ipv6 #1)
EIP is at tg3_ape_write32+0x7/0x10 [tg3]
eax: de9304c0   ebx: dde8fe18   ecx: 00000000   edx: 00004214
esi: de9304c0   edi: 00000000   ebp: dde8fe28   esp: dde8fdd4
ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
Process ip (pid: 2817, ti=dde8e000 task=dff4e0b0 task.ti=dde8e000)
Stack: e082fb2e 00000000 dde8fdf4 c01ece3e dde8fdf8 000003fe 00000000 00005400
       08000000 00001aa0 e083b340 08001aa0 00000060 e083ce00 08001b20 00000030
       e083ce80 00000101 de9304c0 00000001 dde56800 dde8fe38 e0830178 dff69000
Call Trace:
 [<c010536a>] show_trace_log_lvl+0x1a/0x30
 [<c0105429>] show_stack_log_lvl+0xa9/0xd0
 [<c0105639>] show_registers+0x1e9/0x2f0
 [<c0105851>] die+0x111/0x260
 [<c011c5dc>] do_page_fault+0x18c/0x6a0
 [<c0319bea>] error_code+0x72/0x78
 [<e0830178>] tg3_init_hw+0x38/0x50 [tg3]
 [<e0838886>] tg3_open+0x276/0x5d0 [tg3]
 [<c02aead8>] dev_open+0x38/0x80
 [<c02ad5cd>] dev_change_flags+0x7d/0x1a0
 [<c02f63d8>] devinet_ioctl+0x4c8/0x660
 [<c02f698b>] inet_ioctl+0x6b/0x90
 [<c02a0e5a>] sock_ioctl+0x5a/0x210
 [<c017cd98>] do_ioctl+0x28/0x80
 [<c017ce47>] vfs_ioctl+0x57/0x290
 [<c017d0b9>] sys_ioctl+0x39/0x60
 [<c01042a2>] sysenter_past_esp+0x5f/0x99
 =======================
Code: <89> 0a c3 8d b6 00 00 00 00 55 8b 48 50 89 e5 5d 01 ca 8b 02 c3 8d
EIP: [<e0828017>] tg3_ape_write32+0x7/0x10 [tg3] SS:ESP 0068:dde8fdd4
Kernel panic - not syncing: Fatal exception in interrupt

	I haven't investigated this further.  I'm using a BCM5704 card;
if this isn't a known problem and anyone is curious, I can supply
additional info.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com



* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
  2007-10-11 20:17               ` Roland Dreier
  2007-10-11 22:01                 ` Jay Vosburgh
@ 2007-10-13 15:24                 ` Moni Shoua
  2007-10-14 15:51                 ` Moni Shoua
  2 siblings, 0 replies; 10+ messages in thread
From: Moni Shoua @ 2007-10-13 15:24 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Moni Shoua, Jay Vosburgh, jeff, David Miller, ogerlitz, netdev,
	Moni Levy

I won't be near my lab until tomorrow.
I will check this and let you know.

On 10/11/07, Roland Dreier <rdreier@cisco.com> wrote:
>  > It happens only when ib interfaces are slaves of a bonding device.
>  > I thought before that the stuck is in napi_disable() but it's almost right.
>  > I put prints before and after call to napi_disable and see that it is called twice.
>  > I'll try to investigate in this direction.
>  >
>  > ib0: stopping interface
>  > ib0: before napi_disable
>  > ib0: after napi_disable
>  > ib0: downing ib_dev
>  > ib0: All sends and receives done.
>  > ib0: stopping interface
>  > ib0: before napi_disable
>
> Yes, two napi_disable()s in a row without a matching napi_enable()
> will deadlock.  I guess the question is why the ipoib interface is
> being stopped twice.
>
> If you just take the net-2.6.24 tree (without bonding patches), does
> bonding for ethernet interfaces work OK, or is there a similar problem
> with double napi_disable()?  How about bonding of ethernet after this
> batch of bonding patches?
>
>  - R.
>


* Re: [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue
  2007-10-11 20:17               ` Roland Dreier
  2007-10-11 22:01                 ` Jay Vosburgh
  2007-10-13 15:24                 ` Moni Shoua
@ 2007-10-14 15:51                 ` Moni Shoua
  2 siblings, 0 replies; 10+ messages in thread
From: Moni Shoua @ 2007-10-14 15:51 UTC (permalink / raw)
  To: Roland Dreier, Jay Vosburgh
  Cc: jeff, David Miller, ogerlitz, netdev, Moni Levy

Roland Dreier wrote:
>  > It happens only when ib interfaces are slaves of a bonding device.
>  > I thought before that the stuck is in napi_disable() but it's almost right.
>  > I put prints before and after call to napi_disable and see that it is called twice.
>  > I'll try to investigate in this direction.
>  > 
>  > ib0: stopping interface
>  > ib0: before napi_disable
>  > ib0: after napi_disable
>  > ib0: downing ib_dev
>  > ib0: All sends and receives done.
>  > ib0: stopping interface
>  > ib0: before napi_disable
> 
> Yes, two napi_disable()s in a row without a matching napi_enable()
> will deadlock.  I guess the question is why the ipoib interface is
> being stopped twice.
> 
> If you just take the net-2.6.24 tree (without bonding patches), does
> bonding for ethernet interfaces work OK, or is there a similar problem
> with double napi_disable()?  How about bonding of ethernet after this
> batch of bonding patches?
> 
>  - R.

OK, I think I know what happens here.
When bonding gets a NETDEV_GOING_DOWN event, it releases the slave and,
in the process, closes the slave device (this is new code). ifconfig, on the other hand,
closes the device one more time, and this is why we see two napi_disable() calls in a row.

In my opinion the fix belongs in bonding: it should react to NETDEV_UNREGISTER rather than to NETDEV_GOING_DOWN.
I want to test this, and if it works I'll submit new patches.


thanks
  MoniS



end of thread, other threads:[~2007-10-14 15:51 UTC | newest]

Thread overview: 10+ messages
     [not found] <11916151232222-git-send-email-fubar@us.ibm.com>
     [not found] ` <470C200D.4010705@pobox.com>
2007-10-10  0:56   ` [PATCH] IB/ipoib: Bound the net device to the ipoib_neigh structue Jeff Garzik
2007-10-10  1:12     ` David Miller
2007-10-10  1:18       ` Jay Vosburgh
2007-10-10 16:03         ` Moni Shoua
2007-10-10 18:31           ` Roland Dreier
2007-10-11 14:48             ` Moni Shoua
2007-10-11 20:17               ` Roland Dreier
2007-10-11 22:01                 ` Jay Vosburgh
2007-10-13 15:24                 ` Moni Shoua
2007-10-14 15:51                 ` Moni Shoua
