From: Jay Vosburgh
Subject: Re: 2.6.27.18: bonding: scheduling while atomic
Date: Tue, 05 May 2009 12:56:21 -0700
Message-ID: <19313.1241553381@death.nxdomain.ibm.com>
References: <1241550877.30571.17.camel@psmith-ubeta.netezza.com>
In-reply-to: <1241550877.30571.17.camel@psmith-ubeta.netezza.com>
To: paul@mad-scientist.net
Cc: Netdev

Paul Smith wrote:

>Hi all; I got a "scheduling while atomic" error while I was enslaving
>my second interface to my bond (balance-alb / mode 6).  I already have
>the previous fix for a locking error installed here.
[...]
><4> [] packet_notifier+0x8c/0x1f0
><4> [] bond_alb_init_slave+0x228/0x250 [bonding]
><4> [] bond_enslave+0x7ca/0x9d0 [bonding]
><4> [] _spin_unlock_irq+0x11/0x40
><4> [] __dev_get_by_name+0x9a/0xc0
><4> [] bond_do_ioctl+0x3f5/0x530 [bonding]
><4> [] notifier_call_chain+0x37/0x70

I believe this is happening when the new slave's MAC address is
already in use by the bond somewhere.  You can get that if you set up
and tear down the bond after it's moved things around and you haven't
reset the slaves to their default (hardware) MAC address (by, e.g.,
reloading the drivers).
The alb mode doesn't do that, because that MAC might still be in use
by the bond; if memory serves, you'll see a message at slave removal
about that.

Anyway, I'm pretty sure the following will make it go away.  I
believe this is safe, as RTNL is held throughout, but I haven't checked
exhaustively.  This is against 2.6.27.18, and is just for testing.

-J

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 4489e58..d199446 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1520,15 +1520,8 @@ int bond_alb_init_slave(struct bonding *bond, struct slave *slave)
 		return res;
 	}
 
-	/* caller must hold the bond lock for write since the mac addresses
-	 * are compared and may be swapped.
-	 */
-	read_lock(&bond->lock);
-
 	res = alb_handle_addr_collision_on_attach(bond, slave);
 
-	read_unlock(&bond->lock);
-
 	if (res) {
 		return res;
 	}

---
	-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
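
For anyone following the thread, the failure mode can be modeled in
userspace.  This is only a rough sketch, not kernel code: every name
in it (toy_read_lock, toy_might_sleep, toy_init_slave_locked, and so
on) is hypothetical.  It also assumes, as the trace suggests but the
thread does not confirm, that the address collision path ends up in a
notifier call that may sleep.  The "locked" variant mirrors the code
before the patch, the "unlocked" variant the code after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy userspace model.  Every name below is hypothetical, not a kernel API. */

int atomic_depth;	/* > 0 while a spinning lock is "held" */
int sleep_in_atomic;	/* sleeps attempted while in atomic context */

void toy_read_lock(void)
{
	atomic_depth++;
}

void toy_read_unlock(void)
{
	assert(atomic_depth > 0);	/* every unlock must pair with a lock */
	atomic_depth--;
}

/* Stand-in for any call that may sleep. */
void toy_might_sleep(const char *what)
{
	if (atomic_depth > 0) {
		sleep_in_atomic++;
		fprintf(stderr, "BUG: scheduling while atomic: %s\n", what);
	}
}

/* Assumed shape of the collision path: only a MAC clash reaches the
 * branch that may sleep. */
int toy_handle_addr_collision(bool mac_in_use)
{
	if (mac_in_use)
		toy_might_sleep("notifier chain on MAC change");
	return 0;
}

/* Before the patch: the handler runs with the spinning lock held. */
int toy_init_slave_locked(bool mac_in_use)
{
	int res;

	toy_read_lock();
	res = toy_handle_addr_collision(mac_in_use);
	toy_read_unlock();
	return res;
}

/* After the patch: no spinning lock is held; the caller is assumed to
 * serialize through some sleepable lock instead (RTNL in the real code). */
int toy_init_slave_unlocked(bool mac_in_use)
{
	return toy_handle_addr_collision(mac_in_use);
}
```

Calling toy_init_slave_locked(true) bumps sleep_in_atomic and prints
the warning.  toy_init_slave_unlocked(true) and
toy_init_slave_locked(false) do not, which matches the report only
showing up when the new slave's MAC is already in use.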