From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?ISO-8859-1?Q?Nicolas_de_Peslo=FCan?= Subject: Re: [PATCH] bonding: prevent deadlock on slave store with alb mode Date: Tue, 24 May 2011 22:13:35 +0200 Message-ID: <4DDC116F.8020602@gmail.com> References: <1306265765-8257-1-git-send-email-nhorman@tuxdriver.com> <20110524200047.GI21309@gospo.rdu.redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: QUOTED-PRINTABLE Cc: Neil Horman , netdev@vger.kernel.org, Jay Vosburgh , "David S. Miller" To: Andy Gospodarek Return-path: Received: from mail-wy0-f174.google.com ([74.125.82.174]:55198 "EHLO mail-wy0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756990Ab1EXUNj (ORCPT ); Tue, 24 May 2011 16:13:39 -0400 Received: by wya21 with SMTP id 21so5284410wya.19 for ; Tue, 24 May 2011 13:13:38 -0700 (PDT) In-Reply-To: <20110524200047.GI21309@gospo.rdu.redhat.com> Sender: netdev-owner@vger.kernel.org List-ID: Le 24/05/2011 22:00, Andy Gospodarek a =E9crit : > On Tue, May 24, 2011 at 03:36:05PM -0400, Neil Horman wrote: >> This soft lockup was recently reported: >> >> [root@dell-per715-01 ~]# echo +bond5> /sys/class/net/bonding_master= s >> [root@dell-per715-01 ~]# echo +eth1> /sys/class/net/bond5/bonding/s= laves >> bonding: bond5: doing slave updates when interface is down. >> bonding bond5: master_dev is not up in bond_enslave >> [root@dell-per715-01 ~]# echo -eth1> /sys/class/net/bond5/bonding/s= laves >> bonding: bond5: doing slave updates when interface is down. >> >> BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444] >> CPU 12: >> Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd= sunrpc >> be2d >> Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1 >> RIP: 0010:[] [] >> .text.lock.spinlock+0x26/00 >> RSP: 0018:ffff810113167da8 EFLAGS: 00000286 >> RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025 >> RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8 >> RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c >> R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000 >> R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282 >> FS: 00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:00000000= 00000000 >> CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0 >> >> Call Trace: >> [] _spin_lock_bh+0x9/0x14 >> [] :bonding:tlb_clear_slave+0x22/0xa1 >> [] :bonding:bond_alb_deinit_slave+0xba/0xf0 >> [] :bonding:bond_release+0x1b4/0x450 >> [] __down_write_nested+0x12/0x92 >> [] :bonding:bonding_store_slaves+0x25c/0x2f7 >> [] sysfs_write_file+0xb9/0xe8 >> [] vfs_write+0xce/0x174 >> [] sys_write+0x45/0x6e >> [] tracesys+0xd5/0xe0 >> >> It occurs because we are able to change the slave configuarion of a = bond while >> the bond interface is down. The bonding driver initializes some dat= a structures >> only after its ndo_open routine is called. Among them is the inital= ization of >> the alb tx and rx hash locks. So if we add or remove a slave withou= t first >> opening the bond master device, we run the risk of trying to lock/un= lock a >> spinlock that has garbage for data in it, which results in our above= softlock. >> >> We could fix it by moving the spin lock initalization to the device = creation >> path, but it seems that since we're warning people about not doing t= his, we >> should probably just disallow them from doing it, so fix it by addin= g an EINVAL >> return if we're not up yet. Tested by the reporter and confirmed to= fix the >> problem. >> >> Signed-off-by: Neil Horman > > Signed-off-by: Andy Gospodarek > >> Reported-by: jtluka@redhat.com >> CC: Jay Vosburgh >> CC: Andy Gospodarek >> CC: "David S. Miller" >> --- >> drivers/net/bonding/bond_sysfs.c | 1 + >> 1 files changed, 1 insertions(+), 0 deletions(-) >> >> diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/= bond_sysfs.c >> index 4059bfc..206c543 100644 >> --- a/drivers/net/bonding/bond_sysfs.c >> +++ b/drivers/net/bonding/bond_sysfs.c >> @@ -231,6 +231,7 @@ static ssize_t bonding_store_slaves(struct devic= e *d, >> if (!(bond->dev->flags& IFF_UP)) { >> pr_warning("%s: doing slave updates when interface is down.\n", >> bond->dev->name); >> + return -EINVAL; This will turn a warning into an error. This warning existed for long, but never caused the bonding setup to fa= il. This patch cause some=20 regression for user space. For example, current ifenslave-2.6 package i= n Debian doesn't ensure bond=20 is UP before enslaving, because this was never required. NAK. >> } >> >> if (!rtnl_trylock()) >> -- >> 1.7.5.1 >> >> -- >> To unsubscribe from this list: send the line "unsubscribe netdev" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >