Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v3)

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Jay Vosburgh <fubar@us.ibm.com>
To: Neil Horman <nhorman@tuxdriver.com>
Cc: netdev@vger.kernel.org, Andy Gospodarek <andy@greyhouse.net>,
	nicolas.2p.debian@gmail.com,
	"David S. Miller" <davem@davemloft.net>
Subject: Re: [PATCH] bonding: prevent deadlock on slave store with alb mode (v3)
Date: Wed, 25 May 2011 11:59:02 -0700	[thread overview]
Message-ID: <13612.1306349942@death> (raw)
In-Reply-To: <1306347181-3757-1-git-send-email-nhorman@tuxdriver.com>

Neil Horman <nhorman@tuxdriver.com> wrote:

>This soft lockup was recently reported:
>
>[root@dell-per715-01 ~]# echo +bond5 > /sys/class/net/bonding_masters
>[root@dell-per715-01 ~]# echo +eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>bonding bond5: master_dev is not up in bond_enslave
>[root@dell-per715-01 ~]# echo -eth1 > /sys/class/net/bond5/bonding/slaves
>bonding: bond5: doing slave updates when interface is down.
>
>BUG: soft lockup - CPU#12 stuck for 60s! [bash:6444]
>CPU 12:
>Modules linked in: bonding autofs4 hidp rfcomm l2cap bluetooth lockd sunrpc
>be2d
>Pid: 6444, comm: bash Not tainted 2.6.18-262.el5 #1
>RIP: 0010:[<ffffffff80064bf0>]  [<ffffffff80064bf0>]
>.text.lock.spinlock+0x26/00
>RSP: 0018:ffff810113167da8  EFLAGS: 00000286
>RAX: ffff810113167fd8 RBX: ffff810123a47800 RCX: 0000000000ff1025
>RDX: 0000000000000000 RSI: ffff810123a47800 RDI: ffff81021b57f6f8
>RBP: ffff81021b57f500 R08: 0000000000000000 R09: 000000000000000c
>R10: 00000000ffffffff R11: ffff81011d41c000 R12: ffff81021b57f000
>R13: 0000000000000000 R14: 0000000000000282 R15: 0000000000000282
>FS:  00002b3b41ef3f50(0000) GS:ffff810123b27940(0000) knlGS:0000000000000000
>CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
>CR2: 00002b3b456dd000 CR3: 000000031fc60000 CR4: 00000000000006e0
>
>Call Trace:
> [<ffffffff80064af9>] _spin_lock_bh+0x9/0x14
> [<ffffffff886937d7>] :bonding:tlb_clear_slave+0x22/0xa1
> [<ffffffff8869423c>] :bonding:bond_alb_deinit_slave+0xba/0xf0
> [<ffffffff8868dda6>] :bonding:bond_release+0x1b4/0x450
> [<ffffffff8006457b>] __down_write_nested+0x12/0x92
> [<ffffffff88696ae4>] :bonding:bonding_store_slaves+0x25c/0x2f7
> [<ffffffff801106f7>] sysfs_write_file+0xb9/0xe8
> [<ffffffff80016b87>] vfs_write+0xce/0x174
> [<ffffffff80017450>] sys_write+0x45/0x6e
> [<ffffffff8005d28d>] tracesys+0xd5/0xe0
>
>It occurs because we are able to change the slave configuarion of a bond while
>the bond interface is down.  The bonding driver initializes some data structures
>only after its ndo_open routine is called.  Among them is the initalization of
>the alb tx and rx hash locks.  So if we add or remove a slave without first
>opening the bond master device, we run the risk of trying to lock/unlock a
>spinlock that has garbage for data in it, which results in our above softlock.
>
>Note that sometimes this works, because in many cases an unlocked spinlock has
>the raw_lock parameter initialized to zero (meaning that the kzalloc of the
>net_device private data is equivalent to calling spin_lock_init), but thats not
>true in all cases, and we aren't guaranteed that condition, so we need to pass
>the relevant spinlocks through the spin_lock_init function.
>
>Fix it by moving the spin_lock_init calls for the tx and rx hashtable locks to
>the ndo_init path, so they are ready for use by the bond_store_slaves path.
>
>Change notes:
>v2) Based on conversation with Jay and Nicolas it seems that the ability to
>enslave devices while the bond master is down should be safe to do.  As such
>this is an outlier bug, and so instead we'll just initalize the errant spinlocks
>in the init path rather than the open path, solving the problem.  We'll also
>remove the warnings about the bond being down during enslave operations, since
>it should be safe
>
>v3) Fix spelling error
>
>Signed-off-by: Neil Horman <nhorman@tuxdriver.com>

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>


>Reported-by: jtluka@redhat.com
>CC: Jay Vosburgh <fubar@us.ibm.com>
>CC: Andy Gospodarek <andy@greyhouse.net>
>CC: nicolas.2p.debian@gmail.com
>CC: "David S. Miller" <davem@davemloft.net>
>---
> drivers/net/bonding/bond_alb.c   |    4 ----
> drivers/net/bonding/bond_main.c  |   16 ++++++++++------
> drivers/net/bonding/bond_sysfs.c |    6 ------
> 3 files changed, 10 insertions(+), 16 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index 8f2d2e7..2df9276 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -163,8 +163,6 @@ static int tlb_initialize(struct bonding *bond)
> 	struct tlb_client_info *new_hashtbl;
> 	int i;
>
>-	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>-
> 	new_hashtbl = kzalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate TLB hash table\n",
>@@ -747,8 +745,6 @@ static int rlb_initialize(struct bonding *bond)
> 	int size = RLB_HASH_TABLE_SIZE * sizeof(struct rlb_client_info);
> 	int i;
>
>-	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>-
> 	new_hashtbl = kmalloc(size, GFP_KERNEL);
> 	if (!new_hashtbl) {
> 		pr_err("%s: Error: Failed to allocate RLB hash table\n",
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 088fd84..c0045d7 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1542,12 +1542,6 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 			   bond_dev->name, slave_dev->name);
> 	}
>
>-	/* bond must be initialized by bond_open() before enslaving */
>-	if (!(bond_dev->flags & IFF_UP)) {
>-		pr_warning("%s: master_dev is not up in bond_enslave\n",
>-			   bond_dev->name);
>-	}
>-
> 	/* already enslaved */
> 	if (slave_dev->flags & IFF_SLAVE) {
> 		pr_debug("Error, Device was already enslaved\n");
>@@ -4832,9 +4826,19 @@ static int bond_init(struct net_device *bond_dev)
> {
> 	struct bonding *bond = netdev_priv(bond_dev);
> 	struct bond_net *bn = net_generic(dev_net(bond_dev), bond_net_id);
>+	struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>
> 	pr_debug("Begin bond_init for %s\n", bond_dev->name);
>
>+	/*
>+	 * Initialize locks that may be required during
>+	 * en/deslave operations.  All of the bond_open work
>+	 * (of which this is part) should really be moved to
>+	 * a phase prior to dev_open
>+	 */
>+	spin_lock_init(&(bond_info->tx_hashtbl_lock));
>+	spin_lock_init(&(bond_info->rx_hashtbl_lock));
>+
> 	bond->wq = create_singlethread_workqueue(bond_dev->name);
> 	if (!bond->wq)
> 		return -ENOMEM;
>diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
>index 4059bfc..bb1319f 100644
>--- a/drivers/net/bonding/bond_sysfs.c
>+++ b/drivers/net/bonding/bond_sysfs.c
>@@ -227,12 +227,6 @@ static ssize_t bonding_store_slaves(struct device *d,
> 	struct net_device *dev;
> 	struct bonding *bond = to_bond(d);
>
>-	/* Quick sanity check -- is the bond interface up? */
>-	if (!(bond->dev->flags & IFF_UP)) {
>-		pr_warning("%s: doing slave updates when interface is down.\n",
>-			   bond->dev->name);
>-	}
>-
> 	if (!rtnl_trylock())
> 		return restart_syscall();
>
>-- 
>1.7.5.2
>

     prev parent reply	other threads:[~2011-05-25 19:00 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-05-24 19:36 [PATCH] bonding: prevent deadlock on slave store with alb mode Neil Horman
2011-05-24 20:00 ` Andy Gospodarek
2011-05-24 20:13   ` Nicolas de Pesloüan
2011-05-24 20:37     ` Neil Horman
2011-05-24 20:51       ` Nicolas de Pesloüan
2011-05-24 21:02         ` Nicolas de Pesloüan
2011-05-24 21:12         ` Jay Vosburgh
2011-05-24 21:14           ` David Miller
2011-05-24 21:32           ` Nicolas de Pesloüan
2011-05-24 23:18           ` Neil Horman
2011-05-25  0:34             ` Jay Vosburgh
2011-05-25  2:20               ` Neil Horman
2011-05-25  7:33               ` Nicolas de Pesloüan
2011-05-25 17:16 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v2) Neil Horman
2011-05-25 17:32   ` Jay Vosburgh
2011-05-25 21:58   ` David Miller
2011-05-25 18:13 ` [PATCH] bonding: prevent deadlock on slave store with alb mode (v3) Neil Horman
2011-05-25 18:59   ` Jay Vosburgh [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=13612.1306349942@death \
    --to=fubar@us.ibm.com \
    --cc=andy@greyhouse.net \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@tuxdriver.com \
    --cc=nicolas.2p.debian@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).