From: Jay Vosburgh <fubar@us.ibm.com>
To: Jiri Bohac <jbohac@suse.cz>
Cc: bonding-devel@lists.sourceforge.net, markine@google.com,
jarkao2@gmail.com, chavey@google.com, netdev@vger.kernel.org
Subject: Re: [RFC] bonding: fix workqueue re-arming races
Date: Tue, 31 Aug 2010 13:54:23 -0700 [thread overview]
Message-ID: <20136.1283288063@death> (raw)
In-Reply-To: <20100831170752.GA9743@midget.suse.cz>
Jiri Bohac <jbohac@suse.cz> wrote:
>Hi,
>
>this is another attempt to solve the bonding workqueue re-arming
>races.
>
>The issue has been thoroughly discussed here:
>http://article.gmane.org/gmane.linux.network/146949 "[PATCH]
>bonding: cancel_delayed_work() -> cancel_delayed_work_sync()"
>However, the only outcome was a proposal for an ugly hack with
>busy-waiting on the rtnl.
>
>The problem:
>Bonding uses delayed work that automatically re-arms itself,
>e.g.: bond_mii_monitor().
>
>A dev->close() quickly followed by dev->open() on the bonding
>master has a race condition. bond_close() sets kill_timers=1 and
>calls cancel_delayed_work(), hoping that bond_mii_monitor() will
>not re-arm again anymore. There are two problems with this:
>
>- bond->kill_timers is not re-checked after re-acquiring the
> bond->lock (this would be easy to fix)
>
>- bond_open() resets bond->kill_timers to 0. If this happens
> before bond_mii_monitor() notices the flag and exits, it will
> re-arm itself. bond_open() then tries to schedule the delayed
> work, which causes a BUG.
>
>The issue would be solved by calling cancel_delayed_work_sync(),
>but this can not be done from bond_close() since it is called
>under rtnl and the delayed work locks rtnl itself.
>
>My proposal is to move all the "commit" work that requires rtnl
>to a separate work and schedule it on the bonding wq. Thus, the
>re-arming work does not need rtnl and can be cancelled using
>cancel_delayed_work_sync().
>
>Comments?
>
>[note, this does not deal with bond_loadbalance_arp_mon(), where
>rtnl is now taken as well in net-next; I'll do this if you think
>the idea is good ]
I don't believe the loadbalance_arp_mon acquires RTNL in
net-next. I recall discussing this with Andy not too long ago, but I
didn't think that change went in, and I don't see it in the tree.
>Signed-off-by: Jiri Bohac <jbohac@suse.cz>
>
>diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
>index 822f586..8015e12 100644
>--- a/drivers/net/bonding/bond_3ad.c
>+++ b/drivers/net/bonding/bond_3ad.c
>@@ -2119,10 +2119,6 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
>
> read_lock(&bond->lock);
>
>- if (bond->kill_timers) {
>- goto out;
>- }
>-
> //check if there are any slaves
> if (bond->slave_cnt == 0) {
> goto re_arm;
>@@ -2166,7 +2162,6 @@ void bond_3ad_state_machine_handler(struct work_struct *work)
>
> re_arm:
> queue_delayed_work(bond->wq, &bond->ad_work, ad_delta_in_ticks);
>-out:
> read_unlock(&bond->lock);
> }
>
>diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
>index c746b33..250d027 100644
>--- a/drivers/net/bonding/bond_alb.c
>+++ b/drivers/net/bonding/bond_alb.c
>@@ -1397,6 +1397,23 @@ out:
> return NETDEV_TX_OK;
> }
>
>+void bond_alb_promisc_disable(struct work_struct *work)
>+{
>+ struct bonding *bond = container_of(work, struct bonding,
>+ alb_promisc_disable_work);
>+ struct alb_bond_info *bond_info = &(BOND_ALB_INFO(bond));
>+
>+ /*
>+ * dev_set_promiscuity requires rtnl and
>+ * nothing else.
>+ */
>+ rtnl_lock();
>+ dev_set_promiscuity(bond->curr_active_slave->dev, -1);
>+ bond_info->primary_is_promisc = 0;
>+ bond_info->rlb_promisc_timeout_counter = 0;
>+ rtnl_unlock();
>+}
What prevents this from deadlocking such that cpu A is in
bond_close, holding RTNL and in cancel_delayed_work_sync, while cpu B is
in the above function, trying to acquire RTNL?
Also, assuming for the moment that the above isn't a problem,
curr_active_slave may be NULL if the last slave is removed between the
time bond_alb_promisc_disable is scheduled and when it runs. I'm not
sure that the alb_bond_info can be guaranteed to be valid, either, if
the mode changed.
-J
> void bond_alb_monitor(struct work_struct *work)
> {
> struct bonding *bond = container_of(work, struct bonding,
>@@ -1407,10 +1424,6 @@ void bond_alb_monitor(struct work_struct *work)
>
> read_lock(&bond->lock);
>
>- if (bond->kill_timers) {
>- goto out;
>- }
>-
> if (bond->slave_cnt == 0) {
> bond_info->tx_rebalance_counter = 0;
> bond_info->lp_counter = 0;
>@@ -1462,25 +1475,11 @@ void bond_alb_monitor(struct work_struct *work)
> if (bond_info->rlb_enabled) {
> if (bond_info->primary_is_promisc &&
> (++bond_info->rlb_promisc_timeout_counter >= RLB_PROMISC_TIMEOUT)) {
>-
>- /*
>- * dev_set_promiscuity requires rtnl and
>- * nothing else.
>- */
>- read_unlock(&bond->lock);
>- rtnl_lock();
>-
>- bond_info->rlb_promisc_timeout_counter = 0;
>-
> /* If the primary was set to promiscuous mode
> * because a slave was disabled then
> * it can now leave promiscuous mode.
> */
>- dev_set_promiscuity(bond->curr_active_slave->dev, -1);
>- bond_info->primary_is_promisc = 0;
>-
>- rtnl_unlock();
>- read_lock(&bond->lock);
>+ queue_work(bond->wq, &bond->alb_promisc_disable_work);
> }
>
> if (bond_info->rlb_rebalance) {
>@@ -1505,7 +1504,6 @@ void bond_alb_monitor(struct work_struct *work)
>
> re_arm:
> queue_delayed_work(bond->wq, &bond->alb_work, alb_delta_in_ticks);
>-out:
> read_unlock(&bond->lock);
> }
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 2cc4cfc..3e8b57e 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2343,10 +2343,15 @@ static int bond_miimon_inspect(struct bonding *bond)
> return commit;
> }
>
>-static void bond_miimon_commit(struct bonding *bond)
>+static void bond_miimon_commit(struct work_struct *work)
> {
> struct slave *slave;
> int i;
>+ struct bonding *bond = container_of(work, struct bonding,
>+ miimon_commit_work);
>+
>+ rtnl_lock();
>+ read_lock(&bond->lock);
>
> bond_for_each_slave(bond, slave, i) {
> switch (slave->new_link) {
>@@ -2421,15 +2426,18 @@ static void bond_miimon_commit(struct bonding *bond)
> }
>
> do_failover:
>- ASSERT_RTNL();
> write_lock_bh(&bond->curr_slave_lock);
> bond_select_active_slave(bond);
> write_unlock_bh(&bond->curr_slave_lock);
> }
>
> bond_set_carrier(bond);
>+
>+ read_unlock(&bond->lock);
>+ rtnl_unlock();
> }
>
>+
> /*
> * bond_mii_monitor
> *
>@@ -2438,14 +2446,13 @@ do_failover:
> * an acquisition of appropriate locks followed by a commit phase to
> * implement whatever link state changes are indicated.
> */
>+
> void bond_mii_monitor(struct work_struct *work)
> {
> struct bonding *bond = container_of(work, struct bonding,
> mii_work.work);
>
> read_lock(&bond->lock);
>- if (bond->kill_timers)
>- goto out;
>
> if (bond->slave_cnt == 0)
> goto re_arm;
>@@ -2462,23 +2469,14 @@ void bond_mii_monitor(struct work_struct *work)
> read_unlock(&bond->curr_slave_lock);
> }
>
>- if (bond_miimon_inspect(bond)) {
>- read_unlock(&bond->lock);
>- rtnl_lock();
>- read_lock(&bond->lock);
>-
>- bond_miimon_commit(bond);
>+ if (bond_miimon_inspect(bond))
>+ queue_work(bond->wq, &bond->miimon_commit_work);
>
>- read_unlock(&bond->lock);
>- rtnl_unlock(); /* might sleep, hold no other locks */
>- read_lock(&bond->lock);
>- }
>
> re_arm:
> if (bond->params.miimon)
> queue_delayed_work(bond->wq, &bond->mii_work,
> msecs_to_jiffies(bond->params.miimon));
>-out:
> read_unlock(&bond->lock);
> }
>
>@@ -2778,9 +2776,6 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
>
> delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);
>
>- if (bond->kill_timers)
>- goto out;
>-
> if (bond->slave_cnt == 0)
> goto re_arm;
>
>@@ -2867,7 +2862,6 @@ void bond_loadbalance_arp_mon(struct work_struct *work)
> re_arm:
> if (bond->params.arp_interval)
> queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
>-out:
> read_unlock(&bond->lock);
> }
>
>@@ -2949,13 +2943,19 @@ static int bond_ab_arp_inspect(struct bonding *bond, int delta_in_ticks)
> /*
> * Called to commit link state changes noted by inspection step of
> * active-backup mode ARP monitor.
>- *
>- * Called with RTNL and bond->lock for read.
> */
>-static void bond_ab_arp_commit(struct bonding *bond, int delta_in_ticks)
>+static void bond_ab_arp_commit(struct work_struct *work)
> {
> struct slave *slave;
> int i;
>+ int delta_in_ticks;
>+ struct bonding *bond = container_of(work, struct bonding,
>+ ab_arp_commit_work);
>+
>+ rtnl_lock();
>+ read_lock(&bond->lock);
>+
>+ delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);
>
> bond_for_each_slave(bond, slave, i) {
> switch (slave->new_link) {
>@@ -3014,6 +3014,9 @@ do_failover:
> }
>
> bond_set_carrier(bond);
>+
>+ read_unlock(&bond->lock);
>+ rtnl_unlock();
> }
>
> /*
>@@ -3093,9 +3096,6 @@ void bond_activebackup_arp_mon(struct work_struct *work)
>
> read_lock(&bond->lock);
>
>- if (bond->kill_timers)
>- goto out;
>-
> delta_in_ticks = msecs_to_jiffies(bond->params.arp_interval);
>
> if (bond->slave_cnt == 0)
>@@ -3113,24 +3113,15 @@ void bond_activebackup_arp_mon(struct work_struct *work)
> read_unlock(&bond->curr_slave_lock);
> }
>
>- if (bond_ab_arp_inspect(bond, delta_in_ticks)) {
>- read_unlock(&bond->lock);
>- rtnl_lock();
>- read_lock(&bond->lock);
>+ if (bond_ab_arp_inspect(bond, delta_in_ticks))
>+ queue_work(bond->wq, &bond->ab_arp_commit_work);
>
>- bond_ab_arp_commit(bond, delta_in_ticks);
>-
>- read_unlock(&bond->lock);
>- rtnl_unlock();
>- read_lock(&bond->lock);
>- }
>
> bond_ab_arp_probe(bond);
>
> re_arm:
> if (bond->params.arp_interval)
> queue_delayed_work(bond->wq, &bond->arp_work, delta_in_ticks);
>-out:
> read_unlock(&bond->lock);
> }
>
>@@ -3720,8 +3711,6 @@ static int bond_open(struct net_device *bond_dev)
> {
> struct bonding *bond = netdev_priv(bond_dev);
>
>- bond->kill_timers = 0;
>-
> if (bond_is_lb(bond)) {
> /* bond_alb_initialize must be called before the timer
> * is started.
>@@ -3781,26 +3770,23 @@ static int bond_close(struct net_device *bond_dev)
> bond->send_grat_arp = 0;
> bond->send_unsol_na = 0;
>
>- /* signal timers not to re-arm */
>- bond->kill_timers = 1;
>-
> write_unlock_bh(&bond->lock);
>
> if (bond->params.miimon) { /* link check interval, in milliseconds. */
>- cancel_delayed_work(&bond->mii_work);
>+ cancel_delayed_work_sync(&bond->mii_work);
> }
>
> if (bond->params.arp_interval) { /* arp interval, in milliseconds. */
>- cancel_delayed_work(&bond->arp_work);
>+ cancel_delayed_work_sync(&bond->arp_work);
> }
>
> switch (bond->params.mode) {
> case BOND_MODE_8023AD:
>- cancel_delayed_work(&bond->ad_work);
>+ cancel_delayed_work_sync(&bond->ad_work);
> break;
> case BOND_MODE_TLB:
> case BOND_MODE_ALB:
>- cancel_delayed_work(&bond->alb_work);
>+ cancel_delayed_work_sync(&bond->alb_work);
> break;
> default:
> break;
>@@ -4660,23 +4646,19 @@ static void bond_setup(struct net_device *bond_dev)
>
> static void bond_work_cancel_all(struct bonding *bond)
> {
>- write_lock_bh(&bond->lock);
>- bond->kill_timers = 1;
>- write_unlock_bh(&bond->lock);
>-
> if (bond->params.miimon && delayed_work_pending(&bond->mii_work))
>- cancel_delayed_work(&bond->mii_work);
>+ cancel_delayed_work_sync(&bond->mii_work);
>
> if (bond->params.arp_interval && delayed_work_pending(&bond->arp_work))
>- cancel_delayed_work(&bond->arp_work);
>+ cancel_delayed_work_sync(&bond->arp_work);
>
> if (bond->params.mode == BOND_MODE_ALB &&
> delayed_work_pending(&bond->alb_work))
>- cancel_delayed_work(&bond->alb_work);
>+ cancel_delayed_work_sync(&bond->alb_work);
>
> if (bond->params.mode == BOND_MODE_8023AD &&
> delayed_work_pending(&bond->ad_work))
>- cancel_delayed_work(&bond->ad_work);
>+ cancel_delayed_work_sync(&bond->ad_work);
> }
>
> /*
>@@ -5094,6 +5076,9 @@ static int bond_init(struct net_device *bond_dev)
> bond_prepare_sysfs_group(bond);
>
> __hw_addr_init(&bond->mc_list);
>+ INIT_WORK(&bond->miimon_commit_work, bond_miimon_commit);
>+ INIT_WORK(&bond->ab_arp_commit_work, bond_ab_arp_commit);
>+ INIT_WORK(&bond->alb_promisc_disable_work, bond_alb_promisc_disable);
> return 0;
> }
>
>diff --git a/drivers/net/bonding/bonding.h b/drivers/net/bonding/bonding.h
>index c6fdd85..e111023 100644
>--- a/drivers/net/bonding/bonding.h
>+++ b/drivers/net/bonding/bonding.h
>@@ -198,7 +198,6 @@ struct bonding {
> s32 slave_cnt; /* never change this value outside the attach/detach wrappers */
> rwlock_t lock;
> rwlock_t curr_slave_lock;
>- s8 kill_timers;
> s8 send_grat_arp;
> s8 send_unsol_na;
> s8 setup_by_slave;
>@@ -223,6 +222,9 @@ struct bonding {
> struct delayed_work arp_work;
> struct delayed_work alb_work;
> struct delayed_work ad_work;
>+ struct work_struct miimon_commit_work;
>+ struct work_struct ab_arp_commit_work;
>+ struct work_struct alb_promisc_disable_work;
> #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
> struct in6_addr master_ipv6;
> #endif
>@@ -348,6 +350,7 @@ void bond_select_active_slave(struct bonding *bond);
> void bond_change_active_slave(struct bonding *bond, struct slave *new_active);
> void bond_register_arp(struct bonding *);
> void bond_unregister_arp(struct bonding *);
>+void bond_alb_promisc_disable(struct work_struct *work);
>
> struct bond_net {
> struct net * net; /* Associated network namespace */
>
>
>--
>Jiri Bohac <jbohac@suse.cz>
>SUSE Labs, SUSE CZ
---
-Jay Vosburgh, IBM Linux Technology Center, fubar@us.ibm.com
next prev parent reply other threads:[~2010-08-31 20:54 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-08-31 17:07 [RFC] bonding: fix workqueue re-arming races Jiri Bohac
2010-08-31 20:54 ` Jay Vosburgh [this message]
2010-09-01 12:23 ` Jarek Poplawski
2010-09-01 13:30 ` Jiri Bohac
2010-09-01 15:18 ` Jarek Poplawski
2010-09-01 15:37 ` Jarek Poplawski
2010-09-01 19:00 ` Jarek Poplawski
2010-09-01 19:11 ` Jiri Bohac
2010-09-01 19:20 ` Jarek Poplawski
2010-09-01 19:38 ` Jarek Poplawski
2010-09-01 19:46 ` Jay Vosburgh
2010-09-01 20:06 ` Jarek Poplawski
2010-09-01 13:16 ` Jiri Bohac
2010-09-01 17:14 ` Jay Vosburgh
2010-09-01 18:31 ` Jiri Bohac
2010-09-01 20:00 ` Jay Vosburgh
2010-09-01 20:56 ` Jiri Bohac
2010-09-02 0:54 ` Jay Vosburgh
2010-09-02 17:08 ` Jiri Bohac
2010-09-09 0:06 ` Jay Vosburgh
2010-09-16 22:44 ` Jay Vosburgh
2010-09-24 11:23 ` Narendra K
2010-10-01 18:22 ` Jiri Bohac
2010-10-05 15:03 ` Narendra_K
2010-10-06 7:36 ` Narendra_K
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20136.1283288063@death \
--to=fubar@us.ibm.com \
--cc=bonding-devel@lists.sourceforge.net \
--cc=chavey@google.com \
--cc=jarkao2@gmail.com \
--cc=jbohac@suse.cz \
--cc=markine@google.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.