* [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() RCU usage
2014-01-10 9:18 [PATCH v3 net-next 0/3] bonding: fix bond_3ad RCU usage Veaceslav Falico
@ 2014-01-10 9:18 ` Veaceslav Falico
2014-01-10 10:34 ` Ding Tianhong
2014-01-10 9:18 ` [PATCH v3 net-next 2/3] bonding: fix __get_first_agg " Veaceslav Falico
2014-01-10 9:18 ` [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic Veaceslav Falico
2 siblings, 1 reply; 8+ messages in thread
From: Veaceslav Falico @ 2014-01-10 9:18 UTC
To: netdev; +Cc: Veaceslav Falico, dingtianhong, Jay Vosburgh, Andy Gospodarek
Currently, the RCU usage in bond_3ad_set_carrier() is plainly wrong. It first
gets a slave under RCU and then, after releasing the RCU lock, continues to
use it - while it can be freed.
Fix this by ensuring that bond_3ad_set_carrier() holds the RCU read lock for
as long as it uses its slave (or its agg).
Fixes: be79bd048ab ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
Notes:
v2 -> v3:
Just wrap RCU for the whole usage of our slave.
v1 -> v2:
Don't use _rcu primitives as we can be called under RTNL too.
drivers/net/bonding/bond_3ad.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 29db1ca..9ff55eb 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2327,32 +2327,33 @@ int bond_3ad_set_carrier(struct bonding *bond)
{
struct aggregator *active;
struct slave *first_slave;
+ int ret = 1;
rcu_read_lock();
first_slave = bond_first_slave_rcu(bond);
- rcu_read_unlock();
- if (!first_slave)
- return 0;
+ if (!first_slave) {
+ ret = 0;
+ goto out;
+ }
active = __get_active_agg(&(SLAVE_AD_INFO(first_slave).aggregator));
if (active) {
/* are enough slaves available to consider link up? */
if (active->num_of_ports < bond->params.min_links) {
if (netif_carrier_ok(bond->dev)) {
netif_carrier_off(bond->dev);
- return 1;
+ goto out;
}
} else if (!netif_carrier_ok(bond->dev)) {
netif_carrier_on(bond->dev);
- return 1;
+ goto out;
}
- return 0;
- }
-
- if (netif_carrier_ok(bond->dev)) {
+ } else if (netif_carrier_ok(bond->dev)) {
netif_carrier_off(bond->dev);
- return 1;
+ goto out;
}
- return 0;
+out:
+ rcu_read_unlock();
+ return ret;
}
/**
--
1.8.4
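The single-exit shape this patch introduces can be modelled in userspace. The following is a minimal sketch, not kernel code: every `model_*` name is hypothetical, and the lock functions only count nesting depth so that a test can confirm each path leaves the read-side section. Unlike the patch, which defaults `ret` to 1, the sketch sets the return value only where the carrier state actually changes; that is a choice made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for rcu_read_lock()/rcu_read_unlock(): they only
 * track nesting depth, they provide no real protection. */
static int model_rcu_depth;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

struct model_agg {
	int num_of_ports;
};

struct model_bond {
	struct model_agg *active;	/* NULL when there is no active aggregator */
	bool has_slave;
	bool carrier_ok;
	int min_links;
};

/* Returns 1 if the carrier state changed, 0 otherwise. Every path leaves
 * through "out", so the aggregator is never touched after the unlock. */
static int model_set_carrier(struct model_bond *bond)
{
	int ret = 0;

	model_rcu_read_lock();
	if (!bond->has_slave)
		goto out;

	if (bond->active && bond->active->num_of_ports >= bond->min_links) {
		/* enough ports: carrier should be up */
		if (!bond->carrier_ok) {
			bond->carrier_ok = true;
			ret = 1;
		}
	} else if (bond->carrier_ok) {
		/* no active aggregator, or too few ports: carrier down */
		bond->carrier_ok = false;
		ret = 1;
	}
out:
	model_rcu_read_unlock();
	return ret;
}
```

Calling `model_set_carrier()` on a bond with and without a slave and then checking that `model_rcu_depth` is back to 0 exercises both the early-exit and the normal path.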
* Re: [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() RCU usage
2014-01-10 9:18 ` [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() " Veaceslav Falico
@ 2014-01-10 10:34 ` Ding Tianhong
0 siblings, 0 replies; 8+ messages in thread
From: Ding Tianhong @ 2014-01-10 10:34 UTC
To: Veaceslav Falico, netdev; +Cc: Jay Vosburgh, Andy Gospodarek
On 2014/1/10 17:18, Veaceslav Falico wrote:
> Currently, the RCU usage in bond_3ad_set_carrier() is plainly wrong. It first
> gets a slave under RCU and then, after releasing the RCU lock, continues to
> use it - while it can be freed.
>
> Fix this by ensuring that bond_3ad_set_carrier() holds the RCU read lock for
> as long as it uses its slave (or its agg).
>
> Fixes: be79bd048ab ("bonding: add RCU for bond_3ad_state_machine_handler()")
> CC: dingtianhong@huawei.com
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
>
> Notes:
> v2 -> v3:
> Just wrap RCU for the whole usage of our slave.
>
> v1 -> v2:
> Don't use _rcu primitives as we can be called under RTNL too.
>
> drivers/net/bonding/bond_3ad.c | 23 ++++++++++++-----------
> 1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index 29db1ca..9ff55eb 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -2327,32 +2327,33 @@ int bond_3ad_set_carrier(struct bonding *bond)
> {
> struct aggregator *active;
> struct slave *first_slave;
> + int ret = 1;
>
> rcu_read_lock();
> first_slave = bond_first_slave_rcu(bond);
> - rcu_read_unlock();
> - if (!first_slave)
> - return 0;
> + if (!first_slave) {
> + ret = 0;
> + goto out;
> + }
> active = __get_active_agg(&(SLAVE_AD_INFO(first_slave).aggregator));
> if (active) {
> /* are enough slaves available to consider link up? */
> if (active->num_of_ports < bond->params.min_links) {
> if (netif_carrier_ok(bond->dev)) {
> netif_carrier_off(bond->dev);
> - return 1;
> + goto out;
> }
> } else if (!netif_carrier_ok(bond->dev)) {
> netif_carrier_on(bond->dev);
> - return 1;
> + goto out;
> }
> - return 0;
> - }
> -
> - if (netif_carrier_ok(bond->dev)) {
> + } else if (netif_carrier_ok(bond->dev)) {
> netif_carrier_off(bond->dev);
> - return 1;
> + goto out;
no need for this line, but it is not a big issue.
Regards
Ding
> }
> - return 0;
> +out:
> + rcu_read_unlock();
> + return ret;
> }
>
> /**
>
* [PATCH v3 net-next 2/3] bonding: fix __get_first_agg RCU usage
2014-01-10 9:18 [PATCH v3 net-next 0/3] bonding: fix bond_3ad RCU usage Veaceslav Falico
2014-01-10 9:18 ` [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() " Veaceslav Falico
@ 2014-01-10 9:18 ` Veaceslav Falico
2014-01-10 10:43 ` Ding Tianhong
2014-01-10 9:18 ` [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic Veaceslav Falico
2 siblings, 1 reply; 8+ messages in thread
From: Veaceslav Falico @ 2014-01-10 9:18 UTC
To: netdev; +Cc: Veaceslav Falico, dingtianhong, Jay Vosburgh, Andy Gospodarek
Currently, the RCU read lock usage is just wrong - it gets the slave struct
under RCU and continues to use it after the RCU lock is released.
However, it's still safe to do this because we never needed the
rcu_read_lock() in the first place - all of the __get_first_agg() callers hold
either the RCU read lock or the RTNL lock, so the slave can't go away while we
are in it.
So, remove the useless RCU locking and add a comment.
Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
Notes:
v2 -> v3:
Use the rcu primitives.
v1 -> v2:
Don't use RCU primitives as we can hold RTNL.
drivers/net/bonding/bond_3ad.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 9ff55eb..27dac0e 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -143,6 +143,7 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
*
* Return the aggregator of the first slave in @bond, or %NULL if it can't be
* found.
+ * The caller must hold RCU lock.
*/
static inline struct aggregator *__get_first_agg(struct port *port)
{
@@ -153,9 +154,7 @@ static inline struct aggregator *__get_first_agg(struct port *port)
if (bond == NULL)
return NULL;
- rcu_read_lock();
first_slave = bond_first_slave_rcu(bond);
- rcu_read_unlock();
return first_slave ? &(SLAVE_AD_INFO(first_slave).aggregator) : NULL;
}
--
1.8.4
* Re: [PATCH v3 net-next 2/3] bonding: fix __get_first_agg RCU usage
2014-01-10 9:18 ` [PATCH v3 net-next 2/3] bonding: fix __get_first_agg " Veaceslav Falico
@ 2014-01-10 10:43 ` Ding Tianhong
2014-01-10 10:53 ` Veaceslav Falico
0 siblings, 1 reply; 8+ messages in thread
From: Ding Tianhong @ 2014-01-10 10:43 UTC
To: Veaceslav Falico, netdev; +Cc: Jay Vosburgh, Andy Gospodarek
On 2014/1/10 17:18, Veaceslav Falico wrote:
> Currently, the RCU read lock usage is just wrong - it gets the slave struct
> under RCU and continues to use it after the RCU lock is released.
>
> However, it's still safe to do this because we never needed the
> rcu_read_lock() in the first place - all of the __get_first_agg() callers hold
> either the RCU read lock or the RTNL lock, so the slave can't go away while we
> are in it.
>
> So, remove the useless RCU locking and add a comment.
>
> Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
> CC: dingtianhong@huawei.com
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
>
> Notes:
> v2 -> v3:
> Use the rcu primitives.
>
> v1 -> v2:
> Don't use RCU primitives as we can hold RTNL.
>
> drivers/net/bonding/bond_3ad.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index 9ff55eb..27dac0e 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -143,6 +143,7 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
> *
> * Return the aggregator of the first slave in @bond, or %NULL if it can't be
> * found.
> + * The caller must hold RCU lock.
> */
> static inline struct aggregator *__get_first_agg(struct port *port)
> {
> @@ -153,9 +154,7 @@ static inline struct aggregator *__get_first_agg(struct port *port)
> if (bond == NULL)
> return NULL;
>
> - rcu_read_lock();
> first_slave = bond_first_slave_rcu(bond);
> - rcu_read_unlock();
>
I am afraid the lockdep check will trigger a warning on this path:
bond_3ad_unbind_slave -> __get_first_agg -> bond_first_slave_rcu -> netdev_lower_get_first_private_rcu -> list_first_or_null_rcu
because bond_3ad_unbind_slave() is not called under RCU.
Regards
Ding
> return first_slave ? &(SLAVE_AD_INFO(first_slave).aggregator) : NULL;
> }
>
* Re: [PATCH v3 net-next 2/3] bonding: fix __get_first_agg RCU usage
2014-01-10 10:43 ` Ding Tianhong
@ 2014-01-10 10:53 ` Veaceslav Falico
0 siblings, 0 replies; 8+ messages in thread
From: Veaceslav Falico @ 2014-01-10 10:53 UTC
To: Ding Tianhong; +Cc: netdev, Jay Vosburgh, Andy Gospodarek
On Fri, Jan 10, 2014 at 06:43:41PM +0800, Ding Tianhong wrote:
>On 2014/1/10 17:18, Veaceslav Falico wrote:
>> - rcu_read_lock();
>> first_slave = bond_first_slave_rcu(bond);
>> - rcu_read_unlock();
>>
>I am afraid the lockdep check will trigger a warning on this path:
>bond_3ad_unbind_slave -> __get_first_agg -> bond_first_slave_rcu -> netdev_lower_get_first_private_rcu -> list_first_or_null_rcu
>
>because bond_3ad_unbind_slave() is not called under RCU.
Yep, right. I keep colliding with my next patchset, which removes it
completely, so it doesn't whine there.
Will resend.
>
>Regards
>Ding
>> return first_slave ? &(SLAVE_AD_INFO(first_slave).aggregator) : NULL;
>> }
>>
>
>
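The lockdep concern raised in this sub-thread can be illustrated with a small userspace model. This is only a sketch under stated assumptions: all `model_*` names are hypothetical, the "warning" is a plain counter rather than a real lockdep splat, and the unbind path is assumed to hold RTNL but not the RCU read lock, as the commit message implies. The kernel has its own facilities for the relaxed check (for example `rcu_dereference_rtnl()`); the sketch does not claim that the resend uses them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static int model_rcu_depth;	/* > 0 while inside a modelled RCU read section */
static bool model_rtnl_held;	/* true while the modelled RTNL is held */
static int model_warnings;	/* number of modelled lockdep complaints */

struct model_slave {
	int id;
};

struct model_bond {
	struct model_slave *first;
};

/* Strict accessor: satisfied only by an RCU read-side section, which is
 * the kind of check the _rcu list helpers perform under lockdep. */
static struct model_slave *model_first_slave_rcu(struct model_bond *bond)
{
	if (model_rcu_depth == 0)
		model_warnings++;
	return bond->first;
}

/* Relaxed accessor: accepts either an RCU read-side section or RTNL. */
static struct model_slave *model_first_slave_rcu_or_rtnl(struct model_bond *bond)
{
	if (model_rcu_depth == 0 && !model_rtnl_held)
		model_warnings++;
	return bond->first;
}
```

With only `model_rtnl_held` set, the strict accessor bumps `model_warnings` while the relaxed one stays quiet, which mirrors why a caller that holds RTNL alone would trip the check even though the access is safe.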
* [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic
2014-01-10 9:18 [PATCH v3 net-next 0/3] bonding: fix bond_3ad RCU usage Veaceslav Falico
2014-01-10 9:18 ` [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() " Veaceslav Falico
2014-01-10 9:18 ` [PATCH v3 net-next 2/3] bonding: fix __get_first_agg " Veaceslav Falico
@ 2014-01-10 9:18 ` Veaceslav Falico
2014-01-10 10:48 ` Ding Tianhong
2 siblings, 1 reply; 8+ messages in thread
From: Veaceslav Falico @ 2014-01-10 9:18 UTC
To: netdev; +Cc: Veaceslav Falico, dingtianhong, Jay Vosburgh, Andy Gospodarek
Currently, the implementation is meaningless - once again, we take the
slave structure and use it after we've exited RCU critical section.
Fix this by removing the rcu_read_lock() from __get_active_agg(), and
ensuring that all its callers are holding RCU.
Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---
Notes:
v2 -> v3:
Use the RCU primitives.
v1 -> v2:
Don't use RCU primitives as we can hold RTNL.
drivers/net/bonding/bond_3ad.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 27dac0e..112afa8 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -674,6 +674,8 @@ static u32 __get_agg_bandwidth(struct aggregator *aggregator)
/**
* __get_active_agg - get the current active aggregator
* @aggregator: the aggregator we're looking at
+ *
+ * Caller must hold RCU lock.
*/
static struct aggregator *__get_active_agg(struct aggregator *aggregator)
{
@@ -681,13 +683,9 @@ static struct aggregator *__get_active_agg(struct aggregator *aggregator)
struct list_head *iter;
struct slave *slave;
- rcu_read_lock();
bond_for_each_slave_rcu(bond, slave, iter)
- if (SLAVE_AD_INFO(slave).aggregator.is_active) {
- rcu_read_unlock();
+ if (SLAVE_AD_INFO(slave).aggregator.is_active)
return &(SLAVE_AD_INFO(slave).aggregator);
- }
- rcu_read_unlock();
return NULL;
}
@@ -1495,11 +1493,11 @@ static void ad_agg_selection_logic(struct aggregator *agg)
struct slave *slave;
struct port *port;
+ rcu_read_lock();
origin = agg;
active = __get_active_agg(agg);
best = (active && agg_device_up(active)) ? active : NULL;
- rcu_read_lock();
bond_for_each_slave_rcu(bond, slave, iter) {
agg = &(SLAVE_AD_INFO(slave).aggregator);
--
1.8.4
* Re: [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic
2014-01-10 9:18 ` [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic Veaceslav Falico
@ 2014-01-10 10:48 ` Ding Tianhong
0 siblings, 0 replies; 8+ messages in thread
From: Ding Tianhong @ 2014-01-10 10:48 UTC
To: Veaceslav Falico, netdev; +Cc: Jay Vosburgh, Andy Gospodarek
On 2014/1/10 17:18, Veaceslav Falico wrote:
> Currently, the implementation is meaningless - once again, we take the
> slave structure and use it after we've exited RCU critical section.
>
> Fix this by removing the rcu_read_lock() from __get_active_agg(), and
> ensuring that all its callers are holding RCU.
>
> Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
> CC: dingtianhong@huawei.com
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
>
> Notes:
> v2 -> v3:
> Use the RCU primitives.
>
> v1 -> v2:
> Don't use RCU primitives as we can hold RTNL.
>
> drivers/net/bonding/bond_3ad.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index 27dac0e..112afa8 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -674,6 +674,8 @@ static u32 __get_agg_bandwidth(struct aggregator *aggregator)
> /**
> * __get_active_agg - get the current active aggregator
> * @aggregator: the aggregator we're looking at
> + *
> + * Caller must hold RCU lock.
> */
> static struct aggregator *__get_active_agg(struct aggregator *aggregator)
> {
> @@ -681,13 +683,9 @@ static struct aggregator *__get_active_agg(struct aggregator *aggregator)
> struct list_head *iter;
> struct slave *slave;
>
> - rcu_read_lock();
> bond_for_each_slave_rcu(bond, slave, iter)
> - if (SLAVE_AD_INFO(slave).aggregator.is_active) {
> - rcu_read_unlock();
> + if (SLAVE_AD_INFO(slave).aggregator.is_active)
> return &(SLAVE_AD_INFO(slave).aggregator);
> - }
> - rcu_read_unlock();
>
> return NULL;
> }
> @@ -1495,11 +1493,11 @@ static void ad_agg_selection_logic(struct aggregator *agg)
> struct slave *slave;
> struct port *port;
>
> + rcu_read_lock();
> origin = agg;
> active = __get_active_agg(agg);
> best = (active && agg_device_up(active)) ? active : NULL;
>
> - rcu_read_lock();
> bond_for_each_slave_rcu(bond, slave, iter) {
> agg = &(SLAVE_AD_INFO(slave).aggregator);
>
>
Great.
Acked-by: Ding Tianhong <dingtianhong@huawei.com>
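
The contract this patch settles on (the lookup helper takes no lock itself, and the caller keeps one read-side section open across both the lookup and the use of the result) can be sketched in userspace as follows. This is an illustrative model, not the bonding code: the `model_*` names are hypothetical, the lock functions only count nesting, and an array stands in for the slave list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the RCU read lock; they only count nesting. */
static int model_rcu_depth;

static void model_rcu_read_lock(void)
{
	model_rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_rcu_depth > 0);
	model_rcu_depth--;
}

struct model_agg {
	bool is_active;
	int bandwidth;
};

/* Caller must be inside a modelled read-side section. The returned pointer
 * is only meant to be used until the caller unlocks. */
static struct model_agg *model_get_active_agg(struct model_agg *aggs, size_t n)
{
	size_t i;

	assert(model_rcu_depth > 0);
	for (i = 0; i < n; i++)
		if (aggs[i].is_active)
			return &aggs[i];
	return NULL;
}

/* One read-side section covers the lookup and the dereference. */
static int model_active_bandwidth(struct model_agg *aggs, size_t n)
{
	struct model_agg *active;
	int bw = 0;

	model_rcu_read_lock();
	active = model_get_active_agg(aggs, n);
	if (active)
		bw = active->bandwidth;
	model_rcu_read_unlock();
	return bw;
}
```

Had the helper taken and dropped the lock internally, the dereference in `model_active_bandwidth()` would happen outside any section, which is the pattern the patch removes.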