netdev.vger.kernel.org archive mirror
* [PATCH v3 net-next 0/3] bonding: fix bond_3ad RCU usage
@ 2014-01-10  9:18 Veaceslav Falico
  2014-01-10  9:18 ` [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() " Veaceslav Falico
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Veaceslav Falico @ 2014-01-10  9:18 UTC
  To: netdev; +Cc: dingtianhong, Jay Vosburgh, Andy Gospodarek, Veaceslav Falico

While digging through bond_3ad.c I've found that the RCU usage there is
just wrong - it's used as a kind of mutex/spinlock instead of RCU.

v2->v3: make bond_3ad_set_carrier() use RCU read lock for the whole
function, so that all other functions will be protected by RCU as well.
This way we can use _rcu variants everywhere.

v1->v2: use generic primitives instead of the _rcu ones, because we can
hold the RTNL lock without the RCU one, which is still safe.

This patchset is on top of bond_3ad.c cleanup:
http://www.spinics.net/lists/netdev/msg265447.html

CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: netdev@vger.kernel.org
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>


* [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() RCU usage
  2014-01-10  9:18 [PATCH v3 net-next 0/3] bonding: fix bond_3ad RCU usage Veaceslav Falico
@ 2014-01-10  9:18 ` Veaceslav Falico
  2014-01-10 10:34   ` Ding Tianhong
  2014-01-10  9:18 ` [PATCH v3 net-next 2/3] bonding: fix __get_first_agg " Veaceslav Falico
  2014-01-10  9:18 ` [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic Veaceslav Falico
  2 siblings, 1 reply; 8+ messages in thread
From: Veaceslav Falico @ 2014-01-10  9:18 UTC
  To: netdev; +Cc: Veaceslav Falico, dingtianhong, Jay Vosburgh, Andy Gospodarek

Currently, the RCU usage in bond_3ad_set_carrier() is plainly wrong. It
first gets a slave under RCU and then, after releasing the RCU read lock,
continues to use it, even though the slave can already be freed.

Fix this by ensuring that bond_3ad_set_carrier() holds the RCU read lock
for as long as it uses the slave (or its aggregator).
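
The lock-scope change described above can be modelled in userspace. This is
only a minimal sketch, not the kernel code: every toy_* name is
hypothetical, the lock helpers merely count nesting depth, and the return
value is simplified to "are there enough ports" instead of "did the carrier
change".

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for rcu_read_lock()/rcu_read_unlock(): depth only. */
static int toy_rcu_depth;
static int toy_unprotected_uses;

static void toy_rcu_read_lock(void)   { toy_rcu_depth++; }
static void toy_rcu_read_unlock(void) { toy_rcu_depth--; }

struct toy_slave { int num_of_ports; };

/* Every use of the slave goes through here so it can be audited. */
static int toy_ports(const struct toy_slave *s)
{
	if (toy_rcu_depth == 0)
		toy_unprotected_uses++;	/* models a potential use-after-free */
	return s->num_of_ports;
}

/* Pre-patch shape: the lock is dropped before the pointer is used. */
static int toy_link_up_before(const struct toy_slave *first, int min_links)
{
	toy_rcu_read_lock();
	/* the slave lookup happens here in the real code */
	toy_rcu_read_unlock();
	if (!first)
		return 0;
	return toy_ports(first) >= min_links;
}

/* Post-patch shape: one exit label, lock held across every use. */
static int toy_link_up_after(const struct toy_slave *first, int min_links)
{
	int up = 0;

	toy_rcu_read_lock();
	if (!first)
		goto out;
	up = toy_ports(first) >= min_links;
out:
	toy_rcu_read_unlock();
	return up;
}
```

Both variants compute the same answer; only the "after" variant touches
the slave while the read-side depth is non-zero, and the single exit label
keeps lock and unlock balanced on every path.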

Fixes: be79bd048ab ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    v2 -> v3:
    Just wrap RCU for the whole usage of our slave.
    
    v1 -> v2:
    Don't use _rcu primitives as we can be called under RTNL too.

 drivers/net/bonding/bond_3ad.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 29db1ca..9ff55eb 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2327,32 +2327,33 @@ int bond_3ad_set_carrier(struct bonding *bond)
 {
 	struct aggregator *active;
 	struct slave *first_slave;
+	int ret = 1;
 
 	rcu_read_lock();
 	first_slave = bond_first_slave_rcu(bond);
-	rcu_read_unlock();
-	if (!first_slave)
-		return 0;
+	if (!first_slave) {
+		ret = 0;
+		goto out;
+	}
 	active = __get_active_agg(&(SLAVE_AD_INFO(first_slave).aggregator));
 	if (active) {
 		/* are enough slaves available to consider link up? */
 		if (active->num_of_ports < bond->params.min_links) {
 			if (netif_carrier_ok(bond->dev)) {
 				netif_carrier_off(bond->dev);
-				return 1;
+				goto out;
 			}
 		} else if (!netif_carrier_ok(bond->dev)) {
 			netif_carrier_on(bond->dev);
-			return 1;
+			goto out;
 		}
-		return 0;
-	}
-
-	if (netif_carrier_ok(bond->dev)) {
+	} else if (netif_carrier_ok(bond->dev)) {
 		netif_carrier_off(bond->dev);
-		return 1;
+		goto out;
 	}
-	return 0;
+out:
+	rcu_read_unlock();
+	return ret;
 }
 
 /**
-- 
1.8.4
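
Reading the hunk above as posted, ret starts at 1 and is cleared only on
the no-slave path, so the paths that used to end in "return 0" appear to
return 1 after the change. The userspace model below is a sketch for
comparing the two return-value flows; struct toy_bond and both functions
are hypothetical and flatten the real state into plain fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flattened inputs; not the real struct bonding. */
struct toy_bond {
	bool has_slave;
	bool has_active;
	int num_of_ports;
	int min_links;
	bool carrier_ok;
};

/* Pre-patch return-value flow: 1 only when the carrier is flipped. */
static int toy_set_carrier_old(struct toy_bond *b)
{
	if (!b->has_slave)
		return 0;
	if (b->has_active) {
		if (b->num_of_ports < b->min_links) {
			if (b->carrier_ok) {
				b->carrier_ok = false;
				return 1;
			}
		} else if (!b->carrier_ok) {
			b->carrier_ok = true;
			return 1;
		}
		return 0;
	}
	if (b->carrier_ok) {
		b->carrier_ok = false;
		return 1;
	}
	return 0;
}

/* Flow of the hunk as posted: ret is cleared only when there is no slave. */
static int toy_set_carrier_new(struct toy_bond *b)
{
	int ret = 1;

	if (!b->has_slave) {
		ret = 0;
		goto out;
	}
	if (b->has_active) {
		if (b->num_of_ports < b->min_links) {
			if (b->carrier_ok) {
				b->carrier_ok = false;
				goto out;
			}
		} else if (!b->carrier_ok) {
			b->carrier_ok = true;
			goto out;
		}
	} else if (b->carrier_ok) {
		b->carrier_ok = false;
		goto out;
	}
out:
	return ret;
}
```

In this model the carrier state ends up identical in both variants; only
the reported return value differs when nothing changes, for example with
an active aggregator, enough ports and the carrier already on.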


* [PATCH v3 net-next 2/3] bonding: fix __get_first_agg RCU usage
  2014-01-10  9:18 [PATCH v3 net-next 0/3] bonding: fix bond_3ad RCU usage Veaceslav Falico
  2014-01-10  9:18 ` [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() " Veaceslav Falico
@ 2014-01-10  9:18 ` Veaceslav Falico
  2014-01-10 10:43   ` Ding Tianhong
  2014-01-10  9:18 ` [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic Veaceslav Falico
  2 siblings, 1 reply; 8+ messages in thread
From: Veaceslav Falico @ 2014-01-10  9:18 UTC
  To: netdev; +Cc: Veaceslav Falico, dingtianhong, Jay Vosburgh, Andy Gospodarek

Currently, the RCU read lock usage is just wrong - it gets the slave struct
under RCU and continues to use it after the RCU read lock is released.

However, it's still safe to do this because we didn't need the
rcu_read_lock() in the first place - all of the __get_first_agg() callers
hold either the RCU read lock or the RTNL lock, so the slave can't be freed
while we're in it.

So, remove the useless rcu locking and add a comment.
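
The "caller must hold the lock" contract can be sketched as a helper that
takes no lock itself and only checks the caller's context. This is a
userspace model under stated assumptions: the toy_* flags and functions are
hypothetical stand-ins, and the warning counter merely imitates what a
lockdep-style check would report. In the kernel, conditions of this kind
are usually expressed with rcu_dereference_check() and
lockdep_rtnl_is_held().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock state; the kernel would query lockdep instead. */
static bool toy_rcu_held;
static bool toy_rtnl_held;
static int toy_warnings;

struct toy_agg { int id; };
struct toy_slave { struct toy_agg aggregator; };

/* Callee takes no lock; it relies on (and audits) the caller's context. */
static struct toy_agg *toy_get_first_agg(struct toy_slave *first)
{
	if (!toy_rcu_held && !toy_rtnl_held)
		toy_warnings++;	/* models a lockdep complaint */
	return first ? &first->aggregator : NULL;
}
```

A caller that holds neither lock still gets a result, but the model counts
a warning, which is the situation a review of the callers has to rule out.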

Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    v2 -> v3:
    Use the rcu primitives.
    
    v1 -> v2:
    Don't use RCU primitives as we can hold RTNL.

 drivers/net/bonding/bond_3ad.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 9ff55eb..27dac0e 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -143,6 +143,7 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
  *
  * Return the aggregator of the first slave in @bond, or %NULL if it can't be
  * found.
+ * The caller must hold RCU lock.
  */
 static inline struct aggregator *__get_first_agg(struct port *port)
 {
@@ -153,9 +154,7 @@ static inline struct aggregator *__get_first_agg(struct port *port)
 	if (bond == NULL)
 		return NULL;
 
-	rcu_read_lock();
 	first_slave = bond_first_slave_rcu(bond);
-	rcu_read_unlock();
 
 	return first_slave ? &(SLAVE_AD_INFO(first_slave).aggregator) : NULL;
 }
-- 
1.8.4


* [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic
  2014-01-10  9:18 [PATCH v3 net-next 0/3] bonding: fix bond_3ad RCU usage Veaceslav Falico
  2014-01-10  9:18 ` [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() " Veaceslav Falico
  2014-01-10  9:18 ` [PATCH v3 net-next 2/3] bonding: fix __get_first_agg " Veaceslav Falico
@ 2014-01-10  9:18 ` Veaceslav Falico
  2014-01-10 10:48   ` Ding Tianhong
  2 siblings, 1 reply; 8+ messages in thread
From: Veaceslav Falico @ 2014-01-10  9:18 UTC
  To: netdev; +Cc: Veaceslav Falico, dingtianhong, Jay Vosburgh, Andy Gospodarek

Currently, the implementation is meaningless - once again, we take the
slave structure and use it after we've exited RCU critical section.

Fix this by removing the rcu_read_lock() from __get_active_agg(), and
ensuring that all its callers are holding RCU.

Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
CC: dingtianhong@huawei.com
CC: Jay Vosburgh <fubar@us.ibm.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
---

Notes:
    v2 -> v3:
    Use the RCU primitives.
    
    v1 -> v2:
    Don't use RCU primitives as we can hold RTNL.

 drivers/net/bonding/bond_3ad.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 27dac0e..112afa8 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -674,6 +674,8 @@ static u32 __get_agg_bandwidth(struct aggregator *aggregator)
 /**
  * __get_active_agg - get the current active aggregator
  * @aggregator: the aggregator we're looking at
+ *
+ * Caller must hold RCU lock.
  */
 static struct aggregator *__get_active_agg(struct aggregator *aggregator)
 {
@@ -681,13 +683,9 @@ static struct aggregator *__get_active_agg(struct aggregator *aggregator)
 	struct list_head *iter;
 	struct slave *slave;
 
-	rcu_read_lock();
 	bond_for_each_slave_rcu(bond, slave, iter)
-		if (SLAVE_AD_INFO(slave).aggregator.is_active) {
-			rcu_read_unlock();
+		if (SLAVE_AD_INFO(slave).aggregator.is_active)
 			return &(SLAVE_AD_INFO(slave).aggregator);
-		}
-	rcu_read_unlock();
 
 	return NULL;
 }
@@ -1495,11 +1493,11 @@ static void ad_agg_selection_logic(struct aggregator *agg)
 	struct slave *slave;
 	struct port *port;
 
+	rcu_read_lock();
 	origin = agg;
 	active = __get_active_agg(agg);
 	best = (active && agg_device_up(active)) ? active : NULL;
 
-	rcu_read_lock();
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		agg = &(SLAVE_AD_INFO(slave).aggregator);
 
-- 
1.8.4


* Re: [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() RCU usage
  2014-01-10  9:18 ` [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() " Veaceslav Falico
@ 2014-01-10 10:34   ` Ding Tianhong
  0 siblings, 0 replies; 8+ messages in thread
From: Ding Tianhong @ 2014-01-10 10:34 UTC
  To: Veaceslav Falico, netdev; +Cc: Jay Vosburgh, Andy Gospodarek

On 2014/1/10 17:18, Veaceslav Falico wrote:
> Currently, the RCU usage in bond_3ad_set_carrier() is plainly wrong. It
> first gets a slave under RCU and then, after releasing the RCU read lock,
> continues to use it, even though the slave can already be freed.
> 
> Fix this by ensuring that bond_3ad_set_carrier() holds the RCU read lock
> for as long as it uses the slave (or its aggregator).
> 
> Fixes: be79bd048ab ("bonding: add RCU for bond_3ad_state_machine_handler()")
> CC: dingtianhong@huawei.com
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
> 
> Notes:
>     v2 -> v3:
>     Just wrap RCU for the whole usage of our slave.
>     
>     v1 -> v2:
>     Don't use _rcu primitives as we can be called under RTNL too.
> 
>  drivers/net/bonding/bond_3ad.c | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index 29db1ca..9ff55eb 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -2327,32 +2327,33 @@ int bond_3ad_set_carrier(struct bonding *bond)
>  {
>  	struct aggregator *active;
>  	struct slave *first_slave;
> +	int ret = 1;
>  
>  	rcu_read_lock();
>  	first_slave = bond_first_slave_rcu(bond);
> -	rcu_read_unlock();
> -	if (!first_slave)
> -		return 0;
> +	if (!first_slave) {
> +		ret = 0;
> +		goto out;
> +	}
>  	active = __get_active_agg(&(SLAVE_AD_INFO(first_slave).aggregator));
>  	if (active) {
>  		/* are enough slaves available to consider link up? */
>  		if (active->num_of_ports < bond->params.min_links) {
>  			if (netif_carrier_ok(bond->dev)) {
>  				netif_carrier_off(bond->dev);
> -				return 1;
> +				goto out;
>  			}
>  		} else if (!netif_carrier_ok(bond->dev)) {
>  			netif_carrier_on(bond->dev);
> -			return 1;
> +			goto out;
>  		}
> -		return 0;
> -	}
> -
> -	if (netif_carrier_ok(bond->dev)) {
> +	} else if (netif_carrier_ok(bond->dev)) {
>  		netif_carrier_off(bond->dev);
> -		return 1;
> +		goto out;
no need for this line, but it is not a big issue.

Regards
Ding

>  	}
> -	return 0;
> +out:
> +	rcu_read_unlock();
> +	return ret;
>  }
>  
>  /**
> 


* Re: [PATCH v3 net-next 2/3] bonding: fix __get_first_agg RCU usage
  2014-01-10  9:18 ` [PATCH v3 net-next 2/3] bonding: fix __get_first_agg " Veaceslav Falico
@ 2014-01-10 10:43   ` Ding Tianhong
  2014-01-10 10:53     ` Veaceslav Falico
  0 siblings, 1 reply; 8+ messages in thread
From: Ding Tianhong @ 2014-01-10 10:43 UTC
  To: Veaceslav Falico, netdev; +Cc: Jay Vosburgh, Andy Gospodarek

On 2014/1/10 17:18, Veaceslav Falico wrote:
> Currently, the RCU read lock usage is just wrong - it gets the slave struct
> under RCU and continues to use it after the RCU read lock is released.
> 
> However, it's still safe to do this because we didn't need the
> rcu_read_lock() in the first place - all of the __get_first_agg() callers
> hold either the RCU read lock or the RTNL lock, so the slave can't be freed
> while we're in it.
> 
> So, remove the useless rcu locking and add a comment.
> 
> Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
> CC: dingtianhong@huawei.com
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
> 
> Notes:
>     v2 -> v3:
>     Use the rcu primitives.
>     
>     v1 -> v2:
>     Don't use RCU primitives as we can hold RTNL.
> 
>  drivers/net/bonding/bond_3ad.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index 9ff55eb..27dac0e 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -143,6 +143,7 @@ static inline struct bonding *__get_bond_by_port(struct port *port)
>   *
>   * Return the aggregator of the first slave in @bond, or %NULL if it can't be
>   * found.
> + * The caller must hold RCU lock.
>   */
>  static inline struct aggregator *__get_first_agg(struct port *port)
>  {
> @@ -153,9 +154,7 @@ static inline struct aggregator *__get_first_agg(struct port *port)
>  	if (bond == NULL)
>  		return NULL;
>  
> -	rcu_read_lock();
>  	first_slave = bond_first_slave_rcu(bond);
> -	rcu_read_unlock();
>  
I am afraid the lockdep check will trigger a warning:
bond_3ad_unbind_slave -> __get_first_agg -> bond_first_slave_rcu -> netdev_lower_get_first_private_rcu -> list_first_or_null_rcu

but bond_3ad_unbind_slave() does not run under the RCU read lock.

Regards
Ding
>  	return first_slave ? &(SLAVE_AD_INFO(first_slave).aggregator) : NULL;
>  }
> 


* Re: [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic
  2014-01-10  9:18 ` [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic Veaceslav Falico
@ 2014-01-10 10:48   ` Ding Tianhong
  0 siblings, 0 replies; 8+ messages in thread
From: Ding Tianhong @ 2014-01-10 10:48 UTC
  To: Veaceslav Falico, netdev; +Cc: Jay Vosburgh, Andy Gospodarek

On 2014/1/10 17:18, Veaceslav Falico wrote:
> Currently, the implementation is meaningless - once again, we take the
> slave structure and use it after we've exited RCU critical section.
> 
> Fix this by removing the rcu_read_lock() from __get_active_agg(), and
> ensuring that all its callers are holding RCU.
> 
> Fixes: be79bd048 ("bonding: add RCU for bond_3ad_state_machine_handler()")
> CC: dingtianhong@huawei.com
> CC: Jay Vosburgh <fubar@us.ibm.com>
> CC: Andy Gospodarek <andy@greyhouse.net>
> Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
> ---
> 
> Notes:
>     v2 -> v3:
>     Use the RCU primitives.
>     
>     v1 -> v2:
>     Don't use RCU primitives as we can hold RTNL.
> 
>  drivers/net/bonding/bond_3ad.c | 10 ++++------
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> index 27dac0e..112afa8 100644
> --- a/drivers/net/bonding/bond_3ad.c
> +++ b/drivers/net/bonding/bond_3ad.c
> @@ -674,6 +674,8 @@ static u32 __get_agg_bandwidth(struct aggregator *aggregator)
>  /**
>   * __get_active_agg - get the current active aggregator
>   * @aggregator: the aggregator we're looking at
> + *
> + * Caller must hold RCU lock.
>   */
>  static struct aggregator *__get_active_agg(struct aggregator *aggregator)
>  {
> @@ -681,13 +683,9 @@ static struct aggregator *__get_active_agg(struct aggregator *aggregator)
>  	struct list_head *iter;
>  	struct slave *slave;
>  
> -	rcu_read_lock();
>  	bond_for_each_slave_rcu(bond, slave, iter)
> -		if (SLAVE_AD_INFO(slave).aggregator.is_active) {
> -			rcu_read_unlock();
> +		if (SLAVE_AD_INFO(slave).aggregator.is_active)
>  			return &(SLAVE_AD_INFO(slave).aggregator);
> -		}
> -	rcu_read_unlock();
>  
>  	return NULL;
>  }
> @@ -1495,11 +1493,11 @@ static void ad_agg_selection_logic(struct aggregator *agg)
>  	struct slave *slave;
>  	struct port *port;
>  
> +	rcu_read_lock();
>  	origin = agg;
>  	active = __get_active_agg(agg);
>  	best = (active && agg_device_up(active)) ? active : NULL;
>  
> -	rcu_read_lock();
>  	bond_for_each_slave_rcu(bond, slave, iter) {
>  		agg = &(SLAVE_AD_INFO(slave).aggregator);
>  
> 
Great.

Acked-by: Ding Tianhong <dingtianhong@huawei.com>


* Re: [PATCH v3 net-next 2/3] bonding: fix __get_first_agg RCU usage
  2014-01-10 10:43   ` Ding Tianhong
@ 2014-01-10 10:53     ` Veaceslav Falico
  0 siblings, 0 replies; 8+ messages in thread
From: Veaceslav Falico @ 2014-01-10 10:53 UTC
  To: Ding Tianhong; +Cc: netdev, Jay Vosburgh, Andy Gospodarek

On Fri, Jan 10, 2014 at 06:43:41PM +0800, Ding Tianhong wrote:
>On 2014/1/10 17:18, Veaceslav Falico wrote:
>> -	rcu_read_lock();
>>  	first_slave = bond_first_slave_rcu(bond);
>> -	rcu_read_unlock();
>>
>I am afraid the lockdep check will trigger a warning:
>bond_3ad_unbind_slave -> __get_first_agg -> bond_first_slave_rcu -> netdev_lower_get_first_private_rcu -> list_first_or_null_rcu
>
>but bond_3ad_unbind_slave() does not run under the RCU read lock.

Yep, right, I'm always colliding with my next patchset which removes it
completely, so it doesn't whine.

Will resend.

>
>Regards
>Ding
>>  	return first_slave ? &(SLAVE_AD_INFO(first_slave).aggregator) : NULL;
>>  }
>>
>
>


Thread overview: 8+ messages
2014-01-10  9:18 [PATCH v3 net-next 0/3] bonding: fix bond_3ad RCU usage Veaceslav Falico
2014-01-10  9:18 ` [PATCH v3 net-next 1/3] bonding: fix bond_3ad_set_carrier() " Veaceslav Falico
2014-01-10 10:34   ` Ding Tianhong
2014-01-10  9:18 ` [PATCH v3 net-next 2/3] bonding: fix __get_first_agg " Veaceslav Falico
2014-01-10 10:43   ` Ding Tianhong
2014-01-10 10:53     ` Veaceslav Falico
2014-01-10  9:18 ` [PATCH v3 net-next 3/3] bonding: fix __get_active_agg() RCU logic Veaceslav Falico
2014-01-10 10:48   ` Ding Tianhong
