netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
@ 2024-11-08  5:25 Gilad Naaman
  2024-11-09 15:00 ` Vadim Fedorenko
  0 siblings, 1 reply; 9+ messages in thread
From: Gilad Naaman @ 2024-11-08  5:25 UTC (permalink / raw)
  To: Kuniyuki Iwashima, David S. Miller, David Ahern, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Simon Horman, netdev
  Cc: Gilad Naaman

struct inet6_dev already has a list of addresses owned by the device,
enabling us to traverse this much shorter list, instead of scanning
the entire hash-table.

Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
---
Changes in v2:
 - Remove double BH sections
 - Styling fixes (extra {}, extra newline)
---
 net/ipv6/addrconf.c | 38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index d0a99710d65d..c6fbd634912a 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3846,12 +3846,12 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
 {
 	unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN;
 	struct net *net = dev_net(dev);
-	struct inet6_dev *idev;
 	struct inet6_ifaddr *ifa;
 	LIST_HEAD(tmp_addr_list);
+	struct inet6_dev *idev;
 	bool keep_addr = false;
 	bool was_ready;
-	int state, i;
+	int state;
 
 	ASSERT_RTNL();
 
@@ -3890,28 +3890,24 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
 	}
 
 	/* Step 2: clear hash table */
-	for (i = 0; i < IN6_ADDR_HSIZE; i++) {
-		struct hlist_head *h = &net->ipv6.inet6_addr_lst[i];
+	read_lock_bh(&idev->lock);
+	spin_lock(&net->ipv6.addrconf_hash_lock);
 
-		spin_lock_bh(&net->ipv6.addrconf_hash_lock);
-restart:
-		hlist_for_each_entry_rcu(ifa, h, addr_lst) {
-			if (ifa->idev == idev) {
-				addrconf_del_dad_work(ifa);
-				/* combined flag + permanent flag decide if
-				 * address is retained on a down event
-				 */
-				if (!keep_addr ||
-				    !(ifa->flags & IFA_F_PERMANENT) ||
-				    addr_is_local(&ifa->addr)) {
-					hlist_del_init_rcu(&ifa->addr_lst);
-					goto restart;
-				}
-			}
-		}
-		spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
+	list_for_each_entry(ifa, &idev->addr_list, if_list) {
+		addrconf_del_dad_work(ifa);
+
+		/* combined flag + permanent flag decide if
+		 * address is retained on a down event
+		 */
+		if (!keep_addr ||
+		    !(ifa->flags & IFA_F_PERMANENT) ||
+		    addr_is_local(&ifa->addr))
+			hlist_del_init_rcu(&ifa->addr_lst);
 	}
 
+	spin_unlock(&net->ipv6.addrconf_hash_lock);
+	read_unlock_bh(&idev->lock);
+
 	write_lock_bh(&idev->lock);
 
 	addrconf_del_rs_timer(idev);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
  2024-11-08  5:25 [PATCH net-next v2] Avoid traversing addrconf hash on ifdown Gilad Naaman
@ 2024-11-09 15:00 ` Vadim Fedorenko
  2024-11-10  6:53   ` Gilad Naaman
  0 siblings, 1 reply; 9+ messages in thread
From: Vadim Fedorenko @ 2024-11-09 15:00 UTC (permalink / raw)
  To: Gilad Naaman, Kuniyuki Iwashima, David S. Miller, David Ahern,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman, netdev

On 08/11/2024 05:25, Gilad Naaman wrote:
> struct inet6_dev already has a list of addresses owned by the device,
> enabling us to traverse this much shorter list, instead of scanning
> the entire hash-table.
> 
> Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
> ---
> Changes in v2:
>   - Remove double BH sections
>   - Styling fixes (extra {}, extra newline)
> ---
>   net/ipv6/addrconf.c | 38 +++++++++++++++++---------------------
>   1 file changed, 17 insertions(+), 21 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index d0a99710d65d..c6fbd634912a 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3846,12 +3846,12 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
>   {
>   	unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN;
>   	struct net *net = dev_net(dev);
> -	struct inet6_dev *idev;
>   	struct inet6_ifaddr *ifa;
>   	LIST_HEAD(tmp_addr_list);
> +	struct inet6_dev *idev;
>   	bool keep_addr = false;
>   	bool was_ready;
> -	int state, i;
> +	int state;
>   
>   	ASSERT_RTNL();
>   
> @@ -3890,28 +3890,24 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
>   	}
>   
>   	/* Step 2: clear hash table */
> -	for (i = 0; i < IN6_ADDR_HSIZE; i++) {
> -		struct hlist_head *h = &net->ipv6.inet6_addr_lst[i];
> +	read_lock_bh(&idev->lock);
 > +	spin_lock(&net->ipv6.addrconf_hash_lock);>
> -		spin_lock_bh(&net->ipv6.addrconf_hash_lock);
> -restart:
> -		hlist_for_each_entry_rcu(ifa, h, addr_lst) {
> -			if (ifa->idev == idev) {
> -				addrconf_del_dad_work(ifa);
> -				/* combined flag + permanent flag decide if
> -				 * address is retained on a down event
> -				 */
> -				if (!keep_addr ||
> -				    !(ifa->flags & IFA_F_PERMANENT) ||
> -				    addr_is_local(&ifa->addr)) {
> -					hlist_del_init_rcu(&ifa->addr_lst);
> -					goto restart;
> -				}
> -			}
> -		}
> -		spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
> +	list_for_each_entry(ifa, &idev->addr_list, if_list) {
> +		addrconf_del_dad_work(ifa);
> +
> +		/* combined flag + permanent flag decide if
> +		 * address is retained on a down event
> +		 */
> +		if (!keep_addr ||
> +		    !(ifa->flags & IFA_F_PERMANENT) ||
> +		    addr_is_local(&ifa->addr))
> +			hlist_del_init_rcu(&ifa->addr_lst);
>   	}
>   
> +	spin_unlock(&net->ipv6.addrconf_hash_lock);
> +	read_unlock_bh(&idev->lock);

Why is this read lock needed here? spinlock addrconf_hash_lock will
block any RCU grace period to happen, so we can safely traverse
idev->addr_list with list_for_each_entry_rcu()...

> +
>   	write_lock_bh(&idev->lock);

if we are trying to protect idev->addr_list against addition, then we
have to extend write_lock scope. Otherwise it may happen that another
thread will grab write lock between read_unlock and write_lock.

Am I missing something?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
  2024-11-09 15:00 ` Vadim Fedorenko
@ 2024-11-10  6:53   ` Gilad Naaman
  2024-11-10 22:31     ` Vadim Fedorenko
  0 siblings, 1 reply; 9+ messages in thread
From: Gilad Naaman @ 2024-11-10  6:53 UTC (permalink / raw)
  To: vadim.fedorenko
  Cc: davem, dsahern, edumazet, gnaaman, horms, kuba, kuniyu, netdev,
	pabeni

> > -		spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
> > +	list_for_each_entry(ifa, &idev->addr_list, if_list) {
> > +		addrconf_del_dad_work(ifa);
> > +
> > +		/* combined flag + permanent flag decide if
> > +		 * address is retained on a down event
> > +		 */
> > +		if (!keep_addr ||
> > +		    !(ifa->flags & IFA_F_PERMANENT) ||
> > +		    addr_is_local(&ifa->addr))
> > +			hlist_del_init_rcu(&ifa->addr_lst);
> >   	}
> >   
> > +	spin_unlock(&net->ipv6.addrconf_hash_lock);
> > +	read_unlock_bh(&idev->lock);
> 
> Why is this read lock needed here? spinlock addrconf_hash_lock will
> block any RCU grace period to happen, so we can safely traverse
> idev->addr_list with list_for_each_entry_rcu()...

Oh, sorry, I didn't realize the hash lock encompasses this one;
although it seems obvious in retrospect.

> > +
> >   	write_lock_bh(&idev->lock);
> 
> if we are trying to protect idev->addr_list against addition, then we
> have to extend write_lock scope. Otherwise it may happen that another
> thread will grab write lock between read_unlock and write_lock.
> 
> Am I missing something?

I wanted to ensure that access to `idev->addr_list` is performed under lock,
the same way it is done immediately afterwards;
No particular reason not to extend the existing lock, I just didn't think
about it.

For what it's worth, the original code didn't have this protection either,
since the another thread could have grabbed the lock between
`spin_unlock_bh(&net->ipv6.addrconf_hash_lock);` of the last loop iteration,
and the `write_lock`.

Should I extend the write_lock upwards, or just leave it off?

Thank you for your time,
Gilad

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
  2024-11-10  6:53   ` Gilad Naaman
@ 2024-11-10 22:31     ` Vadim Fedorenko
  2024-11-11  5:21       ` Gilad Naaman
  0 siblings, 1 reply; 9+ messages in thread
From: Vadim Fedorenko @ 2024-11-10 22:31 UTC (permalink / raw)
  To: Gilad Naaman
  Cc: davem, dsahern, edumazet, horms, kuba, kuniyu, netdev, pabeni

On 10/11/2024 06:53, Gilad Naaman wrote:
>>> -		spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
>>> +	list_for_each_entry(ifa, &idev->addr_list, if_list) {
>>> +		addrconf_del_dad_work(ifa);
>>> +
>>> +		/* combined flag + permanent flag decide if
>>> +		 * address is retained on a down event
>>> +		 */
>>> +		if (!keep_addr ||
>>> +		    !(ifa->flags & IFA_F_PERMANENT) ||
>>> +		    addr_is_local(&ifa->addr))
>>> +			hlist_del_init_rcu(&ifa->addr_lst);
>>>    	}
>>>    
>>> +	spin_unlock(&net->ipv6.addrconf_hash_lock);
>>> +	read_unlock_bh(&idev->lock);
>>
>> Why is this read lock needed here? spinlock addrconf_hash_lock will
>> block any RCU grace period to happen, so we can safely traverse
>> idev->addr_list with list_for_each_entry_rcu()...
> 
> Oh, sorry, I didn't realize the hash lock encompasses this one;
> although it seems obvious in retrospect.
> 
>>> +
>>>    	write_lock_bh(&idev->lock);
>>
>> if we are trying to protect idev->addr_list against addition, then we
>> have to extend write_lock scope. Otherwise it may happen that another
>> thread will grab write lock between read_unlock and write_lock.
>>
>> Am I missing something?
> 
> I wanted to ensure that access to `idev->addr_list` is performed under lock,
> the same way it is done immediately afterwards;
> No particular reason not to extend the existing lock, I just didn't think
> about it.
> 
> For what it's worth, the original code didn't have this protection either,
> since the another thread could have grabbed the lock between
> `spin_unlock_bh(&net->ipv6.addrconf_hash_lock);` of the last loop iteration,
> and the `write_lock`.
> 
> Should I extend the write_lock upwards, or just leave it off?

Well, you are doing write manipulation with the list, which is protected
by read-write lock. I would expect this lock to be held in write mode. 
And you have to protect hash map at the same time. So yes, write_lock
and spin_lock altogether, I believe.

> 
> Thank you for your time,
> Gilad


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
  2024-11-10 22:31     ` Vadim Fedorenko
@ 2024-11-11  5:21       ` Gilad Naaman
  2024-11-11 12:07         ` Vadim Fedorenko
  0 siblings, 1 reply; 9+ messages in thread
From: Gilad Naaman @ 2024-11-11  5:21 UTC (permalink / raw)
  To: vadim.fedorenko
  Cc: davem, dsahern, edumazet, gnaaman, horms, kuba, kuniyu, netdev,
	pabeni

> On 10/11/2024 06:53, Gilad Naaman wrote:
> >>> -           spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
> >>> +   list_for_each_entry(ifa, &idev->addr_list, if_list) {
> >>> +           addrconf_del_dad_work(ifa);
> >>> +
> >>> +           /* combined flag + permanent flag decide if
> >>> +            * address is retained on a down event
> >>> +            */
> >>> +           if (!keep_addr ||
> >>> +               !(ifa->flags & IFA_F_PERMANENT) ||
> >>> +               addr_is_local(&ifa->addr))
> >>> +                   hlist_del_init_rcu(&ifa->addr_lst);
> >>>     }
> >>>
> >>> +   spin_unlock(&net->ipv6.addrconf_hash_lock);
> >>> +   read_unlock_bh(&idev->lock);
> >>
> >> Why is this read lock needed here? spinlock addrconf_hash_lock will
> >> block any RCU grace period to happen, so we can safely traverse
> >> idev->addr_list with list_for_each_entry_rcu()...
> >
> > Oh, sorry, I didn't realize the hash lock encompasses this one;
> > although it seems obvious in retrospect.
> >
> >>> +
> >>>     write_lock_bh(&idev->lock);
> >>
> >> if we are trying to protect idev->addr_list against addition, then we
> >> have to extend write_lock scope. Otherwise it may happen that another
> >> thread will grab write lock between read_unlock and write_lock.
> >>
> >> Am I missing something?
> >
> > I wanted to ensure that access to `idev->addr_list` is performed under lock,
> > the same way it is done immediately afterwards;
> > No particular reason not to extend the existing lock, I just didn't think
> > about it.
> >
> > For what it's worth, the original code didn't have this protection either,
> > since the another thread could have grabbed the lock between
> > `spin_unlock_bh(&net->ipv6.addrconf_hash_lock);` of the last loop iteration,
> > and the `write_lock`.
> >
> > Should I extend the write_lock upwards, or just leave it off?
> 
> Well, you are doing write manipulation with the list, which is protected
> by read-write lock. I would expect this lock to be held in write mode.
> And you have to protect hash map at the same time. So yes, write_lock
> and spin_lock altogether, I believe.
> 

Note that within the changed lines, the list itself is only iterated-on,
not manipulated.
The changes are to the `addr_lst` list, which is the hashtable, not the
list this lock protects.

I'll send v3 with the write-lock extended.
Thank you!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
  2024-11-11  5:21       ` Gilad Naaman
@ 2024-11-11 12:07         ` Vadim Fedorenko
  2024-11-12 14:41           ` Paolo Abeni
  0 siblings, 1 reply; 9+ messages in thread
From: Vadim Fedorenko @ 2024-11-11 12:07 UTC (permalink / raw)
  To: Gilad Naaman, Eric Dumazet
  Cc: davem, dsahern, horms, kuba, kuniyu, netdev, pabeni

On 11/11/2024 05:21, Gilad Naaman wrote:
>> On 10/11/2024 06:53, Gilad Naaman wrote:
>>>>> -           spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
>>>>> +   list_for_each_entry(ifa, &idev->addr_list, if_list) {
>>>>> +           addrconf_del_dad_work(ifa);
>>>>> +
>>>>> +           /* combined flag + permanent flag decide if
>>>>> +            * address is retained on a down event
>>>>> +            */
>>>>> +           if (!keep_addr ||
>>>>> +               !(ifa->flags & IFA_F_PERMANENT) ||
>>>>> +               addr_is_local(&ifa->addr))
>>>>> +                   hlist_del_init_rcu(&ifa->addr_lst);
>>>>>      }
>>>>>
>>>>> +   spin_unlock(&net->ipv6.addrconf_hash_lock);
>>>>> +   read_unlock_bh(&idev->lock);
>>>>
>>>> Why is this read lock needed here? spinlock addrconf_hash_lock will
>>>> block any RCU grace period to happen, so we can safely traverse
>>>> idev->addr_list with list_for_each_entry_rcu()...
>>>
>>> Oh, sorry, I didn't realize the hash lock encompasses this one;
>>> although it seems obvious in retrospect.
>>>
>>>>> +
>>>>>      write_lock_bh(&idev->lock);
>>>>
>>>> if we are trying to protect idev->addr_list against addition, then we
>>>> have to extend write_lock scope. Otherwise it may happen that another
>>>> thread will grab write lock between read_unlock and write_lock.
>>>>
>>>> Am I missing something?
>>>
>>> I wanted to ensure that access to `idev->addr_list` is performed under lock,
>>> the same way it is done immediately afterwards;
>>> No particular reason not to extend the existing lock, I just didn't think
>>> about it.
>>>
>>> For what it's worth, the original code didn't have this protection either,
>>> since the another thread could have grabbed the lock between
>>> `spin_unlock_bh(&net->ipv6.addrconf_hash_lock);` of the last loop iteration,
>>> and the `write_lock`.
>>>
>>> Should I extend the write_lock upwards, or just leave it off?
>>
>> Well, you are doing write manipulation with the list, which is protected
>> by read-write lock. I would expect this lock to be held in write mode.
>> And you have to protect hash map at the same time. So yes, write_lock
>> and spin_lock altogether, I believe.
>>
> 
> Note that within the changed lines, the list itself is only iterated-on,
> not manipulated.
> The changes are to the `addr_lst` list, which is the hashtable, not the
> list this lock protects.
> 
> I'll send v3 with the write-lock extended.
> Thank you!

Reading it one more time, I'm not quite sure that locking hashmap
spinlock under idev->lock in write mode is a good idea... We have to
think more about it, maybe ask for another opinion. Looks like RTNL
should protect idev->addr_list from modification while idev->lock is
more about changes to idev, not only about addr_list.

@Eric could you please shed some light on the locking schema here?

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
  2024-11-11 12:07         ` Vadim Fedorenko
@ 2024-11-12 14:41           ` Paolo Abeni
  2024-11-12 16:08             ` Vadim Fedorenko
  2024-11-13  6:21             ` Gilad Naaman
  0 siblings, 2 replies; 9+ messages in thread
From: Paolo Abeni @ 2024-11-12 14:41 UTC (permalink / raw)
  To: Vadim Fedorenko, Gilad Naaman, Eric Dumazet
  Cc: davem, dsahern, horms, kuba, kuniyu, netdev

On 11/11/24 13:07, Vadim Fedorenko wrote:
> On 11/11/2024 05:21, Gilad Naaman wrote:
>>> On 10/11/2024 06:53, Gilad Naaman wrote:
>>>>>> -           spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
>>>>>> +   list_for_each_entry(ifa, &idev->addr_list, if_list) {
>>>>>> +           addrconf_del_dad_work(ifa);
>>>>>> +
>>>>>> +           /* combined flag + permanent flag decide if
>>>>>> +            * address is retained on a down event
>>>>>> +            */
>>>>>> +           if (!keep_addr ||
>>>>>> +               !(ifa->flags & IFA_F_PERMANENT) ||
>>>>>> +               addr_is_local(&ifa->addr))
>>>>>> +                   hlist_del_init_rcu(&ifa->addr_lst);
>>>>>>      }
>>>>>>
>>>>>> +   spin_unlock(&net->ipv6.addrconf_hash_lock);
>>>>>> +   read_unlock_bh(&idev->lock);
>>>>>
>>>>> Why is this read lock needed here? spinlock addrconf_hash_lock will
>>>>> block any RCU grace period to happen, so we can safely traverse
>>>>> idev->addr_list with list_for_each_entry_rcu()...
>>>>
>>>> Oh, sorry, I didn't realize the hash lock encompasses this one;
>>>> although it seems obvious in retrospect.
>>>>
>>>>>> +
>>>>>>      write_lock_bh(&idev->lock);
>>>>>
>>>>> if we are trying to protect idev->addr_list against addition, then we
>>>>> have to extend write_lock scope. Otherwise it may happen that another
>>>>> thread will grab write lock between read_unlock and write_lock.
>>>>>
>>>>> Am I missing something?
>>>>
>>>> I wanted to ensure that access to `idev->addr_list` is performed under lock,
>>>> the same way it is done immediately afterwards;
>>>> No particular reason not to extend the existing lock, I just didn't think
>>>> about it.
>>>>
>>>> For what it's worth, the original code didn't have this protection either,
>>>> since the another thread could have grabbed the lock between
>>>> `spin_unlock_bh(&net->ipv6.addrconf_hash_lock);` of the last loop iteration,
>>>> and the `write_lock`.
>>>>
>>>> Should I extend the write_lock upwards, or just leave it off?
>>>
>>> Well, you are doing write manipulation with the list, which is protected
>>> by read-write lock. I would expect this lock to be held in write mode.
>>> And you have to protect hash map at the same time. So yes, write_lock
>>> and spin_lock altogether, I believe.
>>>
>>
>> Note that within the changed lines, the list itself is only iterated-on,
>> not manipulated.
>> The changes are to the `addr_lst` list, which is the hashtable, not the
>> list this lock protects.
>>
>> I'll send v3 with the write-lock extended.
>> Thank you!
> 
> Reading it one more time, I'm not quite sure that locking hashmap
> spinlock under idev->lock in write mode is a good idea... We have to
> think more about it, maybe ask for another opinion. Looks like RTNL
> should protect idev->addr_list from modification while idev->lock is
> more about changes to idev, not only about addr_list.
> 
> @Eric could you please shed some light on the locking schema here?

AFAICS idev->addr_list is (write) protected by write_lock(idev->lock),
while net->ipv6.inet6_addr_lst is protected by
spin_lock_bh(&net->ipv6.addrconf_hash_lock).

Extending the write_lock() scope will create a lock dependency between
the hashtable lock and the list lock, which in turn could cause more
problem in the future.

Note that idev->addr_list locking looks a bit fuzzy, as is traversed in
several places under the RCU lock only. I suggest finish the conversion
of idev->addr_list to RCU and do this additional traversal under RCU, too.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
  2024-11-12 14:41           ` Paolo Abeni
@ 2024-11-12 16:08             ` Vadim Fedorenko
  2024-11-13  6:21             ` Gilad Naaman
  1 sibling, 0 replies; 9+ messages in thread
From: Vadim Fedorenko @ 2024-11-12 16:08 UTC (permalink / raw)
  To: Paolo Abeni, Gilad Naaman
  Cc: davem, dsahern, horms, kuba, kuniyu, netdev, Eric Dumazet

On 12/11/2024 14:41, Paolo Abeni wrote:
> On 11/11/24 13:07, Vadim Fedorenko wrote:
>> On 11/11/2024 05:21, Gilad Naaman wrote:
>>>> On 10/11/2024 06:53, Gilad Naaman wrote:
>>>>>>> -           spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
>>>>>>> +   list_for_each_entry(ifa, &idev->addr_list, if_list) {
>>>>>>> +           addrconf_del_dad_work(ifa);
>>>>>>> +
>>>>>>> +           /* combined flag + permanent flag decide if
>>>>>>> +            * address is retained on a down event
>>>>>>> +            */
>>>>>>> +           if (!keep_addr ||
>>>>>>> +               !(ifa->flags & IFA_F_PERMANENT) ||
>>>>>>> +               addr_is_local(&ifa->addr))
>>>>>>> +                   hlist_del_init_rcu(&ifa->addr_lst);
>>>>>>>       }
>>>>>>>
>>>>>>> +   spin_unlock(&net->ipv6.addrconf_hash_lock);
>>>>>>> +   read_unlock_bh(&idev->lock);
>>>>>>
>>>>>> Why is this read lock needed here? spinlock addrconf_hash_lock will
>>>>>> block any RCU grace period to happen, so we can safely traverse
>>>>>> idev->addr_list with list_for_each_entry_rcu()...
>>>>>
>>>>> Oh, sorry, I didn't realize the hash lock encompasses this one;
>>>>> although it seems obvious in retrospect.
>>>>>
>>>>>>> +
>>>>>>>       write_lock_bh(&idev->lock);
>>>>>>
>>>>>> if we are trying to protect idev->addr_list against addition, then we
>>>>>> have to extend write_lock scope. Otherwise it may happen that another
>>>>>> thread will grab write lock between read_unlock and write_lock.
>>>>>>
>>>>>> Am I missing something?
>>>>>
>>>>> I wanted to ensure that access to `idev->addr_list` is performed under lock,
>>>>> the same way it is done immediately afterwards;
>>>>> No particular reason not to extend the existing lock, I just didn't think
>>>>> about it.
>>>>>
>>>>> For what it's worth, the original code didn't have this protection either,
>>>>> since the another thread could have grabbed the lock between
>>>>> `spin_unlock_bh(&net->ipv6.addrconf_hash_lock);` of the last loop iteration,
>>>>> and the `write_lock`.
>>>>>
>>>>> Should I extend the write_lock upwards, or just leave it off?
>>>>
>>>> Well, you are doing write manipulation with the list, which is protected
>>>> by read-write lock. I would expect this lock to be held in write mode.
>>>> And you have to protect hash map at the same time. So yes, write_lock
>>>> and spin_lock altogether, I believe.
>>>>
>>>
>>> Note that within the changed lines, the list itself is only iterated-on,
>>> not manipulated.
>>> The changes are to the `addr_lst` list, which is the hashtable, not the
>>> list this lock protects.
>>>
>>> I'll send v3 with the write-lock extended.
>>> Thank you!
>>
>> Reading it one more time, I'm not quite sure that locking hashmap
>> spinlock under idev->lock in write mode is a good idea... We have to
>> think more about it, maybe ask for another opinion. Looks like RTNL
>> should protect idev->addr_list from modification while idev->lock is
>> more about changes to idev, not only about addr_list.
>>
>> @Eric could you please shed some light on the locking schema here?
> 
> AFAICS idev->addr_list is (write) protected by write_lock(idev->lock),
> while net->ipv6.inet6_addr_lst is protected by
> spin_lock_bh(&net->ipv6.addrconf_hash_lock).
> 
> Extending the write_lock() scope will create a lock dependency between
> the hashtable lock and the list lock, which in turn could cause more
> problem in the future.
> 
> Note that idev->addr_list locking looks a bit fuzzy, as is traversed in
> several places under the RCU lock only.

Yeah, I was confused exactly because of some places using RCU while
others still using read_lock.

> I suggest finish the conversion
> of idev->addr_list to RCU and do this additional traversal under RCU, too.

That sounds reasonable,

Thanks!

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
  2024-11-12 14:41           ` Paolo Abeni
  2024-11-12 16:08             ` Vadim Fedorenko
@ 2024-11-13  6:21             ` Gilad Naaman
  1 sibling, 0 replies; 9+ messages in thread
From: Gilad Naaman @ 2024-11-13  6:21 UTC (permalink / raw)
  To: pabeni
  Cc: davem, dsahern, edumazet, gnaaman, horms, kuba, kuniyu, netdev,
	vadim.fedorenko

>On 11/11/24 13:07, Vadim Fedorenko wrote:
>> On 11/11/2024 05:21, Gilad Naaman wrote:
>>>> On 10/11/2024 06:53, Gilad Naaman wrote:
>>>>>>> -           spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
>>>>>>> +   list_for_each_entry(ifa, &idev->addr_list, if_list) {
>>>>>>> +           addrconf_del_dad_work(ifa);
>>>>>>> +
>>>>>>> +           /* combined flag + permanent flag decide if
>>>>>>> +            * address is retained on a down event
>>>>>>> +            */
>>>>>>> +           if (!keep_addr ||
>>>>>>> +               !(ifa->flags & IFA_F_PERMANENT) ||
>>>>>>> +               addr_is_local(&ifa->addr))
>>>>>>> +                   hlist_del_init_rcu(&ifa->addr_lst);
>>>>>>>      }
>>>>>>>
>>>>>>> +   spin_unlock(&net->ipv6.addrconf_hash_lock);
>>>>>>> +   read_unlock_bh(&idev->lock);
>>>>>>
>>>>>> Why is this read lock needed here? spinlock addrconf_hash_lock will
>>>>>> block any RCU grace period to happen, so we can safely traverse
>>>>>> idev->addr_list with list_for_each_entry_rcu()...
>>>>>
>>>>> Oh, sorry, I didn't realize the hash lock encompasses this one;
>>>>> although it seems obvious in retrospect.
>>>>>
>>>>>>> +
>>>>>>>      write_lock_bh(&idev->lock);
>>>>>>
>>>>>> if we are trying to protect idev->addr_list against addition, then we
>>>>>> have to extend write_lock scope. Otherwise it may happen that another
>>>>>> thread will grab write lock between read_unlock and write_lock.
>>>>>>
>>>>>> Am I missing something?
>>>>>
>>>>> I wanted to ensure that access to `idev->addr_list` is performed under lock,
>>>>> the same way it is done immediately afterwards;
>>>>> No particular reason not to extend the existing lock, I just didn't think
>>>>> about it.
>>>>>
>>>>> For what it's worth, the original code didn't have this protection either,
>>>>> since the another thread could have grabbed the lock between
>>>>> `spin_unlock_bh(&net->ipv6.addrconf_hash_lock);` of the last loop iteration,
>>>>> and the `write_lock`.
>>>>>
>>>>> Should I extend the write_lock upwards, or just leave it off?
>>>>
>>>> Well, you are doing write manipulation with the list, which is protected
>>>> by read-write lock. I would expect this lock to be held in write mode.
>>>> And you have to protect hash map at the same time. So yes, write_lock
>>>> and spin_lock altogether, I believe.
>>>>
>>>
>>> Note that within the changed lines, the list itself is only iterated-on,
>>> not manipulated.
>>> The changes are to the `addr_lst` list, which is the hashtable, not the
>>> list this lock protects.
>>>
>>> I'll send v3 with the write-lock extended.
>>> Thank you!
>> 
>> Reading it one more time, I'm not quite sure that locking hashmap
>> spinlock under idev->lock in write mode is a good idea... We have to
>> think more about it, maybe ask for another opinion. Looks like RTNL
>> should protect idev->addr_list from modification while idev->lock is
>> more about changes to idev, not only about addr_list.
>> 
>> @Eric could you please shed some light on the locking schema here?
>
>AFAICS idev->addr_list is (write) protected by write_lock(idev->lock),
>while net->ipv6.inet6_addr_lst is protected by
>spin_lock_bh(&net->ipv6.addrconf_hash_lock).
>
>Extending the write_lock() scope will create a lock dependency between
>the hashtable lock and the list lock, which in turn could cause more
>problem in the future.
>
>Note that idev->addr_list locking looks a bit fuzzy, as is traversed in
>several places under the RCU lock only. I suggest finish the conversion
>of idev->addr_list to RCU and do this additional traversal under RCU, too.

Sure, no problem.

I've looked over the usage of ->addr_list in this file and there are about four
places where I'm certain I can replace idev->lock with RCU:

 - dev_forward_change
 - inet6_addr_del
 - addrconf_dad_run
 - addrconf_disable_policy_idev

As for the rest, if it's okay to run it by you before submitting a patch:

 - ipv6_link_dev_addr:
   Modifies list directly under write-lock.

 -  __ipv6_get_lladdr & ipv6_inherit_eui64 & ipv6_lonely_lladdr: Traverse in
    reverse. According my (admittedly limited) understanding, this is not
    possible in RCU.

 - addrconf_permanent_addr: Not sure if this can be RCU'd, as there's no
   variant that is both _rcu and _safe.
   If it was safe to keep iterating with just `_rcu`, I'm not sure why
   `_safe` was needed in the first place.

 - addrconf_ifdown & inet6_set_iftoken:
   Seems like write-lock is taken anyway and regardless of the iteration,
   so I'm not sure it would benefit from introducing RCU.

 - check_cleanup_prefix_route:
   I'm conflicted about this one.
   When called from ipv6_del_addr(), the write lock is taken anyway.
   When called from inet6_addr_modify(), the write-lock is taken;
   where a read-lock could have done the job.

   Should this be RCU'd as well?

>Cheers,
>
>Paolo

Cheers


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-11-13  6:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-11-08  5:25 [PATCH net-next v2] Avoid traversing addrconf hash on ifdown Gilad Naaman
2024-11-09 15:00 ` Vadim Fedorenko
2024-11-10  6:53   ` Gilad Naaman
2024-11-10 22:31     ` Vadim Fedorenko
2024-11-11  5:21       ` Gilad Naaman
2024-11-11 12:07         ` Vadim Fedorenko
2024-11-12 14:41           ` Paolo Abeni
2024-11-12 16:08             ` Vadim Fedorenko
2024-11-13  6:21             ` Gilad Naaman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).