netdev.vger.kernel.org archive mirror
From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Paolo Abeni <pabeni@redhat.com>, Gilad Naaman <gnaaman@drivenets.com>
Cc: davem@davemloft.net, dsahern@kernel.org, horms@kernel.org,
	kuba@kernel.org, kuniyu@amazon.com, netdev@vger.kernel.org,
	Eric Dumazet <edumazet@google.com>
Subject: Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
Date: Tue, 12 Nov 2024 16:08:29 +0000	[thread overview]
Message-ID: <2acb766d-4cbc-426d-9d0d-0d592610e209@linux.dev> (raw)
In-Reply-To: <ecdad6a5-d766-4ff2-a8ad-b605ebb3811c@redhat.com>

On 12/11/2024 14:41, Paolo Abeni wrote:
> On 11/11/24 13:07, Vadim Fedorenko wrote:
>> On 11/11/2024 05:21, Gilad Naaman wrote:
>>>> On 10/11/2024 06:53, Gilad Naaman wrote:
>>>>>>> -           spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
>>>>>>> +   list_for_each_entry(ifa, &idev->addr_list, if_list) {
>>>>>>> +           addrconf_del_dad_work(ifa);
>>>>>>> +
>>>>>>> +           /* combined flag + permanent flag decide if
>>>>>>> +            * address is retained on a down event
>>>>>>> +            */
>>>>>>> +           if (!keep_addr ||
>>>>>>> +               !(ifa->flags & IFA_F_PERMANENT) ||
>>>>>>> +               addr_is_local(&ifa->addr))
>>>>>>> +                   hlist_del_init_rcu(&ifa->addr_lst);
>>>>>>>       }
>>>>>>>
>>>>>>> +   spin_unlock(&net->ipv6.addrconf_hash_lock);
>>>>>>> +   read_unlock_bh(&idev->lock);
>>>>>>
>>>>>> Why is this read lock needed here? spinlock addrconf_hash_lock will
>>>>>> block any RCU grace period to happen, so we can safely traverse
>>>>>> idev->addr_list with list_for_each_entry_rcu()...
>>>>>
>>>>> Oh, sorry, I didn't realize the hash lock encompasses this one;
>>>>> although it seems obvious in retrospect.
>>>>>
>>>>>>> +
>>>>>>>       write_lock_bh(&idev->lock);
>>>>>>
>>>>>> if we are trying to protect idev->addr_list against addition, then we
>>>>>> have to extend write_lock scope. Otherwise it may happen that another
>>>>>> thread will grab write lock between read_unlock and write_lock.
>>>>>>
>>>>>> Am I missing something?
>>>>>
>>>>> I wanted to ensure that access to `idev->addr_list` is performed under lock,
>>>>> the same way it is done immediately afterwards;
>>>>> No particular reason not to extend the existing lock, I just didn't think
>>>>> about it.
>>>>>
>>>>> For what it's worth, the original code didn't have this protection either,
>>>>> since another thread could have grabbed the lock between
>>>>> `spin_unlock_bh(&net->ipv6.addrconf_hash_lock);` of the last loop iteration,
>>>>> and the `write_lock`.
>>>>>
>>>>> Should I extend the write_lock upwards, or just leave it off?
>>>>
>>>> Well, you are doing write manipulation with the list, which is protected
>>>> by read-write lock. I would expect this lock to be held in write mode.
>>>> And you have to protect the hash map at the same time. So yes,
>>>> write_lock and spin_lock held together, I believe.
>>>>
>>>
>>> Note that within the changed lines, the list itself is only iterated-on,
>>> not manipulated.
>>> The changes are to the `addr_lst` list, which is the hashtable, not the
>>> list this lock protects.
>>>
>>> I'll send v3 with the write-lock extended.
>>> Thank you!
>>
>> Reading it one more time, I'm not quite sure that locking hashmap
>> spinlock under idev->lock in write mode is a good idea... We have to
>> think more about it, maybe ask for another opinion. Looks like RTNL
>> should protect idev->addr_list from modification while idev->lock is
>> more about changes to idev, not only about addr_list.
>>
>> @Eric could you please shed some light on the locking schema here?
> 
> AFAICS idev->addr_list is (write) protected by write_lock(idev->lock),
> while net->ipv6.inet6_addr_lst is protected by
> spin_lock_bh(&net->ipv6.addrconf_hash_lock).
> 
> Extending the write_lock() scope will create a lock dependency between
> the hashtable lock and the list lock, which in turn could cause more
> problems in the future.
> 
> Note that idev->addr_list locking looks a bit fuzzy, as it is traversed in
> several places under the RCU lock only.

Yeah, I was confused exactly because of some places using RCU while
others still using read_lock.

> I suggest finishing the conversion
> of idev->addr_list to RCU and doing this additional traversal under RCU, too.

That sounds reasonable.

Thanks!
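If the conversion goes through, the ifdown walk could plausibly end up looking like the fragment below. This is a rough, non-compilable sketch assuming idev->addr_list becomes fully RCU-safe; the names follow the quoted patch (net/ipv6/addrconf.c), and the final lock nesting would of course be whatever v3 settles on:

```
	/* Sketch only: traverse addr_list under RCU instead of
	 * idev->lock; the hash-table spinlock still serializes the
	 * hlist updates. */
	spin_lock_bh(&net->ipv6.addrconf_hash_lock);
	rcu_read_lock();
	list_for_each_entry_rcu(ifa, &idev->addr_list, if_list) {
		addrconf_del_dad_work(ifa);

		/* combined flag + permanent flag decide if
		 * address is retained on a down event
		 */
		if (!keep_addr ||
		    !(ifa->flags & IFA_F_PERMANENT) ||
		    addr_is_local(&ifa->addr))
			hlist_del_init_rcu(&ifa->addr_lst);
	}
	rcu_read_unlock();
	spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
```

Whether the explicit rcu_read_lock() is even needed under the BH-disabling spinlock is exactly the point Vadim raised earlier in the thread; keeping it is the conservative choice.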


Thread overview: 9+ messages
2024-11-08  5:25 [PATCH net-next v2] Avoid traversing addrconf hash on ifdown Gilad Naaman
2024-11-09 15:00 ` Vadim Fedorenko
2024-11-10  6:53   ` Gilad Naaman
2024-11-10 22:31     ` Vadim Fedorenko
2024-11-11  5:21       ` Gilad Naaman
2024-11-11 12:07         ` Vadim Fedorenko
2024-11-12 14:41           ` Paolo Abeni
2024-11-12 16:08             ` Vadim Fedorenko [this message]
2024-11-13  6:21             ` Gilad Naaman
