From: Vadim Fedorenko <vadim.fedorenko@linux.dev>
To: Gilad Naaman <gnaaman@drivenets.com>,
	Kuniyuki Iwashima <kuniyu@amazon.com>,
	"David S. Miller" <davem@davemloft.net>,
	David Ahern <dsahern@kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org
Subject: Re: [PATCH net-next v2] Avoid traversing addrconf hash on ifdown
Date: Sat, 9 Nov 2024 15:00:55 +0000
Message-ID: <ea009a4a-c9f2-4843-b84d-e6b72982228e@linux.dev>
In-Reply-To: <20241108052559.2926114-1-gnaaman@drivenets.com>

On 08/11/2024 05:25, Gilad Naaman wrote:
> struct inet6_dev already has a list of addresses owned by the device,
> enabling us to traverse this much shorter list, instead of scanning
> the entire hash-table.
> 
> Signed-off-by: Gilad Naaman <gnaaman@drivenets.com>
> ---
> Changes in v2:
>   - Remove double BH sections
>   - Styling fixes (extra {}, extra newline)
> ---
>   net/ipv6/addrconf.c | 38 +++++++++++++++++---------------------
>   1 file changed, 17 insertions(+), 21 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index d0a99710d65d..c6fbd634912a 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -3846,12 +3846,12 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
>   {
>   	unsigned long event = unregister ? NETDEV_UNREGISTER : NETDEV_DOWN;
>   	struct net *net = dev_net(dev);
> -	struct inet6_dev *idev;
>   	struct inet6_ifaddr *ifa;
>   	LIST_HEAD(tmp_addr_list);
> +	struct inet6_dev *idev;
>   	bool keep_addr = false;
>   	bool was_ready;
> -	int state, i;
> +	int state;
>   
>   	ASSERT_RTNL();
>   
> @@ -3890,28 +3890,24 @@ static int addrconf_ifdown(struct net_device *dev, bool unregister)
>   	}
>   
>   	/* Step 2: clear hash table */
> -	for (i = 0; i < IN6_ADDR_HSIZE; i++) {
> -		struct hlist_head *h = &net->ipv6.inet6_addr_lst[i];
> +	read_lock_bh(&idev->lock);
> +	spin_lock(&net->ipv6.addrconf_hash_lock);
>   
> -		spin_lock_bh(&net->ipv6.addrconf_hash_lock);
> -restart:
> -		hlist_for_each_entry_rcu(ifa, h, addr_lst) {
> -			if (ifa->idev == idev) {
> -				addrconf_del_dad_work(ifa);
> -				/* combined flag + permanent flag decide if
> -				 * address is retained on a down event
> -				 */
> -				if (!keep_addr ||
> -				    !(ifa->flags & IFA_F_PERMANENT) ||
> -				    addr_is_local(&ifa->addr)) {
> -					hlist_del_init_rcu(&ifa->addr_lst);
> -					goto restart;
> -				}
> -			}
> -		}
> -		spin_unlock_bh(&net->ipv6.addrconf_hash_lock);
> +	list_for_each_entry(ifa, &idev->addr_list, if_list) {
> +		addrconf_del_dad_work(ifa);
> +
> +		/* combined flag + permanent flag decide if
> +		 * address is retained on a down event
> +		 */
> +		if (!keep_addr ||
> +		    !(ifa->flags & IFA_F_PERMANENT) ||
> +		    addr_is_local(&ifa->addr))
> +			hlist_del_init_rcu(&ifa->addr_lst);
>   	}
>   
> +	spin_unlock(&net->ipv6.addrconf_hash_lock);
> +	read_unlock_bh(&idev->lock);

Why is this read lock needed here? Holding the addrconf_hash_lock
spinlock will keep any RCU grace period from completing, so we can
safely traverse idev->addr_list with list_for_each_entry_rcu()...

> +
>   	write_lock_bh(&idev->lock);

If we are trying to protect idev->addr_list against additions, then we
have to extend the write_lock scope to cover this loop. Otherwise
another thread may grab the write lock between read_unlock and
write_lock.
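
To make the window concrete, here is a minimal userspace model, not
kernel code. Everything named model_* is made up for illustration, the
lock is a toy trylock-style stand-in for idev->lock, and
other_context_add_addr() stands in for whatever path would add an
address concurrently. It only shows that a writer is refused while
either lock is held, but gets in between the two critical sections.

```c
#include <stdbool.h>

/* Toy reader/writer lock: tracks holders, never blocks (illustrative only). */
struct model_rwlock {
	int readers;
	bool writer;
};

static bool model_read_trylock(struct model_rwlock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

static void model_read_unlock(struct model_rwlock *l)
{
	l->readers--;
}

static bool model_write_trylock(struct model_rwlock *l)
{
	if (l->writer || l->readers)
		return false;
	l->writer = true;
	return true;
}

static void model_write_unlock(struct model_rwlock *l)
{
	l->writer = false;
}

/* Hypothetical stand-in for inet6_dev: a lock plus an address count. */
struct model_idev {
	struct model_rwlock lock;
	int nr_addrs;
};

/* Another context trying to add an address; fails if the lock is held. */
static bool other_context_add_addr(struct model_idev *idev)
{
	if (!model_write_trylock(&idev->lock))
		return false;
	idev->nr_addrs++;
	model_write_unlock(&idev->lock);
	return true;
}

/* Read section, unlock, then write section: returns addresses that slipped in. */
static int split_sections(struct model_idev *idev)
{
	int seen, now;

	model_read_trylock(&idev->lock);
	seen = idev->nr_addrs;
	other_context_add_addr(idev);	/* refused: a reader holds the lock */
	model_read_unlock(&idev->lock);

	other_context_add_addr(idev);	/* the gap: nothing is held, so it succeeds */

	model_write_trylock(&idev->lock);
	now = idev->nr_addrs;
	model_write_unlock(&idev->lock);
	return now - seen;
}

/* One write section covering both steps: no gap for the writer to use. */
static int single_section(struct model_idev *idev)
{
	int seen, now;

	model_write_trylock(&idev->lock);
	seen = idev->nr_addrs;
	other_context_add_addr(idev);	/* refused: the write lock is held */
	now = idev->nr_addrs;
	model_write_unlock(&idev->lock);
	return now - seen;
}
```

In this model split_sections() sees one extra address after the gap,
while single_section() sees none.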

Am I missing something?

Thread overview: 9+ messages
2024-11-08  5:25 [PATCH net-next v2] Avoid traversing addrconf hash on ifdown Gilad Naaman
2024-11-09 15:00 ` Vadim Fedorenko [this message]
2024-11-10  6:53   ` Gilad Naaman
2024-11-10 22:31     ` Vadim Fedorenko
2024-11-11  5:21       ` Gilad Naaman
2024-11-11 12:07         ` Vadim Fedorenko
2024-11-12 14:41           ` Paolo Abeni
2024-11-12 16:08             ` Vadim Fedorenko
2024-11-13  6:21             ` Gilad Naaman
