From: Jakub Kicinski <kuba@kernel.org>
To: pablo@netfilter.org
Cc: Jakub Kicinski <kuba@kernel.org>,
	netfilter-devel@vger.kernel.org, davem@davemloft.net,
	netdev@vger.kernel.org, pabeni@redhat.com, edumazet@google.com,
	fw@strlen.de, horms@kernel.org
Subject: Re: [PATCH net 11/11] ipvs: fix the spin_lock usage for RT build
Date: Mon, 27 Apr 2026 19:13:11 -0700
Message-ID: <20260428021311.793367-1-kuba@kernel.org>
In-Reply-To: <20260424190513.32823-12-pablo@netfilter.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
ipvs: fix the spin_lock usage for RT build

This patch fixes PREEMPT_RT issues with spin_lock usage under bit
locks in the IPVS resizable hash tables, and reworks ip_vs_conn_hash(),
ip_vs_conn_unlink(), and ip_vs_conn_fill_cport() accordingly.

Link: https://sashiko.dev/#/patchset/20260415200216.79699-1-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260420165539.85174-4-ja%40ssi.bg
Link: https://sashiko.dev/#/patchset/20260422135823.50489-4-ja%40ssi.bg

> diff --git a/net/netfilter/ipvs/ip_vs_conn.c b/net/netfilter/ipvs/ip_vs_conn.c
> index 84a4921a7865..9e23cda84825 100644
> --- a/net/netfilter/ipvs/ip_vs_conn.c
> +++ b/net/netfilter/ipvs/ip_vs_conn.c

[ ... ]

> @@ -637,6 +628,7 @@ void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport)
>  	struct ip_vs_conn_hnode *hn;
>  	u32 hash_key, hash_key_new;
>  	struct ip_vs_conn_param p;
> +	bool by_me = false;
>  	int ntbl;
>  	int dir;
>  
> @@ -664,8 +656,11 @@ void ip_vs_conn_fill_cport(struct ip_vs_conn *cp, __be16 cport)
>  		t = rcu_dereference(t->new_tbl);
>  		ntbl++;
>  		/* We are lost? */
> -		if (ntbl >= 2)
> +		if (ntbl >= 2) {
> +			IP_VS_ERR_RL("%s(): Too many ht changes for dir %d\n",
> +				     __func__, dir);
>  			return;
> +		}
>  	}

Can the ntbl >= 2 early return strand cp in an unrecoverable state
when it is reached on the dir=0 iteration after dir=1 has already
committed partial state?

In the pre-patch code, cp->cport, the clearing of
IP_VS_CONN_F_NO_CPORT, and the atomic_dec() of no_cport_conns[] were
all performed together inside a single if (!dir) critical section, so
an abort via ntbl >= 2 simply left cp->cport == 0 and a later packet
could retry from scratch.
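
A minimal userspace sketch of the pre-patch behaviour described above,
assuming a single-threaded model with no locking and no hash tables.
Every model_* name, MODEL_F_NO_CPORT, and the abort_dir0 switch (which
stands in for the ntbl >= 2 return on the dir=0 pass) are hypothetical,
and the IP_VS_CONN_F_HASHED check is left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_F_NO_CPORT 0x1u	/* stand-in for IP_VS_CONN_F_NO_CPORT */

struct model_conn {
	unsigned int flags;
	uint16_t cport;		/* the kernel field is __be16 */
};

/* Pre-patch shape: cport, flag and counter change only in the dir=0 pass.
 * Returns true when the NO_CPORT flag ends up cleared.
 */
static bool model_old_fill_cport(struct model_conn *cp, int *no_cport_conns,
				 uint16_t cport, bool use2, bool abort_dir0)
{
	int dir = use2 ? 1 : 0;

	do {
		if (dir == 0 && abort_dir0)
			return false;	/* models the ntbl >= 2 bail-out */
		if ((cp->flags & MODEL_F_NO_CPORT) && !dir) {
			(*no_cport_conns)--;
			cp->flags &= ~MODEL_F_NO_CPORT;
			cp->cport = cport;
		}
	} while (dir--);

	return !(cp->flags & MODEL_F_NO_CPORT);
}

/* An aborted call leaves cp untouched, so a later call can still finish. */
static void model_old_demo(void)
{
	struct model_conn cp = { .flags = MODEL_F_NO_CPORT, .cport = 0 };
	int no_cport_conns = 1;

	assert(!model_old_fill_cport(&cp, &no_cport_conns, 1234, true, true));
	assert(cp.cport == 0 && no_cport_conns == 1);

	assert(model_old_fill_cport(&cp, &no_cport_conns, 1234, true, false));
	assert(cp.cport == 1234 && no_cport_conns == 0);
}
```

Since all three updates sit in the same dir=0 branch of this model, the
aborted call leaves nothing behind that a retry would trip over.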

[ ... ]

> -	spin_lock(&cp->lock);
> -	if ((cp->flags & IP_VS_CONN_F_NO_CPORT) &&
> -	    (cp->flags & IP_VS_CONN_F_HASHED)) {
> +	/* Fill cport once, even if multiple packets try to do it */
> +	if (cp->flags & IP_VS_CONN_F_NO_CPORT && (!cp->cport || by_me)) {
> +		/* If we race with resizing make sure cport is set for dir 1 */
> +		if (!cp->cport) {
> +			cp->cport = cport;
> +			by_me = true;
> +		}
> +		if (!dir) {
> +			atomic_dec(&ipvs->no_cport_conns[af_id]);
> +			cp->flags &= ~IP_VS_CONN_F_NO_CPORT;
> +		}

With the new structure, the dir=1 iteration (which runs first when
use2 is true) sets cp->cport = cport and by_me = true, but leaves
IP_VS_CONN_F_NO_CPORT set and does not touch no_cport_conns[].  The
flag clear and the atomic_dec happen only in the subsequent dir=0
iteration.

If the dir=0 iteration then returns early via the ntbl >= 2 path,
either directly on entry or after ntbl accumulates across restarts
from the retry: label (ntbl is reset per next_dir, not per retry),
cp ends up with cp->cport set to the new value but
IP_VS_CONN_F_NO_CPORT still set.

Can any subsequent ip_vs_conn_fill_cport() call complete the
transition?  The gate is:

	if (cp->flags & IP_VS_CONN_F_NO_CPORT && (!cp->cport || by_me)) {

For a fresh caller, by_me is false and cp->cport is now non-zero, so
the gate is structurally unsatisfiable and the flag clear /
atomic_dec never runs.
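
The sequence in question can be sketched in userspace, assuming a
single-threaded model that keeps only the gate, the by_me handling and
the dir loop from the hunk above. The model_* names, MODEL_F_NO_CPORT
and the abort_dir0 switch (standing in for the ntbl >= 2 return on the
dir=0 pass) are hypothetical; locking, hashing and resizing are not
modelled:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_F_NO_CPORT 0x1u	/* stand-in for IP_VS_CONN_F_NO_CPORT */

struct model_conn {
	unsigned int flags;
	uint16_t cport;		/* the kernel field is __be16 */
};

/* Post-patch shape: cport may be set on dir=1, flag/counter only on dir=0.
 * Returns true when the NO_CPORT flag ends up cleared.
 */
static bool model_new_fill_cport(struct model_conn *cp, int *no_cport_conns,
				 uint16_t cport, bool use2, bool abort_dir0)
{
	bool by_me = false;
	int dir = use2 ? 1 : 0;

	for (;;) {
		if (dir == 0 && abort_dir0)
			return false;	/* models the ntbl >= 2 bail-out */
		if ((cp->flags & MODEL_F_NO_CPORT) && (!cp->cport || by_me)) {
			if (!cp->cport) {
				cp->cport = cport;
				by_me = true;
			}
			if (!dir) {
				(*no_cport_conns)--;
				cp->flags &= ~MODEL_F_NO_CPORT;
			}
		}
		if (!(dir-- && by_me))	/* same test as "if (dir-- && by_me)" */
			break;
	}
	return !(cp->flags & MODEL_F_NO_CPORT);
}

/* Abort on dir=0 after dir=1 committed cport, then try a fresh caller. */
static void model_new_demo(void)
{
	struct model_conn cp = { .flags = MODEL_F_NO_CPORT, .cport = 0 };
	int no_cport_conns = 1;

	assert(!model_new_fill_cport(&cp, &no_cport_conns, 1234, true, true));
	assert(cp.cport == 1234);		/* partial state committed */
	assert(cp.flags & MODEL_F_NO_CPORT);	/* flag still set */

	/* Fresh caller: by_me starts false and cport is non-zero. */
	assert(!model_new_fill_cport(&cp, &no_cport_conns, 1234, true, false));
	assert(no_cport_conns == 1);		/* counter never dropped */
}
```

In this model the second call cannot enter the gate at all, which
matches the concern above; whether the real code can reach the same
state depends on the resize paths that the model leaves out.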

In __ip_vs_conn_in_get() the lookup

	!p->cport ^ !(cp->flags & IP_VS_CONN_F_NO_CPORT)

evaluates false for this cp (flag still set, p->cport non-zero), and
the cport=0 fallback also fails since p->cport != cp->cport (cp->cport
is now non-zero).  Does this leave cp unreachable via forward
direction lookup until the timer expires?
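
The two lookup terms mentioned above can be checked in isolation. This
is a sketch only: model_in_get_match() and MODEL_F_NO_CPORT are
hypothetical names, and it assumes the port equality test and the XOR
term are ANDed together, ignoring the address, protocol and netns
comparisons that the real lookup would also perform:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_F_NO_CPORT 0x1u	/* stand-in for IP_VS_CONN_F_NO_CPORT */

/* Models only: p->cport == cp->cport &&
 *              (!p->cport ^ !(cp->flags & IP_VS_CONN_F_NO_CPORT))
 */
static bool model_in_get_match(uint16_t p_cport, uint16_t cp_cport,
			       unsigned int cp_flags)
{
	bool port_eq = p_cport == cp_cport;
	bool xor_ok = (!p_cport) ^ (!(cp_flags & MODEL_F_NO_CPORT));

	return port_eq && xor_ok;
}

static void model_lookup_demo(void)
{
	/* Healthy NO_CPORT entry: found via the cport=0 fallback only. */
	assert(!model_in_get_match(1234, 0, MODEL_F_NO_CPORT));
	assert(model_in_get_match(0, 0, MODEL_F_NO_CPORT));

	/* Healthy filled entry: found via the real port. */
	assert(model_in_get_match(1234, 1234, 0));

	/* Stuck entry (cport set, flag still set): neither lookup matches. */
	assert(!model_in_get_match(1234, 1234, MODEL_F_NO_CPORT));
	assert(!model_in_get_match(0, 1234, MODEL_F_NO_CPORT));
}
```

Under these assumptions the stuck combination is the only one of the
three states that neither the direct lookup nor the fallback can match.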

As a side effect, does no_cport_conns[af_id] stay elevated for the
remainder of cp's lifetime, forcing ip_vs_conn_in_get() into the
slower cport=0 fallback path for that af/netns while the stuck entry
lives?

> -		if (!dir) {
> -			atomic_dec(&ipvs->no_cport_conns[af_id]);
> -			cp->flags &= ~IP_VS_CONN_F_NO_CPORT;
> -			cp->cport = cport;
> -		}
>  	}
> -	spin_unlock(&cp->lock);
>  
>  	if (head != head2)
>  		hlist_bl_unlock(head2);
>  	hlist_bl_unlock(head);
>  	write_seqcount_end(&t->seqc[hash_key & t->seqc_mask]);
>  	preempt_enable_nested();
> -	spin_unlock_bh(&t->lock[hash_key & t->lock_mask].l);
> -	if (dir--)
> +	spin_unlock(&t->lock[hash_key & t->lock_mask].l);
> +
> +	spin_unlock_bh(&cp->lock);
> +	if (dir-- && by_me)
>  		goto next_dir;
>  }
-- 
pw-bot: cr
