public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Jordan Rife <jrife@google.com>
To: Kuniyuki Iwashima <kuniyu@google.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
	 Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	 Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	 Stanislav Fomichev <sdf@fomichev.me>,
	Andrii Nakryiko <andrii@kernel.org>,
	 Yusuke Suzuki <yusuke.suzuki@isovalent.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: Re: [PATCH net-next v3 1/4] udp: Only compare daddr/dport when sk_state == TCP_ESTABLISHED
Date: Wed, 1 Apr 2026 20:50:41 +0000	[thread overview]
Message-ID: <ac1qqMc8M96aDORI@google.com> (raw)
In-Reply-To: <CAAVpQUDk1ywgsuCtreVCV4yVJRm_mcjsnC-+LuUwLRLHWavtZA@mail.gmail.com>

On Mon, Mar 30, 2026 at 06:21:39PM -0700, Kuniyuki Iwashima wrote:
> On Mon, Mar 30, 2026 at 2:57 PM Jordan Rife <jrife@google.com> wrote:
> >
> > Adjust lookups and scoring to keep their results equivalent to before
> > even if inet_daddr+inet_dport are left intact after disconnecting a
> > socket (sk_state == TCP_CLOSE). sk_state == TCP_ESTABLISHED implies that
> > *daddr is non-zero, so remove redundant checks for that at the same
> > time. Note that __udp6_lib_demux_lookup already checks if sk_state ==
> > TCP_ESTABLISHED, so no change was needed there [1].
> >
> > I could find no discernible difference in performance in
> > udp4_lib_lookup2 before and after the change in compute_score.
> 
> What workload did you test the series with ?

These measurements were taken on the server side while running a netperf
UDP_STREAM test over a 100 Gbps link.

> I think we want to see results under DDoS.

Intuitively, it seems like the performance should be similar.
sk_state resides in the same cache line as inet_daddr, inet_dport, and
sk_v6_daddr, and we trade a comparison with inet_daddr/sk_v6_daddr for
one with sk_state. Of course, code-level intuition can be wrong, so I'm
happy to do some more extensive testing if you feel it's warranted to
make sure that performance isn't regressing.
 
> >
> > (AMD Ryzen 9 9900X)
> >
> > kprobe:udp4_lib_lookup2 {
> >         @start[cpu] = nsecs;
> > }
> > kretprobe:udp4_lib_lookup2 {
> >         @lookup[cpu] = hist(nsecs - @start[cpu], 2);
> > }
> >
> > BEFORE
> > ======
> > @lookup[11]:
> > [80, 96)         1387077 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > [96, 112)         364973 |@@@@@@@@@@@@@                                       |
> > [112, 128)         34261 |@                                                   |
> > [128, 160)          7246 |                                                    |
> > [160, 192)           215 |                                                    |
> > [192, 224)           126 |                                                    |
> >
> > AFTER
> > =====
> > @lookup[11]:
> > [80, 96)         1408594 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > [96, 112)         340568 |@@@@@@@@@@@@                                        |
> > [112, 128)         30753 |@                                                   |
> > [128, 160)          8019 |                                                    |
> > [160, 192)           231 |                                                    |
> > [192, 224)           157 |                                                    |
> >
> > [1]: https://lore.kernel.org/netdev/20170623222537.130493-1-tracywwnj@gmail.com/
> >
> > Signed-off-by: Jordan Rife <jrife@google.com>
> > ---
> >  net/ipv4/udp.c | 20 +++++++++++---------
> >  net/ipv6/udp.c | 18 +++++++++---------
> >  2 files changed, 20 insertions(+), 18 deletions(-)
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index b60fad393e18..d91c587c3657 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -385,16 +385,16 @@ static int compute_score(struct sock *sk, const struct net *net,
> >         score = (sk->sk_family == PF_INET) ? 2 : 1;
> >
> >         inet = inet_sk(sk);
> > -       if (inet->inet_daddr) {
> > +       if (sk->sk_state == TCP_ESTABLISHED) {
> >                 if (inet->inet_daddr != saddr)
> >                         return -1;
> >                 score += 4;
> > -       }
> >
> > -       if (inet->inet_dport) {
> > -               if (inet->inet_dport != sport)
> > -                       return -1;
> > -               score += 4;
> > +               if (inet->inet_dport) {
> > +                       if (inet->inet_dport != sport)
> > +                               return -1;
> > +                       score += 4;
> > +               }
> >         }
> >
> >         dev_match = udp_sk_bound_dev_eq(net, sk->sk_bound_dev_if,
> > @@ -796,8 +796,9 @@ static inline bool __udp_is_mcast_sock(struct net *net, const struct sock *sk,
> >
> >         if (!net_eq(sock_net(sk), net) ||
> >             udp_sk(sk)->udp_port_hash != hnum ||
> > -           (inet->inet_daddr && inet->inet_daddr != rmt_addr) ||
> > -           (inet->inet_dport != rmt_port && inet->inet_dport) ||
> > +           (sk->sk_state == TCP_ESTABLISHED &&
> > +            (inet->inet_daddr != rmt_addr ||
> > +            (inet->inet_dport != rmt_port && inet->inet_dport))) ||
> >             (inet->inet_rcv_saddr && inet->inet_rcv_saddr != loc_addr) ||
> >             ipv6_only_sock(sk) ||
> >             !udp_sk_bound_dev_eq(net, sk->sk_bound_dev_if, dif, sdif))
> > @@ -2854,7 +2855,8 @@ static struct sock *__udp4_lib_demux_lookup(struct net *net,
> >         ports = INET_COMBINED_PORTS(rmt_port, hnum);
> >
> >         udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
> > -               if (inet_match(net, sk, acookie, ports, dif, sdif))
> > +               if (sk->sk_state == TCP_ESTABLISHED &&
> > +                   inet_match(net, sk, acookie, ports, dif, sdif))
> >                         return sk;
> >                 /* Only check first socket in chain */
> >                 break;
> > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> > index 010b909275dd..b93a9a3e7678 100644
> > --- a/net/ipv6/udp.c
> > +++ b/net/ipv6/udp.c
> > @@ -147,16 +147,16 @@ static int compute_score(struct sock *sk, const struct net *net,
> >         score = 0;
> >         inet = inet_sk(sk);
> >
> > -       if (inet->inet_dport) {
> > +       if (sk->sk_state == TCP_ESTABLISHED) {
> >                 if (inet->inet_dport != sport)
> >                         return -1;
> >                 score++;
> > -       }
> >
> > -       if (!ipv6_addr_any(&sk->sk_v6_daddr)) {
> > -               if (!ipv6_addr_equal(&sk->sk_v6_daddr, saddr))
> > -                       return -1;
> > -               score++;
> > +               if (!ipv6_addr_any(&sk->sk_v6_daddr)) {
> 
> This looks unnecessary.

Yep, thanks. This is inverted. It should have been similar to the ipv4
logic where the unnecessary `if (inet->inet_daddr)` check was removed
after adding if `(sk->sk_state == TCP_ESTABLISHED)`:

	if (sk->sk_state == TCP_ESTABLISHED) {
		if (inet->inet_dport) {
			if (inet->inet_dport != sport)
				return -1;
			score++;
		}

		if (!ipv6_addr_equal(&sk->sk_v6_addr, saddr))
			return -1;
		score++;
	}
 
> 
> > +                       if (!ipv6_addr_equal(&sk->sk_v6_daddr, saddr))
> > +                               return -1;
> > +                       score++;
> > +               }
> >         }
> >
> >         bound_dev_if = READ_ONCE(sk->sk_bound_dev_if);
> > @@ -949,9 +949,9 @@ static bool __udp_v6_is_mcast_sock(struct net *net, const struct sock *sk,
> >
> >         if (udp_sk(sk)->udp_port_hash != hnum ||
> >             sk->sk_family != PF_INET6 ||
> > -           (inet->inet_dport && inet->inet_dport != rmt_port) ||
> > -           (!ipv6_addr_any(&sk->sk_v6_daddr) &&
> > -                   !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr)) ||
> > +           (sk->sk_state == TCP_ESTABLISHED &&
> > +            ((inet->inet_dport && inet->inet_dport != rmt_port) ||
> > +            !ipv6_addr_equal(&sk->sk_v6_daddr, rmt_addr))) ||
> >             !udp_sk_bound_dev_eq(net, READ_ONCE(sk->sk_bound_dev_if), dif, sdif) ||
> >             (!ipv6_addr_any(&sk->sk_v6_rcv_saddr) &&
> >                     !ipv6_addr_equal(&sk->sk_v6_rcv_saddr, loc_addr)))
> > --
> > 2.53.0.1118.gaef5881109-goog
> >

Thanks,
Jordan

  reply	other threads:[~2026-04-01 20:50 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-30 21:57 [PATCH net-next v3 0/4] udp: Preserve UDP socket addresses on abort Jordan Rife
2026-03-30 21:57 ` [PATCH net-next v3 1/4] udp: Only compare daddr/dport when sk_state == TCP_ESTABLISHED Jordan Rife
2026-03-31  1:21   ` Kuniyuki Iwashima
2026-04-01 20:50     ` Jordan Rife [this message]
2026-03-30 21:57 ` [PATCH net-next v3 2/4] udp: Remove disconnected sockets from the 4-tuple hash Jordan Rife
2026-03-31 16:51   ` kernel test robot
2026-03-31 17:33   ` kernel test robot
2026-03-31 17:42   ` kernel test robot
2026-03-31 17:55   ` kernel test robot
2026-03-31 18:49   ` kernel test robot
2026-03-30 21:57 ` [PATCH net-next v3 3/4] udp: Preserve destination address info after abort Jordan Rife
2026-03-30 21:57 ` [PATCH net-next v3 4/4] selftests/bpf: Ensure dst addr/port are preserved after socket abort Jordan Rife

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ac1qqMc8M96aDORI@google.com \
    --to=jrife@google.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=sdf@fomichev.me \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=yusuke.suzuki@isovalent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox