netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	netdev@vger.kernel.org, Kuniyuki Iwashima <kuniyu@amazon.com>,
	Mike Manning <mvrmanning@gmail.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	Paul Holzinger <pholzing@redhat.com>,
	Philo Lu <lulie@linux.alibaba.com>,
	Cambda Zhu <cambda@linux.alibaba.com>,
	Fred Chen <fred.cc@alibaba-inc.com>,
	Yubing Qiu <yubing.qiuyubing@alibaba-inc.com>
Subject: Re: [PATCH net-next 2/2] datagram, udp: Set local address and rehash socket atomically against lookup
Date: Wed, 18 Dec 2024 17:21:10 +0100	[thread overview]
Message-ID: <20241218172110.12c4016a@elisabeth> (raw)
In-Reply-To: <20241206143535.3e095320@elisabeth>

On Fri, 6 Dec 2024 14:35:35 +0100
Stefano Brivio <sbrivio@redhat.com> wrote:

> On Fri, 6 Dec 2024 13:36:47 +0100
> Paolo Abeni <pabeni@redhat.com> wrote:
> 
> > On 12/6/24 11:50, Stefano Brivio wrote:  
> > > On Thu, 5 Dec 2024 17:53:33 +0100 Paolo Abeni <pabeni@redhat.com> wrote:    
> > >> I'm wondering if the issue could be solved (almost) entirely in the
> > >> rehash callback?!? if the rehash happens on connect and the the socket
> > >> does not have hash4 yet (it's not a reconnect) do the l4 hashing before
> > >> everything else.    
> > > 
> > > So, yes, that's actually the first thing I tried: do the hashing (any
> > > hash) before setting the address (I guess that's what you mean by
> > > "everything else").
> > > 
> > > If you take this series, and drop the changes in __udp4_lib_lookup(), I
> > > guess that would match what you suggest.    
> > 
> > I mean something slightly different. Just to explain the idea something
> > alike the following (completely untested):
> > 
> > ---
> > diff --git a/net/ipv4/datagram.c b/net/ipv4/datagram.c
> > index cc6d0bd7b0a9..e9cc6edbcdc6 100644
> > --- a/net/ipv4/datagram.c
> > +++ b/net/ipv4/datagram.c
> > @@ -61,6 +61,10 @@ int __ip4_datagram_connect(struct sock *sk, struct
> > sockaddr *uaddr, int addr_len
> >  		err = -EACCES;
> >  		goto out;
> >  	}
> > +
> > +	sk->sk_state = TCP_ESTABLISHED;
> > +	inet->inet_daddr = fl4->daddr;
> > +	inet->inet_dport = usin->sin_port;
> >  	if (!inet->inet_saddr)
> >  		inet->inet_saddr = fl4->saddr;	/* Update source address */
> >  	if (!inet->inet_rcv_saddr) {
> > @@ -68,10 +72,7 @@ int __ip4_datagram_connect(struct sock *sk, struct
> > sockaddr *uaddr, int addr_len
> >  		if (sk->sk_prot->rehash)
> >  			sk->sk_prot->rehash(sk);
> >  	}
> > -	inet->inet_daddr = fl4->daddr;
> > -	inet->inet_dport = usin->sin_port;
> >  	reuseport_has_conns_set(sk);
> > -	sk->sk_state = TCP_ESTABLISHED;
> >  	sk_set_txhash(sk);
> >  	atomic_set(&inet->inet_id, get_random_u16());
> > 
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index 6a01905d379f..c6c58b0a6b7b 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -2194,6 +2194,21 @@ void udp_lib_rehash(struct sock *sk, u16 newhash,
> > u16 newhash4)
> >  			if (rcu_access_pointer(sk->sk_reuseport_cb))
> >  				reuseport_detach_sock(sk);
> > 
> > +			if (sk->sk_state == TCP_ESTABLISHED && !udp_hashed4(sk)) {
> > +				struct udp_hslot * hslot4 = udp_hashslot4(udptable, newhash4);
> > +
> > +				udp_sk(sk)->udp_lrpa_hash = newhash4;
> > +				spin_lock(&hslot4->lock);
> > +				hlist_nulls_add_head_rcu(&udp_sk(sk)->udp_lrpa_node,
> > +							 &hslot4->nulls_head);
> > +				hslot4->count++;
> > +				spin_unlock(&hslot4->lock);
> > +
> > +				spin_lock(&hslot2->lock);
> > +				udp_hash4_inc(hslot2);
> > +				spin_unlock(&hslot2->lock);
> > +			}
> > +
> >  			if (hslot2 != nhslot2) {
> >  				spin_lock(&hslot2->lock);
> >  				hlist_del_init_rcu(&udp_sk(sk)->udp_portaddr_node);
> > ---
> > 
> > Basically the idea is to leverage the hash4 - which should be not yet
> > initialized when rehash is invoked due to connect().  
> 
> That assumption seems to be correct from my tests.

...but that doesn't help in a general case, because we don't have a
wildcard lookup for four-tuple hashes, more on that below.

> > In such a case, before touching hash{,2}, do hash4.  
> 
> Brilliant, thanks. I'll give that a try.

It sounded like a nice idea and I actually tried quite hard, but it
can't work (so I'm posting a different/simpler fix), mostly for three
reasons (plus a bunch that would require sparse but doable changes):

1. we can't use four-tuple hashes on CONFIG_BASE_SMALL=y, and it would
   be rather weird to leave it unfixed in that case (and, worse, to have
   substantially different behaviours depending on CONFIG_BASE_SMALL).

   At the same time, I see your point about it (from review to v4 of
   the four-tuple hash series), and I don't feel like it's worth adding
   it back also for CONFIG_BASE_SMALL.

   I tried adding some special handling based on a similar concept that
   wouldn't make struct udp_table bigger, but it's strictly more
   complicated than the other fix I'm posting.

2. hash4_cnt is stored in the secondary hash slot, and I see why, but
   that means that if the secondary hash doesn't match, we'll also fail
   the lookup based on four-tuple hash.

   We could introduce a special case in the lookup, perhaps as fallback
   only, ignoring the result of udp_has_hash4(), but it looks rather
   convoluted (especially compared to the fix I'm posting)

3. we would need another version of udp{4,6}_lib_lookup4() (or a branch
   inside it), handling wildcard lookups like udp{4,6}_lib_lookup2()
   does, and then call udp{4,6}_lib_lookup4() a second time with
   INADDR_ANY / &in6addr_any, because we don't know if the receive
   address changed yet, as we're performing the lookup.

So, instead, I'm resorting to the primary hash, as fallback only. If
what we need is a hash that doesn't include the address, such as an
"uninitialised" four-tuple hash, we can as well use the original hash
that doesn't include addresses by design.

-- 
Stefano


  parent reply	other threads:[~2024-12-18 16:21 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-04 22:12 [PATCH net-next 0/2] Fix race between datagram socket address change and rehash Stefano Brivio
2024-12-04 22:12 ` [PATCH net-next 1/2] datagram: Rehash sockets only if local address changed for their family Stefano Brivio
2024-12-04 22:12 ` [PATCH net-next 2/2] datagram, udp: Set local address and rehash socket atomically against lookup Stefano Brivio
2024-12-05  9:30   ` Paolo Abeni
2024-12-05 15:58     ` Stefano Brivio
2024-12-05 16:53       ` Paolo Abeni
2024-12-06 10:50         ` Stefano Brivio
2024-12-06 12:36           ` Paolo Abeni
2024-12-06 13:35             ` Stefano Brivio
2024-12-06 15:10               ` Paolo Abeni
2024-12-18 16:21               ` Stefano Brivio [this message]
2024-12-05 16:35   ` Eric Dumazet
2024-12-05 22:32     ` David Gibson
2024-12-05 22:52       ` Eric Dumazet
2024-12-06  2:16         ` David Gibson
2024-12-06  9:04           ` Eric Dumazet
2024-12-09  2:20             ` David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241218172110.12c4016a@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=cambda@linux.alibaba.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=edumazet@google.com \
    --cc=fred.cc@alibaba-inc.com \
    --cc=kuniyu@amazon.com \
    --cc=lulie@linux.alibaba.com \
    --cc=mvrmanning@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pholzing@redhat.com \
    --cc=willemdebruijn.kernel@gmail.com \
    --cc=yubing.qiuyubing@alibaba-inc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).