Netdev List
 help / color / mirror / Atom feed
From: Stefano Brivio <sbrivio@redhat.com>
To: Eric Dumazet <edumazet@google.com>
Cc: Laurent Vivier <lvivier@redhat.com>,
	Jakub Kicinski <kuba@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Paolo Abeni <pabeni@redhat.com>,
	Pavel Emelyanov <xemul@scylladb.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	Jon Maloy <jmaloy@redhat.com>, Dmitry Safonov <dima@arista.com>,
	Andrei Vagin <avagin@google.com>,
	netdev@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Neal Cardwell <ncardwell@google.com>,
	Kuniyuki Iwashima <kuniyu@google.com>,
	Simon Horman <horms@kernel.org>, Shuah Khan <shuah@kernel.org>
Subject: Re: [PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive
Date: Wed, 20 May 2026 10:09:26 +0200 (CEST)	[thread overview]
Message-ID: <20260520100925.1c5e385a@elisabeth> (raw)
In-Reply-To: <CANn89i+RqFMt8X48V1xkk8mveLvcBORBb5crDKhSAGAvipxNRw@mail.gmail.com>

On Wed, 20 May 2026 00:27:38 -0700
Eric Dumazet <edumazet@google.com> wrote:

> On Wed, May 20, 2026 at 12:24 AM Laurent Vivier <lvivier@redhat.com> wrote:
> >
> > On 5/20/26 06:39, Eric Dumazet wrote:  
> > > On Tue, May 19, 2026 at 7:03 PM Jakub Kicinski <kuba@kernel.org> wrote:  
> > >>
> > >> On Mon, 18 May 2026 20:34:22 +0200 Stefano Brivio wrote:  
> > >>> Stefano Brivio (2):
> > >>>    tcp: Don't accept data when socket is in repair mode  
> > >>
> > >> Not sure Eric is on board with this patch in the first place.
> > >> Sound like it's not the intended use case for REPAIR so IMO
> > >> it's up to TCP maintainers whether we want to support this.
> > >> And it's definitely not a Fix.  
> > >
> > > I am not on board. Only net-next material for sure.

I wasn't quite sure, I'll re-target it for net-next.

> > > While v2 is slightly better, we have the fundamental problem of adding
> > > stuff to TCP receive path for features that almost nobody use.
> > > (It took more than 13 years to finally complain about the race)
> > >
> > > This consumes Gigawats of power on our planet (and tons of CO2), given
> > > the trillions of packets processed every second.

Absolutely not what I want, I didn't consider that.

> > > Please at least add a static key as we did in:
> > >
> > > commit 020e71a3cf7f50c0f2c54cf2444067b76fe6d785
> > > Author: Eric Dumazet <edumazet@google.com>
> > > Date:   Mon Oct 25 09:48:24 2021 -0700
> > >
> > >      ipv4: guard IP_MINTTL with a static key
> > >
> > >      RFC 5082 IP_MINTTL option is rarely used on hosts.

I will, indeed, thanks for providing the reference.

> > > But in my original feedback I pointed that we TCP_REPAIR should
> > > probably only insert a TCP socket into TPC established hash table
> > > only when it is ready to process incoming packets.

I looked into your original feedback, and replied, because I couldn't
understand how adding sockets to the hash table (they are already
there) would help fixing this.

Now I understand the confusion:

> > The problem is on the source side of the migration and I think removing the socket from
> > the ehash table would trigger RSTs, closing the connection on the remote side.  
> 
> RST is triggered anyway if TCP_REPAIR did not occur yet.

It's different than the problem Kuniyuki pointed to (with the link to
that proposed hack from January): there, you have a situation where the
socket doesn't exist yet because it hasn't been restored yet.

That's livepatching, on the same host / node, a different (new) usage.

But here (regardless of whether it's CRIU or passt) we're talking about
*migration* (CRIU: container, passt: VM) to a different host.

In this case, migration will be considered done / ready only after the
sockets on the *target node* are up and running. No traffic will be
routed to the target node before that point.

As long as we don't prematurely close the sockets on the source node,
there's no issue with RSTs anywhere (and we can also roll things back
on a failed migration). That's something we fixed in userspace without
bothering the kernel at all.

> This is why a netfilter solution is generally used to make sure no
> incoming packet is received until the whole state has been repaired.

See comment from Andrei here: the _main_ reason why CRIU (only by
default, though) sets up netfilter rules seems to be that it makes
things simpler *while dumping connections* in that specific case. A
separate namespace makes it straightforward.

That's not the case for passt: it doesn't need to touch netfilter at
all, and I think it's a very useful thing in terms of complexity,
security, and keeping the downtime to a minimum (other than, as Andrei
mentioned, making things generally safer).

-- 
Stefano


  reply	other threads:[~2026-05-20  8:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-18 18:34 [PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive Stefano Brivio
2026-05-18 18:34 ` [PATCH net v2 1/2] tcp: Don't accept data when socket is in repair mode Stefano Brivio
2026-05-18 18:34 ` [PATCH net v2 2/2] selftests: Add data path tests for TCP_REPAIR mode Stefano Brivio
2026-05-20  2:03 ` [PATCH net v2 0/2] Fix race condition between TCP_REPAIR dump and data receive Jakub Kicinski
2026-05-20  4:39   ` Eric Dumazet
2026-05-20  7:24     ` Laurent Vivier
2026-05-20  7:27       ` Eric Dumazet
2026-05-20  8:09         ` Stefano Brivio [this message]
2026-05-20  8:08   ` Stefano Brivio

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260520100925.1c5e385a@elisabeth \
    --to=sbrivio@redhat.com \
    --cc=avagin@google.com \
    --cc=davem@davemloft.net \
    --cc=david@gibson.dropbear.id.au \
    --cc=dima@arista.com \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jmaloy@redhat.com \
    --cc=kuba@kernel.org \
    --cc=kuniyu@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=lvivier@redhat.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=shuah@kernel.org \
    --cc=xemul@scylladb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox