* [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
@ 2024-01-02 9:00 Leon Romanovsky
2024-01-02 9:46 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Leon Romanovsky @ 2024-01-02 9:00 UTC (permalink / raw)
To: David Ahern, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni
Cc: Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
From: Shachar Kagan <skagan@nvidia.com>
This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
Shachar reported that Vagrant (https://www.vagrantup.com/), which is
very popular tool to manage fleet of VMs stopped to work after commit
citied in Fixes line.
The issue appears while using Vagrant to manage nested VMs.
The steps are:
* create vagrant file
* vagrant up
* vagrant halt (VM is created but shut down)
* vagrant up - fail
Vagrant up stdout:
Bringing machine 'player1' up with 'libvirt' provider...
==> player1: Creating shared folders metadata...
==> player1: Starting domain.
==> player1: Domain launching with graphics connection settings...
==> player1: -- Graphics Port: 5900
==> player1: -- Graphics IP: 127.0.0.1
==> player1: -- Graphics Password: Not defined
==> player1: -- Graphics Websocket: 5700
==> player1: Waiting for domain to get an IP address...
==> player1: Waiting for machine to boot. This may take a few minutes...
player1: SSH address: 192.168.123.61:22
player1: SSH username: vagrant
player1: SSH auth method: private key
==> player1: Attempting graceful shutdown of VM...
==> player1: Attempting graceful shutdown of VM...
==> player1: Attempting graceful shutdown of VM...
player1: Guest communication could not be established! This is usually because
player1: SSH is not running, the authentication information was changed,
player1: or some other networking issue. Vagrant will force halt, if
player1: capable.
==> player1: Attempting direct shutdown of domain...
Fixes: 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving some ICMP")
Closes: https://lore.kernel.org/all/MN2PR12MB44863139E562A59329E89DBEB982A@MN2PR12MB4486.namprd12.prod.outlook.com
Signed-off-by: Shachar Kagan <skagan@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
---
net/ipv4/tcp_ipv4.c | 6 ------
net/ipv6/tcp_ipv6.c | 9 +++------
2 files changed, 3 insertions(+), 12 deletions(-)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 4bac6e319aca..0c50c5a32b84 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -482,7 +482,6 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
const int code = icmp_hdr(skb)->code;
struct sock *sk;
struct request_sock *fastopen;
- bool harderr = false;
u32 seq, snd_una;
int err;
struct net *net = dev_net(skb->dev);
@@ -556,7 +555,6 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
goto out;
case ICMP_PARAMETERPROB:
err = EPROTO;
- harderr = true;
break;
case ICMP_DEST_UNREACH:
if (code > NR_ICMP_UNREACH)
@@ -581,7 +579,6 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
}
err = icmp_err_convert[code].errno;
- harderr = icmp_err_convert[code].fatal;
/* check if this ICMP message allows revert of backoff.
* (see RFC 6069)
*/
@@ -607,9 +604,6 @@ int tcp_v4_err(struct sk_buff *skb, u32 info)
ip_icmp_error(sk, skb, err, th->dest, info, (u8 *)th);
- if (!harderr)
- break;
-
if (!sock_owned_by_user(sk)) {
WRITE_ONCE(sk->sk_err, err);
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d1307d77a6f0..57b25b1fc9d9 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -381,7 +381,7 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
struct tcp_sock *tp;
__u32 seq, snd_una;
struct sock *sk;
- bool harderr;
+ bool fatal;
int err;
sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo,
@@ -402,9 +402,9 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
return 0;
}
seq = ntohl(th->seq);
- harderr = icmpv6_err_convert(type, code, &err);
+ fatal = icmpv6_err_convert(type, code, &err);
if (sk->sk_state == TCP_NEW_SYN_RECV) {
- tcp_req_err(sk, seq, harderr);
+ tcp_req_err(sk, seq, fatal);
return 0;
}
@@ -489,9 +489,6 @@ static int tcp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
ipv6_icmp_error(sk, skb, err, th->dest, ntohl(info), (u8 *)th);
- if (!harderr)
- break;
-
if (!sock_owned_by_user(sk)) {
WRITE_ONCE(sk->sk_err, err);
sk_error_report(sk); /* Wake people up to see the error (see connect in sock.c) */
--
2.43.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 9:00 [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP Leon Romanovsky
@ 2024-01-02 9:46 ` Eric Dumazet
2024-01-02 9:58 ` Leon Romanovsky
2024-01-02 22:02 ` Jakub Kicinski
2024-01-08 15:52 ` Eric Dumazet
2024-01-09 3:20 ` patchwork-bot+netdevbpf
2 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2024-01-02 9:46 UTC (permalink / raw)
To: Leon Romanovsky
Cc: David Ahern, David S. Miller, Jakub Kicinski, Paolo Abeni,
Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Shachar Kagan <skagan@nvidia.com>
>
> This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
>
> Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> very popular tool to manage fleet of VMs stopped to work after commit
> citied in Fixes line.
>
> The issue appears while using Vagrant to manage nested VMs.
> The steps are:
> * create vagrant file
> * vagrant up
> * vagrant halt (VM is created but shut down)
> * vagrant up - fail
>
I would rather have an explanation, instead of reverting a valid patch.
I have been on vacation for some time. I may have missed a detailed
explanation, please repost if needed.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 9:46 ` Eric Dumazet
@ 2024-01-02 9:58 ` Leon Romanovsky
2024-01-02 10:03 ` Eric Dumazet
2024-01-02 22:02 ` Jakub Kicinski
1 sibling, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2024-01-02 9:58 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Ahern, David S. Miller, Jakub Kicinski, Paolo Abeni,
Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 02, 2024 at 10:46:13AM +0100, Eric Dumazet wrote:
> On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > From: Shachar Kagan <skagan@nvidia.com>
> >
> > This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
> >
> > Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> > very popular tool to manage fleet of VMs stopped to work after commit
> > citied in Fixes line.
> >
> > The issue appears while using Vagrant to manage nested VMs.
> > The steps are:
> > * create vagrant file
> > * vagrant up
> > * vagrant halt (VM is created but shut down)
> > * vagrant up - fail
> >
>
> I would rather have an explanation, instead of reverting a valid patch.
>
> I have been on vacation for some time. I may have missed a detailed
> explanation, please repost if needed.
Our detailed explanation that revert worked. You provided the patch that
broke, so please let's not require from users to debug it.
If you need a help to reproduce and/or test some hypothesis, Shachar
will be happy to help you, just ask.
Thanks
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 9:58 ` Leon Romanovsky
@ 2024-01-02 10:03 ` Eric Dumazet
2024-01-02 10:11 ` Eric Dumazet
2024-01-02 11:41 ` Leon Romanovsky
0 siblings, 2 replies; 15+ messages in thread
From: Eric Dumazet @ 2024-01-02 10:03 UTC (permalink / raw)
To: Leon Romanovsky
Cc: David Ahern, David S. Miller, Jakub Kicinski, Paolo Abeni,
Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 2, 2024 at 10:58 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Jan 02, 2024 at 10:46:13AM +0100, Eric Dumazet wrote:
> > On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > From: Shachar Kagan <skagan@nvidia.com>
> > >
> > > This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
> > >
> > > Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> > > very popular tool to manage fleet of VMs stopped to work after commit
> > > citied in Fixes line.
> > >
> > > The issue appears while using Vagrant to manage nested VMs.
> > > The steps are:
> > > * create vagrant file
> > > * vagrant up
> > > * vagrant halt (VM is created but shut down)
> > > * vagrant up - fail
> > >
> >
> > I would rather have an explanation, instead of reverting a valid patch.
> >
> > I have been on vacation for some time. I may have missed a detailed
> > explanation, please repost if needed.
>
> Our detailed explanation that revert worked. You provided the patch that
> broke, so please let's not require from users to debug it.
>
> If you need a help to reproduce and/or test some hypothesis, Shachar
> will be happy to help you, just ask.
I have asked already, and received files that showed no ICMP relevant
interactions.
Can someone from your team help Shachar to get a packet capture of
both TCP _and_ ICMP packets ?
Otherwise there is little I can do. I can not blindly trust someone
that a valid patch broke something, just because 'something broke'
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 10:03 ` Eric Dumazet
@ 2024-01-02 10:11 ` Eric Dumazet
2024-01-02 11:41 ` Leon Romanovsky
1 sibling, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2024-01-02 10:11 UTC (permalink / raw)
To: Leon Romanovsky
Cc: David Ahern, David S. Miller, Jakub Kicinski, Paolo Abeni,
Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 2, 2024 at 11:03 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Jan 2, 2024 at 10:58 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Jan 02, 2024 at 10:46:13AM +0100, Eric Dumazet wrote:
> > > On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > From: Shachar Kagan <skagan@nvidia.com>
> > > >
> > > > This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
> > > >
> > > > Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> > > > very popular tool to manage fleet of VMs stopped to work after commit
> > > > citied in Fixes line.
> > > >
> > > > The issue appears while using Vagrant to manage nested VMs.
> > > > The steps are:
> > > > * create vagrant file
> > > > * vagrant up
> > > > * vagrant halt (VM is created but shut down)
> > > > * vagrant up - fail
> > > >
> > >
> > > I would rather have an explanation, instead of reverting a valid patch.
> > >
> > > I have been on vacation for some time. I may have missed a detailed
> > > explanation, please repost if needed.
> >
> > Our detailed explanation that revert worked. You provided the patch that
> > broke, so please let's not require from users to debug it.
> >
> > If you need a help to reproduce and/or test some hypothesis, Shachar
> > will be happy to help you, just ask.
>
> I have asked already, and received files that showed no ICMP relevant
> interactions.
>
> Can someone from your team help Shachar to get a packet capture of
> both TCP _and_ ICMP packets ?
>
> Otherwise there is little I can do. I can not blindly trust someone
> that a valid patch broke something, just because 'something broke'
Alternative to a packet capture would be a packetdrill test.
Following test passes with the new kernel, and breaks with the old ones.
// Set up config.
`../common/defaults.sh
../common/set_sysctls.py /proc/sys/net/ipv4/tcp_syn_retries=12 \
/proc/sys/net/ipv4/tcp_syn_linear_timeouts=4
`
// Establish a connection.
0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
+0 connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
+0 > S 0:0(0) win 65535 <mss 1460,sackOK,TS val 80 ecr 0,nop,wscale 8>
+1 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 0, tcpi_backoff}%
+0 %{ assert tcpi_total_rto == 1, tcpi_total_rto}%
+1 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 0, tcpi_backoff}%
+0 %{ assert tcpi_total_rto == 2, tcpi_total_rto}%
+1 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 0, tcpi_backoff}%
+0 %{ assert tcpi_total_rto == 3, tcpi_total_rto}%
// RFC 6069 : this ICMP message is ignored while in linear timeout mode
+.5 < icmp unreachable host_unreachable [0:0(0)]
+0.5 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 0, tcpi_backoff}%
+0 %{ assert tcpi_total_rto == 4, tcpi_total_rto}%
+1 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 1, tcpi_backoff}%
+0 %{ assert tcpi_total_rto == 5, tcpi_total_rto}%
+2 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 2, tcpi_backoff}%
+0 %{ assert tcpi_total_rto == 6, tcpi_total_rto}%
+4 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 3, tcpi_backoff}%
+0 %{ assert tcpi_total_rto == 7, tcpi_total_rto}%
// RFC 6069 : this ICMP message should undo one retransmission timer backoff
+.5 < icmp unreachable host_unreachable [0:0(0)]
+0 %{ assert tcpi_backoff == 2, tcpi_backoff }%
// RFC 6069 : this ICMP message should undo one retransmission timer backoff
+.4 < icmp unreachable host_unreachable [0:0(0)]
+0 %{ assert tcpi_backoff == 1, tcpi_backoff }%
// RFC 6069 : this ICMP message should undo one retransmission timer backoff
+0 < icmp unreachable host_unreachable [0:0(0)]
+0 %{ assert tcpi_backoff == 0, tcpi_backoff }%
+.1 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 1, tcpi_backoff }%
// RFC 6069 : this ICMP message should undo one retransmission timer backoff
+.6 < icmp unreachable net_unreachable [0:0(0)]
+0 %{ assert tcpi_backoff == 0, tcpi_backoff }%
+.4 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 1, tcpi_backoff }%
//RFC 6069 : other ICMP messages should not undo one timer backoff
+0 < icmp unreachable source_route_failed [1:1(0)]
+0 < icmp unreachable net_unreachable_for_tos [1:1(0)]
+0 < icmp unreachable host_unreachable_for_tos [1:1(0)]
+0 %{ assert tcpi_backoff == 1, tcpi_backoff }%
+2 > S 0:0(0) win 65535 <...>
+0 %{ assert tcpi_backoff == 2, tcpi_backoff }%
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 10:03 ` Eric Dumazet
2024-01-02 10:11 ` Eric Dumazet
@ 2024-01-02 11:41 ` Leon Romanovsky
2024-01-02 15:31 ` Eric Dumazet
1 sibling, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2024-01-02 11:41 UTC (permalink / raw)
To: Eric Dumazet, Gal Pressman
Cc: David Ahern, David S. Miller, Jakub Kicinski, Paolo Abeni,
Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 02, 2024 at 11:03:55AM +0100, Eric Dumazet wrote:
> On Tue, Jan 2, 2024 at 10:58 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Jan 02, 2024 at 10:46:13AM +0100, Eric Dumazet wrote:
> > > On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > From: Shachar Kagan <skagan@nvidia.com>
> > > >
> > > > This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
> > > >
> > > > Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> > > > very popular tool to manage fleet of VMs stopped to work after commit
> > > > citied in Fixes line.
> > > >
> > > > The issue appears while using Vagrant to manage nested VMs.
> > > > The steps are:
> > > > * create vagrant file
> > > > * vagrant up
> > > > * vagrant halt (VM is created but shut down)
> > > > * vagrant up - fail
> > > >
> > >
> > > I would rather have an explanation, instead of reverting a valid patch.
> > >
> > > I have been on vacation for some time. I may have missed a detailed
> > > explanation, please repost if needed.
> >
> > Our detailed explanation that revert worked. You provided the patch that
> > broke, so please let's not require from users to debug it.
> >
> > If you need a help to reproduce and/or test some hypothesis, Shachar
> > will be happy to help you, just ask.
>
> I have asked already, and received files that showed no ICMP relevant
> interactions.
>
> Can someone from your team help Shachar to get a packet capture of
> both TCP _and_ ICMP packets ?
I or Gal will help her, but for now let's revert it, before we will see
this breakage in merge window and later in all other branches which will
be based on -rc1.
>
> Otherwise there is little I can do. I can not blindly trust someone
> that a valid patch broke something, just because 'something broke'
We use standard Vagrant, you can try to reproduce the issue locally.
Thanks
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 11:41 ` Leon Romanovsky
@ 2024-01-02 15:31 ` Eric Dumazet
2024-01-02 18:01 ` Leon Romanovsky
0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2024-01-02 15:31 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Gal Pressman, David Ahern, David S. Miller, Jakub Kicinski,
Paolo Abeni, Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 2, 2024 at 12:41 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Jan 02, 2024 at 11:03:55AM +0100, Eric Dumazet wrote:
> > On Tue, Jan 2, 2024 at 10:58 AM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Tue, Jan 02, 2024 at 10:46:13AM +0100, Eric Dumazet wrote:
> > > > On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > > >
> > > > > From: Shachar Kagan <skagan@nvidia.com>
> > > > >
> > > > > This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
> > > > >
> > > > > Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> > > > > very popular tool to manage fleet of VMs stopped to work after commit
> > > > > citied in Fixes line.
> > > > >
> > > > > The issue appears while using Vagrant to manage nested VMs.
> > > > > The steps are:
> > > > > * create vagrant file
> > > > > * vagrant up
> > > > > * vagrant halt (VM is created but shut down)
> > > > > * vagrant up - fail
> > > > >
> > > >
> > > > I would rather have an explanation, instead of reverting a valid patch.
> > > >
> > > > I have been on vacation for some time. I may have missed a detailed
> > > > explanation, please repost if needed.
> > >
> > > Our detailed explanation that revert worked. You provided the patch that
> > > broke, so please let's not require from users to debug it.
> > >
> > > If you need a help to reproduce and/or test some hypothesis, Shachar
> > > will be happy to help you, just ask.
> >
> > I have asked already, and received files that showed no ICMP relevant
> > interactions.
> >
> > Can someone from your team help Shachar to get a packet capture of
> > both TCP _and_ ICMP packets ?
>
> I or Gal will help her, but for now let's revert it, before we will see
> this breakage in merge window and later in all other branches which will
> be based on -rc1.
Patch is in net-next, we have at least four weeks to find the root cause.
I am a TCP maintainer, I will ask you to respect my choice, we have
tests and reverting the patch is breaking one of them.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 15:31 ` Eric Dumazet
@ 2024-01-02 18:01 ` Leon Romanovsky
2024-01-02 18:33 ` Eric Dumazet
0 siblings, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2024-01-02 18:01 UTC (permalink / raw)
To: Eric Dumazet
Cc: Gal Pressman, David Ahern, David S. Miller, Jakub Kicinski,
Paolo Abeni, Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 02, 2024 at 04:31:15PM +0100, Eric Dumazet wrote:
> On Tue, Jan 2, 2024 at 12:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Jan 02, 2024 at 11:03:55AM +0100, Eric Dumazet wrote:
> > > On Tue, Jan 2, 2024 at 10:58 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Tue, Jan 02, 2024 at 10:46:13AM +0100, Eric Dumazet wrote:
> > > > > On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > From: Shachar Kagan <skagan@nvidia.com>
> > > > > >
> > > > > > This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
> > > > > >
> > > > > > Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> > > > > > very popular tool to manage fleet of VMs stopped to work after commit
> > > > > > citied in Fixes line.
> > > > > >
> > > > > > The issue appears while using Vagrant to manage nested VMs.
> > > > > > The steps are:
> > > > > > * create vagrant file
> > > > > > * vagrant up
> > > > > > * vagrant halt (VM is created but shut down)
> > > > > > * vagrant up - fail
> > > > > >
> > > > >
> > > > > I would rather have an explanation, instead of reverting a valid patch.
> > > > >
> > > > > I have been on vacation for some time. I may have missed a detailed
> > > > > explanation, please repost if needed.
> > > >
> > > > Our detailed explanation that revert worked. You provided the patch that
> > > > broke, so please let's not require from users to debug it.
> > > >
> > > > If you need a help to reproduce and/or test some hypothesis, Shachar
> > > > will be happy to help you, just ask.
> > >
> > > I have asked already, and received files that showed no ICMP relevant
> > > interactions.
> > >
> > > Can someone from your team help Shachar to get a packet capture of
> > > both TCP _and_ ICMP packets ?
> >
> > I or Gal will help her, but for now let's revert it, before we will see
> > this breakage in merge window and later in all other branches which will
> > be based on -rc1.
>
> Patch is in net-next, we have at least four weeks to find the root cause.
I saw more than once claims that netdev is fast to take patches but also
fast in reverts. There is no need to keep patch with known regression,
while we are in -rc8.
>
> I am a TCP maintainer, I will ask you to respect my choice, we have
> tests and reverting the patch is breaking one of them.
At least for ipv6, you changed code from 2016 and the patch which I'm asking
to revert is not even marked as a fix. So I don't understand the urgency to keep
the patch.
There are two things to consider:
1. Linux rule number one is "do not break userspace".
2. Linux is a community project and people can have different opinions,
which can be different from your/mine.
Thanks
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 18:01 ` Leon Romanovsky
@ 2024-01-02 18:33 ` Eric Dumazet
2024-01-02 19:13 ` Leon Romanovsky
0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2024-01-02 18:33 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Gal Pressman, David Ahern, David S. Miller, Jakub Kicinski,
Paolo Abeni, Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 2, 2024 at 7:01 PM Leon Romanovsky <leon@kernel.org> wrote:
>
> On Tue, Jan 02, 2024 at 04:31:15PM +0100, Eric Dumazet wrote:
> > On Tue, Jan 2, 2024 at 12:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > On Tue, Jan 02, 2024 at 11:03:55AM +0100, Eric Dumazet wrote:
> > > > On Tue, Jan 2, 2024 at 10:58 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > > >
> > > > > On Tue, Jan 02, 2024 at 10:46:13AM +0100, Eric Dumazet wrote:
> > > > > > On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > > >
> > > > > > > From: Shachar Kagan <skagan@nvidia.com>
> > > > > > >
> > > > > > > This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
> > > > > > >
> > > > > > > Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> > > > > > > very popular tool to manage fleet of VMs stopped to work after commit
> > > > > > > citied in Fixes line.
> > > > > > >
> > > > > > > The issue appears while using Vagrant to manage nested VMs.
> > > > > > > The steps are:
> > > > > > > * create vagrant file
> > > > > > > * vagrant up
> > > > > > > * vagrant halt (VM is created but shut down)
> > > > > > > * vagrant up - fail
> > > > > > >
> > > > > >
> > > > > > I would rather have an explanation, instead of reverting a valid patch.
> > > > > >
> > > > > > I have been on vacation for some time. I may have missed a detailed
> > > > > > explanation, please repost if needed.
> > > > >
> > > > > Our detailed explanation that revert worked. You provided the patch that
> > > > > broke, so please let's not require from users to debug it.
> > > > >
> > > > > If you need a help to reproduce and/or test some hypothesis, Shachar
> > > > > will be happy to help you, just ask.
> > > >
> > > > I have asked already, and received files that showed no ICMP relevant
> > > > interactions.
> > > >
> > > > Can someone from your team help Shachar to get a packet capture of
> > > > both TCP _and_ ICMP packets ?
> > >
> > > I or Gal will help her, but for now let's revert it, before we will see
> > > this breakage in merge window and later in all other branches which will
> > > be based on -rc1.
> >
> > Patch is in net-next, we have at least four weeks to find the root cause.
>
> I saw more than once claims that netdev is fast to take patches but also
> fast in reverts. There is no need to keep patch with known regression,
> while we are in -rc8.
This patch is not in rc8, unless I am mistaken ?
>
> >
> > I am a TCP maintainer, I will ask you to respect my choice, we have
> > tests and reverting the patch is breaking one of them.
>
> At least for ipv6, you changed code from 2016 and the patch which I'm asking
> to revert is not even marked as a fix. So I don't understand the urgency to keep
> the patch.
Do you have an issue with IPv4 code or IPv6 ?
It would help to have details.
>
> There are two things to consider:
> 1. Linux rule number one is "do not break userspace".
No released kernel contains the issue yet. Nothing broke yet.
net-next is for developers.
> 2. Linux is a community project and people can have different opinions,
> which can be different from your/mine.
>
> Thanks
I think we have time, and getting this patch with potential users on
it will help to debug the issue.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 18:33 ` Eric Dumazet
@ 2024-01-02 19:13 ` Leon Romanovsky
0 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2024-01-02 19:13 UTC (permalink / raw)
To: Eric Dumazet
Cc: Gal Pressman, David Ahern, David S. Miller, Jakub Kicinski,
Paolo Abeni, Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 02, 2024 at 07:33:23PM +0100, Eric Dumazet wrote:
> On Tue, Jan 2, 2024 at 7:01 PM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > On Tue, Jan 02, 2024 at 04:31:15PM +0100, Eric Dumazet wrote:
> > > On Tue, Jan 2, 2024 at 12:41 PM Leon Romanovsky <leon@kernel.org> wrote:
> > > >
> > > > On Tue, Jan 02, 2024 at 11:03:55AM +0100, Eric Dumazet wrote:
> > > > > On Tue, Jan 2, 2024 at 10:58 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > >
> > > > > > On Tue, Jan 02, 2024 at 10:46:13AM +0100, Eric Dumazet wrote:
> > > > > > > On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> > > > > > > >
> > > > > > > > From: Shachar Kagan <skagan@nvidia.com>
> > > > > > > >
> > > > > > > > This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
> > > > > > > >
> > > > > > > > Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> > > > > > > > very popular tool to manage fleet of VMs stopped to work after commit
> > > > > > > > citied in Fixes line.
> > > > > > > >
> > > > > > > > The issue appears while using Vagrant to manage nested VMs.
> > > > > > > > The steps are:
> > > > > > > > * create vagrant file
> > > > > > > > * vagrant up
> > > > > > > > * vagrant halt (VM is created but shut down)
> > > > > > > > * vagrant up - fail
> > > > > > > >
> > > > > > >
> > > > > > > I would rather have an explanation, instead of reverting a valid patch.
> > > > > > >
> > > > > > > I have been on vacation for some time. I may have missed a detailed
> > > > > > > explanation, please repost if needed.
> > > > > >
> > > > > > Our detailed explanation that revert worked. You provided the patch that
> > > > > > broke, so please let's not require from users to debug it.
> > > > > >
> > > > > > If you need a help to reproduce and/or test some hypothesis, Shachar
> > > > > > will be happy to help you, just ask.
> > > > >
> > > > > I have asked already, and received files that showed no ICMP relevant
> > > > > interactions.
> > > > >
> > > > > Can someone from your team help Shachar to get a packet capture of
> > > > > both TCP _and_ ICMP packets ?
> > > >
> > > > I or Gal will help her, but for now let's revert it, before we will see
> > > > this breakage in merge window and later in all other branches which will
> > > > be based on -rc1.
> > >
> > > Patch is in net-next, we have at least four weeks to find the root cause.
> >
> > I saw more than once claims that netdev is fast to take patches but also
> > fast in reverts. There is no need to keep patch with known regression,
> > while we are in -rc8.
>
> This patch is not in rc8, unless I am mistaken ?
Patch is not, but the timeline is. Right now, we are in -rc8 and in next
week, the merge window will start.
>
> >
> > >
> > > I am a TCP maintainer, I will ask you to respect my choice, we have
> > > tests and reverting the patch is breaking one of them.
> >
> > At least for ipv6, you changed code from 2016 and the patch which I'm asking
> > to revert is not even marked as a fix. So I don't understand the urgency to keep
> > the patch.
>
> Do you have an issue with IPv4 code or IPv6 ?
I think that IPv4, I looked on IPv6 dates just because it was easy to
do. It looks like IPv4 code which you are changing existed in its
current form even before git era.
>
> It would help to have details.
We prepared raw PCAP files and I'm uploading them. It takes time.
> >
> > There are two things to consider:
> > 1. Linux rule number one is "do not break userspace".
>
> No released kernel contains the issue yet. Nothing broke yet.
After merge window, it will and I want to avoid it.
>
> net-next is for developers.
It is encouraged to test net-next, so we did it and reported.
>
> > 2. Linux is a community project and people can have different opinions,
> > which can be different from your/mine.
> >
> > Thanks
>
> I think we have time, and getting this patch with potential users on
> it will help to debug the issue.
Like I said, Shachar can test any debug patches, just ask her.
Thanks
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 9:46 ` Eric Dumazet
2024-01-02 9:58 ` Leon Romanovsky
@ 2024-01-02 22:02 ` Jakub Kicinski
2024-01-03 6:00 ` Leon Romanovsky
1 sibling, 1 reply; 15+ messages in thread
From: Jakub Kicinski @ 2024-01-02 22:02 UTC (permalink / raw)
To: Leon Romanovsky
Cc: Eric Dumazet, David Ahern, David S. Miller, Paolo Abeni,
Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, 2 Jan 2024 10:46:13 +0100 Eric Dumazet wrote:
> > The issue appears while using Vagrant to manage nested VMs.
> > The steps are:
> > * create vagrant file
> > * vagrant up
> > * vagrant halt (VM is created but shut down)
> > * vagrant up - fail
>
> I would rather have an explanation, instead of reverting a valid patch.
+1 obviously. Your refusal to debug this any further does not put
nVidia's TCP / NVMe offload in a good light. On one hand you
claim to have TCP experts in house and are pushing TCP offloads and
on the other you can't debug a TCP issue for which you reportedly
have an easy repro? Does not add up.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 22:02 ` Jakub Kicinski
@ 2024-01-03 6:00 ` Leon Romanovsky
0 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2024-01-03 6:00 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Eric Dumazet, David Ahern, David S. Miller, Paolo Abeni,
Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis)
On Tue, Jan 02, 2024 at 02:02:32PM -0800, Jakub Kicinski wrote:
> On Tue, 2 Jan 2024 10:46:13 +0100 Eric Dumazet wrote:
> > > The issue appears while using Vagrant to manage nested VMs.
> > > The steps are:
> > > * create vagrant file
> > > * vagrant up
> > > * vagrant halt (VM is created but shut down)
> > > * vagrant up - fail
> >
> > I would rather have an explanation, instead of reverting a valid patch.
>
> +1 obviously. Your refusal to debug this any further does not put
> nVidia's TCP / NVMe offload in a good light. On one hand you
> claim to have TCP experts in house and are pushing TCP offloads and
> on the other you can't debug a TCP issue for which you reportedly
> have an easy repro? Does not add up.
Did I claim about TCP experts? No.
Did we cause to Vagrant to stop working? No.
Did we write problematic patch? No.
So let's not ask from the people who by chance tested the code to debug it.
Thanks
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 9:00 [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP Leon Romanovsky
2024-01-02 9:46 ` Eric Dumazet
@ 2024-01-08 15:52 ` Eric Dumazet
2024-01-09 13:26 ` Leon Romanovsky
2024-01-09 3:20 ` patchwork-bot+netdevbpf
2 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2024-01-08 15:52 UTC (permalink / raw)
To: Leon Romanovsky
Cc: David Ahern, David S. Miller, Jakub Kicinski, Paolo Abeni,
Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis), Neal Cardwell
On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
>
> From: Shachar Kagan <skagan@nvidia.com>
>
> This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
>
> Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> very popular tool to manage fleet of VMs stopped to work after commit
> citied in Fixes line.
>
> The issue appears while using Vagrant to manage nested VMs.
> The steps are:
> * create vagrant file
> * vagrant up
> * vagrant halt (VM is created but shut down)
> * vagrant up - fail
>
> Vagrant up stdout:
> Bringing machine 'player1' up with 'libvirt' provider...
> ==> player1: Creating shared folders metadata...
> ==> player1: Starting domain.
> ==> player1: Domain launching with graphics connection settings...
> ==> player1: -- Graphics Port: 5900
> ==> player1: -- Graphics IP: 127.0.0.1
> ==> player1: -- Graphics Password: Not defined
> ==> player1: -- Graphics Websocket: 5700
> ==> player1: Waiting for domain to get an IP address...
> ==> player1: Waiting for machine to boot. This may take a few minutes...
> player1: SSH address: 192.168.123.61:22
> player1: SSH username: vagrant
> player1: SSH auth method: private key
> ==> player1: Attempting graceful shutdown of VM...
> ==> player1: Attempting graceful shutdown of VM...
> ==> player1: Attempting graceful shutdown of VM...
> player1: Guest communication could not be established! This is usually because
> player1: SSH is not running, the authentication information was changed,
> player1: or some other networking issue. Vagrant will force halt, if
> player1: capable.
> ==> player1: Attempting direct shutdown of domain...
>
> Fixes: 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving some ICMP")
> Closes: https://lore.kernel.org/all/MN2PR12MB44863139E562A59329E89DBEB982A@MN2PR12MB4486.namprd12.prod.outlook.com
> Signed-off-by: Shachar Kagan <skagan@nvidia.com>
> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> ---
While IPv6 code has an issue (not calling tcp_ld_RTO_revert() helper
for TCP_SYN_SENT as intended,
I could not find the root cause for your case.
We will submit the patch again for 6.9, once we get to the root cause.
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-08 15:52 ` Eric Dumazet
@ 2024-01-09 13:26 ` Leon Romanovsky
0 siblings, 0 replies; 15+ messages in thread
From: Leon Romanovsky @ 2024-01-09 13:26 UTC (permalink / raw)
To: Eric Dumazet
Cc: David Ahern, David S. Miller, Jakub Kicinski, Paolo Abeni,
Shachar Kagan, netdev, Bagas Sanjaya,
Linux regression tracking (Thorsten Leemhuis), Neal Cardwell
On Mon, Jan 08, 2024 at 04:52:39PM +0100, Eric Dumazet wrote:
> On Tue, Jan 2, 2024 at 10:01 AM Leon Romanovsky <leon@kernel.org> wrote:
> >
> > From: Shachar Kagan <skagan@nvidia.com>
> >
> > This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
> >
> > Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> > very popular tool to manage fleet of VMs stopped to work after commit
> > citied in Fixes line.
> >
> > The issue appears while using Vagrant to manage nested VMs.
> > The steps are:
> > * create vagrant file
> > * vagrant up
> > * vagrant halt (VM is created but shut down)
> > * vagrant up - fail
> >
> > Vagrant up stdout:
> > Bringing machine 'player1' up with 'libvirt' provider...
> > ==> player1: Creating shared folders metadata...
> > ==> player1: Starting domain.
> > ==> player1: Domain launching with graphics connection settings...
> > ==> player1: -- Graphics Port: 5900
> > ==> player1: -- Graphics IP: 127.0.0.1
> > ==> player1: -- Graphics Password: Not defined
> > ==> player1: -- Graphics Websocket: 5700
> > ==> player1: Waiting for domain to get an IP address...
> > ==> player1: Waiting for machine to boot. This may take a few minutes...
> > player1: SSH address: 192.168.123.61:22
> > player1: SSH username: vagrant
> > player1: SSH auth method: private key
> > ==> player1: Attempting graceful shutdown of VM...
> > ==> player1: Attempting graceful shutdown of VM...
> > ==> player1: Attempting graceful shutdown of VM...
> > player1: Guest communication could not be established! This is usually because
> > player1: SSH is not running, the authentication information was changed,
> > player1: or some other networking issue. Vagrant will force halt, if
> > player1: capable.
> > ==> player1: Attempting direct shutdown of domain...
> >
> > Fixes: 0a8de364ff7a ("tcp: no longer abort SYN_SENT when receiving some ICMP")
> > Closes: https://lore.kernel.org/all/MN2PR12MB44863139E562A59329E89DBEB982A@MN2PR12MB4486.namprd12.prod.outlook.com
> > Signed-off-by: Shachar Kagan <skagan@nvidia.com>
> > Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
> > ---
>
> While IPv6 code has an issue (not calling tcp_ld_RTO_revert() helper
> for TCP_SYN_SENT as intended,
> I could not find the root cause for your case.
>
> We will submit the patch again for 6.9, once we get to the root cause.
Feel free to send to Shachar with CC me and/or Gal any patches which you
would like to test in advance and we will be happy to do it.
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
Thanks
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
2024-01-02 9:00 [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP Leon Romanovsky
2024-01-02 9:46 ` Eric Dumazet
2024-01-08 15:52 ` Eric Dumazet
@ 2024-01-09 3:20 ` patchwork-bot+netdevbpf
2 siblings, 0 replies; 15+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-01-09 3:20 UTC (permalink / raw)
To: Leon Romanovsky
Cc: dsahern, davem, edumazet, kuba, pabeni, skagan, netdev,
bagasdotme, regressions
Hello:
This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:
On Tue, 2 Jan 2024 11:00:57 +0200 you wrote:
> From: Shachar Kagan <skagan@nvidia.com>
>
> This reverts commit 0a8de364ff7a14558e9676f424283148110384d6.
>
> Shachar reported that Vagrant (https://www.vagrantup.com/), which is
> very popular tool to manage fleet of VMs stopped to work after commit
> citied in Fixes line.
>
> [...]
Here is the summary with links:
- [net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP
https://git.kernel.org/netdev/net-next/c/b59db45d7eba
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-01-09 13:26 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-02 9:00 [PATCH net-next] tcp: Revert no longer abort SYN_SENT when receiving some ICMP Leon Romanovsky
2024-01-02 9:46 ` Eric Dumazet
2024-01-02 9:58 ` Leon Romanovsky
2024-01-02 10:03 ` Eric Dumazet
2024-01-02 10:11 ` Eric Dumazet
2024-01-02 11:41 ` Leon Romanovsky
2024-01-02 15:31 ` Eric Dumazet
2024-01-02 18:01 ` Leon Romanovsky
2024-01-02 18:33 ` Eric Dumazet
2024-01-02 19:13 ` Leon Romanovsky
2024-01-02 22:02 ` Jakub Kicinski
2024-01-03 6:00 ` Leon Romanovsky
2024-01-08 15:52 ` Eric Dumazet
2024-01-09 13:26 ` Leon Romanovsky
2024-01-09 3:20 ` patchwork-bot+netdevbpf
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).