netdev.vger.kernel.org archive mirror
From: Eric Dumazet <eric.dumazet@gmail.com>
To: "Bartschies, Thomas" <Thomas.Bartschies@cvk.de>,
	'David Ahern' <dsahern@gmail.com>,
	"'netdev@vger.kernel.org'" <netdev@vger.kernel.org>
Subject: Re: AW: big ICMP requests get disrupted on IPSec tunnel activation
Date: Mon, 14 Oct 2019 08:31:32 -0700
Message-ID: <e306f521-6d1e-837f-6cc6-8badb56e014c@gmail.com>
In-Reply-To: <EB8510AA7A943D43916A72C9B8F4181F62A0774F@cvk038.intra.cvk.de>



On 10/14/19 7:02 AM, Bartschies, Thomas wrote:
> Hello,
> 
> It took a while to build a test system for bisecting the issue. I have finally identified the patch that causes my problems.
> BTW, the fq packet scheduler is in use.
> 
> It's 
> [PATCH net-next] tcp/fq: move back to CLOCK_MONOTONIC
> 
> In the recent TCP/EDT patch series, I switched TCP and sch_fq clocks from MONOTONIC to TAI, in order to meet the choice done
> earlier for sch_etf packet scheduler.
> 
> But sure enough, this broke some setups where the TAI clock jumps forward (by almost 50 years...), as reported by Leonard Crestez.
> 
> If we want to converge later, we'll probably need to add an skb field to differentiate the clock bases, or a socket option.
> 
> In the meantime, a UDP application will need to use CLOCK_MONOTONIC base for its SCM_TXTIME timestamps if using fq
> packet scheduler.
> 
> Fixes: 72b0094f9182 ("tcp: switch tcp_clock_ns() to CLOCK_TAI base")
> Fixes: 142537e41923 ("net_sched: sch_fq: switch to CLOCK_TAI")
> Fixes: fd2bca2aa789 ("tcp: switch internal pacing timer to CLOCK_TAI")
> Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> Reported-by: Leonard Crestez <leonard.crestez@xxxxxxx>
> 
> ----
> 
> After reverting it in a current 5.2.18 kernel, the problem disappears. There were some follow-up fixes for other issues caused by this
> patch. They fixed other, similar issues, but not mine. I've already tried setting the tstamp to zero in xfrm4_output.c, but with no
> luck so far. I'm pretty sure that reverting the clock patch isn't the proper solution for upstream. So in what other way can this
> be fixed?
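
For reference, the quoted commit message says a UDP application using fq must put its SCM_TXTIME timestamps on the CLOCK_MONOTONIC base. A minimal userspace sketch of that, under stated assumptions: the helper names `monotonic_ns_after` and `attach_txtime` are hypothetical, the fallback value 61 for SO_TXTIME is the asm-generic one and may differ on some architectures, and the socket is assumed to have SO_TXTIME already enabled via setsockopt() with a CLOCK_MONOTONIC clockid (not shown here).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <sys/socket.h>

/* Fallback for older libc headers; 61 is the asm-generic value (assumption). */
#ifndef SCM_TXTIME
#ifndef SO_TXTIME
#define SO_TXTIME 61
#endif
#define SCM_TXTIME SO_TXTIME
#endif

/* Hypothetical helper: a transmit time delay_ns from now, on the
 * CLOCK_MONOTONIC base that fq expects (not CLOCK_TAI or CLOCK_REALTIME). */
static uint64_t monotonic_ns_after(uint64_t delay_ns)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec + delay_ns;
}

/* Hypothetical helper: attach one SCM_TXTIME control message carrying
 * txtime_ns to msg. The caller supplies a suitably aligned control buffer
 * of at least CMSG_SPACE(sizeof(uint64_t)) bytes.
 * Returns 0 on success, -1 if the buffer is too small. */
static int attach_txtime(struct msghdr *msg, void *control,
			 size_t control_len, uint64_t txtime_ns)
{
	struct cmsghdr *cmsg;

	if (control_len < CMSG_SPACE(sizeof(txtime_ns)))
		return -1;

	memset(control, 0, CMSG_SPACE(sizeof(txtime_ns)));
	msg->msg_control = control;
	msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_TXTIME;
	cmsg->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cmsg), &txtime_ns, sizeof(txtime_ns));
	return 0;
}
```

A caller would fill in the usual msg_name and msg_iov fields, call `attach_txtime(&msg, buf, sizeof(buf), monotonic_ns_after(1000000))`, and then pass the message to sendmsg().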


Thanks a lot for this report, Thomas!

I guess you could add a debug check in fq to let us know the call graph.

Something like the following:

diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 98dd87ce15108cfe1c011da44ba32f97763776c8..2aa41a39e81b94f3b7092dc51b91829f5929634d 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -380,9 +380,14 @@ static void flow_queue_add(struct fq_flow *flow, struct sk_buff *skb)
 {
        struct rb_node **p, *parent;
        struct sk_buff *head, *aux;
+       s64 delay;
 
        fq_skb_cb(skb)->time_to_send = skb->tstamp ?: ktime_get_ns();
 
+       /* We should really add a TCA_FQ_MAX_HORIZON  at some point :( */
+       delay = fq_skb_cb(skb)->time_to_send - ktime_get_ns();
+       WARN_ON_ONCE(delay > 60 * NSEC_PER_SEC);
+
        head = flow->head;
        if (!head ||
            fq_skb_cb(skb)->time_to_send >= fq_skb_cb(flow->tail)->time_to_send) {
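
As a rough userspace illustration of what this debug check would catch (a sketch only, not kernel code): a timestamp taken on the wall-clock base, read as if it were CLOCK_MONOTONIC, looks like a send time decades in the future, while a cleared (zero) timestamp falls back to "now". The helper names `clock_ns`, `model_time_to_send` and `model_beyond_horizon` are made up for this example; the 60 second threshold is the one from the diff above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL
#define HORIZON_NS   (60 * NSEC_PER_SEC)	/* same threshold as the WARN_ON_ONCE() */

/* Current time in nanoseconds on the given clock. */
static int64_t clock_ns(clockid_t id)
{
	struct timespec ts;
	int rc = clock_gettime(id, &ts);

	assert(rc == 0);
	(void)rc;
	return (int64_t)ts.tv_sec * NSEC_PER_SEC + ts.tv_nsec;
}

/* Models `skb->tstamp ?: ktime_get_ns()`: a zero tstamp means "send now". */
static int64_t model_time_to_send(int64_t tstamp)
{
	return tstamp ? tstamp : clock_ns(CLOCK_MONOTONIC);
}

/* Returns 1 if the packet would be scheduled more than 60 s ahead. */
static int model_beyond_horizon(int64_t tstamp)
{
	int64_t delay = model_time_to_send(tstamp) - clock_ns(CLOCK_MONOTONIC);

	return delay > HORIZON_NS;
}
```

On a machine with a sane wall clock, `model_beyond_horizon(clock_ns(CLOCK_REALTIME))` reports a violation, whereas `model_beyond_horizon(0)` does not. That is the effect the "clear skb->tstamp in forwarding paths" fixes quoted below rely on.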


> 
> ---
> [PATCH net] net: clear skb->tstamp in bridge forwarding path
> Matteo reported forwarding issues inside the linux bridge, if the enslaved interfaces use the fq qdisc.
> 
> Similar to commit 8203e2d844d3 ("net: clear skb->tstamp in forwarding paths"), we need to clear the tstamp field in
> the bridge forwarding path.
> 
> Fixes: 80b14dee2bea ("net: Add a new socket option for a future transmit time.")
> Fixes: fb420d5d91c1 ("tcp/fq: move back to CLOCK_MONOTONIC")
> Reported-and-tested-by: Matteo Croce <mcroce@redhat.com>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> 
> and
> 
> net: clear skb->tstamp in forwarding paths
> 
> Sergey reported that forwarding was no longer working if fq packet scheduler was used.
> 
> This is caused by the recent switch to EDT model, since incoming packets might have been timestamped by __net_timestamp()
> 
> __net_timestamp() uses ktime_get_real(), while fq expects packets using CLOCK_MONOTONIC base.
> 
> The fix is to clear skb->tstamp in forwarding paths.
> 
> Fixes: 80b14dee ("net: Add a new socket option for a future transmit time.")
> Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Sergey Matyukevich <geomatsi@gmail.com>
> Tested-by: Sergey Matyukevich <geomatsi@gmail.com>
> Signed-off-by: David S. Miller <davem@davemloft.net>
> 
> Best regards,
> --
> Thomas Bartschies
> CVK IT Systeme
> 
> 
> -----Original Message-----
> From: Bartschies, Thomas 
> Sent: Tuesday, 17 September 2019 09:28
> To: 'David Ahern' <dsahern@gmail.com>; 'netdev@vger.kernel.org' <netdev@vger.kernel.org>
> Subject: AW: big ICMP requests get disrupted on IPSec tunnel activation
> 
> Hello,
> 
> Thanks for the suggestion. Running pmtu.sh with kernel versions 4.19, 4.20 and even 5.2.13 made no difference. All tests were successful every time.
> 
> My external ping tests are still failing with the newer kernels, though. I ran the script after triggering my problem, to make sure any possible side effects were in play.
> 
> Please keep in mind that even while the ICMP requests are stalling, other connections, e.g. ssh or tracepath, still go through. I would expect all connection types to be affected if this were an MTU problem. Am I wrong?
> 
> Any suggestions for more tests to isolate the cause?
> 
> Best regards,
> --
> Thomas Bartschies
> CVK IT Systeme
> 
> -----Original Message-----
> From: David Ahern [mailto:dsahern@gmail.com]
> Sent: Friday, 13 September 2019 19:13
> To: Bartschies, Thomas <Thomas.Bartschies@cvk.de>; 'netdev@vger.kernel.org' <netdev@vger.kernel.org>
> Subject: Re: big ICMP requests get disrupted on IPSec tunnel activation
> 
> On 9/13/19 9:59 AM, Bartschies, Thomas wrote:
>> Hello all,
>>
>> since kernel 4.20 we're observing strange behaviour when sending big ICMP packets, for example with a packet size of 3000 bytes.
>> The packets should be forwarded by a Linux gateway (firewall) that has multiple interfaces and also acts as a VPN gateway.
>>
>> Test steps:
>> 1. Disable all iptables rules.
>> 2. Enable the VPN IPsec policies.
>> 3. Start a ping with a large packet size (e.g. 3000 bytes) from a client in the DMZ, passing through the gateway and targeting another LAN machine.
>> 4. Ping works.
>> 5. Activate a VPN policy by sending pings from the gateway to a tunnel target. The system tries to create the tunnel.
>> 6. The ping from step 3 immediately stalls. No error messages; it just stops.
>> 7. Stop the ping from step 3 and start another without the packet size parameter. It stalls as well.
>>
>> Result:
>> Connections from the client to other services on the LAN machine still work. Tracepath works. Only ICMP requests no longer pass the gateway. tcpdump sees them on the incoming interface, but not on the outgoing LAN interface. ICMP requests to any other target IP address in the LAN still work, until one uses a bigger packet size. Then these alternative connections stall as well.
>>
>> Flushing the policy table has no effect. Flushing the conntrack table has no effect. Setting rp_filter to loose (2) has no effect.
>> Flushing the route cache has no effect.
>>
>> Only a reboot of the gateway restores normal behavior.
>>
>> What can be the cause? Is this a networking bug?
>>
> 
> Some of these will most likely fail for other reasons, but can you run 'tools/testing/selftests/net/pmtu.sh'[1] on 4.19 and then 4.20 and compare the results? Hopefully it will shed some light on the problem and can be used to bisect to the commit that caused the regression.
> 
> 
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/testing/selftests/net/pmtu.sh
> 
