From: Eric Dumazet <eric.dumazet@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
	Marek Majkowski <marek@cloudflare.com>
Cc: kuznet@ms2.inr.ac.ru, yoshfuji@linux-ipv6.org,
	Jakub Sitnicki <jakub@cloudflare.com>,
	netdev@vger.kernel.org, kernel-team <kernel-team@cloudflare.com>
Subject: Re: IPv6 flow label reflection behave for RST packets
Date: Tue, 9 Jul 2019 15:36:56 +0200
Message-ID: <1cf380b3-843e-599a-105a-d1879852def1@gmail.com>
In-Reply-To: <8e2fca44-6fe7-42fc-8684-2cdd52c67103@gmail.com>



On 7/9/19 3:22 PM, Eric Dumazet wrote:
> 
> 
> On 7/9/19 2:33 PM, Marek Majkowski wrote:
>> Ha, thanks. I missed that.
>>
>> There is a caveat though. I don't think it's working as intended...
> 
> 
> Note that my commit really only looked at a fraction of the cases ;)
> 
> commit 323a53c41292a0d7efc8748856c623324c8d7c21
> 
>     ipv6: tcp: enable flowlabel reflection in some RST packets
>     
>     When RST packets are sent because no socket could be found,
>     it makes sense to use flowlabel_reflect sysctl to decide
>     if a reflection of the flowlabel is requested.
>     
> 
> In your case, a socket is found, most probably, and np->repflow seems to be ignored.
> 
> I'll take a look, thanks.

I guess a possible fix would be:

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d56a9019a0feb5a34312ec353c555f44b8c09b3d..2a298835317c0f6b1d82fb118dc4ba9647a2a110 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -984,8 +984,13 @@ static void tcp_v6_send_reset(const struct sock *sk, struct sk_buff *skb)
 
        if (sk) {
                oif = sk->sk_bound_dev_if;
-               if (sk_fullsock(sk))
+               if (sk_fullsock(sk)) {
+                       struct ipv6_pinfo *np = tcp_inet6_sk(sk);
+
                        trace_tcp_send_reset(sk, skb);
+                       if (np->repflow)
+                               label = ip6_flowlabel(ipv6h);
+               }
                if (sk->sk_state == TCP_TIME_WAIT)
                        label = cpu_to_be32(inet_twsk(sk)->tw_flowlabel);
        } else {

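To reason about the patch without building a kernel, the label selection
can be modelled in userspace. The Python below is only a rough sketch of
the logic in the hunk above, not the kernel code: `Sock`, `rst_flow_label`
and the field names are made up for illustration, and the value 2 for the
RST bit of flowlabel_reflect is an assumption taken from the ip-sysctl
documentation. A result of 0 means "no label chosen here"; presumably the
stack may then fall back to an automatically generated label, which would
be consistent with the differing label in the traces.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed bit in net.ipv6.flowlabel_reflect that enables reflection in RSTs
# sent when no socket is found (value taken from the ip-sysctl documentation).
FLOWLABEL_REFLECT_TCP_RESET = 2

FLOWLABEL_MASK = 0xFFFFF  # the IPv6 flow label is a 20-bit field


@dataclass
class Sock:
    """Hypothetical stand-in for the few socket fields the patch looks at."""
    full: bool = True        # models sk_fullsock(sk)
    time_wait: bool = False  # models sk->sk_state == TCP_TIME_WAIT
    repflow: bool = False    # models np->repflow
    tw_flowlabel: int = 0    # models inet_twsk(sk)->tw_flowlabel


def rst_flow_label(sk: Optional[Sock], incoming_label: int,
                   sysctl_reflect: int) -> int:
    """Return the flow label the RST would carry in this simplified model."""
    label = 0
    if sk is not None:
        # New with the patch: honour np->repflow when a full socket is found.
        if sk.full and sk.repflow:
            label = incoming_label & FLOWLABEL_MASK
        if sk.time_wait:
            label = sk.tw_flowlabel & FLOWLABEL_MASK
    elif sysctl_reflect & FLOWLABEL_REFLECT_TCP_RESET:
        # No socket found: the case handled by commit 323a53c41292.
        label = incoming_label & FLOWLABEL_MASK
    return label
```

For example, `rst_flow_label(Sock(repflow=True), 0x2f60c, 0)` reflects the
incoming label, while the same call with `repflow=False` leaves it at 0.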

> 
>> Running my script:
>>
>> $ sysctl -w net.ipv6.flowlabel_reflect=3
>>
>> $ tail reflect.py
>> cd2.close()
>> cd.send(b"a")
>>
>> $ python3 reflect.py
>> IP6 (flowlabel 0xf2927, hlim 64) ::1.1235 > ::1.60246: Flags [F.]
>> IP6 (flowlabel 0xf2927, hlim 64) ::1.60246 > ::1.1235: Flags [P.]
>> IP6 (flowlabel 0x58ecd, hlim 64) ::1.1235 > ::1.60246: Flags [R]
>>
>> Note: the RST is opportunistic. Depending on timing, I sometimes get a
>> proper FIN without an RST.
>>
>> If I change the script to introduce some delay:
>>
>> $ tail reflect.py
>> cd2.close()
>> time.sleep(0.1)
>> cd.send(b"a")
>>
>> $ python3 reflect.py
>> IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [.]
>> IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [P.]
>> IP6 (flowlabel 0x2f60c, hlim 64) ::1.1235 > ::1.60326: Flags [R]
>>
>> Now it seems to work reliably. Tested on net-next under virtme.
>>
>> Marek
>>
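The comparison done by eye on the two traces above can also be scripted.
This is a minimal sketch, assuming tcpdump prints lines in the shape quoted
in this thread; `parse` and `rst_labels_consistent` are hypothetical helper
names, not part of reflect.py.

```python
import re

# Matches the flow label and TCP flags in tcpdump lines such as:
#   IP6 (flowlabel 0x2f60c, hlim 64) ::1.1235 > ::1.60326: Flags [R]
LINE_RE = re.compile(r"flowlabel (0x[0-9a-fA-F]+).*Flags \[([^\]]*)\]")


def parse(line):
    """Return (flow_label, flags) for one tcpdump line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), m.group(2)


def rst_labels_consistent(lines):
    """True if every RST reuses a flow label seen on an earlier non-RST packet."""
    seen = set()
    for line in lines:
        parsed = parse(line)
        if parsed is None:
            continue  # skip lines that are not in the expected format
        label, flags = parsed
        if "R" in flags:
            if seen and label not in seen:
                return False
        else:
            seen.add(label)
    return True


racy = [
    "IP6 (flowlabel 0xf2927, hlim 64) ::1.1235 > ::1.60246: Flags [F.]",
    "IP6 (flowlabel 0xf2927, hlim 64) ::1.60246 > ::1.1235: Flags [P.]",
    "IP6 (flowlabel 0x58ecd, hlim 64) ::1.1235 > ::1.60246: Flags [R]",
]
delayed = [
    "IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [.]",
    "IP6 (flowlabel 0x2f60c, hlim 64) ::1.60326 > ::1.1235: Flags [P.]",
    "IP6 (flowlabel 0x2f60c, hlim 64) ::1.1235 > ::1.60326: Flags [R]",
]
```

On the two traces quoted above, `rst_labels_consistent(racy)` is False and
`rst_labels_consistent(delayed)` is True.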
>> On Tue, Jul 9, 2019 at 1:19 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>>
>>>
>>> On 7/9/19 1:10 PM, Marek Majkowski wrote:
>>>> Morning,
>>>>
>>>> I'm experimenting with flow label reflection from a server point of
>>>> view. I'm able to get it working in both supported ways:
>>>>
>>>> (a) per-socket with flow manager IPV6_FL_F_REFLECT and flowlabel_consistency=0
>>>>
>>>> (b) with global flowlabel_reflect sysctl
>>>>
>>>> However, I was surprised to see that the RST sent after the connection is
>>>> torn down doesn't have the correct flow label value:
>>>>
>>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [S]
>>>> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [S.]
>>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [.]
>>>> IP6 (flowlabel 0x3ba3d) ::1.1235 > ::1.59276: Flags [F.]
>>>> IP6 (flowlabel 0x3ba3d) ::1.59276 > ::1.1235: Flags [P.]
>>>> IP6 (flowlabel 0xdfc46) ::1.1235 > ::1.59276: Flags [R]
>>>>
>>>> Notice that the last RST packet has an inconsistent flow label. Perhaps we
>>>> can argue this behaviour might be acceptable for a per-socket
>>>> IPV6_FL_F_REFLECT option, but with global flowlabel_reflect, I would
>>>> expect the RST to preserve the reflected flow label value.
>>>>
>>>> I suspect the same behaviour is true for kernel-generated ICMPv6.
>>>>
>>>> Prepared test case:
>>>> https://gist.github.com/majek/139081b84f9b5b6187c8ccff802e3ab3
>>>>
>>>> This behaviour is not necessarily a bug, more of a surprise. Flow
>>>> label reflection is mostly useful in deployments where Linux servers
>>>> stand behind an ECMP router, which uses the flow label to compute the hash.
>>>> Flow label reflection allows an ICMP PTB message to be routed back to
>>>> the correct server.
>>>>
>>>> It's hard to imagine a situation where a generated RST or ICMP echo
>>>> response would trigger an ICMP PTB. Flow label reflection is explained
>>>> here:
>>>> https://tools.ietf.org/html/draft-wang-6man-flow-label-reflection-01
>>>> and:
>>>> https://tools.ietf.org/html/rfc7098
>>>> https://tools.ietf.org/html/rfc6438
>>>>
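The ECMP argument above can be illustrated with a toy model. This is a
sketch under stated assumptions, not how any particular router works: it
assumes the router hashes the client address, the server address and the
flow label, and SHA-256 stands in for whatever hash the hardware uses.
`ecmp_pick` is a hypothetical name.

```python
import hashlib
import ipaddress


def ecmp_pick(client: str, server: str, flow_label: int, n_servers: int) -> int:
    """Toy ECMP: hash (client, server, flow label) and pick one of n_servers.

    Real routers use their own hash functions; SHA-256 is used here only
    because it is deterministic and available in the standard library.
    """
    key = (
        ipaddress.IPv6Address(client).packed
        + ipaddress.IPv6Address(server).packed
        + (flow_label & 0xFFFFF).to_bytes(3, "big")  # 20-bit flow label
    )
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_servers
```

If the server reflects the client's label, a packet embedded in an ICMP PTB
carries the same label as the forward flow, so hashing on it selects the
same backend. With an unrelated label the pick may land on another server.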
>>>> Cheers,
>>>>     Marek
>>>>
>>>>
>>>> (Note: the unrelated "fwmark_reflect" toggle is about something
>>>> different, firewall marks, but it also covers RST and ICMP packets
>>>> generated by the server)
>>>>
>>>
>>> Please check the recent commits, scheduled for linux-5.3
>>>
>>> a346abe051bd2bd0d5d0140b2da9ec95639acad7 ipv6: icmp: allow flowlabel reflection in echo replies
>>> c67b85558ff20cb1ff20874461d12af456bee5d0 ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state
>>> 392096736a06bc9d8f2b42fd4bb1a44b245b9fed ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()
>>> 50a8accf10627b343109a9c9d5c361751bf753b0 ipv6: tcp: send consistent flowlabel in TIME_WAIT state
>>> 323a53c41292a0d7efc8748856c623324c8d7c21 ipv6: tcp: enable flowlabel reflection in some RST packets
>>>
