* kTLS broken somewhere between 4.18 and 5.0
From: Steinar H. Gunderson @ 2019-04-13 15:34 UTC
  To: netdev

Hi,

I've been using kTLS for a while, with my video reflector Cubemap
(https://git.sesse.net/?p=cubemap). After I upgraded my server from
4.18.11 to 5.0.6, seemingly I've started seeing corruption. The data sent
with send() (HTTP headers, HLS playlists) appears to be fine, but sendfile()
(actual video data, from a file on tmpfs) is not; after ~20 kB of data
(19626 in one test here), the data appears to be randomly corrupted. Diffing
non-TLS (good) and TLS (bad) video data:
  00004c70: fa 70 c5 71 b5 f5 b7 ac 74 b0 ca 80 02 4c 06 3f  .p.q....t....L.?
  00004c80: 5c 5b 0c b3 e0 a0 c3 21 93 d3 6e 65 36 70 0a 27  \[.....!..ne6p.'
  00004c90: 84 67 16 2c 95 c0 55 e1 04 76 52 10 50 5d 00 26  .g.,..U..vR.P].&
 -00004ca0: 0c b8 84 70 7e ed 12 8f 5e 7e 18 c0 06 20 02 54  ...p~...^~... .T
 +00004ca0: 0c b8 84 70 7e ed 12 8f 5e 7e 0a 60 9f 1f 97 f2  ...p~...^~.`....
 -00004cb0: 1e 4c c1 71 7d 0b 91 28 23 98 09 ae c4 95 ae 7f  .L.q}..(#.......
 +00004cb0: 6e 17 50 03 67 fa 2f 83 b0 88 eb fc 54 f2 0b 00  n.P.g./.....T...
 -00004cc0: a2 92 20 b8 f2 b6 72 2a e8 7e d7 27 99 65 56 70  .. ...r*.~.'.eVp
 +00004cc0: 6c 9e a1 02 b4 30 11 25 d7 58 b0 0c c0 6c e1 bd  l....0.%.X...l..

It never appears to get back into sync after that. Interestingly, it is
_consistently_ wrong; if I download the same fragment multiple times, it
breaks at the same place and gives the same garbage (but different fragments
give different divergence points). Tested with both wget and Chrome.
Does anyone know what could be wrong?
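A worked check of the numbers in the report: the first differing row starts at 0x4ca0 = 19616, and its first 10 bytes (`0c b8 84 70 7e ed 12 8f 5e 7e`) are identical in both downloads, so the first corrupted byte is at 19616 + 10 = 19626, the offset quoted above. That is past the first 16384 bytes (2^14, the largest plaintext payload of a single TLS record). The sketch below does the same comparison in C for two in-memory buffers; `first_divergence` and the row arrays are illustrative names, not part of Cubemap or any tool used in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of the first byte at which `good` and `bad` differ, or the length
 * of the shorter buffer if one is a prefix of the other. */
static size_t first_divergence(const unsigned char *good, size_t good_len,
                               const unsigned char *bad, size_t bad_len)
{
    size_t n = good_len < bad_len ? good_len : bad_len;

    assert(n == 0 || (good != NULL && bad != NULL));
    for (size_t i = 0; i < n; i++)
        if (good[i] != bad[i])
            return i;
    return n;
}

/* The two 16-byte rows at 0x4ca0 from the diff above:
 * '-' is the non-TLS (good) download, '+' is the TLS (bad) one. */
static const unsigned char good_row_4ca0[16] = {
    0x0c, 0xb8, 0x84, 0x70, 0x7e, 0xed, 0x12, 0x8f,
    0x5e, 0x7e, 0x18, 0xc0, 0x06, 0x20, 0x02, 0x54};
static const unsigned char bad_row_4ca0[16] = {
    0x0c, 0xb8, 0x84, 0x70, 0x7e, 0xed, 0x12, 0x8f,
    0x5e, 0x7e, 0x0a, 0x60, 0x9f, 0x1f, 0x97, 0xf2};
```

With these rows, `0x4ca0 + first_divergence(good_row_4ca0, 16, bad_row_4ca0, 16)` evaluates to 19626. For whole downloads, the same function would be run over the complete good and bad files read into memory.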

(It is, unfortunately, not easy for me to reboot this server at will, so a
bisect could be hard.)

Please Cc me on any replies; I'm not subscribed to netdev.

/* Steinar */
-- 
Homepage: https://www.sesse.net/
* Re: kTLS broken somewhere between 4.18 and 5.0
From: Andre Tomt @ 2019-04-14  1:56 UTC
  To: Steinar H. Gunderson, netdev, John Fastabend, Daniel Borkmann
On 13.04.2019 17:34, Steinar H. Gunderson wrote:
> [...]
> (It is, unfortunately, not easy for me to reboot this server at will, so a
> bisect could be hard.)
> [...]

Reproduced and bisected; the problem showed up in v4.20-rc1.
Unfortunately, the commit seems to have some significant dependencies, so
I was unable to confirm it by reverting it on 4.20.

Adding John and Daniel.

d3b18ad31f93d0b6bae105c679018a1ba7daa9ca is the first bad commit
commit d3b18ad31f93d0b6bae105c679018a1ba7daa9ca
Author: John Fastabend <john.fastabend@gmail.com>
Date:   Sat Oct 13 02:46:01 2018 +0200

     tls: add bpf support to sk_msg handling

     This work adds BPF sk_msg verdict program support to kTLS
     allowing BPF and kTLS to be combined together. Previously kTLS
     and sk_msg verdict programs were mutually exclusive in the
     ULP layer which created challenges for the orchestrator when
     trying to apply TCP based policy, for example. To resolve this,
     leveraging the work from previous patches that consolidates
     the use of sk_msg, we can finally enable BPF sk_msg verdict
     programs so they continue to run after the kTLS socket is
     created. No change in behavior when kTLS is not used in
     combination with BPF, the kselftest suite for kTLS also runs
     successfully.

     Joint work with Daniel.

     Signed-off-by: John Fastabend <john.fastabend@gmail.com>
     Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
     Signed-off-by: Alexei Starovoitov <ast@kernel.org>

:040000 040000 107a1f08fa7d54610292047bbd360a6bf9fff78a 1ef18a0495f094c6d771a371f6c05f849daff512 M	include
:040000 040000 56e6ebd0c6dc0a5aa8d371332cae6bff6cdcc1ff 44ddc12947dce93b449e468ed6862d475c33f32b M	net

$ git bisect log
git bisect start '--' 'net' 'include/net'
# good: [84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d] Linux 4.19
git bisect good 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d
# bad: [8fe28cb58bcb235034b64cbbb7550a8a43fd88be] Linux 4.20
git bisect bad 8fe28cb58bcb235034b64cbbb7550a8a43fd88be
# bad: [aa563d7bca6e882ec2bdae24603c8f016401a144] iov_iter: Separate type from direction and use accessor functions
git bisect bad aa563d7bca6e882ec2bdae24603c8f016401a144
# good: [85dd3da43dd59b9220d9cba4f933a3dc0ea6faa5] cfg80211: combine wdev/netdev unregister code
git bisect good 85dd3da43dd59b9220d9cba4f933a3dc0ea6faa5
# good: [9000a457a0c84883874a844ef94adf26f633f3b4] Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
git bisect good 9000a457a0c84883874a844ef94adf26f633f3b4
# good: [1a3aea2534f4f3083f29b2b047aa83a9d6c777a4] net: bridge: fix a possible memory leak in __vlan_add
git bisect good 1a3aea2534f4f3083f29b2b047aa83a9d6c777a4
# bad: [cb167893f41e21e6bd283d78e53489289dc0592d] net: Plumb support for filtering ipv4 and ipv6 multicast route dumps
git bisect bad cb167893f41e21e6bd283d78e53489289dc0592d
# bad: [5ef0ae84f02a4dbe0e09f89c6481ac13649cb19b] bpf: Fix IPv6 dport byte-order in bpf_sk_lookup
git bisect bad 5ef0ae84f02a4dbe0e09f89c6481ac13649cb19b
# good: [604326b41a6fb9b4a78b6179335decee0365cd8c] bpf, sockmap: convert to generic sk_msg interface
git bisect good 604326b41a6fb9b4a78b6179335decee0365cd8c
# good: [924ad65ed01ee0eec5d2a3280c01c394343d6df7] tls: replace poll implementation with read hook
git bisect good 924ad65ed01ee0eec5d2a3280c01c394343d6df7
# bad: [8a615c6b0352a9ec56151b6c95d68e0a2eef5cf0] bpf: Allow sk_lookup with IPv6 module
git bisect bad 8a615c6b0352a9ec56151b6c95d68e0a2eef5cf0
# bad: [d3b18ad31f93d0b6bae105c679018a1ba7daa9ca] tls: add bpf support to sk_msg handling
git bisect bad d3b18ad31f93d0b6bae105c679018a1ba7daa9ca
# first bad commit: [d3b18ad31f93d0b6bae105c679018a1ba7daa9ca] tls: add bpf support to sk_msg handling
* Re: kTLS broken somewhere between 4.18 and 5.0
From: John Fastabend @ 2019-04-14 20:40 UTC
  To: Andre Tomt, Steinar H. Gunderson, netdev, Daniel Borkmann
On 4/13/19 6:56 PM, Andre Tomt wrote:
> On 13.04.2019 17:34, Steinar H. Gunderson wrote:
>> [...]
> 
> Reproduced and bisected, the problem showed up in v4.20-rc1. Unfortunately the commit seems to have some significant dependencies so I was unable to verify by reverting it on 4.20.
> 
Hi, thanks. I'll take a look this evening or first thing tomorrow. I have
a couple of other fixes queued up for ktls as well, so I'll see if we can get
a fix for this included in that series.

Thanks!
John
* Re: kTLS broken somewhere between 4.18 and 5.0
From: Andre Tomt @ 2019-05-02 14:15 UTC
  To: John Fastabend, Steinar H. Gunderson, netdev, Daniel Borkmann
On 14.04.2019 22:40, John Fastabend wrote:
> On 4/13/19 6:56 PM, Andre Tomt wrote:
>> [...]
> 
> Hi thanks I'll take a look this evening or first thing tomorrow. I have
> a couple other fixes queued up for ktls as well so I'll see if we can get
> a fix for this included in that series.
> 
> Thanks!
> John

Hi John,

Have you had any luck tracking this down?

I just gave net.git a spin, and it is still serving up corrupted data when
ktls is active and sendfile is used. FWIW, I only tested without ktls
offload-capable hardware (i.e. in software mode) and without bpf. The same
sendfile usage on a non-ktls socket works fine.
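For context, "ktls active in software mode" means that, after a user-space TLS handshake, the socket has been handed over to the kernel for record encryption. The sketch below enables the transmit side following the setsockopt() sequence in the kernel's Documentation/networking/tls.rst. Assumptions: the helper name `enable_ktls_tx` is hypothetical, the cipher is TLS 1.2 AES-GCM-128, and the key, IV, salt and record sequence number are taken from the handshake (for example, from OpenSSL), which is not shown. Without offload-capable hardware the kernel encrypts the records in software.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <netinet/in.h>      /* IPPROTO_TCP */
#include <netinet/tcp.h>     /* TCP_ULP (newer glibc) */
#include <sys/socket.h>      /* setsockopt() */
#include <linux/tls.h>       /* struct tls12_crypto_info_aes_gcm_128, TLS_TX */

#ifndef SOL_TLS
#define SOL_TLS 282          /* value from the kernel's include/linux/socket.h */
#endif
#ifndef TCP_ULP
#define TCP_ULP 31           /* value from include/uapi/linux/tcp.h */
#endif

/* Hypothetical helper: switch an established TCP socket to kTLS for sending.
 * Returns 0 on success, or -1 with errno set by the failing setsockopt(). */
static int enable_ktls_tx(int fd,
                          const unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE],
                          const unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE],
                          const unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE],
                          const unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE])
{
    struct tls12_crypto_info_aes_gcm_128 ci;

    assert(key != NULL && iv != NULL && salt != NULL && rec_seq != NULL);

    /* Step 1: attach the "tls" upper-layer protocol to the TCP socket. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
        return -1;

    /* Step 2: give the kernel the transmit-direction key material. */
    memset(&ci, 0, sizeof(ci));
    ci.info.version = TLS_1_2_VERSION;
    ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(ci.key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(ci.iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(ci.salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(ci.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
    if (setsockopt(fd, SOL_TLS, TLS_TX, &ci, sizeof(ci)) < 0)
        return -1;

    /* From here on, send() and sendfile() on fd produce TLS records. */
    return 0;
}
```

After this call, Cubemap-style code can keep using plain send() and sendfile() on the socket; the difference Andre describes (corruption only on the kTLS socket) comes from the kernel's record handling, not from the application code path.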
* Re: kTLS broken somewhere between 4.18 and 5.0
From: John Fastabend @ 2019-05-07 14:45 UTC
  To: Andre Tomt, John Fastabend, Steinar H. Gunderson, netdev,
	Daniel Borkmann
Andre Tomt wrote:
> On 14.04.2019 22:40, John Fastabend wrote:
> > On 4/13/19 6:56 PM, Andre Tomt wrote:
> >> On 13.04.2019 17:34, Steinar H. Gunderson wrote:
> >>> Hi,
> >>>
> >>> I've been using kTLS for a while, with my video reflector Cubemap
> >>> (https://git.sesse.net/?p=cubemap). After I upgraded my server from
> >>> 4.18.11 to 5.0.6, seemingly I've started seeing corruption. The data sent
> >>> with send() (HTTP headers, HLS playlists) appears to be fine, but sendfile()
> >>> (actual video data, from a file on tmpfs) is not; after ~20 kB of data
> >>> (19626 in one test here), the data appears to be randomly corrupted. Diffing
> >>> non-TLS (good) and TLS (bad) video data:
[...]
> Hi John
> 
> Have you had any luck tracking this down?
> 
> Just gave net.git a spin and it is still serving up corrupted data when 
> ktls is active and using sendfile.  FWIW I only tested without ktls 
> offload capable hardware (ie in software mode) and no bpf. Same sendfile 
> usage on a non-ktls socket works fine.

Hi Andre, I should have a series to address this in the next few days. I
still need to resolve a couple of corner cases. Hopefully, by next week we
can get the bpf tree working for this case.

Thanks,
John
* Re: kTLS broken somewhere between 4.18 and 5.0
From: Andre Tomt @ 2019-05-30 22:41 UTC
  To: John Fastabend, Steinar H. Gunderson, netdev, Daniel Borkmann
On 07.05.2019 16:45, John Fastabend wrote:
> Andre Tomt wrote:
>> On 14.04.2019 22:40, John Fastabend wrote:
>>> [...]
>>
>> Hi John
>>
>> Have you had any luck tracking this down?
>>
>> Just gave net.git a spin and it is still serving up corrupted data when
>> ktls is active and using sendfile.  FWIW I only tested without ktls
>> offload capable hardware (ie in software mode) and no bpf. Same sendfile
>> usage on a non-ktls socket works fine.
> 
> Hi Andre, I should have a series to address this in the next few days. I
> still need to resolve a couple corner cases. Hopefully, by next week we
> can get bpf tree working for this case.

Current Linus master, net.git master and bpf.git master are all still
not working right, so I took a closer look.

It seems to happen only if sendfile writes more than one maximum-size TLS
record's worth of data. If I clamp the sendfile calls to 16384 bytes
at a time, everything works fine.

I'm not sure whether sendfile triggers record splitting, but could that be
what is broken? The bisected commit does touch that part quite extensively.
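The workaround described above can be sketched as a wrapper that never asks sendfile() for more than 16384 bytes (2^14, the maximum plaintext length of one TLS record) per call. This is a user-space mitigation that assumes, as observed here, that only larger sendfile() writes trigger the corruption; it is not a kernel fix. The names `sendfile_clamped` and `tls_chunk` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/sendfile.h>   /* sendfile() (Linux-specific) */
#include <unistd.h>         /* ssize_t, off_t, size_t */

/* Largest plaintext payload of a single TLS record: 2^14 bytes. */
#define TLS_MAX_RECORD_PAYLOAD 16384u

/* Size of the next sendfile() call: at most one TLS record's worth. */
static size_t tls_chunk(size_t remaining)
{
    return remaining > TLS_MAX_RECORD_PAYLOAD ? TLS_MAX_RECORD_PAYLOAD : remaining;
}

/* Hypothetical helper: like sendfile(out_fd, in_fd, offset, count), but issued
 * as calls of at most 16384 bytes each. Returns the number of bytes sent. If an
 * error occurs after some data was sent (e.g. EAGAIN on a non-blocking socket),
 * the partial count is returned; if nothing was sent, -1 is returned. */
static ssize_t sendfile_clamped(int out_fd, int in_fd, off_t *offset, size_t count)
{
    size_t total = 0;

    assert(offset != NULL);
    while (total < count) {
        ssize_t n = sendfile(out_fd, in_fd, offset, tls_chunk(count - total));
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted before sending: retry */
            return total > 0 ? (ssize_t)total : -1;
        }
        if (n == 0)
            break;                      /* reached end of the input file */
        total += (size_t)n;             /* sendfile() already advanced *offset */
    }
    return (ssize_t)total;
}
```

Returning a short count instead of -1 after partial progress mirrors how sendfile() itself reports short writes, so a caller driving a non-blocking socket can resume later from the updated `*offset`.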

I made a test input that just repeats each character of 0-9 and a-z 10 times,
in a loop, and got the following corruption after decryption:
> 00004f74: 6969 6969 6969 6969 6969  iiiiiiiiii
> 00004f7e: 6a6a 6a6a 6a6a 6a6a 6a6a  jjjjjjjjjj
> 00004f88: 6b6b 6b6b 6b6b 6b6b 6b6b  kkkkkkkkkk
> 00004f92: 6c6c 6c6c 6c6c 6c6c 6c6c  llllllllll
> 00004f9c: 6d6d 6d6d 6d6d 6d6d 6d6d  mmmmmmmmmm
> 00004fa6: 6e6e 6e6e 6e6e 6e6e 6e6e  nnnnnnnnnn
> 00004fb0: 6f6f 6f6f 6f6f 6f6f 6f6f  oooooooooo
> 00004fba: 7070 7070 7070 7070 7070  pppppppppp
> 00004fc4: 7171 7171 7171 7171 7171  qqqqqqqqqq
> 00004fce: 7272 6d6d 6d6d 6d6d 6e6e  rrmmmmmmnn <- uh oh, goes backwards?
> 00004fd8: 6e6e 6e6e 6e6e 6e6e 6f6f  nnnnnnnnoo
> 00004fe2: 6f6f 6f6f 6f6f 6f6f 7070  oooooooopp
> 00004fec: 7070 7070 7070 7070 7171  ppppppppqq
> 00004ff6: 7171 7171 7171 7171 7272  qqqqqqqqrr
> 00005000: 7272 7272 7272 7272 7373  rrrrrrrrss
> 0000500a: 7373 7373 7373 7373 7474  sssssssstt
> 00005014: 7474 7474 7474 7474 7575  ttttttttuu
> 0000501e: 7575 7575 7575 7575 7676  uuuuuuuuvv
> 00005028: 7676 7676 7676 7676 7777  vvvvvvvvww
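The test input can be regenerated and checked with a few lines of C, assuming the pattern starts at offset 0 with ten '0' bytes. The dump is consistent with that assumption: 0x4f74 = 20340, and 20340 mod 360 = 180, which falls in the run of 'i'. By the same arithmetic, offset 0x4fd0 (20432) should still be 'r', but the decrypted stream has 'm' there, which is where it "goes backwards". The function names are hypothetical; the actual generator used in the thread is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Alphabet of the test pattern: 0-9 followed by a-z. */
static const char PATTERN_CHARS[] = "0123456789abcdefghijklmnopqrstuvwxyz";
#define PATTERN_RUN 10                                              /* each char 10 times */
#define PATTERN_PERIOD ((sizeof(PATTERN_CHARS) - 1) * PATTERN_RUN)  /* 36 * 10 = 360 */

/* Byte that should appear at `offset` in the uncorrupted stream. */
static char pattern_byte(size_t offset)
{
    return PATTERN_CHARS[(offset % PATTERN_PERIOD) / PATTERN_RUN];
}

/* Fill buf with the pattern, starting at stream offset `start`. */
static void pattern_fill(char *buf, size_t len, size_t start)
{
    assert(len == 0 || buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_byte(start + i);
}

/* First stream offset in [start, start + len) where data deviates from the
 * pattern, or start + len if the whole range matches. */
static size_t pattern_first_mismatch(const char *data, size_t len, size_t start)
{
    for (size_t i = 0; i < len; i++)
        if (data[i] != pattern_byte(start + i))
            return start + i;
    return start + len;
}
```

Feeding the row at 0x4fce (`rrmmmmmmnn`) to `pattern_first_mismatch` with `start = 0x4fce` reports 0x4fd0, i.e. the corruption begins two bytes into that row, again beyond the first 16384 bytes of the stream.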