* KCM - recvmsg() mangles packets?
@ 2018-08-03 18:28 Dominique Martinet
2018-08-03 22:46 ` Tom Herbert
0 siblings, 1 reply; 11+ messages in thread
From: Dominique Martinet @ 2018-08-03 18:28 UTC (permalink / raw)
To: Tom Herbert, netdev
[-- Attachment #1: Type: text/plain, Size: 1787 bytes --]
I've been playing with KCM on a 4.18.0-rc7 kernel and I'm running in a
problem where the iovec filled by recvmsg() is mangled up: it is filled
by the length of one packet, but contains (truncated) data from another
packet, rendering KCM unuseable.
(I haven't tried old kernels to see for how long this is broken/try to
bisect; I might if there's no progress but this might be simpler than I
think)
I've attached a reproducer, a simple program that forks, creates a tcp
server/client, attach the server socket to a kcm socket, and in an
infinite loop sends varying-length messages from the client to the
server.
The loop stops when the server gets a message which length is not the
length indicated in the packet header, rather fast (I can make it run
for a while if I slow down emission, or if I run a verbose tcpdump for
example)
In the quiet version on a VM on my laptop, I get this output:
[root@f2 ~]# gcc -g -l bcc -o kcm kcm.c
[root@f2 ~]# ./kcm
client is starting
server is starting
server is receiving data
Got 14, expected 27 on 1th message: 22222222222222; flags: 80
The client sends message deterministacally, first one is 14 bytes filled
with 1, second one is 27 bytes filled with 2, third one is 9 bytes
filled with 3 etc (final digit is actually a \0 instead)
As we can see, the server received 14 '2', and the header size matches
the second message header, so something went wrong™.
Flags 0x80 is MSG_EOR meaning recvmsg copied the full message.
This happens even if I reduce the VMs CPU to 1, so I was thinking some
irq messes with the sock between skb_peek and the actual copy of the
data (as this deos work if I send slowly!), but even disabling
irq/preempt doesn't seem to help so I'm not sure what to try next.
Any idea?
Thanks,
--
Dominique Martinet
[-- Attachment #2: kcm.c --]
[-- Type: text/x-csrc, Size: 4860 bytes --]
/*
* A sample program of KCM.
* Originally https://gist.github.com/peo3/fd0e266a3852d3422c08854aba96bff5
*
* $ gcc -lbcc kcm-sample.c
* $ ./a.out 10000
*/
#include <err.h>
#include <errno.h>
#include <netdb.h>
#include <poll.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <netinet/in.h>
#include <linux/kcm.h>
#include <bcc/bpf_common.h>
#include <bcc/libbpf.h>
struct my_proto {
struct _hdr {
uint32_t len;
} hdr;
char data[32];
} __attribute__((packed));
// use htons to use LE header size, since load_half does a first convertion
// from network byte order
const char *bpf_prog_string = " \
ssize_t bpf_prog1(struct __sk_buff *skb) \
{ \
return bpf_htons(load_half(skb, 0)) + 4; \
}";
int servsock_init(int port)
{
int s, error;
struct sockaddr_in addr;
s = socket(AF_INET, SOCK_STREAM, 0);
addr.sin_family = AF_INET;
addr.sin_port = htons(port);
addr.sin_addr.s_addr = INADDR_ANY;
error = bind(s, (struct sockaddr *)&addr, sizeof(addr));
if (error == -1)
err(EXIT_FAILURE, "bind");
error = listen(s, 10);
if (error == -1)
err(EXIT_FAILURE, "listen");
return s;
}
int bpf_init(void)
{
int fd, map_fd;
void *mod;
int key;
long long value = 0;
mod = bpf_module_create_c_from_string(bpf_prog_string, 0, NULL, 0);
fd = bpf_prog_load(
BPF_PROG_TYPE_SOCKET_FILTER,
"bpf_prog1",
bpf_function_start(mod, "bpf_prog1"),
bpf_function_size(mod, "bpf_prog1"),
bpf_module_license(mod),
bpf_module_kern_version(mod),
0, NULL, 0);
if (fd == -1)
exit(1);
return fd;
}
void client(int port)
{
int s, error;
struct sockaddr_in addr;
struct hostent *host;
struct my_proto my_msg;
int len;
printf("client is starting\n");
s = socket(AF_INET, SOCK_STREAM, 0);
if (s == -1)
err(EXIT_FAILURE, "socket");
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(port);
host = gethostbyname("localhost");
if (host == NULL)
err(EXIT_FAILURE, "gethostbyname");
memcpy(&addr.sin_addr, host->h_addr, host->h_length);
error = connect(s, (struct sockaddr *)&addr, sizeof(addr));
if (error == -1)
err(EXIT_FAILURE, "connect");
len = sprintf(my_msg.data, "1234567890123456789012345678901");
my_msg.data[len] = '\0';
my_msg.hdr.len = len + 1;
int i = 1;
while(1) {
my_msg.hdr.len = (i++ * 1312739ULL) % 31 + 1;
for (int j = 0; j < my_msg.hdr.len; ) {
j += snprintf(my_msg.data + j, my_msg.hdr.len - j, "%i", i - 1);
}
my_msg.data[my_msg.hdr.len-1] = '\0';
//printf("%d: writing %d\n", i-1, my_msg.hdr.len);
len = write(s, &my_msg, sizeof(my_msg.hdr) + my_msg.hdr.len);
if (error == -1)
err(EXIT_FAILURE, "write");
//usleep(10000);
}
close(s);
}
int kcm_init(void)
{
int kcmfd;
kcmfd = socket(AF_KCM, SOCK_DGRAM, KCMPROTO_CONNECTED);
if (kcmfd == -1)
err(EXIT_FAILURE, "socket(AF_KCM)");
return kcmfd;
}
int kcm_attach(int kcmfd, int csock, int bpf_prog_fd)
{
int error;
struct kcm_attach attach_info = {
.fd = csock,
.bpf_fd = bpf_prog_fd,
};
error = ioctl(kcmfd, SIOCKCMATTACH, &attach_info);
if (error == -1)
err(EXIT_FAILURE, "ioctl(SIOCKCMATTACH)");
}
void process(int kcmfd)
{
struct my_proto my_msg;
int error, len;
struct msghdr msg;
struct iovec iov = {
.iov_base = &my_msg,
.iov_len = sizeof(my_msg),
};
printf("server is receiving data\n");
int i =0;
while (1) {
memset(&msg, 0, sizeof(msg));
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
memset(&my_msg, 0, sizeof(my_msg));
len = recvmsg(kcmfd, &msg, 0);
if (len == -1)
err(EXIT_FAILURE, "recvmsg");
if (len != my_msg.hdr.len + 4) {
printf("Got %d, expected %d on %dth message: %s; flags: %x\n", len - 4, my_msg.hdr.len, i, my_msg.data, msg.msg_flags);
exit(1);
}
i++;
}
}
void server(int tcpfd, int bpf_prog_fd)
{
int kcmfd, error;
struct sockaddr_in client;
int len, csock;
printf("server is starting\n");
kcmfd = kcm_init();
len = sizeof(client);
csock = accept(tcpfd, (struct sockaddr *)&client, &len);
if (csock == -1)
err(EXIT_FAILURE, "accept");
kcm_attach(kcmfd, csock, bpf_prog_fd);
process(kcmfd);
close(kcmfd);
}
int main(int argc, char **argv)
{
int error, tcpfd, bpf_prog_fd;
pid_t pid;
int pipefd[2];
int dummy;
int port = argc > 1 ? atoi(argv[1]) : 10000;
error = pipe(pipefd);
if (error == -1)
err(EXIT_FAILURE, "pipe");
pid = fork();
if (pid == -1)
err(EXIT_FAILURE, "fork");
if (pid == 0) {
/* wait for server's ready */
read(pipefd[0], &dummy, sizeof(dummy));
client(port);
exit(0);
}
tcpfd = servsock_init(port);
bpf_prog_fd = bpf_init();
/* tell ready */
write(pipefd[1], &dummy, sizeof(dummy));
server(tcpfd, bpf_prog_fd);
waitpid(pid, NULL, 0);
close(bpf_prog_fd);
close(tcpfd);
return 0;
}
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-03 18:28 KCM - recvmsg() mangles packets? Dominique Martinet
@ 2018-08-03 22:46 ` Tom Herbert
2018-08-03 23:20 ` Dominique Martinet
0 siblings, 1 reply; 11+ messages in thread
From: Tom Herbert @ 2018-08-03 22:46 UTC (permalink / raw)
To: Dominique Martinet; +Cc: Linux Kernel Network Developers
struct my_proto {
struct _hdr {
uint32_t len;
} hdr;
char data[32];
} __attribute__((packed));
// use htons to use LE header size, since load_half does a first convertion
// from network byte order
const char *bpf_prog_string = " \
ssize_t bpf_prog1(struct __sk_buff *skb) \
{ \
return bpf_htons(load_half(skb, 0)) + 4; \
}";
On Fri, Aug 3, 2018 at 11:28 AM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> I've been playing with KCM on a 4.18.0-rc7 kernel and I'm running in a
> problem where the iovec filled by recvmsg() is mangled up: it is filled
> by the length of one packet, but contains (truncated) data from another
> packet, rendering KCM unuseable.
>
> (I haven't tried old kernels to see for how long this is broken/try to
> bisect; I might if there's no progress but this might be simpler than I
> think)
>
>
> I've attached a reproducer, a simple program that forks, creates a tcp
> server/client, attach the server socket to a kcm socket, and in an
> infinite loop sends varying-length messages from the client to the
> server.
> The loop stops when the server gets a message which length is not the
> length indicated in the packet header, rather fast (I can make it run
> for a while if I slow down emission, or if I run a verbose tcpdump for
> example)
>
>From the reproducer:
struct my_proto {
struct _hdr {
uint32_t len;
} hdr;
char data[32];
} __attribute__((packed));
// use htons to use LE header size, since load_half does a first convertion
// from network byte order
const char *bpf_prog_string = " \
ssize_t bpf_prog1(struct __sk_buff *skb) \
{ \
return bpf_htons(load_half(skb, 0)) + 4; \
}";
The length in hdr is uint32_t above, but this looks like it's being
read as a short.
Tom
> In the quiet version on a VM on my laptop, I get this output:
> [root@f2 ~]# gcc -g -l bcc -o kcm kcm.c
> [root@f2 ~]# ./kcm
> client is starting
> server is starting
> server is receiving data
> Got 14, expected 27 on 1th message: 22222222222222; flags: 80
>
> The client sends message deterministacally, first one is 14 bytes filled
> with 1, second one is 27 bytes filled with 2, third one is 9 bytes
> filled with 3 etc (final digit is actually a \0 instead)
>
> As we can see, the server received 14 '2', and the header size matches
> the second message header, so something went wrong™.
> Flags 0x80 is MSG_EOR meaning recvmsg copied the full message.
>
>
>
> This happens even if I reduce the VMs CPU to 1, so I was thinking some
> irq messes with the sock between skb_peek and the actual copy of the
> data (as this deos work if I send slowly!), but even disabling
> irq/preempt doesn't seem to help so I'm not sure what to try next.
>
> Any idea?
>
>
> Thanks,
> --
> Dominique Martinet
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-03 22:46 ` Tom Herbert
@ 2018-08-03 23:20 ` Dominique Martinet
2018-08-04 1:18 ` Tom Herbert
0 siblings, 1 reply; 11+ messages in thread
From: Dominique Martinet @ 2018-08-03 23:20 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers
Tom Herbert wrote on Fri, Aug 03, 2018:
> struct my_proto {
> struct _hdr {
> uint32_t len;
> } hdr;
> char data[32];
> } __attribute__((packed));
>
> // use htons to use LE header size, since load_half does a first convertion
> // from network byte order
> const char *bpf_prog_string = " \
> ssize_t bpf_prog1(struct __sk_buff *skb) \
> { \
> return bpf_htons(load_half(skb, 0)) + 4; \
> }";
>
> The length in hdr is uint32_t above, but this looks like it's being
> read as a short.
Err, I agree this is obviously wrong here (I can blame my lack of
attention to this and the example I used), but this isn't the problem as
the actual size is between 0 and 32 -- I could use any size I want here
and the result would the same.
A "real" problem with the conversion program would mean that my example
would not work if I slow it down, but I can send as many packet as I
want if I uncomment the usleep() on the client side or if I just
throttle the network stack with a loud tcpdump writing to stdout -- that
means the algorithm is working even if it's making some badly-sized
conversions.
(Just to make sure I did fix it to htonl(load_word()) and I can confirm
there is no difference)
Thanks,
--
Dominique Martinet
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-03 23:20 ` Dominique Martinet
@ 2018-08-04 1:18 ` Tom Herbert
2018-08-04 1:41 ` Dominique Martinet
0 siblings, 1 reply; 11+ messages in thread
From: Tom Herbert @ 2018-08-04 1:18 UTC (permalink / raw)
To: Dominique Martinet; +Cc: Linux Kernel Network Developers
On Fri, Aug 3, 2018 at 4:20 PM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> Tom Herbert wrote on Fri, Aug 03, 2018:
>> struct my_proto {
>> struct _hdr {
>> uint32_t len;
>> } hdr;
>> char data[32];
>> } __attribute__((packed));
>>
>> // use htons to use LE header size, since load_half does a first convertion
>> // from network byte order
>> const char *bpf_prog_string = " \
>> ssize_t bpf_prog1(struct __sk_buff *skb) \
>> { \
>> return bpf_htons(load_half(skb, 0)) + 4; \
>> }";
>>
>> The length in hdr is uint32_t above, but this looks like it's being
>> read as a short.
>
> Err, I agree this is obviously wrong here (I can blame my lack of
> attention to this and the example I used), but this isn't the problem as
> the actual size is between 0 and 32 -- I could use any size I want here
> and the result would the same.
>
> A "real" problem with the conversion program would mean that my example
> would not work if I slow it down, but I can send as many packet as I
> want if I uncomment the usleep() on the client side or if I just
> throttle the network stack with a loud tcpdump writing to stdout -- that
> means the algorithm is working even if it's making some badly-sized
> conversions.
>
> (Just to make sure I did fix it to htonl(load_word()) and I can confirm
> there is no difference)
>
You also need to htonl for
my_msg.hdr.len = (i++ * 1312739ULL) % 31 + 1;
>
> Thanks,
> --
> Dominique Martinet
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-04 1:18 ` Tom Herbert
@ 2018-08-04 1:41 ` Dominique Martinet
2018-08-04 2:08 ` Dominique Martinet
0 siblings, 1 reply; 11+ messages in thread
From: Dominique Martinet @ 2018-08-04 1:41 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers
Tom Herbert wrote on Fri, Aug 03, 2018:
> On Fri, Aug 3, 2018 at 4:20 PM, Dominique Martinet
> <asmadeus@codewreck.org> wrote:
> > Tom Herbert wrote on Fri, Aug 03, 2018:
> >> struct my_proto {
> >> struct _hdr {
> >> uint32_t len;
> >> } hdr;
> >> char data[32];
> >> } __attribute__((packed));
> >>
> >> // use htons to use LE header size, since load_half does a first convertion
> >> // from network byte order
> >> const char *bpf_prog_string = " \
> >> ssize_t bpf_prog1(struct __sk_buff *skb) \
> >> { \
> >> return bpf_htons(load_half(skb, 0)) + 4; \
> >> }";
> >
> > (Just to make sure I did fix it to htonl(load_word()) and I can confirm
> > there is no difference)
>
> You also need to htonl for
>
> my_msg.hdr.len = (i++ * 1312739ULL) % 31 + 1;
Thanks, but this looks correct to me - I was writing the header in
little endian order here and doing the double-swap dance in the bpf prog
because the protocol I was considering making a KCM implementation for
uses that.
Just to make sure, I rewrote it using network byte order e.g. these
three points and this makes no difference:
---8<----------------------
diff --git a/kcm.c b/kcm.c
index cb48df1..d437226 100644
--- a/kcm.c
+++ b/kcm.c
@@ -36,7 +36,7 @@ struct my_proto {
const char *bpf_prog_string = " \
ssize_t bpf_prog1(struct __sk_buff *skb) \
{ \
- return bpf_htons(load_half(skb, 0)) + 4; \
+ return load_word(skb, 0) + 4; \
}";
int servsock_init(int port)
@@ -110,13 +110,15 @@ void client(int port)
int i = 1;
while(1) {
- my_msg.hdr.len = (i++ * 1312739ULL) % 31 + 1;
- for (int j = 0; j < my_msg.hdr.len; ) {
- j += snprintf(my_msg.data + j, my_msg.hdr.len - j, "%i", i - 1);
+ int len = (i++ * 1312739ULL) % 31 + 1;
+ my_msg.hdr.len = htonl(len);
+ for (int j = 0; j < len; ) {
+ j += snprintf(my_msg.data + j, len - j,
+ "%i", i - 1);
}
- my_msg.data[my_msg.hdr.len-1] = '\0';
- //printf("%d: writing %d\n", i-1, my_msg.hdr.len);
- len = write(s, &my_msg, sizeof(my_msg.hdr) + my_msg.hdr.len);
+ my_msg.data[len-1] = '\0';
+ //printf("%d: writing %d\n", i-1, len);
+ len = write(s, &my_msg, sizeof(my_msg.hdr) + len);
if (error == -1)
err(EXIT_FAILURE, "write");
//usleep(10000);
@@ -171,9 +173,10 @@ void process(int kcmfd)
len = recvmsg(kcmfd, &msg, 0);
if (len == -1)
err(EXIT_FAILURE, "recvmsg");
- if (len != my_msg.hdr.len + 4) {
+ if (len != ntohl(my_msg.hdr.len) + 4) {
printf("Got %d, expected %d on %dth message: %s; flags:
%x\n",
- len - 4, my_msg.hdr.len, i, my_msg.data, msg.msg_flags);
+ len - 4, ntohl(my_msg.hdr.len), i,
+ my_msg.data, msg.msg_flags);
exit(1);
}
i++;
----------------8<-----------
Frankly I do not believe this is a rule problem, as if the length
splitting was incorrect the program would not work at all, but just
uncommenting the usleep on the sender side makes this work.
Actually, now I'm looking closer to the timing, it looks specific to the
connection setup. This send loop works:
int i = 1;
while(i <= 1000) {
int len = (i++ * 1312739ULL) % 31 + 1;
my_msg.hdr.len = htonl(len);
for (int j = 0; j < len; ) {
j += snprintf(my_msg.data + j, len - j,
"%i", i - 1);
}
my_msg.data[len-1] = '\0';
//printf("%d: writing %d\n", i-1, len);
len = write(s, &my_msg, sizeof(my_msg.hdr) + len);
if (error == -1)
err(EXIT_FAILURE, "write");
if (i == 2)
usleep(1);
}
But removing the usleep(1) after the first packet makes recvmsg()
"fail": it reads the content of the second packet with the size of the
first.
I assume that usleep gives the server time to finish setting up the kcm
socket, because it does accept(); ioctl(SIOCKCMATTACH); recvmsg(); but
the client does not wait to send packets so there could be some sort of
race with the attach and multiple packets?
FWIW I took the time to look at older kernel and this has been happening
ever since KCM got introduced in 4.6
Thanks,
--
Dominique
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-04 1:41 ` Dominique Martinet
@ 2018-08-04 2:08 ` Dominique Martinet
2018-08-05 6:44 ` Dominique Martinet
0 siblings, 1 reply; 11+ messages in thread
From: Dominique Martinet @ 2018-08-04 2:08 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers
Dominique Martinet wrote on Sat, Aug 04, 2018:
> Actually, now I'm looking closer to the timing, it looks specific to the
> connection setup. This send loop works:
> int i = 1;
> while(i <= 1000) {
> int len = (i++ * 1312739ULL) % 31 + 1;
> my_msg.hdr.len = htonl(len);
> for (int j = 0; j < len; ) {
> j += snprintf(my_msg.data + j, len - j,
> "%i", i - 1);
> }
> my_msg.data[len-1] = '\0';
> //printf("%d: writing %d\n", i-1, len);
> len = write(s, &my_msg, sizeof(my_msg.hdr) + len);
> if (error == -1)
> err(EXIT_FAILURE, "write");
> if (i == 2)
> usleep(1);
> }
>
> But removing the usleep(1) after the first packet makes recvmsg()
> "fail": it reads the content of the second packet with the size of the
> first.
I talked too fast, I can get this to fail on later packets e.g.
Got 18, expected 31 on 452nd message: 453453453453453453; flags: 80
The content is 453 in a loop so this really is the 453rd packet...
But being slower e.g. doing that usleep after every single packets and I
could let the loop run until 100k without a hintch.
There really has to be something wrong, I just can't tell what from
looking at the code with my naive eyes.
Maybe we need to lock both the tcp and the kcm sockets?
Thanks,
--
Dominique
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-04 2:08 ` Dominique Martinet
@ 2018-08-05 6:44 ` Dominique Martinet
2018-08-05 14:12 ` Dominique Martinet
0 siblings, 1 reply; 11+ messages in thread
From: Dominique Martinet @ 2018-08-05 6:44 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers
Dominique Martinet wrote on Sat, Aug 04, 2018:
> I talked too fast, I can get this to fail on later packets e.g.
> Got 18, expected 31 on 452nd message: 453453453453453453; flags: 80
>
> The content is 453 in a loop so this really is the 453rd packet...
>
> But being slower e.g. doing that usleep after every single packets and I
> could let the loop run until 100k without a hintch.
>
>
> There really has to be something wrong, I just can't tell what from
> looking at the code with my naive eyes.
I've spent a few hours debugging this weekend, I'm still not confident I
understand how this all works but it's a bit better than Friday.
I've tried updating kcm_recvmsg()'s use of kcm_wait_data() to use
skb_dequeue() instead of skb_peek(), with the idea that if the skb is
off the list it would have less chance to be messed with, and that
didn't seem to change anything so I was probably looking at the wrong
place.
So I went one step up to how we split the packet and the interface with
strparser and added an extra check of the protocol in kcm_rcv_strparser.
If I understand correctly, with stm = strp_msg(skb), stm->full_len
contains the output of the bpf progran and must match something like
this in the network byte-order version:
ntohl(*((u32 *)(skb->data + stm->offset)))
(I'm not sure about offset, since we pass the full skb to parse message,
wouldn't it look at the start of the buffer everytime? Well, offset
seems to be 0 everytime the first time that check fails so I can
probably ignore that for now...)
Just like the test program, that extra check seems to work but will fail
everytime the test program detects and error, although the data I can
see does not always match what the program sees, the packet would seem
to already be incorrect when it is added to the kcm queue... So there
must be something happening to the skb between parse_msg() and rcv_msg()
in __strp_recv() ...?
>From what I can see it's looking at a cloned skb so the headers are
private but the data is shared, but it looks like the tcp socket (csock
in kcm_attach) is properly locked at this time, so I'm not sure what
could corrupt the buffer.
I'll keep looking at this a bit more but any help is appreciated,
Thanks,
--
Dominique Martinet | Asmadeus
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-05 6:44 ` Dominique Martinet
@ 2018-08-05 14:12 ` Dominique Martinet
2018-08-05 23:39 ` Dominique Martinet
0 siblings, 1 reply; 11+ messages in thread
From: Dominique Martinet @ 2018-08-05 14:12 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers
Dominique Martinet wrote on Sun, Aug 05, 2018:
> (I'm not sure about offset, since we pass the full skb to parse message,
> wouldn't it look at the start of the buffer everytime? Well, offset
> seems to be 0 everytime the first time that check fails so I can
> probably ignore that for now...)
Oh, this might actually not have been such a bad remark; if I have the
client write two "messages" in a single write() kcm seems to reliably
fail the same way...
Conversely, if I setsockopt(s, IPPROTO_TCP, TCP_NODELAY...) on the
sender socket, *and* make it wait till the kcm socket has been created
to start sending data, then it dramatically reduces the probability of
this happening (I had to let the reproducer run in a loop for 5 minutes,
wheras it used to happen within seconds).
So I think the problem is packet aggregation, and strparser not handling
that properly... The first packet still fails with TCP_NODELAY but
there's probably aggregation on the recv side as well before the socket
is attached to the multiplexor... I guess the low probability failure
that is still happening could be similar.
(I also noticed that I've mistakedly believed that the problem was the
first packet contained the 2nd packet's data because of an off-by-one in
the receiver, it really is the second packet, it only has the first
packet's length)
I've moved my check from kcm_rcv_strparser to just before parse_msg in
__strp_recv and surely enough it fails everytime there's an offset.
It's getting late but I'll try adding a pskb_pull in there tomorrow, it
would be better to make the bpf program start with an offset but I don't
think that'll be easy to change...
I'm almost done spamming, thanks for being a good rubber duck! :p
--
Dominique Martinet
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-05 14:12 ` Dominique Martinet
@ 2018-08-05 23:39 ` Dominique Martinet
2018-08-09 20:58 ` Tom Herbert
0 siblings, 1 reply; 11+ messages in thread
From: Dominique Martinet @ 2018-08-05 23:39 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers
Dominique Martinet wrote on Sun, Aug 05, 2018:
> It's getting late but I'll try adding a pskb_pull in there tomorrow, it
> would be better to make the bpf program start with an offset but I don't
> think that'll be easy to change...
I can confirm the following patch fixes the issue for me:
-----8<---------------------
diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index 625acb27efcc..348ff5945591 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -222,6 +222,16 @@ static int __strp_recv(read_descriptor_t *desc, struct sk_buff *orig_skb,
if (!stm->strp.full_len) {
ssize_t len;
+ /* Can only parse if there is no offset */
+ if (unlikely(stm->strp.offset)) {
+ if (!pskb_pull(skb, stm->strp.offset)) {
+ STRP_STATS_INCR(strp->stats.mem_fail);
+ strp_parser_err(strp, -ENOMEM, desc);
+ break;
+ }
+ stm->strp.offset = 0;
+ }
+
len = (*strp->cb.parse_msg)(strp, head);
if (!len) {
----------------8<----------------------
Now, I was looking at other users of strparser (I see sockmap, kcm and
tls) and it looks like sockmap does not handle offsets either but tls
does by using skb_copy_bits -- they're copying the tls header to a
buffer on the stack.
kcm cannot do that because we do not know how much data the user expects
to read, and I'm not comfortable doing pskb_pull in the kcm callback
either, but the cost of this pull is probably non-negligible if some
user can make do without it...
On the other hand, I do not see how to make the bpf program handle an
offset in the skb as that offset is strparser-specific.
Maybe add a flag in the cb that specifies wether the callback allows
non-zero offset?
I'll let you see if you can reproduce this and will wait for advices on
how to solve this properly so we can work on a proper fix.
Thanks,
--
Dominique
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-05 23:39 ` Dominique Martinet
@ 2018-08-09 20:58 ` Tom Herbert
2018-08-09 22:06 ` Dominique Martinet
0 siblings, 1 reply; 11+ messages in thread
From: Tom Herbert @ 2018-08-09 20:58 UTC (permalink / raw)
To: Dominique Martinet; +Cc: Linux Kernel Network Developers
On Sun, Aug 5, 2018 at 4:39 PM, Dominique Martinet
<asmadeus@codewreck.org> wrote:
> Dominique Martinet wrote on Sun, Aug 05, 2018:
>> It's getting late but I'll try adding a pskb_pull in there tomorrow, it
>> would be better to make the bpf program start with an offset but I don't
>> think that'll be easy to change...
>
> I can confirm the following patch fixes the issue for me:
> -----8<---------------------
> diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> index 625acb27efcc..348ff5945591 100644
> --- a/net/strparser/strparser.c
> +++ b/net/strparser/strparser.c
> @@ -222,6 +222,16 @@ static int __strp_recv(read_descriptor_t *desc, struct sk_buff *orig_skb,
> if (!stm->strp.full_len) {
> ssize_t len;
>
> + /* Can only parse if there is no offset */
> + if (unlikely(stm->strp.offset)) {
> + if (!pskb_pull(skb, stm->strp.offset)) {
> + STRP_STATS_INCR(strp->stats.mem_fail);
> + strp_parser_err(strp, -ENOMEM, desc);
> + break;
> + }
> + stm->strp.offset = 0;
> + }
> +
Seems okay to me for a fix. Looks like strp.offset is only set in one
place and read in one place. With this pull maybe that just can go
away?
Tom
> len = (*strp->cb.parse_msg)(strp, head);
>
> if (!len) {
> ----------------8<----------------------
>
> Now, I was looking at other users of strparser (I see sockmap, kcm and
> tls) and it looks like sockmap does not handle offsets either but tls
> does by using skb_copy_bits -- they're copying the tls header to a
> buffer on the stack.
>
> kcm cannot do that because we do not know how much data the user expects
> to read, and I'm not comfortable doing pskb_pull in the kcm callback
> either, but the cost of this pull is probably non-negligible if some
> user can make do without it...
>
> On the other hand, I do not see how to make the bpf program handle an
> offset in the skb as that offset is strparser-specific.
>
> Maybe add a flag in the cb that specifies wether the callback allows
> non-zero offset?
>
>
> I'll let you see if you can reproduce this and will wait for advices on
> how to solve this properly so we can work on a proper fix.
>
>
> Thanks,
> --
> Dominique
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: KCM - recvmsg() mangles packets?
2018-08-09 20:58 ` Tom Herbert
@ 2018-08-09 22:06 ` Dominique Martinet
0 siblings, 0 replies; 11+ messages in thread
From: Dominique Martinet @ 2018-08-09 22:06 UTC (permalink / raw)
To: Tom Herbert; +Cc: Linux Kernel Network Developers
Tom Herbert wrote on Thu, Aug 09, 2018:
> > diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> > index 625acb27efcc..348ff5945591 100644
> > --- a/net/strparser/strparser.c
> > +++ b/net/strparser/strparser.c
> > @@ -222,6 +222,16 @@ static int __strp_recv(read_descriptor_t *desc, struct sk_buff *orig_skb,
> > if (!stm->strp.full_len) {
> > ssize_t len;
> >
> > + /* Can only parse if there is no offset */
> > + if (unlikely(stm->strp.offset)) {
> > + if (!pskb_pull(skb, stm->strp.offset)) {
> > + STRP_STATS_INCR(strp->stats.mem_fail);
> > + strp_parser_err(strp, -ENOMEM, desc);
> > + break;
> > + }
> > + stm->strp.offset = 0;
> > + }
> > +
>
> Seems okay to me for a fix.
Hmm, if you say so, I'll send this as a patch for broader comments right
away.
> Looks like strp.offset is only set in one place and read in one
> place. With this pull maybe that just can go away?
Good point, when strp.offset is set the full_len is also being init'd so
we will necessarily do the pull next...
But the way tls uses strparser is also kind of weird, since they modify
the strp_msg's offset and full_len, I wouldn't want to assume we can't
have full_len == 0 *again* later with a non zero offset...
On the other hand they do handle non-zero offset in their parse function
so they'd be ok with that... Ultimately it's probably closer to a design
choice than anything else.
I'll still send a v0 of the patch as is, because I feel it's easier to
understand that we pull because the existing parse_msg functions do not
handle it properly, and will write a note that I intend to move it up a
few lines as a comment.
Thanks,
--
Dominique Martinet
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2018-08-10 0:33 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-08-03 18:28 KCM - recvmsg() mangles packets? Dominique Martinet
2018-08-03 22:46 ` Tom Herbert
2018-08-03 23:20 ` Dominique Martinet
2018-08-04 1:18 ` Tom Herbert
2018-08-04 1:41 ` Dominique Martinet
2018-08-04 2:08 ` Dominique Martinet
2018-08-05 6:44 ` Dominique Martinet
2018-08-05 14:12 ` Dominique Martinet
2018-08-05 23:39 ` Dominique Martinet
2018-08-09 20:58 ` Tom Herbert
2018-08-09 22:06 ` Dominique Martinet
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).