* tools/bpf regression causing samples/bpf/ to hang
@ 2018-09-11 11:11 Björn Töpel
2018-09-11 16:46 ` Yonghong Song
0 siblings, 1 reply; 5+ messages in thread
From: Björn Töpel @ 2018-09-11 11:11 UTC (permalink / raw)
To: Netdev, yhs; +Cc: ast, Daniel Borkmann, Jesper Dangaard Brouer
Hi Yonghong, I tried to run the XDP samples from the bpf-next tip
today, and was hit by a regression.
Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
functions into a new file") adds a while(1) around the recv call in
bpf_set_link_xdp_fd making that call getting stuck in an infinite
loop.
I simply removed the loop, and that solved my problem (patch below).
However, I don't know if removing the loop would break bpftool for
you. If not, I can submit the patch as a proper one for bpf-next.
Thanks!
Björn
From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
Date: Tue, 11 Sep 2018 12:35:44 +0200
Subject: [PATCH] tools/bpf: remove loop around netlink recv
Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
functions into a new file") moved the bpf_set_link_xdp_fd and split it
up into multiple functions. The added receive function
bpf_netlink_recv added a loop around the recv syscall leading to
multiple recv calls. This resulted in all XDP samples in the
samples/bpf/ to stop working, since they were stuck in a blocking
recv.
This commits removes the while (1)-statement.
Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
functions into a new file")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
---
tools/lib/bpf/netlink.c | 64 ++++++++++++++++++++---------------------
1 file changed, 31 insertions(+), 33 deletions(-)
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 469e068dd0c5..0eae1fbf46c6 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -70,41 +70,39 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
char buf[4096];
int len, ret;
- while (1) {
- len = recv(sock, buf, sizeof(buf), 0);
- if (len < 0) {
- ret = -errno;
+ len = recv(sock, buf, sizeof(buf), 0);
+ if (len < 0) {
+ ret = -errno;
+ goto done;
+ }
+
+ for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
+ nh = NLMSG_NEXT(nh, len)) {
+ if (nh->nlmsg_pid != nl_pid) {
+ ret = -LIBBPF_ERRNO__WRNGPID;
goto done;
}
-
- for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
- nh = NLMSG_NEXT(nh, len)) {
- if (nh->nlmsg_pid != nl_pid) {
- ret = -LIBBPF_ERRNO__WRNGPID;
- goto done;
- }
- if (nh->nlmsg_seq != seq) {
- ret = -LIBBPF_ERRNO__INVSEQ;
- goto done;
- }
- switch (nh->nlmsg_type) {
- case NLMSG_ERROR:
- err = (struct nlmsgerr *)NLMSG_DATA(nh);
- if (!err->error)
- continue;
- ret = err->error;
- nla_dump_errormsg(nh);
- goto done;
- case NLMSG_DONE:
- return 0;
- default:
- break;
- }
- if (_fn) {
- ret = _fn(nh, fn, cookie);
- if (ret)
- return ret;
- }
+ if (nh->nlmsg_seq != seq) {
+ ret = -LIBBPF_ERRNO__INVSEQ;
+ goto done;
+ }
+ switch (nh->nlmsg_type) {
+ case NLMSG_ERROR:
+ err = (struct nlmsgerr *)NLMSG_DATA(nh);
+ if (!err->error)
+ continue;
+ ret = err->error;
+ nla_dump_errormsg(nh);
+ goto done;
+ case NLMSG_DONE:
+ return 0;
+ default:
+ break;
+ }
+ if (_fn) {
+ ret = _fn(nh, fn, cookie);
+ if (ret)
+ return ret;
}
}
ret = 0;
--
2.17.1
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: tools/bpf regression causing samples/bpf/ to hang 2018-09-11 11:11 tools/bpf regression causing samples/bpf/ to hang Björn Töpel @ 2018-09-11 16:46 ` Yonghong Song 2018-09-11 17:15 ` Björn Töpel 0 siblings, 1 reply; 5+ messages in thread From: Yonghong Song @ 2018-09-11 16:46 UTC (permalink / raw) To: Björn Töpel, Netdev Cc: ast, Daniel Borkmann, Jesper Dangaard Brouer On 9/11/18 4:11 AM, Björn Töpel wrote: > Hi Yonghong, I tried to run the XDP samples from the bpf-next tip > today, and was hit by a regression. > > Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related > functions into a new file") adds a while(1) around the recv call in > bpf_set_link_xdp_fd making that call getting stuck in an infinite > loop. > > I simply removed the loop, and that solved my problem (patch below). > > However, I don't know if removing the loop would break bpftool for > you. If not, I can submit the patch as a proper one for bpf-next. Hi, Björn, thanks for reporting the problem. The while loop is needed since the "recv" syscall buffer size may not be big enough to hold all the returned information, in which cases, multiple "recv" calls are needed. Could you try the following patch to see whether it fixed your issue? Thanks! commit 3eb1c0249dfc3ea4ad61aa223dce32262af7e049 (HEAD -> fix) Author: Yonghong Song <yhs@fb.com> Date: Tue Sep 11 08:58:20 2018 -0700 tools/bpf: fix a netlink recv issue Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related functions into a new file") introduced a while loop for the netlink recv path. This while loop is needed since the buffer in recv syscall may not be big enough to hold all the information and in such cases multiple recv calls are needed. When netlink recv returns message length of 0, there will be no more messages for returning data so the while loop can end. Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related functions into a new file") Reported-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Yonghong Song <yhs@fb.com> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 469e068dd0c5..37827319a50a 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -77,6 +77,9 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, goto done; } + if (len == 0) + break; + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) { if (nh->nlmsg_pid != nl_pid) { > > Thanks! > Björn > > From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com> > Date: Tue, 11 Sep 2018 12:35:44 +0200 > Subject: [PATCH] tools/bpf: remove loop around netlink recv > > Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related > functions into a new file") moved the bpf_set_link_xdp_fd and split it > up into multiple functions. The added receive function > bpf_netlink_recv added a loop around the recv syscall leading to > multiple recv calls. This resulted in all XDP samples in the > samples/bpf/ to stop working, since they were stuck in a blocking > recv. > > This commits removes the while (1)-statement. > > Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related > functions into a new file") > Signed-off-by: Björn Töpel <bjorn.topel@intel.com> > --- > tools/lib/bpf/netlink.c | 64 ++++++++++++++++++++--------------------- > 1 file changed, 31 insertions(+), 33 deletions(-) > > diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c > index 469e068dd0c5..0eae1fbf46c6 100644 > --- a/tools/lib/bpf/netlink.c > +++ b/tools/lib/bpf/netlink.c > @@ -70,41 +70,39 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, > char buf[4096]; > int len, ret; > > - while (1) { > - len = recv(sock, buf, sizeof(buf), 0); > - if (len < 0) { > - ret = -errno; > + len = recv(sock, buf, sizeof(buf), 0); > + if (len < 0) { > + ret = -errno; > + goto done; > + } > + > + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); > + nh = NLMSG_NEXT(nh, len)) { > + if (nh->nlmsg_pid != nl_pid) { > + ret = -LIBBPF_ERRNO__WRNGPID; > goto done; > } > - > - for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); > - nh = NLMSG_NEXT(nh, len)) { > - if (nh->nlmsg_pid != nl_pid) { > - ret = -LIBBPF_ERRNO__WRNGPID; > - goto done; > - } > - if (nh->nlmsg_seq != seq) { > - ret = -LIBBPF_ERRNO__INVSEQ; > - goto done; > - } > - switch (nh->nlmsg_type) { > - case NLMSG_ERROR: > - err = (struct nlmsgerr *)NLMSG_DATA(nh); > - if (!err->error) > - continue; > - ret = err->error; > - nla_dump_errormsg(nh); > - goto done; > - case NLMSG_DONE: > - return 0; > - default: > - break; > - } > - if (_fn) { > - ret = _fn(nh, fn, cookie); > - if (ret) > - return ret; > - } > + if (nh->nlmsg_seq != seq) { > + ret = -LIBBPF_ERRNO__INVSEQ; > + goto done; > + } > + switch (nh->nlmsg_type) { > + case NLMSG_ERROR: > + err = (struct nlmsgerr *)NLMSG_DATA(nh); > + if (!err->error) > + continue; > + ret = err->error; > + nla_dump_errormsg(nh); > + goto done; > + case NLMSG_DONE: > + return 0; > + default: > + break; > + } > + if (_fn) { > + ret = _fn(nh, fn, cookie); > + if (ret) > + return ret; > } > } > ret = 0; > ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: tools/bpf regression causing samples/bpf/ to hang 2018-09-11 16:46 ` Yonghong Song @ 2018-09-11 17:15 ` Björn Töpel 2018-09-11 18:21 ` Yonghong Song 0 siblings, 1 reply; 5+ messages in thread From: Björn Töpel @ 2018-09-11 17:15 UTC (permalink / raw) To: yhs; +Cc: Netdev, ast, Daniel Borkmann, Jesper Dangaard Brouer Den tis 11 sep. 2018 kl 18:47 skrev Yonghong Song <yhs@fb.com>: > > > > On 9/11/18 4:11 AM, Björn Töpel wrote: > > Hi Yonghong, I tried to run the XDP samples from the bpf-next tip > > today, and was hit by a regression. > > > > Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related > > functions into a new file") adds a while(1) around the recv call in > > bpf_set_link_xdp_fd making that call getting stuck in an infinite > > loop. > > > > I simply removed the loop, and that solved my problem (patch below). > > > > However, I don't know if removing the loop would break bpftool for > > you. If not, I can submit the patch as a proper one for bpf-next. > > Hi, Björn, thanks for reporting the problem. > The while loop is needed since the "recv" syscall buffer size > may not be big enough to hold all the returned information, in > which cases, multiple "recv" calls are needed. > > Could you try the following patch to see whether it fixed your > issue? Thanks! > Nope, it doesn't -- but if you move that hunk after the for-loop it works. Cheers, Björn > commit 3eb1c0249dfc3ea4ad61aa223dce32262af7e049 (HEAD -> fix) > Author: Yonghong Song <yhs@fb.com> > Date: Tue Sep 11 08:58:20 2018 -0700 > > tools/bpf: fix a netlink recv issue > > Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related > functions into a new file") introduced a while loop for the > netlink recv path. This while loop is needed since the > buffer in recv syscall may not be big enough to hold all the > information and in such cases multiple recv calls are needed. > > When netlink recv returns message length of 0, there will be > no more messages for returning data so the while loop > can end. > > Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related > functions into a new file") > Reported-by: Björn Töpel <bjorn.topel@intel.com> > Signed-off-by: Yonghong Song <yhs@fb.com> > > diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c > index 469e068dd0c5..37827319a50a 100644 > --- a/tools/lib/bpf/netlink.c > +++ b/tools/lib/bpf/netlink.c > @@ -77,6 +77,9 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, > int seq, > goto done; > } > > + if (len == 0) > + break; > + > for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); > nh = NLMSG_NEXT(nh, len)) { > if (nh->nlmsg_pid != nl_pid) { > > > > > > Thanks! > > Björn > > > > From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com> > > Date: Tue, 11 Sep 2018 12:35:44 +0200 > > Subject: [PATCH] tools/bpf: remove loop around netlink recv > > > > Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related > > functions into a new file") moved the bpf_set_link_xdp_fd and split it > > up into multiple functions. The added receive function > > bpf_netlink_recv added a loop around the recv syscall leading to > > multiple recv calls. This resulted in all XDP samples in the > > samples/bpf/ to stop working, since they were stuck in a blocking > > recv. > > > > This commits removes the while (1)-statement. > > > > Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related > > functions into a new file") > > Signed-off-by: Björn Töpel <bjorn.topel@intel.com> > > --- > > tools/lib/bpf/netlink.c | 64 ++++++++++++++++++++--------------------- > > 1 file changed, 31 insertions(+), 33 deletions(-) > > > > diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c > > index 469e068dd0c5..0eae1fbf46c6 100644 > > --- a/tools/lib/bpf/netlink.c > > +++ b/tools/lib/bpf/netlink.c > > @@ -70,41 +70,39 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, > > char buf[4096]; > > int len, ret; > > > > - while (1) { > > - len = recv(sock, buf, sizeof(buf), 0); > > - if (len < 0) { > > - ret = -errno; > > + len = recv(sock, buf, sizeof(buf), 0); > > + if (len < 0) { > > + ret = -errno; > > + goto done; > > + } > > + > > + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); > > + nh = NLMSG_NEXT(nh, len)) { > > + if (nh->nlmsg_pid != nl_pid) { > > + ret = -LIBBPF_ERRNO__WRNGPID; > > goto done; > > } > > - > > - for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); > > - nh = NLMSG_NEXT(nh, len)) { > > - if (nh->nlmsg_pid != nl_pid) { > > - ret = -LIBBPF_ERRNO__WRNGPID; > > - goto done; > > - } > > - if (nh->nlmsg_seq != seq) { > > - ret = -LIBBPF_ERRNO__INVSEQ; > > - goto done; > > - } > > - switch (nh->nlmsg_type) { > > - case NLMSG_ERROR: > > - err = (struct nlmsgerr *)NLMSG_DATA(nh); > > - if (!err->error) > > - continue; > > - ret = err->error; > > - nla_dump_errormsg(nh); > > - goto done; > > - case NLMSG_DONE: > > - return 0; > > - default: > > - break; > > - } > > - if (_fn) { > > - ret = _fn(nh, fn, cookie); > > - if (ret) > > - return ret; > > - } > > + if (nh->nlmsg_seq != seq) { > > + ret = -LIBBPF_ERRNO__INVSEQ; > > + goto done; > > + } > > + switch (nh->nlmsg_type) { > > + case NLMSG_ERROR: > > + err = (struct nlmsgerr *)NLMSG_DATA(nh); > > + if (!err->error) > > + continue; > > + ret = err->error; > > + nla_dump_errormsg(nh); > > + goto done; > > + case NLMSG_DONE: > > + return 0; > > + default: > > + break; > > + } > > + if (_fn) { > > + ret = _fn(nh, fn, cookie); > > + if (ret) > > + return ret; > > } > > } > > ret = 0; > > ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: tools/bpf regression causing samples/bpf/ to hang 2018-09-11 17:15 ` Björn Töpel @ 2018-09-11 18:21 ` Yonghong Song 2018-09-11 19:01 ` Björn Töpel 0 siblings, 1 reply; 5+ messages in thread From: Yonghong Song @ 2018-09-11 18:21 UTC (permalink / raw) To: Björn Töpel Cc: Netdev, ast, Daniel Borkmann, Jesper Dangaard Brouer On 9/11/18 10:15 AM, Björn Töpel wrote: > Den tis 11 sep. 2018 kl 18:47 skrev Yonghong Song <yhs@fb.com>: >> >> >> >> On 9/11/18 4:11 AM, Björn Töpel wrote: >>> Hi Yonghong, I tried to run the XDP samples from the bpf-next tip >>> today, and was hit by a regression. >>> >>> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related >>> functions into a new file") adds a while(1) around the recv call in >>> bpf_set_link_xdp_fd making that call getting stuck in an infinite >>> loop. >>> >>> I simply removed the loop, and that solved my problem (patch below). >>> >>> However, I don't know if removing the loop would break bpftool for >>> you. If not, I can submit the patch as a proper one for bpf-next. >> >> Hi, Björn, thanks for reporting the problem. >> The while loop is needed since the "recv" syscall buffer size >> may not be big enough to hold all the returned information, in >> which cases, multiple "recv" calls are needed. >> >> Could you try the following patch to see whether it fixed your >> issue? Thanks! >> > > Nope, it doesn't -- but if you move that hunk after the for-loop it works. Could you try this patch? commit 9a7fb19899ce87594fe8012f8a23fc8fc7b6b764 (HEAD -> fix) Author: Yonghong Song <yhs@fb.com> Date: Tue Sep 11 08:58:20 2018 -0700 tools/bpf: fix a netlink recv issue Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related functions into a new file") introduced a while loop for the netlink recv path. This while loop is needed since the buffer in recv syscall may not be enough to hold all the information and in such cases multiple recv calls are needed. There is a bug introduced by the above commit as the while loop may block on recv syscall if there is no more messages are expected. The netlink message header flag NLM_F_MULTI is used to indicate that more messages are expected and this patch fixed the bug by doing further recv syscall only if multipart message is expected. The patch added another fix regarding to message length of 0. When netlink recv returns message length of 0, there will be no more messages for returning data so the while loop can end. Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related functions into a new file") Reported-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Yonghong Song <yhs@fb.com> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c index 469e068dd0c5..fde1d7bf8199 100644 --- a/tools/lib/bpf/netlink.c +++ b/tools/lib/bpf/netlink.c @@ -65,18 +65,23 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, __dump_nlmsg_t _fn, dump_nlmsg_t fn, void *cookie) { + bool multipart = true; struct nlmsgerr *err; struct nlmsghdr *nh; char buf[4096]; int len, ret; - while (1) { + while (multipart) { + multipart = false; len = recv(sock, buf, sizeof(buf), 0); if (len < 0) { ret = -errno; goto done; } + if (len == 0) + break; + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) { if (nh->nlmsg_pid != nl_pid) { @@ -87,6 +92,8 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, ret = -LIBBPF_ERRNO__INVSEQ; goto done; } + if (nh->nlmsg_flags & NLM_F_MULTI) + multipart = true; switch (nh->nlmsg_type) { case NLMSG_ERROR: err = (struct nlmsgerr *)NLMSG_DATA(nh); > > Cheers, > Björn > >> commit 3eb1c0249dfc3ea4ad61aa223dce32262af7e049 (HEAD -> fix) >> Author: Yonghong Song <yhs@fb.com> >> Date: Tue Sep 11 08:58:20 2018 -0700 >> >> tools/bpf: fix a netlink recv issue >> >> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related >> functions into a new file") introduced a while loop for the >> netlink recv path. This while loop is needed since the >> buffer in recv syscall may not be big enough to hold all the >> information and in such cases multiple recv calls are needed. >> >> When netlink recv returns message length of 0, there will be >> no more messages for returning data so the while loop >> can end. >> >> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related >> functions into a new file") >> Reported-by: Björn Töpel <bjorn.topel@intel.com> >> Signed-off-by: Yonghong Song <yhs@fb.com> >> >> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c >> index 469e068dd0c5..37827319a50a 100644 >> --- a/tools/lib/bpf/netlink.c >> +++ b/tools/lib/bpf/netlink.c >> @@ -77,6 +77,9 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, >> int seq, >> goto done; >> } >> >> + if (len == 0) >> + break; >> + >> for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); >> nh = NLMSG_NEXT(nh, len)) { >> if (nh->nlmsg_pid != nl_pid) { >> >> >>> >>> Thanks! >>> Björn >>> >>> From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com> >>> Date: Tue, 11 Sep 2018 12:35:44 +0200 >>> Subject: [PATCH] tools/bpf: remove loop around netlink recv >>> >>> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related >>> functions into a new file") moved the bpf_set_link_xdp_fd and split it >>> up into multiple functions. The added receive function >>> bpf_netlink_recv added a loop around the recv syscall leading to >>> multiple recv calls. This resulted in all XDP samples in the >>> samples/bpf/ to stop working, since they were stuck in a blocking >>> recv. >>> >>> This commits removes the while (1)-statement. >>> >>> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related >>> functions into a new file") >>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> >>> --- >>> tools/lib/bpf/netlink.c | 64 ++++++++++++++++++++--------------------- >>> 1 file changed, 31 insertions(+), 33 deletions(-) >>> >>> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c >>> index 469e068dd0c5..0eae1fbf46c6 100644 >>> --- a/tools/lib/bpf/netlink.c >>> +++ b/tools/lib/bpf/netlink.c >>> @@ -70,41 +70,39 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, >>> char buf[4096]; >>> int len, ret; >>> >>> - while (1) { >>> - len = recv(sock, buf, sizeof(buf), 0); >>> - if (len < 0) { >>> - ret = -errno; >>> + len = recv(sock, buf, sizeof(buf), 0); >>> + if (len < 0) { >>> + ret = -errno; >>> + goto done; >>> + } >>> + >>> + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); >>> + nh = NLMSG_NEXT(nh, len)) { >>> + if (nh->nlmsg_pid != nl_pid) { >>> + ret = -LIBBPF_ERRNO__WRNGPID; >>> goto done; >>> } >>> - >>> - for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); >>> - nh = NLMSG_NEXT(nh, len)) { >>> - if (nh->nlmsg_pid != nl_pid) { >>> - ret = -LIBBPF_ERRNO__WRNGPID; >>> - goto done; >>> - } >>> - if (nh->nlmsg_seq != seq) { >>> - ret = -LIBBPF_ERRNO__INVSEQ; >>> - goto done; >>> - } >>> - switch (nh->nlmsg_type) { >>> - case NLMSG_ERROR: >>> - err = (struct nlmsgerr *)NLMSG_DATA(nh); >>> - if (!err->error) >>> - continue; >>> - ret = err->error; >>> - nla_dump_errormsg(nh); >>> - goto done; >>> - case NLMSG_DONE: >>> - return 0; >>> - default: >>> - break; >>> - } >>> - if (_fn) { >>> - ret = _fn(nh, fn, cookie); >>> - if (ret) >>> - return ret; >>> - } >>> + if (nh->nlmsg_seq != seq) { >>> + ret = -LIBBPF_ERRNO__INVSEQ; >>> + goto done; >>> + } >>> + switch (nh->nlmsg_type) { >>> + case NLMSG_ERROR: >>> + err = (struct nlmsgerr *)NLMSG_DATA(nh); >>> + if (!err->error) >>> + continue; >>> + ret = err->error; >>> + nla_dump_errormsg(nh); >>> + goto done; >>> + case NLMSG_DONE: >>> + return 0; >>> + default: >>> + break; >>> + } >>> + if (_fn) { >>> + ret = _fn(nh, fn, cookie); >>> + if (ret) >>> + return ret; >>> } >>> } >>> ret = 0; >>> ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: tools/bpf regression causing samples/bpf/ to hang 2018-09-11 18:21 ` Yonghong Song @ 2018-09-11 19:01 ` Björn Töpel 0 siblings, 0 replies; 5+ messages in thread From: Björn Töpel @ 2018-09-11 19:01 UTC (permalink / raw) To: yhs; +Cc: Netdev, ast, Daniel Borkmann, Jesper Dangaard Brouer Den tis 11 sep. 2018 kl 20:21 skrev Yonghong Song <yhs@fb.com>: > > > > On 9/11/18 10:15 AM, Björn Töpel wrote: > > Den tis 11 sep. 2018 kl 18:47 skrev Yonghong Song <yhs@fb.com>: > >> > >> > >> > >> On 9/11/18 4:11 AM, Björn Töpel wrote: > >>> Hi Yonghong, I tried to run the XDP samples from the bpf-next tip > >>> today, and was hit by a regression. > >>> > >>> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related > >>> functions into a new file") adds a while(1) around the recv call in > >>> bpf_set_link_xdp_fd making that call getting stuck in an infinite > >>> loop. > >>> > >>> I simply removed the loop, and that solved my problem (patch below). > >>> > >>> However, I don't know if removing the loop would break bpftool for > >>> you. If not, I can submit the patch as a proper one for bpf-next. > >> > >> Hi, Björn, thanks for reporting the problem. > >> The while loop is needed since the "recv" syscall buffer size > >> may not be big enough to hold all the returned information, in > >> which cases, multiple "recv" calls are needed. > >> > >> Could you try the following patch to see whether it fixed your > >> issue? Thanks! > >> > > > > Nope, it doesn't -- but if you move that hunk after the for-loop it works. > > Could you try this patch? > Works! Thanks! Tested-by: Björn Töpel <bjorn.topel@intel.com> > commit 9a7fb19899ce87594fe8012f8a23fc8fc7b6b764 (HEAD -> fix) > Author: Yonghong Song <yhs@fb.com> > Date: Tue Sep 11 08:58:20 2018 -0700 > > tools/bpf: fix a netlink recv issue > > Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related > functions into a new file") introduced a while loop for the > netlink recv path. This while loop is needed since the > buffer in recv syscall may not be enough to hold all the > information and in such cases multiple recv calls are needed. > > There is a bug introduced by the above commit as > the while loop may block on recv syscall if there is no > more messages are expected. The netlink message header > flag NLM_F_MULTI is used to indicate that more messages > are expected and this patch fixed the bug by doing > further recv syscall only if multipart message is expected. > > The patch added another fix regarding to message length of 0. > When netlink recv returns message length of 0, there will be > no more messages for returning data so the while loop > can end. > > Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related > functions into a new file") > Reported-by: Björn Töpel <bjorn.topel@intel.com> > Signed-off-by: Yonghong Song <yhs@fb.com> > > diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c > index 469e068dd0c5..fde1d7bf8199 100644 > --- a/tools/lib/bpf/netlink.c > +++ b/tools/lib/bpf/netlink.c > @@ -65,18 +65,23 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, > int seq, > __dump_nlmsg_t _fn, dump_nlmsg_t fn, > void *cookie) > { > + bool multipart = true; > struct nlmsgerr *err; > struct nlmsghdr *nh; > char buf[4096]; > int len, ret; > > - while (1) { > + while (multipart) { > + multipart = false; > len = recv(sock, buf, sizeof(buf), 0); > if (len < 0) { > ret = -errno; > goto done; > } > > + if (len == 0) > + break; > + > for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); > nh = NLMSG_NEXT(nh, len)) { > if (nh->nlmsg_pid != nl_pid) { > @@ -87,6 +92,8 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, > int seq, > ret = -LIBBPF_ERRNO__INVSEQ; > goto done; > } > + if (nh->nlmsg_flags & NLM_F_MULTI) > + multipart = true; > switch (nh->nlmsg_type) { > case NLMSG_ERROR: > err = (struct nlmsgerr *)NLMSG_DATA(nh); > > > > > > Cheers, > > Björn > > > >> commit 3eb1c0249dfc3ea4ad61aa223dce32262af7e049 (HEAD -> fix) > >> Author: Yonghong Song <yhs@fb.com> > >> Date: Tue Sep 11 08:58:20 2018 -0700 > >> > >> tools/bpf: fix a netlink recv issue > >> > >> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related > >> functions into a new file") introduced a while loop for the > >> netlink recv path. This while loop is needed since the > >> buffer in recv syscall may not be big enough to hold all the > >> information and in such cases multiple recv calls are needed. > >> > >> When netlink recv returns message length of 0, there will be > >> no more messages for returning data so the while loop > >> can end. > >> > >> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related > >> functions into a new file") > >> Reported-by: Björn Töpel <bjorn.topel@intel.com> > >> Signed-off-by: Yonghong Song <yhs@fb.com> > >> > >> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c > >> index 469e068dd0c5..37827319a50a 100644 > >> --- a/tools/lib/bpf/netlink.c > >> +++ b/tools/lib/bpf/netlink.c > >> @@ -77,6 +77,9 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, > >> int seq, > >> goto done; > >> } > >> > >> + if (len == 0) > >> + break; > >> + > >> for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); > >> nh = NLMSG_NEXT(nh, len)) { > >> if (nh->nlmsg_pid != nl_pid) { > >> > >> > >>> > >>> Thanks! > >>> Björn > >>> > >>> From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com> > >>> Date: Tue, 11 Sep 2018 12:35:44 +0200 > >>> Subject: [PATCH] tools/bpf: remove loop around netlink recv > >>> > >>> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related > >>> functions into a new file") moved the bpf_set_link_xdp_fd and split it > >>> up into multiple functions. The added receive function > >>> bpf_netlink_recv added a loop around the recv syscall leading to > >>> multiple recv calls. This resulted in all XDP samples in the > >>> samples/bpf/ to stop working, since they were stuck in a blocking > >>> recv. > >>> > >>> This commits removes the while (1)-statement. > >>> > >>> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related > >>> functions into a new file") > >>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com> > >>> --- > >>> tools/lib/bpf/netlink.c | 64 ++++++++++++++++++++--------------------- > >>> 1 file changed, 31 insertions(+), 33 deletions(-) > >>> > >>> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c > >>> index 469e068dd0c5..0eae1fbf46c6 100644 > >>> --- a/tools/lib/bpf/netlink.c > >>> +++ b/tools/lib/bpf/netlink.c > >>> @@ -70,41 +70,39 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq, > >>> char buf[4096]; > >>> int len, ret; > >>> > >>> - while (1) { > >>> - len = recv(sock, buf, sizeof(buf), 0); > >>> - if (len < 0) { > >>> - ret = -errno; > >>> + len = recv(sock, buf, sizeof(buf), 0); > >>> + if (len < 0) { > >>> + ret = -errno; > >>> + goto done; > >>> + } > >>> + > >>> + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); > >>> + nh = NLMSG_NEXT(nh, len)) { > >>> + if (nh->nlmsg_pid != nl_pid) { > >>> + ret = -LIBBPF_ERRNO__WRNGPID; > >>> goto done; > >>> } > >>> - > >>> - for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); > >>> - nh = NLMSG_NEXT(nh, len)) { > >>> - if (nh->nlmsg_pid != nl_pid) { > >>> - ret = -LIBBPF_ERRNO__WRNGPID; > >>> - goto done; > >>> - } > >>> - if (nh->nlmsg_seq != seq) { > >>> - ret = -LIBBPF_ERRNO__INVSEQ; > >>> - goto done; > >>> - } > >>> - switch (nh->nlmsg_type) { > >>> - case NLMSG_ERROR: > >>> - err = (struct nlmsgerr *)NLMSG_DATA(nh); > >>> - if (!err->error) > >>> - continue; > >>> - ret = err->error; > >>> - nla_dump_errormsg(nh); > >>> - goto done; > >>> - case NLMSG_DONE: > >>> - return 0; > >>> - default: > >>> - break; > >>> - } > >>> - if (_fn) { > >>> - ret = _fn(nh, fn, cookie); > >>> - if (ret) > >>> - return ret; > >>> - } > >>> + if (nh->nlmsg_seq != seq) { > >>> + ret = -LIBBPF_ERRNO__INVSEQ; > >>> + goto done; > >>> + } > >>> + switch (nh->nlmsg_type) { > >>> + case NLMSG_ERROR: > >>> + err = (struct nlmsgerr *)NLMSG_DATA(nh); > >>> + if (!err->error) > >>> + continue; > >>> + ret = err->error; > >>> + nla_dump_errormsg(nh); > >>> + goto done; > >>> + case NLMSG_DONE: > >>> + return 0; > >>> + default: > >>> + break; > >>> + } > >>> + if (_fn) { > >>> + ret = _fn(nh, fn, cookie); > >>> + if (ret) > >>> + return ret; > >>> } > >>> } > >>> ret = 0; > >>> ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-09-12 0:02 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-09-11 11:11 tools/bpf regression causing samples/bpf/ to hang Björn Töpel 2018-09-11 16:46 ` Yonghong Song 2018-09-11 17:15 ` Björn Töpel 2018-09-11 18:21 ` Yonghong Song 2018-09-11 19:01 ` Björn Töpel
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox