* Re: tools/bpf regression causing samples/bpf/ to hang
From: Yonghong Song @ 2018-09-11 16:46 UTC (permalink / raw)
To: Björn Töpel, Netdev
Cc: ast, Daniel Borkmann, Jesper Dangaard Brouer
In-Reply-To: <CAJ+HfNh42UiWdMdtw=eM6LoHTWWJ5h+rnBdhOBdoatNS-8nFOQ@mail.gmail.com>
On 9/11/18 4:11 AM, Björn Töpel wrote:
> Hi Yonghong, I tried to run the XDP samples from the bpf-next tip
> today, and was hit by a regression.
>
> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
> functions into a new file") adds a while(1) around the recv call in
> bpf_set_link_xdp_fd making that call getting stuck in an infinite
> loop.
>
> I simply removed the loop, and that solved my problem (patch below).
>
> However, I don't know if removing the loop would break bpftool for
> you. If not, I can submit the patch as a proper one for bpf-next.
Hi, Björn, thanks for reporting the problem.
The while loop is needed since the "recv" syscall buffer size
may not be big enough to hold all the returned information, in
which cases, multiple "recv" calls are needed.
Could you try the following patch to see whether it fixed your
issue? Thanks!
commit 3eb1c0249dfc3ea4ad61aa223dce32262af7e049 (HEAD -> fix)
Author: Yonghong Song <yhs@fb.com>
Date: Tue Sep 11 08:58:20 2018 -0700
tools/bpf: fix a netlink recv issue
Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
functions into a new file") introduced a while loop for the
netlink recv path. This while loop is needed since the
buffer in recv syscall may not be big enough to hold all the
information and in such cases multiple recv calls are needed.
When netlink recv returns message length of 0, there will be
no more messages for returning data so the while loop
can end.
Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
functions into a new file")
Reported-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 469e068dd0c5..37827319a50a 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -77,6 +77,9 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid,
int seq,
goto done;
}
+ if (len == 0)
+ break;
+
for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
nh = NLMSG_NEXT(nh, len)) {
if (nh->nlmsg_pid != nl_pid) {
>
> Thanks!
> Björn
>
> From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
> Date: Tue, 11 Sep 2018 12:35:44 +0200
> Subject: [PATCH] tools/bpf: remove loop around netlink recv
>
> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
> functions into a new file") moved the bpf_set_link_xdp_fd and split it
> up into multiple functions. The added receive function
> bpf_netlink_recv added a loop around the recv syscall leading to
> multiple recv calls. This resulted in all XDP samples in the
> samples/bpf/ to stop working, since they were stuck in a blocking
> recv.
>
> This commits removes the while (1)-statement.
>
> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
> functions into a new file")
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> tools/lib/bpf/netlink.c | 64 ++++++++++++++++++++---------------------
> 1 file changed, 31 insertions(+), 33 deletions(-)
>
> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> index 469e068dd0c5..0eae1fbf46c6 100644
> --- a/tools/lib/bpf/netlink.c
> +++ b/tools/lib/bpf/netlink.c
> @@ -70,41 +70,39 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
> char buf[4096];
> int len, ret;
>
> - while (1) {
> - len = recv(sock, buf, sizeof(buf), 0);
> - if (len < 0) {
> - ret = -errno;
> + len = recv(sock, buf, sizeof(buf), 0);
> + if (len < 0) {
> + ret = -errno;
> + goto done;
> + }
> +
> + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
> + nh = NLMSG_NEXT(nh, len)) {
> + if (nh->nlmsg_pid != nl_pid) {
> + ret = -LIBBPF_ERRNO__WRNGPID;
> goto done;
> }
> -
> - for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
> - nh = NLMSG_NEXT(nh, len)) {
> - if (nh->nlmsg_pid != nl_pid) {
> - ret = -LIBBPF_ERRNO__WRNGPID;
> - goto done;
> - }
> - if (nh->nlmsg_seq != seq) {
> - ret = -LIBBPF_ERRNO__INVSEQ;
> - goto done;
> - }
> - switch (nh->nlmsg_type) {
> - case NLMSG_ERROR:
> - err = (struct nlmsgerr *)NLMSG_DATA(nh);
> - if (!err->error)
> - continue;
> - ret = err->error;
> - nla_dump_errormsg(nh);
> - goto done;
> - case NLMSG_DONE:
> - return 0;
> - default:
> - break;
> - }
> - if (_fn) {
> - ret = _fn(nh, fn, cookie);
> - if (ret)
> - return ret;
> - }
> + if (nh->nlmsg_seq != seq) {
> + ret = -LIBBPF_ERRNO__INVSEQ;
> + goto done;
> + }
> + switch (nh->nlmsg_type) {
> + case NLMSG_ERROR:
> + err = (struct nlmsgerr *)NLMSG_DATA(nh);
> + if (!err->error)
> + continue;
> + ret = err->error;
> + nla_dump_errormsg(nh);
> + goto done;
> + case NLMSG_DONE:
> + return 0;
> + default:
> + break;
> + }
> + if (_fn) {
> + ret = _fn(nh, fn, cookie);
> + if (ret)
> + return ret;
> }
> }
> ret = 0;
>
^ permalink raw reply related
* Re: [PATCH net-next v3 05/17] zinc: ChaCha20 x86_64 implementation
From: Eric Biggers @ 2018-09-11 21:48 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: Thomas Gleixner, Samuel Neves, LKML, Netdev, David Miller,
Greg Kroah-Hartman, Andrew Lutomirski, Jean-Philippe Aumasson,
Andy Polyakov, mingo, X86 ML, Linux Crypto Mailing List
In-Reply-To: <CAHmME9qPk7iOVspOJOhN0M+C=v+i6GNq9XkDnXXJ=puoFV0Gfg@mail.gmail.com>
On Tue, Sep 11, 2018 at 03:12:14PM -0600, Jason A. Donenfeld wrote:
> Hey Thomas,
>
> On Tue, Sep 11, 2018 at 3:09 AM Thomas Gleixner <tglx@linutronix.de> wrote:
> > > This is covered on the 02/17 commit message, whose relevant paragraph follows:
> > Well, being only cc'ed on only half of the patches does not really help.
>
> Sorry about that. I'm happy to Cc you on the whole series for next
> round if you want. Just let me know. In the meantime:
>
> https://patchwork.ozlabs.org/project/netdev/list/?series=65056
> https://lore.kernel.org/lkml/20180911010838.8818-1-Jason@zx2c4.com/T/
>
> Regards,
> Jason
Please Cc linux-crypto on the whole patch series, including the cover letter.
- Eric
^ permalink raw reply
* Re: kernels > v4.12 oops/crash with ipsec-traffic: bisected to b838d5e1c5b6e57b10ec8af2268824041e3ea911: ipv4: mark DST_NOGC and remove the operation of dst_free()
From: Wolfgang Walter @ 2018-09-11 16:53 UTC (permalink / raw)
To: Steffen Klassert
Cc: Kristian Evensen, Network Development, weiwan, Tobias Hommel,
edumazet
In-Reply-To: <20180911103334.GY23674@gauss3.secunet.de>
Am Dienstag, 11. September 2018, 12:33:34 schrieb Steffen Klassert:
> On Mon, Sep 10, 2018 at 10:18:47AM +0200, Kristian Evensen wrote:
> > Hi,
> >
> > Thanks everyone for all the effort in debugging this issue.
> >
> > On Mon, Sep 10, 2018 at 8:39 AM Steffen Klassert
> >
> > <steffen.klassert@secunet.com> wrote:
> > > The easy fix that could be backported to stable would be
> > > to check skb->dst for NULL and drop the packet in that case.
> >
> > Thought I should just chime in and say that we deployed this
> > work-around when we started observing the error back in June. Since
> > then we have not seen any crashes. Also, we have instrumented some of
> > our kernels to count the number of times the error is hit (overall +
> > consecutive). Compared to the overall number of packets, the error
> > happens very rarely. With our workloads, we on average see the error
> > once every couple of days.
>
> Thanks for letting us know!
>
> I plan to fix this in the ipsec tree with:
>
> Subject: [PATCH RFC] xfrm: Fix NULL pointer dereference when skb_dst_force
> clears the dst_entry.
>
> Since commit 222d7dbd258d ("net: prevent dst uses after free")
> skb_dst_force() might clear the dst_entry attached to the skb.
> The xfrm code don't expect this to happen, so we crash with
> a NULL pointer dereference in this case. Fix it by checking
> skb_dst(skb) for NULL after skb_dst_force() and drop the packet
> in cast the dst_entry was cleared.
>
> Fixes: 222d7dbd258d ("net: prevent dst uses after free")
> Reported-by: Tobias Hommel <netdev-list@genoetigt.de>
> Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
> Reported-by: Wolfgang Walter <linux@stwm.de>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> ---
> net/xfrm/xfrm_output.c | 4 ++++
> net/xfrm/xfrm_policy.c | 4 ++++
> 2 files changed, 8 insertions(+)
>
> diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
> index 89b178a78dc7..36d15a38ce5e 100644
> --- a/net/xfrm/xfrm_output.c
> +++ b/net/xfrm/xfrm_output.c
> @@ -101,6 +101,10 @@ static int xfrm_output_one(struct sk_buff *skb, int
> err) spin_unlock_bh(&x->lock);
>
> skb_dst_force(skb);
> + if (!skb_dst(skb)) {
> + XFRM_INC_STATS(net, LINUX_MIB_XFRMOUTERROR);
> + goto error_nolock;
> + }
>
> if (xfrm_offload(skb)) {
> x->type_offload->encap(x, skb);
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 7c5e8978aeaa..626e0f4d1749 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -2548,6 +2548,10 @@ int __xfrm_route_forward(struct sk_buff *skb,
> unsigned short family) }
>
> skb_dst_force(skb);
> + if (!skb_dst(skb)) {
> + XFRM_INC_STATS(net, LINUX_MIB_XFRMFWDHDRERROR);
> + return 0;
> + }
>
> dst = xfrm_lookup(net, skb_dst(skb), &fl, NULL, XFRM_LOOKUP_QUEUE);
> if (IS_ERR(dst)) {
This patch fixes the problem here.
XfrmFwdHdrError gets around 80 at the very beginning and remains so. Probably
this happens when some route are changed/set then.
Regards and thanks,
--
Wolfgang Walter
Studentenwerk München
Anstalt des öffentlichen Rechts
^ permalink raw reply
* Re: [PATCH net-next v3 17/17] net: WireGuard secure network tunnel
From: Andrew Lunn @ 2018-09-11 21:55 UTC (permalink / raw)
To: Jason A. Donenfeld; +Cc: LKML, Netdev, David Miller, Greg Kroah-Hartman
In-Reply-To: <CAHmME9pwVSeE=4aUD6RKLjE+DrHY1=3OtBS7sAvdR66gaNhNtA@mail.gmail.com>
> Most uses of inline are this way after checking the compiler output on
> several platforms. (Generally my build-run-test cycle is actually
> build-IDAPro-run-test.) If you feel that's been done too liberally,
> I'm happy to spend time rechecking each use case.
Hi Jason
I really expect DaveM is going to come along and tell you to remove
them all.
You need a strong argument that they are needed, since this goes
against the general consensus in the kernel, that the compiler is
better at this than humans.
Andrew
^ permalink raw reply
* Editing 4
From: Aaron @ 2018-09-11 6:22 UTC (permalink / raw)
To: netdev
Hi,
If you have photos for editing, please send email to: hansrekan@outlook.com
We have 12 in house image editors and we can help you for cutting out your
photos, or path the photos.
Includes retouching if needed.
Used for products photos or portrait photos, catalog photos.
You may drop us one photo, we can send you the testing work.
Thanks,
Aaron Williams
Email: hansrekan@outlook.com
^ permalink raw reply
* Editing 4
From: Aaron @ 2018-09-11 6:12 UTC (permalink / raw)
To: netdev
Hi,
If you have photos for editing, please send email to: hansrekan@outlook.com
We have 12 in house image editors and we can help you for cutting out your
photos, or path the photos.
Includes retouching if needed.
Used for products photos or portrait photos, catalog photos.
You may drop us one photo, we can send you the testing work.
Thanks,
Aaron Williams
Email: hansrekan@outlook.com
^ permalink raw reply
* Re: [RFC] managing PHY carrier from user space
From: Florian Fainelli @ 2018-09-11 16:56 UTC (permalink / raw)
To: Joakim Tjernlund, netdev@vger.kernel.org, andrew@lunn.ch
In-Reply-To: <2fcf4c20a9adf410db98706f6f2508e46aade2c1.camel@infinera.com>
On 09/11/2018 09:41 AM, Joakim Tjernlund wrote:
> I am looking for a way to induce carrier state from user space, primarily
> for Fixed PHYs as these are always up. ifplugd/dhcp etc. does not behave properly
> if the link is up when it really isn't.
Was my suggestion in my email to you somehow not working? This is
obviously not acceptable for upstream, there is no reason, even for a
fixed PHY, to attempt to mangle with the carrier state for any
reasonable production purposes.
>
> I came up with a new 'phy_carrier' attribute in /sys/class/net/eth0/phydev
> where I can induce carrier state:
>
> diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> index a1e7ea4d4b16..f82beeabdd75 100644
> --- a/drivers/net/phy/phy_device.c
> +++ b/drivers/net/phy/phy_device.c
> @@ -612,10 +612,39 @@ phy_has_fixups_show(struct device *dev, struct device_attribute *attr,
> }
> static DEVICE_ATTR_RO(phy_has_fixups);
>
> +static ssize_t
> +phy_carrier_show(struct device *dev, struct device_attribute *attr,
> + char *buf)
> +{
> + struct phy_device *phydev = to_phy_device(dev);
> + struct net_device *netdev = phydev->attached_dev;
> +
> + return sprintf(buf, "%d\n", netif_carrier_ok(netdev));
> +}
> +
> +static ssize_t phy_carrier_store(struct device *dev, struct device_attribute *attr,
> + const char *buf, size_t len)
> +{
> + struct phy_device *phydev = to_phy_device(dev);
> + struct net_device *netdev = phydev->attached_dev;
> + bool enable;
> +
> + if (strtobool(buf, &enable))
> + return -EINVAL;
> +
> + if (enable)
> + netif_carrier_on(netdev);
> + else
> + netif_carrier_off(netdev);
> + return len;
> +}
> +static DEVICE_ATTR_RW(phy_carrier);
> +
> static struct attribute *phy_dev_attrs[] = {
> &dev_attr_phy_id.attr,
> &dev_attr_phy_interface.attr,
> &dev_attr_phy_has_fixups.attr,
> + &dev_attr_phy_carrier.attr,
> NULL,
> };
> ATTRIBUTE_GROUPS(phy_dev);
>
> I would like to know if this acceptable for linux proper or if there is
> a better way to do this?
>
> Jocke
>
--
Florian
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Jason A. Donenfeld @ 2018-09-11 22:02 UTC (permalink / raw)
To: Eric Biggers
Cc: Greg Kroah-Hartman, Ard Biesheuvel, LKML, Netdev, David Miller,
Andrew Lutomirski, Samuel Neves, Jean-Philippe Aumasson,
Linux Crypto Mailing List
In-Reply-To: <20180911214737.GA81235@gmail.com>
On Tue, Sep 11, 2018 at 3:47 PM Eric Biggers <ebiggers@kernel.org> wrote:
> Of course, the real problem is that even after multiple revisions of this
> patchset, there's still no actual conversions of the existing crypto API
> algorithms over to use the new implementations. "Zinc" is still completely
> separate from the existing crypto API.
No this is not, "the real problem [...] after multiple revisions"
because I've offered to do this and stated pretty clearly my intent to
do so. But, as I've mentioned before, I'd really prefer to land this
series through net-next, and then after we can turn our attention to
integrating this into the existing crypto API. This series is already
big enough and I would really prefer not to further complicate it.
So, what you want is going to happen. There isn't some kind of
fundamental problem here. This is more of a discussion on
scheduling/trees/mergecycles than anything else.
^ permalink raw reply
* Re: [PATCH net-next v3 05/17] zinc: ChaCha20 x86_64 implementation
From: Jason A. Donenfeld @ 2018-09-11 22:04 UTC (permalink / raw)
To: Eric Biggers
Cc: Thomas Gleixner, Samuel Neves, LKML, Netdev, David Miller,
Greg Kroah-Hartman, Andrew Lutomirski, Jean-Philippe Aumasson,
Andy Polyakov, mingo, X86 ML, Linux Crypto Mailing List
In-Reply-To: <20180911214837.GB81235@gmail.com>
On Tue, Sep 11, 2018 at 3:48 PM Eric Biggers <ebiggers@kernel.org> wrote:
> Please Cc linux-crypto on the whole patch series, including the cover letter.
Ack.
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Eric Biggers @ 2018-09-11 22:08 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Jason A. Donenfeld, Linux Kernel Mailing List,
<netdev@vger.kernel.org>, David S. Miller,
Greg Kroah-Hartman, Andy Lutomirski, Samuel Neves,
Jean-Philippe Aumasson,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE
In-Reply-To: <CAKv+Gu9nSeo4PFVrtjtqGQ32eqyxUp3xVCczZfu+dhFz5yLM0A@mail.gmail.com>
On Tue, Sep 11, 2018 at 12:08:56PM +0200, Ard Biesheuvel wrote:
>
> I won't go into the 1000s lines of generated assembly again - you
> already know my position on that topic.
>
I'd strongly prefer the assembly to be readable too. Jason, I'm not sure if
you've actually read through the asm from the OpenSSL implementations, but the
generated .S files actually do lose a lot of semantic information that was in
the original .pl scripts. For example, in the Poly1305 NEON implementation
which I'm especially interested in (but you could check any of the other
generated files too), the original .pl script has register aliases showing the
meaning of each register. Just grabbing a random hunk:
vshr.u64 $T0,$D3,#26
vmovn.i64 $D3#lo,$D3
vshr.u64 $T1,$D0,#26
vmovn.i64 $D0#lo,$D0
vadd.i64 $D4,$D4,$T0 @ h3 -> h4
vbic.i32 $D3#lo,#0xfc000000
vsri.u32 $H4,$H3,#8 @ base 2^32 -> base 2^26
vadd.i64 $D1,$D1,$T1 @ h0 -> h1
vshl.u32 $H3,$H3,#18
vbic.i32 $D0#lo,#0xfc000000
(Yes, it's still not *that* readable, but D0-D4 and H0-H4 map directly to d0-d4
and h0-h4 in the C implementation. So someone familiar with Poly1305
implementations can figure it out.)
In contrast, the generated .S file just has the raw registers. It's difficult
to remember what each register is used for. In fact, someone who actually
wanted to figure it out would probably find themselves referring to the .pl
script -- which raises the question of why the .S file is the "source" and not
the .pl script...
vshr.u64 q15,q8,#26
vmovn.i64 d16,q8
vshr.u64 q4,q5,#26
vmovn.i64 d10,q5
vadd.i64 q9,q9,q15 @ h3 -> h4
vbic.i32 d16,#0xfc000000
vsri.u32 q14,q13,#8 @ base 2^32 -> base 2^26
vadd.i64 q6,q6,q4 @ h0 -> h1
vshl.u32 q13,q13,#18
vbic.i32 d10,#0xfc000000
- Eric
^ permalink raw reply
* Re: [PATCH] net/core/filter: fix unused-variable warning
From: Alexei Starovoitov @ 2018-09-11 22:13 UTC (permalink / raw)
To: Anders Roxell; +Cc: davem, ast, daniel, tehnerd, netdev, linux-kernel
In-Reply-To: <20180907125005.32383-1-anders.roxell@linaro.org>
On Fri, Sep 07, 2018 at 02:50:05PM +0200, Anders Roxell wrote:
> Building with CONFIG_INET=n will show the warning below:
> net/core/filter.c: In function ‘____bpf_getsockopt’:
> net/core/filter.c:4048:19: warning: unused variable ‘tp’ [-Wunused-variable]
> struct tcp_sock *tp;
> ^~
> net/core/filter.c:4046:31: warning: unused variable ‘icsk’ [-Wunused-variable]
> struct inet_connection_sock *icsk;
> ^~~~
> Move the variable declarations inside the {} block where they are used.
>
> Fixes: 1e215300f138 ("bpf: add TCP_SAVE_SYN/TCP_SAVED_SYN options for bpf_(set|get)sockopt")
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> ---
> net/core/filter.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index d301134bca3a..0ae7185b2207 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4043,14 +4043,14 @@ static const struct bpf_func_proto bpf_setsockopt_proto = {
> BPF_CALL_5(bpf_getsockopt, struct bpf_sock_ops_kern *, bpf_sock,
> int, level, int, optname, char *, optval, int, optlen)
> {
> - struct inet_connection_sock *icsk;
> struct sock *sk = bpf_sock->sk;
> - struct tcp_sock *tp;
>
> if (!sk_fullsock(sk))
> goto err_clear;
> #ifdef CONFIG_INET
> if (level == SOL_TCP && sk->sk_prot->getsockopt == tcp_getsockopt) {
> + struct inet_connection_sock *icsk;
> + struct tcp_sock *tp;
> switch (optname) {
I've added empty line after variable decl and applied.
^ permalink raw reply
* Re: tools/bpf regression causing samples/bpf/ to hang
From: Björn Töpel @ 2018-09-11 17:15 UTC (permalink / raw)
To: yhs; +Cc: Netdev, ast, Daniel Borkmann, Jesper Dangaard Brouer
In-Reply-To: <f17b2896-6191-b929-593e-20ae090d4d56@fb.com>
Den tis 11 sep. 2018 kl 18:47 skrev Yonghong Song <yhs@fb.com>:
>
>
>
> On 9/11/18 4:11 AM, Björn Töpel wrote:
> > Hi Yonghong, I tried to run the XDP samples from the bpf-next tip
> > today, and was hit by a regression.
> >
> > Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
> > functions into a new file") adds a while(1) around the recv call in
> > bpf_set_link_xdp_fd making that call getting stuck in an infinite
> > loop.
> >
> > I simply removed the loop, and that solved my problem (patch below).
> >
> > However, I don't know if removing the loop would break bpftool for
> > you. If not, I can submit the patch as a proper one for bpf-next.
>
> Hi, Björn, thanks for reporting the problem.
> The while loop is needed since the "recv" syscall buffer size
> may not be big enough to hold all the returned information, in
> which cases, multiple "recv" calls are needed.
>
> Could you try the following patch to see whether it fixed your
> issue? Thanks!
>
Nope, it doesn't -- but if you move that hunk after the for-loop it works.
Cheers,
Björn
> commit 3eb1c0249dfc3ea4ad61aa223dce32262af7e049 (HEAD -> fix)
> Author: Yonghong Song <yhs@fb.com>
> Date: Tue Sep 11 08:58:20 2018 -0700
>
> tools/bpf: fix a netlink recv issue
>
> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
> functions into a new file") introduced a while loop for the
> netlink recv path. This while loop is needed since the
> buffer in recv syscall may not be big enough to hold all the
> information and in such cases multiple recv calls are needed.
>
> When netlink recv returns message length of 0, there will be
> no more messages for returning data so the while loop
> can end.
>
> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
> functions into a new file")
> Reported-by: Björn Töpel <bjorn.topel@intel.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
>
> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> index 469e068dd0c5..37827319a50a 100644
> --- a/tools/lib/bpf/netlink.c
> +++ b/tools/lib/bpf/netlink.c
> @@ -77,6 +77,9 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid,
> int seq,
> goto done;
> }
>
> + if (len == 0)
> + break;
> +
> for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
> nh = NLMSG_NEXT(nh, len)) {
> if (nh->nlmsg_pid != nl_pid) {
>
>
> >
> > Thanks!
> > Björn
> >
> > From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
> > Date: Tue, 11 Sep 2018 12:35:44 +0200
> > Subject: [PATCH] tools/bpf: remove loop around netlink recv
> >
> > Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
> > functions into a new file") moved the bpf_set_link_xdp_fd and split it
> > up into multiple functions. The added receive function
> > bpf_netlink_recv added a loop around the recv syscall leading to
> > multiple recv calls. This resulted in all XDP samples in the
> > samples/bpf/ to stop working, since they were stuck in a blocking
> > recv.
> >
> > This commits removes the while (1)-statement.
> >
> > Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
> > functions into a new file")
> > Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> > ---
> > tools/lib/bpf/netlink.c | 64 ++++++++++++++++++++---------------------
> > 1 file changed, 31 insertions(+), 33 deletions(-)
> >
> > diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> > index 469e068dd0c5..0eae1fbf46c6 100644
> > --- a/tools/lib/bpf/netlink.c
> > +++ b/tools/lib/bpf/netlink.c
> > @@ -70,41 +70,39 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
> > char buf[4096];
> > int len, ret;
> >
> > - while (1) {
> > - len = recv(sock, buf, sizeof(buf), 0);
> > - if (len < 0) {
> > - ret = -errno;
> > + len = recv(sock, buf, sizeof(buf), 0);
> > + if (len < 0) {
> > + ret = -errno;
> > + goto done;
> > + }
> > +
> > + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
> > + nh = NLMSG_NEXT(nh, len)) {
> > + if (nh->nlmsg_pid != nl_pid) {
> > + ret = -LIBBPF_ERRNO__WRNGPID;
> > goto done;
> > }
> > -
> > - for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
> > - nh = NLMSG_NEXT(nh, len)) {
> > - if (nh->nlmsg_pid != nl_pid) {
> > - ret = -LIBBPF_ERRNO__WRNGPID;
> > - goto done;
> > - }
> > - if (nh->nlmsg_seq != seq) {
> > - ret = -LIBBPF_ERRNO__INVSEQ;
> > - goto done;
> > - }
> > - switch (nh->nlmsg_type) {
> > - case NLMSG_ERROR:
> > - err = (struct nlmsgerr *)NLMSG_DATA(nh);
> > - if (!err->error)
> > - continue;
> > - ret = err->error;
> > - nla_dump_errormsg(nh);
> > - goto done;
> > - case NLMSG_DONE:
> > - return 0;
> > - default:
> > - break;
> > - }
> > - if (_fn) {
> > - ret = _fn(nh, fn, cookie);
> > - if (ret)
> > - return ret;
> > - }
> > + if (nh->nlmsg_seq != seq) {
> > + ret = -LIBBPF_ERRNO__INVSEQ;
> > + goto done;
> > + }
> > + switch (nh->nlmsg_type) {
> > + case NLMSG_ERROR:
> > + err = (struct nlmsgerr *)NLMSG_DATA(nh);
> > + if (!err->error)
> > + continue;
> > + ret = err->error;
> > + nla_dump_errormsg(nh);
> > + goto done;
> > + case NLMSG_DONE:
> > + return 0;
> > + default:
> > + break;
> > + }
> > + if (_fn) {
> > + ret = _fn(nh, fn, cookie);
> > + if (ret)
> > + return ret;
> > }
> > }
> > ret = 0;
> >
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Andy Lutomirski @ 2018-09-11 22:16 UTC (permalink / raw)
To: Eric Biggers
Cc: Greg Kroah-Hartman, Ard Biesheuvel, Jason A. Donenfeld,
Linux Kernel Mailing List, <netdev@vger.kernel.org>,
David S. Miller, Andy Lutomirski, Samuel Neves,
Jean-Philippe Aumasson,
open list:HARDWARE RANDOM NUMBER GENERATOR CORE
In-Reply-To: <20180911214737.GA81235@gmail.com>
> On Sep 11, 2018, at 2:47 PM, Eric Biggers <ebiggers@kernel.org> wrote:
>
>> On Tue, Sep 11, 2018 at 04:56:24PM +0200, Greg Kroah-Hartman wrote:
>> On Tue, Sep 11, 2018 at 12:08:56PM +0200, Ard Biesheuvel wrote:
>>>> As Zinc is simply library code, its config options are un-menued, with
>>>> the exception of CONFIG_ZINC_DEBUG, which enables various selftests and
>>>> BUG_ONs.
>>>>
>>>
>>> In spite of the wall of text, you fail to point out exactly why the
>>> existing AEAD API in unsuitable, and why fixing it is not an option.
>>>
>>> As I pointed out in a previous version, I don't think we need a
>>> separate crypto API/library in the kernel, and I don't think you have
>>> convinced anyone else yet either.
>>
>> Um, then why do people keep sprinkling new crypto/hash code all around
>> the kernel tree? It's because what we have as a crypto api is complex
>> and is hard to use for many in-kernel users.
>>
>> Something like this new interface (zinc) is a much "saner" api for
>> writing kernel code that has to interact with crypto/hash primitives.
>>
>> I see no reason why the existing crypto code can be redone to use the
>> zinc crypto primitives over time, making there only be one main location
>> for the crypto algorithms. But to do it the other way around is pretty
>> much impossible given the complexities in the existing api that has been
>> created over time.
>>
>> Not to say that the existing api is not a viable one, but ugh, really?
>> You have to admit it is a pain to try to use in any "normal" type of
>> "here's a bytestream, go give me a hash" type of method, right?
>>
>> Also there is the added benefit that the crypto primitives here have
>> been audited by a number of people (so Jason stated), and they are
>> written in a way that the crypto community can more easily interact and
>> contribute to. Which is _way_ better than what we have today.
>>
>> So this gets my "stamp of approval" for whatever it is worth :)
>>
>
> I think you mean you see no reason why it *cannot* be converted? The
> conversions definitely *should* be done, just like how some of the existing
> crypto API algorithms like SHA-256 already wrap implementations in lib/. In my
> view, lib/zinc/ isn't fundamentally different from what we already have for some
> algorithms. So it's misguided to design/present it as some novel thing, which
> unfortunately this patchset still does to a large extent. (The actual new thing
> is how architecture-specific implementations are handled.)
>
> Of course, the real problem is that even after multiple revisions of this
> patchset, there's still no actual conversions of the existing crypto API
> algorithms over to use the new implementations. "Zinc" is still completely
> separate from the existing crypto API.
>
Jason, can you do one of these conversions as an example?
> So, it's not yet clear that the conversions will actually work out without
> problems that would require changes in "Zinc". I don't think it makes sense to
> merge all this stuff without doing the conversions, or at the very least
> demonstrating how they will be done.
>
> In particular, in its current form "Zinc" is useless for anyone that needs the
> existing crypto API. For example, for HPolyC,
> (https://lkml.org/lkml/2018/8/6/857), I need to make improvements to ChaCha and
> Poly1305 in the existing crypto API, e.g. to add support for XChaCha and
> NEON-accelerated Poly1305. Having completely separate ChaCha and Poly1305
> implementations in Zinc doesn't help at all. If anything, it makes things
> harder because people will have to review/maintain both sets of implementations;
> and when trying to make the improvements I need, I'll find myself in the middle
> of a holy war between two competing camps who each have their own opinion about
> The Right Way To Do Crypto, and their own crypto implementations and APIs in the
> kernel.
>
> - Eric
^ permalink raw reply
* Re: [PATCH RESEND net-next] docs: net: Convert tcp.txt to RST format
From: Tobin C. Harding @ 2018-09-11 22:25 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Eric Dumazet, netdev, linux-doc, linux-kernel
In-Reply-To: <7f50abcd-f2fa-c043-9e5b-8e5ca89c52a1@gmail.com>
On Tue, Sep 11, 2018 at 03:30:19AM -0700, Eric Dumazet wrote:
>
>
> On 09/11/2018 12:13 AM, Tobin C. Harding wrote:
> > Restructured text is the kernel documentation format of choice now.
> > Some text from tcp.txt is out of date, specifically the function
> > tcp_write() does not appear to be in the tree anymore. Also the
> > following data members have been removed
> >
> > sk->tcp_pend_event
> > sk->transmit_queue
> > sk->transmit_new
> > sk->transmit_end
> > sk->tcp_last_tx_ack
> > sk->tcp_dup_ack
> >
> > Remove section 'How the new TCP output machine [nyi] works'. This
> > leaves only a single section so we can name the document with that
> > section heading now.
> >
> > Convert tcp.txt to RST format. Add GPLv2 SPDX tag.
> >
> > Signed-off-by: Tobin C. Harding <me@tobin.cc>
> >
>
> I dunno, this 'doc' is probably useless and should be deleted.
Cool! The best type of patch - line removal only. I'll put it in.
thanks,
Tobin.
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Andy Lutomirski @ 2018-09-11 23:01 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: Eric Biggers, Greg Kroah-Hartman, Ard Biesheuvel, LKML, Netdev,
David Miller, Andrew Lutomirski, Samuel Neves,
Jean-Philippe Aumasson, Linux Crypto Mailing List
In-Reply-To: <CAHmME9q+mqT+vtM1FdBvf8=nAFu=u+wN48sOxqLcFthxYVn7NQ@mail.gmail.com>
> On Sep 11, 2018, at 3:18 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
>> On Tue, Sep 11, 2018 at 4:16 PM Andy Lutomirski <luto@amacapital.net> wrote:
>> Jason, can you do one of these conversions as an example?
>
> My preference is really to leave that to a different patch series. But
> if you think I *must*, then I shall.
>
>
I think Ard’s point is valid: in the long run we don’t want two competing software implementations of each primitive. It clearly *should* be possible to make crypto API call into zinc for synchronous software operations, but a demonstration of how this actually works and that there isn’t some change to zinc to make it would well would be in order, I think.
IMO the right approach is do one conversion right away and save the rest for later.
^ permalink raw reply
* Re: bpf: btf: Change how section is supported in btf_header
From: Martin KaFai Lau @ 2018-09-11 18:10 UTC (permalink / raw)
To: Dmitry Vyukov; +Cc: Daniel Borkmann, Alexei Starovoitov, netdev
In-Reply-To: <CACT4Y+YzC15pw=30uS3Vo27-cZp=V=CdnG5vYDYrULVLKvj=YQ@mail.gmail.com>
On Tue, Sep 11, 2018 at 06:40:05PM +0200, Dmitry Vyukov wrote:
> Hi Martin,
>
> I am looking at the subj commit:
>
> static int btf_add_type(struct btf_verifier_env *env, struct btf_type *t)
> @@ -1754,9 +1756,9 @@ static int btf_check_all_metas(struct
> btf_verifier_env *env)
> struct btf_header *hdr;
> void *cur, *end;
>
> - hdr = btf->hdr;
> + hdr = &btf->hdr;
> cur = btf->nohdr_data + hdr->type_off;
> - end = btf->nohdr_data + hdr->str_off;
> + end = btf->nohdr_data + hdr->type_len;
>
> Shouldn't this be:
>
> + end = cur + hdr->type_len;
>
> ? Or otherwise I am having trouble understanding meaning of fields.
You are correct. Thanks for pointing this out.
Do you want to post an offical patch for the bpf branch?
>
> On a related note, what's between header and type_off? Is type_off
> supposed to be 0 always?
type section is always the first section for now (i.e. immediately after
the header). Some other sections could be introduced later and it could
be located before the type section such that the type_off will not be 0.
^ permalink raw reply
* Re: tools/bpf regression causing samples/bpf/ to hang
From: Yonghong Song @ 2018-09-11 18:21 UTC (permalink / raw)
To: Björn Töpel
Cc: Netdev, ast, Daniel Borkmann, Jesper Dangaard Brouer
In-Reply-To: <CAJ+HfNhXJbhp3z+fL-3iXJD-5yFtEL0P1OmdsTkmZHUMviuAcQ@mail.gmail.com>
On 9/11/18 10:15 AM, Björn Töpel wrote:
> Den tis 11 sep. 2018 kl 18:47 skrev Yonghong Song <yhs@fb.com>:
>>
>>
>>
>> On 9/11/18 4:11 AM, Björn Töpel wrote:
>>> Hi Yonghong, I tried to run the XDP samples from the bpf-next tip
>>> today, and was hit by a regression.
>>>
>>> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
>>> functions into a new file") adds a while(1) around the recv call in
>>> bpf_set_link_xdp_fd making that call getting stuck in an infinite
>>> loop.
>>>
>>> I simply removed the loop, and that solved my problem (patch below).
>>>
>>> However, I don't know if removing the loop would break bpftool for
>>> you. If not, I can submit the patch as a proper one for bpf-next.
>>
>> Hi, Björn, thanks for reporting the problem.
>> The while loop is needed since the "recv" syscall buffer size
>> may not be big enough to hold all the returned information, in
>> which cases, multiple "recv" calls are needed.
>>
>> Could you try the following patch to see whether it fixed your
>> issue? Thanks!
>>
>
> Nope, it doesn't -- but if you move that hunk after the for-loop it works.
Could you try this patch?
commit 9a7fb19899ce87594fe8012f8a23fc8fc7b6b764 (HEAD -> fix)
Author: Yonghong Song <yhs@fb.com>
Date: Tue Sep 11 08:58:20 2018 -0700
tools/bpf: fix a netlink recv issue
Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
functions into a new file") introduced a while loop for the
netlink recv path. This while loop is needed since the
buffer in recv syscall may not be enough to hold all the
information and in such cases multiple recv calls are needed.
There is a bug introduced by the above commit as
the while loop may block on recv syscall if there is no
more messages are expected. The netlink message header
flag NLM_F_MULTI is used to indicate that more messages
are expected and this patch fixed the bug by doing
further recv syscall only if multipart message is expected.
The patch added another fix regarding to message length of 0.
When netlink recv returns message length of 0, there will be
no more messages for returning data so the while loop
can end.
Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
functions into a new file")
Reported-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 469e068dd0c5..fde1d7bf8199 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -65,18 +65,23 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid,
int seq,
__dump_nlmsg_t _fn, dump_nlmsg_t fn,
void *cookie)
{
+ bool multipart = true;
struct nlmsgerr *err;
struct nlmsghdr *nh;
char buf[4096];
int len, ret;
- while (1) {
+ while (multipart) {
+ multipart = false;
len = recv(sock, buf, sizeof(buf), 0);
if (len < 0) {
ret = -errno;
goto done;
}
+ if (len == 0)
+ break;
+
for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
nh = NLMSG_NEXT(nh, len)) {
if (nh->nlmsg_pid != nl_pid) {
@@ -87,6 +92,8 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid,
int seq,
ret = -LIBBPF_ERRNO__INVSEQ;
goto done;
}
+ if (nh->nlmsg_flags & NLM_F_MULTI)
+ multipart = true;
switch (nh->nlmsg_type) {
case NLMSG_ERROR:
err = (struct nlmsgerr *)NLMSG_DATA(nh);
>
> Cheers,
> Björn
>
>> commit 3eb1c0249dfc3ea4ad61aa223dce32262af7e049 (HEAD -> fix)
>> Author: Yonghong Song <yhs@fb.com>
>> Date: Tue Sep 11 08:58:20 2018 -0700
>>
>> tools/bpf: fix a netlink recv issue
>>
>> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
>> functions into a new file") introduced a while loop for the
>> netlink recv path. This while loop is needed since the
>> buffer in recv syscall may not be big enough to hold all the
>> information and in such cases multiple recv calls are needed.
>>
>> When netlink recv returns message length of 0, there will be
>> no more messages for returning data so the while loop
>> can end.
>>
>> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
>> functions into a new file")
>> Reported-by: Björn Töpel <bjorn.topel@intel.com>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>>
>> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
>> index 469e068dd0c5..37827319a50a 100644
>> --- a/tools/lib/bpf/netlink.c
>> +++ b/tools/lib/bpf/netlink.c
>> @@ -77,6 +77,9 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid,
>> int seq,
>> goto done;
>> }
>>
>> + if (len == 0)
>> + break;
>> +
>> for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
>> nh = NLMSG_NEXT(nh, len)) {
>> if (nh->nlmsg_pid != nl_pid) {
>>
>>
>>>
>>> Thanks!
>>> Björn
>>>
>>> From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
>>> Date: Tue, 11 Sep 2018 12:35:44 +0200
>>> Subject: [PATCH] tools/bpf: remove loop around netlink recv
>>>
>>> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
>>> functions into a new file") moved the bpf_set_link_xdp_fd and split it
>>> up into multiple functions. The added receive function
>>> bpf_netlink_recv added a loop around the recv syscall leading to
>>> multiple recv calls. This resulted in all XDP samples in the
>>> samples/bpf/ to stop working, since they were stuck in a blocking
>>> recv.
>>>
>>> This commits removes the while (1)-statement.
>>>
>>> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
>>> functions into a new file")
>>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
>>> ---
>>> tools/lib/bpf/netlink.c | 64 ++++++++++++++++++++---------------------
>>> 1 file changed, 31 insertions(+), 33 deletions(-)
>>>
>>> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
>>> index 469e068dd0c5..0eae1fbf46c6 100644
>>> --- a/tools/lib/bpf/netlink.c
>>> +++ b/tools/lib/bpf/netlink.c
>>> @@ -70,41 +70,39 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
>>> char buf[4096];
>>> int len, ret;
>>>
>>> - while (1) {
>>> - len = recv(sock, buf, sizeof(buf), 0);
>>> - if (len < 0) {
>>> - ret = -errno;
>>> + len = recv(sock, buf, sizeof(buf), 0);
>>> + if (len < 0) {
>>> + ret = -errno;
>>> + goto done;
>>> + }
>>> +
>>> + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
>>> + nh = NLMSG_NEXT(nh, len)) {
>>> + if (nh->nlmsg_pid != nl_pid) {
>>> + ret = -LIBBPF_ERRNO__WRNGPID;
>>> goto done;
>>> }
>>> -
>>> - for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
>>> - nh = NLMSG_NEXT(nh, len)) {
>>> - if (nh->nlmsg_pid != nl_pid) {
>>> - ret = -LIBBPF_ERRNO__WRNGPID;
>>> - goto done;
>>> - }
>>> - if (nh->nlmsg_seq != seq) {
>>> - ret = -LIBBPF_ERRNO__INVSEQ;
>>> - goto done;
>>> - }
>>> - switch (nh->nlmsg_type) {
>>> - case NLMSG_ERROR:
>>> - err = (struct nlmsgerr *)NLMSG_DATA(nh);
>>> - if (!err->error)
>>> - continue;
>>> - ret = err->error;
>>> - nla_dump_errormsg(nh);
>>> - goto done;
>>> - case NLMSG_DONE:
>>> - return 0;
>>> - default:
>>> - break;
>>> - }
>>> - if (_fn) {
>>> - ret = _fn(nh, fn, cookie);
>>> - if (ret)
>>> - return ret;
>>> - }
>>> + if (nh->nlmsg_seq != seq) {
>>> + ret = -LIBBPF_ERRNO__INVSEQ;
>>> + goto done;
>>> + }
>>> + switch (nh->nlmsg_type) {
>>> + case NLMSG_ERROR:
>>> + err = (struct nlmsgerr *)NLMSG_DATA(nh);
>>> + if (!err->error)
>>> + continue;
>>> + ret = err->error;
>>> + nla_dump_errormsg(nh);
>>> + goto done;
>>> + case NLMSG_DONE:
>>> + return 0;
>>> + default:
>>> + break;
>>> + }
>>> + if (_fn) {
>>> + ret = _fn(nh, fn, cookie);
>>> + if (ret)
>>> + return ret;
>>> }
>>> }
>>> ret = 0;
>>>
^ permalink raw reply related
* Re: [PATCH v2] neighbour: confirm neigh entries when ARP packet is received
From: Vasiliy Khoruzhick @ 2018-09-11 18:23 UTC (permalink / raw)
To: Stephen Hemminger
Cc: David S. Miller, Roopa Prabhu, Alexey Dobriyan, Eric Dumazet,
Jim Westfall, Wolfgang Bumiller, Vasily Khoruzhick, Kees Cook,
Ihar Hrachyshka, netdev, linux-kernel
In-Reply-To: <20180911111217.3e3679c5@xeon-e3>
On Tue, Sep 11, 2018 at 11:12 AM, Stephen Hemminger
<stephen@networkplumber.org> wrote:
> On Tue, 11 Sep 2018 11:04:06 -0700
> Vasily Khoruzhick <vasilykh@arista.com> wrote:
>
>> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
>> index aa19d86937af..56a554597db5 100644
>> --- a/net/core/neighbour.c
>> +++ b/net/core/neighbour.c
>> @@ -1180,6 +1180,12 @@ int neigh_update(struct neighbour *neigh, const u8 *lladdr, u8 new,
>> lladdr = neigh->ha;
>> }
>>
>> + /* Update confirmed timestamp for neighbour entry after we
>> + * received ARP packet even if it doesn't change IP to MAC binding.
>> + */
>> + if (new & NUD_CONNECTED)
>> + neigh->confirmed = jiffies;
>
> You might want to do:
> if ((new & NUD_CONNECTED) && neigh->confirmed != jiffies)
> neigh->confirmed = jiffies;
>
> This avoid poisoning the cacheline with unnecessary write.
Sorry for duplicate - this time in plain text, so it should get
through lkml filter:
I don't think that it's performance-critical path, so this
optimization is unnecessary
and it doesn't improve code readability.
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Andrew Lunn @ 2018-09-11 23:30 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: Eric Biggers, Greg Kroah-Hartman, Ard Biesheuvel, LKML, Netdev,
David Miller, Andrew Lutomirski, Samuel Neves,
Jean-Philippe Aumasson, Linux Crypto Mailing List
In-Reply-To: <CAHmME9rFUruF-VN1pmU-k5nFsb9ppAPhPpW-5Ho9dKL2HCg4kA@mail.gmail.com>
On Tue, Sep 11, 2018 at 04:02:52PM -0600, Jason A. Donenfeld wrote:
> On Tue, Sep 11, 2018 at 3:47 PM Eric Biggers <ebiggers@kernel.org> wrote:
> > Of course, the real problem is that even after multiple revisions of this
> > patchset, there's still no actual conversions of the existing crypto API
> > algorithms over to use the new implementations. "Zinc" is still completely
> > separate from the existing crypto API.
>
> No this is not, "the real problem [...] after multiple revisions"
> because I've offered to do this and stated pretty clearly my intent to
> do so. But, as I've mentioned before, I'd really prefer to land this
> series through net-next
Hi Jason
Just as an FYI:
1) I don't think anybody in netdev has taken a serious look at the
network code yet. There is little point until the controversial part
of the code, Zinc, has been sorted out.
2) I personally would be surprised if DaveM took this code without
having an Acked-by from the crypto subsystem people. In the same way,
i doubt the crypto people would take an Ethernet driver without having
DaveM's Acked-by.
Andrew
^ permalink raw reply
* Re: [PATCH v2 net-next 0/4] net: batched receive in GRO path
From: Edward Cree @ 2018-09-11 18:34 UTC (permalink / raw)
To: Eric Dumazet, davem; +Cc: linux-net-drivers, netdev
In-Reply-To: <01bc9bf5-1780-2650-958f-961bd24b8c26@gmail.com>
On 07/09/18 03:32, Eric Dumazet wrote:
> Adding this complexity and icache pressure needs more experimental results.
> What about RPC workloads (eg 100 concurrent netperf -t TCP_RR -- -r 8000,8000 )
>
> Thanks.
Some more results. Note that the TCP_STREAM figures given in the cover
letter were '-m 1450'; when I run that with '-m 8000' I hit line rate on
my 10G NIC on both the old and new code. Also, these tests are still all
with IRQs bound to a single core on the RX side.
A further note: the Code Under Test is running on the netserver side (RX
side for TCP_STREAM tests); the netperf side is running stock RHEL7u3
(kernel 3.10.0-514.el7.x86_64). This potentially matters more for the
TCP_RR test as both sides have to receive data.
TCP_STREAM, 8000 bytes, GRO enabled (4 streams)
old: 9.415 Gbit/s
new: 9.417 Gbit/s
(Welch p = 0.087, n₁ = n₂ = 3)
There was however a noticeable reduction in *TX* CPU usage, of about 15%.
I don't know why that should be (changes in ack timing, perhaps?)
TCP_STREAM, 8000 bytes, GRO disabled (4 streams)
old: 5.200 Gbit/s
new: 5.839 Gbit/s (12.3% faster)
(Welch p < 0.001, n₁ = n₂ = 6)
TCP_RR, 8000 bytes, GRO enabled (100 streams)
(FoM is one-way latency, 0.5 / tps)
old: 855.833 us
new: 862.033 us (0.7% slower)
(Welch p = 0.040, n₁ = n₂ = 6)
TCP_RR, 8000 bytes, GRO disabled (100 streams)
old: 962.733 us
new: 871.417 us (9.5% faster)
(Welch p < 0.001, n₁ = n₂ = 6)
Conclusion: with GRO on we pay a small but real RR penalty. With GRO off
(thus also with traffic that can't be coalesced) we get a noticeable
speed boost from being able to use netif_receive_skb_list_internal().
-Ed
^ permalink raw reply
* [Patch net-next] llc: avoid blocking in llc_sap_close()
From: Cong Wang @ 2018-09-11 18:42 UTC (permalink / raw)
To: netdev; +Cc: Cong Wang
llc_sap_close() is called by llc_sap_put() which
could be called in BH context in llc_rcv(). We can't
block in BH.
There is no reason to block it here, kfree_rcu() should
be sufficient.
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
include/net/llc.h | 1 +
net/llc/llc_core.c | 4 +---
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/include/net/llc.h b/include/net/llc.h
index 890a87318014..df282d9b4017 100644
--- a/include/net/llc.h
+++ b/include/net/llc.h
@@ -66,6 +66,7 @@ struct llc_sap {
int sk_count;
struct hlist_nulls_head sk_laddr_hash[LLC_SK_LADDR_HASH_ENTRIES];
struct hlist_head sk_dev_hash[LLC_SK_DEV_HASH_ENTRIES];
+ struct rcu_head rcu;
};
static inline
diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c
index 260b3dc1b4a2..64d4bef04e73 100644
--- a/net/llc/llc_core.c
+++ b/net/llc/llc_core.c
@@ -127,9 +127,7 @@ void llc_sap_close(struct llc_sap *sap)
list_del_rcu(&sap->node);
spin_unlock_bh(&llc_sap_list_lock);
- synchronize_rcu();
-
- kfree(sap);
+ kfree_rcu(sap, rcu);
}
static struct packet_type llc_packet_type __read_mostly = {
--
2.14.4
^ permalink raw reply related
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: David Miller @ 2018-09-11 23:57 UTC (permalink / raw)
To: andrew
Cc: Jason, ebiggers, gregkh, ard.biesheuvel, linux-kernel, netdev,
luto, sneves, jeanphilippe.aumasson, linux-crypto
In-Reply-To: <20180911233015.GD11474@lunn.ch>
From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 12 Sep 2018 01:30:15 +0200
> Just as an FYI:
>
> 1) I don't think anybody in netdev has taken a serious look at the
> network code yet. There is little point until the controversial part
> of the code, Zinc, has been sorted out.
>
> 2) I personally would be surprised if DaveM took this code without
> having an Acked-by from the crypto subsystem people. In the same way,
> i doubt the crypto people would take an Ethernet driver without having
> DaveM's Acked-by.
Both of Andrew's statements are completely true.
I'm not looking at any of the networking bits until the crypto stuff
is fully sorted and fully supported and Ack'd by crypto folks.
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Jason A. Donenfeld @ 2018-09-12 0:01 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Eric Biggers, Greg Kroah-Hartman, Ard Biesheuvel, LKML, Netdev,
David Miller, Andrew Lutomirski, Samuel Neves,
Jean-Philippe Aumasson, Linux Crypto Mailing List
In-Reply-To: <2E01FBB6-030A-40AB-8BEE-F8F271A57568@amacapital.net>
Hi Andy,
On Tue, Sep 11, 2018 at 5:01 PM Andy Lutomirski <luto@amacapital.net> wrote:
> I think Ard’s point is valid: in the long run we don’t want two competing software implementations of each primitive. It clearly *should* be possible to make crypto API call into zinc for synchronous software operations, but a demonstration of how this actually works and that there isn’t some change to zinc to make it would well would be in order, I think.
>
> IMO the right approach is do one conversion right away and save the rest for later.
Alright, I'll go ahead and do this for v4. Thanks for the guidance.
Jason
^ permalink raw reply
* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Jason A. Donenfeld @ 2018-09-12 0:02 UTC (permalink / raw)
To: David Miller
Cc: Andrew Lunn, Eric Biggers, Greg Kroah-Hartman, Ard Biesheuvel,
LKML, Netdev, Andrew Lutomirski, Samuel Neves,
Jean-Philippe Aumasson, Linux Crypto Mailing List
In-Reply-To: <20180911.165739.2032677219588723041.davem@davemloft.net>
On Tue, Sep 11, 2018 at 5:57 PM David Miller <davem@davemloft.net> wrote:
> Both of Andrew's statements are completely true.
>
> I'm not looking at any of the networking bits until the crypto stuff
> is fully sorted and fully supported and Ack'd by crypto folks.
Seems reasonable to me.
Jason
^ permalink raw reply
* Re: tools/bpf regression causing samples/bpf/ to hang
From: Björn Töpel @ 2018-09-11 19:01 UTC (permalink / raw)
To: yhs; +Cc: Netdev, ast, Daniel Borkmann, Jesper Dangaard Brouer
In-Reply-To: <a5a23097-c59f-42a2-eeff-050b825cde11@fb.com>
Den tis 11 sep. 2018 kl 20:21 skrev Yonghong Song <yhs@fb.com>:
>
>
>
> On 9/11/18 10:15 AM, Björn Töpel wrote:
> > Den tis 11 sep. 2018 kl 18:47 skrev Yonghong Song <yhs@fb.com>:
> >>
> >>
> >>
> >> On 9/11/18 4:11 AM, Björn Töpel wrote:
> >>> Hi Yonghong, I tried to run the XDP samples from the bpf-next tip
> >>> today, and was hit by a regression.
> >>>
> >>> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
> >>> functions into a new file") adds a while(1) around the recv call in
> >>> bpf_set_link_xdp_fd making that call getting stuck in an infinite
> >>> loop.
> >>>
> >>> I simply removed the loop, and that solved my problem (patch below).
> >>>
> >>> However, I don't know if removing the loop would break bpftool for
> >>> you. If not, I can submit the patch as a proper one for bpf-next.
> >>
> >> Hi, Björn, thanks for reporting the problem.
> >> The while loop is needed since the "recv" syscall buffer size
> >> may not be big enough to hold all the returned information, in
> >> which cases, multiple "recv" calls are needed.
> >>
> >> Could you try the following patch to see whether it fixed your
> >> issue? Thanks!
> >>
> >
> > Nope, it doesn't -- but if you move that hunk after the for-loop it works.
>
> Could you try this patch?
>
Works! Thanks!
Tested-by: Björn Töpel <bjorn.topel@intel.com>
> commit 9a7fb19899ce87594fe8012f8a23fc8fc7b6b764 (HEAD -> fix)
> Author: Yonghong Song <yhs@fb.com>
> Date: Tue Sep 11 08:58:20 2018 -0700
>
> tools/bpf: fix a netlink recv issue
>
> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
> functions into a new file") introduced a while loop for the
> netlink recv path. This while loop is needed since the
> buffer in recv syscall may not be enough to hold all the
> information and in such cases multiple recv calls are needed.
>
> There is a bug introduced by the above commit as
> the while loop may block on recv syscall if there is no
> more messages are expected. The netlink message header
> flag NLM_F_MULTI is used to indicate that more messages
> are expected and this patch fixed the bug by doing
> further recv syscall only if multipart message is expected.
>
> The patch added another fix regarding to message length of 0.
> When netlink recv returns message length of 0, there will be
> no more messages for returning data so the while loop
> can end.
>
> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
> functions into a new file")
> Reported-by: Björn Töpel <bjorn.topel@intel.com>
> Signed-off-by: Yonghong Song <yhs@fb.com>
>
> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> index 469e068dd0c5..fde1d7bf8199 100644
> --- a/tools/lib/bpf/netlink.c
> +++ b/tools/lib/bpf/netlink.c
> @@ -65,18 +65,23 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid,
> int seq,
> __dump_nlmsg_t _fn, dump_nlmsg_t fn,
> void *cookie)
> {
> + bool multipart = true;
> struct nlmsgerr *err;
> struct nlmsghdr *nh;
> char buf[4096];
> int len, ret;
>
> - while (1) {
> + while (multipart) {
> + multipart = false;
> len = recv(sock, buf, sizeof(buf), 0);
> if (len < 0) {
> ret = -errno;
> goto done;
> }
>
> + if (len == 0)
> + break;
> +
> for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
> nh = NLMSG_NEXT(nh, len)) {
> if (nh->nlmsg_pid != nl_pid) {
> @@ -87,6 +92,8 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid,
> int seq,
> ret = -LIBBPF_ERRNO__INVSEQ;
> goto done;
> }
> + if (nh->nlmsg_flags & NLM_F_MULTI)
> + multipart = true;
> switch (nh->nlmsg_type) {
> case NLMSG_ERROR:
> err = (struct nlmsgerr *)NLMSG_DATA(nh);
>
>
> >
> > Cheers,
> > Björn
> >
> >> commit 3eb1c0249dfc3ea4ad61aa223dce32262af7e049 (HEAD -> fix)
> >> Author: Yonghong Song <yhs@fb.com>
> >> Date: Tue Sep 11 08:58:20 2018 -0700
> >>
> >> tools/bpf: fix a netlink recv issue
> >>
> >> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
> >> functions into a new file") introduced a while loop for the
> >> netlink recv path. This while loop is needed since the
> >> buffer in recv syscall may not be big enough to hold all the
> >> information and in such cases multiple recv calls are needed.
> >>
> >> When netlink recv returns message length of 0, there will be
> >> no more messages for returning data so the while loop
> >> can end.
> >>
> >> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
> >> functions into a new file")
> >> Reported-by: Björn Töpel <bjorn.topel@intel.com>
> >> Signed-off-by: Yonghong Song <yhs@fb.com>
> >>
> >> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> >> index 469e068dd0c5..37827319a50a 100644
> >> --- a/tools/lib/bpf/netlink.c
> >> +++ b/tools/lib/bpf/netlink.c
> >> @@ -77,6 +77,9 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid,
> >> int seq,
> >> goto done;
> >> }
> >>
> >> + if (len == 0)
> >> + break;
> >> +
> >> for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
> >> nh = NLMSG_NEXT(nh, len)) {
> >> if (nh->nlmsg_pid != nl_pid) {
> >>
> >>
> >>>
> >>> Thanks!
> >>> Björn
> >>>
> >>> From: =?UTF-8?q?Bj=C3=B6rn=20T=C3=B6pel?= <bjorn.topel@intel.com>
> >>> Date: Tue, 11 Sep 2018 12:35:44 +0200
> >>> Subject: [PATCH] tools/bpf: remove loop around netlink recv
> >>>
> >>> Commit f7010770fbac ("tools/bpf: move bpf/lib netlink related
> >>> functions into a new file") moved the bpf_set_link_xdp_fd and split it
> >>> up into multiple functions. The added receive function
> >>> bpf_netlink_recv added a loop around the recv syscall leading to
> >>> multiple recv calls. This resulted in all XDP samples in the
> >>> samples/bpf/ to stop working, since they were stuck in a blocking
> >>> recv.
> >>>
> >>> This commits removes the while (1)-statement.
> >>>
> >>> Fixes: f7010770fbac ("tools/bpf: move bpf/lib netlink related
> >>> functions into a new file")
> >>> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> >>> ---
> >>> tools/lib/bpf/netlink.c | 64 ++++++++++++++++++++---------------------
> >>> 1 file changed, 31 insertions(+), 33 deletions(-)
> >>>
> >>> diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
> >>> index 469e068dd0c5..0eae1fbf46c6 100644
> >>> --- a/tools/lib/bpf/netlink.c
> >>> +++ b/tools/lib/bpf/netlink.c
> >>> @@ -70,41 +70,39 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
> >>> char buf[4096];
> >>> int len, ret;
> >>>
> >>> - while (1) {
> >>> - len = recv(sock, buf, sizeof(buf), 0);
> >>> - if (len < 0) {
> >>> - ret = -errno;
> >>> + len = recv(sock, buf, sizeof(buf), 0);
> >>> + if (len < 0) {
> >>> + ret = -errno;
> >>> + goto done;
> >>> + }
> >>> +
> >>> + for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
> >>> + nh = NLMSG_NEXT(nh, len)) {
> >>> + if (nh->nlmsg_pid != nl_pid) {
> >>> + ret = -LIBBPF_ERRNO__WRNGPID;
> >>> goto done;
> >>> }
> >>> -
> >>> - for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
> >>> - nh = NLMSG_NEXT(nh, len)) {
> >>> - if (nh->nlmsg_pid != nl_pid) {
> >>> - ret = -LIBBPF_ERRNO__WRNGPID;
> >>> - goto done;
> >>> - }
> >>> - if (nh->nlmsg_seq != seq) {
> >>> - ret = -LIBBPF_ERRNO__INVSEQ;
> >>> - goto done;
> >>> - }
> >>> - switch (nh->nlmsg_type) {
> >>> - case NLMSG_ERROR:
> >>> - err = (struct nlmsgerr *)NLMSG_DATA(nh);
> >>> - if (!err->error)
> >>> - continue;
> >>> - ret = err->error;
> >>> - nla_dump_errormsg(nh);
> >>> - goto done;
> >>> - case NLMSG_DONE:
> >>> - return 0;
> >>> - default:
> >>> - break;
> >>> - }
> >>> - if (_fn) {
> >>> - ret = _fn(nh, fn, cookie);
> >>> - if (ret)
> >>> - return ret;
> >>> - }
> >>> + if (nh->nlmsg_seq != seq) {
> >>> + ret = -LIBBPF_ERRNO__INVSEQ;
> >>> + goto done;
> >>> + }
> >>> + switch (nh->nlmsg_type) {
> >>> + case NLMSG_ERROR:
> >>> + err = (struct nlmsgerr *)NLMSG_DATA(nh);
> >>> + if (!err->error)
> >>> + continue;
> >>> + ret = err->error;
> >>> + nla_dump_errormsg(nh);
> >>> + goto done;
> >>> + case NLMSG_DONE:
> >>> + return 0;
> >>> + default:
> >>> + break;
> >>> + }
> >>> + if (_fn) {
> >>> + ret = _fn(nh, fn, cookie);
> >>> + if (ret)
> >>> + return ret;
> >>> }
> >>> }
> >>> ret = 0;
> >>>
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox