* Re: Co-existing XDP generic and native mode? (Re: [PATCH bpf-next v5 5/8] xdp: Provide extack messages when prog attachment failed)
From: Jakub Kicinski @ 2019-02-01 18:47 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: daniel, ast, David Miller, Maciej Fijalkowski, netdev,
john.fastabend, David Ahern, Saeed Mahameed
In-Reply-To: <20190201080236.446d84d4@redhat.com>
On Fri, 1 Feb 2019 08:02:36 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 31 Jan 2019 19:11:01 -0800
> Jakub Kicinski <jakub.kicinski@netronome.com> wrote:
>
> > On Fri, 1 Feb 2019 01:19:51 +0100, Maciej Fijalkowski wrote:
> > > if (__dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG) ||
> > > - __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW))
> > > + __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG_HW)) {
> > > + NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
> > > return -EEXIST;
> > > + }
> >
> > This reminds me, since we allowed native/driver and offloaded XDP
> > programs to coexist in a25717d2b604 ("xdp: support simultaneous
> > driver and hw XDP attachment") I got an internal feature request
> > to also allow generic and native mode. Would anyone object to that?
>
> Well, I will object ;-)
>
> I have two refactor ideas [1] and [2], that depend on not allowing
> XDP-native and XDP-generic to co-exist. The general idea is to let
> XDP-native use the same fields in net_device->rx[] as XDP-generic given
> they (currently) cannot co-exist.
> The goal is (1) to move stuff out of driver code, and (2) hopefully
> make it easier to implement per RXq XDP progs.
You mean you'd use one pointer to keep the prog in the RXQ structure?
Then some from from of an extra flag will be necessary to distinguish?
I.e.:
if (rxq->prog && rxq->is_native)
/* got_prog */
rather than:
if (rxq->native_prog)
/* got_prog */
The cost of this reuse would be a read-only cache line per-q when XDP is
not enabled. Right now drivers have the ability to pack the XDP prog
into a structure which is in cache already, and don't need to bring the
entire RXQ structure out (which is cache line aligned so driver authors
can't do anything to place it cleverly).
No doubt, thought, that if we allow both to be enabled we will have to
bloat the data structures.
> These are only refactor ideas, so if you can argue why your internal
> feature request for simultaneous generic and native make more sense,
> then I'm open for allowing this ?
The request was actually to enable xdpoffload and xdpgeneric at the
same time. I'm happy to have that as another HW offload exclusive
for now :)
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_per_rxq01.org#refactor-idea-move-xdp_rxq_info-to-net_devicenetdev_rx_queue
>
> [2] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_per_rxq01.org#refactor-idea-xdpbpf_prog-into-netdev_rx_queuenet_device
>
> > Apart from a touch up to test_offload.py I don't think anything
> > would care. netlink can already carry multiple IDs, iproute2
> > understands it, too..
>
> And we did notice you added support for HW+native:
> [3] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_per_rxq01.org
^ permalink raw reply
* Re: WoL broken in r8169.c since kernel 4.19
From: Marc Haber @ 2019-02-01 18:41 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: netdev@vger.kernel.org
In-Reply-To: <075de04d-4a5d-e557-aba6-7e569826a356@gmail.com>
On Fri, Feb 01, 2019 at 07:24:01PM +0100, Heiner Kallweit wrote:
> It's too much effort to make the current r8169 compile under an older kernel.
> But thanks anyway for trying!
Will the patch be in 5.0-rc5?
Greetings
Marc
--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
^ permalink raw reply
* Re: [PATCHv2 net] sctp: check and update stream->out_curr when allocating stream_out
From: David Miller @ 2019-02-01 18:38 UTC (permalink / raw)
To: nhorman; +Cc: marcelo.leitner, lucien.xin, netdev, linux-sctp
In-Reply-To: <20190201123118.GA5970@hmswarspite.think-freely.org>
From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 1 Feb 2019 07:31:18 -0500
> On Thu, Jan 31, 2019 at 10:39:41PM -0200, Marcelo Ricardo Leitner wrote:
>> On Tue, Jan 29, 2019 at 07:58:07PM +0100, Tuxdriver wrote:
>> > I was initially under the impression that with Kent's repost, the radixtree
>> > (which is what I think you meant by rhashtables) updates would be merged
>>
>> Oops! Yep.. I had meant flex_arrays actually.
>>
>> > imminently, but that doesn't seem to be the case. I'd really like to know
>> > what the hold up there is, as that patch seems to have been stalled for
>> > months. I hate the notion of breaking the radixtree patch, but if it's
>> > status is indeterminate, then, yes, we probably need to go with xins patch
>> > for the short term, and let Kent fix it up in due course.
>>
>> Dave, can you please consider applying this patch? The conflict
>> resolution will be easy: just ignore the changes introduced by this
>> patch.
>>
> Dave I concur with Marcelo here. Kent was very active in getting sctp fixed up
> to use radixtrees, but now he seems to have gone to ground, and for whatever
> reason, no one seems interested in incorporating his patch. Its been languising
> for months, so I think we need to take action to secure sctp now until such time
> as his genradix changes finally move forward.
I agree, please let's get this sctp patch resubmitted and back into
patchwork and I will apply it.
^ permalink raw reply
* Re: WoL broken in r8169.c since kernel 4.19
From: Heiner Kallweit @ 2019-02-01 18:24 UTC (permalink / raw)
To: Marc Haber; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190201171943.GA25308@torres.zugschlus.de>
It's too much effort to make the current r8169 compile under an older kernel.
But thanks anyway for trying!
Heiner
On 01.02.2019 18:19, Marc Haber wrote:
> On Fri, Feb 01, 2019 at 07:49:09AM +0100, Heiner Kallweit wrote:
>> Great, thanks. This patch has been applied and is part of latest linux-next kernel already:
>> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/?h=next-20190201
>> Would be much appreciated if you could build and test this kernel version.
>
> I regret that I have not been able to get this through the compiler.
> When I clone the linux-next repo and try to build with deb-pkg, some
> process chokes on the weird version number of the linux-next kernel, and
> transplanting the r8169.c from linux-next on to 4.20.6, compile aborts:
>
> In file included from drivers/net/ethernet/realtek/r8169.c:13:
> ./include/linux/pci.h:830:12: error: ‘PCI_VENDOR_ID_USR’ undeclared here (not in a function); did you mean ‘PCI_VENDOR_ID_UMC’?
> .vendor = PCI_VENDOR_ID_##vend, .device = (dev), \
> ^~~~~~~~~~~~~~
> drivers/net/ethernet/realtek/r8169.c:222:4: note: in expansion of macro ‘PCI_VDEVICE’
> { PCI_VDEVICE(USR, 0x0116), RTL_CFG_0 },
> ^~~~~~~~~~~
> drivers/net/ethernet/realtek/r8169.c: In function ‘rtl8169_start_xmit’:
> drivers/net/ethernet/realtek/r8169.c:6261:6: error: implicit declaration of function ‘__netdev_sent_queue’; did you mean ‘netdev_sent_queue’? [-Werror=implicit-function-declaration]
> if (__netdev_sent_queue(dev, skb->len, skb->xmit_more))
> ^~~~~~~~~~~~~~~~~~~
> netdev_sent_queue
> drivers/net/ethernet/realtek/r8169.c: In function ‘r8169_phy_connect’:
> drivers/net/ethernet/realtek/r8169.c:6653:22: warning: passing argument 1 of ‘linkmode_copy’ makes pointer from integer without a cast [-Wint-conversion]
> linkmode_copy(phydev->advertising, phydev->supported);
> ~~~~~~^~~~~~~~~~~~~
> In file included from ./include/linux/phy.h:22,
> from drivers/net/ethernet/realtek/r8169.c:19:
> ./include/linux/linkmode.h:13:49: note: expected ‘long unsigned int *’ but argument is of type ‘u32’ {aka ‘unsigned int’}
> static inline void linkmode_copy(unsigned long *dst, const unsigned long *src)
> ~~~~~~~~~~~~~~~^~~
> drivers/net/ethernet/realtek/r8169.c:6653:43: warning: passing argument 2 of ‘linkmode_copy’ makes pointer from integer without a cast [-Wint-conversion]
> linkmode_copy(phydev->advertising, phydev->supported);
> ~~~~~~^~~~~~~~~~~
> In file included from ./include/linux/phy.h:22,
> from drivers/net/ethernet/realtek/r8169.c:19:
> ./include/linux/linkmode.h:13:75: note: expected ‘const long unsigned int *’ but argument is of type ‘u32’ {aka ‘unsigned int’}
> static inline void linkmode_copy(unsigned long *dst, const unsigned long *src)
> ~~~~~~~~~~~~~~~~~~~~~^~~
> cc1: some warnings being treated as errors
> make[4]: *** [scripts/Makefile.build:297: drivers/net/ethernet/realtek/r8169.o] Error 1
>
> Will it help if I transplant more files from linux-next to 4.20.6?
>
> Greetings
> Marc
>
^ permalink raw reply
* Re: pull-request: mac80211-next 2019-02-01
From: David Miller @ 2019-02-01 18:21 UTC (permalink / raw)
To: johannes; +Cc: netdev, linux-wireless
In-Reply-To: <20190201111651.10203-1-johannes@sipsolutions.net>
From: Johannes Berg <johannes@sipsolutions.net>
Date: Fri, 1 Feb 2019 12:16:50 +0100
> Here's my next pull request for net-next. I wanted to include more,
> notably the multi-BSSID work that's going on now (multi-BSSID lets
> a single AP with a single beacon have multiple networks, rather than
> requiring a beacon for each), but we're still finding small issues
> and bugs in it, so I haven't included it. As always, I'll send it
> when it's ready :-)
Ok.
> As noted below, this contains the NLA_POLICY_NESTED{,_ARRAY} changes
> I showed you earlier, no new users have been added to net-next as
> of today when I merged your tree to mine (to get the net changes to
> not have conflicts over this in wireless code.)
Thanks for all of this information, it really helps me audit and
understand a pull request better.
> Please pull and let me know if there's any problem.
Pulled, thanks Johannes.
^ permalink raw reply
* Re: [PATCH 0/3] pull request for net: batman-adv 2019-02-01
From: David Miller @ 2019-02-01 18:19 UTC (permalink / raw)
To: sw; +Cc: netdev, b.a.t.m.a.n
In-Reply-To: <20190201111412.13807-1-sw@simonwunderlich.de>
From: Simon Wunderlich <sw@simonwunderlich.de>
Date: Fri, 1 Feb 2019 12:14:09 +0100
> here are some bugfixes which we would like to see integrated into net.
>
> Please pull or let me know of any problem!
Pulled, thanks Simon.
^ permalink raw reply
* Re: [PATCH v2 net] phy: Micrel KSZ8061: link failure after cable connect
From: David Miller @ 2019-02-01 18:17 UTC (permalink / raw)
To: T.Rajasingh; +Cc: alexander.onnasch, andrew, f.fainelli, netdev, linux-kernel
In-Reply-To: <20190201104836.21095-1-T.Rajasingh@landisgyr.com>
From: Rajasingh Thavamani <T.Rajasingh@landisgyr.com>
Date: Fri, 1 Feb 2019 16:18:32 +0530
> With Micrel KSZ8061 PHY, the link may occasionally not come up after
> Ethernet cable connect. The vendor's (Microchip, former Micrel) errata
> sheet 80000688A.pdf describes the problem and possible workarounds in
> detail, see below.
> The patch implements workaround 1, which permanently fixes the issue.
This does not apply cleanly to the current 'net' tree, and also there is not
enough context in the second hunk of the patch for me to safely fix it up
for you as I really can't tell exactly which PHY device entry the change
goes into.
So please respin this upon a clean net tree, thanks.
^ permalink raw reply
* Re: pull-request: mac80211 2019-02-01
From: David Miller @ 2019-02-01 18:14 UTC (permalink / raw)
To: johannes; +Cc: netdev, linux-wireless
In-Reply-To: <20190201101744.7551-1-johannes@sipsolutions.net>
From: Johannes Berg <johannes@sipsolutions.net>
Date: Fri, 1 Feb 2019 11:17:43 +0100
> Two more fixes for the current cycle, one is a new bug introduced
> in 4.20, the other one seems to be much older.
>
> Please pull and let me know if there's any problem.
Pulled, thanks!
^ permalink raw reply
* Re: [PATCH net-next] net: hns3: Check for allocation failure
From: David Miller @ 2019-02-01 18:02 UTC (permalink / raw)
To: dan.carpenter
Cc: yisen.zhuang, liuzhongzhu, salil.mehta, lipeng321, shenjian15,
tanhuazhong, linyunsheng, liangfuyun1, netdev, kernel-janitors
In-Reply-To: <20190201083226.GF8459@kadam>
From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Fri, 1 Feb 2019 11:32:26 +0300
> We should return -ENOMEM if the kcalloc() fails.
>
> Fixes: d174ea75c96a ("net: hns3: add statistics for PFC frames and MAC control frame")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Applied.
^ permalink raw reply
* Re: [PATCH net] skge: potential memory corruption in skge_get_regs()
From: David Miller @ 2019-02-01 18:00 UTC (permalink / raw)
To: dan.carpenter; +Cc: mlindner, stephen, netdev, kernel-janitors
In-Reply-To: <20190201082816.GB8459@kadam>
From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Fri, 1 Feb 2019 11:28:16 +0300
> The "p" buffer is 0x4000 bytes long. B3_RI_WTO_R1 is 0x190. The value
> of "regs->len" is in the 1-0x4000 range. The bug here is that
> "regs->len - B3_RI_WTO_R1" can be a negative value which would lead to
> memory corruption and an abrupt crash.
>
> Fixes: c3f8be961808 ("[PATCH] skge: expand ethtool debug register dump")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Applied.
^ permalink raw reply
* Re: [PATCH net] ethtool: remove unnecessary check in ethtool_get_regs()
From: David Miller @ 2019-02-01 17:59 UTC (permalink / raw)
To: dan.carpenter
Cc: linyunsheng, f.fainelli, wang6495, borisp, ilyal, ecree, mkubecek,
ynorov, jakub.kicinski, netdev, kernel-janitors
In-Reply-To: <20190201082406.GA8459@kadam>
From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Fri, 1 Feb 2019 11:24:06 +0300
> We recently changed this function in commit f9fc54d313fa ("ethtool:
> check the return value of get_regs_len") such that if "reglen" is zero
> we return directly. That means we can remove this condition as well.
>
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Applied to net-next, thanks.
^ permalink raw reply
* Re: KASAN: use-after-free Read in selinux_netlbl_socket_setsockopt
From: Cong Wang @ 2019-02-01 17:58 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Paul Moore, Ralf Baechle, David Miller, linux-hams, netdev,
syzbot, Eric Paris, LKML, Stephen Smalley, selinux,
syzkaller-bugs
In-Reply-To: <CACT4Y+Yk=bsS7Ti3PjeZm8yG_txt=orBztowHwddHRP_MzuTxw@mail.gmail.com>
On Thu, Jan 31, 2019 at 10:56 PM Dmitry Vyukov <dvyukov@google.com> wrote:
> Hi Paul,
>
> Searching for af_netrom across other syzbot bugs:
> https://groups.google.com/forum/#!searchin/syzkaller-bugs/af_netrom%7Csort:date
>
> I see at least:
> https://syzkaller.appspot.com/bug?extid=b0b1952f5864b4009b09
> https://syzkaller.appspot.com/bug?extid=febf3c50d4262e578b1c
> https://syzkaller.appspot.com/bug?extid=defa700d16f1bd1b9a05
>
> Which suggests there are some serious lifetime problems in netrom
> sockets. That would probably explain this crash as well.
This is supposed to be fixed by:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=63346650c1a94a92be61a57416ac88c0a47c4327
Please let me know if it isn't.
Thanks.
^ permalink raw reply
* [PATCH bpf-next v2 1/3] tools/bpf: move libbpf pr_* debug print functions to headers
From: Yonghong Song @ 2019-02-01 17:47 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Magnus Karlsson, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Yonghong Song
In-Reply-To: <20190201174731.695459-1-yhs@fb.com>
A global function libbpf_debug_print, which is invisible
outside the shared library, is defined to print based
on levels. The pr_warning, pr_info and pr_debug
macros are moved into the newly created header
common.h. So any .c file including common.h can
use these macros directly.
Currently btf__new and btf_ext__new API has an argument getting
__pr_debug function pointer into btf.c so the debugging information
can be printed there. This patch removed this parameter
from btf__new and btf_ext__new and directly using pr_debug in btf.c.
Another global function libbpf_dprint_level_available, also
invisible outside the shared library, can test
whether a particular level debug printing is
available or not. It is used in btf.c to
test whether DEBUG level debug printing is availabl or not,
based on which the log buffer will be allocated when loading
btf to the kernel.
Signed-off-by: Yonghong Song <yhs@fb.com>
---
tools/lib/bpf/btf.c | 97 +++++++++++++++++------------------
tools/lib/bpf/btf.h | 7 +--
tools/lib/bpf/libbpf.c | 46 ++++++++++++-----
tools/lib/bpf/libbpf.h | 6 +++
tools/lib/bpf/test_libbpf.cpp | 2 +-
tools/lib/bpf/util.h | 32 ++++++++++++
6 files changed, 120 insertions(+), 70 deletions(-)
create mode 100644 tools/lib/bpf/util.h
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index d682d3b8f7b9..159104ec31dc 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -9,8 +9,9 @@
#include <linux/btf.h>
#include "btf.h"
#include "bpf.h"
+#include "libbpf.h"
+#include "util.h"
-#define elog(fmt, ...) { if (err_log) err_log(fmt, ##__VA_ARGS__); }
#define max(a, b) ((a) > (b) ? (a) : (b))
#define min(a, b) ((a) < (b) ? (a) : (b))
@@ -107,54 +108,54 @@ static int btf_add_type(struct btf *btf, struct btf_type *t)
return 0;
}
-static int btf_parse_hdr(struct btf *btf, btf_print_fn_t err_log)
+static int btf_parse_hdr(struct btf *btf)
{
const struct btf_header *hdr = btf->hdr;
__u32 meta_left;
if (btf->data_size < sizeof(struct btf_header)) {
- elog("BTF header not found\n");
+ pr_debug("BTF header not found\n");
return -EINVAL;
}
if (hdr->magic != BTF_MAGIC) {
- elog("Invalid BTF magic:%x\n", hdr->magic);
+ pr_debug("Invalid BTF magic:%x\n", hdr->magic);
return -EINVAL;
}
if (hdr->version != BTF_VERSION) {
- elog("Unsupported BTF version:%u\n", hdr->version);
+ pr_debug("Unsupported BTF version:%u\n", hdr->version);
return -ENOTSUP;
}
if (hdr->flags) {
- elog("Unsupported BTF flags:%x\n", hdr->flags);
+ pr_debug("Unsupported BTF flags:%x\n", hdr->flags);
return -ENOTSUP;
}
meta_left = btf->data_size - sizeof(*hdr);
if (!meta_left) {
- elog("BTF has no data\n");
+ pr_debug("BTF has no data\n");
return -EINVAL;
}
if (meta_left < hdr->type_off) {
- elog("Invalid BTF type section offset:%u\n", hdr->type_off);
+ pr_debug("Invalid BTF type section offset:%u\n", hdr->type_off);
return -EINVAL;
}
if (meta_left < hdr->str_off) {
- elog("Invalid BTF string section offset:%u\n", hdr->str_off);
+ pr_debug("Invalid BTF string section offset:%u\n", hdr->str_off);
return -EINVAL;
}
if (hdr->type_off >= hdr->str_off) {
- elog("BTF type section offset >= string section offset. No type?\n");
+ pr_debug("BTF type section offset >= string section offset. No type?\n");
return -EINVAL;
}
if (hdr->type_off & 0x02) {
- elog("BTF type section is not aligned to 4 bytes\n");
+ pr_debug("BTF type section is not aligned to 4 bytes\n");
return -EINVAL;
}
@@ -163,7 +164,7 @@ static int btf_parse_hdr(struct btf *btf, btf_print_fn_t err_log)
return 0;
}
-static int btf_parse_str_sec(struct btf *btf, btf_print_fn_t err_log)
+static int btf_parse_str_sec(struct btf *btf)
{
const struct btf_header *hdr = btf->hdr;
const char *start = btf->nohdr_data + hdr->str_off;
@@ -171,7 +172,7 @@ static int btf_parse_str_sec(struct btf *btf, btf_print_fn_t err_log)
if (!hdr->str_len || hdr->str_len - 1 > BTF_MAX_NAME_OFFSET ||
start[0] || end[-1]) {
- elog("Invalid BTF string section\n");
+ pr_debug("Invalid BTF string section\n");
return -EINVAL;
}
@@ -180,7 +181,7 @@ static int btf_parse_str_sec(struct btf *btf, btf_print_fn_t err_log)
return 0;
}
-static int btf_parse_type_sec(struct btf *btf, btf_print_fn_t err_log)
+static int btf_parse_type_sec(struct btf *btf)
{
struct btf_header *hdr = btf->hdr;
void *nohdr_data = btf->nohdr_data;
@@ -219,7 +220,7 @@ static int btf_parse_type_sec(struct btf *btf, btf_print_fn_t err_log)
case BTF_KIND_RESTRICT:
break;
default:
- elog("Unsupported BTF_KIND:%u\n",
+ pr_debug("Unsupported BTF_KIND:%u\n",
BTF_INFO_KIND(t->info));
return -EINVAL;
}
@@ -363,7 +364,7 @@ void btf__free(struct btf *btf)
free(btf);
}
-struct btf *btf__new(__u8 *data, __u32 size, btf_print_fn_t err_log)
+struct btf *btf__new(__u8 *data, __u32 size)
{
__u32 log_buf_size = 0;
char *log_buf = NULL;
@@ -376,7 +377,7 @@ struct btf *btf__new(__u8 *data, __u32 size, btf_print_fn_t err_log)
btf->fd = -1;
- if (err_log) {
+ if (libbpf_dprint_level_available(LIBBPF_DEBUG)) {
log_buf = malloc(BPF_LOG_BUF_SIZE);
if (!log_buf) {
err = -ENOMEM;
@@ -400,21 +401,21 @@ struct btf *btf__new(__u8 *data, __u32 size, btf_print_fn_t err_log)
if (btf->fd == -1) {
err = -errno;
- elog("Error loading BTF: %s(%d)\n", strerror(errno), errno);
+ pr_debug("Error loading BTF: %s(%d)\n", strerror(errno), errno);
if (log_buf && *log_buf)
- elog("%s\n", log_buf);
+ pr_debug("%s\n", log_buf);
goto done;
}
- err = btf_parse_hdr(btf, err_log);
+ err = btf_parse_hdr(btf);
if (err)
goto done;
- err = btf_parse_str_sec(btf, err_log);
+ err = btf_parse_str_sec(btf);
if (err)
goto done;
- err = btf_parse_type_sec(btf, err_log);
+ err = btf_parse_type_sec(btf);
done:
free(log_buf);
@@ -491,7 +492,7 @@ int btf__get_from_id(__u32 id, struct btf **btf)
goto exit_free;
}
- *btf = btf__new((__u8 *)(long)btf_info.btf, btf_info.btf_size, NULL);
+ *btf = btf__new((__u8 *)(long)btf_info.btf, btf_info.btf_size);
if (IS_ERR(*btf)) {
err = PTR_ERR(*btf);
*btf = NULL;
@@ -514,8 +515,7 @@ struct btf_ext_sec_copy_param {
static int btf_ext_copy_info(struct btf_ext *btf_ext,
__u8 *data, __u32 data_size,
- struct btf_ext_sec_copy_param *ext_sec,
- btf_print_fn_t err_log)
+ struct btf_ext_sec_copy_param *ext_sec)
{
const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
const struct btf_ext_info_sec *sinfo;
@@ -529,14 +529,14 @@ static int btf_ext_copy_info(struct btf_ext *btf_ext,
data_size -= hdr->hdr_len;
if (ext_sec->off & 0x03) {
- elog(".BTF.ext %s section is not aligned to 4 bytes\n",
+ pr_debug(".BTF.ext %s section is not aligned to 4 bytes\n",
ext_sec->desc);
return -EINVAL;
}
if (data_size < ext_sec->off ||
ext_sec->len > data_size - ext_sec->off) {
- elog("%s section (off:%u len:%u) is beyond the end of the ELF section .BTF.ext\n",
+ pr_debug("%s section (off:%u len:%u) is beyond the end of the ELF section .BTF.ext\n",
ext_sec->desc, ext_sec->off, ext_sec->len);
return -EINVAL;
}
@@ -546,7 +546,7 @@ static int btf_ext_copy_info(struct btf_ext *btf_ext,
/* At least a record size */
if (info_left < sizeof(__u32)) {
- elog(".BTF.ext %s record size not found\n", ext_sec->desc);
+ pr_debug(".BTF.ext %s record size not found\n", ext_sec->desc);
return -EINVAL;
}
@@ -554,7 +554,7 @@ static int btf_ext_copy_info(struct btf_ext *btf_ext,
record_size = *(__u32 *)info;
if (record_size < ext_sec->min_rec_size ||
record_size & 0x03) {
- elog("%s section in .BTF.ext has invalid record size %u\n",
+ pr_debug("%s section in .BTF.ext has invalid record size %u\n",
ext_sec->desc, record_size);
return -EINVAL;
}
@@ -564,7 +564,7 @@ static int btf_ext_copy_info(struct btf_ext *btf_ext,
/* If no records, return failure now so .BTF.ext won't be used. */
if (!info_left) {
- elog("%s section in .BTF.ext has no records", ext_sec->desc);
+ pr_debug("%s section in .BTF.ext has no records", ext_sec->desc);
return -EINVAL;
}
@@ -574,14 +574,14 @@ static int btf_ext_copy_info(struct btf_ext *btf_ext,
__u32 num_records;
if (info_left < sec_hdrlen) {
- elog("%s section header is not found in .BTF.ext\n",
+ pr_debug("%s section header is not found in .BTF.ext\n",
ext_sec->desc);
return -EINVAL;
}
num_records = sinfo->num_info;
if (num_records == 0) {
- elog("%s section has incorrect num_records in .BTF.ext\n",
+ pr_debug("%s section has incorrect num_records in .BTF.ext\n",
ext_sec->desc);
return -EINVAL;
}
@@ -589,7 +589,7 @@ static int btf_ext_copy_info(struct btf_ext *btf_ext,
total_record_size = sec_hdrlen +
(__u64)num_records * record_size;
if (info_left < total_record_size) {
- elog("%s section has incorrect num_records in .BTF.ext\n",
+ pr_debug("%s section has incorrect num_records in .BTF.ext\n",
ext_sec->desc);
return -EINVAL;
}
@@ -610,8 +610,7 @@ static int btf_ext_copy_info(struct btf_ext *btf_ext,
}
static int btf_ext_copy_func_info(struct btf_ext *btf_ext,
- __u8 *data, __u32 data_size,
- btf_print_fn_t err_log)
+ __u8 *data, __u32 data_size)
{
const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
struct btf_ext_sec_copy_param param = {
@@ -622,12 +621,11 @@ static int btf_ext_copy_func_info(struct btf_ext *btf_ext,
.desc = "func_info"
};
- return btf_ext_copy_info(btf_ext, data, data_size, ¶m, err_log);
+ return btf_ext_copy_info(btf_ext, data, data_size, ¶m);
}
static int btf_ext_copy_line_info(struct btf_ext *btf_ext,
- __u8 *data, __u32 data_size,
- btf_print_fn_t err_log)
+ __u8 *data, __u32 data_size)
{
const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
struct btf_ext_sec_copy_param param = {
@@ -638,37 +636,36 @@ static int btf_ext_copy_line_info(struct btf_ext *btf_ext,
.desc = "line_info",
};
- return btf_ext_copy_info(btf_ext, data, data_size, ¶m, err_log);
+ return btf_ext_copy_info(btf_ext, data, data_size, ¶m);
}
-static int btf_ext_parse_hdr(__u8 *data, __u32 data_size,
- btf_print_fn_t err_log)
+static int btf_ext_parse_hdr(__u8 *data, __u32 data_size)
{
const struct btf_ext_header *hdr = (struct btf_ext_header *)data;
if (data_size < offsetof(struct btf_ext_header, func_info_off) ||
data_size < hdr->hdr_len) {
- elog("BTF.ext header not found");
+ pr_debug("BTF.ext header not found");
return -EINVAL;
}
if (hdr->magic != BTF_MAGIC) {
- elog("Invalid BTF.ext magic:%x\n", hdr->magic);
+ pr_debug("Invalid BTF.ext magic:%x\n", hdr->magic);
return -EINVAL;
}
if (hdr->version != BTF_VERSION) {
- elog("Unsupported BTF.ext version:%u\n", hdr->version);
+ pr_debug("Unsupported BTF.ext version:%u\n", hdr->version);
return -ENOTSUP;
}
if (hdr->flags) {
- elog("Unsupported BTF.ext flags:%x\n", hdr->flags);
+ pr_debug("Unsupported BTF.ext flags:%x\n", hdr->flags);
return -ENOTSUP;
}
if (data_size == hdr->hdr_len) {
- elog("BTF.ext has no data\n");
+ pr_debug("BTF.ext has no data\n");
return -EINVAL;
}
@@ -685,12 +682,12 @@ void btf_ext__free(struct btf_ext *btf_ext)
free(btf_ext);
}
-struct btf_ext *btf_ext__new(__u8 *data, __u32 size, btf_print_fn_t err_log)
+struct btf_ext *btf_ext__new(__u8 *data, __u32 size)
{
struct btf_ext *btf_ext;
int err;
- err = btf_ext_parse_hdr(data, size, err_log);
+ err = btf_ext_parse_hdr(data, size);
if (err)
return ERR_PTR(err);
@@ -698,13 +695,13 @@ struct btf_ext *btf_ext__new(__u8 *data, __u32 size, btf_print_fn_t err_log)
if (!btf_ext)
return ERR_PTR(-ENOMEM);
- err = btf_ext_copy_func_info(btf_ext, data, size, err_log);
+ err = btf_ext_copy_func_info(btf_ext, data, size);
if (err) {
btf_ext__free(btf_ext);
return ERR_PTR(err);
}
- err = btf_ext_copy_line_info(btf_ext, data, size, err_log);
+ err = btf_ext_copy_line_info(btf_ext, data, size);
if (err) {
btf_ext__free(btf_ext);
return ERR_PTR(err);
diff --git a/tools/lib/bpf/btf.h b/tools/lib/bpf/btf.h
index b0610dcdae6b..b1e8e54cc21d 100644
--- a/tools/lib/bpf/btf.h
+++ b/tools/lib/bpf/btf.h
@@ -55,11 +55,8 @@ struct btf_ext_header {
__u32 line_info_len;
};
-typedef int (*btf_print_fn_t)(const char *, ...)
- __attribute__((format(printf, 1, 2)));
-
LIBBPF_API void btf__free(struct btf *btf);
-LIBBPF_API struct btf *btf__new(__u8 *data, __u32 size, btf_print_fn_t err_log);
+LIBBPF_API struct btf *btf__new(__u8 *data, __u32 size);
LIBBPF_API __s32 btf__find_by_name(const struct btf *btf,
const char *type_name);
LIBBPF_API const struct btf_type *btf__type_by_id(const struct btf *btf,
@@ -70,7 +67,7 @@ LIBBPF_API int btf__fd(const struct btf *btf);
LIBBPF_API const char *btf__name_by_offset(const struct btf *btf, __u32 offset);
LIBBPF_API int btf__get_from_id(__u32 id, struct btf **btf);
-struct btf_ext *btf_ext__new(__u8 *data, __u32 size, btf_print_fn_t err_log);
+struct btf_ext *btf_ext__new(__u8 *data, __u32 size);
void btf_ext__free(struct btf_ext *btf_ext);
int btf_ext__reloc_func_info(const struct btf *btf,
const struct btf_ext *btf_ext,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2ccde17957e6..1ede6b93653e 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -42,6 +42,7 @@
#include "bpf.h"
#include "btf.h"
#include "str_error.h"
+#include "util.h"
#ifndef EM_BPF
#define EM_BPF 247
@@ -69,16 +70,6 @@ static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
static __printf(1, 2) libbpf_print_fn_t __pr_debug;
-#define __pr(func, fmt, ...) \
-do { \
- if ((func)) \
- (func)("libbpf: " fmt, ##__VA_ARGS__); \
-} while (0)
-
-#define pr_warning(fmt, ...) __pr(__pr_warning, fmt, ##__VA_ARGS__)
-#define pr_info(fmt, ...) __pr(__pr_info, fmt, ##__VA_ARGS__)
-#define pr_debug(fmt, ...) __pr(__pr_debug, fmt, ##__VA_ARGS__)
-
void libbpf_set_print(libbpf_print_fn_t warn,
libbpf_print_fn_t info,
libbpf_print_fn_t debug)
@@ -88,6 +79,35 @@ void libbpf_set_print(libbpf_print_fn_t warn,
__pr_debug = debug;
}
+__printf(2, 3)
+void libbpf_debug_print(enum libbpf_print_level level, const char *format, ...)
+{
+ va_list args;
+
+ va_start(args, format);
+ if (level == LIBBPF_WARN) {
+ if (__pr_warning)
+ __pr_warning(format, args);
+ } else if (level == LIBBPF_INFO) {
+ if (__pr_info)
+ __pr_info(format, args);
+ } else {
+ if (__pr_debug)
+ __pr_debug(format, args);
+ }
+ va_end(args);
+}
+
+bool libbpf_dprint_level_available(enum libbpf_print_level level)
+{
+ if (level == LIBBPF_WARN)
+ return !!__pr_warning;
+ else if (level == LIBBPF_INFO)
+ return !!__pr_info;
+ else
+ return !!__pr_debug;
+}
+
#define STRERR_BUFSIZE 128
#define CHECK_ERR(action, err, out) do { \
@@ -839,8 +859,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
else if (strcmp(name, "maps") == 0)
obj->efile.maps_shndx = idx;
else if (strcmp(name, BTF_ELF_SEC) == 0) {
- obj->btf = btf__new(data->d_buf, data->d_size,
- __pr_debug);
+ obj->btf = btf__new(data->d_buf, data->d_size);
if (IS_ERR(obj->btf)) {
pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
BTF_ELF_SEC, PTR_ERR(obj->btf));
@@ -915,8 +934,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
BTF_EXT_ELF_SEC, BTF_ELF_SEC);
} else {
obj->btf_ext = btf_ext__new(btf_ext_data->d_buf,
- btf_ext_data->d_size,
- __pr_debug);
+ btf_ext_data->d_size);
if (IS_ERR(obj->btf_ext)) {
pr_warning("Error loading ELF section %s: %ld. Ignored and continue.\n",
BTF_EXT_ELF_SEC,
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 62ae6cb93da1..4e21971101c9 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -47,6 +47,12 @@ enum libbpf_errno {
LIBBPF_API int libbpf_strerror(int err, char *buf, size_t size);
+enum libbpf_print_level {
+ LIBBPF_WARN,
+ LIBBPF_INFO,
+ LIBBPF_DEBUG,
+};
+
/*
* __printf is defined in include/linux/compiler-gcc.h. However,
* it would be better if libbpf.h didn't depend on Linux header files.
diff --git a/tools/lib/bpf/test_libbpf.cpp b/tools/lib/bpf/test_libbpf.cpp
index abf3fc25c9fa..be67f5ea2c19 100644
--- a/tools/lib/bpf/test_libbpf.cpp
+++ b/tools/lib/bpf/test_libbpf.cpp
@@ -14,5 +14,5 @@ int main(int argc, char *argv[])
bpf_prog_get_fd_by_id(0);
/* btf.h */
- btf__new(NULL, 0, NULL);
+ btf__new(NULL, 0);
}
diff --git a/tools/lib/bpf/util.h b/tools/lib/bpf/util.h
new file mode 100644
index 000000000000..3ae80f486875
--- /dev/null
+++ b/tools/lib/bpf/util.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) */
+/* Copyright (c) 2019 Facebook */
+
+#ifndef __LIBBPF_COMMON_H
+#define __LIBBPF_COMMON_H
+
+#include <stdbool.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+extern void libbpf_debug_print(enum libbpf_print_level level,
+ const char *format, ...)
+ __attribute__((format(printf, 2, 3)));
+
+extern bool libbpf_dprint_level_available(enum libbpf_print_level level);
+
+#define __pr(level, fmt, ...) \
+do { \
+ libbpf_debug_print(level, "libbpf: " fmt, ##__VA_ARGS__); \
+} while (0)
+
+#define pr_warning(fmt, ...) __pr(LIBBPF_WARN, fmt, ##__VA_ARGS__)
+#define pr_info(fmt, ...) __pr(LIBBPF_INFO, fmt, ##__VA_ARGS__)
+#define pr_debug(fmt, ...) __pr(LIBBPF_DEBUG, fmt, ##__VA_ARGS__)
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif
--
2.17.1
^ permalink raw reply related
* [PATCH bpf-next v2 3/3] tools/bpf: simplify libbpf API function libbpf_set_print()
From: Yonghong Song @ 2019-02-01 17:47 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Magnus Karlsson, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Yonghong Song
In-Reply-To: <20190201174731.695459-1-yhs@fb.com>
Currently, the libbpf API function libbpf_set_print()
takes three function pointer parameters for warning, info
and debug printout respectively.
This patch changes the API to have just one function pointer
parameter and the function pointer has one additional
parameter "debugging level". So if in the future, if
the debug level is increased, the function signature
won't change.
Signed-off-by: Yonghong Song <yhs@fb.com>
---
tools/lib/bpf/libbpf.c | 28 ++++-----------
tools/lib/bpf/libbpf.h | 14 +++-----
tools/lib/bpf/test_libbpf.cpp | 2 +-
tools/perf/util/bpf-loader.c | 32 +++++++----------
tools/testing/selftests/bpf/test_btf.c | 7 ++--
.../testing/selftests/bpf/test_libbpf_open.c | 36 +++++++++----------
tools/testing/selftests/bpf/test_progs.c | 20 +++++++++--
7 files changed, 63 insertions(+), 76 deletions(-)
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1b1c0b504d25..d2337a179837 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -54,8 +54,8 @@
#define __printf(a, b) __attribute__((format(printf, a, b)))
-__printf(1, 2)
-static int __base_pr(const char *format, ...)
+__printf(2, 3)
+static int __base_pr(enum libbpf_print_level level, const char *format, ...)
{
va_list args;
int err;
@@ -66,17 +66,11 @@ static int __base_pr(const char *format, ...)
return err;
}
-static __printf(1, 2) libbpf_print_fn_t __pr_warning = __base_pr;
-static __printf(1, 2) libbpf_print_fn_t __pr_info = __base_pr;
-static __printf(1, 2) libbpf_print_fn_t __pr_debug;
+static __printf(2, 3) libbpf_print_fn_t __libbpf_pr = __base_pr;
-void libbpf_set_print(libbpf_print_fn_t warn,
- libbpf_print_fn_t info,
- libbpf_print_fn_t debug)
+void libbpf_set_print(libbpf_print_fn_t fn)
{
- __pr_warning = warn;
- __pr_info = info;
- __pr_debug = debug;
+ __libbpf_pr = fn;
}
__printf(2, 3)
@@ -85,16 +79,8 @@ void libbpf_debug_print(enum libbpf_print_level level, const char *format, ...)
va_list args;
va_start(args, format);
- if (level == LIBBPF_WARN) {
- if (__pr_warning)
- __pr_warning(format, args);
- } else if (level == LIBBPF_INFO) {
- if (__pr_info)
- __pr_info(format, args);
- } else {
- if (__pr_debug)
- __pr_debug(format, args);
- }
+ if (__libbpf_pr)
+ __libbpf_pr(level, format, args);
va_end(args);
}
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 4e21971101c9..f8f27f1bb6bf 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -53,17 +53,11 @@ enum libbpf_print_level {
LIBBPF_DEBUG,
};
-/*
- * __printf is defined in include/linux/compiler-gcc.h. However,
- * it would be better if libbpf.h didn't depend on Linux header files.
- * So instead of __printf, here we use gcc attribute directly.
- */
-typedef int (*libbpf_print_fn_t)(const char *, ...)
- __attribute__((format(printf, 1, 2)));
+typedef int (*libbpf_print_fn_t)(enum libbpf_print_level level,
+ const char *, ...)
+ __attribute__((format(printf, 2, 3)));
-LIBBPF_API void libbpf_set_print(libbpf_print_fn_t warn,
- libbpf_print_fn_t info,
- libbpf_print_fn_t debug);
+LIBBPF_API void libbpf_set_print(libbpf_print_fn_t fn);
/* Hide internal to user */
struct bpf_object;
diff --git a/tools/lib/bpf/test_libbpf.cpp b/tools/lib/bpf/test_libbpf.cpp
index be67f5ea2c19..fc134873bb6d 100644
--- a/tools/lib/bpf/test_libbpf.cpp
+++ b/tools/lib/bpf/test_libbpf.cpp
@@ -8,7 +8,7 @@
int main(int argc, char *argv[])
{
/* libbpf.h */
- libbpf_set_print(NULL, NULL, NULL);
+ libbpf_set_print(NULL);
/* bpf.h */
bpf_prog_get_fd_by_id(0);
diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 2f3eb6d293ee..c492f3a2acdc 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -24,21 +24,17 @@
#include "llvm-utils.h"
#include "c++/clang-c.h"
-#define DEFINE_PRINT_FN(name, level) \
-static int libbpf_##name(const char *fmt, ...) \
-{ \
- va_list args; \
- int ret; \
- \
- va_start(args, fmt); \
- ret = veprintf(level, verbose, pr_fmt(fmt), args);\
- va_end(args); \
- return ret; \
-}
+static int libbpf_perf_dprint(enum libbpf_print_level level __attribute__((unused)),
+ const char *fmt, ...)
+{
+ va_list args;
+ int ret;
-DEFINE_PRINT_FN(warning, 1)
-DEFINE_PRINT_FN(info, 1)
-DEFINE_PRINT_FN(debug, 1)
+ va_start(args, fmt);
+ ret = veprintf(1, verbose, pr_fmt(fmt), args);
+ va_end(args);
+ return ret;
+}
struct bpf_prog_priv {
bool is_tp;
@@ -59,9 +55,7 @@ bpf__prepare_load_buffer(void *obj_buf, size_t obj_buf_sz, const char *name)
struct bpf_object *obj;
if (!libbpf_initialized) {
- libbpf_set_print(libbpf_warning,
- libbpf_info,
- libbpf_debug);
+ libbpf_set_print(libbpf_perf_dprint);
libbpf_initialized = true;
}
@@ -79,9 +73,7 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
struct bpf_object *obj;
if (!libbpf_initialized) {
- libbpf_set_print(libbpf_warning,
- libbpf_info,
- libbpf_debug);
+ libbpf_set_print(libbpf_perf_dprint);
libbpf_initialized = true;
}
diff --git a/tools/testing/selftests/bpf/test_btf.c b/tools/testing/selftests/bpf/test_btf.c
index 179f1d8ec5bf..aebaeff5a5a0 100644
--- a/tools/testing/selftests/bpf/test_btf.c
+++ b/tools/testing/selftests/bpf/test_btf.c
@@ -54,8 +54,9 @@ static int count_result(int err)
#define __printf(a, b) __attribute__((format(printf, a, b)))
-__printf(1, 2)
-static int __base_pr(const char *format, ...)
+__printf(2, 3)
+static int __base_pr(enum libbpf_print_level level __attribute__((unused)),
+ const char *format, ...)
{
va_list args;
int err;
@@ -5650,7 +5651,7 @@ int main(int argc, char **argv)
return err;
if (args.always_log)
- libbpf_set_print(__base_pr, __base_pr, __base_pr);
+ libbpf_set_print(__base_pr);
if (args.raw_test)
err |= test_raw();
diff --git a/tools/testing/selftests/bpf/test_libbpf_open.c b/tools/testing/selftests/bpf/test_libbpf_open.c
index 8fcd1c076add..3fe258520e4b 100644
--- a/tools/testing/selftests/bpf/test_libbpf_open.c
+++ b/tools/testing/selftests/bpf/test_libbpf_open.c
@@ -34,23 +34,22 @@ static void usage(char *argv[])
printf("\n");
}
-#define DEFINE_PRINT_FN(name, enabled) \
-static int libbpf_##name(const char *fmt, ...) \
-{ \
- va_list args; \
- int ret; \
- \
- va_start(args, fmt); \
- if (enabled) { \
- fprintf(stderr, "[" #name "] "); \
- ret = vfprintf(stderr, fmt, args); \
- } \
- va_end(args); \
- return ret; \
+static bool debug = 0;
+static int libbpf_print(enum libbpf_print_level level,
+ const char *fmt, ...)
+{
+ va_list args;
+ int ret;
+
+ if (level == LIBBPF_DEBUG && !debug)
+ return 0;
+
+ va_start(args, fmt);
+ fprintf(stderr, "[%d] ", level);
+ ret = vfprintf(stderr, fmt, args);
+ va_end(args);
+ return ret;
}
-DEFINE_PRINT_FN(warning, 1)
-DEFINE_PRINT_FN(info, 1)
-DEFINE_PRINT_FN(debug, 1)
#define EXIT_FAIL_LIBBPF EXIT_FAILURE
#define EXIT_FAIL_OPTION 2
@@ -120,15 +119,14 @@ int main(int argc, char **argv)
int longindex = 0;
int opt;
- libbpf_set_print(libbpf_warning, libbpf_info, NULL);
+ libbpf_set_print(libbpf_print);
/* Parse commands line args */
while ((opt = getopt_long(argc, argv, "hDq",
long_options, &longindex)) != -1) {
switch (opt) {
case 'D':
- libbpf_set_print(libbpf_warning, libbpf_info,
- libbpf_debug);
+ debug = 1;
break;
case 'q': /* Use in scripting mode */
verbose = 0;
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index d8940b8b2f8d..5eff68ab2c1c 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -10,6 +10,7 @@
#include <string.h>
#include <assert.h>
#include <stdlib.h>
+#include <stdarg.h>
#include <time.h>
#include <linux/types.h>
@@ -1783,6 +1784,21 @@ static void test_task_fd_query_tp(void)
"sys_enter_read");
}
+static int libbpf_print(enum libbpf_print_level level,
+ const char *format, ...)
+{
+ va_list args;
+ int ret;
+
+ if (level == LIBBPF_DEBUG)
+ return 0;
+
+ va_start(args, format);
+ ret = vfprintf(stderr, format, args);
+ va_end(args);
+ return ret;
+}
+
static void test_reference_tracking()
{
const char *file = "./test_sk_lookup_kern.o";
@@ -1809,9 +1825,9 @@ static void test_reference_tracking()
/* Expect verifier failure if test name has 'fail' */
if (strstr(title, "fail") != NULL) {
- libbpf_set_print(NULL, NULL, NULL);
+ libbpf_set_print(NULL);
err = !bpf_program__load(prog, "GPL", 0);
- libbpf_set_print(printf, printf, NULL);
+ libbpf_set_print(libbpf_print);
} else {
err = bpf_program__load(prog, "GPL", 0);
}
--
2.17.1
^ permalink raw reply related
* [PATCH bpf-next v2 0/3] tools/bpf: changes of libbpf debug interfaces
From: Yonghong Song @ 2019-02-01 17:47 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Magnus Karlsson, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Yonghong Song
These are patches responding to my comments for
Magnus's patch (https://patchwork.ozlabs.org/patch/1032848/).
The goal is to make pr_* macros available to other C files
than libbpf.c, and to simplify API function libbpf_set_print().
Specifically, Patch #1 used global functions
to facilitate pr_* macros in the header files so they
are available in different C files.
Patch #2 removes the global function libbpf_dprint_level_available()
which is added in Patch 1.
Patch #3 simplified libbpf_set_print() which takes only one print
function with a debug level argument among others.
Changelogs:
v1 -> v2:
. Renamed global function libbpf_dprint() to libbpf_debug_print()
to be more expressive.
. Removed libbpf_dprint_level_available() as it is used only
once in btf.c and we can remove it by optimizing for common cases.
Yonghong Song (3):
tools/bpf: move libbpf pr_* debug print functions to headers
tools/bpf: print out btf log at LIBBPF_WARN level
tools/bpf: simplify libbpf API function libbpf_set_print()
tools/lib/bpf/btf.c | 110 +++++++++---------
tools/lib/bpf/btf.h | 7 +-
tools/lib/bpf/libbpf.c | 42 +++----
tools/lib/bpf/libbpf.h | 20 ++--
tools/lib/bpf/test_libbpf.cpp | 4 +-
tools/lib/bpf/util.h | 30 +++++
tools/perf/util/bpf-loader.c | 32 ++---
tools/testing/selftests/bpf/test_btf.c | 7 +-
.../testing/selftests/bpf/test_libbpf_open.c | 36 +++---
tools/testing/selftests/bpf/test_progs.c | 20 +++-
10 files changed, 166 insertions(+), 142 deletions(-)
create mode 100644 tools/lib/bpf/util.h
--
2.17.1
^ permalink raw reply
* [PATCH bpf-next v2 2/3] tools/bpf: print out btf log at LIBBPF_WARN level
From: Yonghong Song @ 2019-02-01 17:47 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Magnus Karlsson, netdev
Cc: Alexei Starovoitov, Daniel Borkmann, kernel-team, Yonghong Song
In-Reply-To: <20190201174731.695459-1-yhs@fb.com>
Currently, the btf log is allocated and printed out in case
of error at LIBBPF_DEBUG level.
Such logs from kernel are very important for debugging.
For example, bpf syscall BPF_PROG_LOAD command can get
verifier logs back to user space. In function load_program()
of libbpf.c, the log buffer is allocated unconditionally
and printed out at pr_warning() level.
Let us do the similar thing here for btf. Allocate buffer
unconditionally and print out error logs at pr_warning() level.
This is to reduce one global function and
optimize for common situations where pr_warning()
is activated either by default or by user supplied
debug output function.
Signed-off-by: Yonghong Song <yhs@fb.com>
---
tools/lib/bpf/btf.c | 19 +++++++++----------
tools/lib/bpf/libbpf.c | 10 ----------
tools/lib/bpf/util.h | 2 --
3 files changed, 9 insertions(+), 22 deletions(-)
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 159104ec31dc..46271bc19572 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -377,16 +377,15 @@ struct btf *btf__new(__u8 *data, __u32 size)
btf->fd = -1;
- if (libbpf_dprint_level_available(LIBBPF_DEBUG)) {
- log_buf = malloc(BPF_LOG_BUF_SIZE);
- if (!log_buf) {
- err = -ENOMEM;
- goto done;
- }
- *log_buf = 0;
- log_buf_size = BPF_LOG_BUF_SIZE;
+ log_buf = malloc(BPF_LOG_BUF_SIZE);
+ if (!log_buf) {
+ err = -ENOMEM;
+ goto done;
}
+ *log_buf = 0;
+ log_buf_size = BPF_LOG_BUF_SIZE;
+
btf->data = malloc(size);
if (!btf->data) {
err = -ENOMEM;
@@ -401,9 +400,9 @@ struct btf *btf__new(__u8 *data, __u32 size)
if (btf->fd == -1) {
err = -errno;
- pr_debug("Error loading BTF: %s(%d)\n", strerror(errno), errno);
+ pr_warning("Error loading BTF: %s(%d)\n", strerror(errno), errno);
if (log_buf && *log_buf)
- pr_debug("%s\n", log_buf);
+ pr_warning("%s\n", log_buf);
goto done;
}
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 1ede6b93653e..1b1c0b504d25 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -98,16 +98,6 @@ void libbpf_debug_print(enum libbpf_print_level level, const char *format, ...)
va_end(args);
}
-bool libbpf_dprint_level_available(enum libbpf_print_level level)
-{
- if (level == LIBBPF_WARN)
- return !!__pr_warning;
- else if (level == LIBBPF_INFO)
- return !!__pr_info;
- else
- return !!__pr_debug;
-}
-
#define STRERR_BUFSIZE 128
#define CHECK_ERR(action, err, out) do { \
diff --git a/tools/lib/bpf/util.h b/tools/lib/bpf/util.h
index 3ae80f486875..66d65b803095 100644
--- a/tools/lib/bpf/util.h
+++ b/tools/lib/bpf/util.h
@@ -14,8 +14,6 @@ extern void libbpf_debug_print(enum libbpf_print_level level,
const char *format, ...)
__attribute__((format(printf, 2, 3)));
-extern bool libbpf_dprint_level_available(enum libbpf_print_level level);
-
#define __pr(level, fmt, ...) \
do { \
libbpf_debug_print(level, "libbpf: " fmt, ##__VA_ARGS__); \
--
2.17.1
^ permalink raw reply related
* [PATCH bpf-next v6 5/5] selftests: bpf: add test_lwt_ip_encap selftest
From: Peter Oskolkov @ 2019-02-01 17:22 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Willem de Bruijn, Peter Oskolkov
In-Reply-To: <20190201172229.108867-1-posk@google.com>
This patch adds a bpf self-test to cover BPF_LWT_ENCAP_IP mode
in bpf_lwt_push_encap.
Covered:
- encapping in LWT_IN and LWT_XMIT
- IPv4 and IPv6
A follow-up patch will add VRF-enabled tests.
Signed-off-by: Peter Oskolkov <posk@google.com>
---
tools/testing/selftests/bpf/Makefile | 5 +-
.../testing/selftests/bpf/test_lwt_ip_encap.c | 85 +++++
.../selftests/bpf/test_lwt_ip_encap.sh | 311 ++++++++++++++++++
3 files changed, 399 insertions(+), 2 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_lwt_ip_encap.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_ip_encap.sh
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 8993e9c8f410..28aa3b3e297e 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -35,7 +35,7 @@ BPF_OBJ_FILES = \
sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \
get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \
test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o test_xdp_vlan.o \
- xdp_dummy.o test_map_in_map.o
+ xdp_dummy.o test_map_in_map.o test_lwt_ip_encap.o
# Objects are built with default compilation flags and with sub-register
# code-gen enabled.
@@ -73,7 +73,8 @@ TEST_PROGS := test_kmod.sh \
test_lirc_mode2.sh \
test_skb_cgroup_id.sh \
test_flow_dissector.sh \
- test_xdp_vlan.sh
+ test_xdp_vlan.sh \
+ test_lwt_ip_encap.sh
TEST_PROGS_EXTENDED := with_addr.sh \
with_tunnels.sh \
diff --git a/tools/testing/selftests/bpf/test_lwt_ip_encap.c b/tools/testing/selftests/bpf/test_lwt_ip_encap.c
new file mode 100644
index 000000000000..c957d6dfe6d7
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lwt_ip_encap.c
@@ -0,0 +1,85 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <stddef.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+struct grehdr {
+ __be16 flags;
+ __be16 protocol;
+};
+
+SEC("encap_gre")
+int bpf_lwt_encap_gre(struct __sk_buff *skb)
+{
+ struct encap_hdr {
+ struct iphdr iph;
+ struct grehdr greh;
+ } hdr;
+ int err;
+
+ memset(&hdr, 0, sizeof(struct encap_hdr));
+
+ hdr.iph.ihl = 5;
+ hdr.iph.version = 4;
+ hdr.iph.ttl = 0x40;
+ hdr.iph.protocol = 47; /* IPPROTO_GRE */
+#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
+ hdr.iph.saddr = 0x640110ac; /* 172.16.1.100 */
+ hdr.iph.daddr = 0x641010ac; /* 172.16.16.100 */
+#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+ hdr.iph.saddr = 0xac100164; /* 172.16.1.100 */
+ hdr.iph.daddr = 0xac101064; /* 172.16.16.100 */
+#else
+#error "Fix your compiler's __BYTE_ORDER__?!"
+#endif
+ hdr.iph.tot_len = bpf_htons(skb->len + sizeof(struct encap_hdr));
+
+ hdr.greh.protocol = skb->protocol;
+
+ err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr,
+ sizeof(struct encap_hdr));
+ if (err)
+ return BPF_DROP;
+
+ return BPF_LWT_REROUTE;
+}
+
+SEC("encap_gre6")
+int bpf_lwt_encap_gre6(struct __sk_buff *skb)
+{
+ struct encap_hdr {
+ struct ipv6hdr ip6hdr;
+ struct grehdr greh;
+ } hdr;
+ int err;
+
+ memset(&hdr, 0, sizeof(struct encap_hdr));
+
+ hdr.ip6hdr.version = 6;
+ hdr.ip6hdr.payload_len = bpf_htons(skb->len + sizeof(struct grehdr));
+ hdr.ip6hdr.nexthdr = 47; /* IPPROTO_GRE */
+ hdr.ip6hdr.hop_limit = 0x40;
+ /* fb01::1 */
+ hdr.ip6hdr.saddr.s6_addr[0] = 0xfb;
+ hdr.ip6hdr.saddr.s6_addr[1] = 1;
+ hdr.ip6hdr.saddr.s6_addr[15] = 1;
+ /* fb10::1 */
+ hdr.ip6hdr.daddr.s6_addr[0] = 0xfb;
+ hdr.ip6hdr.daddr.s6_addr[1] = 0x10;
+ hdr.ip6hdr.daddr.s6_addr[15] = 1;
+
+ hdr.greh.protocol = skb->protocol;
+
+ err = bpf_lwt_push_encap(skb, BPF_LWT_ENCAP_IP, &hdr,
+ sizeof(struct encap_hdr));
+ if (err)
+ return BPF_DROP;
+
+ return BPF_LWT_REROUTE;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_lwt_ip_encap.sh b/tools/testing/selftests/bpf/test_lwt_ip_encap.sh
new file mode 100755
index 000000000000..4ca714e23ab0
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_lwt_ip_encap.sh
@@ -0,0 +1,311 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Setup/topology:
+#
+# NS1 NS2 NS3
+# veth1 <---> veth2 veth3 <---> veth4 (the top route)
+# veth5 <---> veth6 veth7 <---> veth8 (the bottom route)
+#
+# each vethN gets IPv[4|6]_N address
+#
+# IPv*_SRC = IPv*_1
+# IPv*_DST = IPv*_4
+#
+# all tests test pings from IPv*_SRC to IPv*_DST
+#
+# by default, routes are configured to allow packets to go
+# IP*_1 <=> IP*_2 <=> IP*_3 <=> IP*_4 (the top route)
+#
+# a GRE device is installed in NS3 with IPv*_GRE, and
+# NS1/NS2 are configured to route packets to IPv*_GRE via IP*_8
+# (the bottom route)
+#
+# Tests:
+#
+# 1. routes NS2->IPv*_DST are brought down, so the only way a ping
+# from IP*_SRC to IP*_DST can work is via IPv*_GRE
+#
+# 2a. in an egress test, a bpf LWT_XMIT program is installed on veth1
+# that encaps the packets with an IP/GRE header to route to IPv*_GRE
+#
+# ping: SRC->[encap at veth1:egress]->GRE:decap->DST
+# ping replies go DST->SRC directly
+#
+# 2b. in an ingress test, a bpf LWT_IN program is installed on veth2
+# that encaps the packets with an IP/GRE header to route to IPv*_GRE
+#
+# ping: SRC->[encap at veth2:ingress]->GRE:decap->DST
+# ping replies go DST->SRC directly
+
+set -e # exit on error
+
+if [[ $EUID -ne 0 ]]; then
+ echo "This script must be run as root"
+ echo "FAIL"
+ exit 1
+fi
+
+readonly NS1="ns1-$(mktemp -u XXXXXX)"
+readonly NS2="ns2-$(mktemp -u XXXXXX)"
+readonly NS3="ns3-$(mktemp -u XXXXXX)"
+
+readonly IPv4_1="172.16.1.100"
+readonly IPv4_2="172.16.2.100"
+readonly IPv4_3="172.16.3.100"
+readonly IPv4_4="172.16.4.100"
+readonly IPv4_5="172.16.5.100"
+readonly IPv4_6="172.16.6.100"
+readonly IPv4_7="172.16.7.100"
+readonly IPv4_8="172.16.8.100"
+readonly IPv4_GRE="172.16.16.100"
+
+readonly IPv4_SRC=$IPv4_1
+readonly IPv4_DST=$IPv4_4
+
+readonly IPv6_1="fb01::1"
+readonly IPv6_2="fb02::1"
+readonly IPv6_3="fb03::1"
+readonly IPv6_4="fb04::1"
+readonly IPv6_5="fb05::1"
+readonly IPv6_6="fb06::1"
+readonly IPv6_7="fb07::1"
+readonly IPv6_8="fb08::1"
+readonly IPv6_GRE="fb10::1"
+
+readonly IPv6_SRC=$IPv6_1
+readonly IPv6_DST=$IPv6_4
+
+setup() {
+set -e # exit on error
+ # create devices and namespaces
+ ip netns add "${NS1}"
+ ip netns add "${NS2}"
+ ip netns add "${NS3}"
+
+ ip link add veth1 type veth peer name veth2
+ ip link add veth3 type veth peer name veth4
+ ip link add veth5 type veth peer name veth6
+ ip link add veth7 type veth peer name veth8
+
+ ip netns exec ${NS2} sysctl -wq net.ipv4.ip_forward=1
+ ip netns exec ${NS2} sysctl -wq net.ipv6.conf.all.forwarding=1
+
+ ip link set veth1 netns ${NS1}
+ ip link set veth2 netns ${NS2}
+ ip link set veth3 netns ${NS2}
+ ip link set veth4 netns ${NS3}
+ ip link set veth5 netns ${NS1}
+ ip link set veth6 netns ${NS2}
+ ip link set veth7 netns ${NS2}
+ ip link set veth8 netns ${NS3}
+
+ # configure addesses: the top route (1-2-3-4)
+ ip -netns ${NS1} addr add ${IPv4_1}/24 dev veth1
+ ip -netns ${NS2} addr add ${IPv4_2}/24 dev veth2
+ ip -netns ${NS2} addr add ${IPv4_3}/24 dev veth3
+ ip -netns ${NS3} addr add ${IPv4_4}/24 dev veth4
+ ip -netns ${NS1} -6 addr add ${IPv6_1}/128 nodad dev veth1
+ ip -netns ${NS2} -6 addr add ${IPv6_2}/128 nodad dev veth2
+ ip -netns ${NS2} -6 addr add ${IPv6_3}/128 nodad dev veth3
+ ip -netns ${NS3} -6 addr add ${IPv6_4}/128 nodad dev veth4
+
+ # configure addresses: the bottom route (5-6-7-8)
+ ip -netns ${NS1} addr add ${IPv4_5}/24 dev veth5
+ ip -netns ${NS2} addr add ${IPv4_6}/24 dev veth6
+ ip -netns ${NS2} addr add ${IPv4_7}/24 dev veth7
+ ip -netns ${NS3} addr add ${IPv4_8}/24 dev veth8
+ ip -netns ${NS1} -6 addr add ${IPv6_5}/128 nodad dev veth5
+ ip -netns ${NS2} -6 addr add ${IPv6_6}/128 nodad dev veth6
+ ip -netns ${NS2} -6 addr add ${IPv6_7}/128 nodad dev veth7
+ ip -netns ${NS3} -6 addr add ${IPv6_8}/128 nodad dev veth8
+
+
+ ip -netns ${NS1} link set dev veth1 up
+ ip -netns ${NS2} link set dev veth2 up
+ ip -netns ${NS2} link set dev veth3 up
+ ip -netns ${NS3} link set dev veth4 up
+ ip -netns ${NS1} link set dev veth5 up
+ ip -netns ${NS2} link set dev veth6 up
+ ip -netns ${NS2} link set dev veth7 up
+ ip -netns ${NS3} link set dev veth8 up
+
+ # configure routes: IP*_SRC -> veth1/IP*_2 (= top route) default;
+ # the bottom route to specific bottom addresses
+
+ # NS1
+ # top route
+ ip -netns ${NS1} route add ${IPv4_2}/32 dev veth1
+ ip -netns ${NS1} route add default dev veth1 via ${IPv4_2} # go top by default
+ ip -netns ${NS1} -6 route add ${IPv6_2}/128 dev veth1
+ ip -netns ${NS1} -6 route add default dev veth1 via ${IPv6_2} # go top by default
+ # bottom route
+ ip -netns ${NS1} route add ${IPv4_6}/32 dev veth5
+ ip -netns ${NS1} route add ${IPv4_7}/32 dev veth5 via ${IPv4_6}
+ ip -netns ${NS1} route add ${IPv4_8}/32 dev veth5 via ${IPv4_6}
+ ip -netns ${NS1} -6 route add ${IPv6_6}/128 dev veth5
+ ip -netns ${NS1} -6 route add ${IPv6_7}/128 dev veth5 via ${IPv6_6}
+ ip -netns ${NS1} -6 route add ${IPv6_8}/128 dev veth5 via ${IPv6_6}
+
+ # NS2
+ # top route
+ ip -netns ${NS2} route add ${IPv4_1}/32 dev veth2
+ ip -netns ${NS2} route add ${IPv4_4}/32 dev veth3
+ ip -netns ${NS2} -6 route add ${IPv6_1}/128 dev veth2
+ ip -netns ${NS2} -6 route add ${IPv6_4}/128 dev veth3
+ # bottom route
+ ip -netns ${NS2} route add ${IPv4_5}/32 dev veth6
+ ip -netns ${NS2} route add ${IPv4_8}/32 dev veth7
+ ip -netns ${NS2} -6 route add ${IPv6_5}/128 dev veth6
+ ip -netns ${NS2} -6 route add ${IPv6_8}/128 dev veth7
+
+ # NS3
+ # top route
+ ip -netns ${NS3} route add ${IPv4_3}/32 dev veth4
+ ip -netns ${NS3} route add ${IPv4_1}/32 dev veth4 via ${IPv4_3}
+ ip -netns ${NS3} route add ${IPv4_2}/32 dev veth4 via ${IPv4_3}
+ ip -netns ${NS3} -6 route add ${IPv6_3}/128 dev veth4
+ ip -netns ${NS3} -6 route add ${IPv6_1}/128 dev veth4 via ${IPv6_3}
+ ip -netns ${NS3} -6 route add ${IPv6_2}/128 dev veth4 via ${IPv6_3}
+ # bottom route
+ ip -netns ${NS3} route add ${IPv4_7}/32 dev veth8
+ ip -netns ${NS3} route add ${IPv4_5}/32 dev veth8 via ${IPv4_7}
+ ip -netns ${NS3} route add ${IPv4_6}/32 dev veth8 via ${IPv4_7}
+ ip -netns ${NS3} -6 route add ${IPv6_7}/128 dev veth8
+ ip -netns ${NS3} -6 route add ${IPv6_5}/128 dev veth8 via ${IPv6_7}
+ ip -netns ${NS3} -6 route add ${IPv6_6}/128 dev veth8 via ${IPv6_7}
+
+ # configure IPv4 GRE device in NS3, and a route to it via the "bottom" route
+ ip -netns ${NS3} tunnel add gre_dev mode gre remote ${IPv4_1} local ${IPv4_GRE} ttl 255
+ ip -netns ${NS3} link set gre_dev up
+ ip -netns ${NS3} addr add ${IPv4_GRE} dev gre_dev
+ ip -netns ${NS1} route add ${IPv4_GRE}/32 dev veth5 via ${IPv4_6}
+ ip -netns ${NS2} route add ${IPv4_GRE}/32 dev veth7 via ${IPv4_8}
+
+
+ # configure IPv6 GRE device in NS3, and a route to it via the "bottom" route
+ ip -netns ${NS3} -6 tunnel add name gre6_dev mode ip6gre remote ${IPv6_1} local ${IPv6_GRE} ttl 255
+ ip -netns ${NS3} link set gre6_dev up
+ ip -netns ${NS3} -6 addr add ${IPv6_GRE} nodad dev gre6_dev
+ ip -netns ${NS1} -6 route add ${IPv6_GRE}/128 dev veth5 via ${IPv6_6}
+ ip -netns ${NS2} -6 route add ${IPv6_GRE}/128 dev veth7 via ${IPv6_8}
+
+ # rp_filter gets confused by what these tests are doing, so disable it
+ ip netns exec ${NS1} sysctl -wq net.ipv4.conf.all.rp_filter=0
+ ip netns exec ${NS2} sysctl -wq net.ipv4.conf.all.rp_filter=0
+ ip netns exec ${NS3} sysctl -wq net.ipv4.conf.all.rp_filter=0
+}
+
+cleanup() {
+ ip netns del ${NS1} 2> /dev/null
+ ip netns del ${NS2} 2> /dev/null
+ ip netns del ${NS3} 2> /dev/null
+}
+
+trap cleanup EXIT
+
+test_ping() {
+ local readonly PROTO=$1
+ local readonly EXPECTED=$2
+ local RET=0
+
+ set +e
+ if [ "${PROTO}" == "IPv4" ] ; then
+ ip netns exec ${NS1} ping -c 1 -W 1 -I ${IPv4_SRC} ${IPv4_DST} 2>&1 > /dev/null
+ RET=$?
+ elif [ "${PROTO}" == "IPv6" ] ; then
+ ip netns exec ${NS1} ping6 -c 1 -W 6 -I ${IPv6_SRC} ${IPv6_DST} 2>&1 > /dev/null
+ RET=$?
+ else
+ echo "test_ping: unknown PROTO: ${PROTO}"
+ exit 1
+ fi
+ set -e
+
+ if [ "0" != "${RET}" ]; then
+ RET=1
+ fi
+
+ if [ "${EXPECTED}" != "${RET}" ] ; then
+ echo "FAIL: test_ping: ${RET}"
+ exit 1
+ fi
+}
+
+test_egress() {
+ local readonly ENCAP=$1
+ echo "starting egress ${ENCAP} encap test"
+ setup
+
+ # need to wait a bit for IPv6 to autoconf, otherwise
+ # ping6 sometimes fails with "unable to bind to address"
+
+ # by default, pings work
+ test_ping IPv4 0
+ test_ping IPv6 0
+
+ # remove NS2->DST routes, ping fails
+ ip -netns ${NS2} route del ${IPv4_DST}/32 dev veth3
+ ip -netns ${NS2} -6 route del ${IPv6_DST}/128 dev veth3
+ test_ping IPv4 1
+ test_ping IPv6 1
+
+ # install replacement routes (LWT/eBPF), pings succeed
+ if [ "${ENCAP}" == "IPv4" ] ; then
+ ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre dev veth1
+ ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre dev veth1
+ elif [ "${ENCAP}" == "IPv6" ] ; then
+ ip -netns ${NS1} route add ${IPv4_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre6 dev veth1
+ ip -netns ${NS1} -6 route add ${IPv6_DST} encap bpf xmit obj test_lwt_ip_encap.o sec encap_gre6 dev veth1
+ else
+ echo "FAIL: unknown encap ${ENCAP}"
+ fi
+ test_ping IPv4 0
+ test_ping IPv6 0
+
+ cleanup
+ echo "PASS"
+}
+
+test_ingress() {
+ local readonly ENCAP=$1
+ echo "starting ingress ${ENCAP} encap test"
+ setup
+
+ # need to wait a bit for IPv6 to autoconf, otherwise
+ # ping6 sometimes fails with "unable to bind to address"
+
+ # by default, pings work
+ test_ping IPv4 0
+ test_ping IPv6 0
+
+ # remove NS2->DST routes, pings fail
+ ip -netns ${NS2} route del ${IPv4_DST}/32 dev veth3
+ ip -netns ${NS2} -6 route del ${IPv6_DST}/128 dev veth3
+ test_ping IPv4 1
+ test_ping IPv6 1
+
+ # install replacement routes (LWT/eBPF), pings succeed
+ if [ "${ENCAP}" == "IPv4" ] ; then
+ ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre dev veth2
+ ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre dev veth2
+ elif [ "${ENCAP}" == "IPv6" ] ; then
+ ip -netns ${NS2} route add ${IPv4_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre6 dev veth2
+ ip -netns ${NS2} -6 route add ${IPv6_DST} encap bpf in obj test_lwt_ip_encap.o sec encap_gre6 dev veth2
+ else
+ echo "FAIL: unknown encap ${ENCAP}"
+ fi
+ test_ping IPv4 0
+ test_ping IPv6 0
+
+ cleanup
+ echo "PASS"
+}
+
+test_egress IPv4
+test_egress IPv6
+
+test_ingress IPv4
+test_ingress IPv6
+
+echo "all tests passed"
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [PATCH bpf-next v6 4/5] bpf: sync <kdir>/<uapi>/bpf.h with tools/<uapi>/bpf.h
From: Peter Oskolkov @ 2019-02-01 17:22 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Willem de Bruijn, Peter Oskolkov
In-Reply-To: <20190201172229.108867-1-posk@google.com>
This patch copies changes in bpf.h done by a previous patch
in this patchset from the kernel uapi include dir into tools
uapi include dir.
Signed-off-by: Peter Oskolkov <posk@google.com>
---
tools/include/uapi/linux/bpf.h | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 60b99b730a41..911c15585fab 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -2015,6 +2015,16 @@ union bpf_attr {
* Only works if *skb* contains an IPv6 packet. Insert a
* Segment Routing Header (**struct ipv6_sr_hdr**) inside
* the IPv6 header.
+ * **BPF_LWT_ENCAP_IP**
+ * IP encapsulation (GRE/GUE/IPIP/etc). The outer header
+ * must be IPv4 or IPv6, followed by zero or more
+ * additional headers, up to LWT_BPF_MAX_HEADROOM total
+ * bytes in all prepended headers.
+ *
+ * BPF_LWT_ENCAP_SEG6*** types can be called by bpf programs of
+ * type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can be called
+ * by bpf programs of types BPF_PROG_TYPE_LWT_IN and
+ * BPF_PROG_TYPE_LWT_XMIT.
*
* A call to this helper is susceptible to change the underlaying
* packet buffer. Therefore, at load time, all checks on pointers
@@ -2495,7 +2505,8 @@ enum bpf_hdr_start_off {
/* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
enum bpf_lwt_encap_mode {
BPF_LWT_ENCAP_SEG6,
- BPF_LWT_ENCAP_SEG6_INLINE
+ BPF_LWT_ENCAP_SEG6_INLINE,
+ BPF_LWT_ENCAP_IP,
};
#define __bpf_md_ptr(type, name) \
@@ -2583,7 +2594,15 @@ enum bpf_ret_code {
BPF_DROP = 2,
/* 3-6 reserved */
BPF_REDIRECT = 7,
- /* >127 are reserved for prog type specific return codes */
+ /* >127 are reserved for prog type specific return codes.
+ *
+ * BPF_LWT_REROUTE: used by BPF_PROG_TYPE_LWT_IN and
+ * BPF_PROG_TYPE_LWT_XMIT to indicate that skb had been
+ * changed and should be routed based on its new L3 header.
+ * (This is an L3 redirect, as opposed to L2 redirect
+ * represented by BPF_REDIRECT above).
+ */
+ BPF_LWT_REROUTE = 128,
};
struct bpf_sock {
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [PATCH bpf-next v6 3/5] bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c
From: Peter Oskolkov @ 2019-02-01 17:22 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Willem de Bruijn, Peter Oskolkov
In-Reply-To: <20190201172229.108867-1-posk@google.com>
This patch builds on top of the previous patch in the patchset,
which added BPF_LWT_ENCAP_IP mode to bpf_lwt_push_encap. As the
encapping can result in the skb needing to go via a different
interface/route/dst, bpf programs can indicate this by returning
BPF_LWT_REROUTE, which triggers a new route lookup for the skb.
Signed-off-by: Peter Oskolkov <posk@google.com>
---
net/core/lwt_bpf.c | 125 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 125 insertions(+)
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index eaec3d491894..6bad2eca23ec 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -16,6 +16,7 @@
#include <linux/types.h>
#include <linux/bpf.h>
#include <net/lwtunnel.h>
+#include <net/ip6_route.h>
struct bpf_lwt_prog {
struct bpf_prog *prog;
@@ -55,6 +56,7 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
switch (ret) {
case BPF_OK:
+ case BPF_LWT_REROUTE:
break;
case BPF_REDIRECT:
@@ -87,6 +89,32 @@ static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
return ret;
}
+static int bpf_lwt_input_reroute(struct sk_buff *skb)
+{
+ int err = -EINVAL;
+
+ if (skb->protocol == htons(ETH_P_IP)) {
+ struct iphdr *iph = ip_hdr(skb);
+
+ err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+ iph->tos, skb_dst(skb)->dev);
+ } else if (skb->protocol == htons(ETH_P_IPV6)) {
+ ip6_route_input(skb);
+ err = skb_dst(skb)->error;
+ } else {
+ pr_warn_once("BPF_LWT_REROUTE input: unsupported proto %d\n",
+ skb->protocol);
+ }
+
+ if (err)
+ goto err;
+ return dst_input(skb);
+
+err:
+ kfree_skb(skb);
+ return err;
+}
+
static int bpf_input(struct sk_buff *skb)
{
struct dst_entry *dst = skb_dst(skb);
@@ -98,6 +126,8 @@ static int bpf_input(struct sk_buff *skb)
ret = run_lwt_bpf(skb, &bpf->in, dst, NO_REDIRECT);
if (ret < 0)
return ret;
+ if (ret == BPF_LWT_REROUTE)
+ return bpf_lwt_input_reroute(skb);
}
if (unlikely(!dst->lwtstate->orig_input)) {
@@ -147,6 +177,90 @@ static int xmit_check_hhlen(struct sk_buff *skb)
return 0;
}
+static int bpf_lwt_xmit_reroute(struct sk_buff *skb)
+{
+ struct net_device *l3mdev = l3mdev_master_dev_rcu(skb_dst(skb)->dev);
+ int oif = l3mdev ? l3mdev->ifindex : 0;
+ struct dst_entry *dst = NULL;
+ struct sock *sk;
+ struct net *net;
+ bool ipv4;
+ int err;
+
+ if (skb->protocol == htons(ETH_P_IP)) {
+ ipv4 = true;
+ } else if (skb->protocol == htons(ETH_P_IPV6)) {
+ ipv4 = false;
+ } else {
+ pr_warn_once("BPF_LWT_REROUTE xmit: unsupported proto %d\n",
+ skb->protocol);
+ return -EINVAL;
+ }
+
+ sk = sk_to_full_sk(skb->sk);
+ if (sk) {
+ if (sk->sk_bound_dev_if)
+ oif = sk->sk_bound_dev_if;
+ net = sock_net(sk);
+ } else {
+ net = dev_net(skb_dst(skb)->dev);
+ }
+
+ if (ipv4) {
+ struct iphdr *iph = ip_hdr(skb);
+ struct flowi4 fl4 = {0};
+ struct rtable *rt;
+
+ fl4.flowi4_oif = oif;
+ fl4.flowi4_mark = skb->mark;
+ fl4.flowi4_uid = sock_net_uid(net, sk);
+ fl4.flowi4_tos = RT_TOS(iph->tos);
+ fl4.flowi4_flags = FLOWI_FLAG_ANYSRC;
+ fl4.flowi4_proto = iph->protocol;
+ fl4.daddr = iph->daddr;
+ fl4.saddr = iph->saddr;
+
+ rt = ip_route_output_key(net, &fl4);
+ if (IS_ERR(rt) || rt->dst.error)
+ return -EINVAL;
+ dst = &rt->dst;
+ } else {
+ struct ipv6hdr *iph6 = ipv6_hdr(skb);
+ struct flowi6 fl6 = {0};
+
+ fl6.flowi6_oif = oif;
+ fl6.flowi6_mark = skb->mark;
+ fl6.flowi6_uid = sock_net_uid(net, sk);
+ fl6.flowlabel = ip6_flowinfo(iph6);
+ fl6.flowi6_proto = iph6->nexthdr;
+ fl6.daddr = iph6->daddr;
+ fl6.saddr = iph6->saddr;
+
+ dst = ip6_route_output(net, skb->sk, &fl6);
+ if (IS_ERR(dst) || dst->error)
+ return -EINVAL;
+ }
+
+ /* Although skb header was reserved in bpf_lwt_push_ip_encap(), it
+ * was done for the previous dst, so we are doing it here again, in
+ * case the new dst needs much more space. The call below is a noop
+ * if there is enough header space in skb.
+ */
+ err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev));
+ if (unlikely(err))
+ return err;
+
+ skb_dst_drop(skb);
+ skb_dst_set(skb, dst);
+
+ err = dst_output(dev_net(skb_dst(skb)->dev), skb->sk, skb);
+ if (unlikely(err))
+ return err;
+
+ /* ip[6]_finish_output2 understand LWTUNNEL_XMIT_DONE */
+ return LWTUNNEL_XMIT_DONE;
+}
+
static int bpf_xmit(struct sk_buff *skb)
{
struct dst_entry *dst = skb_dst(skb);
@@ -154,11 +268,20 @@ static int bpf_xmit(struct sk_buff *skb)
bpf = bpf_lwt_lwtunnel(dst->lwtstate);
if (bpf->xmit.prog) {
+ __be16 proto = skb->protocol;
int ret;
ret = run_lwt_bpf(skb, &bpf->xmit, dst, CAN_REDIRECT);
switch (ret) {
case BPF_OK:
+ /* If the header changed, e.g. via bpf_lwt_push_encap,
+ * BPF_LWT_REROUTE below should have been used if the
+ * protocol was also changed.
+ */
+ if (skb->protocol != proto) {
+ kfree_skb(skb);
+ return -EINVAL;
+ }
/* If the header was expanded, headroom might be too
* small for L2 header to come, expand as needed.
*/
@@ -169,6 +292,8 @@ static int bpf_xmit(struct sk_buff *skb)
return LWTUNNEL_XMIT_CONTINUE;
case BPF_REDIRECT:
return LWTUNNEL_XMIT_DONE;
+ case BPF_LWT_REROUTE:
+ return bpf_lwt_xmit_reroute(skb);
default:
return ret;
}
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [PATCH bpf-next v6 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-02-01 17:22 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Willem de Bruijn, Peter Oskolkov
In-Reply-To: <20190201172229.108867-1-posk@google.com>
This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).
This is useful when thousands of different short-lived flows should be
encapped, each with different and dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).
Note: a follow-up patch with deal with GSO-enabled packets, which
are currently rejected at encapping attempt.
Signed-off-by: Peter Oskolkov <posk@google.com>
---
include/net/lwtunnel.h | 3 ++
net/core/filter.c | 3 +-
net/core/lwt_bpf.c | 63 ++++++++++++++++++++++++++++++++++++++++++
3 files changed, 68 insertions(+), 1 deletion(-)
diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h
index 33fd9ba7e0e5..f0973eca8036 100644
--- a/include/net/lwtunnel.h
+++ b/include/net/lwtunnel.h
@@ -126,6 +126,8 @@ int lwtunnel_cmp_encap(struct lwtunnel_state *a, struct lwtunnel_state *b);
int lwtunnel_output(struct net *net, struct sock *sk, struct sk_buff *skb);
int lwtunnel_input(struct sk_buff *skb);
int lwtunnel_xmit(struct sk_buff *skb);
+int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len,
+ bool ingress);
static inline void lwtunnel_set_redirect(struct dst_entry *dst)
{
@@ -138,6 +140,7 @@ static inline void lwtunnel_set_redirect(struct dst_entry *dst)
dst->input = lwtunnel_input;
}
}
+
#else
static inline void lwtstate_free(struct lwtunnel_state *lws)
diff --git a/net/core/filter.c b/net/core/filter.c
index 27d3fbe4b77b..de6bd4b4e0a3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -73,6 +73,7 @@
#include <linux/seg6_local.h>
#include <net/seg6.h>
#include <net/seg6_local.h>
+#include <net/lwtunnel.h>
/**
* sk_filter_trim_cap - run a packet through a socket filter
@@ -4804,7 +4805,7 @@ static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len
static int bpf_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len,
bool ingress)
{
- return -EINVAL; /* Implemented in the next patch. */
+ return bpf_lwt_push_ip_encap(skb, hdr, len, ingress);
}
BPF_CALL_4(bpf_lwt_in_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index a648568c5e8f..eaec3d491894 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -390,6 +390,69 @@ static const struct lwtunnel_encap_ops bpf_encap_ops = {
.owner = THIS_MODULE,
};
+int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
+{
+ struct iphdr *iph;
+ bool ipv4;
+ int err;
+
+ if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
+ return -EINVAL;
+
+ /* GSO-enabled packets cannot be encapped at the moment. */
+ if (unlikely(skb_is_gso(skb)))
+ return -EINVAL;
+
+ /* validate protocol and length */
+ iph = (struct iphdr *)hdr;
+ if (iph->version == 4) {
+ ipv4 = true;
+ if (unlikely(len < iph->ihl * 4))
+ return -EINVAL;
+ } else if (iph->version == 6) {
+ ipv4 = false;
+ if (unlikely(len < sizeof(struct ipv6hdr)))
+ return -EINVAL;
+ } else {
+ return -EINVAL;
+ }
+
+ if (ingress)
+ err = skb_cow_head(skb, len + skb->mac_len);
+ else
+ err = skb_cow_head(skb,
+ len + LL_RESERVED_SPACE(skb_dst(skb)->dev));
+ if (unlikely(err))
+ return err;
+
+ /* push the encap headers and fix pointers */
+ skb_reset_inner_headers(skb);
+ skb->encapsulation = 1;
+ skb_push(skb, len);
+ if (ingress)
+ skb_postpush_rcsum(skb, iph, len);
+ skb_reset_network_header(skb);
+ memcpy(skb_network_header(skb), hdr, len);
+ bpf_compute_data_pointers(skb);
+
+ if (ipv4) {
+ skb->protocol = htons(ETH_P_IP);
+ iph = ip_hdr(skb);
+ if (iph->ihl * 4 < len)
+ skb_set_transport_header(skb, iph->ihl * 4);
+
+ if (!iph->check)
+ iph->check = ip_fast_csum((unsigned char *)iph,
+ iph->ihl);
+ } else {
+ skb->protocol = htons(ETH_P_IPV6);
+ if (sizeof(struct ipv6hdr) < len)
+ skb_set_transport_header(skb, sizeof(struct ipv6hdr));
+ }
+
+ return 0;
+}
+
static int __init bpf_lwt_init(void)
{
return lwtunnel_encap_add_ops(&bpf_encap_ops, LWTUNNEL_ENCAP_BPF);
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [PATCH bpf-next v6 1/5] bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-02-01 17:22 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Willem de Bruijn, Peter Oskolkov
In-Reply-To: <20190201172229.108867-1-posk@google.com>
This patch adds all needed plumbing in preparation to allowing
bpf programs to do IP encapping via bpf_lwt_push_encap. Actual
implementation is added in the next patch in the patchset.
Of note:
- bpf_lwt_push_encap can now be called from BPF_PROG_TYPE_LWT_XMIT
prog types in addition to BPF_PROG_TYPE_LWT_IN;
- as route lookups are different for ingress vs egress, the single
external bpf_lwt_push_encap BPF helper is routed internally to
either bpf_lwt_in_push_encap or bpf_lwt_xmit_push_encap BPF_CALLs,
depending on prog type.
Signed-off-by: Peter Oskolkov <posk@google.com>
---
include/uapi/linux/bpf.h | 23 ++++++++++++++++++--
net/core/filter.c | 46 +++++++++++++++++++++++++++++++++++-----
2 files changed, 62 insertions(+), 7 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 60b99b730a41..911c15585fab 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2015,6 +2015,16 @@ union bpf_attr {
* Only works if *skb* contains an IPv6 packet. Insert a
* Segment Routing Header (**struct ipv6_sr_hdr**) inside
* the IPv6 header.
+ * **BPF_LWT_ENCAP_IP**
+ * IP encapsulation (GRE/GUE/IPIP/etc). The outer header
+ * must be IPv4 or IPv6, followed by zero or more
+ * additional headers, up to LWT_BPF_MAX_HEADROOM total
+ * bytes in all prepended headers.
+ *
+ * BPF_LWT_ENCAP_SEG6*** types can be called by bpf programs of
+ * type BPF_PROG_TYPE_LWT_IN; BPF_LWT_ENCAP_IP type can be called
+ * by bpf programs of types BPF_PROG_TYPE_LWT_IN and
+ * BPF_PROG_TYPE_LWT_XMIT.
*
* A call to this helper is susceptible to change the underlaying
* packet buffer. Therefore, at load time, all checks on pointers
@@ -2495,7 +2505,8 @@ enum bpf_hdr_start_off {
/* Encapsulation type for BPF_FUNC_lwt_push_encap helper. */
enum bpf_lwt_encap_mode {
BPF_LWT_ENCAP_SEG6,
- BPF_LWT_ENCAP_SEG6_INLINE
+ BPF_LWT_ENCAP_SEG6_INLINE,
+ BPF_LWT_ENCAP_IP,
};
#define __bpf_md_ptr(type, name) \
@@ -2583,7 +2594,15 @@ enum bpf_ret_code {
BPF_DROP = 2,
/* 3-6 reserved */
BPF_REDIRECT = 7,
- /* >127 are reserved for prog type specific return codes */
+ /* >127 are reserved for prog type specific return codes.
+ *
+ * BPF_LWT_REROUTE: used by BPF_PROG_TYPE_LWT_IN and
+ * BPF_PROG_TYPE_LWT_XMIT to indicate that skb had been
+ * changed and should be routed based on its new L3 header.
+ * (This is an L3 redirect, as opposed to L2 redirect
+ * represented by BPF_REDIRECT above).
+ */
+ BPF_LWT_REROUTE = 128,
};
struct bpf_sock {
diff --git a/net/core/filter.c b/net/core/filter.c
index 41984ad4b9b4..27d3fbe4b77b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4801,7 +4801,13 @@ static int bpf_push_seg6_encap(struct sk_buff *skb, u32 type, void *hdr, u32 len
}
#endif /* CONFIG_IPV6_SEG6_BPF */
-BPF_CALL_4(bpf_lwt_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
+static int bpf_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len,
+ bool ingress)
+{
+ return -EINVAL; /* Implemented in the next patch. */
+}
+
+BPF_CALL_4(bpf_lwt_in_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
u32, len)
{
switch (type) {
@@ -4809,14 +4815,41 @@ BPF_CALL_4(bpf_lwt_push_encap, struct sk_buff *, skb, u32, type, void *, hdr,
case BPF_LWT_ENCAP_SEG6:
case BPF_LWT_ENCAP_SEG6_INLINE:
return bpf_push_seg6_encap(skb, type, hdr, len);
+#endif
+#if IS_ENABLED(CONFIG_LWTUNNEL_BPF)
+ case BPF_LWT_ENCAP_IP:
+ return bpf_push_ip_encap(skb, hdr, len, true /* ingress */);
#endif
default:
return -EINVAL;
}
}
-static const struct bpf_func_proto bpf_lwt_push_encap_proto = {
- .func = bpf_lwt_push_encap,
+BPF_CALL_4(bpf_lwt_xmit_push_encap, struct sk_buff *, skb, u32, type,
+ void *, hdr, u32, len)
+{
+ switch (type) {
+#if IS_ENABLED(CONFIG_LWTUNNEL_BPF)
+ case BPF_LWT_ENCAP_IP:
+ return bpf_push_ip_encap(skb, hdr, len, false /* egress */);
+#endif
+ default:
+ return -EINVAL;
+ }
+}
+
+static const struct bpf_func_proto bpf_lwt_in_push_encap_proto = {
+ .func = bpf_lwt_in_push_encap,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_CTX,
+ .arg2_type = ARG_ANYTHING,
+ .arg3_type = ARG_PTR_TO_MEM,
+ .arg4_type = ARG_CONST_SIZE
+};
+
+static const struct bpf_func_proto bpf_lwt_xmit_push_encap_proto = {
+ .func = bpf_lwt_xmit_push_encap,
.gpl_only = false,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_CTX,
@@ -5282,7 +5315,8 @@ bool bpf_helper_changes_pkt_data(void *func)
func == bpf_lwt_seg6_adjust_srh ||
func == bpf_lwt_seg6_action ||
#endif
- func == bpf_lwt_push_encap)
+ func == bpf_lwt_in_push_encap ||
+ func == bpf_lwt_xmit_push_encap)
return true;
return false;
@@ -5660,7 +5694,7 @@ lwt_in_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
switch (func_id) {
case BPF_FUNC_lwt_push_encap:
- return &bpf_lwt_push_encap_proto;
+ return &bpf_lwt_in_push_encap_proto;
default:
return lwt_out_func_proto(func_id, prog);
}
@@ -5696,6 +5730,8 @@ lwt_xmit_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
return &bpf_l4_csum_replace_proto;
case BPF_FUNC_set_hash_invalid:
return &bpf_set_hash_invalid_proto;
+ case BPF_FUNC_lwt_push_encap:
+ return &bpf_lwt_xmit_push_encap_proto;
default:
return lwt_out_func_proto(func_id, prog);
}
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply related
* [PATCH bpf-next v6 0/5] bpf: add BPF_LWT_ENCAP_IP option to bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-02-01 17:22 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, netdev
Cc: Peter Oskolkov, David Ahern, Willem de Bruijn, Peter Oskolkov
This patchset implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).
This is useful when thousands of different short-lived flows should be
encapped, each with different and dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).
V2 changes: Added flowi-based route lookup, IPv6 encapping, and
encapping on ingress.
V3 changes: incorporated David Ahern's suggestions:
- added l3mdev check/oif (patch 2)
- sync bpf.h from include/uapi into tools/include/uapi
- selftest tweaks
V4 changes: moved route lookup/dst change from bpf_push_ip_encap
to when BPF_LWT_REROUTE is handled, as suggested by David Ahern.
V5 changes: added a check in lwt_xmit that skb->protocol stays the
same if the skb is to be passed back to the stack (ret == BPF_OK).
Again, suggested by David Ahern.
V6 changes: reject skb_is_gso() packets. A follow-up patch(set) will
process GSO packets more intelligently.
Peter Oskolkov (5):
bpf: add plumbing for BPF_LWT_ENCAP_IP in bpf_lwt_push_encap
bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c
bpf: sync <kdir>/<uapi>/bpf.h with tools/<uapi>/bpf.h
selftests: bpf: add test_lwt_ip_encap selftest
include/net/lwtunnel.h | 3 +
include/uapi/linux/bpf.h | 23 +-
net/core/filter.c | 47 ++-
net/core/lwt_bpf.c | 188 +++++++++++
tools/include/uapi/linux/bpf.h | 23 +-
tools/testing/selftests/bpf/Makefile | 5 +-
.../testing/selftests/bpf/test_lwt_ip_encap.c | 85 +++++
.../selftests/bpf/test_lwt_ip_encap.sh | 311 ++++++++++++++++++
8 files changed, 674 insertions(+), 11 deletions(-)
create mode 100644 tools/testing/selftests/bpf/test_lwt_ip_encap.c
create mode 100755 tools/testing/selftests/bpf/test_lwt_ip_encap.sh
--
2.20.1.611.gfbb209baf1-goog
^ permalink raw reply
* Re: WoL broken in r8169.c since kernel 4.19
From: Marc Haber @ 2019-02-01 17:19 UTC (permalink / raw)
To: Heiner Kallweit; +Cc: netdev@vger.kernel.org
In-Reply-To: <070e8dac-4ab2-ef86-f44d-991c7c8182aa@gmail.com>
On Fri, Feb 01, 2019 at 07:49:09AM +0100, Heiner Kallweit wrote:
> Great, thanks. This patch has been applied and is part of latest linux-next kernel already:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/?h=next-20190201
> Would be much appreciated if you could build and test this kernel version.
I regret that I have not been able to get this through the compiler.
When I clone the linux-next repo and try to build with deb-pkg, some
process chokes on the weird version number of the linux-next kernel, and
transplanting the r8169.c from linux-next on to 4.20.6, compile aborts:
In file included from drivers/net/ethernet/realtek/r8169.c:13:
./include/linux/pci.h:830:12: error: ‘PCI_VENDOR_ID_USR’ undeclared here (not in a function); did you mean ‘PCI_VENDOR_ID_UMC’?
.vendor = PCI_VENDOR_ID_##vend, .device = (dev), \
^~~~~~~~~~~~~~
drivers/net/ethernet/realtek/r8169.c:222:4: note: in expansion of macro ‘PCI_VDEVICE’
{ PCI_VDEVICE(USR, 0x0116), RTL_CFG_0 },
^~~~~~~~~~~
drivers/net/ethernet/realtek/r8169.c: In function ‘rtl8169_start_xmit’:
drivers/net/ethernet/realtek/r8169.c:6261:6: error: implicit declaration of function ‘__netdev_sent_queue’; did you mean ‘netdev_sent_queue’? [-Werror=implicit-function-declaration]
if (__netdev_sent_queue(dev, skb->len, skb->xmit_more))
^~~~~~~~~~~~~~~~~~~
netdev_sent_queue
drivers/net/ethernet/realtek/r8169.c: In function ‘r8169_phy_connect’:
drivers/net/ethernet/realtek/r8169.c:6653:22: warning: passing argument 1 of ‘linkmode_copy’ makes pointer from integer without a cast [-Wint-conversion]
linkmode_copy(phydev->advertising, phydev->supported);
~~~~~~^~~~~~~~~~~~~
In file included from ./include/linux/phy.h:22,
from drivers/net/ethernet/realtek/r8169.c:19:
./include/linux/linkmode.h:13:49: note: expected ‘long unsigned int *’ but argument is of type ‘u32’ {aka ‘unsigned int’}
static inline void linkmode_copy(unsigned long *dst, const unsigned long *src)
~~~~~~~~~~~~~~~^~~
drivers/net/ethernet/realtek/r8169.c:6653:43: warning: passing argument 2 of ‘linkmode_copy’ makes pointer from integer without a cast [-Wint-conversion]
linkmode_copy(phydev->advertising, phydev->supported);
~~~~~~^~~~~~~~~~~
In file included from ./include/linux/phy.h:22,
from drivers/net/ethernet/realtek/r8169.c:19:
./include/linux/linkmode.h:13:75: note: expected ‘const long unsigned int *’ but argument is of type ‘u32’ {aka ‘unsigned int’}
static inline void linkmode_copy(unsigned long *dst, const unsigned long *src)
~~~~~~~~~~~~~~~~~~~~~^~~
cc1: some warnings being treated as errors
make[4]: *** [scripts/Makefile.build:297: drivers/net/ethernet/realtek/r8169.o] Error 1
Will it help if I transplant more files from linux-next to 4.20.6?
Greetings
Marc
--
-----------------------------------------------------------------------------
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany | lose things." Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature | How to make an American Quilt | Fax: *49 6224 1600421
^ permalink raw reply
* Re: KASAN: use-after-free Read in selinux_netlbl_socket_setsockopt
From: Paul Moore @ 2019-02-01 16:48 UTC (permalink / raw)
To: Dmitry Vyukov
Cc: Ralf Baechle, David Miller, linux-hams, netdev, syzbot,
Eric Paris, LKML, Stephen Smalley, selinux, syzkaller-bugs
In-Reply-To: <CACT4Y+Yk=bsS7Ti3PjeZm8yG_txt=orBztowHwddHRP_MzuTxw@mail.gmail.com>
On Fri, Feb 1, 2019 at 1:26 AM Dmitry Vyukov <dvyukov@google.com> wrote:
> On Wed, Jan 30, 2019 at 10:30 PM Paul Moore <paul@paul-moore.com> wrote:
> >
> > On Wed, Jan 30, 2019 at 4:01 PM syzbot
> > <syzbot+1bfc00ca3aabe5bcd4cb@syzkaller.appspotmail.com> wrote:
> > >
> > > Hello,
> > >
> > > syzbot found the following crash on:
> > >
> > > HEAD commit: 62967898789d Merge git://git.kernel.org/pub/scm/linux/kern..
> > > git tree: upstream
> > > console output: https://syzkaller.appspot.com/x/log.txt?x=167fdef8c00000
> > > kernel config: https://syzkaller.appspot.com/x/.config?x=4fceea9e2d99ac20
> > > dashboard link: https://syzkaller.appspot.com/bug?extid=1bfc00ca3aabe5bcd4cb
> > > compiler: gcc (GCC) 9.0.0 20181231 (experimental)
> > >
> > > Unfortunately, I don't have any reproducer for this crash yet.
> > >
> > > IMPORTANT: if you fix the bug, please add the following tag to the commit:
> > > Reported-by: syzbot+1bfc00ca3aabe5bcd4cb@syzkaller.appspotmail.com
> > >
> > > 8021q: adding VLAN 0 to HW filter on device team0
> > > ==================================================================
> > > BUG: KASAN: use-after-free in selinux_netlbl_socket_setsockopt+0x49b/0x510
> > > security/selinux/netlabel.c:525
> > > Read of size 8 at addr ffff8880a63cf078 by task syz-executor3/18477
> > >
> > > CPU: 1 PID: 18477 Comm: syz-executor3 Not tainted 5.0.0-rc4+ #51
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> > > Google 01/01/2011
> > > Call Trace:
> > > __dump_stack lib/dump_stack.c:77 [inline]
> > > dump_stack+0x1db/0x2d0 lib/dump_stack.c:113
> > > print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
> > > kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
> > > __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:135
> > > selinux_netlbl_socket_setsockopt+0x49b/0x510
> > > security/selinux/netlabel.c:525
> >
> > At first glance this seems odd. The line above is simply
> > dereferencing sock->sk_security (getting the "sksec"), which we also
> > do higher up selinux_socket_setsockopt() via sock_has_perm(). Unless
> > somehow the socket is being released/freed in the middle of a
> > setsockopt() syscall this looks like maybe it's something else?
>
> Hi Paul,
>
> Searching for af_netrom across other syzbot bugs:
> https://groups.google.com/forum/#!searchin/syzkaller-bugs/af_netrom%7Csort:date
>
> I see at least:
> https://syzkaller.appspot.com/bug?extid=b0b1952f5864b4009b09
> https://syzkaller.appspot.com/bug?extid=febf3c50d4262e578b1c
> https://syzkaller.appspot.com/bug?extid=defa700d16f1bd1b9a05
>
> Which suggests there are some serious lifetime problems in netrom
> sockets. That would probably explain this crash as well.
That definitely looks plausible. While I'm not one to say it could
*never* be the SELinux/NetLabel code, I can say that the SELinux code
path in question hasn't changed in some time so I would be a little
surprised if it was suddenly broken.
> +netrom maintainers, does this explanation look plausible to you?
> should we dup this crash onto one of the other netrom bugs? and
> perhaps these netrom bugs across themselves too?
--
paul moore
www.paul-moore.com
^ permalink raw reply
* Re: [PATCH bpf-next v5 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
From: Peter Oskolkov @ 2019-02-01 16:48 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Daniel Borkmann, Peter Oskolkov, Alexei Starovoitov,
Network Development, David Ahern
In-Reply-To: <CAF=yD-+GX5r68j1Noz1U+vU3N=O4J3k4dQs6OS71BRqkWvRouA@mail.gmail.com>
On Thu, Jan 31, 2019 at 5:47 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Thu, Jan 31, 2019 at 5:04 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 01/31/2019 12:51 AM, Peter Oskolkov wrote:
> > > This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
> > > BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
> > > and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
> > > to packets (e.g. IP/GRE, GUE, IPIP).
> > >
> > > This is useful when thousands of different short-lived flows should be
> > > encapped, each with different and dynamically determined destination.
> > > Although lwtunnels can be used in some of these scenarios, the ability
> > > to dynamically generate encap headers adds more flexibility, e.g.
> > > when routing depends on the state of the host (reflected in global bpf
> > > maps).
> > >
> > > Signed-off-by: Peter Oskolkov <posk@google.com>
>
> > > +int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
> > > +{
> > > + struct iphdr *iph;
> > > + bool ipv4;
> > > + int err;
> > > +
> > > + if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
> > > + return -EINVAL;
> > > +
> > > + /* validate protocol and length */
> > > + iph = (struct iphdr *)hdr;
> > > + if (iph->version == 4) {
> > > + ipv4 = true;
> > > + if (unlikely(len < iph->ihl * 4))
> > > + return -EINVAL;
> > > + } else if (iph->version == 6) {
> > > + ipv4 = false;
> > > + if (unlikely(len < sizeof(struct ipv6hdr)))
> > > + return -EINVAL;
> > > + } else {
> > > + return -EINVAL;
> > > + }
> > > +
> > > + if (ingress)
> > > + err = skb_cow_head(skb, len + skb->mac_len);
> > > + else
> > > + err = skb_cow_head(skb,
> > > + len + LL_RESERVED_SPACE(skb_dst(skb)->dev));
> > > + if (unlikely(err))
> > > + return err;
> > > +
> > > + /* push the encap headers and fix pointers */
> > > + skb_reset_inner_headers(skb);
> > > + skb->encapsulation = 1;
> > > + skb_push(skb, len);
> > > + if (ingress)
> > > + skb_postpush_rcsum(skb, iph, len);
> > > + skb_reset_network_header(skb);
> > > + memcpy(skb_network_header(skb), hdr, len);
> > > + bpf_compute_data_pointers(skb);
> >
> > Does this work transparently with GSO as well or would we need to
> > update shared info for this (like in nat64 case, for example)?
>
> Good point. It does need to update the gso_type to include the tunnel
> type, similar to iptunnel_handle_offloads.
>
> Only, the nice feature of this interface is that it is encap protocol
> independent, which implies that it does not know the correct type.
>
> I don't think that we want to allow programs to write the gso_type themselves.
>
> With GSO_PARTIAL, perhaps specifying the exact tunnel type can be
> avoided as long as it is a fixed prefix to replicate?
>
> The transport layer size does not change, so no need to recompute gso_segs?
>
> Either way, this seems non-trivial enough to me to do in a separate
> follow-on patch. For now just fail if skb_is_gso.
Thanks, Willem, for the details! I'll re-send the patchset with the
additional check that skb_is_gso() is false.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox