* Re: MT7621 ethernet does not get probed on net-next branch after 5.15 merge
From: Andrey Jr. Melnikov @ 2021-12-10 14:35 UTC (permalink / raw)
To: netdev; +Cc: linux-mediatek, openwrt-devel
In-Reply-To: <CAMhs-H9ve2VtLm8x__DEb0_CpoYsqix1HwLDcZ8_ZeEK9vdfQg@mail.gmail.com>
In gmane.comp.embedded.openwrt.devel Sergio Paracuellos <sergio.paracuellos@gmail.com> wrote:
> Hi Qingfang,
> On Fri, Oct 15, 2021 at 4:23 PM DENG Qingfang <dqfext@gmail.com> wrote:
> >
> > Hi,
> >
> > After the merge of 5.15.y into net-next, MT7621 ethernet
> > (mtk_eth_soc.c) does not get probed at all.
> >
> > Kernel log before 5.15 merge:
> > ...
> > libphy: Fixed MDIO Bus: probed
> > libphy: mdio: probed
> > mt7530 mdio-bus:1f: MT7530 adapts as multi-chip module
> > mtk_soc_eth 1e100000.ethernet eth0: mediatek frame engine at 0xbe100000, irq 20
> > mt7621-pci 1e140000.pcie: host bridge /pcie@1e140000 ranges:
> > ...
> >
> > Kernel log after 5.15 merge:
> > ...
> > libphy: Fixed MDIO Bus: probed
> > mt7621-pci 1e140000.pcie: host bridge /pcie@1e140000 ranges:
> > ...
> >
> >
> > I tried adding debug prints into the .mtk_probe function, but it did
> > not execute.
> > There are no dts changes for MT7621 between 5.14 and 5.15, so I
> > believe it should be something else.
> >
> > Any ideas?
> I had time to create a new image for my gnubee board using kernel 5.15
> and this problem does not exist on my side. Since no more mails have
> come for a while I guess this was a problem from your configuration,
> but just in case I preferred to answer to let you know. I am currently
> using v5.15.7 from linux-stable with some other patches that will be
> for 5.16. Just in case, you can check the kernel tree [0] I am
> currently using.
There is a problem with the interaction between the reset controller and
devlink. The reset controller is absent in mainline, and devlink defers all
drivers that use reset lines until the reset controller becomes available,
so no drivers get probed.
I created this patch for myself:
https://drive.google.com/file/d/1AiKlfvIgtrBsxtI-2XFBvaxGoE0S-s9d/view?usp=sharing
* Re: [syzbot] general protection fault in inet_csk_accept
From: Paolo Abeni @ 2021-12-10 14:37 UTC (permalink / raw)
To: Eric Dumazet, syzbot, Florian Westphal
Cc: davem, dsahern, kuba, linux-kernel, netdev, syzkaller-bugs,
yoshfuji
In-Reply-To: <CANn89iKJY21Y3MZMXBpVqNm6BhudgfE+c-v7EU8gMUcbEFVs+A@mail.gmail.com>
On Fri, 2021-12-10 at 03:13 -0800, Eric Dumazet wrote:
> On Fri, Dec 10, 2021 at 3:09 AM syzbot
> <syzbot+e4d843bb96a9431e6331@syzkaller.appspotmail.com> wrote:
> >
> > Hello,
> >
> > syzbot found the following issue on:
> >
> > HEAD commit: 2a987e65025e Merge tag 'perf-tools-fixes-for-v5.16-2021-12..
> > git tree: upstream
> > console output: https://syzkaller.appspot.com/x/log.txt?x=166f73adb00000
> > kernel config: https://syzkaller.appspot.com/x/.config?x=221ffc09e39ebbd1
> > dashboard link: https://syzkaller.appspot.com/bug?extid=e4d843bb96a9431e6331
> > compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2
> > syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16280ae5b00000
> > C reproducer: https://syzkaller.appspot.com/x/repro.c?x=1000fdc5b00000
> >
> > The issue was bisected to:
> >
>
> Note to MPTCP maintainers, I think this issue is MPTCP one,
Indeed it is, thanks for the heads-up!
> and the bisection result shown here seems not relevant.
yep.
> The C repro is however correct, I trigger an immediate crash.
The repro is not triggering here - but I'm using a different kconfig.
Still the repro itself gives a good hint on the root cause. I'm testing
a patch with syzbot.
Thanks!
Paolo
* [PATCH v2] netfilter: fix regression in looped (broad|multi)cast's MAC handling
From: Ignacy Gawędzki @ 2021-12-10 14:27 UTC (permalink / raw)
To: netdev
In commit 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac
header was cleared"), the test for non-empty MAC header introduced in
commit 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC
handling") has been replaced with a test for a set MAC header, which
breaks the case when the MAC header has been reset (using
skb_reset_mac_header), as is the case with looped-back multicast
packets.
This patch adds a test for a non-empty MAC header in addition to the
test for a set MAC header. The same two tests are also implemented in
nfnetlink_log.c, where the initial code of commit 2c38de4c1f8da7
("netfilter: fix looped (broad|multi)cast's MAC handling") has not been
touched, but where supposedly the same situation may happen.
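To illustrate the difference between a "set" and a "non-empty" MAC header,
here is a minimal userspace sketch. The struct and the mock_* helpers are
hypothetical stand-ins that only model the two offsets involved; they are not
the kernel's sk_buff or its accessors, and the sentinel value is an assumption
made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff offsets involved; not the kernel struct. */
struct mock_skb {
	uint16_t mac_header;     /* offset of the MAC header, or MOCK_UNSET */
	uint16_t network_header; /* offset of the network header */
};

/* Assumed sentinel meaning "MAC header offset was never set". */
#define MOCK_UNSET ((uint16_t)~0U)

/* Models the idea of skb_mac_header_was_set(): the offset is not the sentinel. */
bool mock_mac_header_was_set(const struct mock_skb *skb)
{
	return skb->mac_header != MOCK_UNSET;
}

/* Models the idea of skb_mac_header_len(): distance up to the network header. */
uint32_t mock_mac_header_len(const struct mock_skb *skb)
{
	return (uint32_t)(skb->network_header - skb->mac_header);
}

/* The combined test: the header must be set AND have a non-zero length. */
bool mock_has_usable_mac_header(const struct mock_skb *skb)
{
	return mock_mac_header_was_set(skb) && mock_mac_header_len(skb) != 0;
}

/* Walks through the three cases discussed in the commit message. */
void mock_selfcheck(void)
{
	struct mock_skb normal = { .mac_header = 0, .network_header = 14 };
	struct mock_skb reset = { .mac_header = 14, .network_header = 14 };
	struct mock_skb unset = { .mac_header = MOCK_UNSET, .network_header = 14 };

	assert(mock_has_usable_mac_header(&normal));  /* ordinary frame */
	assert(!mock_has_usable_mac_header(&reset));  /* reset header: set, but empty */
	assert(!mock_has_usable_mac_header(&unset));  /* never set at all */
}
```

The "reset" case is the looped-back multicast one: a test for "set" alone
accepts it, while the combined test rejects it. Calling mock_selfcheck() from
a main() exercises all three cases.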
Fixes: 5648b5e1169f ("netfilter: nfnetlink_queue: fix OOB when mac
header was cleared")
Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
---
net/netfilter/nfnetlink_log.c | 3 ++-
net/netfilter/nfnetlink_queue.c | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index 691ef4cffdd9..7f83f9697fc1 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -556,7 +556,8 @@ __build_packet_message(struct nfnl_log_net *log,
goto nla_put_failure;
if (indev && skb->dev &&
- skb->mac_header != skb->network_header) {
+ skb_mac_header_was_set(skb) &&
+ skb_mac_header_len(skb) != 0) {
struct nfulnl_msg_packet_hw phw;
int len;
diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
index 4acc4b8e9fe5..959527708e38 100644
--- a/net/netfilter/nfnetlink_queue.c
+++ b/net/netfilter/nfnetlink_queue.c
@@ -560,7 +560,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
goto nla_put_failure;
if (indev && entskb->dev &&
- skb_mac_header_was_set(entskb)) {
+ skb_mac_header_was_set(entskb) &&
+ skb_mac_header_len(entskb) != 0) {
struct nfqnl_msg_packet_hw phw;
int len;
--
2.32.0
* [PATCH bpf-next v2 6/8] bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
From: Toke Høiland-Jørgensen @ 2021-12-10 14:20 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer
Cc: Toke Høiland-Jørgensen, netdev, bpf
In-Reply-To: <20211210142008.76981-1-toke@redhat.com>
This adds support for doing real redirects when an XDP program returns
XDP_REDIRECT in bpf_prog_run(). To achieve this, we create a page pool
instance while setting up the test run, and feed pages from that into the
XDP program. The setup cost of this is amortised over the number of
repetitions specified by userspace.
To support the performance testing use case, we further optimise the setup step
so that all pages in the pool are pre-initialised with the packet data, and
pre-computed context and xdp_frame objects stored at the start of each
page. This makes it possible to entirely avoid touching the page content on
each XDP program invocation, and enables sending up to 11.5 Mpps/core on my
test box.
Because the data pages are recycled by the page pool, and the test runner
doesn't re-initialise them for each run, subsequent invocations of the XDP
program will see the packet data in the state it was after the last time it
ran on that particular page. This means that an XDP program that modifies
the packet before redirecting it has to be careful about which assumptions
it makes about the packet content, but that is only an issue for the most
naively written programs.
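The recycling behaviour described above can be sketched in plain userspace C.
Everything here is hypothetical: the mock pool hands out its buffers
round-robin, which is an assumption made for the sketch and not how the real
page pool orders its pages, and the "program" is just a function that bumps
the first packet byte.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MOCK_POOL_SIZE 2 /* number of pre-initialised buffers (assumed) */
#define MOCK_PKT_LEN 4   /* bytes of packet data per buffer (assumed) */

/* Hypothetical pool: buffers are filled once and then only recycled. */
struct mock_pool {
	uint8_t pages[MOCK_POOL_SIZE][MOCK_PKT_LEN];
	unsigned int next;
};

/* Copy the packet into every buffer up front, as the setup step does. */
void mock_pool_init(struct mock_pool *pool, const uint8_t *pkt)
{
	for (unsigned int i = 0; i < MOCK_POOL_SIZE; i++)
		memcpy(pool->pages[i], pkt, MOCK_PKT_LEN);
	pool->next = 0;
}

/* Hand out the next buffer WITHOUT re-initialising its contents. */
uint8_t *mock_pool_get(struct mock_pool *pool)
{
	uint8_t *page = pool->pages[pool->next];

	pool->next = (pool->next + 1) % MOCK_POOL_SIZE;
	return page;
}

/* Stand-in "program": returns the first byte it saw, then modifies it. */
uint8_t mock_prog_run(uint8_t *pkt)
{
	return pkt[0]++;
}

/* Shows that the third and fourth runs see the earlier modification. */
void mock_pool_selfcheck(void)
{
	const uint8_t pkt[MOCK_PKT_LEN] = { 10, 0, 0, 0 };
	struct mock_pool pool;

	mock_pool_init(&pool, pkt);
	assert(mock_prog_run(mock_pool_get(&pool)) == 10); /* buffer 0, fresh */
	assert(mock_prog_run(mock_pool_get(&pool)) == 10); /* buffer 1, fresh */
	assert(mock_prog_run(mock_pool_get(&pool)) == 11); /* buffer 0, recycled */
	assert(mock_prog_run(mock_pool_get(&pool)) == 11); /* buffer 1, recycled */
}
```

A program that assumes the first byte is always 10 would misbehave from the
third run on, which is the kind of assumption the paragraph above warns about.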
Previous uses of bpf_prog_run() for XDP returned the modified packet data
and return code to userspace, which is a different semantic than this new
redirect mode. For this reason, the caller has to set the new
BPF_F_TEST_XDP_DO_REDIRECT flag when calling bpf_prog_run() to opt in to
the different semantics. Enabling this flag is only allowed if not setting
ctx_out and data_out in the test specification, since it means frames will
be redirected somewhere else, so they can't be returned.
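As a rough userspace illustration of the opt-in rule, the sketch below models
the two checks described above. The flag value is the one from the uapi hunk
in this patch; struct mock_test_attr and mock_check_xdp_test_attr() are
hypothetical and only stand in for the relevant part of the test attributes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value taken from the uapi hunk in this patch. */
#define BPF_F_TEST_XDP_DO_REDIRECT (1U << 1)

/* Hypothetical subset of the test-run attributes; not the real union bpf_attr. */
struct mock_test_attr {
	uint32_t flags;
	const void *data_out; /* buffer for returned packet data, or NULL */
	const void *ctx_out;  /* buffer for returned context, or NULL */
};

/* Returns 0 if the combination is acceptable, -EINVAL otherwise. */
int mock_check_xdp_test_attr(const struct mock_test_attr *attr)
{
	bool do_redirect = attr->flags & BPF_F_TEST_XDP_DO_REDIRECT;

	/* Any flag other than the redirect flag is rejected for XDP test runs. */
	if (attr->flags & ~BPF_F_TEST_XDP_DO_REDIRECT)
		return -EINVAL;

	/* Redirected frames go elsewhere, so nothing can be copied back. */
	if (do_redirect && (attr->data_out || attr->ctx_out))
		return -EINVAL;

	return 0;
}

/* Exercises the accepted and rejected combinations. */
void mock_attr_selfcheck(void)
{
	uint8_t buf[64];
	struct mock_test_attr plain = { .flags = 0, .data_out = buf };
	struct mock_test_attr redir = { .flags = BPF_F_TEST_XDP_DO_REDIRECT };
	struct mock_test_attr both = { .flags = BPF_F_TEST_XDP_DO_REDIRECT,
				       .data_out = buf };

	assert(mock_check_xdp_test_attr(&plain) == 0);       /* old semantics */
	assert(mock_check_xdp_test_attr(&redir) == 0);       /* redirect mode */
	assert(mock_check_xdp_test_attr(&both) == -EINVAL);  /* conflicting */
}
```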
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
include/uapi/linux/bpf.h | 2 +
kernel/bpf/Kconfig | 1 +
net/bpf/test_run.c | 218 +++++++++++++++++++++++++++++++--
tools/include/uapi/linux/bpf.h | 2 +
4 files changed, 211 insertions(+), 12 deletions(-)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c26871263f1f..224a6e3261f5 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1225,6 +1225,8 @@ enum {
/* If set, run the test on the cpu specified by bpf_attr.test.cpu */
#define BPF_F_TEST_RUN_ON_CPU (1U << 0)
+/* If set, support performing redirection of XDP frames */
+#define BPF_F_TEST_XDP_DO_REDIRECT (1U << 1)
/* type for BPF_ENABLE_STATS */
enum bpf_stats_type {
diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index d24d518ddd63..c8c920020d11 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -30,6 +30,7 @@ config BPF_SYSCALL
select TASKS_TRACE_RCU
select BINARY_PRINTF
select NET_SOCK_MSG if NET
+ select PAGE_POOL if NET
default n
help
Enable the bpf() system call that allows to manipulate BPF programs
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 46dd95755967..a16734578d85 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -14,6 +14,7 @@
#include <net/sock.h>
#include <net/tcp.h>
#include <net/net_namespace.h>
+#include <net/page_pool.h>
#include <linux/error-injection.h>
#include <linux/smp.h>
#include <linux/sock_diag.h>
@@ -23,19 +24,34 @@
#include <trace/events/bpf_test_run.h>
struct bpf_test_timer {
- enum { NO_PREEMPT, NO_MIGRATE } mode;
+ enum { NO_PREEMPT, NO_MIGRATE, XDP } mode;
u32 i;
u64 time_start, time_spent;
+ struct {
+ struct xdp_buff *orig_ctx;
+ struct xdp_rxq_info rxq;
+ struct page_pool *pp;
+ u16 frame_cnt;
+ } xdp;
};
static void bpf_test_timer_enter(struct bpf_test_timer *t)
__acquires(rcu)
{
rcu_read_lock();
- if (t->mode == NO_PREEMPT)
+ switch (t->mode) {
+ case NO_PREEMPT:
preempt_disable();
- else
+ break;
+ case XDP:
+ migrate_disable();
+ xdp_set_return_frame_no_direct();
+ t->xdp.frame_cnt = 0;
+ break;
+ case NO_MIGRATE:
migrate_disable();
+ break;
+ }
t->time_start = ktime_get_ns();
}
@@ -45,10 +61,18 @@ static void bpf_test_timer_leave(struct bpf_test_timer *t)
{
t->time_start = 0;
- if (t->mode == NO_PREEMPT)
+ switch (t->mode) {
+ case NO_PREEMPT:
preempt_enable();
- else
+ break;
+ case XDP:
+ xdp_do_flush();
+ xdp_clear_return_frame_no_direct();
+ fallthrough;
+ case NO_MIGRATE:
migrate_enable();
+ break;
+ }
rcu_read_unlock();
}
@@ -87,13 +111,162 @@ static bool bpf_test_timer_continue(struct bpf_test_timer *t, u32 repeat, int *e
return false;
}
+/* We put this struct at the head of each page with a context and frame
+ * initialised when the page is allocated, so we don't have to do this on each
+ * repetition of the test run.
+ */
+struct xdp_page_head {
+ struct xdp_buff orig_ctx;
+ struct xdp_buff ctx;
+ struct xdp_frame frm;
+ u8 data[];
+};
+
+#define TEST_XDP_FRAME_SIZE (PAGE_SIZE - sizeof(struct xdp_page_head) \
+ - sizeof(struct skb_shared_info))
+
+static void bpf_test_run_xdp_init_page(struct page *page, void *arg)
+{
+ struct xdp_page_head *head = phys_to_virt(page_to_phys(page));
+ struct xdp_buff *new_ctx, *orig_ctx;
+ u32 headroom = XDP_PACKET_HEADROOM;
+ struct bpf_test_timer *t = arg;
+ size_t frm_len, meta_len;
+ struct xdp_frame *frm;
+ void *data;
+
+ orig_ctx = t->xdp.orig_ctx;
+ frm_len = orig_ctx->data_end - orig_ctx->data_meta;
+ meta_len = orig_ctx->data - orig_ctx->data_meta;
+ headroom -= meta_len;
+
+ new_ctx = &head->ctx;
+ frm = &head->frm;
+ data = &head->data;
+ memcpy(data + headroom, orig_ctx->data_meta, frm_len);
+
+ xdp_init_buff(new_ctx, TEST_XDP_FRAME_SIZE, &t->xdp.rxq);
+ xdp_prepare_buff(new_ctx, data, headroom, frm_len, true);
+ new_ctx->data = new_ctx->data_meta + meta_len;
+
+ xdp_update_frame_from_buff(new_ctx, frm);
+ frm->mem = new_ctx->rxq->mem;
+
+ memcpy(&head->orig_ctx, new_ctx, sizeof(head->orig_ctx));
+}
+
+static int bpf_test_run_xdp_setup(struct bpf_test_timer *t, struct xdp_buff *orig_ctx)
+{
+ struct xdp_mem_info mem = {};
+ struct page_pool *pp;
+ int err;
+ struct page_pool_params pp_params = {
+ .order = 0,
+ .flags = 0,
+ .pool_size = NAPI_POLL_WEIGHT * 2,
+ .nid = NUMA_NO_NODE,
+ .max_len = TEST_XDP_FRAME_SIZE,
+ .init_callback = bpf_test_run_xdp_init_page,
+ .init_arg = t,
+ };
+
+ pp = page_pool_create(&pp_params);
+ if (IS_ERR(pp))
+ return PTR_ERR(pp);
+
+ /* will copy 'mem->id' into pp->xdp_mem_id */
+ err = xdp_reg_mem_model(&mem, MEM_TYPE_PAGE_POOL, pp);
+ if (err) {
+ page_pool_destroy(pp);
+ return err;
+ }
+ t->xdp.pp = pp;
+
+ /* We create a 'fake' RXQ referencing the original dev, but with an
+ * xdp_mem_info pointing to our page_pool
+ */
+ xdp_rxq_info_reg(&t->xdp.rxq, orig_ctx->rxq->dev, 0, 0);
+ t->xdp.rxq.mem.type = MEM_TYPE_PAGE_POOL;
+ t->xdp.rxq.mem.id = pp->xdp_mem_id;
+ t->xdp.orig_ctx = orig_ctx;
+
+ return 0;
+}
+
+static void bpf_test_run_xdp_teardown(struct bpf_test_timer *t)
+{
+ struct xdp_mem_info mem = {
+ .id = t->xdp.pp->xdp_mem_id,
+ .type = MEM_TYPE_PAGE_POOL,
+ };
+ xdp_unreg_mem_model(&mem);
+}
+
+static bool ctx_was_changed(struct xdp_page_head *head)
+{
+ return (head->orig_ctx.data != head->ctx.data ||
+ head->orig_ctx.data_meta != head->ctx.data_meta ||
+ head->orig_ctx.data_end != head->ctx.data_end);
+}
+
+static void reset_ctx(struct xdp_page_head *head)
+{
+ if (likely(!ctx_was_changed(head)))
+ return;
+
+ head->ctx.data = head->orig_ctx.data;
+ head->ctx.data_meta = head->orig_ctx.data_meta;
+ head->ctx.data_end = head->orig_ctx.data_end;
+ xdp_update_frame_from_buff(&head->ctx, &head->frm);
+}
+
+static int bpf_test_run_xdp_redirect(struct bpf_test_timer *t,
+ struct bpf_prog *prog, struct xdp_buff *orig_ctx)
+{
+ void *data, *data_end, *data_meta;
+ struct xdp_page_head *head;
+ struct xdp_buff *ctx;
+ struct page *page;
+ int ret, err = 0;
+
+ page = page_pool_dev_alloc_pages(t->xdp.pp);
+ if (!page)
+ return -ENOMEM;
+
+ head = phys_to_virt(page_to_phys(page));
+ reset_ctx(head);
+ ctx = &head->ctx;
+
+ ret = bpf_prog_run_xdp(prog, ctx);
+ if (ret == XDP_REDIRECT) {
+ struct xdp_frame *frm = &head->frm;
+
+ /* if program changed pkt bounds we need to update the xdp_frame */
+ if (unlikely(ctx_was_changed(head)))
+ xdp_update_frame_from_buff(ctx, frm);
+
+ err = xdp_do_redirect_frame(ctx->rxq->dev, ctx, frm, prog);
+ if (err)
+ ret = err;
+ }
+ if (ret != XDP_REDIRECT)
+ xdp_return_buff(ctx);
+
+ if (++t->xdp.frame_cnt >= NAPI_POLL_WEIGHT) {
+ xdp_do_flush();
+ t->xdp.frame_cnt = 0;
+ }
+
+ return ret;
+}
+
static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
- u32 *retval, u32 *time, bool xdp)
+ u32 *retval, u32 *time, bool xdp, bool xdp_redirect)
{
struct bpf_prog_array_item item = {.prog = prog};
struct bpf_run_ctx *old_ctx;
struct bpf_cg_run_ctx run_ctx;
- struct bpf_test_timer t = { NO_MIGRATE };
+ struct bpf_test_timer t = { .mode = (xdp && xdp_redirect) ? XDP : NO_MIGRATE };
enum bpf_cgroup_storage_type stype;
int ret;
@@ -110,14 +283,26 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
if (!repeat)
repeat = 1;
+ if (t.mode == XDP) {
+ ret = bpf_test_run_xdp_setup(&t, ctx);
+ if (ret)
+ return ret;
+ }
+
bpf_test_timer_enter(&t);
old_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
do {
run_ctx.prog_item = &item;
- if (xdp)
+ if (xdp && xdp_redirect) {
+ ret = bpf_test_run_xdp_redirect(&t, prog, ctx);
+ if (unlikely(ret < 0))
+ break;
+ *retval = ret;
+ } else if (xdp) {
*retval = bpf_prog_run_xdp(prog, ctx);
- else
+ } else {
*retval = bpf_prog_run(prog, ctx);
+ }
} while (bpf_test_timer_continue(&t, repeat, &ret, time));
bpf_reset_run_ctx(old_ctx);
bpf_test_timer_leave(&t);
@@ -125,6 +310,9 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
for_each_cgroup_storage_type(stype)
bpf_cgroup_storage_free(item.cgroup_storage[stype]);
+ if (t.mode == XDP)
+ bpf_test_run_xdp_teardown(&t);
+
return ret;
}
@@ -663,7 +851,7 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
ret = convert___skb_to_skb(skb, ctx);
if (ret)
goto out;
- ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false);
+ ret = bpf_test_run(prog, skb, repeat, &retval, &duration, false, false);
if (ret)
goto out;
if (!is_l2) {
@@ -757,6 +945,7 @@ static void xdp_convert_buff_to_md(struct xdp_buff *xdp, struct xdp_md *xdp_md)
int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
union bpf_attr __user *uattr)
{
+ bool do_redirect = (kattr->test.flags & BPF_F_TEST_XDP_DO_REDIRECT);
u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
u32 headroom = XDP_PACKET_HEADROOM;
u32 size = kattr->test.data_size_in;
@@ -773,6 +962,9 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
prog->expected_attach_type == BPF_XDP_CPUMAP)
return -EINVAL;
+ if (kattr->test.flags & ~BPF_F_TEST_XDP_DO_REDIRECT)
+ return -EINVAL;
+
ctx = bpf_ctx_init(kattr, sizeof(struct xdp_md));
if (IS_ERR(ctx))
return PTR_ERR(ctx);
@@ -781,7 +973,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
/* There can't be user provided data before the meta data */
if (ctx->data_meta || ctx->data_end != size ||
ctx->data > ctx->data_end ||
- unlikely(xdp_metalen_invalid(ctx->data)))
+ unlikely(xdp_metalen_invalid(ctx->data)) ||
+ (do_redirect && (kattr->test.data_out || kattr->test.ctx_out)))
goto free_ctx;
/* Meta data is allocated from the headroom */
headroom -= ctx->data;
@@ -807,7 +1000,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
if (repeat > 1)
bpf_prog_change_xdp(NULL, prog);
- ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
+ ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration,
+ true, do_redirect);
/* We convert the xdp_buff back to an xdp_md before checking the return
* code so the reference count of any held netdevice will be decremented
* even if the test run failed.
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c26871263f1f..224a6e3261f5 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1225,6 +1225,8 @@ enum {
/* If set, run the test on the cpu specified by bpf_attr.test.cpu */
#define BPF_F_TEST_RUN_ON_CPU (1U << 0)
+/* If set, support performing redirection of XDP frames */
+#define BPF_F_TEST_XDP_DO_REDIRECT (1U << 1)
/* type for BPF_ENABLE_STATS */
enum bpf_stats_type {
--
2.34.0
* [PATCH bpf-next v2 7/8] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()
From: Toke Høiland-Jørgensen @ 2021-12-10 14:20 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh
Cc: Toke Høiland-Jørgensen, Shuah Khan, netdev, bpf
In-Reply-To: <20211210142008.76981-1-toke@redhat.com>
This adds a selftest for the XDP_REDIRECT facility in bpf_prog_run, that
redirects packets into a veth and counts them using an XDP program on the
other side of the veth pair.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
.../bpf/prog_tests/xdp_do_redirect.c | 74 +++++++++++++++++++
.../bpf/progs/test_xdp_do_redirect.c | 34 +++++++++
2 files changed, 108 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
new file mode 100644
index 000000000000..c2effcf076a6
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+#include <network_helpers.h>
+#include <net/if.h>
+#include "test_xdp_do_redirect.skel.h"
+
+#define SYS(fmt, ...) \
+ ({ \
+ char cmd[1024]; \
+ snprintf(cmd, sizeof(cmd), fmt, ##__VA_ARGS__); \
+ if (!ASSERT_OK(system(cmd), cmd)) \
+ goto fail; \
+ })
+
+#define NUM_PKTS 10
+void test_xdp_do_redirect(void)
+{
+ struct test_xdp_do_redirect *skel = NULL;
+ struct ipv6_packet data = pkt_v6;
+ struct xdp_md ctx_in = { .data_end = sizeof(data) };
+ __u8 dst_mac[ETH_ALEN] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55};
+ __u8 src_mac[ETH_ALEN] = {0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb};
+ DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+ .data_in = &data,
+ .data_size_in = sizeof(data),
+ .ctx_in = &ctx_in,
+ .ctx_size_in = sizeof(ctx_in),
+ .flags = BPF_F_TEST_XDP_DO_REDIRECT,
+ .repeat = NUM_PKTS,
+ );
+ int err, prog_fd, ifindex_src, ifindex_dst;
+ struct bpf_link *link;
+
+ memcpy(data.eth.h_dest, dst_mac, ETH_ALEN);
+ memcpy(data.eth.h_source, src_mac, ETH_ALEN);
+
+ skel = test_xdp_do_redirect__open();
+ if (!ASSERT_OK_PTR(skel, "skel"))
+ return;
+
+ SYS("ip link add veth_src type veth peer name veth_dst");
+ SYS("ip link set dev veth_src up");
+ SYS("ip link set dev veth_dst up");
+
+ ifindex_src = if_nametoindex("veth_src");
+ ifindex_dst = if_nametoindex("veth_dst");
+ if (!ASSERT_NEQ(ifindex_src, 0, "ifindex_src") ||
+ !ASSERT_NEQ(ifindex_dst, 0, "ifindex_dst"))
+ goto fail;
+
+ memcpy(skel->rodata->expect_dst, dst_mac, ETH_ALEN);
+ skel->rodata->ifindex_out = ifindex_src;
+
+ if (!ASSERT_OK(test_xdp_do_redirect__load(skel), "load"))
+ goto fail;
+
+ link = bpf_program__attach_xdp(skel->progs.xdp_count_pkts, ifindex_dst);
+ if (!ASSERT_OK_PTR(link, "prog_attach"))
+ goto fail;
+ skel->links.xdp_count_pkts = link;
+
+ prog_fd = bpf_program__fd(skel->progs.xdp_redirect_notouch);
+ err = bpf_prog_test_run_opts(prog_fd, &opts);
+ if (!ASSERT_OK(err, "prog_run"))
+ goto fail;
+
+ /* wait for the packets to be flushed */
+ kern_sync_rcu();
+
+ ASSERT_EQ(skel->bss->pkts_seen, NUM_PKTS, "pkt_count");
+fail:
+ system("ip link del dev veth_src");
+ test_xdp_do_redirect__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
new file mode 100644
index 000000000000..254ebf523f37
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <vmlinux.h>
+#include <bpf/bpf_helpers.h>
+
+#define ETH_ALEN 6
+const volatile int ifindex_out;
+const volatile __u8 expect_dst[ETH_ALEN];
+volatile int pkts_seen = 0;
+
+SEC("xdp")
+int xdp_redirect_notouch(struct xdp_md *xdp)
+{
+ return bpf_redirect(ifindex_out, 0);
+}
+
+SEC("xdp")
+int xdp_count_pkts(struct xdp_md *xdp)
+{
+ void *data = (void *)(long)xdp->data;
+ void *data_end = (void *)(long)xdp->data_end;
+ struct ethhdr *eth = data;
+ int i;
+
+ if (eth + 1 > data_end)
+ return XDP_ABORTED;
+
+ for (i = 0; i < ETH_ALEN; i++)
+ if (expect_dst[i] != eth->h_dest[i])
+ return XDP_ABORTED;
+ pkts_seen++;
+ return XDP_DROP;
+}
+
+char _license[] SEC("license") = "GPL";
--
2.34.0
* [PATCH bpf-next v2 8/8] samples/bpf: Add xdp_trafficgen sample
From: Toke Høiland-Jørgensen @ 2021-12-10 14:20 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer
Cc: Toke Høiland-Jørgensen, netdev, bpf
In-Reply-To: <20211210142008.76981-1-toke@redhat.com>
This adds an XDP-based traffic generator sample which uses the DO_REDIRECT
flag of bpf_prog_run(). It works by building the initial packet in
userspace and passing it to the kernel where an XDP program redirects the
packet to the target interface. The traffic generator supports two modes of
operation: one that just sends copies of the same packet as fast as it can
without touching the packet data at all, and one that rewrites the
destination port number of each packet, making the generated traffic span a
range of port numbers.
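Rewriting the port of each packet means the UDP checksum has to follow. A
minimal userspace sketch of an incremental ones' complement update in the
style of RFC 1624, with end-around carry folding, is shown below. This is a
swapped-in formulation for illustration and not the exact expression used in
the sample's BPF program; the function names are hypothetical and all values
are in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
uint16_t mock_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full ones' complement checksum over an array of 16-bit words. */
uint16_t mock_csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~mock_csum_fold(sum);
}

/* Incremental update: HC' = ~(~HC + ~m + m') in ones' complement arithmetic. */
uint16_t mock_csum_update(uint16_t old_check, uint16_t old_word, uint16_t new_word)
{
	uint32_t sum = (uint16_t)~old_check;

	sum += (uint16_t)~old_word;
	sum += new_word;
	return (uint16_t)~mock_csum_fold(sum);
}

/* Changing one word incrementally must match a full recomputation. */
void mock_csum_selfcheck(void)
{
	uint16_t words[4] = { 0x1234, 0xabcd, 0x0001, 0xfe80 };
	uint16_t check = mock_csum_full(words, 4);

	words[2] = 0x0002; /* e.g. the destination port changes from 1 to 2 */
	assert(mock_csum_update(check, 0x0001, 0x0002) == mock_csum_full(words, 4));
}
```

The incremental form touches only the changed word and the checksum field,
which is why it suits a per-packet rewrite better than summing the whole
packet again.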
The dynamic mode is included to demonstrate how the bpf_prog_run() facility
enables building a completely programmable packet generator using XDP.
Using the dynamic mode has about a 10% overhead compared to the static
mode, because the latter completely avoids touching the page data.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
samples/bpf/.gitignore | 1 +
samples/bpf/Makefile | 4 +
samples/bpf/xdp_redirect.bpf.c | 34 +++
samples/bpf/xdp_trafficgen_user.c | 421 ++++++++++++++++++++++++++++++
4 files changed, 460 insertions(+)
create mode 100644 samples/bpf/xdp_trafficgen_user.c
diff --git a/samples/bpf/.gitignore b/samples/bpf/.gitignore
index 0e7bfdbff80a..935672cbdd80 100644
--- a/samples/bpf/.gitignore
+++ b/samples/bpf/.gitignore
@@ -49,6 +49,7 @@ xdp_redirect_map_multi
xdp_router_ipv4
xdp_rxq_info
xdp_sample_pkts
+xdp_trafficgen
xdp_tx_iptunnel
xdpsock
xdpsock_ctrl_proc
diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile
index 38638845db9d..d827e0680945 100644
--- a/samples/bpf/Makefile
+++ b/samples/bpf/Makefile
@@ -58,6 +58,7 @@ tprogs-y += xdp_redirect_cpu
tprogs-y += xdp_redirect_map_multi
tprogs-y += xdp_redirect_map
tprogs-y += xdp_redirect
+tprogs-y += xdp_trafficgen
tprogs-y += xdp_monitor
# Libbpf dependencies
@@ -123,6 +124,7 @@ xdp_redirect_map_multi-objs := xdp_redirect_map_multi_user.o $(XDP_SAMPLE)
xdp_redirect_cpu-objs := xdp_redirect_cpu_user.o $(XDP_SAMPLE)
xdp_redirect_map-objs := xdp_redirect_map_user.o $(XDP_SAMPLE)
xdp_redirect-objs := xdp_redirect_user.o $(XDP_SAMPLE)
+xdp_trafficgen-objs := xdp_trafficgen_user.o $(XDP_SAMPLE)
xdp_monitor-objs := xdp_monitor_user.o $(XDP_SAMPLE)
# Tell kbuild to always build the programs
@@ -226,6 +228,7 @@ TPROGLDLIBS_map_perf_test += -lrt
TPROGLDLIBS_test_overhead += -lrt
TPROGLDLIBS_xdpsock += -pthread -lcap
TPROGLDLIBS_xsk_fwd += -pthread
+TPROGLDLIBS_xdp_trafficgen += -pthread
# Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline:
# make M=samples/bpf LLC=~/git/llvm-project/llvm/build/bin/llc CLANG=~/git/llvm-project/llvm/build/bin/clang
@@ -341,6 +344,7 @@ $(obj)/xdp_redirect_cpu_user.o: $(obj)/xdp_redirect_cpu.skel.h
$(obj)/xdp_redirect_map_multi_user.o: $(obj)/xdp_redirect_map_multi.skel.h
$(obj)/xdp_redirect_map_user.o: $(obj)/xdp_redirect_map.skel.h
$(obj)/xdp_redirect_user.o: $(obj)/xdp_redirect.skel.h
+$(obj)/xdp_trafficgen_user.o: $(obj)/xdp_redirect.skel.h
$(obj)/xdp_monitor_user.o: $(obj)/xdp_monitor.skel.h
$(obj)/tracex5_kern.o: $(obj)/syscall_nrs.h
diff --git a/samples/bpf/xdp_redirect.bpf.c b/samples/bpf/xdp_redirect.bpf.c
index 7c02bacfe96b..a09c6f576b79 100644
--- a/samples/bpf/xdp_redirect.bpf.c
+++ b/samples/bpf/xdp_redirect.bpf.c
@@ -39,6 +39,40 @@ int xdp_redirect_prog(struct xdp_md *ctx)
return bpf_redirect(ifindex_out, 0);
}
+SEC("xdp")
+int xdp_redirect_notouch(struct xdp_md *ctx)
+{
+ return bpf_redirect(ifindex_out, 0);
+}
+
+const volatile __u16 port_start;
+const volatile __u16 port_range;
+volatile __u16 next_port = 0;
+
+SEC("xdp")
+int xdp_redirect_update_port(struct xdp_md *ctx)
+{
+ void *data_end = (void *)(long)ctx->data_end;
+ void *data = (void *)(long)ctx->data;
+ __u16 cur_port, cksum_diff;
+ struct udphdr *hdr;
+
+ hdr = data + (sizeof(struct ethhdr) + sizeof(struct ipv6hdr));
+ if (hdr + 1 > data_end)
+ return XDP_ABORTED;
+
+ cur_port = bpf_ntohs(hdr->dest);
+ cksum_diff = next_port - cur_port;
+ if (cksum_diff) {
+ hdr->check = bpf_htons(~(~bpf_ntohs(hdr->check) + cksum_diff));
+ hdr->dest = bpf_htons(next_port);
+ }
+ if (next_port++ >= port_start + port_range - 1)
+ next_port = port_start;
+
+ return bpf_redirect(ifindex_out, 0);
+}
+
/* Redirect require an XDP bpf_prog loaded on the TX device */
SEC("xdp")
int xdp_redirect_dummy_prog(struct xdp_md *ctx)
diff --git a/samples/bpf/xdp_trafficgen_user.c b/samples/bpf/xdp_trafficgen_user.c
new file mode 100644
index 000000000000..03f3a7b3260d
--- /dev/null
+++ b/samples/bpf/xdp_trafficgen_user.c
@@ -0,0 +1,421 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2021 Toke Høiland-Jørgensen <toke@redhat.com>
+ */
+static const char *__doc__ =
+"XDP trafficgen tool, using bpf_redirect helper\n"
+"Usage: xdp_trafficgen [options] <IFINDEX|IFNAME>_OUT\n";
+
+#define _GNU_SOURCE
+#include <linux/bpf.h>
+#include <linux/if_link.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/ipv6.h>
+#include <linux/in6.h>
+#include <linux/udp.h>
+#include <assert.h>
+#include <errno.h>
+#include <sched.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <net/if.h>
+#include <unistd.h>
+#include <libgen.h>
+#include <limits.h>
+#include <getopt.h>
+#include <pthread.h>
+#include <arpa/inet.h>
+#include <netinet/ether.h>
+#include <sys/resource.h>
+#include <sys/ioctl.h>
+#include <bpf/bpf.h>
+#include <bpf/bpf_endian.h>
+#include <bpf/libbpf.h>
+#include "bpf_util.h"
+#include "xdp_sample_user.h"
+#include "xdp_redirect.skel.h"
+
+static int mask = SAMPLE_REDIRECT_ERR_CNT |
+ SAMPLE_EXCEPTION_CNT | SAMPLE_DEVMAP_XMIT_CNT_MULTI;
+
+DEFINE_SAMPLE_INIT(xdp_redirect);
+
+static const struct option long_options[] = {
+ {"dst-mac", required_argument, NULL, 'm' },
+ {"src-mac", required_argument, NULL, 'M' },
+ {"dst-ip", required_argument, NULL, 'a' },
+ {"src-ip", required_argument, NULL, 'A' },
+ {"dst-port", required_argument, NULL, 'p' },
+ {"src-port", required_argument, NULL, 'P' },
+ {"dynamic-ports", required_argument, NULL, 'd' },
+ {"help", no_argument, NULL, 'h' },
+ {"stats", no_argument, NULL, 's' },
+ {"interval", required_argument, NULL, 'i' },
+ {"n-pkts", required_argument, NULL, 'n' },
+ {"threads", required_argument, NULL, 't' },
+ {"verbose", no_argument, NULL, 'v' },
+ {}
+};
+
+static int sample_res;
+static bool sample_exited;
+
+static void *run_samples(void *arg)
+{
+ unsigned long *interval = arg;
+
+ sample_res = sample_run(*interval, NULL, NULL);
+ sample_exited = true;
+ return NULL;
+}
+
+struct ipv6_packet {
+ struct ethhdr eth;
+ struct ipv6hdr iph;
+ struct udphdr udp;
+ __u8 payload[64 - sizeof(struct udphdr)
+ - sizeof(struct ethhdr) - sizeof(struct ipv6hdr)];
+} __packed;
+static struct ipv6_packet pkt_v6 = {
+ .eth.h_proto = __bpf_constant_htons(ETH_P_IPV6),
+ .iph.version = 6,
+ .iph.nexthdr = IPPROTO_UDP,
+ .iph.payload_len = bpf_htons(sizeof(struct ipv6_packet)
+ - offsetof(struct ipv6_packet, udp)),
+ .iph.hop_limit = 1,
+ .iph.saddr.s6_addr16 = {bpf_htons(0xfe80), 0, 0, 0, 0, 0, 0, bpf_htons(1)},
+ .iph.daddr.s6_addr16 = {bpf_htons(0xfe80), 0, 0, 0, 0, 0, 0, bpf_htons(2)},
+ .udp.source = bpf_htons(1),
+ .udp.dest = bpf_htons(1),
+ .udp.len = bpf_htons(sizeof(struct ipv6_packet)
+ - offsetof(struct ipv6_packet, udp)),
+};
+
+struct thread_config {
+ void *pkt;
+ size_t pkt_size;
+ __u32 cpu_core_id;
+ __u32 num_pkts;
+ int prog_fd;
+};
+
+struct config {
+ __be64 src_mac;
+ __be64 dst_mac;
+ struct in6_addr src_ip;
+ struct in6_addr dst_ip;
+ __be16 src_port;
+ __be16 dst_port;
+ int ifindex;
+ char ifname[IFNAMSIZ];
+};
+
+static void *run_traffic(void *arg)
+{
+ const struct thread_config *cfg = arg;
+ struct xdp_md ctx_in = {
+ .data_end = cfg->pkt_size,
+ };
+ DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+ .data_in = cfg->pkt,
+ .data_size_in = cfg->pkt_size,
+ .ctx_in = &ctx_in,
+ .ctx_size_in = sizeof(ctx_in),
+ .repeat = cfg->num_pkts ?: 1 << 24,
+ .flags = BPF_F_TEST_XDP_DO_REDIRECT,
+ );
+ __u64 iterations = 0;
+ cpu_set_t cpu_cores;
+ int err;
+
+ CPU_ZERO(&cpu_cores);
+ CPU_SET(cfg->cpu_core_id, &cpu_cores);
+ pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpu_cores);
+ do {
+ err = bpf_prog_test_run_opts(cfg->prog_fd, &opts);
+ if (err) {
+ printf("bpf_prog_test_run ret %d errno %d\n", err, errno);
+ break;
+ }
+ iterations += opts.repeat;
+ } while (!sample_exited && (!cfg->num_pkts || iterations < cfg->num_pkts));
+ return NULL;
+}
+
+static __be16 calc_udp_cksum(const struct ipv6_packet *pkt)
+{
+ __u32 chksum = pkt->iph.nexthdr + bpf_ntohs(pkt->iph.payload_len);
+ int i;
+
+ for (i = 0; i < 8; i++) {
+ chksum += bpf_ntohs(pkt->iph.saddr.s6_addr16[i]);
+ chksum += bpf_ntohs(pkt->iph.daddr.s6_addr16[i]);
+ }
+ chksum += bpf_ntohs(pkt->udp.source);
+ chksum += bpf_ntohs(pkt->udp.dest);
+ chksum += bpf_ntohs(pkt->udp.len);
+
+ while (chksum >> 16)
+ chksum = (chksum & 0xFFFF) + (chksum >> 16);
+ return bpf_htons(~chksum);
+}
+
+static int prepare_pkt(struct config *cfg)
+{
+ __be64 src_mac = cfg->src_mac;
+ struct in6_addr nulladdr = {};
+ int i, err;
+
+ if (!src_mac) {
+ err = get_mac_addr(cfg->ifindex, &src_mac);
+ if (err)
+ return err;
+ }
+ for (i = 0; i < 6 ; i++) {
+ pkt_v6.eth.h_source[i] = *((__u8 *)&src_mac + i);
+ if (cfg->dst_mac)
+ pkt_v6.eth.h_dest[i] = *((__u8 *)&cfg->dst_mac + i);
+ }
+ if (memcmp(&cfg->src_ip, &nulladdr, sizeof(nulladdr)))
+ pkt_v6.iph.saddr = cfg->src_ip;
+ if (memcmp(&cfg->dst_ip, &nulladdr, sizeof(nulladdr)))
+ pkt_v6.iph.daddr = cfg->dst_ip;
+ if (cfg->src_port)
+ pkt_v6.udp.source = cfg->src_port;
+ if (cfg->dst_port)
+ pkt_v6.udp.dest = cfg->dst_port;
+ pkt_v6.udp.check = calc_udp_cksum(&pkt_v6);
+ return 0;
+}
+
+int main(int argc, char **argv)
+{
+ unsigned long interval = 2, threads = 1, dynports = 0;
+ __u64 num_pkts = 0;
+ pthread_t sample_thread, *runner_threads = NULL;
+ struct thread_config *t = NULL, tcfg = {
+ .pkt = &pkt_v6,
+ .pkt_size = sizeof(pkt_v6),
+ };
+ int ret = EXIT_FAIL_OPTION;
+ struct xdp_redirect *skel;
+ struct config cfg = {};
+ bool error = true;
+ int opt, i, err;
+
+ while ((opt = getopt_long(argc, argv, "a:A:d:hi:m:M:n:p:P:t:vs",
+ long_options, NULL)) != -1) {
+ switch (opt) {
+ case 'a':
+ if (!inet_pton(AF_INET6, optarg, &cfg.dst_ip)) {
+ fprintf(stderr, "Invalid IPv6 address: %s\n", optarg);
+ return -1;
+ }
+ break;
+ case 'A':
+ if (!inet_pton(AF_INET6, optarg, &cfg.src_ip)) {
+ fprintf(stderr, "Invalid IPv6 address: %s\n", optarg);
+ return -1;
+ }
+ break;
+ case 'd':
+ dynports = strtoul(optarg, NULL, 0);
+ if (dynports < 2 || dynports >= 65535) {
+ fprintf(stderr, "Dynamic port range must be >1 and < 65535\n");
+ return -1;
+ }
+ break;
+ case 'i':
+ interval = strtoul(optarg, NULL, 0);
+ if (interval < 1 || interval == ULONG_MAX) {
+ fprintf(stderr, "Need non-zero interval\n");
+ return -1;
+ }
+ break;
+ case 't':
+ threads = strtoul(optarg, NULL, 0);
+ if (threads < 1 || threads == ULONG_MAX) {
+ fprintf(stderr, "Need at least 1 thread\n");
+ return -1;
+ }
+ break;
+ case 'm':
+ case 'M':
+ struct ether_addr *a;
+
+ a = ether_aton(optarg);
+ if (!a) {
+ fprintf(stderr, "Invalid MAC: %s\n", optarg);
+ return -1;
+ }
+ if (opt == 'm')
+ memcpy(&cfg.dst_mac, a, sizeof(*a));
+ else
+ memcpy(&cfg.src_mac, a, sizeof(*a));
+ break;
+ case 'n':
+ num_pkts = strtoull(optarg, NULL, 0);
+ if (num_pkts >= 1ULL << 32) {
+ fprintf(stderr, "Can send up to 2^32-1 pkts or infinite (0)\n");
+ return -1;
+ }
+ tcfg.num_pkts = num_pkts;
+ break;
+ case 'p':
+ case 'P':
+ unsigned long p;
+
+ p = strtoul(optarg, NULL, 0);
+ if (!p || p > 0xFFFF) {
+ fprintf(stderr, "Invalid port: %s\n", optarg);
+ return -1;
+ }
+ if (opt == 'p')
+ cfg.dst_port = bpf_htons(p);
+ else
+ cfg.src_port = bpf_htons(p);
+ break;
+ case 'v':
+ sample_switch_mode();
+ break;
+ case 's':
+ mask |= SAMPLE_REDIRECT_CNT;
+ break;
+ case 'h':
+ error = false;
+ default:
+ sample_usage(argv, long_options, __doc__, mask, error);
+ return ret;
+ }
+ }
+
+ if (argc <= optind) {
+ sample_usage(argv, long_options, __doc__, mask, true);
+ return ret;
+ }
+
+ cfg.ifindex = if_nametoindex(argv[optind]);
+ if (!cfg.ifindex)
+ cfg.ifindex = strtoul(argv[optind], NULL, 0);
+
+ if (!cfg.ifindex) {
+ fprintf(stderr, "Bad interface index or name\n");
+ sample_usage(argv, long_options, __doc__, mask, true);
+ goto end;
+ }
+
+ if (!if_indextoname(cfg.ifindex, cfg.ifname)) {
+ fprintf(stderr, "Failed to if_indextoname for %d: %s\n", cfg.ifindex,
+ strerror(errno));
+ goto end;
+ }
+
+ err = prepare_pkt(&cfg);
+ if (err)
+ goto end;
+
+ if (dynports) {
+ if (!cfg.dst_port) {
+ fprintf(stderr, "Must specify dst port when using dynamic port range\n");
+ goto end;
+ }
+
+ if (dynports + bpf_ntohs(cfg.dst_port) - 1 > 65535) {
+ fprintf(stderr, "Dynamic port range must end <= 65535\n");
+ goto end;
+ }
+ }
+
+ skel = xdp_redirect__open();
+ if (!skel) {
+ fprintf(stderr, "Failed to xdp_redirect__open: %s\n", strerror(errno));
+ ret = EXIT_FAIL_BPF;
+ goto end;
+ }
+
+ ret = sample_init_pre_load(skel);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to sample_init_pre_load: %s\n", strerror(-ret));
+ ret = EXIT_FAIL_BPF;
+ goto end_destroy;
+ }
+
+ skel->rodata->to_match[0] = cfg.ifindex;
+ skel->rodata->ifindex_out = cfg.ifindex;
+ skel->rodata->port_start = bpf_ntohs(cfg.dst_port);
+ skel->rodata->port_range = dynports;
+ skel->bss->next_port = bpf_ntohs(cfg.dst_port);
+
+ ret = xdp_redirect__load(skel);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to xdp_redirect__load: %s\n", strerror(errno));
+ ret = EXIT_FAIL_BPF;
+ goto end_destroy;
+ }
+
+ if (dynports)
+ tcfg.prog_fd = bpf_program__fd(skel->progs.xdp_redirect_update_port);
+ else
+ tcfg.prog_fd = bpf_program__fd(skel->progs.xdp_redirect_notouch);
+
+ ret = sample_init(skel, mask);
+ if (ret < 0) {
+ fprintf(stderr, "Failed to initialize sample: %s\n", strerror(-ret));
+ ret = EXIT_FAIL;
+ goto end_destroy;
+ }
+
+ ret = EXIT_FAIL;
+
+ runner_threads = calloc(threads, sizeof(pthread_t));
+ if (!runner_threads) {
+ fprintf(stderr, "Couldn't allocate memory\n");
+ goto end_destroy;
+ }
+ t = calloc(threads, sizeof(struct thread_config));
+ if (!t) {
+ fprintf(stderr, "Couldn't allocate memory\n");
+ goto end_destroy;
+ }
+
+ printf("Transmitting on %s (ifindex %d; driver %s)\n",
+ cfg.ifname, cfg.ifindex, get_driver_name(cfg.ifindex));
+
+ sample_exited = false;
+ err = pthread_create(&sample_thread, NULL, run_samples, &interval);
+ if (err) {
+ fprintf(stderr, "Failed to create sample thread: %s\n", strerror(err));
+ goto end_destroy;
+ }
+ sleep(1);
+ for (i = 0; i < threads; i++) {
+ memcpy(&t[i], &tcfg, sizeof(tcfg));
+ tcfg.cpu_core_id++;
+
+ err = pthread_create(&runner_threads[i], NULL, run_traffic, &t[i]);
+ if (err) {
+ fprintf(stderr, "Failed to create traffic thread: %s\n", strerror(err));
+ ret = EXIT_FAIL;
+ goto end_cancel;
+ }
+ }
+ pthread_join(sample_thread, NULL);
+ for (i = 0; i < threads; i++)
+ pthread_join(runner_threads[i], NULL);
+ ret = sample_res;
+ goto end_destroy;
+
+end_cancel:
+ pthread_cancel(sample_thread);
+ while (i-- > 0)
+ pthread_cancel(runner_threads[i]);
+end_destroy:
+ xdp_redirect__destroy(skel);
+ free(runner_threads);
+ free(t);
+end:
+ sample_exit(ret);
+}
--
2.34.0
* Re: [PATCH v2] samples/bpf: xdpsock: fix swap.cocci warning
From: Toke Høiland-Jørgensen @ 2021-12-10 14:26 UTC
To: Yihao Han, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, netdev, bpf, linux-kernel
Cc: kernel, Yihao Han
In-Reply-To: <20211209092250.56430-1-hanyihao@vivo.com>
Yihao Han <hanyihao@vivo.com> writes:
> Fix following swap.cocci warning:
> ./samples/bpf/xdpsock_user.c:528:22-23:
> WARNING opportunity for swap()
>
> Signed-off-by: Yihao Han <hanyihao@vivo.com>
Erm, did this get applied without anyone actually trying to compile
samples? I'm getting build errors as:
CC /home/build/linux/samples/bpf/xsk_fwd.o
/home/build/linux/samples/bpf/xsk_fwd.c: In function ‘swap_mac_addresses’:
/home/build/linux/samples/bpf/xsk_fwd.c:658:9: warning: implicit declaration of function ‘swap’; did you mean ‘swab’? [-Wimplicit-function-declaration]
658 | swap(*src_addr, *dst_addr);
| ^~~~
| swab
/usr/bin/ld: /home/build/linux/samples/bpf/xsk_fwd.o: in function `thread_func':
xsk_fwd.c:(.text+0x440): undefined reference to `swap'
collect2: error: ld returned 1 exit status
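For context: swap() is a macro from the kernel's own headers, which
userspace samples do not include, so the compiler treats the call as an
undeclared function. A minimal sketch of a local userspace definition is
shown below. It assumes GNU C's __typeof__, and the struct and function
names are illustrative only; this is not necessarily the fix that should
be applied to xsk_fwd.c.

```c
/* Userspace stand-in for the kernel's swap() macro (assumes GNU C __typeof__). */
#define swap(a, b) \
	do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Hypothetical 6-byte MAC address type, for illustration only. */
struct toy_ether_addr {
	unsigned char octet[6];
};

/* Exchange source and destination addresses in place. */
static void swap_mac_addresses(struct toy_ether_addr *src,
			       struct toy_ether_addr *dst)
{
	swap(*src, *dst);
}
```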
Could we maybe get samples/bpf added to the BPF CI builds? :)
-Toke
* [PATCH bpf-next v2 4/8] xdp: Move conversion to xdp_frame out of map functions
From: Toke Høiland-Jørgensen @ 2021-12-10 14:20 UTC
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer
Cc: Toke Høiland-Jørgensen, netdev, bpf
In-Reply-To: <20211210142008.76981-1-toke@redhat.com>
All map redirect functions except XSK maps convert xdp_buff to xdp_frame
before enqueueing it. So move this conversion out of the map functions
and into xdp_do_redirect(). This removes a bit of duplicated code, but more
importantly it makes it possible to support caller-allocated xdp_frame
structures, which will be added in a subsequent commit.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
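As an illustration of the shape of this refactoring, here is a userspace
toy model (not kernel code). All toy_* names, the headroom check and the
conversion counter are assumptions made up for the sketch. The point it
shows is that the buff-to-frame conversion happens once in the caller,
the XSK-like target bypasses it, and the per-map enqueue helpers only
ever see a frame.

```c
#include <errno.h>
#include <stddef.h>

enum toy_map_type { TOY_MAP_DEV, TOY_MAP_CPU, TOY_MAP_XSK };

struct toy_buff {
	const unsigned char *data, *data_end;
	size_t headroom;
};

struct toy_frame {
	const unsigned char *data;
	size_t len;
};

static int toy_conversions; /* counts conversions, for illustration */

/* Stand-in for xdp_convert_buff_to_frame(): fails without enough headroom. */
static struct toy_frame *toy_convert(const struct toy_buff *buf,
				     struct toy_frame *storage)
{
	if (buf->headroom < sizeof(*storage))
		return NULL;
	storage->data = buf->data;
	storage->len = (size_t)(buf->data_end - buf->data);
	toy_conversions++;
	return storage;
}

/* Per-map helpers now take a frame and no longer convert anything. */
static int toy_dev_enqueue(const struct toy_frame *f) { return f->len ? 0 : -EINVAL; }
static int toy_cpu_enqueue(const struct toy_frame *f) { return f->len ? 0 : -EINVAL; }

/* The XSK-like path keeps working on the buff directly. */
static int toy_xsk_redirect(const struct toy_buff *b)
{
	return b->data_end > b->data ? 0 : -EINVAL;
}

static int toy_do_redirect(enum toy_map_type type, const struct toy_buff *buf)
{
	struct toy_frame storage, *frame;

	if (type == TOY_MAP_XSK)
		return toy_xsk_redirect(buf);

	/* Single conversion point for every other target type. */
	frame = toy_convert(buf, &storage);
	if (!frame)
		return -EOVERFLOW;

	switch (type) {
	case TOY_MAP_DEV:
		return toy_dev_enqueue(frame);
	case TOY_MAP_CPU:
		return toy_cpu_enqueue(frame);
	default:
		return -EINVAL;
	}
}
```

With the conversion in one place, a later caller can hand in a frame it
built itself, which is what the follow-up patch relies on.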
include/linux/bpf.h | 20 ++++++++++----------
kernel/bpf/cpumap.c | 8 +-------
kernel/bpf/devmap.c | 32 +++++++++++---------------------
net/core/filter.c | 24 +++++++++++++++++-------
4 files changed, 39 insertions(+), 45 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8bbf08fbab66..691bb397500e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1621,17 +1621,17 @@ void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth);
struct btf *bpf_get_btf_vmlinux(void);
/* Map specifics */
-struct xdp_buff;
+struct xdp_frame;
struct sk_buff;
struct bpf_dtab_netdev;
struct bpf_cpu_map_entry;
void __dev_flush(void);
-int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
struct net_device *dev_rx);
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf,
struct net_device *dev_rx);
-int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
+int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
struct bpf_map *map, bool exclude_ingress);
int dev_map_generic_redirect(struct bpf_dtab_netdev *dst, struct sk_buff *skb,
struct bpf_prog *xdp_prog);
@@ -1640,7 +1640,7 @@ int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
bool exclude_ingress);
void __cpu_map_flush(void);
-int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
+int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf,
struct net_device *dev_rx);
int cpu_map_generic_redirect(struct bpf_cpu_map_entry *rcpu,
struct sk_buff *skb);
@@ -1818,26 +1818,26 @@ static inline void __dev_flush(void)
{
}
-struct xdp_buff;
+struct xdp_frame;
struct bpf_dtab_netdev;
struct bpf_cpu_map_entry;
static inline
-int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
struct net_device *dev_rx)
{
return 0;
}
static inline
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf,
struct net_device *dev_rx)
{
return 0;
}
static inline
-int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
+int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
struct bpf_map *map, bool exclude_ingress)
{
return 0;
@@ -1865,7 +1865,7 @@ static inline void __cpu_map_flush(void)
}
static inline int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu,
- struct xdp_buff *xdp,
+ struct xdp_frame *xdpf,
struct net_device *dev_rx)
{
return 0;
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 585b2b77ccc4..12798b2c68d9 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -746,15 +746,9 @@ static void bq_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf)
list_add(&bq->flush_node, flush_list);
}
-int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_buff *xdp,
+int cpu_map_enqueue(struct bpf_cpu_map_entry *rcpu, struct xdp_frame *xdpf,
struct net_device *dev_rx)
{
- struct xdp_frame *xdpf;
-
- xdpf = xdp_convert_buff_to_frame(xdp);
- if (unlikely(!xdpf))
- return -EOVERFLOW;
-
/* Info needed when constructing SKB on remote CPU */
xdpf->dev_rx = dev_rx;
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index f02d04540c0c..f29f439fac76 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -467,24 +467,19 @@ static void bq_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
bq->q[bq->count++] = xdpf;
}
-static inline int __xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+static inline int __xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
struct net_device *dev_rx,
struct bpf_prog *xdp_prog)
{
- struct xdp_frame *xdpf;
int err;
if (!dev->netdev_ops->ndo_xdp_xmit)
return -EOPNOTSUPP;
- err = xdp_ok_fwd_dev(dev, xdp->data_end - xdp->data);
+ err = xdp_ok_fwd_dev(dev, xdpf->len);
if (unlikely(err))
return err;
- xdpf = xdp_convert_buff_to_frame(xdp);
- if (unlikely(!xdpf))
- return -EOVERFLOW;
-
bq_enqueue(dev, xdpf, dev_rx, xdp_prog);
return 0;
}
@@ -520,27 +515,27 @@ static u32 dev_map_bpf_prog_run_skb(struct sk_buff *skb, struct bpf_dtab_netdev
return act;
}
-int dev_xdp_enqueue(struct net_device *dev, struct xdp_buff *xdp,
+int dev_xdp_enqueue(struct net_device *dev, struct xdp_frame *xdpf,
struct net_device *dev_rx)
{
- return __xdp_enqueue(dev, xdp, dev_rx, NULL);
+ return __xdp_enqueue(dev, xdpf, dev_rx, NULL);
}
-int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_buff *xdp,
+int dev_map_enqueue(struct bpf_dtab_netdev *dst, struct xdp_frame *xdpf,
struct net_device *dev_rx)
{
struct net_device *dev = dst->dev;
- return __xdp_enqueue(dev, xdp, dev_rx, dst->xdp_prog);
+ return __xdp_enqueue(dev, xdpf, dev_rx, dst->xdp_prog);
}
-static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_buff *xdp)
+static bool is_valid_dst(struct bpf_dtab_netdev *obj, struct xdp_frame *xdpf)
{
if (!obj ||
!obj->dev->netdev_ops->ndo_xdp_xmit)
return false;
- if (xdp_ok_fwd_dev(obj->dev, xdp->data_end - xdp->data))
+ if (xdp_ok_fwd_dev(obj->dev, xdpf->len))
return false;
return true;
@@ -586,14 +581,13 @@ static int get_upper_ifindexes(struct net_device *dev, int *indexes)
return n;
}
-int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
+int dev_map_enqueue_multi(struct xdp_frame *xdpf, struct net_device *dev_rx,
struct bpf_map *map, bool exclude_ingress)
{
struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
struct bpf_dtab_netdev *dst, *last_dst = NULL;
int excluded_devices[1+MAX_NEST_DEV];
struct hlist_head *head;
- struct xdp_frame *xdpf;
int num_excluded = 0;
unsigned int i;
int err;
@@ -603,15 +597,11 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
excluded_devices[num_excluded++] = dev_rx->ifindex;
}
- xdpf = xdp_convert_buff_to_frame(xdp);
- if (unlikely(!xdpf))
- return -EOVERFLOW;
-
if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
for (i = 0; i < map->max_entries; i++) {
dst = rcu_dereference_check(dtab->netdev_map[i],
rcu_read_lock_bh_held());
- if (!is_valid_dst(dst, xdp))
+ if (!is_valid_dst(dst, xdpf))
continue;
if (is_ifindex_excluded(excluded_devices, num_excluded, dst->dev->ifindex))
@@ -634,7 +624,7 @@ int dev_map_enqueue_multi(struct xdp_buff *xdp, struct net_device *dev_rx,
head = dev_map_index_hash(dtab, i);
hlist_for_each_entry_rcu(dst, head, index_hlist,
lockdep_is_held(&dtab->index_lock)) {
- if (!is_valid_dst(dst, xdp))
+ if (!is_valid_dst(dst, xdpf))
continue;
if (is_ifindex_excluded(excluded_devices, num_excluded,
diff --git a/net/core/filter.c b/net/core/filter.c
index fe27c91e3758..bfa4ffbced35 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3964,12 +3964,24 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
enum bpf_map_type map_type = ri->map_type;
void *fwd = ri->tgt_value;
u32 map_id = ri->map_id;
+ struct xdp_frame *xdpf;
struct bpf_map *map;
int err;
ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
ri->map_type = BPF_MAP_TYPE_UNSPEC;
+ if (map_type == BPF_MAP_TYPE_XSKMAP) {
+ err = __xsk_map_redirect(fwd, xdp);
+ goto out;
+ }
+
+ xdpf = xdp_convert_buff_to_frame(xdp);
+ if (unlikely(!xdpf)) {
+ err = -EOVERFLOW;
+ goto err;
+ }
+
switch (map_type) {
case BPF_MAP_TYPE_DEVMAP:
fallthrough;
@@ -3977,17 +3989,14 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
map = READ_ONCE(ri->map);
if (unlikely(map)) {
WRITE_ONCE(ri->map, NULL);
- err = dev_map_enqueue_multi(xdp, dev, map,
+ err = dev_map_enqueue_multi(xdpf, dev, map,
ri->flags & BPF_F_EXCLUDE_INGRESS);
} else {
- err = dev_map_enqueue(fwd, xdp, dev);
+ err = dev_map_enqueue(fwd, xdpf, dev);
}
break;
case BPF_MAP_TYPE_CPUMAP:
- err = cpu_map_enqueue(fwd, xdp, dev);
- break;
- case BPF_MAP_TYPE_XSKMAP:
- err = __xsk_map_redirect(fwd, xdp);
+ err = cpu_map_enqueue(fwd, xdpf, dev);
break;
case BPF_MAP_TYPE_UNSPEC:
if (map_id == INT_MAX) {
@@ -3996,7 +4005,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
err = -EINVAL;
break;
}
- err = dev_xdp_enqueue(fwd, xdp, dev);
+ err = dev_xdp_enqueue(fwd, xdpf, dev);
break;
}
fallthrough;
@@ -4004,6 +4013,7 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
err = -EBADRQC;
}
+out:
if (unlikely(err))
goto err;
--
2.34.0
* [PATCH bpf-next v2 5/8] xdp: add xdp_do_redirect_frame() for pre-computed xdp_frames
From: Toke Høiland-Jørgensen @ 2021-12-10 14:20 UTC
To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
KP Singh, David S. Miller, Jakub Kicinski, Jesper Dangaard Brouer
Cc: Toke Høiland-Jørgensen, netdev, bpf
In-Reply-To: <20211210142008.76981-1-toke@redhat.com>
Add an xdp_do_redirect_frame() variant which supports pre-computed
xdp_frame structures. This will be used in bpf_prog_run() to avoid having
to write to the xdp_frame structure when the XDP program doesn't modify the
frame boundaries.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
include/linux/filter.h | 4 +++
net/core/filter.c | 65 +++++++++++++++++++++++++++++++++++-------
2 files changed, 58 insertions(+), 11 deletions(-)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index b6a216eb217a..845452c83e0f 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1022,6 +1022,10 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
int xdp_do_redirect(struct net_device *dev,
struct xdp_buff *xdp,
struct bpf_prog *prog);
+int xdp_do_redirect_frame(struct net_device *dev,
+ struct xdp_buff *xdp,
+ struct xdp_frame *xdpf,
+ struct bpf_prog *prog);
void xdp_do_flush(void);
/* The xdp_do_flush_map() helper has been renamed to drop the _map suffix, as
diff --git a/net/core/filter.c b/net/core/filter.c
index bfa4ffbced35..629188642b4e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3957,26 +3957,44 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
}
EXPORT_SYMBOL_GPL(xdp_master_redirect);
-int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
- struct bpf_prog *xdp_prog)
+static inline int __xdp_do_redirect_xsk(struct bpf_redirect_info *ri,
+ struct net_device *dev,
+ struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog)
{
- struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
enum bpf_map_type map_type = ri->map_type;
void *fwd = ri->tgt_value;
u32 map_id = ri->map_id;
- struct xdp_frame *xdpf;
- struct bpf_map *map;
int err;
ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
ri->map_type = BPF_MAP_TYPE_UNSPEC;
- if (map_type == BPF_MAP_TYPE_XSKMAP) {
- err = __xsk_map_redirect(fwd, xdp);
- goto out;
- }
+ err = __xsk_map_redirect(fwd, xdp);
+ if (unlikely(err))
+ goto err;
+
+ _trace_xdp_redirect_map(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index);
+ return 0;
+err:
+ _trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err);
+ return err;
+}
+
+static __always_inline int __xdp_do_redirect_frame(struct bpf_redirect_info *ri,
+ struct net_device *dev,
+ struct xdp_frame *xdpf,
+ struct bpf_prog *xdp_prog)
+{
+ enum bpf_map_type map_type = ri->map_type;
+ void *fwd = ri->tgt_value;
+ u32 map_id = ri->map_id;
+ struct bpf_map *map;
+ int err;
+
+ ri->map_id = 0; /* Valid map id idr range: [1,INT_MAX[ */
+ ri->map_type = BPF_MAP_TYPE_UNSPEC;
- xdpf = xdp_convert_buff_to_frame(xdp);
if (unlikely(!xdpf)) {
err = -EOVERFLOW;
goto err;
@@ -4013,7 +4031,6 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
err = -EBADRQC;
}
-out:
if (unlikely(err))
goto err;
@@ -4023,8 +4040,34 @@ int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
_trace_xdp_redirect_map_err(dev, xdp_prog, fwd, map_type, map_id, ri->tgt_index, err);
return err;
}
+
+int xdp_do_redirect(struct net_device *dev, struct xdp_buff *xdp,
+ struct bpf_prog *xdp_prog)
+{
+ struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+ enum bpf_map_type map_type = ri->map_type;
+
+ if (map_type == BPF_MAP_TYPE_XSKMAP)
+ return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog);
+
+ return __xdp_do_redirect_frame(ri, dev, xdp_convert_buff_to_frame(xdp),
+ xdp_prog);
+}
EXPORT_SYMBOL_GPL(xdp_do_redirect);
+int xdp_do_redirect_frame(struct net_device *dev, struct xdp_buff *xdp,
+ struct xdp_frame *xdpf, struct bpf_prog *xdp_prog)
+{
+ struct bpf_redirect_info *ri = this_cpu_ptr(&bpf_redirect_info);
+ enum bpf_map_type map_type = ri->map_type;
+
+ if (map_type == BPF_MAP_TYPE_XSKMAP)
+ return __xdp_do_redirect_xsk(ri, dev, xdp, xdp_prog);
+
+ return __xdp_do_redirect_frame(ri, dev, xdpf, xdp_prog);
+}
+EXPORT_SYMBOL_GPL(xdp_do_redirect_frame);
+
static int xdp_do_generic_redirect_map(struct net_device *dev,
struct sk_buff *skb,
struct xdp_buff *xdp,
--
2.34.0
* [PATCH bpf-next v2 3/8] xdp: Allow registering memory model without rxq reference
From: Toke Høiland-Jørgensen @ 2021-12-10 14:20 UTC
To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh
Cc: Toke Høiland-Jørgensen, netdev, bpf
In-Reply-To: <20211210142008.76981-1-toke@redhat.com>
The functions that register an XDP memory model take a struct xdp_rxq_info as
parameter, but the RXQ is not actually used for anything other than pulling
out the struct xdp_mem_info that it embeds. So refactor the register
functions and export variants that just take a pointer to the xdp_mem_info.
This is in preparation for enabling XDP_REDIRECT in bpf_prog_run(), using a
page_pool instance that is not connected to any network device.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
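The internal helper in this patch returns one of three things: a valid
allocator pointer, NULL (nothing to track), or an errno encoded in the
pointer via ERR_PTR(). Below is a userspace sketch of that convention.
The toy_* names and the simplified ID handling are assumptions for
illustration; the real macros live in include/linux/err.h.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Simplified userspace equivalents of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *toy_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long toy_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

enum { TOY_MEM_PAGE_SHARED, TOY_MEM_PAGE_POOL };

struct toy_mem_info { int type; int id; };
struct toy_allocator { struct toy_mem_info mem; void *allocator; };

static int toy_next_id = 1;

/* Inner helper: allocator pointer, NULL, or an encoded negative errno. */
static struct toy_allocator *toy_reg_inner(struct toy_mem_info *mem, int type,
					   void *allocator)
{
	struct toy_allocator *xa;

	mem->type = type;
	if (!allocator) {
		if (type == TOY_MEM_PAGE_POOL)
			return toy_err_ptr(-EINVAL); /* page pool needs an allocator */
		return NULL;
	}

	xa = calloc(1, sizeof(*xa));
	if (!xa)
		return toy_err_ptr(-ENOMEM);

	mem->id = toy_next_id++;
	xa->mem = *mem;
	xa->allocator = allocator;
	return xa;
}

/* Public wrapper: collapse the three-way result into a plain int. */
static int toy_reg_mem_model(struct toy_mem_info *mem, int type, void *allocator)
{
	struct toy_allocator *xa = toy_reg_inner(mem, type, allocator);

	if (toy_is_err(xa))
		return (int)toy_ptr_err(xa);

	/* Toy only: the kernel keeps xa in an ID lookup table; we drop it. */
	free(xa);
	return 0;
}
```

Returning the pointer from the inner helper lets the rxq-based wrapper
pass it to its tracepoint, while the rxq-less wrapper just discards it.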
include/net/xdp.h | 3 ++
net/core/xdp.c | 92 +++++++++++++++++++++++++++++++----------------
2 files changed, 65 insertions(+), 30 deletions(-)
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 447f9b1578f3..8f0812e4996d 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -260,6 +260,9 @@ bool xdp_rxq_info_is_reg(struct xdp_rxq_info *xdp_rxq);
int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
enum xdp_mem_type type, void *allocator);
void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq);
+int xdp_reg_mem_model(struct xdp_mem_info *mem,
+ enum xdp_mem_type type, void *allocator);
+void xdp_unreg_mem_model(struct xdp_mem_info *mem);
/* Drivers not supporting XDP metadata can use this helper, which
* rejects any room expansion for metadata as a result.
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 143388c6d9dd..2901bb7004cc 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -110,20 +110,15 @@ static void mem_allocator_disconnect(void *allocator)
mutex_unlock(&mem_id_lock);
}
-void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq)
+void xdp_unreg_mem_model(struct xdp_mem_info *mem)
{
struct xdp_mem_allocator *xa;
- int type = xdp_rxq->mem.type;
- int id = xdp_rxq->mem.id;
+ int type = mem->type;
+ int id = mem->id;
/* Reset mem info to defaults */
- xdp_rxq->mem.id = 0;
- xdp_rxq->mem.type = 0;
-
- if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
- WARN(1, "Missing register, driver bug");
- return;
- }
+ mem->id = 0;
+ mem->type = 0;
if (id == 0)
return;
@@ -135,6 +130,17 @@ void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq)
rcu_read_unlock();
}
}
+EXPORT_SYMBOL_GPL(xdp_unreg_mem_model);
+
+void xdp_rxq_info_unreg_mem_model(struct xdp_rxq_info *xdp_rxq)
+{
+ if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
+ WARN(1, "Missing register, driver bug");
+ return;
+ }
+
+ xdp_unreg_mem_model(&xdp_rxq->mem);
+}
EXPORT_SYMBOL_GPL(xdp_rxq_info_unreg_mem_model);
void xdp_rxq_info_unreg(struct xdp_rxq_info *xdp_rxq)
@@ -259,28 +265,24 @@ static bool __is_supported_mem_type(enum xdp_mem_type type)
return true;
}
-int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
- enum xdp_mem_type type, void *allocator)
+static struct xdp_mem_allocator *__xdp_reg_mem_model(struct xdp_mem_info *mem,
+ enum xdp_mem_type type,
+ void *allocator)
{
struct xdp_mem_allocator *xdp_alloc;
gfp_t gfp = GFP_KERNEL;
int id, errno, ret;
void *ptr;
- if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
- WARN(1, "Missing register, driver bug");
- return -EFAULT;
- }
-
if (!__is_supported_mem_type(type))
- return -EOPNOTSUPP;
+ return ERR_PTR(-EOPNOTSUPP);
- xdp_rxq->mem.type = type;
+ mem->type = type;
if (!allocator) {
if (type == MEM_TYPE_PAGE_POOL)
- return -EINVAL; /* Setup time check page_pool req */
- return 0;
+ return ERR_PTR(-EINVAL); /* Setup time check page_pool req */
+ return NULL;
}
/* Delay init of rhashtable to save memory if feature isn't used */
@@ -290,13 +292,13 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
mutex_unlock(&mem_id_lock);
if (ret < 0) {
WARN_ON(1);
- return ret;
+ return ERR_PTR(ret);
}
}
xdp_alloc = kzalloc(sizeof(*xdp_alloc), gfp);
if (!xdp_alloc)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
mutex_lock(&mem_id_lock);
id = __mem_id_cyclic_get(gfp);
@@ -304,15 +306,15 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
errno = id;
goto err;
}
- xdp_rxq->mem.id = id;
- xdp_alloc->mem = xdp_rxq->mem;
+ mem->id = id;
+ xdp_alloc->mem = *mem;
xdp_alloc->allocator = allocator;
/* Insert allocator into ID lookup table */
ptr = rhashtable_insert_slow(mem_id_ht, &id, &xdp_alloc->node);
if (IS_ERR(ptr)) {
- ida_simple_remove(&mem_id_pool, xdp_rxq->mem.id);
- xdp_rxq->mem.id = 0;
+ ida_simple_remove(&mem_id_pool, mem->id);
+ mem->id = 0;
errno = PTR_ERR(ptr);
goto err;
}
@@ -322,13 +324,43 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
mutex_unlock(&mem_id_lock);
- trace_mem_connect(xdp_alloc, xdp_rxq);
- return 0;
+ return xdp_alloc;
err:
mutex_unlock(&mem_id_lock);
kfree(xdp_alloc);
- return errno;
+ return ERR_PTR(errno);
+}
+
+int xdp_reg_mem_model(struct xdp_mem_info *mem,
+ enum xdp_mem_type type, void *allocator)
+{
+ struct xdp_mem_allocator *xdp_alloc;
+
+ xdp_alloc = __xdp_reg_mem_model(mem, type, allocator);
+ if (IS_ERR(xdp_alloc))
+ return PTR_ERR(xdp_alloc);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(xdp_reg_mem_model);
+
+int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
+ enum xdp_mem_type type, void *allocator)
+{
+ struct xdp_mem_allocator *xdp_alloc;
+
+ if (xdp_rxq->reg_state != REG_STATE_REGISTERED) {
+ WARN(1, "Missing register, driver bug");
+ return -EFAULT;
+ }
+
+ xdp_alloc = __xdp_reg_mem_model(&xdp_rxq->mem, type, allocator);
+ if (IS_ERR(xdp_alloc))
+ return PTR_ERR(xdp_alloc);
+
+ trace_mem_connect(xdp_alloc, xdp_rxq);
+ return 0;
}
+
EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
/* XDP RX runs under NAPI protection, and in different delivery error
--
2.34.0
* [PATCH bpf-next v2 1/8] page_pool: Add callback to init pages when they are allocated
From: Toke Høiland-Jørgensen @ 2021-12-10 14:20 UTC
To: Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
John Fastabend, KP Singh
Cc: Toke Høiland-Jørgensen, netdev, bpf
In-Reply-To: <20211210142008.76981-1-toke@redhat.com>
Add a new callback function to page_pool that, if set, will be called every
time a new page is allocated. This will be used from bpf_test_run() to
initialise the page data with the data provided by userspace when running
XDP programs with redirect turned on.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
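To illustrate the idea, here is a userspace toy pool with an optional
init hook, used to pre-fill each freshly allocated page from a template.
This is only a sketch: the toy_* types, the 4096-byte page size and the
template struct are assumptions, not the page_pool API.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

struct toy_page {
	unsigned char data[TOY_PAGE_SIZE];
};

struct toy_pool_params {
	void (*init_callback)(struct toy_page *page, void *arg);
	void *init_arg;
};

struct toy_pool {
	struct toy_pool_params p;
	unsigned long allocated;
};

/* Allocate a fresh page and run the optional init hook on it. */
static struct toy_page *toy_pool_alloc(struct toy_pool *pool)
{
	struct toy_page *page = calloc(1, sizeof(*page));

	if (!page)
		return NULL;
	pool->allocated++;
	if (pool->p.init_callback)
		pool->p.init_callback(page, pool->p.init_arg);
	return page;
}

/* Hypothetical template data that every new page should start with. */
struct toy_template {
	const void *data;
	size_t len;
};

/* Example hook: copy the template into the new page, clamped to page size. */
static void toy_fill_page(struct toy_page *page, void *arg)
{
	const struct toy_template *tmpl = arg;
	size_t len = tmpl->len < TOY_PAGE_SIZE ? tmpl->len : TOY_PAGE_SIZE;

	memcpy(page->data, tmpl->data, len);
}
```

A pool created without a callback behaves as before, so existing users
are unaffected.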
include/net/page_pool.h | 2 ++
net/core/page_pool.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 3855f069627f..a71201854c41 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -80,6 +80,8 @@ struct page_pool_params {
enum dma_data_direction dma_dir; /* DMA mapping direction */
unsigned int max_len; /* max DMA sync memory size */
unsigned int offset; /* DMA addr offset */
+ void (*init_callback)(struct page *page, void *arg);
+ void *init_arg;
};
struct page_pool {
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 9b60e4301a44..fb5a90b9d574 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -219,6 +219,8 @@ static void page_pool_set_pp_info(struct page_pool *pool,
{
page->pp = pool;
page->pp_magic |= PP_SIGNATURE;
+ if (unlikely(pool->p.init_callback))
+ pool->p.init_callback(page, pool->p.init_arg);
}
static void page_pool_clear_pp_info(struct page *page)
--
2.34.0
* [PATCH bpf-next v2 2/8] page_pool: Store the XDP mem id
From: Toke Høiland-Jørgensen @ 2021-12-10 14:20 UTC
To: Jesper Dangaard Brouer, Ilias Apalodimas, David S. Miller,
Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
Yonghong Song, KP Singh
Cc: Toke Høiland-Jørgensen, netdev, bpf
In-Reply-To: <20211210142008.76981-1-toke@redhat.com>
Store the XDP mem ID inside the page_pool struct so it can be retrieved
later for use in bpf_prog_run().
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
include/net/page_pool.h | 9 +++++++--
net/core/page_pool.c | 4 +++-
net/core/xdp.c | 2 +-
3 files changed, 11 insertions(+), 4 deletions(-)
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index a71201854c41..6bc0409c4ffd 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -96,6 +96,7 @@ struct page_pool {
unsigned int frag_offset;
struct page *frag_page;
long frag_users;
+ u32 xdp_mem_id;
/*
* Data structure for allocation side
@@ -170,9 +171,12 @@ bool page_pool_return_skb_page(struct page *page);
struct page_pool *page_pool_create(const struct page_pool_params *params);
+struct xdp_mem_info;
+
#ifdef CONFIG_PAGE_POOL
void page_pool_destroy(struct page_pool *pool);
-void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *));
+void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
+ struct xdp_mem_info *mem);
void page_pool_release_page(struct page_pool *pool, struct page *page);
void page_pool_put_page_bulk(struct page_pool *pool, void **data,
int count);
@@ -182,7 +186,8 @@ static inline void page_pool_destroy(struct page_pool *pool)
}
static inline void page_pool_use_xdp_mem(struct page_pool *pool,
- void (*disconnect)(void *))
+ void (*disconnect)(void *),
+ struct xdp_mem_info *mem)
{
}
static inline void page_pool_release_page(struct page_pool *pool,
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index fb5a90b9d574..2605467251f1 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -695,10 +695,12 @@ static void page_pool_release_retry(struct work_struct *wq)
schedule_delayed_work(&pool->release_dw, DEFER_TIME);
}
-void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *))
+void page_pool_use_xdp_mem(struct page_pool *pool, void (*disconnect)(void *),
+ struct xdp_mem_info *mem)
{
refcount_inc(&pool->user_cnt);
pool->disconnect = disconnect;
+ pool->xdp_mem_id = mem->id;
}
void page_pool_destroy(struct page_pool *pool)
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 5ddc29f29bad..143388c6d9dd 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -318,7 +318,7 @@ int xdp_rxq_info_reg_mem_model(struct xdp_rxq_info *xdp_rxq,
}
if (type == MEM_TYPE_PAGE_POOL)
- page_pool_use_xdp_mem(allocator, mem_allocator_disconnect);
+ page_pool_use_xdp_mem(allocator, mem_allocator_disconnect, mem);
mutex_unlock(&mem_id_lock);
--
2.34.0
* [PATCH bpf-next v2 0/8] Add support for transmitting packets using XDP in bpf_prog_run()
From: Toke Høiland-Jørgensen @ 2021-12-10 14:20 UTC (permalink / raw)
To: Alexei Starovoitov, Daniel Borkmann, David S. Miller,
Jakub Kicinski, Jesper Dangaard Brouer, John Fastabend,
Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
KP Singh
Cc: Toke Høiland-Jørgensen, netdev, bpf
This series adds support for transmitting packets using XDP in
bpf_prog_run(), by enabling the xdp_do_redirect() callback so XDP programs
can perform "real" redirects to devices or maps, using an opt-in flag when
executing the program.
The primary use case for this is testing the redirect map types and the
ndo_xdp_xmit driver operation without generating external traffic. But it
turns out to also be useful for creating a programmable traffic generator.
The last patch adds a sample traffic generator to samples/bpf, which
can transmit up to 11.5 Mpps/core on my test machine.
To transmit the frames, the new mode instantiates a page_pool structure in
bpf_prog_run() and initialises the pages with the data passed in by
userspace. These pages can then be redirected using the normal redirection
mechanism, and the existing page_pool code takes care of returning and
recycling them. The setup is optimised for high performance with a high
number of repetitions to support stress testing and the traffic generator
use case; see patch 6 for details.
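
The allocate, initialise and recycle cycle described above can be
illustrated with a toy userspace model. This is only a sketch under stated
assumptions: toy_pool, toy_pool_get(), toy_pool_put() and
toy_fill_template() are hypothetical names and not the kernel page_pool
API. The model runs an init callback only when a buffer is freshly
allocated and hands recycled buffers back out untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUF_SIZE 64
#define TOY_POOL_CAP 4

/* Hypothetical stand-in for a page holding packet data. */
struct toy_buf {
	unsigned char data[TOY_BUF_SIZE];
};

/* Hypothetical pool: a small cache of recycled buffers plus an init hook. */
struct toy_pool {
	struct toy_buf *cache[TOY_POOL_CAP];
	size_t cached;
	void (*init)(struct toy_buf *buf, void *arg); /* run on fresh allocations only */
	void *init_arg;
	unsigned long fresh_allocs;
};

/* Hand out a recycled buffer if one is cached, else allocate and initialise. */
static struct toy_buf *toy_pool_get(struct toy_pool *pool)
{
	struct toy_buf *buf;

	assert(pool != NULL);
	if (pool->cached > 0)
		return pool->cache[--pool->cached];

	buf = malloc(sizeof(*buf));
	if (!buf)
		return NULL;
	pool->fresh_allocs++;
	if (pool->init)
		pool->init(buf, pool->init_arg);
	return buf;
}

/* Return a buffer to the pool; free it if the cache is already full. */
static void toy_pool_put(struct toy_pool *pool, struct toy_buf *buf)
{
	assert(pool != NULL);
	if (pool->cached < TOY_POOL_CAP)
		pool->cache[pool->cached++] = buf;
	else
		free(buf);
}

/* Example init callback: copy a caller-supplied packet template. */
static void toy_fill_template(struct toy_buf *buf, void *arg)
{
	memcpy(buf->data, arg, TOY_BUF_SIZE);
}
```

In this model a get, put, get sequence returns the same buffer twice and
fresh_allocs stays at 1, which is the property that makes a high
repetition count cheap.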
The series is structured as follows: Patches 1-2 add a few features to
page_pool that are needed for the usage in bpf_prog_run(). Similarly,
patches 3-5 perform a couple of preparatory refactorings of the XDP
redirect and memory management code. Patch 6 adds the support to
bpf_prog_run() itself, patch 7 adds a selftest, and patch 8 adds the
traffic generator example to samples/bpf.
v2:
- Split up __xdp_do_redirect to avoid passing two pointers to it (John)
- Always reset context pointers before each test run (John)
- Use get_mac_addr() from xdp_sample_user.h instead of rolling our own (Kumar)
- Fix wrong offset for metadata pointer
Toke Høiland-Jørgensen (8):
page_pool: Add callback to init pages when they are allocated
page_pool: Store the XDP mem id
xdp: Allow registering memory model without rxq reference
xdp: Move conversion to xdp_frame out of map functions
xdp: add xdp_do_redirect_frame() for pre-computed xdp_frames
bpf: Add XDP_REDIRECT support to XDP for bpf_prog_run()
selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run()
samples/bpf: Add xdp_trafficgen sample
include/linux/bpf.h | 20 +-
include/linux/filter.h | 4 +
include/net/page_pool.h | 11 +-
include/net/xdp.h | 3 +
include/uapi/linux/bpf.h | 2 +
kernel/bpf/Kconfig | 1 +
kernel/bpf/cpumap.c | 8 +-
kernel/bpf/devmap.c | 32 +-
net/bpf/test_run.c | 218 ++++++++-
net/core/filter.c | 73 ++-
net/core/page_pool.c | 6 +-
net/core/xdp.c | 94 ++--
samples/bpf/.gitignore | 1 +
samples/bpf/Makefile | 4 +
samples/bpf/xdp_redirect.bpf.c | 34 ++
samples/bpf/xdp_trafficgen_user.c | 421 ++++++++++++++++++
tools/include/uapi/linux/bpf.h | 2 +
.../bpf/prog_tests/xdp_do_redirect.c | 74 +++
.../bpf/progs/test_xdp_do_redirect.c | 34 ++
19 files changed, 948 insertions(+), 94 deletions(-)
create mode 100644 samples/bpf/xdp_trafficgen_user.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_do_redirect.c
create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_do_redirect.c
--
2.34.0
* [PATCH V2 net] sch_cake: do not call cake_destroy() from cake_init()
From: Eric Dumazet @ 2021-12-10 14:20 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski
Cc: netdev, Eric Dumazet, Eric Dumazet, syzbot,
Toke Høiland-Jørgensen
From: Eric Dumazet <edumazet@google.com>
qdiscs are not supposed to call their own destroy() method
from init(), because core stack already does that.
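
The ownership rule can be sketched with a toy userspace model. The names
toy_qdisc, toy_init(), toy_destroy() and toy_core_create() are
hypothetical and only mimic the contract: the caller of init() runs
destroy() on failure, so init() itself just returns the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a qdisc's private data. */
struct toy_qdisc {
	int *tins;
	int destroy_calls;
	int fail_init;	/* force the error path for demonstration */
};

/* Safe to call on a partially initialised object: free(NULL) is a no-op. */
static void toy_destroy(struct toy_qdisc *q)
{
	q->destroy_calls++;
	free(q->tins);
	q->tins = NULL;
}

/* On failure, only report the error; cleanup belongs to the caller. */
static int toy_init(struct toy_qdisc *q)
{
	if (q->fail_init)
		return -ENOMEM;
	q->tins = calloc(8, sizeof(*q->tins));
	if (!q->tins)
		return -ENOMEM;
	return 0;
}

/* Models the core: it owns the destroy() call when init() fails. */
static int toy_core_create(struct toy_qdisc *q)
{
	int err;

	assert(q != NULL);
	err = toy_init(q);
	if (err)
		toy_destroy(q);
	return err;
}
```

With this split, a failed toy_core_create() runs toy_destroy() exactly
once. If toy_init() also called toy_destroy() before returning, the
teardown would run twice on the same object.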
syzbot was able to trigger use after free:
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock_common kernel/locking/mutex.c:586 [inline]
WARNING: CPU: 0 PID: 21902 at kernel/locking/mutex.c:586 __mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740
Modules linked in:
CPU: 0 PID: 21902 Comm: syz-executor189 Not tainted 5.16.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__mutex_lock_common kernel/locking/mutex.c:586 [inline]
RIP: 0010:__mutex_lock+0x9ec/0x12f0 kernel/locking/mutex.c:740
Code: 08 84 d2 0f 85 19 08 00 00 8b 05 97 38 4b 04 85 c0 0f 85 27 f7 ff ff 48 c7 c6 20 00 ac 89 48 c7 c7 a0 fe ab 89 e8 bf 76 ba ff <0f> 0b e9 0d f7 ff ff 48 8b 44 24 40 48 8d b8 c8 08 00 00 48 89 f8
RSP: 0018:ffffc9000627f290 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88802315d700 RSI: ffffffff815f1db8 RDI: fffff52000c4fe44
RBP: ffff88818f28e000 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815ebb5e R11: 0000000000000000 R12: 0000000000000000
R13: dffffc0000000000 R14: ffffc9000627f458 R15: 0000000093c30000
FS: 0000555556abc400(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fda689c3303 CR3: 000000001cfbb000 CR4: 0000000000350ef0
Call Trace:
<TASK>
tcf_chain0_head_change_cb_del+0x2e/0x3d0 net/sched/cls_api.c:810
tcf_block_put_ext net/sched/cls_api.c:1381 [inline]
tcf_block_put_ext net/sched/cls_api.c:1376 [inline]
tcf_block_put+0xbc/0x130 net/sched/cls_api.c:1394
cake_destroy+0x3f/0x80 net/sched/sch_cake.c:2695
qdisc_create.constprop.0+0x9da/0x10f0 net/sched/sch_api.c:1293
tc_modify_qdisc+0x4c5/0x1980 net/sched/sch_api.c:1660
rtnetlink_rcv_msg+0x413/0xb80 net/core/rtnetlink.c:5571
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2496
netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1345
netlink_sendmsg+0x904/0xdf0 net/netlink/af_netlink.c:1921
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:724
____sys_sendmsg+0x6e8/0x810 net/socket.c:2409
___sys_sendmsg+0xf3/0x170 net/socket.c:2463
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2492
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f1bb06badb9
Code: Unable to access opcode bytes at RIP 0x7f1bb06bad8f.
RSP: 002b:00007fff3012a658 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1bb06badb9
RDX: 0000000000000000 RSI: 00000000200007c0 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000003
R10: 0000000000000003 R11: 0000000000000246 R12: 00007fff3012a688
R13: 00007fff3012a6a0 R14: 00007fff3012a6e0 R15: 00000000000013c2
</TASK>
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
net/sched/sch_cake.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 3c2300d144681869a37ada0d20966f9b5b145653..857aaebd49f4315502928fb1f75d2c85eb63eb51 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -2736,7 +2736,7 @@ static int cake_init(struct Qdisc *sch, struct nlattr *opt,
q->tins = kvcalloc(CAKE_MAX_TINS, sizeof(struct cake_tin_data),
GFP_KERNEL);
if (!q->tins)
- goto nomem;
+ return -ENOMEM;
for (i = 0; i < CAKE_MAX_TINS; i++) {
struct cake_tin_data *b = q->tins + i;
@@ -2766,10 +2766,6 @@ static int cake_init(struct Qdisc *sch, struct nlattr *opt,
q->min_netlen = ~0;
q->min_adjlen = ~0;
return 0;
-
-nomem:
- cake_destroy(sch);
- return -ENOMEM;
}
static int cake_dump(struct Qdisc *sch, struct sk_buff *skb)
--
2.34.1.173.g76aa8bc2d0-goog
* Re: [PATCH] netfilter: fix regression in looped (broad|multi)cast's MAC handing
From: Denis Kirjanov @ 2021-12-10 14:03 UTC (permalink / raw)
To: Ignacy Gawędzki, netdev
In-Reply-To: <20211210122600.mrduxdw2uwpwoqbr@zenon.in.qult.net>
On 12/10/21 3:26 PM, Ignacy Gawędzki wrote:
> In 5648b5e1169f, the test for non-empty MAC header introduced in
> 2c38de4c1f8da7 has been replaced with a test for a set MAC header,
> which breaks the case when the MAC header has been reset (using
> skb_reset_mac_header), as is the case with looped-back multicast
> packets.
>
> This patch adds a test for a non-empty MAC header in addition to the
> test for a set MAC header. The same two tests are also implemented in
> nfnetlink_log.c, where the initial code of 2c38de4c1f8da7 has not been
> touched, but where supposedly the same situation may happen.
>
Fixes: 2c38de4c1f8da7 ("netfilter: fix looped (broad|multi)cast's MAC handling")
> Signed-off-by: Ignacy Gawędzki <ignacy.gawedzki@green-communications.fr>
> ---
> net/netfilter/nfnetlink_log.c | 3 ++-
> net/netfilter/nfnetlink_queue.c | 3 ++-
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
> index 691ef4cffdd9..7f83f9697fc1 100644
> --- a/net/netfilter/nfnetlink_log.c
> +++ b/net/netfilter/nfnetlink_log.c
> @@ -556,7 +556,8 @@ __build_packet_message(struct nfnl_log_net *log,
> goto nla_put_failure;
>
> if (indev && skb->dev &&
> - skb->mac_header != skb->network_header) {
> + skb_mac_header_was_set(skb) &&
> + skb_mac_header_len(skb) != 0) {
> struct nfulnl_msg_packet_hw phw;
> int len;
>
> diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c
> index 4acc4b8e9fe5..959527708e38 100644
> --- a/net/netfilter/nfnetlink_queue.c
> +++ b/net/netfilter/nfnetlink_queue.c
> @@ -560,7 +560,8 @@ nfqnl_build_packet_message(struct net *net, struct nfqnl_instance *queue,
> goto nla_put_failure;
>
> if (indev && entskb->dev &&
> - skb_mac_header_was_set(entskb)) {
> + skb_mac_header_was_set(entskb) &&
> + skb_mac_header_len(entskb) != 0) {
> struct nfqnl_msg_packet_hw phw;
> int len;
>
>
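
The difference between a "set" and a "non-empty" MAC header can be shown
with a small userspace model. This is a sketch, not kernel code: toy_skb
and the toy_*() helpers are hypothetical, and it assumes that a reset MAC
header ends up at the same offset as the network header for looped-back
packets, as the patch description implies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_OFFSET_UNSET UINT16_MAX

/* Hypothetical stand-in keeping only the two header offsets. */
struct toy_skb {
	uint16_t mac_header;	/* TOY_OFFSET_UNSET when never set */
	uint16_t network_header;
};

static bool toy_mac_header_was_set(const struct toy_skb *skb)
{
	return skb->mac_header != TOY_OFFSET_UNSET;
}

/* Length of the link-layer header; only meaningful once it was set. */
static unsigned int toy_mac_header_len(const struct toy_skb *skb)
{
	assert(toy_mac_header_was_set(skb));
	return (unsigned int)(skb->network_header - skb->mac_header);
}

/* Both tests from the patch: the header is set and it is not empty. */
static bool toy_should_report_hw_addr(const struct toy_skb *skb)
{
	return toy_mac_header_was_set(skb) && toy_mac_header_len(skb) != 0;
}
```

A packet with mac_header == network_header passes the "was set" test but
has a zero-length MAC header, so only the combined check skips it.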
* Re: [PATCH v3, 1/2] yaml: Add dm9051 SPI network yaml file
From: Rob Herring @ 2021-12-10 14:02 UTC (permalink / raw)
To: JosephCHANG
Cc: Jakub Kicinski, linux-kernel, netdev, joseph_chang, devicetree,
Rob Herring, David S . Miller
In-Reply-To: <20211210084021.13993-2-josright123@gmail.com>
On Fri, 10 Dec 2021 16:40:20 +0800, JosephCHANG wrote:
> Add support for the Davicom dm9051 device tree configuration
>
> Signed-off-by: JosephCHANG <josright123@gmail.com>
> ---
> .../bindings/net/davicom,dm9051.yaml | 71 +++++++++++++++++++
> 1 file changed, 71 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/net/davicom,dm9051.yaml
>
My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):
yamllint warnings/errors:
dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/net/davicom,dm9051.example.dt.yaml: dm9051@0: $nodename:0: 'dm9051@0' does not match '^ethernet(@.*)?$'
From schema: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/net/davicom,dm9051.yaml
doc reference errors (make refcheckdocs):
See https://patchwork.ozlabs.org/patch/1566354
This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit.
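
The $nodename error above is a plain pattern mismatch. As a rough
illustration only, the check can be reproduced with POSIX extended regular
expressions. dt-schema uses its own regex engine, so this sketch merely
assumes the pattern behaves the same way under POSIX ERE, and
node_name_matches() is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if name matches the extended regular expression pattern. */
static bool node_name_matches(const char *pattern, const char *name)
{
	regex_t re;
	bool ok;

	assert(pattern != NULL && name != NULL);
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;	/* treat an invalid pattern as "no match" */
	ok = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return ok;
}
```

Under that assumption, "dm9051@0" fails against '^ethernet(@.*)?$' while
"ethernet@0" and "ethernet" pass, which suggests renaming the example node
to ethernet@0.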
* Re: [PATCH net-next v8 4/6] net: dt-bindings: dwmac: Convert mediatek-dwmac to DT schema
From: Rob Herring @ 2021-12-10 14:02 UTC (permalink / raw)
To: Biao Huang
Cc: Jose Abreu, Rob Herring, Matthias Brugger, netdev, srv_heupstream,
linux-arm-kernel, linux-stm32, davem, angelogioacchino.delregno,
devicetree, Alexandre Torgue, linux-kernel, Jakub Kicinski,
dkirjanov, linux-mediatek, Giuseppe Cavallaro, macpaul.lin,
Maxime Coquelin
In-Reply-To: <20211210013129.811-5-biao.huang@mediatek.com>
On Fri, 10 Dec 2021 09:31:27 +0800, Biao Huang wrote:
> Convert mediatek-dwmac to DT schema, and delete old mediatek-dwmac.txt.
> The .yaml differs from the .txt as follows; everything else stays almost the same:
> 1. compatible "const: snps,dwmac-4.20".
> 2. delete "snps,reset-active-low;" in example, since the driver removed this
> property long ago.
> 3. add "snps,reset-delay-us = <0 10000 10000>" in example.
> 4. the example is for rgmii interface, keep related properties only.
>
> Signed-off-by: Biao Huang <biao.huang@mediatek.com>
> ---
> .../bindings/net/mediatek-dwmac.txt | 91 ----------
> .../bindings/net/mediatek-dwmac.yaml | 156 ++++++++++++++++++
> 2 files changed, 156 insertions(+), 91 deletions(-)
> delete mode 100644 Documentation/devicetree/bindings/net/mediatek-dwmac.txt
> create mode 100644 Documentation/devicetree/bindings/net/mediatek-dwmac.yaml
>
Running 'make dtbs_check' with the schema in this patch gives the
following warnings. Consider if they are expected or the schema is
incorrect. These may not be new warnings.
Note that it is not yet a requirement to have 0 warnings for dtbs_check.
This will change in the future.
Full log is available here: https://patchwork.ozlabs.org/patch/1566169
ethernet@1101c000: clock-names: ['axi', 'apb', 'mac_main', 'ptp_ref'] is too short
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
ethernet@1101c000: clocks: [[27, 34], [27, 37], [6, 154], [6, 155]] is too short
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
ethernet@1101c000: compatible: ['mediatek,mt2712-gmac'] does not contain items matching the given schema
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
ethernet@1101c000: compatible: 'oneOf' conditional failed, one must be fixed:
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
ethernet@1101c000: Unevaluated properties are not allowed ('compatible', 'reg', 'interrupts', 'interrupt-names', 'mac-address', 'clock-names', 'clocks', 'power-domains', 'snps,axi-config', 'snps,mtl-rx-config', 'snps,mtl-tx-config', 'snps,txpbl', 'snps,rxpbl', 'clk_csr', 'phy-mode', 'phy-handle', 'snps,reset-gpio', 'mdio' were unexpected)
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
* Re: [PATCH net-next v8 6/6] net: dt-bindings: dwmac: add support for mt8195
From: Rob Herring @ 2021-12-10 14:02 UTC (permalink / raw)
To: Biao Huang
Cc: Jose Abreu, Jakub Kicinski, Maxime Coquelin, linux-stm32,
srv_heupstream, macpaul.lin, linux-mediatek, Matthias Brugger,
Rob Herring, angelogioacchino.delregno, linux-kernel, dkirjanov,
linux-arm-kernel, netdev, davem, Giuseppe Cavallaro,
Alexandre Torgue, devicetree
In-Reply-To: <20211210013129.811-7-biao.huang@mediatek.com>
On Fri, 10 Dec 2021 09:31:29 +0800, Biao Huang wrote:
> Add binding document for the ethernet on mt8195.
>
> Signed-off-by: Biao Huang <biao.huang@mediatek.com>
> ---
> .../bindings/net/mediatek-dwmac.yaml | 86 +++++++++++++++----
> 1 file changed, 70 insertions(+), 16 deletions(-)
>
Running 'make dtbs_check' with the schema in this patch gives the
following warnings. Consider if they are expected or the schema is
incorrect. These may not be new warnings.
Note that it is not yet a requirement to have 0 warnings for dtbs_check.
This will change in the future.
Full log is available here: https://patchwork.ozlabs.org/patch/1566168
ethernet@1101c000: clock-names: ['axi', 'apb', 'mac_main', 'ptp_ref'] is too short
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
ethernet@1101c000: clocks: [[27, 34], [27, 37], [6, 154], [6, 155]] is too short
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
ethernet@1101c000: compatible: ['mediatek,mt2712-gmac'] does not contain items matching the given schema
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
ethernet@1101c000: compatible: 'oneOf' conditional failed, one must be fixed:
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
ethernet@1101c000: Unevaluated properties are not allowed ('compatible', 'reg', 'interrupts', 'interrupt-names', 'mac-address', 'clock-names', 'clocks', 'assigned-clocks', 'assigned-clock-parents', 'power-domains', 'snps,axi-config', 'snps,mtl-rx-config', 'snps,mtl-tx-config', 'snps,txpbl', 'snps,rxpbl', 'clk_csr', 'phy-mode', 'phy-handle', 'snps,reset-gpio', 'mdio' were unexpected)
arch/arm64/boot/dts/mediatek/mt2712-evb.dt.yaml
* Re: [PATCH] igc: Avoid possible deadlock during suspend/resume
From: Thorsten Leemhuis @ 2021-12-10 14:01 UTC (permalink / raw)
To: Stefan Dietrich, Vinicius Costa Gomes
Cc: kuba, greg, netdev, intel-wired-lan, regressions
In-Reply-To: <8e59b7d6b5d4674d5843bb45dde89e9881d0c741.camel@gmx.de>
On 10.12.21 14:45, Stefan Dietrich wrote:
>
> thanks for keeping an eye on the issue. I've sent the files in private
> because I did not want to spam the mailing lists with them. Please let
> me know if this is the correct procedure.
It's likely okay in this case, but FWIW: most of the time it's the wrong
thing to do as outlined here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html#general-advice-for-further-interactions
One reason for this: others that might want to look into the issue now
or in a year or two might be unable to if crucial data was only sent
in private.
Ciao, Thorsten
> On Fri, 2021-12-10 at 10:40 +0100, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker speaking.
>>
>> On 02.12.21 23:34, Vinicius Costa Gomes wrote:
>>> Hi Stefan,
>>>
>>> Stefan Dietrich <roots@gmx.de> writes:
>>>
>>>> Hi Vinicius,
>>>>
>>>> thanks for the patch - unfortunately it did not solve the issue
>>>> and I
>>>> am still getting reboots/lockups.
>>>>
>>>
>>> Thanks for the test. We learned something, not a lot, but
>>> something: the
>>> problem you are facing is PTM related and it's not the same bug as
>>> that
>>> PM deadlock.
>>>
>>> I am still trying to understand what's going on.
>>>
>>> Are you able to send me the 'dmesg' output for the two kernel
>>> configs
>>> (CONFIG_PCIE_PTM enabled and disabled)? (no need to bring the
>>> network
>>> interface up or down). Your kernel .config would be useful as well.
>>
>> Stefan, could you provide the data Vinicius asked for? Or did you do
>> that in private already? Or was progress made somewhere else and I
>> simply missed this?
>>
>> Ciao, Thorsten, your Linux kernel regression tracker.
>>
>> P.S.: As a Linux kernel regression tracker I'm getting a lot of
>> reports
>> on my table. I can only look briefly into most of them. Unfortunately
>> therefore I sometimes will get things wrong or miss something
>> important.
>> I hope that's not the case here; if you think it is, don't hesitate
>> to
>> tell me about it in a public reply. That's in everyone's interest, as
>> what I wrote above might be misleading to everyone reading this; any
>> suggestion I gave might thus send someone reading this down the
>> wrong rabbit hole, which none of us wants.
>>
>> BTW, I have no personal interest in this issue, which is tracked
>> using
>> regzbot, my Linux kernel regression tracking bot
>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>> this mail to get things rolling again and hence don't need to be CC
>> on
>> all further activities wrt this regression.
>>
>> #regzbot poke
>>
>>>> On Wed, 2021-12-01 at 10:57 -0800, Vinicius Costa Gomes wrote:
>>>>> Inspired by:
>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=215129
>>>>>
>>>>> Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
>>>>> ---
>>>>> Just to see if it's indeed the same problem as the bug report
>>>>> above.
>>>>>
>>>>> drivers/net/ethernet/intel/igc/igc_main.c | 19 +++++++++++++
>>>>> ------
>>>>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
>>>>> b/drivers/net/ethernet/intel/igc/igc_main.c
>>>>> index 0e19b4d02e62..c58bf557a2a1 100644
>>>>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>>>>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>>>>> @@ -6619,7 +6619,7 @@ static void
>>>>> igc_deliver_wake_packet(struct
>>>>> net_device *netdev)
>>>>> netif_rx(skb);
>>>>> }
>>>>>
>>>>> -static int __maybe_unused igc_resume(struct device *dev)
>>>>> +static int __maybe_unused __igc_resume(struct device *dev,
>>>>> bool rpm)
>>>>> {
>>>>> struct pci_dev *pdev = to_pci_dev(dev);
>>>>> struct net_device *netdev = pci_get_drvdata(pdev);
>>>>> @@ -6661,20 +6661,27 @@ static int __maybe_unused
>>>>> igc_resume(struct
>>>>> device *dev)
>>>>>
>>>>> wr32(IGC_WUS, ~0);
>>>>>
>>>>> - rtnl_lock();
>>>>> + if (!rpm)
>>>>> + rtnl_lock();
>>>>> if (!err && netif_running(netdev))
>>>>> err = __igc_open(netdev, true);
>>>>>
>>>>> if (!err)
>>>>> netif_device_attach(netdev);
>>>>> - rtnl_unlock();
>>>>> + if (!rpm)
>>>>> + rtnl_unlock();
>>>>>
>>>>> return err;
>>>>> }
>>>>>
>>>>> static int __maybe_unused igc_runtime_resume(struct device
>>>>> *dev)
>>>>> {
>>>>> - return igc_resume(dev);
>>>>> + return __igc_resume(dev, true);
>>>>> +}
>>>>> +
>>>>> +static int __maybe_unused igc_resume(struct device *dev)
>>>>> +{
>>>>> + return __igc_resume(dev, false);
>>>>> }
>>>>>
>>>>> static int __maybe_unused igc_suspend(struct device *dev)
>>>>> @@ -6738,7 +6745,7 @@ static pci_ers_result_t
>>>>> igc_io_error_detected(struct pci_dev *pdev,
>>>>> * @pdev: Pointer to PCI device
>>>>> *
>>>>> * Restart the card from scratch, as if from a cold-boot.
>>>>> Implementation
>>>>> - * resembles the first-half of the igc_resume routine.
>>>>> + * resembles the first-half of the __igc_resume routine.
>>>>> **/
>>>>> static pci_ers_result_t igc_io_slot_reset(struct pci_dev
>>>>> *pdev)
>>>>> {
>>>>> @@ -6777,7 +6784,7 @@ static pci_ers_result_t
>>>>> igc_io_slot_reset(struct pci_dev *pdev)
>>>>> *
>>>>> * This callback is called when the error recovery driver
>>>>> tells us
>>>>> that
>>>>> * its OK to resume normal operation. Implementation
>>>>> resembles the
>>>>> - * second-half of the igc_resume routine.
>>>>> + * second-half of the __igc_resume routine.
>>>>> */
>>>>> static void igc_io_resume(struct pci_dev *pdev)
>>>>> {
>>>
>>> Cheers,
>>>
>
>
>
* [PATCH bpf-next v2 4/4] selftests/bpf: add test cases for bpf_strncmp()
From: Hou Tao @ 2021-12-10 14:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
netdev, bpf, houtao1
In-Reply-To: <20211210141652.877186-1-houtao1@huawei.com>
Four test cases are added:
(1) ensure the return value is as expected
(2) ensure a non-const string size is rejected
(3) ensure a writable target is rejected
(4) ensure a non-null-terminated target is rejected
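
The return values expected by case (1) can be reproduced in userspace with
the following sketch. It assumes the helper behaves like
strncmp(s1, s2, s1_sz) reduced to its sign, which is what the expected
values in the selftest suggest; model_strncmp_sign() is a hypothetical
stand-in and not the BPF helper. Cases (2) to (4) are verifier rejections
at load time and cannot be modelled this way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STR_SZ 8

/* Compare at most s1_sz bytes and reduce the result to -1, 0 or 1. */
static int model_strncmp_sign(const char *s1, size_t s1_sz, const char *s2)
{
	int cmp;

	assert(s1 != NULL && s2 != NULL);
	cmp = strncmp(s1, s2, s1_sz);
	return (cmp > 0) - (cmp < 0);
}
```

With target "EEEEEEE" in an 8-byte array, an empty str compares as -1, an
identical str as 0, and a str whose last byte is overwritten with 'A' (so
it is no longer null-terminated) as 1, because 'A' is greater than the
terminating NUL of the target.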
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
.../selftests/bpf/prog_tests/test_strncmp.c | 167 ++++++++++++++++++
.../selftests/bpf/progs/strncmp_test.c | 54 ++++++
2 files changed, 221 insertions(+)
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_strncmp.c
create mode 100644 tools/testing/selftests/bpf/progs/strncmp_test.c
diff --git a/tools/testing/selftests/bpf/prog_tests/test_strncmp.c b/tools/testing/selftests/bpf/prog_tests/test_strncmp.c
new file mode 100644
index 000000000000..b57a3009465f
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/test_strncmp.c
@@ -0,0 +1,167 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2021. Huawei Technologies Co., Ltd */
+#include <test_progs.h>
+#include "strncmp_test.skel.h"
+
+static int trigger_strncmp(const struct strncmp_test *skel)
+{
+ int cmp;
+
+ usleep(1);
+
+ cmp = skel->bss->cmp_ret;
+ if (cmp > 0)
+ return 1;
+ if (cmp < 0)
+ return -1;
+ return 0;
+}
+
+/*
+ * Compare str and target after making str[i] != target[i].
+ * When exp is -1, make str[i] < target[i] and delta = -1.
+ */
+static void strncmp_full_str_cmp(struct strncmp_test *skel, const char *name,
+ int exp)
+{
+ size_t nr = sizeof(skel->bss->str);
+ char *str = skel->bss->str;
+ int delta = exp;
+ int got;
+ size_t i;
+
+ memcpy(str, skel->rodata->target, nr);
+ for (i = 0; i < nr - 1; i++) {
+ str[i] += delta;
+
+ got = trigger_strncmp(skel);
+ ASSERT_EQ(got, exp, name);
+
+ str[i] -= delta;
+ }
+}
+
+static void test_strncmp_ret(void)
+{
+ struct strncmp_test *skel;
+ struct bpf_program *prog;
+ int err, got;
+
+ skel = strncmp_test__open();
+ if (!ASSERT_OK_PTR(skel, "strncmp_test open"))
+ return;
+
+ bpf_object__for_each_program(prog, skel->obj)
+ bpf_program__set_autoload(prog, false);
+
+ bpf_program__set_autoload(skel->progs.do_strncmp, true);
+
+ err = strncmp_test__load(skel);
+ if (!ASSERT_EQ(err, 0, "strncmp_test load"))
+ goto out;
+
+ err = strncmp_test__attach(skel);
+ if (!ASSERT_EQ(err, 0, "strncmp_test attach"))
+ goto out;
+
+ skel->bss->target_pid = getpid();
+
+ /* Empty str */
+ skel->bss->str[0] = '\0';
+ got = trigger_strncmp(skel);
+ ASSERT_EQ(got, -1, "strncmp: empty str");
+
+ /* Same string */
+ memcpy(skel->bss->str, skel->rodata->target, sizeof(skel->bss->str));
+ got = trigger_strncmp(skel);
+ ASSERT_EQ(got, 0, "strncmp: same str");
+
+ /* Not-null-terminated string */
+ memcpy(skel->bss->str, skel->rodata->target, sizeof(skel->bss->str));
+ skel->bss->str[sizeof(skel->bss->str) - 1] = 'A';
+ got = trigger_strncmp(skel);
+ ASSERT_EQ(got, 1, "strncmp: not-null-term str");
+
+ strncmp_full_str_cmp(skel, "strncmp: less than", -1);
+ strncmp_full_str_cmp(skel, "strncmp: greater than", 1);
+out:
+ strncmp_test__destroy(skel);
+}
+
+static void test_strncmp_bad_not_const_str_size(void)
+{
+ struct strncmp_test *skel;
+ struct bpf_program *prog;
+ int err;
+
+ skel = strncmp_test__open();
+ if (!ASSERT_OK_PTR(skel, "strncmp_test open"))
+ return;
+
+ bpf_object__for_each_program(prog, skel->obj)
+ bpf_program__set_autoload(prog, false);
+
+ bpf_program__set_autoload(skel->progs.strncmp_bad_not_const_str_size,
+ true);
+
+ err = strncmp_test__load(skel);
+ ASSERT_ERR(err, "strncmp_test load bad_not_const_str_size");
+
+ strncmp_test__destroy(skel);
+}
+
+static void test_strncmp_bad_writable_target(void)
+{
+ struct strncmp_test *skel;
+ struct bpf_program *prog;
+ int err;
+
+ skel = strncmp_test__open();
+ if (!ASSERT_OK_PTR(skel, "strncmp_test open"))
+ return;
+
+ bpf_object__for_each_program(prog, skel->obj)
+ bpf_program__set_autoload(prog, false);
+
+ bpf_program__set_autoload(skel->progs.strncmp_bad_writable_target,
+ true);
+
+ err = strncmp_test__load(skel);
+ ASSERT_ERR(err, "strncmp_test load bad_writable_target");
+
+ strncmp_test__destroy(skel);
+}
+
+static void test_strncmp_bad_not_null_term_target(void)
+{
+ struct strncmp_test *skel;
+ struct bpf_program *prog;
+ int err;
+
+ skel = strncmp_test__open();
+ if (!ASSERT_OK_PTR(skel, "strncmp_test open"))
+ return;
+
+ bpf_object__for_each_program(prog, skel->obj)
+ bpf_program__set_autoload(prog, false);
+
+ bpf_program__set_autoload(skel->progs.strncmp_bad_not_null_term_target,
+ true);
+
+ err = strncmp_test__load(skel);
+ ASSERT_ERR(err, "strncmp_test load bad_not_null_term_target");
+
+ strncmp_test__destroy(skel);
+}
+
+void test_test_strncmp(void)
+{
+ if (test__start_subtest("strncmp_ret"))
+ test_strncmp_ret();
+ if (test__start_subtest("strncmp_bad_not_const_str_size"))
+ test_strncmp_bad_not_const_str_size();
+ if (test__start_subtest("strncmp_bad_writable_target"))
+ test_strncmp_bad_writable_target();
+ if (test__start_subtest("strncmp_bad_not_null_term_target"))
+ test_strncmp_bad_not_null_term_target();
+}
diff --git a/tools/testing/selftests/bpf/progs/strncmp_test.c b/tools/testing/selftests/bpf/progs/strncmp_test.c
new file mode 100644
index 000000000000..900d930d48a8
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/strncmp_test.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2021. Huawei Technologies Co., Ltd */
+#include <stdbool.h>
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#define STRNCMP_STR_SZ 8
+
+const char target[STRNCMP_STR_SZ] = "EEEEEEE";
+char str[STRNCMP_STR_SZ];
+int cmp_ret = 0;
+int target_pid = 0;
+
+const char no_str_target[STRNCMP_STR_SZ] = "12345678";
+char writable_target[STRNCMP_STR_SZ];
+unsigned int no_const_str_size = STRNCMP_STR_SZ;
+
+char _license[] SEC("license") = "GPL";
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int do_strncmp(void *ctx)
+{
+ if ((bpf_get_current_pid_tgid() >> 32) != target_pid)
+ return 0;
+
+ cmp_ret = bpf_strncmp(str, STRNCMP_STR_SZ, target);
+ return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int strncmp_bad_not_const_str_size(void *ctx)
+{
+ /* The value of string size is not const, so will fail */
+ cmp_ret = bpf_strncmp(str, no_const_str_size, target);
+ return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int strncmp_bad_writable_target(void *ctx)
+{
+ /* Compared target is not read-only, so will fail */
+ cmp_ret = bpf_strncmp(str, STRNCMP_STR_SZ, writable_target);
+ return 0;
+}
+
+SEC("tp/syscalls/sys_enter_nanosleep")
+int strncmp_bad_not_null_term_target(void *ctx)
+{
+ /* Compared target is not null-terminated, so will fail */
+ cmp_ret = bpf_strncmp(str, STRNCMP_STR_SZ, no_str_target);
+ return 0;
+}
--
2.29.2
* [PATCH bpf-next v2 2/4] selftests/bpf: fix checkpatch error on empty function parameter
From: Hou Tao @ 2021-12-10 14:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
netdev, bpf, houtao1
In-Reply-To: <20211210141652.877186-1-houtao1@huawei.com>
Fix checkpatch error: "ERROR: Bad function definition - void foo()
should probably be void foo(void)". Most replacements are done by
the following command:
sed -i 's#\([a-z]\)()$#\1(void)#g' testing/selftests/bpf/benchs/*.c
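
For reference, a minimal sketch of what the warning is about. Before C23,
an empty parameter list does not provide a prototype, so calls with stray
arguments are not diagnosed; writing (void) states that the function takes
no arguments. The names answer(), toy_bench and run_setup() are
hypothetical and only used for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Flagged form: "static int answer()" gives no prototype before C23. */

/* Preferred form: an explicit prototype for a function without arguments. */
static int answer(void)
{
	return 42;
}

/* Function-pointer members get the same treatment. */
struct toy_bench {
	int (*setup)(void);
};

static int run_setup(const struct toy_bench *b)
{
	assert(b != NULL && b->setup != NULL);
	return b->setup();
}
```

The same change applies to declarations, definitions and callback members
alike, which is why one sed expression covers most of the replacements.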
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
tools/testing/selftests/bpf/bench.c | 2 +-
tools/testing/selftests/bpf/bench.h | 9 +++----
.../selftests/bpf/benchs/bench_count.c | 2 +-
.../selftests/bpf/benchs/bench_rename.c | 16 ++++++-------
.../selftests/bpf/benchs/bench_ringbufs.c | 14 +++++------
.../selftests/bpf/benchs/bench_trigger.c | 24 +++++++++----------
6 files changed, 34 insertions(+), 33 deletions(-)
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index 3d6082b97a56..ffe5752f3324 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -39,7 +39,7 @@ static int bump_memlock_rlimit(void)
return setrlimit(RLIMIT_MEMLOCK, &rlim_new);
}
-void setup_libbpf()
+void setup_libbpf(void)
{
int err;
diff --git a/tools/testing/selftests/bpf/bench.h b/tools/testing/selftests/bpf/bench.h
index 50785503756b..fb3e213df3dc 100644
--- a/tools/testing/selftests/bpf/bench.h
+++ b/tools/testing/selftests/bpf/bench.h
@@ -38,8 +38,8 @@ struct bench_res {
struct bench {
const char *name;
- void (*validate)();
- void (*setup)();
+ void (*validate)(void);
+ void (*setup)(void);
void *(*producer_thread)(void *ctx);
void *(*consumer_thread)(void *ctx);
void (*measure)(struct bench_res* res);
@@ -54,7 +54,7 @@ struct counter {
extern struct env env;
extern const struct bench *bench;
-void setup_libbpf();
+void setup_libbpf(void);
void hits_drops_report_progress(int iter, struct bench_res *res, long delta_ns);
void hits_drops_report_final(struct bench_res res[], int res_cnt);
void false_hits_report_progress(int iter, struct bench_res *res, long delta_ns);
@@ -62,7 +62,8 @@ void false_hits_report_final(struct bench_res res[], int res_cnt);
void ops_report_progress(int iter, struct bench_res *res, long delta_ns);
void ops_report_final(struct bench_res res[], int res_cnt);
-static inline __u64 get_time_ns() {
+static inline __u64 get_time_ns(void)
+{
struct timespec t;
clock_gettime(CLOCK_MONOTONIC, &t);
diff --git a/tools/testing/selftests/bpf/benchs/bench_count.c b/tools/testing/selftests/bpf/benchs/bench_count.c
index befba7a82643..078972ce208e 100644
--- a/tools/testing/selftests/bpf/benchs/bench_count.c
+++ b/tools/testing/selftests/bpf/benchs/bench_count.c
@@ -36,7 +36,7 @@ static struct count_local_ctx {
struct counter *hits;
} count_local_ctx;
-static void count_local_setup()
+static void count_local_setup(void)
{
struct count_local_ctx *ctx = &count_local_ctx;
diff --git a/tools/testing/selftests/bpf/benchs/bench_rename.c b/tools/testing/selftests/bpf/benchs/bench_rename.c
index c7ec114eca56..3c203b6d6a6e 100644
--- a/tools/testing/selftests/bpf/benchs/bench_rename.c
+++ b/tools/testing/selftests/bpf/benchs/bench_rename.c
@@ -11,7 +11,7 @@ static struct ctx {
int fd;
} ctx;
-static void validate()
+static void validate(void)
{
if (env.producer_cnt != 1) {
fprintf(stderr, "benchmark doesn't support multi-producer!\n");
@@ -43,7 +43,7 @@ static void measure(struct bench_res *res)
res->hits = atomic_swap(&ctx.hits.value, 0);
}
-static void setup_ctx()
+static void setup_ctx(void)
{
setup_libbpf();
@@ -71,36 +71,36 @@ static void attach_bpf(struct bpf_program *prog)
}
}
-static void setup_base()
+static void setup_base(void)
{
setup_ctx();
}
-static void setup_kprobe()
+static void setup_kprobe(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.prog1);
}
-static void setup_kretprobe()
+static void setup_kretprobe(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.prog2);
}
-static void setup_rawtp()
+static void setup_rawtp(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.prog3);
}
-static void setup_fentry()
+static void setup_fentry(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.prog4);
}
-static void setup_fexit()
+static void setup_fexit(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.prog5);
diff --git a/tools/testing/selftests/bpf/benchs/bench_ringbufs.c b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
index 52d4a2f91dbd..da8593b3494a 100644
--- a/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
+++ b/tools/testing/selftests/bpf/benchs/bench_ringbufs.c
@@ -88,12 +88,12 @@ const struct argp bench_ringbufs_argp = {
static struct counter buf_hits;
-static inline void bufs_trigger_batch()
+static inline void bufs_trigger_batch(void)
{
(void)syscall(__NR_getpgid);
}
-static void bufs_validate()
+static void bufs_validate(void)
{
if (env.consumer_cnt != 1) {
fprintf(stderr, "rb-libbpf benchmark doesn't support multi-consumer!\n");
@@ -132,7 +132,7 @@ static void ringbuf_libbpf_measure(struct bench_res *res)
res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
}
-static struct ringbuf_bench *ringbuf_setup_skeleton()
+static struct ringbuf_bench *ringbuf_setup_skeleton(void)
{
struct ringbuf_bench *skel;
@@ -167,7 +167,7 @@ static int buf_process_sample(void *ctx, void *data, size_t len)
return 0;
}
-static void ringbuf_libbpf_setup()
+static void ringbuf_libbpf_setup(void)
{
struct ringbuf_libbpf_ctx *ctx = &ringbuf_libbpf_ctx;
struct bpf_link *link;
@@ -223,7 +223,7 @@ static void ringbuf_custom_measure(struct bench_res *res)
res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
}
-static void ringbuf_custom_setup()
+static void ringbuf_custom_setup(void)
{
struct ringbuf_custom_ctx *ctx = &ringbuf_custom_ctx;
const size_t page_size = getpagesize();
@@ -352,7 +352,7 @@ static void perfbuf_measure(struct bench_res *res)
res->drops = atomic_swap(&ctx->skel->bss->dropped, 0);
}
-static struct perfbuf_bench *perfbuf_setup_skeleton()
+static struct perfbuf_bench *perfbuf_setup_skeleton(void)
{
struct perfbuf_bench *skel;
@@ -390,7 +390,7 @@ perfbuf_process_sample_raw(void *input_ctx, int cpu,
return LIBBPF_PERF_EVENT_CONT;
}
-static void perfbuf_libbpf_setup()
+static void perfbuf_libbpf_setup(void)
{
struct perfbuf_libbpf_ctx *ctx = &perfbuf_libbpf_ctx;
struct perf_event_attr attr;
diff --git a/tools/testing/selftests/bpf/benchs/bench_trigger.c b/tools/testing/selftests/bpf/benchs/bench_trigger.c
index 049a5ad56f65..7f957c55a3ca 100644
--- a/tools/testing/selftests/bpf/benchs/bench_trigger.c
+++ b/tools/testing/selftests/bpf/benchs/bench_trigger.c
@@ -11,7 +11,7 @@ static struct trigger_ctx {
static struct counter base_hits;
-static void trigger_validate()
+static void trigger_validate(void)
{
if (env.consumer_cnt != 1) {
fprintf(stderr, "benchmark doesn't support multi-consumer!\n");
@@ -45,7 +45,7 @@ static void trigger_measure(struct bench_res *res)
res->hits = atomic_swap(&ctx.skel->bss->hits, 0);
}
-static void setup_ctx()
+static void setup_ctx(void)
{
setup_libbpf();
@@ -67,37 +67,37 @@ static void attach_bpf(struct bpf_program *prog)
}
}
-static void trigger_tp_setup()
+static void trigger_tp_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_tp);
}
-static void trigger_rawtp_setup()
+static void trigger_rawtp_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_raw_tp);
}
-static void trigger_kprobe_setup()
+static void trigger_kprobe_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_kprobe);
}
-static void trigger_fentry_setup()
+static void trigger_fentry_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fentry);
}
-static void trigger_fentry_sleep_setup()
+static void trigger_fentry_sleep_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fentry_sleep);
}
-static void trigger_fmodret_setup()
+static void trigger_fmodret_setup(void)
{
setup_ctx();
attach_bpf(ctx.skel->progs.bench_trigger_fmodret);
@@ -183,22 +183,22 @@ static void usetup(bool use_retprobe, bool use_nop)
ctx.skel->links.bench_trigger_uprobe = link;
}
-static void uprobe_setup_with_nop()
+static void uprobe_setup_with_nop(void)
{
usetup(false, true);
}
-static void uretprobe_setup_with_nop()
+static void uretprobe_setup_with_nop(void)
{
usetup(true, true);
}
-static void uprobe_setup_without_nop()
+static void uprobe_setup_without_nop(void)
{
usetup(false, false);
}
-static void uretprobe_setup_without_nop()
+static void uretprobe_setup_without_nop(void)
{
usetup(true, false);
}
--
2.29.2
* [PATCH bpf-next v2 3/4] selftests/bpf: add benchmark for bpf_strncmp() helper
From: Hou Tao @ 2021-12-10 14:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
netdev, bpf, houtao1
In-Reply-To: <20211210141652.877186-1-houtao1@huawei.com>
Add a benchmark to compare the performance of a home-made strncmp()
in a bpf program against the bpf_strncmp() helper. In summary, the
performance win of bpf_strncmp() under x86-64 is at least 18% when the
compared string length is 64 or more, and is 179% when the length is 4095.
Under arm64 the performance win is even bigger: at least 33% when the
length is 64 or more, and roughly 600% when the length is 4095.
The details are as follows:
no-helper-X: use home-made strncmp() to compare X-sized string
helper-Y: use bpf_strncmp() to compare Y-sized string
Under x86-64:
no-helper-1 3.504 ± 0.000M/s (drops 0.000 ± 0.000M/s)
helper-1 3.347 ± 0.001M/s (drops 0.000 ± 0.000M/s)
no-helper-8 3.357 ± 0.001M/s (drops 0.000 ± 0.000M/s)
helper-8 3.307 ± 0.001M/s (drops 0.000 ± 0.000M/s)
no-helper-32 3.064 ± 0.000M/s (drops 0.000 ± 0.000M/s)
helper-32 3.253 ± 0.001M/s (drops 0.000 ± 0.000M/s)
no-helper-64 2.563 ± 0.001M/s (drops 0.000 ± 0.000M/s)
helper-64 3.040 ± 0.001M/s (drops 0.000 ± 0.000M/s)
no-helper-128 1.975 ± 0.000M/s (drops 0.000 ± 0.000M/s)
helper-128 2.641 ± 0.000M/s (drops 0.000 ± 0.000M/s)
no-helper-512 0.759 ± 0.000M/s (drops 0.000 ± 0.000M/s)
helper-512 1.574 ± 0.000M/s (drops 0.000 ± 0.000M/s)
no-helper-2048 0.329 ± 0.000M/s (drops 0.000 ± 0.000M/s)
helper-2048 0.602 ± 0.000M/s (drops 0.000 ± 0.000M/s)
no-helper-4095 0.117 ± 0.000M/s (drops 0.000 ± 0.000M/s)
helper-4095 0.327 ± 0.000M/s (drops 0.000 ± 0.000M/s)
Under arm64:
no-helper-1 2.806 ± 0.004M/s (drops 0.000 ± 0.000M/s)
helper-1 2.819 ± 0.002M/s (drops 0.000 ± 0.000M/s)
no-helper-8 2.797 ± 0.109M/s (drops 0.000 ± 0.000M/s)
helper-8 2.786 ± 0.025M/s (drops 0.000 ± 0.000M/s)
no-helper-32 2.399 ± 0.011M/s (drops 0.000 ± 0.000M/s)
helper-32 2.703 ± 0.002M/s (drops 0.000 ± 0.000M/s)
no-helper-64 2.020 ± 0.015M/s (drops 0.000 ± 0.000M/s)
helper-64 2.702 ± 0.073M/s (drops 0.000 ± 0.000M/s)
no-helper-128 1.604 ± 0.001M/s (drops 0.000 ± 0.000M/s)
helper-128 2.516 ± 0.002M/s (drops 0.000 ± 0.000M/s)
no-helper-512 0.699 ± 0.000M/s (drops 0.000 ± 0.000M/s)
helper-512 2.106 ± 0.003M/s (drops 0.000 ± 0.000M/s)
no-helper-2048 0.215 ± 0.000M/s (drops 0.000 ± 0.000M/s)
helper-2048 1.223 ± 0.003M/s (drops 0.000 ± 0.000M/s)
no-helper-4095 0.112 ± 0.000M/s (drops 0.000 ± 0.000M/s)
helper-4095 0.796 ± 0.000M/s (drops 0.000 ± 0.000M/s)
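The summary percentages can be rechecked from the throughput columns above. The following is only a small sketch of that arithmetic; win_percent() is a name made up for this illustration and is not part of the patch.

```c
#include <assert.h>

/*
 * Relative throughput gain of the helper over the home-made loop,
 * in percent. Both arguments are the M/s figures from the tables.
 * win_percent() is a hypothetical name used only for this sketch.
 */
static double win_percent(double no_helper_mops, double helper_mops)
{
	assert(no_helper_mops > 0.0);
	return (helper_mops / no_helper_mops - 1.0) * 100.0;
}
```

For example, win_percent(2.563, 3.040) covers the x86-64 length-64 pair and win_percent(0.112, 0.796) covers the arm64 length-4095 pair.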
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/bench.c | 6 +
.../selftests/bpf/benchs/bench_strncmp.c | 161 ++++++++++++++++++
.../selftests/bpf/benchs/run_bench_strncmp.sh | 12 ++
.../selftests/bpf/progs/strncmp_bench.c | 50 ++++++
5 files changed, 232 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/bpf/benchs/bench_strncmp.c
create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_strncmp.sh
create mode 100644 tools/testing/selftests/bpf/progs/strncmp_bench.c
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index e38626fe3f5d..67ebf6be86f1 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -537,6 +537,7 @@ $(OUTPUT)/bench_ringbufs.o: $(OUTPUT)/ringbuf_bench.skel.h \
$(OUTPUT)/perfbuf_bench.skel.h
$(OUTPUT)/bench_bloom_filter_map.o: $(OUTPUT)/bloom_filter_bench.skel.h
$(OUTPUT)/bench_bpf_loop.o: $(OUTPUT)/bpf_loop_bench.skel.h
+$(OUTPUT)/bench_strncmp.o: $(OUTPUT)/strncmp_bench.skel.h
$(OUTPUT)/bench.o: bench.h testing_helpers.h $(BPFOBJ)
$(OUTPUT)/bench: LDLIBS += -lm
$(OUTPUT)/bench: $(OUTPUT)/bench.o \
@@ -547,7 +548,8 @@ $(OUTPUT)/bench: $(OUTPUT)/bench.o \
$(OUTPUT)/bench_trigger.o \
$(OUTPUT)/bench_ringbufs.o \
$(OUTPUT)/bench_bloom_filter_map.o \
- $(OUTPUT)/bench_bpf_loop.o
+ $(OUTPUT)/bench_bpf_loop.o \
+ $(OUTPUT)/bench_strncmp.o
$(call msg,BINARY,,$@)
$(Q)$(CC) $(LDFLAGS) $(filter %.a %.o,$^) $(LDLIBS) -o $@
diff --git a/tools/testing/selftests/bpf/bench.c b/tools/testing/selftests/bpf/bench.c
index ffe5752f3324..bbb42e2cee0c 100644
--- a/tools/testing/selftests/bpf/bench.c
+++ b/tools/testing/selftests/bpf/bench.c
@@ -205,11 +205,13 @@ static const struct argp_option opts[] = {
extern struct argp bench_ringbufs_argp;
extern struct argp bench_bloom_map_argp;
extern struct argp bench_bpf_loop_argp;
+extern struct argp bench_strncmp_argp;
static const struct argp_child bench_parsers[] = {
{ &bench_ringbufs_argp, 0, "Ring buffers benchmark", 0 },
{ &bench_bloom_map_argp, 0, "Bloom filter map benchmark", 0 },
{ &bench_bpf_loop_argp, 0, "bpf_loop helper benchmark", 0 },
+ { &bench_strncmp_argp, 0, "bpf_strncmp helper benchmark", 0 },
{},
};
@@ -409,6 +411,8 @@ extern const struct bench bench_bloom_false_positive;
extern const struct bench bench_hashmap_without_bloom;
extern const struct bench bench_hashmap_with_bloom;
extern const struct bench bench_bpf_loop;
+extern const struct bench bench_strncmp_no_helper;
+extern const struct bench bench_strncmp_helper;
static const struct bench *benchs[] = {
&bench_count_global,
@@ -441,6 +445,8 @@ static const struct bench *benchs[] = {
&bench_hashmap_without_bloom,
&bench_hashmap_with_bloom,
&bench_bpf_loop,
+ &bench_strncmp_no_helper,
+ &bench_strncmp_helper,
};
static void setup_benchmark()
diff --git a/tools/testing/selftests/bpf/benchs/bench_strncmp.c b/tools/testing/selftests/bpf/benchs/bench_strncmp.c
new file mode 100644
index 000000000000..494b591c0289
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/bench_strncmp.c
@@ -0,0 +1,161 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2021. Huawei Technologies Co., Ltd */
+#include <argp.h>
+#include "bench.h"
+#include "strncmp_bench.skel.h"
+
+static struct strncmp_ctx {
+ struct strncmp_bench *skel;
+} ctx;
+
+static struct strncmp_args {
+ u32 cmp_str_len;
+} args = {
+ .cmp_str_len = 32,
+};
+
+enum {
+ ARG_CMP_STR_LEN = 5000,
+};
+
+static const struct argp_option opts[] = {
+ { "cmp-str-len", ARG_CMP_STR_LEN, "CMP_STR_LEN", 0,
+ "Set the length of compared string" },
+ {},
+};
+
+static error_t strncmp_parse_arg(int key, char *arg, struct argp_state *state)
+{
+ switch (key) {
+ case ARG_CMP_STR_LEN:
+ args.cmp_str_len = strtoul(arg, NULL, 10);
+ if (!args.cmp_str_len ||
+ args.cmp_str_len >= sizeof(ctx.skel->bss->str)) {
+ fprintf(stderr, "Invalid cmp str len (limit %zu)\n",
+ sizeof(ctx.skel->bss->str));
+ argp_usage(state);
+ }
+ break;
+ default:
+ return ARGP_ERR_UNKNOWN;
+ }
+
+ return 0;
+}
+
+const struct argp bench_strncmp_argp = {
+ .options = opts,
+ .parser = strncmp_parse_arg,
+};
+
+static void strncmp_validate(void)
+{
+ if (env.consumer_cnt != 1) {
+ fprintf(stderr, "strncmp benchmark doesn't support multi-consumer!\n");
+ exit(1);
+ }
+}
+
+static void strncmp_setup(void)
+{
+ int err;
+ char *target;
+ size_t i, sz;
+
+ sz = sizeof(ctx.skel->rodata->target);
+ if (!sz || sz < sizeof(ctx.skel->bss->str)) {
+ fprintf(stderr, "invalid string size (target %zu, src %zu)\n",
+ sz, sizeof(ctx.skel->bss->str));
+ exit(1);
+ }
+
+ setup_libbpf();
+
+ ctx.skel = strncmp_bench__open();
+ if (!ctx.skel) {
+ fprintf(stderr, "failed to open skeleton\n");
+ exit(1);
+ }
+
+ srandom(time(NULL));
+ target = ctx.skel->rodata->target;
+ for (i = 0; i < sz - 1; i++)
+ target[i] = '1' + random() % 9;
+ target[sz - 1] = '\0';
+
+ ctx.skel->rodata->cmp_str_len = args.cmp_str_len;
+
+ memcpy(ctx.skel->bss->str, target, args.cmp_str_len);
+ ctx.skel->bss->str[args.cmp_str_len] = '\0';
+ /* Make bss->str < rodata->target */
+ ctx.skel->bss->str[args.cmp_str_len - 1] -= 1;
+
+ err = strncmp_bench__load(ctx.skel);
+ if (err) {
+ fprintf(stderr, "failed to load skeleton\n");
+ strncmp_bench__destroy(ctx.skel);
+ exit(1);
+ }
+}
+
+static void strncmp_attach_prog(struct bpf_program *prog)
+{
+ struct bpf_link *link;
+
+ link = bpf_program__attach(prog);
+ if (!link) {
+ fprintf(stderr, "failed to attach program!\n");
+ exit(1);
+ }
+}
+
+static void strncmp_no_helper_setup(void)
+{
+ strncmp_setup();
+ strncmp_attach_prog(ctx.skel->progs.strncmp_no_helper);
+}
+
+static void strncmp_helper_setup(void)
+{
+ strncmp_setup();
+ strncmp_attach_prog(ctx.skel->progs.strncmp_helper);
+}
+
+static void *strncmp_producer(void *ctx)
+{
+ while (true)
+ (void)syscall(__NR_getpgid);
+ return NULL;
+}
+
+static void *strncmp_consumer(void *ctx)
+{
+ return NULL;
+}
+
+static void strncmp_measure(struct bench_res *res)
+{
+ res->hits = atomic_swap(&ctx.skel->bss->hits, 0);
+}
+
+const struct bench bench_strncmp_no_helper = {
+ .name = "strncmp-no-helper",
+ .validate = strncmp_validate,
+ .setup = strncmp_no_helper_setup,
+ .producer_thread = strncmp_producer,
+ .consumer_thread = strncmp_consumer,
+ .measure = strncmp_measure,
+ .report_progress = hits_drops_report_progress,
+ .report_final = hits_drops_report_final,
+};
+
+const struct bench bench_strncmp_helper = {
+ .name = "strncmp-helper",
+ .validate = strncmp_validate,
+ .setup = strncmp_helper_setup,
+ .producer_thread = strncmp_producer,
+ .consumer_thread = strncmp_consumer,
+ .measure = strncmp_measure,
+ .report_progress = hits_drops_report_progress,
+ .report_final = hits_drops_report_final,
+};
diff --git a/tools/testing/selftests/bpf/benchs/run_bench_strncmp.sh b/tools/testing/selftests/bpf/benchs/run_bench_strncmp.sh
new file mode 100755
index 000000000000..142697284b45
--- /dev/null
+++ b/tools/testing/selftests/bpf/benchs/run_bench_strncmp.sh
@@ -0,0 +1,12 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+source ./benchs/run_common.sh
+
+set -eufo pipefail
+
+for s in 1 8 64 512 2048 4095; do
+ for b in no-helper helper; do
+ summarize ${b}-${s} "$($RUN_BENCH --cmp-str-len=$s strncmp-${b})"
+ done
+done
diff --git a/tools/testing/selftests/bpf/progs/strncmp_bench.c b/tools/testing/selftests/bpf/progs/strncmp_bench.c
new file mode 100644
index 000000000000..18373a7df76e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/strncmp_bench.c
@@ -0,0 +1,50 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2021. Huawei Technologies Co., Ltd */
+#include <linux/types.h>
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+
+#define STRNCMP_STR_SZ 4096
+
+/* Will be updated by benchmark before program loading */
+const volatile unsigned int cmp_str_len = 1;
+const char target[STRNCMP_STR_SZ];
+
+long hits = 0;
+char str[STRNCMP_STR_SZ];
+
+char _license[] SEC("license") = "GPL";
+
+static __always_inline int local_strncmp(const char *s1, unsigned int sz,
+ const char *s2)
+{
+ int ret = 0;
+ unsigned int i;
+
+ for (i = 0; i < sz; i++) {
+ /* E.g. 0xff > 0x31 */
+ ret = (unsigned char)s1[i] - (unsigned char)s2[i];
+ if (ret || !s1[i])
+ break;
+ }
+
+ return ret;
+}
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int strncmp_no_helper(void *ctx)
+{
+ if (local_strncmp(str, cmp_str_len + 1, target) < 0)
+ __sync_add_and_fetch(&hits, 1);
+ return 0;
+}
+
+SEC("tp/syscalls/sys_enter_getpgid")
+int strncmp_helper(void *ctx)
+{
+ if (bpf_strncmp(str, cmp_str_len + 1, target) < 0)
+ __sync_add_and_fetch(&hits, 1);
+ return 0;
+}
+
--
2.29.2
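For readers following strncmp_setup() in the patch above, here is a userspace sketch of how the two strings are prepared so that the compared string always sorts before the target. The demo_* names and DEMO_STR_SZ are invented for this illustration, and rand() stands in for the patch's random() to stay within ISO C.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_STR_SZ 4096	/* mirrors STRNCMP_STR_SZ in the patch */

/* Fill target with random characters '1'..'9' and NUL-terminate it. */
static void demo_fill_target(char *target, size_t sz)
{
	size_t i;

	assert(sz > 0);
	for (i = 0; i < sz - 1; i++)
		target[i] = '1' + rand() % 9;
	target[sz - 1] = '\0';
}

/*
 * Copy the first len bytes of target into str, terminate it, and
 * lower the last copied byte by one. Because every target byte is
 * in '1'..'9', the lowered byte is in '0'..'8' and never becomes NUL,
 * so str compares as smaller than target at index len - 1.
 */
static void demo_make_smaller(char *str, const char *target, size_t len)
{
	assert(len > 0 && len < DEMO_STR_SZ);
	memcpy(str, target, len);
	str[len] = '\0';
	str[len - 1] -= 1;
}
```

With strings built this way, a comparison over len + 1 bytes is expected to return a negative value on every call, which is why both BPF programs can count a hit whenever the result is below zero.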
* [PATCH bpf-next v2 1/4] bpf: add bpf_strncmp helper
From: Hou Tao @ 2021-12-10 14:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
netdev, bpf, houtao1
In-Reply-To: <20211210141652.877186-1-houtao1@huawei.com>
The helper compares two strings: one is a null-terminated read-only
string, and the other has a constant maximum storage size but doesn't
need to be null-terminated. It can be used to compare file names in
tracing or LSM programs.
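The argument order is easy to get wrong, so here is a userspace sketch of the semantics described above. It only models the helper with the C library strncmp(); model_bpf_strncmp() is a made-up name, and libc guarantees only the sign of the result, not a specific value.

```c
#include <assert.h>
#include <string.h>

/*
 * Userspace model of bpf_strncmp(s1, s1_sz, s2): s1 is a buffer of at
 * most s1_sz bytes that may lack a terminator, s2 is a NUL-terminated
 * string. The size sits between the two strings, unlike strncmp().
 */
static long model_bpf_strncmp(const char *s1, unsigned int s1_sz,
			      const char *s2)
{
	assert(s1 && s2);
	return strncmp(s1, s2, s1_sz);
}
```

Note that a 4-byte unterminated buffer holding "test" compares equal to both "test" and "testing" under this model, because only the first s1_sz bytes are examined. An exact match needs s1_sz to cover the terminator as well.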
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 11 +++++++++++
kernel/bpf/helpers.c | 16 ++++++++++++++++
tools/include/uapi/linux/bpf.h | 11 +++++++++++
4 files changed, 39 insertions(+)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8bbf08fbab66..6b7c533ba249 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -2173,6 +2173,7 @@ extern const struct bpf_func_proto bpf_sk_getsockopt_proto;
extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto;
extern const struct bpf_func_proto bpf_find_vma_proto;
extern const struct bpf_func_proto bpf_loop_proto;
+extern const struct bpf_func_proto bpf_strncmp_proto;
const struct bpf_func_proto *tracing_prog_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index c26871263f1f..2820c77e4846 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4983,6 +4983,16 @@ union bpf_attr {
* Return
* The number of loops performed, **-EINVAL** for invalid **flags**,
* **-E2BIG** if **nr_loops** exceeds the maximum number of loops.
+ *
+ * long bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
+ * Description
+ * Do strncmp() between **s1** and **s2**. **s1** doesn't need
+ * to be null-terminated and **s1_sz** is the maximum storage
+ * size of **s1**. **s2** must be a read-only string.
+ * Return
+ * An integer less than, equal to, or greater than zero
+ * if the first **s1_sz** bytes of **s1** is found to be
+ * less than, to match, or be greater than **s2**.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5167,6 +5177,7 @@ union bpf_attr {
FN(kallsyms_lookup_name), \
FN(find_vma), \
FN(loop), \
+ FN(strncmp), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 52188004a9c3..2b035c03624f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -565,6 +565,20 @@ const struct bpf_func_proto bpf_strtoul_proto = {
};
#endif
+BPF_CALL_3(bpf_strncmp, const char *, s1, u32, s1_sz, const char *, s2)
+{
+ return strncmp(s1, s2, s1_sz);
+}
+
+const struct bpf_func_proto bpf_strncmp_proto = {
+ .func = bpf_strncmp,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_MEM,
+ .arg2_type = ARG_CONST_SIZE,
+ .arg3_type = ARG_PTR_TO_CONST_STR,
+};
+
BPF_CALL_4(bpf_get_ns_current_pid_tgid, u64, dev, u64, ino,
struct bpf_pidns_info *, nsdata, u32, size)
{
@@ -1380,6 +1394,8 @@ bpf_base_func_proto(enum bpf_func_id func_id)
return &bpf_for_each_map_elem_proto;
case BPF_FUNC_loop:
return &bpf_loop_proto;
+ case BPF_FUNC_strncmp:
+ return &bpf_strncmp_proto;
default:
break;
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index c26871263f1f..2820c77e4846 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4983,6 +4983,16 @@ union bpf_attr {
* Return
* The number of loops performed, **-EINVAL** for invalid **flags**,
* **-E2BIG** if **nr_loops** exceeds the maximum number of loops.
+ *
+ * long bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
+ * Description
+ * Do strncmp() between **s1** and **s2**. **s1** doesn't need
+ * to be null-terminated and **s1_sz** is the maximum storage
+ * size of **s1**. **s2** must be a read-only string.
+ * Return
+ * An integer less than, equal to, or greater than zero
+ * if the first **s1_sz** bytes of **s1** is found to be
+ * less than, to match, or be greater than **s2**.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -5167,6 +5177,7 @@ union bpf_attr {
FN(kallsyms_lookup_name), \
FN(find_vma), \
FN(loop), \
+ FN(strncmp), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
--
2.29.2
* [PATCH bpf-next v2 0/4] introduce bpf_strncmp() helper
From: Hou Tao @ 2021-12-10 14:16 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Martin KaFai Lau, Yonghong Song, Daniel Borkmann, Andrii Nakryiko,
netdev, bpf, houtao1
Hi,
The motivation for introducing the bpf_strncmp() helper is twofold:
(1) clang doesn't always replace strncmp() automatically
In tracing programs, we sometimes need to use a home-made
strncmp() to check whether or not the file name is the expected one.
(2) the performance of a home-made strncmp() is not so good
As shown in the benchmark in patch #3, the bpf_strncmp() helper is
18% (x86-64) or 33% (arm64) faster than a home-made strncmp() when
the compared string length is 64. When the string length grows to
4095, the performance win is 179% (x86-64) or 600% (arm64).
Any comments are welcome.
Regards,
Tao
Change Log:
v2:
* rebased on bpf-next
* drop patch "selftests/bpf: factor out common helpers for benchmarks"
(suggested by Andrii)
* remove unnecessary inline functions and add comments for programs which
will be rejected by verifier in patch 4 (suggested by Andrii)
* rename variables used in will-fail programs to clarify the purposes.
v1: https://lore.kernel.org/bpf/20211130142215.1237217-1-houtao1@huawei.com
* change API to bpf_strncmp(const char *s1, u32 s1_sz, const char *s2)
* add benchmark refactor and benchmark between bpf_strncmp() and strncmp()
RFC: https://lore.kernel.org/bpf/20211106132822.1396621-1-houtao1@huawei.com/
Hou Tao (4):
bpf: add bpf_strncmp helper
selftests/bpf: fix checkpatch error on empty function parameter
selftests/bpf: add benchmark for bpf_strncmp() helper
selftests/bpf: add test cases for bpf_strncmp()
include/linux/bpf.h | 1 +
include/uapi/linux/bpf.h | 11 ++
kernel/bpf/helpers.c | 16 ++
tools/include/uapi/linux/bpf.h | 11 ++
tools/testing/selftests/bpf/Makefile | 4 +-
tools/testing/selftests/bpf/bench.c | 8 +-
tools/testing/selftests/bpf/bench.h | 9 +-
.../selftests/bpf/benchs/bench_count.c | 2 +-
.../selftests/bpf/benchs/bench_rename.c | 16 +-
.../selftests/bpf/benchs/bench_ringbufs.c | 14 +-
.../selftests/bpf/benchs/bench_strncmp.c | 161 +++++++++++++++++
.../selftests/bpf/benchs/bench_trigger.c | 24 +--
.../selftests/bpf/benchs/run_bench_strncmp.sh | 12 ++
.../selftests/bpf/prog_tests/test_strncmp.c | 167 ++++++++++++++++++
.../selftests/bpf/progs/strncmp_bench.c | 50 ++++++
.../selftests/bpf/progs/strncmp_test.c | 54 ++++++
16 files changed, 526 insertions(+), 34 deletions(-)
create mode 100644 tools/testing/selftests/bpf/benchs/bench_strncmp.c
create mode 100755 tools/testing/selftests/bpf/benchs/run_bench_strncmp.sh
create mode 100644 tools/testing/selftests/bpf/prog_tests/test_strncmp.c
create mode 100644 tools/testing/selftests/bpf/progs/strncmp_bench.c
create mode 100644 tools/testing/selftests/bpf/progs/strncmp_test.c
--
2.29.2
* Re: [PATCH V6 3/5] hyper-v: Enable swiotlb bounce buffer for Isolation VM
From: Tianyu Lan @ 2021-12-10 14:01 UTC (permalink / raw)
To: Michael Kelley (LINUX), KY Srinivasan, Haiyang Zhang,
Stephen Hemminger, wei.liu@kernel.org, Dexuan Cui,
tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
dave.hansen@linux.intel.com, x86@kernel.org, hpa@zytor.com,
davem@davemloft.net, kuba@kernel.org, jejb@linux.ibm.com,
martin.petersen@oracle.com, arnd@arndb.de, hch@infradead.org,
m.szyprowski@samsung.com, robin.murphy@arm.com, Tianyu Lan,
thomas.lendacky@amd.com
Cc: iommu@lists.linux-foundation.org, linux-arch@vger.kernel.org,
linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-scsi@vger.kernel.org, netdev@vger.kernel.org, vkuznets,
brijesh.singh@amd.com, konrad.wilk@oracle.com, hch@lst.de,
joro@8bytes.org, parri.andrea@gmail.com, dave.hansen@intel.com
In-Reply-To: <4d60fcd1-97df-f4a1-1b79-643e65f66b3e@gmail.com>
On 12/10/2021 9:25 PM, Tianyu Lan wrote:
>>> @@ -319,8 +320,16 @@ static void __init ms_hyperv_init_platform(void)
>>> pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B
>>> 0x%x\n",
>>> ms_hyperv.isolation_config_a,
>>> ms_hyperv.isolation_config_b);
>>>
>>> - if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP)
>>> + if (hv_get_isolation_type() == HV_ISOLATION_TYPE_SNP) {
>>> static_branch_enable(&isolation_type_snp);
>>> + swiotlb_unencrypted_base = ms_hyperv.shared_gpa_boundary;
>>> + }
>>> +
>>> + /*
>>> + * Enable swiotlb force mode in Isolation VM to
>>> + * use swiotlb bounce buffer for dma transaction.
>>> + */
>>> + swiotlb_force = SWIOTLB_FORCE;
>> I'm good with this approach that directly updates the swiotlb settings here
>> rather than in IOMMU initialization code. It's a lot more straightforward.
>>
>> However, there's an issue if building for X86_32 without PAE, in that the
>> swiotlb module may not be built, resulting in compile and link errors. The
>> swiotlb.h file needs to be updated to provide a stub function for
>> swiotlb_update_mem_attributes(). swiotlb_unencrypted_base probably
>> needs wrapper functions to get/set it, which can be stubs when
>> CONFIG_SWIOTLB is not set. swiotlb_force is a bit of a mess in that it
>> already has a stub definition that assumes it will only be read, and not
>> set. A bit of thinking will be needed to sort that out.
>
> It's ok to fix the issue via selecting swiotlb when CONFIG_HYPERV is
> set?
>
Sorry, please ignore the previous statement. This code doesn't depend on
CONFIG_HYPERV.
How about putting this code under #ifdef CONFIG_X86_64 or CONFIG_SWIOTLB?
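For comparison, the wrapper-with-stub approach mentioned in the quoted review could look roughly like the sketch below. This is a userspace illustration only: the demo_* accessors are hypothetical names, not existing kernel functions, and CONFIG_SWIOTLB is defined by hand here to select the non-stub branch.

```c
/* Remove this define to model a build without swiotlb (stub path). */
#define CONFIG_SWIOTLB 1

#ifdef CONFIG_SWIOTLB
/* Stand-in for the real variable; callers never touch it directly. */
static unsigned long long demo_unencrypted_base;

static inline void demo_swiotlb_set_unencrypted_base(unsigned long long base)
{
	demo_unencrypted_base = base;
}

static inline unsigned long long demo_swiotlb_get_unencrypted_base(void)
{
	return demo_unencrypted_base;
}
#else
/* Stubs: callers still compile and link when swiotlb is not built. */
static inline void demo_swiotlb_set_unencrypted_base(unsigned long long base)
{
	(void)base;
}

static inline unsigned long long demo_swiotlb_get_unencrypted_base(void)
{
	return 0;
}
#endif
```

The trade-off against a plain #ifdef at the call site is that the platform init code stays free of conditionals, at the cost of a small amount of header boilerplate.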