From: Jiri Olsa <jolsa@redhat.com>
To: "Björn Töpel" <bjorn.topel@intel.com>
Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>,
Jiri Olsa <jolsa@kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
netdev@vger.kernel.org, bpf@vger.kernel.org,
Andrii Nakryiko <andriin@fb.com>, Yonghong Song <yhs@fb.com>,
Martin KaFai Lau <kafai@fb.com>,
Jakub Kicinski <jakub.kicinski@netronome.com>,
David Miller <davem@redhat.com>
Subject: Re: [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind
Date: Mon, 13 Jan 2020 10:43:10 +0100 [thread overview]
Message-ID: <20200113094310.GE35080@krava> (raw)
In-Reply-To: <76a10338-391a-ffca-9af8-f407265d146a@intel.com>
On Tue, Jan 07, 2020 at 02:30:35PM +0100, Björn Töpel wrote:
> On 2020-01-07 14:05, Jiri Olsa wrote:
> > On Mon, Jan 06, 2020 at 03:46:40PM -0800, Alexei Starovoitov wrote:
> > > On Sun, Dec 29, 2019 at 03:37:40PM +0100, Jiri Olsa wrote:
> > > > When unwinding the stack we need to identify each
> > > > address to successfully continue. Adding latch tree
> > > > to keep trampolines for quick lookup during the
> > > > unwind.
> > > >
> > > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ...
> > > > +bool is_bpf_trampoline(void *addr)
> > > > +{
> > > > + return latch_tree_find(addr, &tree, &tree_ops) != NULL;
> > > > +}
> > > > +
> > > > struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > > > {
> > > > struct bpf_trampoline *tr;
> > > > @@ -65,6 +98,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
> > > > for (i = 0; i < BPF_TRAMP_MAX; i++)
> > > > INIT_HLIST_HEAD(&tr->progs_hlist[i]);
> > > > tr->image = image;
> > > > + latch_tree_insert(&tr->tnode, &tree, &tree_ops);
> > >
> > > Thanks for the fix. I was thinking to apply it, but then realized that bpf
> > > dispatcher logic has the same issue.
> > > Could you generalize the fix for both?
> > > May be bpf_jit_alloc_exec_page() can do latch_tree_insert() ?
> > > and new version of bpf_jit_free_exec() is needed that will do latch_tree_erase().
> > > Wdyt?
> >
> > I need to check the dispatcher code, but seems ok.. will check
> >
>
> Thanks Jiri! The trampoline and dispatcher share the image allocation, so
> putting it there would make sense.
>
> It's annoying that the dispatcher doesn't show up correctly in perf, and
> it's been on my list to fix that. Hopefully you beat me to it! :-D
hi,
attached patch seems to work for me (trampoline usecase), but I don't know
how to test it for dispatcher.. also I need to check if we need to decrease
BPF_TRAMP_MAX or BPF_DISPATCHER_MAX, it might take more time ;-)
jirka
---
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index aed2bc39d72b..e0ca8780dc7a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -510,7 +510,6 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key);
int bpf_trampoline_link_prog(struct bpf_prog *prog);
int bpf_trampoline_unlink_prog(struct bpf_prog *prog);
void bpf_trampoline_put(struct bpf_trampoline *tr);
-void *bpf_jit_alloc_exec_page(void);
#define BPF_DISPATCHER_INIT(name) { \
.mutex = __MUTEX_INITIALIZER(name.mutex), \
.func = &name##func, \
@@ -542,6 +541,13 @@ void *bpf_jit_alloc_exec_page(void);
#define BPF_DISPATCHER_PTR(name) (&name)
void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
struct bpf_prog *to);
+struct bpf_image {
+ struct latch_tree_node tnode;
+ unsigned char data[];
+};
+#define BPF_IMAGE_SIZE (PAGE_SIZE - sizeof(struct bpf_image))
+bool is_bpf_image(void *addr);
+void *bpf_image_alloc(void);
#else
static inline struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
{
@@ -563,6 +569,10 @@ static inline void bpf_trampoline_put(struct bpf_trampoline *tr) {}
static inline void bpf_dispatcher_change_prog(struct bpf_dispatcher *d,
struct bpf_prog *from,
struct bpf_prog *to) {}
+static inline bool is_bpf_image(void *addr)
+{
+ return false;
+}
#endif
struct bpf_func_info_aux {
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 29d47aae0dd1..53dc3adf6077 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -704,6 +704,8 @@ bool is_bpf_text_address(unsigned long addr)
rcu_read_lock();
ret = bpf_prog_kallsyms_find(addr) != NULL;
+ if (!ret)
+ ret = is_bpf_image((void*) addr);
rcu_read_unlock();
return ret;
diff --git a/kernel/bpf/dispatcher.c b/kernel/bpf/dispatcher.c
index 204ee61a3904..b3e5b214fed8 100644
--- a/kernel/bpf/dispatcher.c
+++ b/kernel/bpf/dispatcher.c
@@ -113,7 +113,7 @@ static void bpf_dispatcher_update(struct bpf_dispatcher *d, int prev_num_progs)
noff = 0;
} else {
old = d->image + d->image_off;
- noff = d->image_off ^ (PAGE_SIZE / 2);
+ noff = d->image_off ^ (BPF_IMAGE_SIZE / 2);
}
new = d->num_progs ? d->image + noff : NULL;
@@ -140,7 +140,7 @@ void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from,
mutex_lock(&d->mutex);
if (!d->image) {
- d->image = bpf_jit_alloc_exec_page();
+ d->image = bpf_image_alloc();
if (!d->image)
goto out;
}
diff --git a/kernel/bpf/trampoline.c b/kernel/bpf/trampoline.c
index 79a04417050d..3ea56f89c68a 100644
--- a/kernel/bpf/trampoline.c
+++ b/kernel/bpf/trampoline.c
@@ -4,6 +4,7 @@
#include <linux/bpf.h>
#include <linux/filter.h>
#include <linux/ftrace.h>
+#include <linux/rbtree_latch.h>
/* btf_vmlinux has ~22k attachable functions. 1k htab is enough. */
#define TRAMPOLINE_HASH_BITS 10
@@ -14,7 +15,12 @@ static struct hlist_head trampoline_table[TRAMPOLINE_TABLE_SIZE];
/* serializes access to trampoline_table */
static DEFINE_MUTEX(trampoline_mutex);
-void *bpf_jit_alloc_exec_page(void)
+static struct latch_tree_root image_tree __cacheline_aligned;
+
+/* serializes access to image_tree */
+static DEFINE_MUTEX(image_mutex);
+
+static void *bpf_jit_alloc_exec_page(void)
{
void *image;
@@ -30,6 +36,62 @@ void *bpf_jit_alloc_exec_page(void)
return image;
}
+static __always_inline bool image_tree_less(struct latch_tree_node *a,
+ struct latch_tree_node *b)
+{
+ struct bpf_image *ia = container_of(a, struct bpf_image, tnode);
+ struct bpf_image *ib = container_of(b, struct bpf_image, tnode);
+
+ return ia < ib;
+}
+
+static __always_inline int image_tree_comp(void *addr, struct latch_tree_node *n)
+{
+ void *image = container_of(n, struct bpf_image, tnode);
+
+ if (addr < image)
+ return -1;
+ if (addr >= image + PAGE_SIZE)
+ return 1;
+
+ return 0;
+}
+
+static const struct latch_tree_ops image_tree_ops = {
+ .less = image_tree_less,
+ .comp = image_tree_comp,
+};
+
+void *bpf_image_alloc(void)
+{
+ struct bpf_image *image;
+
+ image = bpf_jit_alloc_exec_page();
+ if (!image)
+ return NULL;
+
+ mutex_lock(&image_mutex);
+ latch_tree_insert(&image->tnode, &image_tree, &image_tree_ops);
+ mutex_unlock(&image_mutex);
+ return image->data;
+}
+
+void bpf_image_delete(void *ptr)
+{
+ struct bpf_image *image = container_of(ptr, struct bpf_image, data);
+
+ mutex_lock(&image_mutex);
+ latch_tree_erase(&image->tnode, &image_tree, &image_tree_ops);
+ synchronize_rcu();
+ bpf_jit_free_exec(image);
+ mutex_unlock(&image_mutex);
+}
+
+bool is_bpf_image(void *addr)
+{
+ return latch_tree_find(addr, &image_tree, &image_tree_ops) != NULL;
+}
+
struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
{
struct bpf_trampoline *tr;
@@ -50,7 +112,7 @@ struct bpf_trampoline *bpf_trampoline_lookup(u64 key)
goto out;
/* is_root was checked earlier. No need for bpf_jit_charge_modmem() */
- image = bpf_jit_alloc_exec_page();
+ image = bpf_image_alloc();
if (!image) {
kfree(tr);
tr = NULL;
@@ -125,14 +187,14 @@ static int register_fentry(struct bpf_trampoline *tr, void *new_addr)
}
/* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50
- * bytes on x86. Pick a number to fit into PAGE_SIZE / 2
+ * bytes on x86. Pick a number to fit into BPF_IMAGE_SIZE / 2
*/
#define BPF_MAX_TRAMP_PROGS 40
static int bpf_trampoline_update(struct bpf_trampoline *tr)
{
- void *old_image = tr->image + ((tr->selector + 1) & 1) * PAGE_SIZE/2;
- void *new_image = tr->image + (tr->selector & 1) * PAGE_SIZE/2;
+ void *old_image = tr->image + ((tr->selector + 1) & 1) * BPF_IMAGE_SIZE/2;
+ void *new_image = tr->image + (tr->selector & 1) * BPF_IMAGE_SIZE/2;
struct bpf_prog *progs_to_run[BPF_MAX_TRAMP_PROGS];
int fentry_cnt = tr->progs_cnt[BPF_TRAMP_FENTRY];
int fexit_cnt = tr->progs_cnt[BPF_TRAMP_FEXIT];
@@ -160,7 +222,7 @@ static int bpf_trampoline_update(struct bpf_trampoline *tr)
if (fexit_cnt)
flags = BPF_TRAMP_F_CALL_ORIG | BPF_TRAMP_F_SKIP_FRAME;
- err = arch_prepare_bpf_trampoline(new_image, new_image + PAGE_SIZE / 2,
+ err = arch_prepare_bpf_trampoline(new_image, new_image + BPF_IMAGE_SIZE / 2,
&tr->func.model, flags,
fentry, fentry_cnt,
fexit, fexit_cnt,
@@ -251,7 +313,7 @@ void bpf_trampoline_put(struct bpf_trampoline *tr)
goto out;
if (WARN_ON_ONCE(!hlist_empty(&tr->progs_hlist[BPF_TRAMP_FEXIT])))
goto out;
- bpf_jit_free_exec(tr->image);
+ bpf_image_delete(tr->image);
hlist_del(&tr->hlist);
kfree(tr);
out:
next prev parent reply other threads:[~2020-01-13 9:43 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-12-29 14:37 [RFC 0/5] bpf: Add trampoline helpers Jiri Olsa
2019-12-29 14:37 ` [PATCH 1/5] bpf: Allow non struct type for btf ctx access Jiri Olsa
2020-01-06 21:36 ` Yonghong Song
2020-01-07 12:13 ` Jiri Olsa
2020-01-07 15:50 ` Jiri Olsa
2020-01-07 17:55 ` Yonghong Song
2020-01-07 18:28 ` Yonghong Song
2020-01-08 14:38 ` Jiri Olsa
2019-12-29 14:37 ` [PATCH 2/5] bpf: Add bpf_perf_event_output_kfunc Jiri Olsa
2020-01-06 23:27 ` Alexei Starovoitov
2020-01-07 12:25 ` Jiri Olsa
2020-01-07 22:13 ` Alexei Starovoitov
2020-01-08 10:24 ` Jiri Olsa
2019-12-29 14:37 ` [PATCH 3/5] bpf: Add bpf_get_stackid_kfunc Jiri Olsa
2019-12-29 14:37 ` [PATCH 4/5] bpf: Add bpf_get_stack_kfunc Jiri Olsa
2019-12-29 14:37 ` [PATCH 5/5] bpf: Allow to resolve bpf trampoline in unwind Jiri Olsa
2020-01-06 23:46 ` Alexei Starovoitov
2020-01-07 8:30 ` Daniel Borkmann
2020-01-07 13:15 ` Jiri Olsa
2020-01-07 19:30 ` Daniel Borkmann
2020-01-07 13:05 ` Jiri Olsa
2020-01-07 13:30 ` Björn Töpel
2020-01-13 9:43 ` Jiri Olsa [this message]
2020-01-13 12:21 ` Björn Töpel
2020-01-13 12:31 ` Björn Töpel
2020-01-13 12:37 ` Jiri Olsa
2020-02-03 19:58 ` Jiri Olsa
2020-02-03 20:27 ` Björn Töpel
2020-02-03 20:45 ` Toke Høiland-Jørgensen
2020-02-04 8:18 ` Jiri Olsa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200113094310.GE35080@krava \
--to=jolsa@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andriin@fb.com \
--cc=ast@kernel.org \
--cc=bjorn.topel@intel.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@redhat.com \
--cc=jakub.kicinski@netronome.com \
--cc=jolsa@kernel.org \
--cc=kafai@fb.com \
--cc=netdev@vger.kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.