From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Amery Hung <amery.hung@bytedance.com>, <bpf@vger.kernel.org>,
<magnus.karlsson@intel.com>, <sreedevi.joshi@intel.com>,
<ast@kernel.org>
Subject: Re: [External] Storing sk_buffs as kptrs in map
Date: Tue, 3 Dec 2024 21:46:59 +0100 [thread overview]
Message-ID: <Z09uQ48lKEsORsS1@boxer> (raw)
In-Reply-To: <d1e95498-4613-43e0-bc6b-6f6157802649@linux.dev>
On Wed, Nov 27, 2024 at 12:54:41PM -0800, Martin KaFai Lau wrote:
> On 11/27/24 11:07 AM, Maciej Fijalkowski wrote:
> > But kfunc does not work on PTR_TO_CTX - it takes in directly sk_buff, not
> > __sk_buff. As I mention above we use bpf_cast_to_kern_ctx() and per my
> > current limited understanding it overwrites the reg->type to
> > PTR_TO_BTF_ID | PTR_TRUSTED.
>
> Can you try skip calling the bpf_cast_to_kern_ctx and directly pass the
> "struct __sk_buff *skb" to the "struct sk_buff *bpf_skb_acquire(struct
> __sk_buff *skb).
That didn't help, log below:
libbpf: prog 'bpf_tc_ingress': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; map_entry = bpf_map_lookup_elem(&skb_map, &get_key); @ bpf_bpf.c:145
0: (18) r6 = 0xffffc9000d186004 ; R6_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4)
2: (18) r1 = 0xffff88984ab8ce00 ; R1_w=map_ptr(map=skb_map,ks=4,vs=8)
4: (18) r2 = 0xffffc9000d186004 ; R2_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4)
6: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=1,map=skb_map,ks=4,vs=8)
; if (!map_entry) { @ bpf_bpf.c:146
7: (55) if r0 != 0x0 goto pc+7 15: R0_w=map_value(map=skb_map,ks=4,vs=8) R6_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) R10=fp0
; tmp = bpf_kptr_xchg(&map_entry->skb, tmp); @ bpf_bpf.c:151
15: (bf) r1 = r0 ; R0_w=map_value(map=skb_map,ks=4,vs=8) R1_w=map_value(map=skb_map,ks=4,vs=8)
16: (b7) r2 = 0 ; R2_w=0
17: (85) call bpf_kptr_xchg#194 ; R0_w=ptr_or_null_sk_buff(id=3,ref_obj_id=3) refs=3
18: (bf) r6 = r0 ; R0_w=ptr_or_null_sk_buff(id=3,ref_obj_id=3) R6_w=ptr_or_null_sk_buff(id=3,ref_obj_id=3) refs=3
19: (b4) w7 = -1 ; R7=0xffffffff refs=3
; if (!tmp) @ bpf_bpf.c:152
20: (15) if r6 == 0x0 goto pc+17 ; R6=ptr_sk_buff(ref_obj_id=3) refs=3
; bpf_printk("retrieved skb %p from index %d\n", tmp, get_key); @ bpf_bpf.c:155
21: (18) r8 = 0xffffc9000d186004 ; R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
23: (61) r4 = *(u32 *)(r8 +0) ; R4_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
24: (18) r1 = 0xffff88984ab8cb20 ; R1_w=map_value(map=bpf_bpf.rodata,ks=4,vs=83,off=24) refs=3
26: (b4) w2 = 32 ; R2_w=32 refs=3
27: (bf) r3 = r6 ; R3_w=ptr_sk_buff(ref_obj_id=3) R6=ptr_sk_buff(ref_obj_id=3) refs=3
28: (85) call bpf_trace_printk#6 ; R0_w=scalar() refs=3
; get_key++; @ bpf_bpf.c:157
29: (61) r1 = *(u32 *)(r8 +0) ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
30: (04) w1 += 1 ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) refs=3
31: (b4) w7 = 0 ; R7_w=0 refs=3
32: (b4) w2 = 0 ; R2=0 refs=3
; if (get_key >= 16) @ bpf_bpf.c:158
33: (26) if w1 > 0xf goto pc+1 ; R1=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) refs=3
34: (bc) w2 = w1 ; R1=scalar(id=4,smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) R2_w=scalar(id=4,smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) refs=3
35: (63) *(u32 *)(r8 +0) = r2 ; R2_w=scalar(id=4,smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) R8=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
; bpf_skb_release((struct __sk_buff *)tmp); @ bpf_bpf.c:161
36: (bf) r1 = r6 ; R1_w=ptr_sk_buff(ref_obj_id=3) R6=ptr_sk_buff(ref_obj_id=3) refs=3
37: (85) call bpf_skb_release#102037
arg#0 expected pointer to ctx, but got ptr_
processed 34 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1
-- END PROG LOAD LOG --
Still the same problem. Also even it would work it was not very convenient
to cast these types back and forth...I then tried to store __sk_buff as
kptr but I ended up with:
"map 'skb_map' has to have BTF in order to use bpf_kptr_xchg"
which got me lost:) I have a solution though which I'd like to discuss.
>
> Please share the patch and the test case. It will be easier for others to help.
I have come up with rather simple way of achieving what I desired when
starting this thread, how about something like this:
From 0df7760330cccfe71235b56018d0a33d4a3b9863 Mon Sep 17 00:00:00 2001
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Tue, 3 Dec 2024 21:00:44 +0100
Subject: [PATCH RFC bpf-next] bpf: add __ctx_ref kfunc argument suffix
The use case is when user wants to use sk_buff pointers as kptrs against
program types that take in __sk_buff struct as context.
A pair of kfuncs for acquiring and releasing skb would look as follows:
__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb__ctx_ref)
__bpf_kfunc void bpf_skb_release(struct sk_buff *skb__ctx_ref)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
kernel/bpf/verifier.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 60938b136365..b16a39d28f8a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11303,6 +11303,11 @@ static bool is_kfunc_arg_const_str(const struct btf *btf, const struct btf_param
return btf_param_match_suffix(btf, arg, "__str");
}
+static bool is_kfunc_arg_ctx_ref(const struct btf *btf, const struct btf_param *arg)
+{
+ return btf_param_match_suffix(btf, arg, "__ctx_ref");
+}
+
static bool is_kfunc_arg_scalar_with_name(const struct btf *btf,
const struct btf_param *arg,
const char *name)
@@ -11599,6 +11604,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx])
return KF_ARG_PTR_TO_CTX;
+ if (is_kfunc_arg_ctx_ref(meta->btf, &args[argno]))
+ return KF_ARG_PTR_TO_BTF_ID;
+
/* In this function, we verify the kfunc's BTF as per the argument type,
* leaving the rest of the verification with respect to the register
* type to our caller. When a set of conditions hold in the BTF type of
--
2.34.1
Then below example progs store the skb on TC tx and retrieve it later on
when TC rx hook is called. This is a ping-pong of packets between veth
pair interfaces. We start with Tx on interface with progs attached and
then receive the packets back. I have stripped the other logic so that we
focus only on kptr storage:
-------------------------------8<-------------------------------
struct sk_buff *bpf_skb_acquire(struct sk_buff *skb) __ksym;
void bpf_skb_release(struct sk_buff *skb) __ksym;
void *bpf_cast_to_kern_ctx(void *) __ksym;
u32 key = 0;
u32 get_key = 0;
struct map_value {
struct sk_buff __kptr * skb;
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, int);
__type(value, struct map_value);
__uint(max_entries, 16);
} skb_map SEC(".maps");
SEC("tc") int bpf_tc_ingress(struct __sk_buff *skb)
{
struct map_value *map_entry;
struct sk_buff *tmp = NULL;
map_entry = bpf_map_lookup_elem(&skb_map, &get_key);
if (!map_entry) {
bpf_printk("no entry at get key %d\n", get_key);
return TC_ACT_SHOT;
}
tmp = bpf_kptr_xchg(&map_entry->skb, tmp);
if (!tmp)
return TC_ACT_UNSPEC;
bpf_printk("retrieved skb %p from index %d\n", tmp, get_key);
get_key++;
if (get_key >= 16)
get_key = 0;
bpf_skb_release(tmp);
return TC_ACT_OK;
}
SEC("tc") int bpf_tc_egress(struct __sk_buff *ctx)
{
struct sk_buff *skbk = bpf_cast_to_kern_ctx(ctx);
struct map_value *map_entry;
struct map_value tmp_entry;
struct sk_buff *tmp;
tmp_entry.skb = NULL;
bpf_map_update_elem(&skb_map, &key, &tmp_entry, BPF_ANY);
map_entry = bpf_map_lookup_elem(&skb_map, &key);
if (!map_entry)
return TC_ACT_SHOT;
skbk = bpf_skb_acquire(skbk);
if (!skbk)
return TC_ACT_SHOT;
tmp = bpf_kptr_xchg(&map_entry->skb, skbk);
if (tmp)
bpf_skb_release(tmp);
bpf_printk("stored skb %p at index %d\n", skbk, key);
key++;
if (key >= 16)
key = 0;
return TC_ACT_OK;
}
char LICENSE[] SEC("license") = "GPL";
------------------------------->8-------------------------------
>
> > I tried to simplify the use case that customer has, but I am a bit worried
> > that it might only confuse people more :/ however, here it is:
>
> No. not at all. I suspect the use case has some similarity to the
> net-timestamp patches (https://lore.kernel.org/bpf/20241028110535.82999-1-kerneljasonxing@gmail.com/)
> which uses a skb tskey to associate/co-relate different timestamp.
I was aware of this work and I will give this a deep look, thanks.
>
> > On TC egress hook skb is stored in a map - reason for picking it over the
> > linked list or rbtree is that we want to be able to access skbs via some index,
> > say a hash. This is where we bump the skb's refcount via acquire kfunc.
> >
> > During TC ingress hook on the same interface, the skb that was previously
> > stored in map is retrieved, current skb that resides in the context of
> > hook carries the timestamp via metadata. We then use the retrieved skb and
> > tstamp from metadata on skb_tstamp_tx() (another kfunc) and finally
> > decrement skb's refcount via release kfunc.
> >
> >
> > Anyways, since we are able to do similar operations on task_struct
> > (holding it in map via kptr), I don't see a reason why wouldn't we allow
> > ourselves to do it on sk_buffs, no?
>
> skb holds other things like dev and dst, like someone may be trying to
> remove the netdevice and route...etc. Overall, yes, the skb refcnt will
> eventually be decremented when the map is freed like other kptr (e.g. task)
> do.
>
next prev parent reply other threads:[~2024-12-03 20:47 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-26 17:05 Storing sk_buffs as kptrs in map Maciej Fijalkowski
2024-11-26 19:56 ` [External] " Amery Hung
2024-11-26 20:47 ` Martin KaFai Lau
2024-11-27 19:07 ` Maciej Fijalkowski
2024-11-27 20:54 ` Martin KaFai Lau
2024-12-03 20:46 ` Maciej Fijalkowski [this message]
2024-12-04 23:24 ` Martin KaFai Lau
2024-12-06 16:24 ` Maciej Fijalkowski
2024-12-07 0:36 ` Martin KaFai Lau
2024-12-09 13:17 ` Maciej Fijalkowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z09uQ48lKEsORsS1@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=amery.hung@bytedance.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=magnus.karlsson@intel.com \
--cc=martin.lau@linux.dev \
--cc=sreedevi.joshi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.