From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Amery Hung <amery.hung@bytedance.com>, <bpf@vger.kernel.org>,
<magnus.karlsson@intel.com>, <sreedevi.joshi@intel.com>,
<ast@kernel.org>
Subject: Re: [External] Storing sk_buffs as kptrs in map
Date: Tue, 3 Dec 2024 21:46:59 +0100 [thread overview]
Message-ID: <Z09uQ48lKEsORsS1@boxer> (raw)
In-Reply-To: <d1e95498-4613-43e0-bc6b-6f6157802649@linux.dev>
On Wed, Nov 27, 2024 at 12:54:41PM -0800, Martin KaFai Lau wrote:
> On 11/27/24 11:07 AM, Maciej Fijalkowski wrote:
> > But kfunc does not work on PTR_TO_CTX - it takes in directly sk_buff, not
> > __sk_buff. As I mention above we use bpf_cast_to_kern_ctx() and per my
> > current limited understanding it overwrites the reg->type to
> > PTR_TO_BTF_ID | PTR_TRUSTED.
>
> Can you try skip calling the bpf_cast_to_kern_ctx and directly pass the
> "struct __sk_buff *skb" to the "struct sk_buff *bpf_skb_acquire(struct
> __sk_buff *skb).
That didn't help, log below:
libbpf: prog 'bpf_tc_ingress': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; map_entry = bpf_map_lookup_elem(&skb_map, &get_key); @ bpf_bpf.c:145
0: (18) r6 = 0xffffc9000d186004 ; R6_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4)
2: (18) r1 = 0xffff88984ab8ce00 ; R1_w=map_ptr(map=skb_map,ks=4,vs=8)
4: (18) r2 = 0xffffc9000d186004 ; R2_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4)
6: (85) call bpf_map_lookup_elem#1 ; R0_w=map_value_or_null(id=1,map=skb_map,ks=4,vs=8)
; if (!map_entry) { @ bpf_bpf.c:146
7: (55) if r0 != 0x0 goto pc+7 15: R0_w=map_value(map=skb_map,ks=4,vs=8) R6_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) R10=fp0
; tmp = bpf_kptr_xchg(&map_entry->skb, tmp); @ bpf_bpf.c:151
15: (bf) r1 = r0 ; R0_w=map_value(map=skb_map,ks=4,vs=8) R1_w=map_value(map=skb_map,ks=4,vs=8)
16: (b7) r2 = 0 ; R2_w=0
17: (85) call bpf_kptr_xchg#194 ; R0_w=ptr_or_null_sk_buff(id=3,ref_obj_id=3) refs=3
18: (bf) r6 = r0 ; R0_w=ptr_or_null_sk_buff(id=3,ref_obj_id=3) R6_w=ptr_or_null_sk_buff(id=3,ref_obj_id=3) refs=3
19: (b4) w7 = -1 ; R7=0xffffffff refs=3
; if (!tmp) @ bpf_bpf.c:152
20: (15) if r6 == 0x0 goto pc+17 ; R6=ptr_sk_buff(ref_obj_id=3) refs=3
; bpf_printk("retrieved skb %p from index %d\n", tmp, get_key); @ bpf_bpf.c:155
21: (18) r8 = 0xffffc9000d186004 ; R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
23: (61) r4 = *(u32 *)(r8 +0) ; R4_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
24: (18) r1 = 0xffff88984ab8cb20 ; R1_w=map_value(map=bpf_bpf.rodata,ks=4,vs=83,off=24) refs=3
26: (b4) w2 = 32 ; R2_w=32 refs=3
27: (bf) r3 = r6 ; R3_w=ptr_sk_buff(ref_obj_id=3) R6=ptr_sk_buff(ref_obj_id=3) refs=3
28: (85) call bpf_trace_printk#6 ; R0_w=scalar() refs=3
; get_key++; @ bpf_bpf.c:157
29: (61) r1 = *(u32 *)(r8 +0) ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
30: (04) w1 += 1 ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) refs=3
31: (b4) w7 = 0 ; R7_w=0 refs=3
32: (b4) w2 = 0 ; R2=0 refs=3
; if (get_key >= 16) @ bpf_bpf.c:158
33: (26) if w1 > 0xf goto pc+1 ; R1=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) refs=3
34: (bc) w2 = w1 ; R1=scalar(id=4,smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) R2_w=scalar(id=4,smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) refs=3
35: (63) *(u32 *)(r8 +0) = r2 ; R2_w=scalar(id=4,smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) R8=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
; bpf_skb_release((struct __sk_buff *)tmp); @ bpf_bpf.c:161
36: (bf) r1 = r6 ; R1_w=ptr_sk_buff(ref_obj_id=3) R6=ptr_sk_buff(ref_obj_id=3) refs=3
37: (85) call bpf_skb_release#102037
arg#0 expected pointer to ctx, but got ptr_
processed 34 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1
-- END PROG LOAD LOG --
Still the same problem. Also even it would work it was not very convenient
to cast these types back and forth...I then tried to store __sk_buff as
kptr but I ended up with:
"map 'skb_map' has to have BTF in order to use bpf_kptr_xchg"
which got me lost:) I have a solution though which I'd like to discuss.
>
> Please share the patch and the test case. It will be easier for others to help.
I have come up with rather simple way of achieving what I desired when
starting this thread, how about something like this:
From 0df7760330cccfe71235b56018d0a33d4a3b9863 Mon Sep 17 00:00:00 2001
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Tue, 3 Dec 2024 21:00:44 +0100
Subject: [PATCH RFC bpf-next] bpf: add __ctx_ref kfunc argument suffix
The use case is when user wants to use sk_buff pointers as kptrs against
program types that take in __sk_buff struct as context.
A pair of kfuncs for acquiring and releasing skb would look as follows:
__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb__ctx_ref)
__bpf_kfunc void bpf_skb_release(struct sk_buff *skb__ctx_ref)
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
kernel/bpf/verifier.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 60938b136365..b16a39d28f8a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11303,6 +11303,11 @@ static bool is_kfunc_arg_const_str(const struct btf *btf, const struct btf_param
return btf_param_match_suffix(btf, arg, "__str");
}
+static bool is_kfunc_arg_ctx_ref(const struct btf *btf, const struct btf_param *arg)
+{
+ return btf_param_match_suffix(btf, arg, "__ctx_ref");
+}
+
static bool is_kfunc_arg_scalar_with_name(const struct btf *btf,
const struct btf_param *arg,
const char *name)
@@ -11599,6 +11604,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx])
return KF_ARG_PTR_TO_CTX;
+ if (is_kfunc_arg_ctx_ref(meta->btf, &args[argno]))
+ return KF_ARG_PTR_TO_BTF_ID;
+
/* In this function, we verify the kfunc's BTF as per the argument type,
* leaving the rest of the verification with respect to the register
* type to our caller. When a set of conditions hold in the BTF type of
--
2.34.1
Then below example progs store the skb on TC tx and retrieve it later on
when TC rx hook is called. This is a ping-pong of packets between veth
pair interfaces. We start with Tx on interface with progs attached and
then receive the packets back. I have stripped the other logic so that we
focus only on kptr storage:
-------------------------------8<-------------------------------
struct sk_buff *bpf_skb_acquire(struct sk_buff *skb) __ksym;
void bpf_skb_release(struct sk_buff *skb) __ksym;
void *bpf_cast_to_kern_ctx(void *) __ksym;
u32 key = 0;
u32 get_key = 0;
struct map_value {
struct sk_buff __kptr * skb;
};
struct {
__uint(type, BPF_MAP_TYPE_ARRAY);
__type(key, int);
__type(value, struct map_value);
__uint(max_entries, 16);
} skb_map SEC(".maps");
SEC("tc") int bpf_tc_ingress(struct __sk_buff *skb)
{
struct map_value *map_entry;
struct sk_buff *tmp = NULL;
map_entry = bpf_map_lookup_elem(&skb_map, &get_key);
if (!map_entry) {
bpf_printk("no entry at get key %d\n", get_key);
return TC_ACT_SHOT;
}
tmp = bpf_kptr_xchg(&map_entry->skb, tmp);
if (!tmp)
return TC_ACT_UNSPEC;
bpf_printk("retrieved skb %p from index %d\n", tmp, get_key);
get_key++;
if (get_key >= 16)
get_key = 0;
bpf_skb_release(tmp);
return TC_ACT_OK;
}
SEC("tc") int bpf_tc_egress(struct __sk_buff *ctx)
{
struct sk_buff *skbk = bpf_cast_to_kern_ctx(ctx);
struct map_value *map_entry;
struct map_value tmp_entry;
struct sk_buff *tmp;
tmp_entry.skb = NULL;
bpf_map_update_elem(&skb_map, &key, &tmp_entry, BPF_ANY);
map_entry = bpf_map_lookup_elem(&skb_map, &key);
if (!map_entry)
return TC_ACT_SHOT;
skbk = bpf_skb_acquire(skbk);
if (!skbk)
return TC_ACT_SHOT;
tmp = bpf_kptr_xchg(&map_entry->skb, skbk);
if (tmp)
bpf_skb_release(tmp);
bpf_printk("stored skb %p at index %d\n", skbk, key);
key++;
if (key >= 16)
key = 0;
return TC_ACT_OK;
}
char LICENSE[] SEC("license") = "GPL";
------------------------------->8-------------------------------
>
> > I tried to simplify the use case that customer has, but I am a bit worried
> > that it might only confuse people more :/ however, here it is:
>
> No. not at all. I suspect the use case has some similarity to the
> net-timestamp patches (https://lore.kernel.org/bpf/20241028110535.82999-1-kerneljasonxing@gmail.com/)
> which uses a skb tskey to associate/co-relate different timestamp.
I was aware of this work and I will give this a deep look, thanks.
>
> > On TC egress hook skb is stored in a map - reason for picking it over the
> > linked list or rbtree is that we want to be able to access skbs via some index,
> > say a hash. This is where we bump the skb's refcount via acquire kfunc.
> >
> > During TC ingress hook on the same interface, the skb that was previously
> > stored in map is retrieved, current skb that resides in the context of
> > hook carries the timestamp via metadata. We then use the retrieved skb and
> > tstamp from metadata on skb_tstamp_tx() (another kfunc) and finally
> > decrement skb's refcount via release kfunc.
> >
> >
> > Anyways, since we are able to do similar operations on task_struct
> > (holding it in map via kptr), I don't see a reason why wouldn't we allow
> > ourselves to do it on sk_buffs, no?
>
> skb holds other things like dev and dst, like someone may be trying to
> remove the netdevice and route...etc. Overall, yes, the skb refcnt will
> eventually be decremented when the map is freed like other kptr (e.g. task)
> do.
>
next prev parent reply other threads:[~2024-12-03 20:47 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-26 17:05 Storing sk_buffs as kptrs in map Maciej Fijalkowski
2024-11-26 19:56 ` [External] " Amery Hung
2024-11-26 20:47 ` Martin KaFai Lau
2024-11-27 19:07 ` Maciej Fijalkowski
2024-11-27 20:54 ` Martin KaFai Lau
2024-12-03 20:46 ` Maciej Fijalkowski [this message]
2024-12-04 23:24 ` Martin KaFai Lau
2024-12-06 16:24 ` Maciej Fijalkowski
2024-12-07 0:36 ` Martin KaFai Lau
2024-12-09 13:17 ` Maciej Fijalkowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z09uQ48lKEsORsS1@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=amery.hung@bytedance.com \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=magnus.karlsson@intel.com \
--cc=martin.lau@linux.dev \
--cc=sreedevi.joshi@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox