BPF List
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Amery Hung <amery.hung@bytedance.com>, <bpf@vger.kernel.org>,
	<magnus.karlsson@intel.com>, <sreedevi.joshi@intel.com>,
	<ast@kernel.org>
Subject: Re: [External] Storing sk_buffs as kptrs in map
Date: Tue, 3 Dec 2024 21:46:59 +0100	[thread overview]
Message-ID: <Z09uQ48lKEsORsS1@boxer> (raw)
In-Reply-To: <d1e95498-4613-43e0-bc6b-6f6157802649@linux.dev>

On Wed, Nov 27, 2024 at 12:54:41PM -0800, Martin KaFai Lau wrote:
> On 11/27/24 11:07 AM, Maciej Fijalkowski wrote:
> > But kfunc does not work on PTR_TO_CTX - it takes in directly sk_buff, not
> > __sk_buff. As I mention above we use bpf_cast_to_kern_ctx() and per my
> > current limited understanding it overwrites the reg->type to
> > PTR_TO_BTF_ID | PTR_TRUSTED.
> 
> Can you try skip calling the bpf_cast_to_kern_ctx and directly pass the
> "struct __sk_buff *skb" to the "struct sk_buff *bpf_skb_acquire(struct
> __sk_buff *skb)".

That didn't help, log below:

libbpf: prog 'bpf_tc_ingress': -- BEGIN PROG LOAD LOG --
0: R1=ctx() R10=fp0
; map_entry = bpf_map_lookup_elem(&skb_map, &get_key); @ bpf_bpf.c:145
0: (18) r6 = 0xffffc9000d186004       ; R6_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4)
2: (18) r1 = 0xffff88984ab8ce00       ; R1_w=map_ptr(map=skb_map,ks=4,vs=8)
4: (18) r2 = 0xffffc9000d186004       ; R2_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4)
6: (85) call bpf_map_lookup_elem#1    ; R0_w=map_value_or_null(id=1,map=skb_map,ks=4,vs=8)
; if (!map_entry) { @ bpf_bpf.c:146
7: (55) if r0 != 0x0 goto pc+7 15: R0_w=map_value(map=skb_map,ks=4,vs=8) R6_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) R10=fp0
; tmp = bpf_kptr_xchg(&map_entry->skb, tmp); @ bpf_bpf.c:151
15: (bf) r1 = r0                      ; R0_w=map_value(map=skb_map,ks=4,vs=8) R1_w=map_value(map=skb_map,ks=4,vs=8)
16: (b7) r2 = 0                       ; R2_w=0
17: (85) call bpf_kptr_xchg#194       ; R0_w=ptr_or_null_sk_buff(id=3,ref_obj_id=3) refs=3
18: (bf) r6 = r0                      ; R0_w=ptr_or_null_sk_buff(id=3,ref_obj_id=3) R6_w=ptr_or_null_sk_buff(id=3,ref_obj_id=3) refs=3
19: (b4) w7 = -1                      ; R7=0xffffffff refs=3
; if (!tmp) @ bpf_bpf.c:152
20: (15) if r6 == 0x0 goto pc+17      ; R6=ptr_sk_buff(ref_obj_id=3) refs=3
; bpf_printk("retrieved skb %p from index %d\n", tmp, get_key); @ bpf_bpf.c:155
21: (18) r8 = 0xffffc9000d186004      ; R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
23: (61) r4 = *(u32 *)(r8 +0)         ; R4_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
24: (18) r1 = 0xffff88984ab8cb20      ; R1_w=map_value(map=bpf_bpf.rodata,ks=4,vs=83,off=24) refs=3
26: (b4) w2 = 32                      ; R2_w=32 refs=3
27: (bf) r3 = r6                      ; R3_w=ptr_sk_buff(ref_obj_id=3) R6=ptr_sk_buff(ref_obj_id=3) refs=3
28: (85) call bpf_trace_printk#6      ; R0_w=scalar() refs=3
; get_key++; @ bpf_bpf.c:157
29: (61) r1 = *(u32 *)(r8 +0)         ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R8_w=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
30: (04) w1 += 1                      ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) refs=3
31: (b4) w7 = 0                       ; R7_w=0 refs=3
32: (b4) w2 = 0                       ; R2=0 refs=3
; if (get_key >= 16) @ bpf_bpf.c:158
33: (26) if w1 > 0xf goto pc+1        ; R1=scalar(smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) refs=3
34: (bc) w2 = w1                      ; R1=scalar(id=4,smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) R2_w=scalar(id=4,smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) refs=3
35: (63) *(u32 *)(r8 +0) = r2         ; R2_w=scalar(id=4,smin=smin32=0,smax=umax=smax32=umax32=15,var_off=(0x0; 0xf)) R8=map_value(map=bpf_bpf.bss,ks=4,vs=8,off=4) refs=3
; bpf_skb_release((struct __sk_buff *)tmp); @ bpf_bpf.c:161
36: (bf) r1 = r6                      ; R1_w=ptr_sk_buff(ref_obj_id=3) R6=ptr_sk_buff(ref_obj_id=3) refs=3
37: (85) call bpf_skb_release#102037
arg#0 expected pointer to ctx, but got ptr_
processed 34 insns (limit 1000000) max_states_per_insn 0 total_states 3 peak_states 3 mark_read 1
-- END PROG LOAD LOG --


Still the same problem. Also, even if it worked, it would not be very
convenient to cast these types back and forth... I then tried to store
__sk_buff as a kptr, but I ended up with:

"map 'skb_map' has to have BTF in order to use bpf_kptr_xchg"

which got me lost :) I have a solution, though, which I'd like to discuss.

> 
> Please share the patch and the test case. It will be easier for others to help.

I have come up with a rather simple way of achieving what I desired when
starting this thread. How about something like this:

From 0df7760330cccfe71235b56018d0a33d4a3b9863 Mon Sep 17 00:00:00 2001
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Tue, 3 Dec 2024 21:00:44 +0100
Subject: [PATCH RFC bpf-next] bpf: add __ctx_ref kfunc argument suffix

The use case is when a user wants to use sk_buff pointers as kptrs in
program types that take the __sk_buff struct as context.

A pair of kfuncs for acquiring and releasing an skb would look as follows:

__bpf_kfunc struct sk_buff *bpf_skb_acquire(struct sk_buff *skb__ctx_ref)
__bpf_kfunc void bpf_skb_release(struct sk_buff *skb__ctx_ref)

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 kernel/bpf/verifier.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 60938b136365..b16a39d28f8a 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -11303,6 +11303,11 @@ static bool is_kfunc_arg_const_str(const struct btf *btf, const struct btf_param
 	return btf_param_match_suffix(btf, arg, "__str");
 }
 
+static bool is_kfunc_arg_ctx_ref(const struct btf *btf, const struct btf_param *arg)
+{
+	return btf_param_match_suffix(btf, arg, "__ctx_ref");
+}
+
 static bool is_kfunc_arg_scalar_with_name(const struct btf *btf,
 					  const struct btf_param *arg,
 					  const char *name)
@@ -11599,6 +11604,9 @@ get_kfunc_ptr_arg_type(struct bpf_verifier_env *env,
 	if (meta->func_id == special_kfunc_list[KF_bpf_cast_to_kern_ctx])
 		return KF_ARG_PTR_TO_CTX;
 
+	if (is_kfunc_arg_ctx_ref(meta->btf, &args[argno]))
+		return KF_ARG_PTR_TO_BTF_ID;
+
 	/* In this function, we verify the kfunc's BTF as per the argument type,
 	 * leaving the rest of the verification with respect to the register
 	 * type to our caller. When a set of conditions hold in the BTF type of
-- 
2.34.1

Then the example progs below store the skb on the TC egress (tx) hook and
retrieve it later when the TC ingress (rx) hook is called. This is a
ping-pong of packets between veth pair interfaces: we start with Tx on the
interface with the progs attached and then receive the packets back. I have
stripped the other logic so that we focus only on the kptr storage:

-------------------------------8<-------------------------------

struct sk_buff *bpf_skb_acquire(struct sk_buff *skb) __ksym;
void bpf_skb_release(struct sk_buff *skb) __ksym;
void *bpf_cast_to_kern_ctx(void *) __ksym;

u32 key = 0;
u32 get_key = 0;

struct map_value {
	struct sk_buff __kptr *skb;
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__type(key, int);
	__type(value, struct map_value);
	__uint(max_entries, 16);
} skb_map SEC(".maps");

SEC("tc")
int bpf_tc_ingress(struct __sk_buff *skb)
{
	struct map_value *map_entry;
	struct sk_buff *tmp = NULL;

	map_entry = bpf_map_lookup_elem(&skb_map, &get_key);
	if (!map_entry) {
		bpf_printk("no entry at get key %d\n", get_key);
		return TC_ACT_SHOT;
	}

	tmp = bpf_kptr_xchg(&map_entry->skb, tmp);
	if (!tmp)
		return TC_ACT_UNSPEC;

	bpf_printk("retrieved skb %p from index %d\n", tmp, get_key);

	get_key++;
	if (get_key >= 16)
		get_key = 0;

	bpf_skb_release(tmp);

	return TC_ACT_OK;
}

SEC("tc")
int bpf_tc_egress(struct __sk_buff *ctx)
{
	struct sk_buff *skbk = bpf_cast_to_kern_ctx(ctx);
	struct map_value *map_entry;
	struct map_value tmp_entry;
	struct sk_buff *tmp;

	tmp_entry.skb = NULL;
	bpf_map_update_elem(&skb_map, &key, &tmp_entry, BPF_ANY);
	map_entry = bpf_map_lookup_elem(&skb_map, &key);
	if (!map_entry)
		return TC_ACT_SHOT;

	skbk = bpf_skb_acquire(skbk);
	if (!skbk)
		return TC_ACT_SHOT;

	tmp = bpf_kptr_xchg(&map_entry->skb, skbk);
	if (tmp)
		bpf_skb_release(tmp);
	bpf_printk("stored skb %p at index %d\n", skbk, key);

	key++;
	if (key >= 16)
		key = 0;

	return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";

------------------------------->8-------------------------------

> 
> > I tried to simplify the use case that customer has, but I am a bit worried
> > that it might only confuse people more :/ however, here it is:
> 
> No. not at all. I suspect the use case has some similarity to the
> net-timestamp patches (https://lore.kernel.org/bpf/20241028110535.82999-1-kerneljasonxing@gmail.com/)
> which uses a skb tskey to associate/co-relate different timestamp.

I was aware of this work and will give it a deeper look, thanks.

> 
> > On TC egress hook skb is stored in a map - reason for picking it over the
> > linked list or rbtree is that we want to be able to access skbs via some index,
> > say a hash. This is where we bump the skb's refcount via acquire kfunc.
> > 
> > During TC ingress hook on the same interface, the skb that was previously
> > stored in map is retrieved, current skb that resides in the context of
> > hook carries the timestamp via metadata. We then use the retrieved skb and
> > tstamp from metadata on skb_tstamp_tx() (another kfunc) and finally
> > decrement skb's refcount via release kfunc.
> > 
> > 
> > Anyways, since we are able to do similar operations on task_struct
> > (holding it in map via kptr), I don't see a reason why wouldn't we allow
> > ourselves to do it on sk_buffs, no?
> 
> skb holds other things like dev and dst, like someone may be trying to
> remove the netdevice and route...etc. Overall, yes, the skb refcnt will
> eventually be decremented when the map is freed like other kptr (e.g. task)
> do.
> 


Thread overview: 10+ messages
2024-11-26 17:05 Storing sk_buffs as kptrs in map Maciej Fijalkowski
2024-11-26 19:56 ` [External] " Amery Hung
2024-11-26 20:47   ` Martin KaFai Lau
2024-11-27 19:07     ` Maciej Fijalkowski
2024-11-27 20:54       ` Martin KaFai Lau
2024-12-03 20:46         ` Maciej Fijalkowski [this message]
2024-12-04 23:24           ` Martin KaFai Lau
2024-12-06 16:24             ` Maciej Fijalkowski
2024-12-07  0:36               ` Martin KaFai Lau
2024-12-09 13:17                 ` Maciej Fijalkowski
