From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mail-ed1-f52.google.com (mail-ed1-f52.google.com [209.85.208.52]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32DCC2367A2 for ; Wed, 23 Jul 2025 17:36:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.208.52 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753292215; cv=none; b=r8iFIFq/uIEfuPJcAPMVaIgclyWe0kmSdg6YioQRV5l6v7TA+tboUUP3LsX9rt9PgXrHPXTnigMKuS5NpY6LcBkAFcNJtvSPg6akAZwa/bGWIaaUEJ+0DKQrigVdKxQYI42oZtpamze9abUHxBHqU0KibXZdagvZnzlVbkX1yBI= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1753292215; c=relaxed/simple; bh=0Kn6I3D3GbGg/thSU32On+20t2U6sz2BAcDAGROf6zU=; h=From:Date:Subject:MIME-Version:Content-Type:Message-Id:References: In-Reply-To:To:Cc; b=Yrnruif+Z1Pb1XjU9hjJW/jVVc/8yFQpjG0toLHMdE0em8LBnVjKDMgV6s/T5D2G7MLzKrO0eFsIpap50cSl8zC23eE3SM1t7rBzI8/wnmkItSDqL3cWgXG3pZgnh6p+IEZMnourfw4P4PI4PEuWevaViM00cXyk7Fnu+fSnY9Q= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=cloudflare.com; spf=pass smtp.mailfrom=cloudflare.com; dkim=pass (2048-bit key) header.d=cloudflare.com header.i=@cloudflare.com header.b=A7Ln79EF; arc=none smtp.client-ip=209.85.208.52 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=cloudflare.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=cloudflare.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=cloudflare.com header.i=@cloudflare.com header.b="A7Ln79EF" Received: by mail-ed1-f52.google.com with SMTP id 4fb4d7f45d1cf-60c4521ae2cso332548a12.0 for ; Wed, 23 Jul 2025 10:36:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google09082023; t=1753292211; x=1753897011; darn=vger.kernel.org; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=ZrjzRGrB9dP5LyoRHTg/j41WqS4k0TNvsR32BBSGXAM=; b=A7Ln79EFAG8szQy5TAn5xnipTVgv7X5uJJrKKNFfWv/OhymDFYE6xj6J0dGtg4EPbR M9ykX1YQ9POqTvBvB4LWRk4SUkOEdAH7ypOw7T+RKtewfxObpMxC4troqhu8yoodkx1/ pgFofcJqaOO4BNwIcVDHHm1NCzH5YTKlFoNmbbdOaYgTB6QfJM9qfjQG6ktDxljhXylx pOxtsBCcXkVJBj/KNHYU3TL/EY235A4X6R/20bSf+gzaYRi+m5+Y9y25kWVKHM6fe/iz 1QwrK3TcZ3Wd5/okrbkHaXLvw0yxD6nFKpoWxNrAdMqKohBY8zRPaY8adEXIoci2ZCRi GTaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753292211; x=1753897011; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ZrjzRGrB9dP5LyoRHTg/j41WqS4k0TNvsR32BBSGXAM=; b=FzVScFXdiqbBJWdHTmDdhA9nIzwlCxHPdYAvDTc/1Kc70GS7yl/m0QbxQYpa25C5eo V/s5Rwibx/t6v6SJjYaRQTI44vatVk7eixrRLqhsK/9Pj9VNIGdcxyizxgIXdwr14gnG h/e4mgK75w/p5xclqCoZUHdO0lOo4qcyEQvYhC3WqlAcvoh7sQS406UuFrTrHgTlUTRC Lz58RUrJdvgx5yCxzHi3GA+vCEP+JyewcCUMspuKvtKIxRXLX4hXbwPqLfCJ8SAqfrmZ qc1R2aQMWUCILrTgAkwgvkqMQqGN6FUKxGbKaLiT0PeeOB3golq7kC0sxsRuUCybOier rgGQ== X-Forwarded-Encrypted: i=1; AJvYcCUIVr5lj8AaPwQqq94N7fsnFRutRBVkwo2UDF0izlM7v20ZYdPLQXrJMQ85kBU5a+BlWVIwFNg=@vger.kernel.org X-Gm-Message-State: AOJu0YxMxTVGVPtULJL2wFEEVJNnLWohmSX1FSXpSd3ztERX01vSp0ZI 8BhsfZ0O8CNu0dFHDeAO/qpCnN8AnmnHmAJ97U6i8avtf2+9ZXbD/y6IQtcRC5fN4e4= X-Gm-Gg: ASbGncvC5/m5x1SxOm+HY9LmoF7HWF/WODG9oJCDr7M9UnI/e7+tHOXVi+2ffXfeZyA 4ERSTAu+IKakLxeSr4dMyScaGMCLEaWBmMGnGhsX6ITqwjIh5o1gaSeRs7OZAyZpZtuGXPzs2na UUGHOvz2w8WlR7XjpU0CHHsh0nm8WL9nYv6HNzEUGrDFTa3KAtJj6gSYZRXkpv6bwPTGhWsyPAd YfePfr3xJT1LvPpe5fqgfl2cPVUs0S8A+gdNeZ/egPwXv8FI/b2MxjEFdTd9vE1acD8IA7XPOzo NFKDE3x1iqsLjWtGVP90aCu4wWeGrhxO2NaKJy0NnN7/gRwKbwOKmB0KgGNiuSt2eOQL6Kr6RDT R/DLFNYxSCUZni8YicHdNEeaL0NtmsnkG0fDnqhXo7HpCNzjmFAbbVBBUYTwP/+Ss8YF8tEI= X-Google-Smtp-Source: AGHT+IHtgmcetgAasTjSGahfxTq8wxDbfKRmBm/6OyCSubibAd2apkjgr6mO63InzxT2J+IC7LhB/g== X-Received: by 2002:a05:6402:26c7:b0:602:1d01:286a with SMTP id 4fb4d7f45d1cf-6149b40cdb8mr3203669a12.6.1753292211357; Wed, 23 Jul 2025 10:36:51 -0700 (PDT) Received: from cloudflare.com (79.184.149.187.ipv4.supernova.orange.pl. [79.184.149.187]) by smtp.gmail.com with ESMTPSA id 4fb4d7f45d1cf-612c8f33964sm8728306a12.18.2025.07.23.10.36.50 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 23 Jul 2025 10:36:50 -0700 (PDT) From: Jakub Sitnicki Date: Wed, 23 Jul 2025 19:36:46 +0200 Subject: [PATCH bpf-next v4 1/8] bpf: Add dynptr type for skb metadata Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Message-Id: <20250723-skb-metadata-thru-dynptr-v4-1-a0fed48bcd37@cloudflare.com> References: <20250723-skb-metadata-thru-dynptr-v4-0-a0fed48bcd37@cloudflare.com> In-Reply-To: <20250723-skb-metadata-thru-dynptr-v4-0-a0fed48bcd37@cloudflare.com> To: bpf@vger.kernel.org Cc: Alexei Starovoitov , Andrii Nakryiko , Arthur Fabre , Daniel Borkmann , Eduard Zingerman , Eric Dumazet , Jakub Kicinski , Jesper Dangaard Brouer , Jesse Brandeburg , Joanne Koong , Lorenzo Bianconi , Martin KaFai Lau , =?utf-8?q?Toke_H=C3=B8iland-J=C3=B8rgensen?= , Yan Zhai , kernel-team@cloudflare.com, netdev@vger.kernel.org, Stanislav Fomichev X-Mailer: b4 0.15-dev-07fe9 Add a dynptr type, similar to skb dynptr, but for the skb metadata access. The dynptr provides an alternative to __sk_buff->data_meta for accessing the custom metadata area allocated using the bpf_xdp_adjust_meta() helper. More importantly, it abstracts away the fact where the storage for the custom metadata lives, which opens up the way to persist the metadata by relocating it as the skb travels through the network stack layers. A notable difference between the skb and the skb_meta dynptr is that writes to the skb_meta dynptr don't invalidate either skb or skb_meta dynptr slices, since they cannot lead to a skb->head reallocation. skb_meta dynptr ops are stubbed out and implemented by subsequent changes. Only the program types which can access __sk_buff->data_meta today are allowed to create a dynptr for skb metadata at the moment. We need to modify the network stack to persist the metadata across layers before opening up access to other BPF hooks. Once more BPF hooks gain access to skb_meta dynptr, we will also need to add a read-only variant of the helper similar to bpf_dynptr_from_skb_rdonly. Signed-off-by: Jakub Sitnicki --- include/linux/bpf.h | 7 ++++++- kernel/bpf/helpers.c | 7 +++++++ kernel/bpf/log.c | 2 ++ kernel/bpf/verifier.c | 12 +++++++++++- net/core/filter.c | 27 +++++++++++++++++++++++++++ 5 files changed, 53 insertions(+), 2 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f9cd2164ed23..49ddcf17fb4c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -749,12 +749,15 @@ enum bpf_type_flag { */ MEM_WRITE = BIT(18 + BPF_BASE_TYPE_BITS), + /* DYNPTR points to skb_metadata_end()-skb_metadata_len() */ + DYNPTR_TYPE_SKB_META = BIT(19 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; #define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF | DYNPTR_TYPE_SKB \ - | DYNPTR_TYPE_XDP) + | DYNPTR_TYPE_XDP | DYNPTR_TYPE_SKB_META) /* Max number of base types. */ #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) @@ -1348,6 +1351,8 @@ enum bpf_dynptr_type { BPF_DYNPTR_TYPE_SKB, /* Underlying data is a xdp_buff */ BPF_DYNPTR_TYPE_XDP, + /* Points to skb_metadata_end()-skb_metadata_len() */ + BPF_DYNPTR_TYPE_SKB_META, }; int bpf_dynptr_check_size(u32 size); diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 6b4877e85a68..9552b32208c5 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -1780,6 +1780,8 @@ static int __bpf_dynptr_read(void *dst, u32 len, const struct bpf_dynptr_kern *s return __bpf_skb_load_bytes(src->data, src->offset + offset, dst, len); case BPF_DYNPTR_TYPE_XDP: return __bpf_xdp_load_bytes(src->data, src->offset + offset, dst, len); + case BPF_DYNPTR_TYPE_SKB_META: + return -EOPNOTSUPP; /* not implemented */ default: WARN_ONCE(true, "bpf_dynptr_read: unknown dynptr type %d\n", type); return -EFAULT; @@ -1836,6 +1838,8 @@ int __bpf_dynptr_write(const struct bpf_dynptr_kern *dst, u32 offset, void *src, if (flags) return -EINVAL; return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len); + case BPF_DYNPTR_TYPE_SKB_META: + return -EOPNOTSUPP; /* not implemented */ default: WARN_ONCE(true, "bpf_dynptr_write: unknown dynptr type %d\n", type); return -EFAULT; @@ -1882,6 +1886,7 @@ BPF_CALL_3(bpf_dynptr_data, const struct bpf_dynptr_kern *, ptr, u32, offset, u3 return (unsigned long)(ptr->data + ptr->offset + offset); case BPF_DYNPTR_TYPE_SKB: case BPF_DYNPTR_TYPE_XDP: + case BPF_DYNPTR_TYPE_SKB_META: /* skb and xdp dynptrs should use bpf_dynptr_slice / bpf_dynptr_slice_rdwr */ return 0; default: @@ -2710,6 +2715,8 @@ __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr *p, u32 offset, bpf_xdp_copy_buf(ptr->data, ptr->offset + offset, buffer__opt, len, false); return buffer__opt; } + case BPF_DYNPTR_TYPE_SKB_META: + return NULL; /* not implemented */ default: WARN_ONCE(true, "unknown dynptr type %d\n", type); return NULL; diff --git a/kernel/bpf/log.c b/kernel/bpf/log.c index 38050f4ee400..e4983c1303e7 100644 --- a/kernel/bpf/log.c +++ b/kernel/bpf/log.c @@ -498,6 +498,8 @@ const char *dynptr_type_str(enum bpf_dynptr_type type) return "skb"; case BPF_DYNPTR_TYPE_XDP: return "xdp"; + case BPF_DYNPTR_TYPE_SKB_META: + return "skb_meta"; case BPF_DYNPTR_TYPE_INVALID: return ""; default: diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index e2fcea860755..c21e2cd63c83 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -674,6 +674,8 @@ static enum bpf_dynptr_type arg_to_dynptr_type(enum bpf_arg_type arg_type) return BPF_DYNPTR_TYPE_SKB; case DYNPTR_TYPE_XDP: return BPF_DYNPTR_TYPE_XDP; + case DYNPTR_TYPE_SKB_META: + return BPF_DYNPTR_TYPE_SKB_META; default: return BPF_DYNPTR_TYPE_INVALID; } @@ -690,6 +692,8 @@ static enum bpf_type_flag get_dynptr_type_flag(enum bpf_dynptr_type type) return DYNPTR_TYPE_SKB; case BPF_DYNPTR_TYPE_XDP: return DYNPTR_TYPE_XDP; + case BPF_DYNPTR_TYPE_SKB_META: + return DYNPTR_TYPE_SKB_META; default: return 0; } @@ -2274,7 +2278,8 @@ static bool reg_is_pkt_pointer_any(const struct bpf_reg_state *reg) static bool reg_is_dynptr_slice_pkt(const struct bpf_reg_state *reg) { return base_type(reg->type) == PTR_TO_MEM && - (reg->type & DYNPTR_TYPE_SKB || reg->type & DYNPTR_TYPE_XDP); + (reg->type & + (DYNPTR_TYPE_SKB | DYNPTR_TYPE_XDP | DYNPTR_TYPE_SKB_META)); } /* Unmodified PTR_TO_PACKET[_META,_END] register from ctx access. */ @@ -12189,6 +12194,7 @@ enum special_kfunc_type { KF_bpf_rbtree_right, KF_bpf_dynptr_from_skb, KF_bpf_dynptr_from_xdp, + KF_bpf_dynptr_from_skb_meta, KF_bpf_dynptr_slice, KF_bpf_dynptr_slice_rdwr, KF_bpf_dynptr_clone, @@ -12238,9 +12244,11 @@ BTF_ID(func, bpf_rbtree_right) #ifdef CONFIG_NET BTF_ID(func, bpf_dynptr_from_skb) BTF_ID(func, bpf_dynptr_from_xdp) +BTF_ID(func, bpf_dynptr_from_skb_meta) #else BTF_ID_UNUSED BTF_ID_UNUSED +BTF_ID_UNUSED #endif BTF_ID(func, bpf_dynptr_slice) BTF_ID(func, bpf_dynptr_slice_rdwr) @@ -13214,6 +13222,8 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_ dynptr_arg_type |= DYNPTR_TYPE_SKB; } else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_xdp]) { dynptr_arg_type |= DYNPTR_TYPE_XDP; + } else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb_meta]) { + dynptr_arg_type |= DYNPTR_TYPE_SKB_META; } else if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_clone] && (dynptr_arg_type & MEM_UNINIT)) { enum bpf_dynptr_type parent_type = meta->initialized_dynptr.type; diff --git a/net/core/filter.c b/net/core/filter.c index 7a72f766aacf..0755dfc0fc2f 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -11995,6 +11995,22 @@ __bpf_kfunc int bpf_dynptr_from_skb(struct __sk_buff *s, u64 flags, return 0; } +__bpf_kfunc int bpf_dynptr_from_skb_meta(struct __sk_buff *skb_, u64 flags, + struct bpf_dynptr *ptr__uninit) +{ + struct bpf_dynptr_kern *ptr = (struct bpf_dynptr_kern *)ptr__uninit; + struct sk_buff *skb = (struct sk_buff *)skb_; + + if (flags) { + bpf_dynptr_set_null(ptr); + return -EINVAL; + } + + bpf_dynptr_init(ptr, skb, BPF_DYNPTR_TYPE_SKB_META, 0, skb_metadata_len(skb)); + + return 0; +} + __bpf_kfunc int bpf_dynptr_from_xdp(struct xdp_md *x, u64 flags, struct bpf_dynptr *ptr__uninit) { @@ -12169,6 +12185,10 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_skb) BTF_ID_FLAGS(func, bpf_dynptr_from_skb, KF_TRUSTED_ARGS) BTF_KFUNCS_END(bpf_kfunc_check_set_skb) +BTF_KFUNCS_START(bpf_kfunc_check_set_skb_meta) +BTF_ID_FLAGS(func, bpf_dynptr_from_skb_meta, KF_TRUSTED_ARGS) +BTF_KFUNCS_END(bpf_kfunc_check_set_skb_meta) + BTF_KFUNCS_START(bpf_kfunc_check_set_xdp) BTF_ID_FLAGS(func, bpf_dynptr_from_xdp) BTF_KFUNCS_END(bpf_kfunc_check_set_xdp) @@ -12190,6 +12210,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_skb = { .set = &bpf_kfunc_check_set_skb, }; +static const struct btf_kfunc_id_set bpf_kfunc_set_skb_meta = { + .owner = THIS_MODULE, + .set = &bpf_kfunc_check_set_skb_meta, +}; + static const struct btf_kfunc_id_set bpf_kfunc_set_xdp = { .owner = THIS_MODULE, .set = &bpf_kfunc_check_set_xdp, @@ -12225,6 +12250,8 @@ static int __init bpf_kfunc_init(void) ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL, &bpf_kfunc_set_skb); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_NETFILTER, &bpf_kfunc_set_skb); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_kfunc_set_skb); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_skb_meta); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_ACT, &bpf_kfunc_set_skb_meta); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, &bpf_kfunc_set_sock_addr); -- 2.43.0