From: Mahe Tardy <mahe.tardy@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf <bpf@vger.kernel.org>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>
Subject: Re: [PATCH bpf-next v1 3/4] bpf: add bpf_icmp_send_unreach cgroup_skb kfunc
Date: Fri, 11 Jul 2025 12:57:43 +0200
Message-ID: <aHDuJ5rNeMTnUSju@gmail.com>
In-Reply-To: <CAADnVQKq_-=N7eJoup6AqFngoocT+D02NF0md_3mi2Vcrw09nQ@mail.gmail.com>

On Thu, Jul 10, 2025 at 09:07:59AM -0700, Alexei Starovoitov wrote:
> On Thu, Jul 10, 2025 at 3:26 AM Mahe Tardy <mahe.tardy@gmail.com> wrote:
> >
> > This is needed in the context of Tetragon to provide improved feedback
> > (in contrast to just dropping packets) to east-west traffic when blocked
> > by policies using cgroup_skb programs.
> >
> > This reuse concepts from netfilter reject target codepath with the
> > differences that:
> > * Packets are cloned since the BPF user can still return SK_PASS from
> >   the cgroup_skb progs and the current skb need to stay untouched
> >   (cgroup_skb hooks only allow read-only skb payload).
> > * Since cgroup_skb programs are called late in the stack, checksums do
> >   not need to be computed or verified, and IPv4 fragmentation does not
> >   need to be checked (ip_local_deliver should take care of that
> >   earlier).
> >
> > Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
> > ---
> >  net/core/filter.c | 61 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 60 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index ab456bf1056e..9215f79e7690 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -85,6 +85,8 @@
> >  #include <linux/un.h>
> >  #include <net/xdp_sock_drv.h>
> >  #include <net/inet_dscp.h>
> > +#include <linux/icmp.h>
> > +#include <net/icmp.h>
> >
> >  #include "dev.h"
> >
> > @@ -12140,6 +12142,53 @@ __bpf_kfunc int bpf_sock_ops_enable_tx_tstamp(struct bpf_sock_ops_kern *skops,
> >         return 0;
> >  }
> >
> > +__bpf_kfunc int bpf_icmp_send_unreach(struct __sk_buff *__skb, int code)
> > +{
> > +       struct sk_buff *skb = (struct sk_buff *)__skb;
> > +       struct sk_buff *nskb;
> > +
> > +       switch (skb->protocol) {
> > +       case htons(ETH_P_IP):
> > +               if (code < 0 || code > NR_ICMP_UNREACH)
> > +                       return -EINVAL;
> > +
> > +               nskb = skb_clone(skb, GFP_ATOMIC);
> > +               if (!nskb)
> > +                       return -ENOMEM;
> > +
> > +               if (ip_route_reply_fetch_dst(nskb) < 0) {
> > +                       kfree_skb(nskb);
> > +                       return -EHOSTUNREACH;
> > +               }
> > +
> > +               icmp_send(nskb, ICMP_DEST_UNREACH, code, 0);
> > +               kfree_skb(nskb);
> > +               break;
> > +#if IS_ENABLED(CONFIG_IPV6)
> > +       case htons(ETH_P_IPV6):
> > +               if (code < 0 || code > ICMPV6_REJECT_ROUTE)
> > +                       return -EINVAL;
> > +
> > +               nskb = skb_clone(skb, GFP_ATOMIC);
> > +               if (!nskb)
> > +                       return -ENOMEM;
> > +
> > +               if (ip6_route_reply_fetch_dst(nskb) < 0) {
> > +                       kfree_skb(nskb);
> > +                       return -EHOSTUNREACH;
> > +               }
> > +
> > +               icmpv6_send(nskb, ICMPV6_DEST_UNREACH, code, 0);
> > +               kfree_skb(nskb);
> > +               break;
> > +#endif
> > +       default:
> > +               return -EPROTONOSUPPORT;
> > +       }
> > +
> > +       return 0;
> > +}
> > +
> >  __bpf_kfunc_end_defs();
> >
> >  int bpf_dynptr_from_skb_rdonly(struct __sk_buff *skb, u64 flags,
> > @@ -12177,6 +12226,10 @@ BTF_KFUNCS_START(bpf_kfunc_check_set_sock_ops)
> >  BTF_ID_FLAGS(func, bpf_sock_ops_enable_tx_tstamp, KF_TRUSTED_ARGS)
> >  BTF_KFUNCS_END(bpf_kfunc_check_set_sock_ops)
> >
> > +BTF_KFUNCS_START(bpf_kfunc_check_set_icmp_send_unreach)
> > +BTF_ID_FLAGS(func, bpf_icmp_send_unreach, KF_TRUSTED_ARGS)
> > +BTF_KFUNCS_END(bpf_kfunc_check_set_icmp_send_unreach)
> > +
> >  static const struct btf_kfunc_id_set bpf_kfunc_set_skb = {
> >         .owner = THIS_MODULE,
> >         .set = &bpf_kfunc_check_set_skb,
> > @@ -12202,6 +12255,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_sock_ops = {
> >         .set = &bpf_kfunc_check_set_sock_ops,
> >  };
> >
> > +static const struct btf_kfunc_id_set bpf_kfunc_set_icmp_send_unreach = {
> > +       .owner = THIS_MODULE,
> > +       .set = &bpf_kfunc_check_set_icmp_send_unreach,
> > +};
> > +
> >  static int __init bpf_kfunc_init(void)
> >  {
> >         int ret;
> > @@ -12221,7 +12279,8 @@ static int __init bpf_kfunc_init(void)
> >         ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR,
> >                                                &bpf_kfunc_set_sock_addr);
> >         ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SCHED_CLS, &bpf_kfunc_set_tcp_reqsk);
> > -       return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
> > +       ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_SOCK_OPS, &bpf_kfunc_set_sock_ops);
> > +       return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SKB, &bpf_kfunc_set_icmp_send_unreach);
> 
> Does it have to be restricted to BPF_PROG_TYPE_CGROUP_SKB ?
> Can it be a part of bpf_kfunc_set_skb[] and used more generally ?

Given the assumptions the kfunc currently relies on, yes, it has to be
restricted to cgroup_skb. Hooks that run earlier in the stack would need
additional checks, I think.

Keep in mind that this kfunc is not a necessity for prog types that can
already rewrite packets, such as TC.
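
To make the split concrete, here is a rough userspace model of the only
checks the v1 kfunc performs itself: the protocol and the code range.
Everything it skips (checksums, IPv4 fragmentation) is what relies on the
cgroup_skb hook position. This is a sketch, not the kernel code:
model_icmp_unreach_check() and the MODEL_* constants are made-up names
with values copied from the uapi headers, the EtherType is taken in host
byte order, IPv6 is assumed to be enabled, and the clone, the route lookup
and the actual icmp_send()/icmpv6_send() calls are left out.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for uapi constants (values copied, names hypothetical). */
#define MODEL_ETH_P_IP            0x0800 /* ETH_P_IP */
#define MODEL_ETH_P_IPV6          0x86DD /* ETH_P_IPV6 */
#define MODEL_NR_ICMP_UNREACH     15     /* highest ICMPv4 unreach code */
#define MODEL_ICMPV6_REJECT_ROUTE 6      /* highest ICMPv6 unreach code */

/*
 * Model of the argument checks only. proto is the EtherType in host
 * byte order (the kfunc compares skb->protocol in network order).
 * Returns 0 when the kfunc would go on to clone the skb and send.
 */
static int model_icmp_unreach_check(uint16_t proto, int code)
{
	switch (proto) {
	case MODEL_ETH_P_IP:
		if (code < 0 || code > MODEL_NR_ICMP_UNREACH)
			return -EINVAL;
		return 0;
	case MODEL_ETH_P_IPV6:
		if (code < 0 || code > MODEL_ICMPV6_REJECT_ROUTE)
			return -EINVAL;
		return 0;
	default:
		/* Anything that is not IPv4 or IPv6 is refused. */
		return -EPROTONOSUPPORT;
	}
}
```

For example, model_icmp_unreach_check(MODEL_ETH_P_IP, 3) accepts the
IPv4 port-unreachable code, while code 7 on IPv6 is rejected with -EINVAL.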
 
> If restriction is necessary then I guess we can live with extra
> bpf_kfunc_set_icmp_send_unreach, though it's odd to create a set
> just for one kfunc.
> Either way don't change the last 'return ...' line in this file.
> Add 'ret = ret ?: register...' instead to reduce churn.
> 
> Also cc netdev and netfilter maintainers in v2.

Yes to both.
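
For the return line, the requested pattern can be sketched in userspace
as below. The names model_register() and model_init() are hypothetical
stand-ins for register_btf_kfunc_id_set() and bpf_kfunc_init(); the code
relies on the GNU `?:` extension (omitted middle operand), which GCC and
Clang accept. Once ret is non-zero, the remaining calls are skipped and
the first error is what gets returned.

```c
#include <errno.h>

/* Counts how many (modelled) registrations actually ran. */
static int model_calls;

/* Hypothetical stand-in for a registration call: returns a preset result. */
static int model_register(int result)
{
	model_calls++;
	return result;
}

/*
 * Each registration runs only if every previous one returned 0.
 * A new set is added as its own "ret = ret ?: ..." line, so the
 * final "return ret ?: ..." line does not have to change.
 */
static int model_init(int r1, int r2, int r3)
{
	int ret;

	model_calls = 0;
	ret = model_register(r1);
	ret = ret ?: model_register(r2); /* newly added registration */
	return ret ?: model_register(r3); /* last line left untouched */
}
```

With model_init(0, -ENOMEM, 0), the third registration is never
attempted and -ENOMEM is returned.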

As an aside, could I have your opinion on this part of the cover letter
before I go ahead and fix these patches:

> Other design ideas (to prevent above issues) could be:
> * Extend the return codes for the cgroup_skb program to trigger the
>  reject after completion (SK_REJECT).
> * Adding a kfunc to set the kernel to send an ICMP_HOST_UNREACH control
>  message with appropriate code when the cgroup_skb program eventually
>  terminates with SK_DROP.
> 
> We should bear in mind that we want to extend this with TCP reset next.
> Please tell me what's your opinion on above ideas: if adding new return
> codes could be considered and/or the other alternatives would be better
> than this patch series and thus proposed instead.

These two ideas would feel more natural for cgroup_skb progs, but they
would prevent anyone from extending this to more prog types in the future.
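
As an illustration of the second idea, here is a toy userspace model, not
kernel code: a hypothetical kfunc only records the requested code, and the
modelled hook sends the ICMP message only if the program ends with
SK_DROP. All names (model_ctx, model_set_reject(), model_hook_finish())
are made up for this sketch; only the SK_DROP/SK_PASS values mirror the
uapi enum.

```c
#include <stdbool.h>

/* Mirrors the uapi values SK_DROP = 0 and SK_PASS = 1. */
enum model_verdict { MODEL_SK_DROP = 0, MODEL_SK_PASS = 1 };

struct model_ctx {
	bool reject_requested; /* set by the hypothetical kfunc */
	int  reject_code;      /* code requested by the program */
	int  icmp_sent;        /* number of modelled ICMP messages */
	int  last_code;        /* code of the last modelled message */
};

/* Hypothetical kfunc: records the request, sends nothing yet. */
static void model_set_reject(struct model_ctx *ctx, int code)
{
	ctx->reject_requested = true;
	ctx->reject_code = code;
}

/* Modelled hook epilogue: act on the request only for SK_DROP. */
static void model_hook_finish(struct model_ctx *ctx, enum model_verdict v)
{
	if (v == MODEL_SK_DROP && ctx->reject_requested) {
		ctx->icmp_sent++;
		ctx->last_code = ctx->reject_code;
	}
	/* The request is per packet, so clear it either way. */
	ctx->reject_requested = false;
}
```

In this model a program that requests a reject and then returns SK_PASS
produces no ICMP message, so the packet would not need to be cloned
up front.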

