* [PATCH v4 bpf 0/2] bpf: return proper error codes for lwt redirect @ 2023-07-26 1:07 Yan Zhai 2023-07-26 1:08 ` [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values Yan Zhai 2023-07-26 1:09 ` [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases Yan Zhai 0 siblings, 2 replies; 22+ messages in thread From: Yan Zhai @ 2023-07-26 1:07 UTC (permalink / raw) To: bpf Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, Yan Zhai, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki lwt xmit hook does not expect positive return values in function ip_finish_output2 and ip6_finish_output2. However, BPF redirect programs can return positive values such like NET_XMIT_DROP, NET_RX_DROP, and etc as errors. Such return values can panic the kernel unexpectedly: https://gist.github.com/zhaiyan920/8fbac245b261fe316a7ef04c9b1eba48 This patch fixes the return values from BPF redirect, so the error handling would be consistent at xmit hook. It also adds a few test cases to prevent future regressions. v3: https://lore.kernel.org/bpf/cover.1690255889.git.yan@cloudflare.com/ v2: https://lore.kernel.org/netdev/ZLdY6JkWRccunvu0@debian.debian/ v1: https://lore.kernel.org/bpf/ZLbYdpWC8zt9EJtq@debian.debian/ changes since v3: * minor change in commit message and changelogs * tested by Jakub Sitnicki changes since v2: * subject name changed * also covered redirect to ingress case * added selftests changes since v1: * minor code style changes Yan Zhai (2): bpf: fix skb_do_redirect return values bpf: selftests: add lwt redirect regression test cases include/linux/netdevice.h | 2 + net/core/filter.c | 9 +- tools/testing/selftests/bpf/Makefile | 1 + .../selftests/bpf/progs/test_lwt_redirect.c | 66 +++++++ .../selftests/bpf/test_lwt_redirect.sh | 174 ++++++++++++++++++ 5 files changed, 250 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_redirect.c create mode 100755 tools/testing/selftests/bpf/test_lwt_redirect.sh -- 2.30.2 ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-26 1:07 [PATCH v4 bpf 0/2] bpf: return proper error codes for lwt redirect Yan Zhai @ 2023-07-26 1:08 ` Yan Zhai 2023-07-26 12:25 ` Jakub Sitnicki ` (2 more replies) 2023-07-26 1:09 ` [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases Yan Zhai 1 sibling, 3 replies; 22+ messages in thread From: Yan Zhai @ 2023-07-26 1:08 UTC (permalink / raw) To: bpf Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, Yan Zhai, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki skb_do_redirect returns various of values: error code (negative), 0 (success), and some positive status code, e.g. NET_XMIT_CN, NET_RX_DROP. Commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") didn't check the return code correctly, so positive values are propagated back along call chain: ip_finish_output2 -> bpf_xmit -> run_lwt_bpf -> skb_do_redirect Inside ip_finish_output2, redirected skb will continue to neighbor subsystem as if LWTUNNEL_XMIT_CONTINUE is returned, despite that this skb could have been freed. The bug can trigger use-after-free warning and crashes kernel afterwards: https://gist.github.com/zhaiyan920/8fbac245b261fe316a7ef04c9b1eba48 Convert positive statuses from skb_do_redirect eliminates this issue. Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Suggested-by: Markus Elfring <Markus.Elfring@web.de> Suggested-by: Stanislav Fomichev <sdf@google.com> Reported-by: Jordan Griege <jgriege@cloudflare.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> --- include/linux/netdevice.h | 2 ++ net/core/filter.c | 9 +++++++-- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b828c7a75be2..520d808eec15 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -87,6 +87,8 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev); #define NET_RX_SUCCESS 0 /* keep 'em coming, baby */ #define NET_RX_DROP 1 /* packet dropped */ +#define net_rx_errno(e) ((e) == NET_RX_DROP ? -ENOBUFS : (e)) + #define MAX_NEST_DEV 8 /* diff --git a/net/core/filter.c b/net/core/filter.c index 06ba0e56e369..564104543737 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2095,7 +2095,9 @@ static const struct bpf_func_proto bpf_csum_level_proto = { static inline int __bpf_rx_skb(struct net_device *dev, struct sk_buff *skb) { - return dev_forward_skb_nomtu(dev, skb); + int ret = dev_forward_skb_nomtu(dev, skb); + + return net_rx_errno(ret); } static inline int __bpf_rx_skb_no_mac(struct net_device *dev, @@ -2108,7 +2110,7 @@ static inline int __bpf_rx_skb_no_mac(struct net_device *dev, ret = netif_rx(skb); } - return ret; + return net_rx_errno(ret); } static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) @@ -2129,6 +2131,9 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) ret = dev_queue_xmit(skb); dev_xmit_recursion_dec(); + if (unlikely(ret > 0)) + ret = net_xmit_errno(ret); + return ret; } -- 2.30.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-26 1:08 ` [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values Yan Zhai @ 2023-07-26 12:25 ` Jakub Sitnicki 2023-07-26 13:39 ` Dan Carpenter 2023-07-28 22:02 ` Martin KaFai Lau 2 siblings, 0 replies; 22+ messages in thread From: Jakub Sitnicki @ 2023-07-26 12:25 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring On Tue, Jul 25, 2023 at 06:08 PM -07, Yan Zhai wrote: > skb_do_redirect returns various of values: error code (negative), > 0 (success), and some positive status code, e.g. NET_XMIT_CN, > NET_RX_DROP. Commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel > infrastructure") didn't check the return code correctly, so positive > values are propagated back along call chain: > > ip_finish_output2 > -> bpf_xmit > -> run_lwt_bpf > -> skb_do_redirect > > Inside ip_finish_output2, redirected skb will continue to neighbor > subsystem as if LWTUNNEL_XMIT_CONTINUE is returned, despite that this > skb could have been freed. The bug can trigger use-after-free warning > and crashes kernel afterwards: > > https://gist.github.com/zhaiyan920/8fbac245b261fe316a7ef04c9b1eba48 > > Convert positive statuses from skb_do_redirect eliminates this issue. > > Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure") > Tested-by: Jakub Sitnicki <jakub@cloudflare.com> > Suggested-by: Markus Elfring <Markus.Elfring@web.de> > Suggested-by: Stanislav Fomichev <sdf@google.com> > Reported-by: Jordan Griege <jgriege@cloudflare.com> > Signed-off-by: Yan Zhai <yan@cloudflare.com> > --- Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-26 1:08 ` [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values Yan Zhai 2023-07-26 12:25 ` Jakub Sitnicki @ 2023-07-26 13:39 ` Dan Carpenter 2023-07-26 14:14 ` Yan Zhai 2023-07-28 22:02 ` Martin KaFai Lau 2 siblings, 1 reply; 22+ messages in thread From: Dan Carpenter @ 2023-07-26 13:39 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki I'm not positive I understand the code in ip_finish_output2(). I think instead of looking for LWTUNNEL_XMIT_DONE it should instead look for != LWTUNNEL_XMIT_CONTINUE. It's unfortunate that NET_XMIT_DROP and LWTUNNEL_XMIT_CONTINUE are the both 0x1. Why don't we just change that instead? Also there seems to be a leak in lwtunnel_xmit(). Should that return LWTUNNEL_XMIT_CONTINUE or should it call kfree_skb() before returning? Something like the following? diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 11652e464f5d..375790b672bc 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -112,6 +112,9 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev); #define NET_XMIT_CN 0x02 /* congestion notification */ #define NET_XMIT_MASK 0x0f /* qdisc flags in net/sch_generic.h */ +#define LWTUNNEL_XMIT_DONE NET_XMIT_SUCCESS +#define LWTUNNEL_XMIT_CONTINUE 0x3 + /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It * indicates that the device will soon be dropping packets, or already drops * some packets of the same priority; prompting us to send less aggressively. */ diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h index 6f15e6fa154e..8ab032ee04d0 100644 --- a/include/net/lwtunnel.h +++ b/include/net/lwtunnel.h @@ -16,12 +16,6 @@ #define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1) #define LWTUNNEL_STATE_XMIT_REDIRECT BIT(2) -enum { - LWTUNNEL_XMIT_DONE, - LWTUNNEL_XMIT_CONTINUE, -}; - - struct lwtunnel_state { __u16 type; __u16 flags; diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c index 711cd3b4347a..732415d1287d 100644 --- a/net/core/lwtunnel.c +++ b/net/core/lwtunnel.c @@ -371,7 +371,7 @@ int lwtunnel_xmit(struct sk_buff *skb) if (lwtstate->type == LWTUNNEL_ENCAP_NONE || lwtstate->type > LWTUNNEL_ENCAP_MAX) - return 0; + return LWTUNNEL_XMIT_CONTINUE; ret = -EOPNOTSUPP; rcu_read_lock(); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 6e70839257f7..4be50a211b14 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -216,7 +216,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s if (lwtunnel_xmit_redirect(dst->lwtstate)) { int res = lwtunnel_xmit(skb); - if (res < 0 || res == LWTUNNEL_XMIT_DONE) + if (res != LWTUNNEL_XMIT_CONTINUE) return res; } diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 1e8c90e97608..016b0a513259 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -113,7 +113,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff * if (lwtunnel_xmit_redirect(dst->lwtstate)) { int res = lwtunnel_xmit(skb); - if (res < 0 || res == LWTUNNEL_XMIT_DONE) + if (res != LWTUNNEL_XMIT_CONTINUE) return res; } ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-26 13:39 ` Dan Carpenter @ 2023-07-26 14:14 ` Yan Zhai 2023-07-26 15:01 ` Dan Carpenter 0 siblings, 1 reply; 22+ messages in thread From: Yan Zhai @ 2023-07-26 14:14 UTC (permalink / raw) To: Dan Carpenter Cc: Yan Zhai, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On Wed, Jul 26, 2023 at 04:39:08PM +0300, Dan Carpenter wrote: > I'm not positive I understand the code in ip_finish_output2(). I think > instead of looking for LWTUNNEL_XMIT_DONE it should instead look for > != LWTUNNEL_XMIT_CONTINUE. It's unfortunate that NET_XMIT_DROP and > LWTUNNEL_XMIT_CONTINUE are the both 0x1. Why don't we just change that > instead? > I considered about changing lwt side logic. But it would bring larger impact since there are multiple types of encaps on this hook, not just bpf redirect. Changing bpf return values is a minimum change on the other hand. In addition, returning value of NET_RX_DROP and NET_XMIT_CN are the same, so if we don't do something in bpf redirect, there is no way to distinguish them later: the former is considered as an error, while "CN" is considered as non-error. > Also there seems to be a leak in lwtunnel_xmit(). Should that return > LWTUNNEL_XMIT_CONTINUE or should it call kfree_skb() before returning? > > Something like the following? > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 11652e464f5d..375790b672bc 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -112,6 +112,9 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev); > #define NET_XMIT_CN 0x02 /* congestion notification */ > #define NET_XMIT_MASK 0x0f /* qdisc flags in net/sch_generic.h */ > > +#define LWTUNNEL_XMIT_DONE NET_XMIT_SUCCESS > +#define LWTUNNEL_XMIT_CONTINUE 0x3 > + > /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It > * indicates that the device will soon be dropping packets, or already drops > * some packets of the same priority; prompting us to send less aggressively. */ > diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h > index 6f15e6fa154e..8ab032ee04d0 100644 > --- a/include/net/lwtunnel.h > +++ b/include/net/lwtunnel.h > @@ -16,12 +16,6 @@ > #define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1) > #define LWTUNNEL_STATE_XMIT_REDIRECT BIT(2) > > -enum { > - LWTUNNEL_XMIT_DONE, > - LWTUNNEL_XMIT_CONTINUE, > -}; > - > - > struct lwtunnel_state { > __u16 type; > __u16 flags; > diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c > index 711cd3b4347a..732415d1287d 100644 > --- a/net/core/lwtunnel.c > +++ b/net/core/lwtunnel.c > @@ -371,7 +371,7 @@ int lwtunnel_xmit(struct sk_buff *skb) > > if (lwtstate->type == LWTUNNEL_ENCAP_NONE || > lwtstate->type > LWTUNNEL_ENCAP_MAX) > - return 0; > + return LWTUNNEL_XMIT_CONTINUE; You are correct this path would leak skb. Return continue (or drop) would avoid the leak. Personally I'd prefer drop instead to signal the error setup. Since this is a separate issue, do you want to send a separate patch on this? Or I am happy to do it if you prefer. > > ret = -EOPNOTSUPP; > rcu_read_lock(); > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > index 6e70839257f7..4be50a211b14 100644 > --- a/net/ipv4/ip_output.c > +++ b/net/ipv4/ip_output.c > @@ -216,7 +216,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s > if (lwtunnel_xmit_redirect(dst->lwtstate)) { > int res = lwtunnel_xmit(skb); > > - if (res < 0 || res == LWTUNNEL_XMIT_DONE) > + if (res != LWTUNNEL_XMIT_CONTINUE) > return res; Unfortunately we cannot return res directly here when res > 0. This is the final reason why I didn't patch here. Return values here can be propagated back to sendmsg syscall, so returning a positive value would break the syscall convention. best, Yan > } > > diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c > index 1e8c90e97608..016b0a513259 100644 > --- a/net/ipv6/ip6_output.c > +++ b/net/ipv6/ip6_output.c > @@ -113,7 +113,7 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff * > if (lwtunnel_xmit_redirect(dst->lwtstate)) { > int res = lwtunnel_xmit(skb); > > - if (res < 0 || res == LWTUNNEL_XMIT_DONE) > + if (res != LWTUNNEL_XMIT_CONTINUE) > return res; > } > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-26 14:14 ` Yan Zhai @ 2023-07-26 15:01 ` Dan Carpenter 2023-07-26 16:10 ` Yan Zhai 0 siblings, 1 reply; 22+ messages in thread From: Dan Carpenter @ 2023-07-26 15:01 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On Wed, Jul 26, 2023 at 07:14:56AM -0700, Yan Zhai wrote: > On Wed, Jul 26, 2023 at 04:39:08PM +0300, Dan Carpenter wrote: > > I'm not positive I understand the code in ip_finish_output2(). I think > > instead of looking for LWTUNNEL_XMIT_DONE it should instead look for > > != LWTUNNEL_XMIT_CONTINUE. It's unfortunate that NET_XMIT_DROP and > > LWTUNNEL_XMIT_CONTINUE are the both 0x1. Why don't we just change that > > instead? > > > I considered about changing lwt side logic. But it would bring larger > impact since there are multiple types of encaps on this hook, not just > bpf redirect. Changing bpf return values is a minimum change on the > other hand. In addition, returning value of NET_RX_DROP and > NET_XMIT_CN are the same, so if we don't do something in bpf redirect, > there is no way to distinguish them later: the former is considered as > an error, while "CN" is considered as non-error. Uh, NET_RX/XMIT_DROP values are 1. NET_XMIT_CN is 2. I'm not an expert but I think what happens is that we treat NET_XMIT_CN as success so that it takes a while for the resend to happen. Eventually the TCP layer will detect it as a dropped packet. > > > Also there seems to be a leak in lwtunnel_xmit(). Should that return > > LWTUNNEL_XMIT_CONTINUE or should it call kfree_skb() before returning? > > > > Something like the following? > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index 11652e464f5d..375790b672bc 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -112,6 +112,9 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev); > > #define NET_XMIT_CN 0x02 /* congestion notification */ > > #define NET_XMIT_MASK 0x0f /* qdisc flags in net/sch_generic.h */ > > > > +#define LWTUNNEL_XMIT_DONE NET_XMIT_SUCCESS > > +#define LWTUNNEL_XMIT_CONTINUE 0x3 > > + > > /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It > > * indicates that the device will soon be dropping packets, or already drops > > * some packets of the same priority; prompting us to send less aggressively. */ > > diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h > > index 6f15e6fa154e..8ab032ee04d0 100644 > > --- a/include/net/lwtunnel.h > > +++ b/include/net/lwtunnel.h > > @@ -16,12 +16,6 @@ > > #define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1) > > #define LWTUNNEL_STATE_XMIT_REDIRECT BIT(2) > > > > -enum { > > - LWTUNNEL_XMIT_DONE, > > - LWTUNNEL_XMIT_CONTINUE, > > -}; > > - > > - > > struct lwtunnel_state { > > __u16 type; > > __u16 flags; > > diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c > > index 711cd3b4347a..732415d1287d 100644 > > --- a/net/core/lwtunnel.c > > +++ b/net/core/lwtunnel.c > > @@ -371,7 +371,7 @@ int lwtunnel_xmit(struct sk_buff *skb) > > > > if (lwtstate->type == LWTUNNEL_ENCAP_NONE || > > lwtstate->type > LWTUNNEL_ENCAP_MAX) > > - return 0; > > + return LWTUNNEL_XMIT_CONTINUE; > > You are correct this path would leak skb. Return continue (or drop) > would avoid the leak. Personally I'd prefer drop instead to signal the > error setup. Since this is a separate issue, do you want to send a > separate patch on this? Or I am happy to do it if you prefer. > I don't know which makes sense so I'll leave that up to you. > > > > ret = -EOPNOTSUPP; > > rcu_read_lock(); > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > > index 6e70839257f7..4be50a211b14 100644 > > --- a/net/ipv4/ip_output.c > > +++ b/net/ipv4/ip_output.c > > @@ -216,7 +216,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s > > if (lwtunnel_xmit_redirect(dst->lwtstate)) { > > int res = lwtunnel_xmit(skb); > > > > - if (res < 0 || res == LWTUNNEL_XMIT_DONE) > > + if (res != LWTUNNEL_XMIT_CONTINUE) > > return res; > > Unfortunately we cannot return res directly here when res > 0. This is > the final reason why I didn't patch here. Return values here can be > propagated back to sendmsg syscall, so returning a positive value > would break the syscall convention. The neigh_output() function is going to return NET_XMIT_DROP so this already happens. Is that not what we want to happen? I guess my concern is that eventually people will eventually new introduce bugs. Fixing incorrect error codes is something that I do several times per week. :P regards, dan carpenter ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-26 15:01 ` Dan Carpenter @ 2023-07-26 16:10 ` Yan Zhai 2023-07-26 16:53 ` Dan Carpenter 0 siblings, 1 reply; 22+ messages in thread From: Yan Zhai @ 2023-07-26 16:10 UTC (permalink / raw) To: Dan Carpenter Cc: Yan Zhai, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On Wed, Jul 26, 2023 at 06:01:00PM +0300, Dan Carpenter wrote: > On Wed, Jul 26, 2023 at 07:14:56AM -0700, Yan Zhai wrote: > > On Wed, Jul 26, 2023 at 04:39:08PM +0300, Dan Carpenter wrote: > > > I'm not positive I understand the code in ip_finish_output2(). I think > > > instead of looking for LWTUNNEL_XMIT_DONE it should instead look for > > > != LWTUNNEL_XMIT_CONTINUE. It's unfortunate that NET_XMIT_DROP and > > > LWTUNNEL_XMIT_CONTINUE are the both 0x1. Why don't we just change that > > > instead? > > > > > I considered about changing lwt side logic. But it would bring larger > > impact since there are multiple types of encaps on this hook, not just > > bpf redirect. Changing bpf return values is a minimum change on the > > other hand. In addition, returning value of NET_RX_DROP and > > NET_XMIT_CN are the same, so if we don't do something in bpf redirect, > > there is no way to distinguish them later: the former is considered as > > an error, while "CN" is considered as non-error. > > Uh, NET_RX/XMIT_DROP values are 1. NET_XMIT_CN is 2. > > I'm not an expert but I think what happens is that we treat NET_XMIT_CN > as success so that it takes a while for the resend to happen. > Eventually the TCP layer will detect it as a dropped packet. > My eyes slipped lines. CN is 2. But the fact RX return value can be returned on a TX path still makes me feel unclean. Odds are low that we will have new statuses in future, it is a risk. I'd hope to contain these values only inside BPF redirect code as they are the reason why such rx values can show up there. Meanwhile, your argument do make good sense to me that the same problem may occur for other stuff. It is true. In fact, I just re-examined BPF-REROUTE path, it has the exact same issue by directly sending dst_output value back. So I would propose to do two things: 1. still convert BPF redirect ingress code to contain the propagation of mixed return. Return only TX side value instead, which is also what majority of those local senders are expecting. (I was wrong about positive values returned to sendmsg below btw, they are not). 2. change LWTUNNEL_XMIT_CONTINUE and check for this at xmit hook. > > > > > Also there seems to be a leak in lwtunnel_xmit(). Should that return > > > LWTUNNEL_XMIT_CONTINUE or should it call kfree_skb() before returning? > > > > > > Something like the following? > > > > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > > index 11652e464f5d..375790b672bc 100644 > > > --- a/include/linux/netdevice.h > > > +++ b/include/linux/netdevice.h > > > @@ -112,6 +112,9 @@ void netdev_sw_irq_coalesce_default_on(struct net_device *dev); > > > #define NET_XMIT_CN 0x02 /* congestion notification */ > > > #define NET_XMIT_MASK 0x0f /* qdisc flags in net/sch_generic.h */ > > > > > > +#define LWTUNNEL_XMIT_DONE NET_XMIT_SUCCESS > > > +#define LWTUNNEL_XMIT_CONTINUE 0x3 > > > + > > > /* NET_XMIT_CN is special. It does not guarantee that this packet is lost. It > > > * indicates that the device will soon be dropping packets, or already drops > > > * some packets of the same priority; prompting us to send less aggressively. */ > > > diff --git a/include/net/lwtunnel.h b/include/net/lwtunnel.h > > > index 6f15e6fa154e..8ab032ee04d0 100644 > > > --- a/include/net/lwtunnel.h > > > +++ b/include/net/lwtunnel.h > > > @@ -16,12 +16,6 @@ > > > #define LWTUNNEL_STATE_INPUT_REDIRECT BIT(1) > > > #define LWTUNNEL_STATE_XMIT_REDIRECT BIT(2) > > > > > > -enum { > > > - LWTUNNEL_XMIT_DONE, > > > - LWTUNNEL_XMIT_CONTINUE, > > > -}; > > > - > > > - > > > struct lwtunnel_state { > > > __u16 type; > > > __u16 flags; > > > diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c > > > index 711cd3b4347a..732415d1287d 100644 > > > --- a/net/core/lwtunnel.c > > > +++ b/net/core/lwtunnel.c > > > @@ -371,7 +371,7 @@ int lwtunnel_xmit(struct sk_buff *skb) > > > > > > if (lwtstate->type == LWTUNNEL_ENCAP_NONE || > > > lwtstate->type > LWTUNNEL_ENCAP_MAX) > > > - return 0; > > > + return LWTUNNEL_XMIT_CONTINUE; > > > > You are correct this path would leak skb. Return continue (or drop) > > would avoid the leak. Personally I'd prefer drop instead to signal the > > error setup. Since this is a separate issue, do you want to send a > > separate patch on this? Or I am happy to do it if you prefer. > > > > I don't know which makes sense so I'll leave that up to you. > This conversation is juicy, I think we discovered two potential new problem sites (the leak here and the reroute path) :) > > > > > > ret = -EOPNOTSUPP; > > > rcu_read_lock(); > > > diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c > > > index 6e70839257f7..4be50a211b14 100644 > > > --- a/net/ipv4/ip_output.c > > > +++ b/net/ipv4/ip_output.c > > > @@ -216,7 +216,7 @@ static int ip_finish_output2(struct net *net, struct sock *sk, struct sk_buff *s > > > if (lwtunnel_xmit_redirect(dst->lwtstate)) { > > > int res = lwtunnel_xmit(skb); > > > > > > - if (res < 0 || res == LWTUNNEL_XMIT_DONE) > > > + if (res != LWTUNNEL_XMIT_CONTINUE) > > > return res; > > > > Unfortunately we cannot return res directly here when res > 0. This is > > the final reason why I didn't patch here. Return values here can be > > propagated back to sendmsg syscall, so returning a positive value > > would break the syscall convention. > > The neigh_output() function is going to return NET_XMIT_DROP so this > already happens. Is that not what we want to happen? > My bad, those return values are processed at ip_send_skb etc, while I was staring only at ip_local_out and beneath with my sleepy eyes. > I guess my concern is that eventually people will eventually new > introduce bugs. Fixing incorrect error codes is something that I do > several times per week. :P > > regards, > dan carpenter > > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-26 16:10 ` Yan Zhai @ 2023-07-26 16:53 ` Dan Carpenter 2023-07-31 14:26 ` Dan Carpenter 0 siblings, 1 reply; 22+ messages in thread From: Dan Carpenter @ 2023-07-26 16:53 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On Wed, Jul 26, 2023 at 09:10:20AM -0700, Yan Zhai wrote: > On Wed, Jul 26, 2023 at 06:01:00PM +0300, Dan Carpenter wrote: > > On Wed, Jul 26, 2023 at 07:14:56AM -0700, Yan Zhai wrote: > > > On Wed, Jul 26, 2023 at 04:39:08PM +0300, Dan Carpenter wrote: > > > > I'm not positive I understand the code in ip_finish_output2(). I think > > > > instead of looking for LWTUNNEL_XMIT_DONE it should instead look for > > > > != LWTUNNEL_XMIT_CONTINUE. It's unfortunate that NET_XMIT_DROP and > > > > LWTUNNEL_XMIT_CONTINUE are the both 0x1. Why don't we just change that > > > > instead? > > > > > > > I considered about changing lwt side logic. But it would bring larger > > > impact since there are multiple types of encaps on this hook, not just > > > bpf redirect. Changing bpf return values is a minimum change on the > > > other hand. In addition, returning value of NET_RX_DROP and > > > NET_XMIT_CN are the same, so if we don't do something in bpf redirect, > > > there is no way to distinguish them later: the former is considered as > > > an error, while "CN" is considered as non-error. > > > > Uh, NET_RX/XMIT_DROP values are 1. NET_XMIT_CN is 2. > > > > I'm not an expert but I think what happens is that we treat NET_XMIT_CN > > as success so that it takes a while for the resend to happen. > > Eventually the TCP layer will detect it as a dropped packet. > > > My eyes slipped lines. CN is 2. But the fact RX return value can be > returned on a TX path still makes me feel unclean. Odds are low that > we will have new statuses in future, it is a risk. I'd hope to contain > these values only inside BPF redirect code as they are the reason why > such rx values can show up there. Meanwhile, your argument do make > good sense to me that the same problem may occur for other stuff. It > is true. In fact, I just re-examined BPF-REROUTE path, it has the > exact same issue by directly sending dst_output value back. > > So I would propose to do two things: > 1. still convert BPF redirect ingress code to contain the propagation > of mixed return. Return only TX side value instead, which is also what > majority of those local senders are expecting. (I was wrong about > positive values returned to sendmsg below btw, they are not). > > 2. change LWTUNNEL_XMIT_CONTINUE and check for this at xmit hook. > Sounds good! regards, dan carpenter ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-26 16:53 ` Dan Carpenter @ 2023-07-31 14:26 ` Dan Carpenter 2023-08-01 22:18 ` Yan Zhai 0 siblings, 1 reply; 22+ messages in thread From: Dan Carpenter @ 2023-07-31 14:26 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki I'm not a networking person, but I was looking at some use after free static checker warnings. Apparently the rule with xmit functions is that if they return a value > 15 then that means the skb was not freed. Otherwise it's supposed to be freed. So like NETDEV_TX_BUSY is 0x10 so it's not freed. This is checked with using the dev_xmit_complete() function. So I feel like it would make sense for LWTUNNEL_XMIT_CONTINUE to return higher than 15. Because that's the bug right? The original code was assuming that everything besides LWTUNNEL_XMIT_DONE was freed. regards, dan carpenter ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-31 14:26 ` Dan Carpenter @ 2023-08-01 22:18 ` Yan Zhai 0 siblings, 0 replies; 22+ messages in thread From: Yan Zhai @ 2023-08-01 22:18 UTC (permalink / raw) To: Dan Carpenter Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On Mon, Jul 31, 2023 at 9:26 AM Dan Carpenter <dan.carpenter@linaro.org> wrote: > > I'm not a networking person, but I was looking at some use after free > static checker warnings. Did you refer to the gist I posted or something new? > > Apparently the rule with xmit functions is that if they return a value > > 15 then that means the skb was not freed. Otherwise it's supposed to > be freed. So like NETDEV_TX_BUSY is 0x10 so it's not freed. > > This is checked with using the dev_xmit_complete() function. So I feel > like it would make sense for LWTUNNEL_XMIT_CONTINUE to return higher > than 15. Yes I am adopting your suggestion in v5. Dealing with NETDEV_TX_BUSY would be left as another item (potentially more suited for netdev rather than bpf). Would be great to find a reproduction of memleak. > > Because that's the bug right? The original code was assuming that > everything besides LWTUNNEL_XMIT_DONE was freed. > > regards, > dan carpenter > -- Yan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-26 1:08 ` [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values Yan Zhai 2023-07-26 12:25 ` Jakub Sitnicki 2023-07-26 13:39 ` Dan Carpenter @ 2023-07-28 22:02 ` Martin KaFai Lau 2023-07-31 21:35 ` Yan Zhai 2 siblings, 1 reply; 22+ messages in thread From: Martin KaFai Lau @ 2023-07-28 22:02 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On 7/25/23 6:08 PM, Yan Zhai wrote: > skb_do_redirect returns various of values: error code (negative), > 0 (success), and some positive status code, e.g. NET_XMIT_CN, > NET_RX_DROP. Commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel > infrastructure") didn't check the return code correctly, so positive > values are propagated back along call chain: > > ip_finish_output2 > -> bpf_xmit > -> run_lwt_bpf > -> skb_do_redirect From looking at skb_do_redirect, the skb_do_redirect should have consumed the skb except for the -EAGAIN return value. afaik, -EAGAIN could only happen by using the bpf_redirect_peer helper. lwt does not have the bpf_redirect_peer helper available, so there is no -EAGAIN case in lwt. iow, skb_do_redirect should have always consumed the skb in lwt. or did I miss something? If that is the case, it feels like the fix should be in run_lwt_bpf() and the "if (ret == 0)" test in run_lwt_bpf() is unnecessary? ret = skb_do_redirect(skb); if (ret == 0) ret = BPF_REDIRECT; > > Inside ip_finish_output2, redirected skb will continue to neighbor > subsystem as if LWTUNNEL_XMIT_CONTINUE is returned, despite that this > skb could have been freed. The bug can trigger use-after-free warning > and crashes kernel afterwards: > > https://gist.github.com/zhaiyan920/8fbac245b261fe316a7ef04c9b1eba48 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-28 22:02 ` Martin KaFai Lau @ 2023-07-31 21:35 ` Yan Zhai 2023-07-31 22:11 ` Martin KaFai Lau 0 siblings, 1 reply; 22+ messages in thread From: Yan Zhai @ 2023-07-31 21:35 UTC (permalink / raw) To: Martin KaFai Lau Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On Fri, Jul 28, 2023 at 5:02 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 7/25/23 6:08 PM, Yan Zhai wrote: > > skb_do_redirect returns various of values: error code (negative), > > 0 (success), and some positive status code, e.g. NET_XMIT_CN, > > NET_RX_DROP. Commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel > > infrastructure") didn't check the return code correctly, so positive > > values are propagated back along call chain: > > > > ip_finish_output2 > > -> bpf_xmit > > -> run_lwt_bpf > > -> skb_do_redirect > > From looking at skb_do_redirect, the skb_do_redirect should have consumed the > skb except for the -EAGAIN return value. afaik, -EAGAIN could only happen by > using the bpf_redirect_peer helper. lwt does not have the bpf_redirect_peer > helper available, so there is no -EAGAIN case in lwt. iow, skb_do_redirect > should have always consumed the skb in lwt. or did I miss something? > > If that is the case, it feels like the fix should be in run_lwt_bpf() and the > "if (ret == 0)" test in run_lwt_bpf() is unnecessary? > > ret = skb_do_redirect(skb); > if (ret == 0) > ret = BPF_REDIRECT; > > Just fixing skb redirect return code won't be sufficient. I realized there are other return paths that need to be treated, e.g. bpf reroute path also directly returns dev_queue_xmit status. I plan to check for LWTUNNEL_XMIT_CONTINUE (and change it to a value that does not conflict with NET_RX_DROP and NET_XMIT_DROP) in the next revision. On the other hand, the return value of NETDEV_TX_BUSY is another hassle. As Dan suggested, packets might not have been freed when this is returned from drivers. The caller of dev_queue_xmit might need to free skb when this happens. Yan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-31 21:35 ` Yan Zhai @ 2023-07-31 22:11 ` Martin KaFai Lau 2023-07-31 23:01 ` Yan Zhai 0 siblings, 1 reply; 22+ messages in thread From: Martin KaFai Lau @ 2023-07-31 22:11 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On 7/31/23 2:35 PM, Yan Zhai wrote: > On Fri, Jul 28, 2023 at 5:02 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: >> >> On 7/25/23 6:08 PM, Yan Zhai wrote: >>> skb_do_redirect returns various of values: error code (negative), >>> 0 (success), and some positive status code, e.g. NET_XMIT_CN, >>> NET_RX_DROP. Commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel >>> infrastructure") didn't check the return code correctly, so positive >>> values are propagated back along call chain: >>> >>> ip_finish_output2 >>> -> bpf_xmit >>> -> run_lwt_bpf >>> -> skb_do_redirect >> >> From looking at skb_do_redirect, the skb_do_redirect should have consumed the >> skb except for the -EAGAIN return value. afaik, -EAGAIN could only happen by >> using the bpf_redirect_peer helper. lwt does not have the bpf_redirect_peer >> helper available, so there is no -EAGAIN case in lwt. iow, skb_do_redirect >> should have always consumed the skb in lwt. or did I miss something? >> >> If that is the case, it feels like the fix should be in run_lwt_bpf() and the >> "if (ret == 0)" test in run_lwt_bpf() is unnecessary? >> >> ret = skb_do_redirect(skb); >> if (ret == 0) >> ret = BPF_REDIRECT; >> >> > Just fixing skb redirect return code won't be sufficient. I realized > there are other return paths that need to be treated, e.g. bpf reroute > path also directly returns dev_queue_xmit status. I plan to check for > LWTUNNEL_XMIT_CONTINUE (and change it to a value that does not > conflict with NET_RX_DROP and NET_XMIT_DROP) in the next revision. On > the other hand, the return value of NETDEV_TX_BUSY is another hassle. I suspect we are talking about different things or I am still missing something. I was thinking skb_do_redirect() should have always consumed the skb and bpf_xmit should always return LWTUNNEL_XMIT_DONE also (instead of LWTUNNEL_XMIT_CONTINUE described in the this patch commit message). It is what sch_handle_egress() is doing also. Could you explain how is it different from the skb_do_redirect usage in sch_handle_egress() or you are suggesting the current sch_handle_egress() has the issue too also? > As Dan suggested, packets might not have been freed when this is > returned from drivers. The caller of dev_queue_xmit might need to free > skb when this happens. > > Yan ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-31 22:11 ` Martin KaFai Lau @ 2023-07-31 23:01 ` Yan Zhai 2023-07-31 23:52 ` Martin KaFai Lau 0 siblings, 1 reply; 22+ messages in thread From: Yan Zhai @ 2023-07-31 23:01 UTC (permalink / raw) To: Martin KaFai Lau Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On Mon, Jul 31, 2023 at 5:11 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > > On 7/31/23 2:35 PM, Yan Zhai wrote: > > On Fri, Jul 28, 2023 at 5:02 PM Martin KaFai Lau <martin.lau@linux.dev> wrote: > >> > >> On 7/25/23 6:08 PM, Yan Zhai wrote: > >>> skb_do_redirect returns various of values: error code (negative), > >>> 0 (success), and some positive status code, e.g. NET_XMIT_CN, > >>> NET_RX_DROP. Commit 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel > >>> infrastructure") didn't check the return code correctly, so positive > >>> values are propagated back along call chain: > >>> > >>> ip_finish_output2 > >>> -> bpf_xmit > >>> -> run_lwt_bpf > >>> -> skb_do_redirect > >> > >> From looking at skb_do_redirect, the skb_do_redirect should have consumed the > >> skb except for the -EAGAIN return value. afaik, -EAGAIN could only happen by > >> using the bpf_redirect_peer helper. lwt does not have the bpf_redirect_peer > >> helper available, so there is no -EAGAIN case in lwt. iow, skb_do_redirect > >> should have always consumed the skb in lwt. or did I miss something? > >> > >> If that is the case, it feels like the fix should be in run_lwt_bpf() and the > >> "if (ret == 0)" test in run_lwt_bpf() is unnecessary? > >> > >> ret = skb_do_redirect(skb); > >> if (ret == 0) > >> ret = BPF_REDIRECT; > >> > >> > > Just fixing skb redirect return code won't be sufficient. I realized > > there are other return paths that need to be treated, e.g. bpf reroute > > path also directly returns dev_queue_xmit status. I plan to check for > > LWTUNNEL_XMIT_CONTINUE (and change it to a value that does not > > conflict with NET_RX_DROP and NET_XMIT_DROP) in the next revision. On > > the other hand, the return value of NETDEV_TX_BUSY is another hassle. > > I suspect we are talking about different things or I am still missing something. > > I was thinking skb_do_redirect() should have always consumed the skb and > bpf_xmit should always return LWTUNNEL_XMIT_DONE also (instead of > LWTUNNEL_XMIT_CONTINUE described in the this patch commit message). It is what > sch_handle_egress() is doing also. Could you explain how is it different from > the skb_do_redirect usage in sch_handle_egress() or you are suggesting the > current sch_handle_egress() has the issue too also? > I think we were not on the same page. You are absolutely right that skb_do_redirect should consume the packet anyway. The difference between your proposal and this patch is that this patch returns errno or LWTUNNEL_XMIT_DONE, and yours does not even return errno. Both approaches fix the issue of "redirect to down device crashes the kernel". What I commented was an exact same issue at different location: BPF reroute may trigger the crash as well, since it also returns dev_queue_xmit status in bpf_xmit. Need to fix this, or instead fixing LWTUNNEL_XMIT_CONTINUE value and correct the behavior at lwtunnel_xmit rather than bpf_xmit. Yan > > > As Dan suggested, packets might not have been freed when this is > > returned from drivers. The caller of dev_queue_xmit might need to free > > skb when this happens. > > > > Yan > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values 2023-07-31 23:01 ` Yan Zhai @ 2023-07-31 23:52 ` Martin KaFai Lau 0 siblings, 0 replies; 22+ messages in thread From: Martin KaFai Lau @ 2023-07-31 23:52 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On 7/31/23 4:01 PM, Yan Zhai wrote: > What I commented was an exact same issue at different location: BPF > reroute may trigger the crash as well, since it also returns > dev_queue_xmit status in bpf_xmit. Need to fix this, or instead fixing > LWTUNNEL_XMIT_CONTINUE value and correct the behavior at lwtunnel_xmit > rather than bpf_xmit. Ah. I think I got it. You meant the bpf_lwt_xmit_reroute() / BPF_LWT_REROUTE case? It would be clearer if some of these names were quoted instead. "reroute" could mean many things. Please put details comment in v5. Thanks. > > Yan > >> >>> As Dan suggested, packets might not have been freed when this is >>> returned from drivers. The caller of dev_queue_xmit might need to free >>> skb when this happens. >>> >>> Yan >> ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases 2023-07-26 1:07 [PATCH v4 bpf 0/2] bpf: return proper error codes for lwt redirect Yan Zhai 2023-07-26 1:08 ` [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values Yan Zhai @ 2023-07-26 1:09 ` Yan Zhai [not found] ` <3ec61192-c65c-62cc-d073-d6111b08e690@web.de> ` (2 more replies) 1 sibling, 3 replies; 22+ messages in thread From: Yan Zhai @ 2023-07-26 1:09 UTC (permalink / raw) To: bpf Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, Yan Zhai, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki Tests BPF redirect at the lwt xmit hook to ensure error handling are safe, i.e. won't panic the kernel. Tested-by: Jakub Sitnicki <jakub@cloudflare.com> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Yan Zhai <yan@cloudflare.com> --- tools/testing/selftests/bpf/Makefile | 1 + .../selftests/bpf/progs/test_lwt_redirect.c | 66 +++++++ .../selftests/bpf/test_lwt_redirect.sh | 174 ++++++++++++++++++ 3 files changed, 241 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/test_lwt_redirect.c create mode 100755 tools/testing/selftests/bpf/test_lwt_redirect.sh diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile index 538df8fb8c42..e3a24d053793 100644 --- a/tools/testing/selftests/bpf/Makefile +++ b/tools/testing/selftests/bpf/Makefile @@ -66,6 +66,7 @@ TEST_PROGS := test_kmod.sh \ test_xdp_vlan_mode_generic.sh \ test_xdp_vlan_mode_native.sh \ test_lwt_ip_encap.sh \ + test_lwt_redirect.sh \ test_tcp_check_syncookie.sh \ test_tc_tunnel.sh \ test_tc_edt.sh \ diff --git a/tools/testing/selftests/bpf/progs/test_lwt_redirect.c b/tools/testing/selftests/bpf/progs/test_lwt_redirect.c new file mode 100644 index 000000000000..3674e101f68f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_lwt_redirect.c @@ -0,0 +1,66 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include "bpf_tracing_net.h" + +/* We don't care about whether the packet can be received by network stack. + * Just care if the packet is sent to the correct device at correct direction + * and not panic the kernel. + */ +static __always_inline int prepend_dummy_mac(struct __sk_buff *skb) +{ + char mac[] = {0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0xf, + 0xe, 0xd, 0xc, 0xb, 0xa, 0x08, 0x00}; + + if (bpf_skb_change_head(skb, ETH_HLEN, 0)) { + bpf_printk("%s: fail to change head", __func__); + return -1; + } + + if (bpf_skb_store_bytes(skb, 0, mac, sizeof(mac), 0)) { + bpf_printk("%s: fail to update mac", __func__); + return -1; + } + + return 0; +} + +SEC("redir_ingress") +int test_lwt_redirect_in(struct __sk_buff *skb) +{ + if (prepend_dummy_mac(skb)) + return BPF_DROP; + + bpf_printk("Redirect skb to link %d ingress", skb->mark); + return bpf_redirect(skb->mark, BPF_F_INGRESS); +} + +SEC("redir_egress") +int test_lwt_redirect_out(struct __sk_buff *skb) +{ + if (prepend_dummy_mac(skb)) + return BPF_DROP; + + bpf_printk("Redirect skb to link %d egress", skb->mark); + return bpf_redirect(skb->mark, 0); +} + +SEC("redir_egress_nomac") +int test_lwt_redirect_out_nomac(struct __sk_buff *skb) +{ + int ret = bpf_redirect(skb->mark, 0); + + bpf_printk("Redirect skb to link %d egress nomac: %d", skb->mark, ret); + return ret; +} + +SEC("redir_ingress_nomac") +int test_lwt_redirect_in_nomac(struct __sk_buff *skb) +{ + int ret = bpf_redirect(skb->mark, BPF_F_INGRESS); + + bpf_printk("Redirect skb to link %d ingress nomac: %d", skb->mark, ret); + return ret; +} + +char _license[] SEC("license") = "GPL"; diff --git a/tools/testing/selftests/bpf/test_lwt_redirect.sh b/tools/testing/selftests/bpf/test_lwt_redirect.sh new file mode 100755 index 000000000000..1b7b78b48174 --- /dev/null +++ b/tools/testing/selftests/bpf/test_lwt_redirect.sh @@ -0,0 +1,174 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# This regression test checks basic lwt redirect functionality, +# making sure the kernel would not crash when redirecting packets +# to a device, regardless its administration state: +# +# 1. redirect to a device egress/ingress should work normally +# 2. redirect to a device egress/ingress should not panic when target is down +# 3. redirect to a device egress/ingress should not panic when target carrier is down +# +# All test setup are simple: redirect ping packet via lwt xmit to cover above +# situations. We do not worry about specific device type, except for the two +# categories of devices that require MAC header and not require MAC header. For +# carrier down situation, we use a vlan device as upper link, and bring down its +# lower device. +# +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 +BPF_FILE="test_lwt_redirect.bpf.o" +INGRESS_REDIR_IP=2.2.2.2 +EGRESS_REDIR_IP=3.3.3.3 +INGRESS_REDIR_IP_NOMAC=4.4.4.4 +EGRESS_REDIR_IP_NOMAC=5.5.5.5 +PASS=0 +FAIL=0 + +readonly NS1="ns1-$(mktemp -u XXXXXX)" + +msg="skip all tests:" +if [ $UID != 0 ]; then + echo $msg please run this as root >&2 + exit $ksft_skip +fi + +get_ip_direction() +{ + case $1 in + $INGRESS_REDIR_IP|$INGRESS_REDIR_IP_NOMAC) + echo ingress + ;; + $EGRESS_REDIR_IP|$EGRESS_REDIR_IP_NOMAC) + echo egress + ;; + *) + echo bug + ;; + esac +} + +test_pass() +{ + local testname=$1 + local direction=`get_ip_direction $2` + shift 2 + echo "Pass: $testname $direction $@" + PASS=$((PASS + 1)) +} + +test_fail() +{ + local testname=$1 + local direction=`get_ip_direction $2` + shift 2 + echo "Fail: $testname $direction $@" + FAIL=$((FAIL + 1)) +} + +setup() +{ + ip netns add $NS1 + + ip -n $NS1 link set lo up + ip -n $NS1 link add link_err type dummy + ip -n $NS1 link add link_w_mac type dummy + ip -n $NS1 link add link link_w_mac link_upper type vlan id 1 + ip -n $NS1 link add link_wo_mac type gre remote 4.3.2.1 local 1.2.3.4 + ip -n $NS1 link set link_err up + ip -n $NS1 link set link_w_mac up + ip -n $NS1 link set link_upper up + ip -n $NS1 link set link_wo_mac up + + ip -n $NS1 addr add dev lo 1.1.1.1/32 + + # link_err is only used to make sure packets are redirected instead of + # being routed + ip -n $NS1 route add $INGRESS_REDIR_IP encap bpf xmit \ + obj $BPF_FILE sec redir_ingress dev link_err + ip -n $NS1 route add $EGRESS_REDIR_IP encap bpf xmit \ + obj $BPF_FILE sec redir_egress dev link_err + ip -n $NS1 route add $INGRESS_REDIR_IP_NOMAC encap bpf xmit \ + obj $BPF_FILE sec redir_ingress_nomac dev link_err + ip -n $NS1 route add $EGRESS_REDIR_IP_NOMAC encap bpf xmit \ + obj $BPF_FILE sec redir_egress_nomac dev link_err +} + +cleanup_and_summary() +{ + ip netns del $NS1 + echo PASSED:$PASS FAILED:$FAIL + if [ $FAIL -ne 0 ]; then + exit 1 + else + exit 0 + fi +} + +test_redirect_normal() +{ + local test_name=${FUNCNAME[0]} + local link_name=$1 + local link_id=`ip netns exec $NS1 cat /sys/class/net/${link_name}/ifindex` + local dest=$2 + + ip netns exec $NS1 timeout 2 tcpdump -i ${link_name} -c 1 -n -p icmp >/dev/null 2>&1 & + local jobid=$! + sleep 1 + + # hack: mark indicates the link to redirect to + ip netns exec $NS1 ping -m $link_id $dest -c 1 -w 1 > /dev/null 2>&1 + wait $jobid + + if [ $? -ne 0 ]; then + test_fail $test_name $dest $link_name + else + test_pass $test_name $dest $link_name + fi +} + +test_redirect_no_panic_on_link_down() +{ + local test_name=${FUNCNAME[0]} + local link_name=$1 + local link_id=`ip netns exec $NS1 cat /sys/class/net/${link_name}/ifindex` + local dest=$2 + + ip -n $NS1 link set $link_name down + # hack: mark indicates the link to redirect to + ip netns exec $NS1 ping -m $link_id $dest -c 1 -w 1 >/dev/null 2>&1 + + test_pass $test_name $dest to $link_name + ip -n $NS1 link set $link_name up +} + +test_redirect_no_panic_on_link_carrier_down() +{ + local test_name=${FUNCNAME[0]} + local link_id=`ip netns exec $NS1 cat /sys/class/net/link_upper/ifindex` + local dest=$1 + + ip -n $NS1 link set link_w_mac down + # hack: mark indicates the link to redirect to + ip netns exec $NS1 ping -m $link_id $dest -c 1 -w 1 >/dev/null 2>&1 + + test_pass $test_name $dest to link_upper + ip -n $NS1 link set link_w_mac up +} + +setup + +echo "Testing lwt redirect to devices requiring MAC header" +for dest in $INGRESS_REDIR_IP $EGRESS_REDIR_IP; do + test_redirect_normal link_w_mac $dest + test_redirect_no_panic_on_link_down link_w_mac $dest + test_redirect_no_panic_on_link_carrier_down $dest +done + +echo "Testing lwt redirect to devices not requiring MAC header" +for dest in $INGRESS_REDIR_IP_NOMAC $EGRESS_REDIR_IP_NOMAC; do + test_redirect_normal link_wo_mac $dest + test_redirect_no_panic_on_link_down link_wo_mac $dest +done + +cleanup_and_summary -- 2.30.2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
[parent not found: <3ec61192-c65c-62cc-d073-d6111b08e690@web.de>]
[parent not found: <CAO3-PbraNcfQnqHUG_992vssuA795RxtexYsMdEo=k9zp-XHog@mail.gmail.com>]
* Re: [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases [not found] ` <CAO3-PbraNcfQnqHUG_992vssuA795RxtexYsMdEo=k9zp-XHog@mail.gmail.com> @ 2023-07-26 10:30 ` Yan Zhai 2023-07-26 13:22 ` Dan Carpenter 0 siblings, 1 reply; 22+ messages in thread From: Yan Zhai @ 2023-07-26 10:30 UTC (permalink / raw) To: Markus Elfring Cc: Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, David S. Miller, Eric Dumazet, Hao Luo, Jakub Kicinski, Jakub Sitnicki, Jiri Olsa, John Fastabend, Jordan Griege, KP Singh, LKML, Martin KaFai Lau, Mykola Lysenko, Paolo Abeni, Shuah Khan, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, kernel-janitors, kernel-team, linux-kselftest, netdev Apologize for sending previous mail from a wrong app (not text mode). Resending to keep the mailing list thread consistent. On Wed, Jul 26, 2023 at 3:10 AM Markus Elfring <Markus.Elfring@web.de> wrote: > > > Tests BPF redirect at the lwt xmit hook to ensure error handling are > > safe, i.e. won't panic the kernel. > > Are imperative change descriptions still preferred? Hi Markus, I think you linked this to me yesterday that it should be described imperatively: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.5-rc3#n155 > > See also: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.5-rc3#n94 > I don’t follow the purpose of this reference. This points to user impact but this is a selftest, so I don’t see any user impact here. Or is there anything I missed? > > Can remaining wording weaknesses be adjusted accordingly? I am not following this question . Can you be more specific or provide an example? Yan > > Regards, > Markus > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases 2023-07-26 10:30 ` Yan Zhai @ 2023-07-26 13:22 ` Dan Carpenter 0 siblings, 0 replies; 22+ messages in thread From: Dan Carpenter @ 2023-07-26 13:22 UTC (permalink / raw) To: Yan Zhai Cc: Markus Elfring, Alexei Starovoitov, Andrii Nakryiko, Daniel Borkmann, David S. Miller, Eric Dumazet, Hao Luo, Jakub Kicinski, Jakub Sitnicki, Jiri Olsa, John Fastabend, Jordan Griege, KP Singh, LKML, Martin KaFai Lau, Mykola Lysenko, Paolo Abeni, Shuah Khan, Song Liu, Stanislav Fomichev, Yonghong Song, bpf, kernel-janitors, kernel-team, linux-kselftest, netdev Was Markus unbanned from vger? How are we recieving these emails? regards, dan carpenter ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases 2023-07-26 1:09 ` [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases Yan Zhai [not found] ` <3ec61192-c65c-62cc-d073-d6111b08e690@web.de> @ 2023-07-26 12:26 ` Jakub Sitnicki 2023-07-28 22:47 ` Martin KaFai Lau 2 siblings, 0 replies; 22+ messages in thread From: Jakub Sitnicki @ 2023-07-26 12:26 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring On Tue, Jul 25, 2023 at 06:09 PM -07, Yan Zhai wrote: > Tests BPF redirect at the lwt xmit hook to ensure error handling are > safe, i.e. won't panic the kernel. > > Tested-by: Jakub Sitnicki <jakub@cloudflare.com> > Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> > Signed-off-by: Yan Zhai <yan@cloudflare.com> > --- Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases 2023-07-26 1:09 ` [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases Yan Zhai [not found] ` <3ec61192-c65c-62cc-d073-d6111b08e690@web.de> 2023-07-26 12:26 ` Jakub Sitnicki @ 2023-07-28 22:47 ` Martin KaFai Lau 2023-07-31 9:48 ` Jakub Sitnicki 2 siblings, 1 reply; 22+ messages in thread From: Martin KaFai Lau @ 2023-07-28 22:47 UTC (permalink / raw) To: Yan Zhai Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring, Jakub Sitnicki On 7/25/23 6:09 PM, Yan Zhai wrote: > diff --git a/tools/testing/selftests/bpf/progs/test_lwt_redirect.c b/tools/testing/selftests/bpf/progs/test_lwt_redirect.c > new file mode 100644 > index 000000000000..3674e101f68f > --- /dev/null > +++ b/tools/testing/selftests/bpf/progs/test_lwt_redirect.c > @@ -0,0 +1,66 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include <linux/bpf.h> > +#include <bpf/bpf_helpers.h> > +#include "bpf_tracing_net.h" > + > +/* We don't care about whether the packet can be received by network stack. > + * Just care if the packet is sent to the correct device at correct direction > + * and not panic the kernel. > + */ > +static __always_inline int prepend_dummy_mac(struct __sk_buff *skb) > +{ __always_inline is no longer a must for a long time. > + char mac[] = {0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0xf, > + 0xe, 0xd, 0xc, 0xb, 0xa, 0x08, 0x00}; > + > + if (bpf_skb_change_head(skb, ETH_HLEN, 0)) { > + bpf_printk("%s: fail to change head", __func__); Avoid using bpf_printk(). The bpf CI runs other tests also. > + return -1; > + } > + > + if (bpf_skb_store_bytes(skb, 0, mac, sizeof(mac), 0)) { > + bpf_printk("%s: fail to update mac", __func__); > + return -1; > + } > + > + return 0; > +} > + > +SEC("redir_ingress") Use SEC("lwt_xmit"). Then the libbpf will figure out the prog type. > +int test_lwt_redirect_in(struct __sk_buff *skb) > +{ > + if (prepend_dummy_mac(skb)) > + return BPF_DROP; > + > + bpf_printk("Redirect skb to link %d ingress", skb->mark); > + return bpf_redirect(skb->mark, BPF_F_INGRESS); > +} > + > +SEC("redir_egress") > +int test_lwt_redirect_out(struct __sk_buff *skb) > +{ > + if (prepend_dummy_mac(skb)) > + return BPF_DROP; > + > + bpf_printk("Redirect skb to link %d egress", skb->mark); > + return bpf_redirect(skb->mark, 0); > +} > + > +SEC("redir_egress_nomac") > +int test_lwt_redirect_out_nomac(struct __sk_buff *skb) > +{ > + int ret = bpf_redirect(skb->mark, 0); > + > + bpf_printk("Redirect skb to link %d egress nomac: %d", skb->mark, ret); > + return ret; > +} > + > +SEC("redir_ingress_nomac") > +int test_lwt_redirect_in_nomac(struct __sk_buff *skb) > +{ > + int ret = bpf_redirect(skb->mark, BPF_F_INGRESS); > + > + bpf_printk("Redirect skb to link %d ingress nomac: %d", skb->mark, ret); > + return ret; > +} > + > +char _license[] SEC("license") = "GPL"; > diff --git a/tools/testing/selftests/bpf/test_lwt_redirect.sh b/tools/testing/selftests/bpf/test_lwt_redirect.sh > new file mode 100755 > index 000000000000..1b7b78b48174 > --- /dev/null > +++ b/tools/testing/selftests/bpf/test_lwt_redirect.sh This has to be written in the test_progs infrastructure in C. Only test_progs is run by the BPF CI. Take a look at other tests in prog_tests/. For example, tc_redirect.c and xdp_metadata.c which are having setup in netns/link/...etc. It currently has helpers to add tc qdisc and filter but not adding route yet which could be a useful addition. -- pw-bot: cr ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases 2023-07-28 22:47 ` Martin KaFai Lau @ 2023-07-31 9:48 ` Jakub Sitnicki 2023-07-31 18:46 ` Alexei Starovoitov 0 siblings, 1 reply; 22+ messages in thread From: Jakub Sitnicki @ 2023-07-31 9:48 UTC (permalink / raw) To: Martin KaFai Lau Cc: Yan Zhai, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, linux-kernel, netdev, linux-kselftest, kernel-team, Jordan Griege, Markus Elfring On Fri, Jul 28, 2023 at 03:47 PM -07, Martin KaFai Lau wrote: > On 7/25/23 6:09 PM, Yan Zhai wrote: [...] >> diff --git a/tools/testing/selftests/bpf/test_lwt_redirect.sh >> b/tools/testing/selftests/bpf/test_lwt_redirect.sh >> new file mode 100755 >> index 000000000000..1b7b78b48174 >> --- /dev/null >> +++ b/tools/testing/selftests/bpf/test_lwt_redirect.sh > > This has to be written in the test_progs infrastructure in C. Only test_progs is > run by the BPF CI. Take a look at other tests in prog_tests/. For example, > tc_redirect.c and xdp_metadata.c which are having setup in netns/link/...etc. It > currently has helpers to add tc qdisc and filter but not adding route yet which > could be a useful addition. Can we help make the BPF CI better so that it also runs other tests in addition test_progs? We have bpf selftests written in shell and even Python. These are sometimes the right tools for the job and make adding tests easier, IMHO. Network setup from C is verbose and tedious. Not to mention, hard to read through. # ./run_kselftest.sh --list bpf:test_verifier bpf:test_tag bpf:test_maps bpf:test_lru_map bpf:test_lpm_map bpf:test_progs bpf:test_dev_cgroup bpf:test_sock bpf:test_sockmap bpf:get_cgroup_id_user bpf:test_cgroup_storage bpf:test_tcpnotify_user bpf:test_sysctl bpf:test_progs-no_alu32 bpf:test_kmod.sh bpf:test_xdp_redirect.sh bpf:test_xdp_redirect_multi.sh bpf:test_xdp_meta.sh bpf:test_xdp_veth.sh bpf:test_offload.py bpf:test_sock_addr.sh bpf:test_tunnel.sh bpf:test_lwt_seg6local.sh bpf:test_lirc_mode2.sh bpf:test_skb_cgroup_id.sh bpf:test_flow_dissector.sh bpf:test_xdp_vlan_mode_generic.sh bpf:test_xdp_vlan_mode_native.sh bpf:test_lwt_ip_encap.sh bpf:test_tcp_check_syncookie.sh bpf:test_tc_tunnel.sh bpf:test_tc_edt.sh bpf:test_xdping.sh bpf:test_bpftool_build.sh bpf:test_bpftool.sh bpf:test_bpftool_metadata.sh bpf:test_doc_build.sh bpf:test_xsk.sh bpf:test_xdp_features.sh ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases 2023-07-31 9:48 ` Jakub Sitnicki @ 2023-07-31 18:46 ` Alexei Starovoitov 0 siblings, 0 replies; 22+ messages in thread From: Alexei Starovoitov @ 2023-07-31 18:46 UTC (permalink / raw) To: Jakub Sitnicki Cc: Martin KaFai Lau, Yan Zhai, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni, Mykola Lysenko, Shuah Khan, LKML, Network Development, open list:KERNEL SELFTEST FRAMEWORK, kernel-team, Jordan Griege, Markus Elfring On Mon, Jul 31, 2023 at 2:52 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > > On Fri, Jul 28, 2023 at 03:47 PM -07, Martin KaFai Lau wrote: > > On 7/25/23 6:09 PM, Yan Zhai wrote: > > [...] > > >> diff --git a/tools/testing/selftests/bpf/test_lwt_redirect.sh > >> b/tools/testing/selftests/bpf/test_lwt_redirect.sh > >> new file mode 100755 > >> index 000000000000..1b7b78b48174 > >> --- /dev/null > >> +++ b/tools/testing/selftests/bpf/test_lwt_redirect.sh > > > > This has to be written in the test_progs infrastructure in C. Only test_progs is > > run by the BPF CI. Take a look at other tests in prog_tests/. For example, > > tc_redirect.c and xdp_metadata.c which are having setup in netns/link/...etc. It > > currently has helpers to add tc qdisc and filter but not adding route yet which > > could be a useful addition. > > Can we help make the BPF CI better so that it also runs other tests in > addition test_progs? Not really. CI is not just running the test. It needs to understand the output, pass it to UI, run in parallel, etc. All the shell scripts are not suitable for long term CI exposure. So I completely agree with Martin. No new shell scripts. All selftests must be in test_progs. > We have bpf selftests written in shell and even Python. These are > sometimes the right tools for the job and make adding tests easier, > IMHO. Network setup from C is verbose and tedious. Not to mention, hard > to read through. For comparison take a look at BPF CI code base and what it takes to run the tests and process the output. There is plenty of work for CI ahead. ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2023-08-01 22:19 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-26 1:07 [PATCH v4 bpf 0/2] bpf: return proper error codes for lwt redirect Yan Zhai
2023-07-26 1:08 ` [PATCH v4 bpf 1/2] bpf: fix skb_do_redirect return values Yan Zhai
2023-07-26 12:25 ` Jakub Sitnicki
2023-07-26 13:39 ` Dan Carpenter
2023-07-26 14:14 ` Yan Zhai
2023-07-26 15:01 ` Dan Carpenter
2023-07-26 16:10 ` Yan Zhai
2023-07-26 16:53 ` Dan Carpenter
2023-07-31 14:26 ` Dan Carpenter
2023-08-01 22:18 ` Yan Zhai
2023-07-28 22:02 ` Martin KaFai Lau
2023-07-31 21:35 ` Yan Zhai
2023-07-31 22:11 ` Martin KaFai Lau
2023-07-31 23:01 ` Yan Zhai
2023-07-31 23:52 ` Martin KaFai Lau
2023-07-26 1:09 ` [PATCH v4 bpf 2/2] bpf: selftests: add lwt redirect regression test cases Yan Zhai
[not found] ` <3ec61192-c65c-62cc-d073-d6111b08e690@web.de>
[not found] ` <CAO3-PbraNcfQnqHUG_992vssuA795RxtexYsMdEo=k9zp-XHog@mail.gmail.com>
2023-07-26 10:30 ` Yan Zhai
2023-07-26 13:22 ` Dan Carpenter
2023-07-26 12:26 ` Jakub Sitnicki
2023-07-28 22:47 ` Martin KaFai Lau
2023-07-31 9:48 ` Jakub Sitnicki
2023-07-31 18:46 ` Alexei Starovoitov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox