* [PATCH bpf-next] bpf, sockmap: add BPF_F_PERMANENTLY flag for skmsg redirect
@ 2023-08-05 9:42 Liu Jian
2023-08-05 12:51 ` Jakub Sitnicki
0 siblings, 1 reply; 3+ messages in thread
From: Liu Jian @ 2023-08-05 9:42 UTC (permalink / raw)
To: john.fastabend, jakub, ast, daniel, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
pabeni, dsahern
Cc: netdev, bpf, liujian56
If the sockmap msg redirection function is used only to forward packets
and no other operation, the execution result of the BPF_SK_MSG_VERDICT
program is the same each time. In this case, the BPF program only needs to
be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
bpf_msg_redirect_hash() to implement this ability.
Then we can enable this function in the bpf program as follows:
bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
Test results using netperf TCP_STREAM mode:
for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
done
before:
3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
after:
4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
Signed-off-by: Liu Jian <liujian56@huawei.com>
---
include/linux/skmsg.h | 1 +
include/uapi/linux/bpf.h | 7 +++++--
net/core/skmsg.c | 1 +
net/core/sock_map.c | 4 ++--
net/ipv4/tcp_bpf.c | 15 +++++++++------
tools/include/uapi/linux/bpf.h | 7 +++++--
6 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 054d7911bfc9..b2da9c432f52 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -82,6 +82,7 @@ struct sk_psock {
u32 cork_bytes;
u32 eval;
bool redir_ingress; /* undefined if sk_redir is null */
+ bool eval_permanently;
struct sk_msg *cork;
struct sk_psock_progs progs;
#if IS_ENABLED(CONFIG_BPF_STREAM_PARSER)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 70da85200695..cf622ea4f018 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3004,7 +3004,8 @@ union bpf_attr {
* egress interfaces can be used for redirection. The
* **BPF_F_INGRESS** value in *flags* is used to make the
* distinction (ingress path is selected if the flag is present,
- * egress path otherwise). This is the only flag supported for now.
+ * egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ * *flags* is used to indicates whether the eBPF result is permanent.
* Return
* **SK_PASS** on success, or **SK_DROP** on error.
*
@@ -3276,7 +3277,8 @@ union bpf_attr {
* egress interfaces can be used for redirection. The
* **BPF_F_INGRESS** value in *flags* is used to make the
* distinction (ingress path is selected if the flag is present,
- * egress path otherwise). This is the only flag supported for now.
+ * egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ * *flags* is used to indicates whether the eBPF result is permanent.
* Return
* **SK_PASS** on success, or **SK_DROP** on error.
*
@@ -5872,6 +5874,7 @@ enum {
/* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
enum {
BPF_F_INGRESS = (1ULL << 0),
+ BPF_F_PERMANENTLY = (1ULL << 1),
};
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index a29508e1ff35..b2bf9b5c4252 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -875,6 +875,7 @@ int sk_psock_msg_verdict(struct sock *sk, struct sk_psock *psock,
ret = bpf_prog_run_pin_on_cpu(prog, msg);
ret = sk_psock_map_verd(ret, msg->sk_redir);
psock->apply_bytes = msg->apply_bytes;
+ psock->eval_permanently = msg->flags & BPF_F_PERMANENTLY;
if (ret == __SK_REDIRECT) {
if (psock->sk_redir) {
sock_put(psock->sk_redir);
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 08ab108206bf..6a0c90be7f4f 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -662,7 +662,7 @@ BPF_CALL_4(bpf_msg_redirect_map, struct sk_msg *, msg,
{
struct sock *sk;
- if (unlikely(flags & ~(BPF_F_INGRESS)))
+ if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
return SK_DROP;
sk = __sock_map_lookup_elem(map, key);
@@ -1261,7 +1261,7 @@ BPF_CALL_4(bpf_msg_redirect_hash, struct sk_msg *, msg,
{
struct sock *sk;
- if (unlikely(flags & ~(BPF_F_INGRESS)))
+ if (unlikely(flags & ~(BPF_F_INGRESS | BPF_F_PERMANENTLY)))
return SK_DROP;
sk = __sock_hash_lookup_elem(map, key);
diff --git a/net/ipv4/tcp_bpf.c b/net/ipv4/tcp_bpf.c
index 81f0dff69e0b..81c3d3ad44f7 100644
--- a/net/ipv4/tcp_bpf.c
+++ b/net/ipv4/tcp_bpf.c
@@ -419,8 +419,10 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
if (!psock->apply_bytes) {
/* Clean up before releasing the sock lock. */
eval = psock->eval;
- psock->eval = __SK_NONE;
- psock->sk_redir = NULL;
+ if (!psock->eval_permanently) {
+ psock->eval = __SK_NONE;
+ psock->sk_redir = NULL;
+ }
}
if (psock->cork) {
cork = true;
@@ -434,7 +436,7 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
msg, tosend, flags);
sent = origsize - msg->sg.size;
- if (eval == __SK_REDIRECT)
+ if (!psock->eval_permanently && eval == __SK_REDIRECT)
sock_put(sk_redir);
lock_sock(sk);
@@ -460,8 +462,8 @@ static int tcp_bpf_send_verdict(struct sock *sk, struct sk_psock *psock,
}
if (likely(!ret)) {
- if (!psock->apply_bytes) {
- psock->eval = __SK_NONE;
+ if (!psock->apply_bytes && !psock->eval_permanently) {
+ psock->eval = __SK_NONE;
if (psock->sk_redir) {
sock_put(psock->sk_redir);
psock->sk_redir = NULL;
@@ -540,7 +542,8 @@ static int tcp_bpf_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
if (psock->cork_bytes && !enospc)
goto out_err;
/* All cork bytes are accounted, rerun the prog. */
- psock->eval = __SK_NONE;
+ if (!psock->eval_permanently)
+ psock->eval = __SK_NONE;
psock->cork_bytes = 0;
}
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 70da85200695..cf622ea4f018 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3004,7 +3004,8 @@ union bpf_attr {
* egress interfaces can be used for redirection. The
* **BPF_F_INGRESS** value in *flags* is used to make the
* distinction (ingress path is selected if the flag is present,
- * egress path otherwise). This is the only flag supported for now.
+ * egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ * *flags* is used to indicates whether the eBPF result is permanent.
* Return
* **SK_PASS** on success, or **SK_DROP** on error.
*
@@ -3276,7 +3277,8 @@ union bpf_attr {
* egress interfaces can be used for redirection. The
* **BPF_F_INGRESS** value in *flags* is used to make the
* distinction (ingress path is selected if the flag is present,
- * egress path otherwise). This is the only flag supported for now.
+ * egress path otherwise). The **BPF_F_PERMANENTLY** value in
+ * *flags* is used to indicates whether the eBPF result is permanent.
* Return
* **SK_PASS** on success, or **SK_DROP** on error.
*
@@ -5872,6 +5874,7 @@ enum {
/* BPF_FUNC_clone_redirect and BPF_FUNC_redirect flags. */
enum {
BPF_F_INGRESS = (1ULL << 0),
+ BPF_F_PERMANENTLY = (1ULL << 1),
};
/* BPF_FUNC_skb_set_tunnel_key and BPF_FUNC_skb_get_tunnel_key flags. */
--
2.34.1
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH bpf-next] bpf, sockmap: add BPF_F_PERMANENTLY flag for skmsg redirect
2023-08-05 9:42 [PATCH bpf-next] bpf, sockmap: add BPF_F_PERMANENTLY flag for skmsg redirect Liu Jian
@ 2023-08-05 12:51 ` Jakub Sitnicki
2023-08-08 14:02 ` liujian (CE)
0 siblings, 1 reply; 3+ messages in thread
From: Jakub Sitnicki @ 2023-08-05 12:51 UTC (permalink / raw)
To: Liu Jian
Cc: john.fastabend, ast, daniel, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
pabeni, dsahern, netdev, bpf
On Sat, Aug 05, 2023 at 05:42 PM +08, Liu Jian wrote:
> If the sockmap msg redirection function is used only to forward packets
> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
> program is the same each time. In this case, the BPF program only needs to
> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
> bpf_msg_redirect_hash() to implement this ability.
>
> Then we can enable this function in the bpf program as follows:
> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>
> Test results using netperf TCP_STREAM mode:
> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
> done
>
> before:
> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
> after:
> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>
> Signed-off-by: Liu Jian <liujian56@huawei.com>
> ---
Interesting idea. Potentially opens up the way to redirect without
fallback to backlog thread in the future. If we know the target, then we
can propagate backpressure.
If we go this route, we will need tests. selftests/test_sockmap would
need to be extended, and we will also need some unit tests in test_progs
for corner cases. Corner cases to cover that come to mind: redirect to
self, redirect target socket closed.
I'm out next week, so won't be able to give it a proper review.
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH bpf-next] bpf, sockmap: add BPF_F_PERMANENTLY flag for skmsg redirect
2023-08-05 12:51 ` Jakub Sitnicki
@ 2023-08-08 14:02 ` liujian (CE)
0 siblings, 0 replies; 3+ messages in thread
From: liujian (CE) @ 2023-08-08 14:02 UTC (permalink / raw)
To: Jakub Sitnicki
Cc: john.fastabend, ast, daniel, andrii, martin.lau, song,
yonghong.song, kpsingh, sdf, haoluo, jolsa, davem, edumazet, kuba,
pabeni, dsahern, netdev, bpf
On 2023/8/5 20:51, Jakub Sitnicki wrote:
> On Sat, Aug 05, 2023 at 05:42 PM +08, Liu Jian wrote:
>> If the sockmap msg redirection function is used only to forward packets
>> and no other operation, the execution result of the BPF_SK_MSG_VERDICT
>> program is the same each time. In this case, the BPF program only needs to
>> be run once. Add BPF_F_PERMANENTLY flag to bpf_msg_redirect_map() and
>> bpf_msg_redirect_hash() to implement this ability.
>>
>> Then we can enable this function in the bpf program as follows:
>> bpf_msg_redirect_hash(xx, xx, xx, BPF_F_INGRESS | BPF_F_PERMANENTLY);
>>
>> Test results using netperf TCP_STREAM mode:
>> for i in 1 64 128 512 1k 2k 32k 64k 100k 500k 1m;then
>> netperf -T 1,2 -t TCP_STREAM -H 127.0.0.1 -l 20 -- -m $i -s 100m,100m -S 100m,100m
>> done
>>
>> before:
>> 3.84 246.52 496.89 1885.03 3415.29 6375.03 40749.09 48764.40 51611.34 55678.26 55992.78
>> after:
>> 4.43 279.20 555.82 2080.79 3870.70 7105.44 41836.41 49709.75 51861.56 55211.00 54566.85
>>
>> Signed-off-by: Liu Jian <liujian56@huawei.com>
>> ---
>
> Interesting idea. Potentially opens up the way to redirect without
> fallback to backlog thread in the future. If we know the target, then we
> can propagate backpressure.
>
> If we go this route, we will need tests. selftests/test_sockmap would
> need to be extended, and we will also need some unit tests in test_progs
> for corner cases. Corner cases to cover that come to mind: redirect to
> self, redirect target socket closed.
Thanks. I will add some tests in v2.
>
> I'm out next week, so won't be able to give it a proper review.
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2023-08-08 16:32 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-08-05 9:42 [PATCH bpf-next] bpf, sockmap: add BPF_F_PERMANENTLY flag for skmsg redirect Liu Jian
2023-08-05 12:51 ` Jakub Sitnicki
2023-08-08 14:02 ` liujian (CE)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).