* [PATCH v2 0/2] bpf: Introduce cpu affinity for sockmap
@ 2024-11-01 16:16 mrpre
2024-11-01 16:16 ` [PATCH v2 1/2] " mrpre
2024-11-01 16:16 ` [PATCH v2 2/2] bpf: implement libbpf sockmap cpu affinity mrpre
0 siblings, 2 replies; 5+ messages in thread
From: mrpre @ 2024-11-01 16:16 UTC (permalink / raw)
To: yonghong.song, john.fastabend, martin.lau, edumazet, jakub, davem,
dsahern, kuba, pabeni, netdev, bpf, linux-kernel
Cc: mrpre
Why we need CPU affinity:
Mainstream data planes such as Nginx and HAProxy rely on CPU affinity,
binding worker processes to specific CPUs. This avoids interference
between processes and shields them from other workloads.
Sockmap, as an optimization to accelerate such proxy programs,
currently lacks the ability to specify CPU affinity. The current
sockmap backlog handling is based on a workqueue driven by
'schedule_delayed_work()', which prefers to schedule on the local CPU,
i.e., the CPU that handled the packet in softirq context.
Under extremely high traffic with large numbers of packets,
'sk_psock_backlog' becomes a long-running loop on that CPU.
For multi-threaded programs sharing a single map, we expect different
sockets to run on different CPUs. It is important to note that this
feature is not a general performance optimization. Instead, it gives
users the ability to bind sockets to specific CPUs, allowing them to
improve overall system utilization based on their own environments.
Implementation:
1. When updating the sockmap, support passing a CPU parameter and
save it in the psock.
2. When scheduling the psock, choose the CPU to run on based on the
psock's saved CPU information.
3. For those sockmaps without CPU affinity, keep the original logic by
using 'schedule_delayed_work()'.
Performance Testing:
'client <-> sockmap proxy <-> server'
Using 'iperf3' tests, with the iperf server bound to CPU5 and the iperf
client bound to CPU6, performance without CPU affinity is
around 34 Gbits/s, with CPU usage concentrated on CPU5 and CPU6.
'''
[ 5] local 127.0.0.1 port 57144 connected to 127.0.0.1 port 10000
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 3.95 GBytes 33.9 Gbits/sec
[ 5] 1.00-2.00 sec 3.95 GBytes 34.0 Gbits/sec
......
'''
With CPU affinity enabled, performance is close to a direct connection
(without any proxy).
'''
[ 5] local 127.0.0.1 port 56518 connected to 127.0.0.1 port 10000
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 7.76 GBytes 66.6 Gbits/sec
[ 5] 1.00-2.00 sec 7.76 GBytes 66.7 Gbits/sec
......
'''
---
v1 -> v2: fix a compile error when certain config macros are disabled
---
mrpre (2):
bpf: Introduce cpu affinity for sockmap
bpf: implement libbpf sockmap cpu affinity
include/linux/bpf.h | 5 ++--
include/linux/skmsg.h | 8 +++++++
include/uapi/linux/bpf.h | 4 ++++
kernel/bpf/syscall.c | 23 ++++++++++++++-----
net/core/skmsg.c | 14 +++++------
net/core/sock_map.c | 13 +++++------
tools/include/uapi/linux/bpf.h | 4 ++++
tools/lib/bpf/bpf.c | 22 ++++++++++++++++++
tools/lib/bpf/bpf.h | 9 ++++++++
tools/lib/bpf/libbpf.map | 1 +
.../selftests/bpf/prog_tests/sockmap_basic.c | 19 +++++++++++----
11 files changed, 96 insertions(+), 26 deletions(-)
--
2.43.5
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2 1/2] bpf: Introduce cpu affinity for sockmap
2024-11-01 16:16 [PATCH v2 0/2] bpf: Introduce cpu affinity for sockmap mrpre
@ 2024-11-01 16:16 ` mrpre
2024-11-04 7:23 ` Dan Carpenter
2024-11-01 16:16 ` [PATCH v2 2/2] bpf: implement libbpf sockmap cpu affinity mrpre
1 sibling, 1 reply; 5+ messages in thread
From: mrpre @ 2024-11-01 16:16 UTC (permalink / raw)
To: yonghong.song, john.fastabend, martin.lau, edumazet, jakub, davem,
dsahern, kuba, pabeni, netdev, bpf, linux-kernel
Cc: mrpre
Why we need CPU affinity:
Mainstream data planes such as Nginx and HAProxy rely on CPU affinity,
binding worker processes to specific CPUs. This avoids interference
between processes and shields them from other workloads.
Sockmap, as an optimization to accelerate such proxy programs,
currently lacks the ability to specify CPU affinity. The current
sockmap backlog handling is based on a workqueue driven by
'schedule_delayed_work()', which prefers to schedule on the local CPU,
i.e., the CPU that handled the packet in softirq context.
Under extremely high traffic with large numbers of packets,
'sk_psock_backlog' becomes a long-running loop on that CPU.
For multi-threaded programs sharing a single map, we expect different
sockets to run on different CPUs. It is important to note that this
feature is not a general performance optimization. Instead, it gives
users the ability to bind sockets to specific CPUs, allowing them to
improve overall system utilization based on their own environments.
Implementation:
1. When updating the sockmap, support passing a CPU parameter and
save it in the psock.
2. When scheduling the psock, choose the CPU to run on based on the
psock's saved CPU information.
3. For those sockmaps without CPU affinity, keep the original logic by
using 'schedule_delayed_work()'.
Performance Testing:
'client <-> sockmap proxy <-> server'
Using 'iperf3' tests, with the iperf server bound to CPU5 and the iperf
client bound to CPU6, performance without CPU affinity is
around 34 Gbits/s, with CPU usage concentrated on CPU5 and CPU6.
'''
[ 5] local 127.0.0.1 port 57144 connected to 127.0.0.1 port 10000
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 3.95 GBytes 33.9 Gbits/sec
[ 5] 1.00-2.00 sec 3.95 GBytes 34.0 Gbits/sec
......
'''
With CPU affinity enabled, performance is close to a direct connection
(without any proxy).
'''
[ 5] local 127.0.0.1 port 56518 connected to 127.0.0.1 port 10000
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 7.76 GBytes 66.6 Gbits/sec
[ 5] 1.00-2.00 sec 7.76 GBytes 66.7 Gbits/sec
......
'''
Signed-off-by: Jiayuan Chen <mrpre@163.com>
---
include/linux/bpf.h | 5 +++--
include/linux/skmsg.h | 8 ++++++++
include/uapi/linux/bpf.h | 4 ++++
kernel/bpf/syscall.c | 23 +++++++++++++++++------
net/core/skmsg.c | 14 +++++++-------
net/core/sock_map.c | 13 ++++++-------
6 files changed, 45 insertions(+), 22 deletions(-)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c3ba4d475174..648eaea2bb28 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -3080,7 +3080,8 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog,
int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog);
int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype);
-int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags);
+int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags,
+ s32 target_cpu);
int sock_map_bpf_prog_query(const union bpf_attr *attr,
union bpf_attr __user *uattr);
int sock_map_link_create(const union bpf_attr *attr, struct bpf_prog *prog);
@@ -3172,7 +3173,7 @@ static inline int sock_map_prog_detach(const union bpf_attr *attr,
}
static inline int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value,
- u64 flags)
+ u64 flags, s32 target_cpu)
{
return -EOPNOTSUPP;
}
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index d9b03e0746e7..919425a92adf 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -117,6 +117,7 @@ struct sk_psock {
struct delayed_work work;
struct sock *sk_pair;
struct rcu_work rwork;
+ s32 target_cpu;
};
int sk_msg_alloc(struct sock *sk, struct sk_msg *msg, int len,
@@ -514,6 +515,13 @@ static inline bool sk_psock_strp_enabled(struct sk_psock *psock)
return !!psock->saved_data_ready;
}
+static inline int sk_psock_strp_get_cpu(struct sk_psock *psock)
+{
+ if (psock->target_cpu != -1)
+ return psock->target_cpu;
+ return WORK_CPU_UNBOUND;
+}
+
#if IS_ENABLED(CONFIG_NET_SOCK_MSG)
#define BPF_F_STRPARSER (1UL << 1)
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index f28b6527e815..2019a87b5d4a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1509,6 +1509,10 @@ union bpf_attr {
__aligned_u64 next_key;
};
__u64 flags;
+ union {
+ /* specify the CPU that the sockmap job runs on */
+ __aligned_u64 target_cpu;
+ };
};
struct { /* struct used by BPF_MAP_*_BATCH commands */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 8254b2973157..95f719b9c3f3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -239,10 +239,9 @@ static int bpf_obj_pin_uptrs(struct btf_record *rec, void *obj)
}
static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
- void *key, void *value, __u64 flags)
+ void *key, void *value, __u64 flags, s32 target_cpu)
{
int err;
-
/* Need to create a kthread, thus must support schedule */
if (bpf_map_is_offloaded(map)) {
return bpf_map_offload_update_elem(map, key, value, flags);
@@ -252,7 +251,7 @@ static int bpf_map_update_value(struct bpf_map *map, struct file *map_file,
return map->ops->map_update_elem(map, key, value, flags);
} else if (map->map_type == BPF_MAP_TYPE_SOCKHASH ||
map->map_type == BPF_MAP_TYPE_SOCKMAP) {
- return sock_map_update_elem_sys(map, key, value, flags);
+ return sock_map_update_elem_sys(map, key, value, flags, target_cpu);
} else if (IS_FD_PROG_ARRAY(map)) {
return bpf_fd_array_map_update_elem(map, map_file, key, value,
flags);
@@ -1680,12 +1679,14 @@ static int map_lookup_elem(union bpf_attr *attr)
}
-#define BPF_MAP_UPDATE_ELEM_LAST_FIELD flags
+#define BPF_MAP_UPDATE_ELEM_LAST_FIELD target_cpu
static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
{
bpfptr_t ukey = make_bpfptr(attr->key, uattr.is_kernel);
bpfptr_t uvalue = make_bpfptr(attr->value, uattr.is_kernel);
+ bpfptr_t utarget_cpu = make_bpfptr(attr->target_cpu, uattr.is_kernel);
+ s64 target_cpu = 0;
struct bpf_map *map;
void *key, *value;
u32 value_size;
@@ -1723,7 +1724,17 @@ static int map_update_elem(union bpf_attr *attr, bpfptr_t uattr)
goto free_key;
}
- err = bpf_map_update_value(map, fd_file(f), key, value, attr->flags);
+ if (map->map_type == BPF_MAP_TYPE_SOCKMAP &&
+ !bpfptr_is_null(utarget_cpu)) {
+ if (copy_from_bpfptr(&target_cpu, utarget_cpu, sizeof(target_cpu)) ||
+ target_cpu < 0 || target_cpu >= nr_cpu_ids) {
+ err = -EINVAL;
+ goto err_put;
+ }
+ } else {
+ target_cpu = -1;
+ }
+ err = bpf_map_update_value(map, fd_file(f), key, value, attr->flags, (s32)target_cpu);
if (!err)
maybe_wait_bpf_programs(map);
@@ -1947,7 +1958,7 @@ int generic_map_update_batch(struct bpf_map *map, struct file *map_file,
break;
err = bpf_map_update_value(map, map_file, key, value,
- attr->batch.elem_flags);
+ attr->batch.elem_flags, -1);
if (err)
break;
diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index b1dcbd3be89e..cc1dc17cd06c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -530,7 +530,6 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
struct sk_msg *msg)
{
int num_sge, copied;
-
num_sge = skb_to_sgvec(skb, msg->sg.data, off, len);
if (num_sge < 0) {
/* skb linearize may fail with ENOMEM, but lets simply try again
@@ -679,7 +678,8 @@ static void sk_psock_backlog(struct work_struct *work)
* other work that might be here.
*/
if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
- schedule_delayed_work(&psock->work, 1);
+ schedule_delayed_work_on(sk_psock_strp_get_cpu(psock),
+ &psock->work, 1);
goto end;
}
/* Hard errors break pipe and stop xmit. */
@@ -729,6 +729,7 @@ struct sk_psock *sk_psock_init(struct sock *sk, int node)
psock->saved_destroy = prot->destroy;
psock->saved_close = prot->close;
psock->saved_write_space = sk->sk_write_space;
+ psock->target_cpu = -1;
INIT_LIST_HEAD(&psock->link);
spin_lock_init(&psock->link_lock);
@@ -843,7 +844,6 @@ void sk_psock_drop(struct sock *sk, struct sk_psock *psock)
else if (psock->progs.stream_verdict || psock->progs.skb_verdict)
sk_psock_stop_verdict(sk, psock);
write_unlock_bh(&sk->sk_callback_lock);
-
sk_psock_stop(psock);
INIT_RCU_WORK(&psock->rwork, sk_psock_destroy);
@@ -934,7 +934,7 @@ static int sk_psock_skb_redirect(struct sk_psock *from, struct sk_buff *skb)
}
skb_queue_tail(&psock_other->ingress_skb, skb);
- schedule_delayed_work(&psock_other->work, 0);
+ schedule_delayed_work_on(sk_psock_strp_get_cpu(psock_other), &psock_other->work, 0);
spin_unlock_bh(&psock_other->ingress_lock);
return 0;
}
@@ -980,7 +980,6 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
struct sock *sk_other;
int err = 0;
u32 len, off;
-
switch (verdict) {
case __SK_PASS:
err = -EIO;
@@ -1012,7 +1011,8 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
spin_lock_bh(&psock->ingress_lock);
if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) {
skb_queue_tail(&psock->ingress_skb, skb);
- schedule_delayed_work(&psock->work, 0);
+ schedule_delayed_work_on(sk_psock_strp_get_cpu(psock),
+ &psock->work, 0);
err = 0;
}
spin_unlock_bh(&psock->ingress_lock);
@@ -1044,7 +1044,7 @@ static void sk_psock_write_space(struct sock *sk)
psock = sk_psock(sk);
if (likely(psock)) {
if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED))
- schedule_delayed_work(&psock->work, 0);
+ schedule_delayed_work_on(sk_psock_strp_get_cpu(psock), &psock->work, 0);
write_space = psock->saved_write_space;
}
rcu_read_unlock();
diff --git a/net/core/sock_map.c b/net/core/sock_map.c
index 07d6aa4e39ef..d74601024d6b 100644
--- a/net/core/sock_map.c
+++ b/net/core/sock_map.c
@@ -465,7 +465,7 @@ static int sock_map_get_next_key(struct bpf_map *map, void *key, void *next)
}
static int sock_map_update_common(struct bpf_map *map, u32 idx,
- struct sock *sk, u64 flags)
+ struct sock *sk, u64 flags, s32 target_cpu)
{
struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
struct sk_psock_link *link;
@@ -489,7 +489,7 @@ static int sock_map_update_common(struct bpf_map *map, u32 idx,
psock = sk_psock(sk);
WARN_ON_ONCE(!psock);
-
+ psock->target_cpu = target_cpu;
spin_lock_bh(&stab->lock);
osk = stab->sks[idx];
if (osk && flags == BPF_NOEXIST) {
@@ -548,13 +548,12 @@ static int sock_hash_update_common(struct bpf_map *map, void *key,
struct sock *sk, u64 flags);
int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value,
- u64 flags)
+ u64 flags, s32 target_cpu)
{
struct socket *sock;
struct sock *sk;
int ret;
u64 ufd;
-
if (map->value_size == sizeof(u64))
ufd = *(u64 *)value;
else
@@ -579,7 +578,7 @@ int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value,
if (!sock_map_sk_state_allowed(sk))
ret = -EOPNOTSUPP;
else if (map->map_type == BPF_MAP_TYPE_SOCKMAP)
- ret = sock_map_update_common(map, *(u32 *)key, sk, flags);
+ ret = sock_map_update_common(map, *(u32 *)key, sk, flags, target_cpu);
else
ret = sock_hash_update_common(map, key, sk, flags);
sock_map_sk_release(sk);
@@ -605,7 +604,7 @@ static long sock_map_update_elem(struct bpf_map *map, void *key,
if (!sock_map_sk_state_allowed(sk))
ret = -EOPNOTSUPP;
else if (map->map_type == BPF_MAP_TYPE_SOCKMAP)
- ret = sock_map_update_common(map, *(u32 *)key, sk, flags);
+ ret = sock_map_update_common(map, *(u32 *)key, sk, flags, -1);
else
ret = sock_hash_update_common(map, key, sk, flags);
bh_unlock_sock(sk);
@@ -621,7 +620,7 @@ BPF_CALL_4(bpf_sock_map_update, struct bpf_sock_ops_kern *, sops,
if (likely(sock_map_sk_is_suitable(sops->sk) &&
sock_map_op_okay(sops)))
return sock_map_update_common(map, *(u32 *)key, sops->sk,
- flags);
+ flags, -1);
return -EOPNOTSUPP;
}
--
2.43.5
* [PATCH v2 2/2] bpf: implement libbpf sockmap cpu affinity
2024-11-01 16:16 [PATCH v2 0/2] bpf: Introduce cpu affinity for sockmap mrpre
2024-11-01 16:16 ` [PATCH v2 1/2] " mrpre
@ 2024-11-01 16:16 ` mrpre
2024-11-01 19:27 ` Andrii Nakryiko
1 sibling, 1 reply; 5+ messages in thread
From: mrpre @ 2024-11-01 16:16 UTC (permalink / raw)
To: yonghong.song, john.fastabend, martin.lau, edumazet, jakub, davem,
dsahern, kuba, pabeni, netdev, bpf, linux-kernel
Cc: mrpre
Implement libbpf support for sockmap CPU affinity: add a
bpf_map_update_elem_opts() variant that lets callers pass an optional
target CPU when updating a sockmap element.
Signed-off-by: Jiayuan Chen <mrpre@163.com>
---
tools/include/uapi/linux/bpf.h | 4 ++++
tools/lib/bpf/bpf.c | 22 +++++++++++++++++++
tools/lib/bpf/bpf.h | 9 ++++++++
tools/lib/bpf/libbpf.map | 1 +
.../selftests/bpf/prog_tests/sockmap_basic.c | 19 ++++++++++++----
5 files changed, 51 insertions(+), 4 deletions(-)
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index f28b6527e815..ba6c39f40f10 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -1509,6 +1509,10 @@ union bpf_attr {
__aligned_u64 next_key;
};
__u64 flags;
+ union {
+ /* specify the CPU that the sockmap job runs on */
+ __aligned_u64 target_cpu;
+ };
};
struct { /* struct used by BPF_MAP_*_BATCH commands */
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 2a4c71501a17..13c3f3cfe889 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -401,6 +401,28 @@ int bpf_map_update_elem(int fd, const void *key, const void *value,
return libbpf_err_errno(ret);
}
+int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
+ __u64 flags, const struct bpf_map_update_opts *opts)
+{
+ union bpf_attr attr;
+ int ret;
+ __u64 *target_cpu;
+
+ if (!OPTS_VALID(opts, bpf_map_update_opts))
+ return libbpf_err(-EINVAL);
+
+ target_cpu = OPTS_GET(opts, target_cpu, NULL);
+ memset(&attr, 0, sizeof(attr));
+ attr.map_fd = fd;
+ attr.key = ptr_to_u64(key);
+ attr.value = ptr_to_u64(value);
+ attr.flags = flags;
+ attr.target_cpu = ptr_to_u64(target_cpu);
+
+ ret = sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
+ return libbpf_err_errno(ret);
+}
+
int bpf_map_lookup_elem(int fd, const void *key, void *value)
{
const size_t attr_sz = offsetofend(union bpf_attr, flags);
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index a4a7b1ad1b63..aec6dfddf697 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -147,6 +147,15 @@ LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
LIBBPF_API int bpf_map_update_elem(int fd, const void *key, const void *value,
__u64 flags);
+struct bpf_map_update_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
/* specify the CPU that the sockmap job runs on */
+ __u64 *target_cpu;
+ size_t :0;
+};
+#define bpf_map_update_opts__last_field target_cpu
+LIBBPF_API int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
+ __u64 flags, const struct bpf_map_update_opts *opts);
LIBBPF_API int bpf_map_lookup_elem(int fd, const void *key, void *value);
LIBBPF_API int bpf_map_lookup_elem_flags(int fd, const void *key, void *value,
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 54b6f312cfa8..5a26a1d8624f 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -17,6 +17,7 @@ LIBBPF_0.0.1 {
bpf_map_lookup_and_delete_elem;
bpf_map_lookup_elem;
bpf_map_update_elem;
+ bpf_map_update_elem_opts;
bpf_obj_get;
bpf_obj_get_info_by_fd;
bpf_obj_pin;
diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
index 82bfb266741c..84a35cb4b9fe 100644
--- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
+++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
@@ -190,13 +190,18 @@ static void test_skmsg_helpers_with_link(enum bpf_map_type map_type)
test_skmsg_load_helpers__destroy(skel);
}
-static void test_sockmap_update(enum bpf_map_type map_type)
+static void test_sockmap_update(enum bpf_map_type map_type, bool cpu_affinity)
{
int err, prog, src;
struct test_sockmap_update *skel;
struct bpf_map *dst_map;
const __u32 zero = 0;
char dummy[14] = {0};
+ __u64 target_cpu = 0;
+
+ LIBBPF_OPTS(bpf_map_update_opts, update_opts,
+ .target_cpu = &target_cpu,
+ );
LIBBPF_OPTS(bpf_test_run_opts, topts,
.data_in = dummy,
.data_size_in = sizeof(dummy),
@@ -219,7 +224,11 @@ static void test_sockmap_update(enum bpf_map_type map_type)
else
dst_map = skel->maps.dst_sock_hash;
- err = bpf_map_update_elem(src, &zero, &sk, BPF_NOEXIST);
+ if (cpu_affinity)
+ err = bpf_map_update_elem_opts(src, &zero, &sk, BPF_NOEXIST, &update_opts);
+ else
+ err = bpf_map_update_elem(src, &zero, &sk, BPF_NOEXIST);
+
if (!ASSERT_OK(err, "update_elem(src)"))
goto out;
@@ -896,9 +905,11 @@ void test_sockmap_basic(void)
if (test__start_subtest("sockhash sk_msg load helpers"))
test_skmsg_helpers(BPF_MAP_TYPE_SOCKHASH);
if (test__start_subtest("sockmap update"))
- test_sockmap_update(BPF_MAP_TYPE_SOCKMAP);
+ test_sockmap_update(BPF_MAP_TYPE_SOCKMAP, false);
+ if (test__start_subtest("sockmap update cpu affinity"))
+ test_sockmap_update(BPF_MAP_TYPE_SOCKMAP, true);
if (test__start_subtest("sockhash update"))
- test_sockmap_update(BPF_MAP_TYPE_SOCKHASH);
+ test_sockmap_update(BPF_MAP_TYPE_SOCKHASH, false);
if (test__start_subtest("sockmap update in unsafe context"))
test_sockmap_invalid_update();
if (test__start_subtest("sockmap copy"))
--
2.43.5
* Re: [PATCH v2 2/2] bpf: implement libbpf sockmap cpu affinity
2024-11-01 16:16 ` [PATCH v2 2/2] bpf: implement libbpf sockmap cpu affinity mrpre
@ 2024-11-01 19:27 ` Andrii Nakryiko
0 siblings, 0 replies; 5+ messages in thread
From: Andrii Nakryiko @ 2024-11-01 19:27 UTC (permalink / raw)
To: mrpre
Cc: yonghong.song, john.fastabend, martin.lau, edumazet, jakub, davem,
dsahern, kuba, pabeni, netdev, bpf, linux-kernel
On Fri, Nov 1, 2024 at 9:17 AM mrpre <mrpre@163.com> wrote:
>
> implement libbpf sockmap cpu affinity
>
> Signed-off-by: Jiayuan Chen <mrpre@163.com>
> ---
> tools/include/uapi/linux/bpf.h | 4 ++++
> tools/lib/bpf/bpf.c | 22 +++++++++++++++++++
> tools/lib/bpf/bpf.h | 9 ++++++++
> tools/lib/bpf/libbpf.map | 1 +
> .../selftests/bpf/prog_tests/sockmap_basic.c | 19 ++++++++++++----
please split out selftests into a separate patch from libbpf changes
(but I hope we won't need libbpf changes at all)
> 5 files changed, 51 insertions(+), 4 deletions(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index f28b6527e815..ba6c39f40f10 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -1509,6 +1509,10 @@ union bpf_attr {
> __aligned_u64 next_key;
> };
> __u64 flags;
> + union {
> + /* specify the CPU where sockmap job run on */
> + __aligned_u64 target_cpu;
> + };
> };
>
> struct { /* struct used by BPF_MAP_*_BATCH commands */
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 2a4c71501a17..13c3f3cfe889 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -401,6 +401,28 @@ int bpf_map_update_elem(int fd, const void *key, const void *value,
> return libbpf_err_errno(ret);
> }
>
> +int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
> + __u64 flags, const struct bpf_map_update_opts *opts)
> +{
> + union bpf_attr attr;
> + int ret;
> + __u64 *target_cpu;
> +
> + if (!OPTS_VALID(opts, bpf_map_update_opts))
> + return libbpf_err(-EINVAL);
> +
> + target_cpu = OPTS_GET(opts, target_cpu, NULL);
> + memset(&attr, 0, sizeof(attr));
> + attr.map_fd = fd;
> + attr.key = ptr_to_u64(key);
> + attr.value = ptr_to_u64(value);
> + attr.flags = flags;
> + attr.target_cpu = ptr_to_u64(target_cpu);
> +
> + ret = sys_bpf(BPF_MAP_UPDATE_ELEM, &attr, sizeof(attr));
> + return libbpf_err_errno(ret);
> +}
> +
> int bpf_map_lookup_elem(int fd, const void *key, void *value)
> {
> const size_t attr_sz = offsetofend(union bpf_attr, flags);
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index a4a7b1ad1b63..aec6dfddf697 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -147,6 +147,15 @@ LIBBPF_API int bpf_btf_load(const void *btf_data, size_t btf_size,
>
> LIBBPF_API int bpf_map_update_elem(int fd, const void *key, const void *value,
> __u64 flags);
> +struct bpf_map_update_opts {
> + size_t sz; /* size of this struct for forward/backward compatibility */
> + /* specify the CPU where the sockmap job run on */
> + __u64 *target_cpu;
> + size_t :0;
> +};
> +#define bpf_map_update_opts__last_field target_cpu
> +LIBBPF_API int bpf_map_update_elem_opts(int fd, const void *key, const void *value,
> + __u64 flags, const struct bpf_map_update_opts *opts);
>
> LIBBPF_API int bpf_map_lookup_elem(int fd, const void *key, void *value);
> LIBBPF_API int bpf_map_lookup_elem_flags(int fd, const void *key, void *value,
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 54b6f312cfa8..5a26a1d8624f 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -17,6 +17,7 @@ LIBBPF_0.0.1 {
> bpf_map_lookup_and_delete_elem;
> bpf_map_lookup_elem;
> bpf_map_update_elem;
> + bpf_map_update_elem_opts;
when you are touching unfamiliar code, look around and see what others
did. Did you notice versioned sections in this file? Do you think
adding a new API to 0.0.1 section makes sense when we are already at
1.6.0?
> bpf_obj_get;
> bpf_obj_get_info_by_fd;
> bpf_obj_pin;
> diff --git a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> index 82bfb266741c..84a35cb4b9fe 100644
> --- a/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> +++ b/tools/testing/selftests/bpf/prog_tests/sockmap_basic.c
> @@ -190,13 +190,18 @@ static void test_skmsg_helpers_with_link(enum bpf_map_type map_type)
> test_skmsg_load_helpers__destroy(skel);
> }
>
> -static void test_sockmap_update(enum bpf_map_type map_type)
> +static void test_sockmap_update(enum bpf_map_type map_type, bool cpu_affinity)
> {
> int err, prog, src;
> struct test_sockmap_update *skel;
> struct bpf_map *dst_map;
> const __u32 zero = 0;
> char dummy[14] = {0};
> + __u64 target_cpu = 0;
> +
> + LIBBPF_OPTS(bpf_map_update_opts, update_opts,
> + .target_cpu = &target_cpu,
> + );
> LIBBPF_OPTS(bpf_test_run_opts, topts,
> .data_in = dummy,
> .data_size_in = sizeof(dummy),
> @@ -219,7 +224,11 @@ static void test_sockmap_update(enum bpf_map_type map_type)
> else
> dst_map = skel->maps.dst_sock_hash;
>
> - err = bpf_map_update_elem(src, &zero, &sk, BPF_NOEXIST);
> + if (cpu_affinity)
> + err = bpf_map_update_elem_opts(src, &zero, &sk, BPF_NOEXIST, &update_opts);
> + else
> + err = bpf_map_update_elem(src, &zero, &sk, BPF_NOEXIST);
> +
> if (!ASSERT_OK(err, "update_elem(src)"))
> goto out;
>
> @@ -896,9 +905,11 @@ void test_sockmap_basic(void)
> if (test__start_subtest("sockhash sk_msg load helpers"))
> test_skmsg_helpers(BPF_MAP_TYPE_SOCKHASH);
> if (test__start_subtest("sockmap update"))
> - test_sockmap_update(BPF_MAP_TYPE_SOCKMAP);
> + test_sockmap_update(BPF_MAP_TYPE_SOCKMAP, false);
> + if (test__start_subtest("sockmap update cpu affinity"))
> + test_sockmap_update(BPF_MAP_TYPE_SOCKMAP, true);
> if (test__start_subtest("sockhash update"))
> - test_sockmap_update(BPF_MAP_TYPE_SOCKHASH);
> + test_sockmap_update(BPF_MAP_TYPE_SOCKHASH, false);
> if (test__start_subtest("sockmap update in unsafe context"))
> test_sockmap_invalid_update();
> if (test__start_subtest("sockmap copy"))
> --
> 2.43.5
>
>
* Re: [PATCH v2 1/2] bpf: Introduce cpu affinity for sockmap
2024-11-01 16:16 ` [PATCH v2 1/2] " mrpre
@ 2024-11-04 7:23 ` Dan Carpenter
0 siblings, 0 replies; 5+ messages in thread
From: Dan Carpenter @ 2024-11-04 7:23 UTC (permalink / raw)
To: oe-kbuild, mrpre, yonghong.song, john.fastabend, martin.lau,
edumazet, jakub, davem, dsahern, kuba, pabeni, netdev, bpf,
linux-kernel
Cc: lkp, oe-kbuild-all, mrpre
Hi mrpre,
kernel test robot noticed the following build warnings:
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patches, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/mrpre/bpf-Introduce-cpu-affinity-for-sockmap/20241102-001844
base: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/r/20241101161624.568527-2-mrpre%40163.com
patch subject: [PATCH v2 1/2] bpf: Introduce cpu affinity for sockmap
config: i386-randconfig-141-20241102 (https://download.01.org/0day-ci/archive/20241103/202411030036.PSKG1pW3-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
| Closes: https://lore.kernel.org/r/202411030036.PSKG1pW3-lkp@intel.com/
smatch warnings:
net/core/sock_map.c:511 sock_map_update_common() warn: variable dereferenced before check 'psock' (see line 492)
vim +/psock +511 net/core/sock_map.c
604326b41a6fb9 Daniel Borkmann 2018-10-13 467 static int sock_map_update_common(struct bpf_map *map, u32 idx,
ffed654afa8dc1 mrpre 2024-11-02 468 struct sock *sk, u64 flags, s32 target_cpu)
604326b41a6fb9 Daniel Borkmann 2018-10-13 469 {
604326b41a6fb9 Daniel Borkmann 2018-10-13 470 struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
604326b41a6fb9 Daniel Borkmann 2018-10-13 471 struct sk_psock_link *link;
604326b41a6fb9 Daniel Borkmann 2018-10-13 472 struct sk_psock *psock;
604326b41a6fb9 Daniel Borkmann 2018-10-13 473 struct sock *osk;
604326b41a6fb9 Daniel Borkmann 2018-10-13 474 int ret;
604326b41a6fb9 Daniel Borkmann 2018-10-13 475
604326b41a6fb9 Daniel Borkmann 2018-10-13 476 WARN_ON_ONCE(!rcu_read_lock_held());
604326b41a6fb9 Daniel Borkmann 2018-10-13 477 if (unlikely(flags > BPF_EXIST))
604326b41a6fb9 Daniel Borkmann 2018-10-13 478 return -EINVAL;
604326b41a6fb9 Daniel Borkmann 2018-10-13 479 if (unlikely(idx >= map->max_entries))
604326b41a6fb9 Daniel Borkmann 2018-10-13 480 return -E2BIG;
604326b41a6fb9 Daniel Borkmann 2018-10-13 481
604326b41a6fb9 Daniel Borkmann 2018-10-13 482 link = sk_psock_init_link();
604326b41a6fb9 Daniel Borkmann 2018-10-13 483 if (!link)
604326b41a6fb9 Daniel Borkmann 2018-10-13 484 return -ENOMEM;
604326b41a6fb9 Daniel Borkmann 2018-10-13 485
2004fdbd8a2b56 Cong Wang 2021-03-30 486 ret = sock_map_link(map, sk);
604326b41a6fb9 Daniel Borkmann 2018-10-13 487 if (ret < 0)
604326b41a6fb9 Daniel Borkmann 2018-10-13 488 goto out_free;
604326b41a6fb9 Daniel Borkmann 2018-10-13 489
604326b41a6fb9 Daniel Borkmann 2018-10-13 490 psock = sk_psock(sk);
604326b41a6fb9 Daniel Borkmann 2018-10-13 491 WARN_ON_ONCE(!psock);
ffed654afa8dc1 mrpre 2024-11-02 @492 psock->target_cpu = target_cpu;
^^^^^^^^^^^^^^^^^
The patch adds an unchecked dereference
35d2b7ffffc1d9 John Fastabend 2023-08-29 493 spin_lock_bh(&stab->lock);
604326b41a6fb9 Daniel Borkmann 2018-10-13 494 osk = stab->sks[idx];
604326b41a6fb9 Daniel Borkmann 2018-10-13 495 if (osk && flags == BPF_NOEXIST) {
604326b41a6fb9 Daniel Borkmann 2018-10-13 496 ret = -EEXIST;
604326b41a6fb9 Daniel Borkmann 2018-10-13 497 goto out_unlock;
604326b41a6fb9 Daniel Borkmann 2018-10-13 498 } else if (!osk && flags == BPF_EXIST) {
604326b41a6fb9 Daniel Borkmann 2018-10-13 499 ret = -ENOENT;
604326b41a6fb9 Daniel Borkmann 2018-10-13 500 goto out_unlock;
604326b41a6fb9 Daniel Borkmann 2018-10-13 501 }
604326b41a6fb9 Daniel Borkmann 2018-10-13 502
604326b41a6fb9 Daniel Borkmann 2018-10-13 503 sock_map_add_link(psock, link, map, &stab->sks[idx]);
This also dereferences psock btw.
604326b41a6fb9 Daniel Borkmann 2018-10-13 504 stab->sks[idx] = sk;
604326b41a6fb9 Daniel Borkmann 2018-10-13 505 if (osk)
604326b41a6fb9 Daniel Borkmann 2018-10-13 506 sock_map_unref(osk, &stab->sks[idx]);
35d2b7ffffc1d9 John Fastabend 2023-08-29 507 spin_unlock_bh(&stab->lock);
604326b41a6fb9 Daniel Borkmann 2018-10-13 508 return 0;
604326b41a6fb9 Daniel Borkmann 2018-10-13 509 out_unlock:
35d2b7ffffc1d9 John Fastabend 2023-08-29 510 spin_unlock_bh(&stab->lock);
604326b41a6fb9 Daniel Borkmann 2018-10-13 @511 if (psock)
^^^^^
Probably after 6 years of not triggering the WARN_ON_ONCE() on line 490, we can
remove this check?
604326b41a6fb9 Daniel Borkmann 2018-10-13 512 sk_psock_put(sk, psock);
604326b41a6fb9 Daniel Borkmann 2018-10-13 513 out_free:
604326b41a6fb9 Daniel Borkmann 2018-10-13 514 sk_psock_free_link(link);
604326b41a6fb9 Daniel Borkmann 2018-10-13 515 return ret;
604326b41a6fb9 Daniel Borkmann 2018-10-13 516 }
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
Thread overview: 5+ messages
2024-11-01 16:16 [PATCH v2 0/2] bpf: Introduce cpu affinity for sockmap mrpre
2024-11-01 16:16 ` [PATCH v2 1/2] " mrpre
2024-11-04 7:23 ` Dan Carpenter
2024-11-01 16:16 ` [PATCH v2 2/2] bpf: implement libbpf sockmap cpu affinity mrpre
2024-11-01 19:27 ` Andrii Nakryiko