From: Stanislav Fomichev <sdf@google.com>
To: bpf@vger.kernel.org
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
martin.lau@linux.dev, song@kernel.org, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
haoluo@google.com, jolsa@kernel.org
Subject: [PATCH bpf-next] RFC: bpf: query effective progs without cgroup_mutex
Date: Fri, 5 May 2023 11:45:50 -0700
Message-ID: <20230505184550.1386802-1-sdf@google.com>

We're observing stalls on heavily loaded machines in the
cgroup_bpf_prog_query path, most likely because the query is
blocked on cgroup_mutex.

IIUC, cgroup_mutex is there mostly to protect the non-effective
fields (cgrp->bpf.progs), which the update path might change.
For the BPF_F_QUERY_EFFECTIVE case, all we need is to rcu_dereference
a bunch of pointers (and keep them around for consistency), so
let's do it.

Sending this out as an RFC because it looks a bit ugly. It would
also be nice to handle the non-effective case locklessly, but that
might require a larger rework.
Signed-off-by: Stanislav Fomichev <sdf@google.com>
---
kernel/bpf/cgroup.c | 30 ++++++++++++++++++++----------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index a06e118a9be5..c9d4b66e2c15 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1022,10 +1022,10 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
__u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
bool effective_query = attr->query.query_flags & BPF_F_QUERY_EFFECTIVE;
__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+ struct bpf_prog_array *effective[MAX_CGROUP_BPF_ATTACH_TYPE];
enum bpf_attach_type type = attr->query.attach_type;
enum cgroup_bpf_attach_type from_atype, to_atype;
enum cgroup_bpf_attach_type atype;
- struct bpf_prog_array *effective;
int cnt, ret = 0, i;
int total_cnt = 0;
u32 flags;
@@ -1051,9 +1051,9 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
for (atype = from_atype; atype <= to_atype; atype++) {
if (effective_query) {
- effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
- lockdep_is_held(&cgroup_mutex));
- total_cnt += bpf_prog_array_length(effective);
+ effective[atype] = rcu_dereference_protected(cgrp->bpf.effective[atype],
+ lockdep_is_held(&cgroup_mutex));
+ total_cnt += bpf_prog_array_length(effective[atype]);
} else {
total_cnt += prog_list_length(&cgrp->bpf.progs[atype]);
}
@@ -1076,10 +1076,8 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
for (atype = from_atype; atype <= to_atype && total_cnt; atype++) {
if (effective_query) {
- effective = rcu_dereference_protected(cgrp->bpf.effective[atype],
- lockdep_is_held(&cgroup_mutex));
- cnt = min_t(int, bpf_prog_array_length(effective), total_cnt);
- ret = bpf_prog_array_copy_to_user(effective, prog_ids, cnt);
+ cnt = min_t(int, bpf_prog_array_length(effective[atype]), total_cnt);
+ ret = bpf_prog_array_copy_to_user(effective[atype], prog_ids, cnt);
} else {
struct hlist_head *progs;
struct bpf_prog_list *pl;
@@ -1118,11 +1116,23 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
static int cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
union bpf_attr __user *uattr)
{
+ __u32 __user *prog_attach_flags = u64_to_user_ptr(attr->query.prog_attach_flags);
+ bool effective_query = attr->query.query_flags & BPF_F_QUERY_EFFECTIVE;
+ bool need_mutex = true;
int ret;
- mutex_lock(&cgroup_mutex);
+ if (effective_query && !prog_attach_flags)
+ need_mutex = false;
+
+ if (need_mutex)
+ mutex_lock(&cgroup_mutex);
+ else
+ rcu_read_lock();
ret = __cgroup_bpf_query(cgrp, attr, uattr);
- mutex_unlock(&cgroup_mutex);
+ if (need_mutex)
+ mutex_unlock(&cgroup_mutex);
+ else
+ rcu_read_unlock();
return ret;
}
--
2.40.1.521.gf1e218fcd8-goog
Thread overview: 5+ messages
2023-05-05 18:45 Stanislav Fomichev [this message]
2023-05-05 18:47 ` [PATCH bpf-next] RFC: bpf: query effective progs without cgroup_mutex Stanislav Fomichev
2023-05-05 21:35 ` Stanislav Fomichev
2023-05-11 15:57 Stanislav Fomichev
2023-05-11 15:59 ` Stanislav Fomichev