From: Namhyung Kim <namhyung@kernel.org>
To: Song Liu <song@kernel.org>
Cc: Roman Gushchin <roman.gushchin@linux.dev>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <martin.lau@linux.dev>,
Eduard Zingerman <eddyz87@gmail.com>,
Yonghong Song <yonghong.song@linux.dev>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
Jiri Olsa <jolsa@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
bpf@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>,
Christoph Lameter <cl@linux.com>,
Pekka Enberg <penberg@kernel.org>,
David Rientjes <rientjes@google.com>,
Joonsoo Kim <iamjoonsoo.kim@lge.com>,
Vlastimil Babka <vbabka@suse.cz>,
Hyeonggon Yoo <42.hyeyoo@gmail.com>,
linux-mm@kvack.org, Arnaldo Carvalho de Melo <acme@kernel.org>,
Kees Cook <kees@kernel.org>
Subject: Re: [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc
Date: Fri, 4 Oct 2024 16:28:01 -0700
Message-ID: <ZwB6ARVa7ea8Vvxi@google.com>
In-Reply-To: <CAPhsuW4AjZMQxCbqYmEgbnkP0gWenKo4wVi8tW1zYcsaF5h7iQ@mail.gmail.com>
On Fri, Oct 04, 2024 at 03:57:26PM -0700, Song Liu wrote:
> On Fri, Oct 4, 2024 at 2:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Oct 04, 2024 at 02:36:30PM -0700, Song Liu wrote:
> > > On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > >
> > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > > bpf_get_kmem_cache() gets slab cache information from a
> > > > > > > virtual address, like virt_to_cache(). If the address is a pointer
> > > > > > > to a slab object, it returns a valid kmem_cache pointer; otherwise
> > > > > > > it returns NULL.
> > > > > >
> > > > > > > It doesn't take a reference on the kmem_cache, so the caller is
> > > > > > > responsible for managing access to it. The intended use case for now is to
> > > > > > > symbolize locks in slab objects from the lock contention tracepoints.
> > > > > >
> > > > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > ---
> > > > > > kernel/bpf/helpers.c | 1 +
> > > > > > mm/slab_common.c | 19 +++++++++++++++++++
> > > > > > 2 files changed, 20 insertions(+)
> > > > > >
> > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > > > > --- a/kernel/bpf/helpers.c
> > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > > > > > BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > > > > > BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > > > > > BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > > > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > > > > BTF_KFUNCS_END(common_btf_ids)
> > > > > >
> > > > > > static const struct btf_kfunc_id_set common_kfunc_set = {
> > > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > > > > index 7443244656150325..5484e1cd812f698e 100644
> > > > > > --- a/mm/slab_common.c
> > > > > > +++ b/mm/slab_common.c
> > > > > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > > > > > }
> > > > > > EXPORT_SYMBOL(ksize);
> > > > > >
> > > > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > > > +#include <linux/btf.h>
> > > > > > +
> > > > > > +__bpf_kfunc_start_defs();
> > > > > > +
> > > > > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > > > > +{
> > > > > > > +	struct slab *slab;
> > > > > > > +
> > > > > > > +	if (!virt_addr_valid(addr))
> > > > > > > +		return NULL;
> > > > > > > +
> > > > > > > +	slab = virt_to_slab((void *)(long)addr);
> > > > > > > +	return slab ? slab->slab_cache : NULL;
> > > > > > +}
> > > > >
> > > > > Do we need to hold a refcount to the slab_cache? Given
> > > > > we make this kfunc available everywhere, including
> > > > > sleepable contexts, I think it is necessary.
> > > >
> > > > It's a really good question.
> > > >
> > > > If the callee somehow owns the slab object, as in the example
> > > > provided in the series (current task), it's not necessary.
> > > >
> > > > If a user can pass a random address, you're right, we need to
> > > > grab the slab_cache's refcnt. But then we also can't guarantee
> > > > that the object still belongs to the same slab_cache, the
> > > > function becomes racy by definition.
> > >
> > > To be safe, we can limit the kfunc to sleepable context only. Then
> > > we can lock slab_mutex for virt_to_slab, and hold a refcount
> > > to slab_cache. We will need a KF_RELEASE kfunc to release
> > > the refcount later.
> >
> > Then it needs to call kmem_cache_destroy() for release which contains
> > rcu_barrier. :(
> >
> > >
> > > IIUC, this limitation (sleepable context only) shouldn't be a problem
> > > for perf use case?
> >
> > No, it would be called from the lock contention path including
> > spinlocks. :(
> >
> > Can we limit it to non-sleepable contexts and somehow prevent passing an
> > arbitrary address (or saving the result pointer)?
>
> I hacked something like the following. It is not ideal, because we are
> taking a spinlock_t pointer instead of a void pointer. To use this with a
> void pointer, we will need some verifier changes.
Thanks a lot for doing this!! I'll take a look at what needs to be done
in the verifier.
Namhyung
>
>
> diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> index 3709fb142881..7311a26ecb01 100644
> --- i/kernel/bpf/helpers.c
> +++ w/kernel/bpf/helpers.c
> @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
> BTF_KFUNCS_END(common_btf_ids)
>
> static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git i/mm/slab_common.c w/mm/slab_common.c
> index 5484e1cd812f..3e3e5f172f2e 100644
> --- i/mm/slab_common.c
> +++ w/mm/slab_common.c
> @@ -1327,14 +1327,15 @@ EXPORT_SYMBOL(ksize);
>
> __bpf_kfunc_start_defs();
>
> -__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr)
> {
>  	struct slab *slab;
> +	unsigned long a = (unsigned long)addr;
>
> -	if (!virt_addr_valid(addr))
> +	if (!virt_addr_valid(a))
>  		return NULL;
>
> -	slab = virt_to_slab((void *)(long)addr);
> +	slab = virt_to_slab(addr);
>  	return slab ? slab->slab_cache : NULL;
> }
>
> @@ -1346,4 +1347,3 @@ EXPORT_TRACEPOINT_SYMBOL(kmalloc);
> EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
> EXPORT_TRACEPOINT_SYMBOL(kfree);
> EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
> -
> diff --git i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> index 3f6ec15a1bf6..8238155a5055 100644
> --- i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> +++ w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> @@ -16,7 +16,7 @@ struct {
> __uint(max_entries, 1024);
> } slab_hash SEC(".maps");
>
> -extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
> +extern struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr) __ksym;
>
> /* result, will be checked by userspace */
> int found;
> @@ -46,21 +46,23 @@ int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
> SEC("raw_tp/bpf_test_finish")
> int BPF_PROG(check_task_struct)
> {
> -	__u64 curr = bpf_get_current_task();
> +	struct task_struct *curr = bpf_get_current_task_btf();
>  	struct kmem_cache *s;
>  	char *name;
>
> -	s = bpf_get_kmem_cache(curr);
> +	s = bpf_get_kmem_cache(&curr->alloc_lock);
>  	if (s == NULL) {
>  		found = -1;
>  		return 0;
>  	}
>
> +	bpf_rcu_read_lock();
>  	name = bpf_map_lookup_elem(&slab_hash, &s);
>  	if (name && !bpf_strncmp(name, 11, "task_struct"))
>  		found = 1;
>  	else
>  		found = -2;
> +	bpf_rcu_read_unlock();
>
>  	return 0;
> }
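
For readers following the lookup contract discussed above (NULL for an invalid or non-slab address, otherwise the owning cache), here is a minimal userspace sketch. It is not kernel code and not part of either patch: the model_* helpers, the page table array, and the 4-page address space are all hypothetical stand-ins for virt_addr_valid(), virt_to_slab(), and real slab metadata, and it says nothing about the refcount and lifetime questions raised in the thread.

```c
/*
 * Userspace model only. Every type and helper below is a hypothetical
 * stand-in; none of this is the kernel's real definition.
 */
#include <stddef.h>
#include <stdint.h>

struct kmem_cache { const char *name; };
struct slab { struct kmem_cache *slab_cache; };

#define MODEL_PAGE_SHIFT 12
#define MODEL_NR_PAGES   4

static struct kmem_cache task_cache = { "task_struct" };
static struct slab task_slab = { &task_cache };

/* One entry per modelled page; NULL means "not slab memory". */
static struct slab *page_to_slab[MODEL_NR_PAGES] = {
	NULL, &task_slab, NULL, NULL,
};

/* Stand-in for virt_addr_valid(): is the address inside the model? */
static int model_virt_addr_valid(uint64_t addr)
{
	return (addr >> MODEL_PAGE_SHIFT) < MODEL_NR_PAGES;
}

/* Stand-in for virt_to_slab(): NULL if the page is not a slab page. */
static struct slab *model_virt_to_slab(uint64_t addr)
{
	return page_to_slab[addr >> MODEL_PAGE_SHIFT];
}

/* Same shape as the kfunc body in the patch, on top of the model. */
static struct kmem_cache *model_get_kmem_cache(uint64_t addr)
{
	struct slab *slab;

	if (!model_virt_addr_valid(addr))
		return NULL;

	slab = model_virt_to_slab(addr);
	return slab ? slab->slab_cache : NULL;
}
```

In this model, an address inside page 1 (for example 0x1040) resolves to task_cache, while addresses in non-slab pages or outside the modelled range resolve to NULL, which is why the kfunc is registered with KF_RET_NULL and callers must check the result.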
Thread overview: 30+ messages
2024-10-02 18:09 [PATCH v4 bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc Namhyung Kim
2024-10-02 18:09 ` [PATCH v4 bpf-next 1/3] bpf: Add kmem_cache iterator Namhyung Kim
2024-10-03 7:35 ` Vlastimil Babka
2024-10-04 20:33 ` Song Liu
2024-10-04 21:37 ` Namhyung Kim
2024-10-04 21:46 ` Song Liu
2024-10-04 23:29 ` Namhyung Kim
2024-10-04 20:45 ` Song Liu
2024-10-04 21:42 ` Namhyung Kim
2024-10-02 18:09 ` [PATCH v4 bpf-next 2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc Namhyung Kim
2024-10-04 5:31 ` Namhyung Kim
2024-10-04 9:12 ` kernel test robot
2024-10-04 18:08 ` kernel test robot
2024-10-04 20:10 ` Song Liu
2024-10-04 21:25 ` Roman Gushchin
2024-10-04 21:36 ` Song Liu
2024-10-04 21:58 ` Namhyung Kim
2024-10-04 22:57 ` Song Liu
2024-10-04 23:28 ` Namhyung Kim [this message]
2024-10-04 23:44 ` Alexei Starovoitov
2024-10-04 23:56 ` Song Liu
2024-10-06 19:00 ` Namhyung Kim
2024-10-07 12:57 ` Vlastimil Babka
2024-10-09 7:17 ` Namhyung Kim
2024-10-10 16:46 ` Namhyung Kim
2024-10-10 17:04 ` Alexei Starovoitov
2024-10-10 22:56 ` Namhyung Kim
2024-10-02 18:09 ` [PATCH v4 bpf-next 3/3] selftests/bpf: Add a test for kmem_cache_iter Namhyung Kim
[not found] ` <94bdb7a4cb0f83adf655d98a5c5f5df5299b960d2af54c87eba08de9646d0e42@mail.kernel.org>
[not found] ` <CAM9d7cjGh5+5Cgw-5Nc5oO88HgJz33BUuMGYREExEgWXND3B_A@mail.gmail.com>
2024-10-03 1:01 ` [PATCH v4 bpf-next 0/3] bpf: Add kmem_cache iterator and kfunc Daniel Xu
2024-10-03 17:43 ` Namhyung Kim