From: Peilin Ye <yepeilin@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>,
Leon Hwang <leon.hwang@linux.dev>, bpf <bpf@vger.kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
kernel-patches-bot@fb.com,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Josh Don <joshdon@google.com>, Barret Rhoden <brho@google.com>
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Reject bpf_timer for PREEMPT_RT
Date: Tue, 9 Sep 2025 09:00:14 +0000
Message-ID: <aL_snlcI4zC4HtZw@google.com>
In-Reply-To: <CAADnVQ+56_gvS328irDEuGoDGFH6iywKriACtsre7h5a7eiJbw@mail.gmail.com>
On Mon, Sep 08, 2025 at 03:51:00PM -0700, Alexei Starovoitov wrote:
> On Mon, Sep 8, 2025 at 3:42 PM Peilin Ye <yepeilin@google.com> wrote:
> > Just in case - actually there was a patch that does this:
> >
> > [2] https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
> >
> > It was then superseded by the patches you linked [1] above however,
> > since per discussion in [2], "use bpf_mem_alloc() to skip memcg
> > accounting because it can trigger hardlockups" is a workaround instead
> > of a proper fix.
> >
> > I wonder if this new issue on PREEMPT_RT would justify [2] over [1]?
> > IIUC, until kmalloc_nolock() becomes available:
> >
> > [1] (plus Leon's patch here) means no bpf_timer on PREEMPT_RT, but we
> > still have memcg accounting for non-PREEMPT_RT; [2] means no memcg
> > accounting.
>
> I didn't comment on the above statement earlier, because
> I thought you meant "no memcg accounting _inline_",
> but reading above it sounds that you think that bpf_mem_alloc()
> doesn't do memcg accounting at all ?
> That's incorrect. bpf_mem_alloc() always uses memcg accounting

Ah, I see: the accounting happens in kernel/bpf/memalloc.c:alloc_bulk(),
via irq_work. Thanks for the correction!

> , but the usage is nuanced. bpf_global_ma is counted towards root memcg,
> since it's created during boot. While hash map powered by bpf_mem_alloc
> is using memcg of the user that created that map.
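
To make the attribution described above concrete, here is a small user-space model. All names (struct memcg, struct mem_cache, cache_init(), cache_refill(), cache_alloc()) are hypothetical. The model only mirrors the idea that a bpf_mem_alloc cache charges the memcg captured when it was created, and that charging happens at refill time rather than inline on the allocation fast path. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: a name and a charge counter. */
struct memcg {
	const char *name;
	size_t charged;
};

/* Hypothetical root memcg; allocators created at boot end up charging it. */
static struct memcg root_memcg = { "root", 0 };

/* Hypothetical allocator cache: it records the memcg that was current when
 * it was created, and every later refill is charged to that memcg. */
struct mem_cache {
	struct memcg *owner;
	size_t unit_size;
	int free_cnt;
};

static void cache_init(struct mem_cache *c, struct memcg *creator,
		       size_t unit_size)
{
	assert(creator != NULL);
	c->owner = creator;
	c->unit_size = unit_size;
	c->free_cnt = 0;
}

/* Models the deferred refill (irq_work in the real allocator): the charge
 * goes to c->owner, not to whichever task triggered the refill. */
static void cache_refill(struct mem_cache *c, int cnt)
{
	c->owner->charged += (size_t)cnt * c->unit_size;
	c->free_cnt += cnt;
}

/* Fast path: hand out a prefilled object; no charging happens here. */
static int cache_alloc(struct mem_cache *c)
{
	if (c->free_cnt == 0)
		return -1;	/* the real allocator would return NULL */
	c->free_cnt--;
	return 0;
}
```

Consider two caches. One is initialized with &root_memcg, like bpf_global_ma created at boot. The other is initialized with a user's memcg, like a hash map's cache. After refills of 4 and 2 objects of 64 bytes, they charge 256 and 128 bytes respectively. The later fast-path allocations charge nothing more.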
- - -
IIUC, this "sleeping function called from invalid context" message on
PREEMPT_RT is triggered because ___slab_alloc() calls local_lock_irqsave()
while IRQs are already disabled by __bpf_async_init():

	__bpf_spin_lock_irqsave(&async->lock);
	t = async->timer;
	if (t) {
		ret = -EBUSY;
		goto out;
	}

	/* allocate hrtimer via map_kmalloc to use memcg accounting */
	cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node);
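
The following is a minimal user-space model of the explanation above. It assumes the documented semantics of local_lock_irqsave(). On !PREEMPT_RT it only disables interrupts. On PREEMPT_RT it takes a per-CPU spinlock_t, which is a sleeping lock, and leaves interrupts enabled. The struct cpu_ctx type and the model_* functions are hypothetical. The counter stands in for the "sleeping function called from invalid context" splat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-CPU context; not kernel code. */
struct cpu_ctx {
	bool preempt_rt;	/* models CONFIG_PREEMPT_RT */
	bool irqs_disabled;	/* models the local IRQ state */
	int splats;		/* "sleeping function called from invalid context" */
};

/* Models the might-sleep check done before taking a sleeping lock. */
static void model_might_sleep(struct cpu_ctx *cpu)
{
	if (cpu->irqs_disabled) {
		cpu->splats++;
		fprintf(stderr, "BUG: sleeping function called from invalid context\n");
	}
}

/* Models local_lock_irqsave(): !RT disables IRQs, RT takes a sleeping lock. */
static void model_local_lock_irqsave(struct cpu_ctx *cpu, bool *saved)
{
	*saved = cpu->irqs_disabled;
	if (cpu->preempt_rt)
		model_might_sleep(cpu);
	else
		cpu->irqs_disabled = true;
}

static void model_local_unlock_irqrestore(struct cpu_ctx *cpu, bool saved)
{
	if (!cpu->preempt_rt)
		cpu->irqs_disabled = saved;
}

/* Models __bpf_async_init(): IRQs off via __bpf_spin_lock_irqsave(), then an
 * allocation that reaches local_lock_irqsave() in ___slab_alloc(). */
static void model_async_init(struct cpu_ctx *cpu)
{
	bool outer = cpu->irqs_disabled, inner;

	cpu->irqs_disabled = true;		/* __bpf_spin_lock_irqsave() */
	model_local_lock_irqsave(cpu, &inner);
	model_local_unlock_irqrestore(cpu, inner);
	assert(cpu->irqs_disabled);		/* still inside the bpf_spin_lock */
	cpu->irqs_disabled = outer;		/* __bpf_spin_unlock_irqrestore() */
}
```

Run on a context with preempt_rt set, the model records one splat. Run on a !PREEMPT_RT context, the same sequence records none.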

For my understanding: is kmalloc_nolock() going to resolve this, and if so,
how? Patch [3] changes ___slab_alloc() to:

     /* must check again c->slab in case we got preempted and it changed */
   - local_lock_irqsave(&s->cpu_slab->lock, flags);
   + local_lock_cpu_slab(s, &flags);

But on PREEMPT_RT, local_lock_cpu_slab() still calls local_lock_irqsave(),
and its comment says it cannot be called with IRQs disabled:

   + * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
   + * disabled context. The lock will always be acquired and if needed it
   + * block and sleep until the lock is available.

So it seems that we'll still have this "sleeping function called from
invalid context" issue for PREEMPT_RT even if we make __bpf_async_init()
use bpf_mem_alloc() (when the latter uses kmalloc_nolock())?
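
One general way to make an allocation path usable where sleeping is forbidden is "trylock or fail". The path never waits for the lock and returns NULL when the lock cannot be taken immediately. The sketch below models only that pattern. struct ctx and the model_* functions are hypothetical, and the sketch does not claim that kmalloc_nolock() in [3] behaves this way on PREEMPT_RT. That is exactly the open question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical context model; not kernel code. */
struct ctx {
	bool preempt_rt;	/* models CONFIG_PREEMPT_RT */
	bool irqs_disabled;	/* e.g. inside __bpf_spin_lock_irqsave() */
	bool lock_busy;		/* per-CPU lock currently owned by another path */
};

/* Never waits. On RT the lock may sleep, so with IRQs off this model
 * refuses outright instead of trying to acquire it. */
static bool model_trylock(struct ctx *c)
{
	if (c->preempt_rt && c->irqs_disabled)
		return false;
	if (c->lock_busy)
		return false;
	c->lock_busy = true;
	return true;
}

static void model_unlock(struct ctx *c)
{
	assert(c->lock_busy);
	c->lock_busy = false;
}

/* Never sleeps or spins: returns NULL whenever the lock is not immediately
 * available and leaves the fallback to the caller. */
static void *model_alloc_nolock(struct ctx *c, size_t size)
{
	void *p;

	if (!model_trylock(c))
		return NULL;
	p = malloc(size);
	model_unlock(c);
	return p;
}
```

Under this model, the splat would turn into an allocation failure. A caller such as __bpf_async_init() would then have to handle NULL, e.g. by returning -ENOMEM.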

[3] [PATCH v3 5/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
    https://lore.kernel.org/all/20250716022950.69330-6-alexei.starovoitov@gmail.com/

Thanks,
Peilin Ye