From: Peilin Ye <yepeilin@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>,
Leon Hwang <leon.hwang@linux.dev>, bpf <bpf@vger.kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Martin KaFai Lau <martin.lau@linux.dev>,
Song Liu <song@kernel.org>,
Yonghong Song <yonghong.song@linux.dev>,
kernel-patches-bot@fb.com,
Kumar Kartikeya Dwivedi <memxor@gmail.com>,
Josh Don <joshdon@google.com>, Barret Rhoden <brho@google.com>
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Reject bpf_timer for PREEMPT_RT
Date: Tue, 9 Sep 2025 09:00:14 +0000
Message-ID: <aL_snlcI4zC4HtZw@google.com>
In-Reply-To: <CAADnVQ+56_gvS328irDEuGoDGFH6iywKriACtsre7h5a7eiJbw@mail.gmail.com>

On Mon, Sep 08, 2025 at 03:51:00PM -0700, Alexei Starovoitov wrote:
> On Mon, Sep 8, 2025 at 3:42 PM Peilin Ye <yepeilin@google.com> wrote:
> > Just in case - actually there was a patch that does this:
> >
> > [2] https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
> >
> > It was then superseded by the patches you linked [1] above however,
> > since per discussion in [2], "use bpf_mem_alloc() to skip memcg
> > accounting because it can trigger hardlockups" is a workaround instead
> > of a proper fix.
> >
> > I wonder if this new issue on PREEMPT_RT would justify [2] over [1]?
> > IIUC, until kmalloc_nolock() becomes available:
> >
> > [1] (plus Leon's patch here) means no bpf_timer on PREEMPT_RT, but we
> > still have memcg accounting for non-PREEMPT_RT; [2] means no memcg
> > accounting.
>
> I didn't comment on the above statement earlier, because
> I thought you meant "no memcg accounting _inline_",
> but reading above it sounds that you think that bpf_mem_alloc()
> doesn't do memcg accounting at all ?
> That's incorrect. bpf_mem_alloc() always uses memcg accounting
Ah, I see - kernel/bpf/memalloc.c:alloc_bulk() via irq_work. Thanks for
the correction!
> , but the usage is nuanced. bpf_global_ma is counted towards root memcg,
> since it's created during boot. While hash map powered by bpf_mem_alloc
> is using memcg of the user that created that map.
- - -
IIUC, this "sleeping function called from invalid context" message on
PREEMPT_RT is because ___slab_alloc() does local_lock_irqsave() while
IRQs are already disabled by __bpf_async_init():

	__bpf_spin_lock_irqsave(&async->lock);
	t = async->timer;
	if (t) {
		ret = -EBUSY;
		goto out;
	}
	/* allocate hrtimer via map_kmalloc to use memcg accounting */
	cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node);

For my understanding: will kmalloc_nolock() resolve this, and if so,
how? Patch [3] changes ___slab_alloc() to:

 	/* must check again c->slab in case we got preempted and it changed */
-	local_lock_irqsave(&s->cpu_slab->lock, flags);
+	local_lock_cpu_slab(s, &flags);

But on PREEMPT_RT, local_lock_cpu_slab() still does
local_lock_irqsave(), and the comment says that we can't call it with
IRQs disabled:

+ * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
+ * disabled context. The lock will always be acquired and if needed it
+ * block and sleep until the lock is available.

So it seems that we'll still hit this "sleeping function called from
invalid context" issue on PREEMPT_RT even if we make __bpf_async_init()
use bpf_mem_alloc() (when the latter uses kmalloc_nolock())?

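To make the concern concrete, here is a toy userspace model of the call
chain described above. It is only a sketch of my understanding, not
kernel code: every toy_* name is made up for illustration, the "lock"
functions only track an IRQ-off depth, and the RT branch assumes that
local_lock_irqsave() is a sleeping lock that leaves IRQs untouched on
PREEMPT_RT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model state; not a real kernel structure. */
struct toy_cpu {
	bool preempt_rt;	/* models CONFIG_PREEMPT_RT=y */
	int irq_off_depth;	/* > 0 while "IRQs" are disabled */
	int splats;		/* count of invalid-context reports */
};

/* Models __bpf_spin_lock_irqsave(): disables IRQs on RT and non-RT. */
void toy_raw_lock_irqsave(struct toy_cpu *c)
{
	c->irq_off_depth++;
}

void toy_raw_unlock_irqrestore(struct toy_cpu *c)
{
	assert(c->irq_off_depth > 0);
	c->irq_off_depth--;
}

/*
 * Models local_lock_irqsave(). Assumption: on non-RT it just disables
 * IRQs; on RT it may sleep, so calling it with IRQs off is reported.
 */
void toy_local_lock_irqsave(struct toy_cpu *c)
{
	if (!c->preempt_rt) {
		c->irq_off_depth++;
		return;
	}
	if (c->irq_off_depth > 0) {
		c->splats++;
		fprintf(stderr,
			"toy: sleeping function called from invalid context\n");
	}
}

void toy_local_unlock_irqrestore(struct toy_cpu *c)
{
	if (!c->preempt_rt) {
		assert(c->irq_off_depth > 0);
		c->irq_off_depth--;
	}
}

/* Models the ___slab_alloc() slow path taking the per-CPU slab lock. */
void toy_slab_alloc(struct toy_cpu *c)
{
	toy_local_lock_irqsave(c);
	toy_local_unlock_irqrestore(c);
}

/* Models __bpf_async_init(): allocates while IRQs are disabled. */
int toy_async_init(bool preempt_rt)
{
	struct toy_cpu c = { .preempt_rt = preempt_rt };

	toy_raw_lock_irqsave(&c);
	toy_slab_alloc(&c);
	toy_raw_unlock_irqrestore(&c);
	return c.splats;
}
```

In this model toy_async_init(false) reports nothing, while
toy_async_init(true) reports once, because the allocation path reaches
the sleeping lock with IRQs already off. Swapping the allocator does
not change that as long as the slow path still takes the same lock.
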
[3] [PATCH v3 5/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
    https://lore.kernel.org/all/20250716022950.69330-6-alexei.starovoitov@gmail.com/

Thanks,
Peilin Ye