From: Peilin Ye <yepeilin@google.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>,
	Leon Hwang <leon.hwang@linux.dev>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <martin.lau@linux.dev>,
	Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	kernel-patches-bot@fb.com,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Josh Don <joshdon@google.com>, Barret Rhoden <brho@google.com>
Subject: Re: [PATCH bpf-next v2 1/2] bpf: Reject bpf_timer for PREEMPT_RT
Date: Tue, 9 Sep 2025 09:00:14 +0000
Message-ID: <aL_snlcI4zC4HtZw@google.com>
In-Reply-To: <CAADnVQ+56_gvS328irDEuGoDGFH6iywKriACtsre7h5a7eiJbw@mail.gmail.com>

On Mon, Sep 08, 2025 at 03:51:00PM -0700, Alexei Starovoitov wrote:
> On Mon, Sep 8, 2025 at 3:42 PM Peilin Ye <yepeilin@google.com> wrote:
> > Just in case - actually there was a patch that does this:
> >
> > [2] https://lore.kernel.org/bpf/20250905061919.439648-1-yepeilin@google.com/
> >
> > It was then superseded by the patches you linked [1] above however,
> > since per discussion in [2], "use bpf_mem_alloc() to skip memcg
> > accounting because it can trigger hardlockups" is a workaround instead
> > of a proper fix.
> >
> > I wonder if this new issue on PREEMPT_RT would justify [2] over [1]?
> > IIUC, until kmalloc_nolock() becomes available:
> >
> > [1] (plus Leon's patch here) means no bpf_timer on PREEMPT_RT, but we
> > still have memcg accounting for non-PREEMPT_RT; [2] means no memcg
> > accounting.
> 
> I didn't comment on the above statement earlier, because
> I thought you meant "no memcg accounting _inline_",
> but reading above it sounds that you think that bpf_mem_alloc()
> doesn't do memcg accounting at all ?
> That's incorrect. bpf_mem_alloc() always uses memcg accounting

Ah, I see - kernel/bpf/memalloc.c:alloc_bulk() via irq_work.  Thanks for
the correction!

> , but the usage is nuanced. bpf_global_ma is counted towards root memcg,
> since it's created during boot. While hash map powered by bpf_mem_alloc
> is using memcg of the user that created that map.

- - -
IIUC, this "sleeping function called from invalid context" message on
PREEMPT_RT appears because ___slab_alloc() does local_lock_irqsave()
while IRQs are already disabled by __bpf_async_init():

        __bpf_spin_lock_irqsave(&async->lock);
        t = async->timer;
        if (t) {
                ret = -EBUSY;
                goto out;
        }

        /* allocate hrtimer via map_kmalloc to use memcg accounting */
        cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node);
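
A minimal userspace sketch of the situation described above, for
illustration only. It is not kernel code: every model_* name is
hypothetical, and the "lock" merely records a report when it is taken
while the modelled IRQ-off flag is set, which is roughly what the
"sleeping function called from invalid context" check does for a lock
that may sleep on PREEMPT_RT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Userspace model only. Every model_* name is hypothetical and merely
 * stands in for the kernel behaviour described in the text.
 */
static bool model_irqs_off;	/* stands in for the IRQ-disabled state */
static int model_splats;	/* number of invalid-context reports */

/* Stands in for __bpf_spin_lock_irqsave(): enter an IRQ-off section. */
static void model_lock_irqsave(void)
{
	assert(!model_irqs_off);
	model_irqs_off = true;
}

/* Leave the IRQ-off section again. */
static void model_unlock_irqrestore(void)
{
	assert(model_irqs_off);
	model_irqs_off = false;
}

/*
 * Stands in for a lock that may sleep, as local_lock_irqsave() does on
 * PREEMPT_RT. Taking it while IRQs are off is reported, not fatal.
 */
static void model_sleeping_lock(void)
{
	if (model_irqs_off) {
		model_splats++;
		fprintf(stderr,
			"model: sleeping function called from invalid context\n");
	}
}

/* Stands in for the allocation path that ends up in ___slab_alloc(). */
static void model_alloc(void)
{
	model_sleeping_lock();
}

/* Allocation inside the IRQ-off section, as in the excerpt above. */
int model_init_alloc_inside(void)
{
	model_splats = 0;
	model_lock_irqsave();
	model_alloc();
	model_unlock_irqrestore();
	return model_splats;
}

/* Contrast case for the model only: allocate before IRQs go off. */
int model_init_alloc_outside(void)
{
	model_splats = 0;
	model_alloc();
	model_lock_irqsave();
	model_unlock_irqrestore();
	return model_splats;
}
```

In this model, model_init_alloc_inside() records one report and
model_init_alloc_outside() records none. The contrast case only shows
which ordering trips the check; it is not meant as a proposed fix for
__bpf_async_init().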

For my understanding: is kmalloc_nolock() going to resolve this, and if
so, how?
Patch [3] changes ___slab_alloc() to:

          /* must check again c->slab in case we got preempted and it changed */
 -        local_lock_irqsave(&s->cpu_slab->lock, flags);
 +        local_lock_cpu_slab(s, &flags);

But for PREEMPT_RT, local_lock_cpu_slab() still does
local_lock_irqsave(), and the comment says that we can't call it with
IRQs disabled:

 +         * On PREEMPT_RT an invocation is not possible from IRQ-off or preempt
 +         * disabled context. The lock will always be acquired and if needed it
 +         * block and sleep until the lock is available.

So it seems that we'll still have this "sleeping function called from
invalid context" issue for PREEMPT_RT even if we make __bpf_async_init()
use bpf_mem_alloc() (when the latter uses kmalloc_nolock())?

[3]
[PATCH v3 5/6] slab: Introduce kmalloc_nolock() and kfree_nolock().
https://lore.kernel.org/all/20250716022950.69330-6-alexei.starovoitov@gmail.com/

Thanks,
Peilin Ye

