public inbox for linux-kernel@vger.kernel.org
From: Leon Hwang <leon.hwang@linux.dev>
To: Daniel Borkmann <daniel@iogearbox.net>, bpf@vger.kernel.org
Cc: Martin KaFai Lau <martin.lau@linux.dev>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Eduard Zingerman <eddyz87@gmail.com>, Song Liu <song@kernel.org>,
	Yonghong Song <yonghong.song@linux.dev>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@fomichev.me>, Hao Luo <haoluo@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Shuah Khan <shuah@kernel.org>,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	kernel-patches-bot@fb.com,
	Kumar Kartikeya Dwivedi <memxor@gmail.com>
Subject: Re: [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes
Date: Tue, 20 Jan 2026 09:49:54 +0800
Message-ID: <123a63a2-5679-4bd0-9e16-dc5c7dbe3325@linux.dev>
In-Reply-To: <d4b8843b-c5dc-4468-996a-bacc2db63f11@iogearbox.net>



On 20/1/26 03:47, Daniel Borkmann wrote:
> On 1/19/26 3:21 PM, Leon Hwang wrote:
>> Switch the free-node pop paths to raw_spin_trylock*() to avoid blocking
>> on contended LRU locks.
>>
>> If the global or per-CPU LRU lock is unavailable, refuse to refill the
>> local free list and return NULL instead. This allows callers to back
>> off safely rather than blocking or re-entering the same lock context.
>>
>> This change avoids lockdep warnings and potential deadlocks caused by
>> re-entrant LRU lock acquisition from NMI context, as shown below:
>>
>> [  418.260323] bpf_testmod: oh no, recursing into test_1, recursion_misses 1
>> [  424.982207] ================================
>> [  424.982216] WARNING: inconsistent lock state
>> [  424.982223] inconsistent {INITIAL USE} -> {IN-NMI} usage.
>> [  424.982314]  *** DEADLOCK ***
>> [...]
>>
>> Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
>> ---
>>   kernel/bpf/bpf_lru_list.c | 17 ++++++++++-------
>>   1 file changed, 10 insertions(+), 7 deletions(-)
> 
> Documentation/bpf/map_lru_hash_update.dot needs update?
> 

Yes, it needs to be updated.

>> diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
>> index c091f3232cc5..03d37f72731a 100644
>> --- a/kernel/bpf/bpf_lru_list.c
>> +++ b/kernel/bpf/bpf_lru_list.c
>> @@ -312,14 +312,15 @@ static void bpf_lru_list_push_free(struct bpf_lru_list *l,
>>       raw_spin_unlock_irqrestore(&l->lock, flags);
>>   }
>>
>> -static void bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
>> +static bool bpf_lru_list_pop_free_to_local(struct bpf_lru *lru,
>>                          struct bpf_lru_locallist *loc_l)
>>   {
>>       struct bpf_lru_list *l = &lru->common_lru.lru_list;
>>       struct bpf_lru_node *node, *tmp_node;
>>       unsigned int nfree = 0;
>>
>> -    raw_spin_lock(&l->lock);
>> +    if (!raw_spin_trylock(&l->lock))
>> +        return false;
>>   
> 
> Could you provide some more analysis, and the effect this has on real-world
> programs? Presumably they'll unexpectedly encounter a lot more frequent
> -ENOMEM as an error on bpf_map_update_elem even though memory might be
> available just that locks are contended?
> 
> Also, have you considered rqspinlock as a potential candidate to discover
> deadlocks?

Thanks for the questions.

While I haven’t encountered this issue in production systems myself, the
deadlock has been observed repeatedly in practice, including the cases
shown in the cover letter. It can also be reproduced reliably when
running the LRU tests locally, so this is a real and recurring problem.

I agree that returning -ENOMEM when locks are contended is not ideal.
Using -EBUSY would better reflect the situation where memory is
available but forward progress is temporarily blocked by lock
contention. I can update the patch accordingly.
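
To make the difference concrete, here is a rough userspace sketch of how a
caller could treat the two error codes differently. It is only an
illustration under assumptions: update_fn, update_with_retry(), fake_update()
and fake_update_nomem() are hypothetical names standing in for a map update
call, and none of them exist in the kernel or in libbpf.

```c
#include <errno.h>

/* Hypothetical callback type standing in for a map update call. */
typedef int (*update_fn)(void *ctx);

/*
 * Retry only on -EBUSY (lock contended, memory may still be available).
 * Any other result, including -ENOMEM, is returned immediately.
 * At most max_tries attempts are made.
 */
static int update_with_retry(update_fn update, void *ctx, int max_tries)
{
	int err = -EINVAL;

	for (int i = 0; i < max_tries; i++) {
		err = update(ctx);
		if (err != -EBUSY)
			break;
	}
	return err;
}

/* Fake updater: returns -EBUSY until the counter in ctx reaches 0. */
static int fake_update(void *ctx)
{
	int *busy_left = ctx;

	if (*busy_left > 0) {
		(*busy_left)--;
		return -EBUSY;
	}
	return 0;
}

/* Fake updater that always reports memory exhaustion. */
static int fake_update_nomem(void *ctx)
{
	(void)ctx;
	return -ENOMEM;
}
```

With a distinct -EBUSY, a bounded retry like this is only attempted when
retrying can plausibly help; an -ENOMEM result is passed straight back to
the caller.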

Regarding rqspinlock: as mentioned in the cover letter, Menglong
previously explored using rqspinlock to address these deadlocks but was
unable to arrive at a complete solution. After further off-list
discussion, we agreed that using trylock is a more practical approach
here. In most observed cases, the lock contention leading to deadlock
occurs in bpf_common_lru_pop_free(), and trylock allows callers to back
off safely rather than risking re-entrancy and deadlock.
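
As a minimal userspace sketch of the back-off idea (not the patch itself),
the snippet below uses a pthread mutex in place of the raw spinlock. The
names struct free_list and pop_free_to_local() are hypothetical and only
loosely mirror the shape of bpf_lru_list_pop_free_to_local(); the real code
moves list nodes rather than adjusting counters.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical free list guarded by a lock; not the kernel's bpf_lru_list. */
struct free_list {
	pthread_mutex_t lock;
	int nfree;
};

/*
 * Move up to 'want' free nodes into the caller's local count.
 * Returns false without blocking if the lock is already held,
 * so a re-entrant caller backs off instead of deadlocking.
 */
static bool pop_free_to_local(struct free_list *l, int want, int *local)
{
	int n;

	if (pthread_mutex_trylock(&l->lock) != 0)
		return false;

	n = l->nfree < want ? l->nfree : want;
	l->nfree -= n;
	*local += n;

	pthread_mutex_unlock(&l->lock);
	return true;
}
```

If the lock is already held when pop_free_to_local() is called, the call
returns false and the local count is left untouched, which is the behaviour
the trylock conversion relies on.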

Thanks,
Leon


Thread overview: 11+ messages
2026-01-19 14:21 [PATCH bpf-next 0/3] bpf: Avoid deadlock using trylock when popping LRU free nodes Leon Hwang
2026-01-19 14:21 ` [PATCH bpf-next 1/3] bpf: Factor out bpf_lru_node_set_hash() helper Leon Hwang
2026-01-19 14:21 ` [PATCH bpf-next 2/3] bpf: Avoid deadlock using trylock when popping LRU free nodes Leon Hwang
2026-01-19 18:46   ` bot+bpf-ci
2026-01-20  1:56     ` Leon Hwang
2026-01-20  2:01       ` Alexei Starovoitov
2026-01-20  2:19         ` Leon Hwang
2026-01-19 19:47   ` Daniel Borkmann
2026-01-20  1:49     ` Leon Hwang [this message]
2026-01-20  1:54       ` Alexei Starovoitov
2026-01-19 14:21 ` [PATCH bpf-next 3/3] selftests/bpf: Allow -ENOMEM on LRU map updates Leon Hwang
