From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: "Thomas Weißschuh" <thomas.weissschuh@linutronix.de>
Cc: "Hou Tao" <houtao@huaweicloud.com>,
bpf@vger.kernel.org, "Martin KaFai Lau" <martin.lau@linux.dev>,
"Alexei Starovoitov" <alexei.starovoitov@gmail.com>,
"Andrii Nakryiko" <andrii@kernel.org>,
"Eduard Zingerman" <eddyz87@gmail.com>,
"Song Liu" <song@kernel.org>, "Hao Luo" <haoluo@google.com>,
"Yonghong Song" <yonghong.song@linux.dev>,
"Daniel Borkmann" <daniel@iogearbox.net>,
"KP Singh" <kpsingh@kernel.org>,
"Stanislav Fomichev" <sdf@fomichev.me>,
"Jiri Olsa" <jolsa@kernel.org>,
"John Fastabend" <john.fastabend@gmail.com>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Thomas Gleixner" <tglx@linutronix.de>,
"Thomas Weißschuh" <linux@weissschuh.net>,
houtao1@huawei.com, xukuohai@huawei.com
Subject: Re: [PATCH bpf-next 07/10] bpf: Switch to bpf mem allocator for LPM trie
Date: Thu, 21 Nov 2024 13:50:49 +0100
Message-ID: <87wmgwhhsm.fsf@toke.dk>
In-Reply-To: <20241121124649-bc310634-8cc9-464e-bb81-6a9ad0f8e136@linutronix.de>

Thomas Weißschuh <thomas.weissschuh@linutronix.de> writes:

> On Thu, Nov 21, 2024 at 12:39:08PM +0100, Toke Høiland-Jørgensen wrote:
>> Hou Tao <houtao@huaweicloud.com> writes:
>>
>> > Fix these warnings by replacing kmalloc()/kfree()/kfree_rcu() with
>> > equivalent bpf memory allocator APIs. Since intermediate node and leaf
>> > node have fixed sizes, fixed-size allocation APIs are used.
>> >
>> > Two aspects of this change require explanation:
>> >
>> > 1. A new flag LPM_TREE_NODE_FLAG_ALLOC_LEAF is added to track the
>> > original allocator. This is necessary because during deletion, a leaf
>> > node may be used as an intermediate node. These nodes must be freed
>> > through the leaf allocator.
>> > 2. The intermediate node allocator and leaf node allocator may be merged
>> > because value_size for LPM trie is usually small. The merging reduces
>> > the memory overhead of bpf memory allocator.
>>
>> This seems like an awfully complicated way to fix this. Couldn't we just
>> move the node allocations in trie_update_elem() out so they happen
>> before the trie lock is taken instead?
>
> The problematic lock nesting is not between the trie lock and the
> allocator lock but between each of them and any other lock in the kernel.
> BPF programs can be called from any context through tracepoints.
> In this specific case the issue was a tracepoint executed under the
> workqueue lock.

That is not the issue described in the commit message, though. If the
goal is to make the lpm_trie map usable in any context, the commit
message should be rewritten to reflect this, instead of mentioning a
specific deadlock between the trie lock and the allocator lock.

And in that case, I think it's better to use a single 'struct
bpf_mem_alloc' per map (like hashmaps do). This will waste some memory
for intermediate nodes, but that seems like an acceptable trade-off to
avoid all the complexity around two different allocators.
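
To put a rough number on that trade-off, here is a minimal userspace
sketch (not kernel code). It assumes a leaf node is an intermediate node
plus value_size bytes of value, and it ignores any rounding the
allocator itself may apply. The struct and all helper names below are
hypothetical stand-ins, not the real lpm_trie definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a trie node; NOT the kernel's actual layout. */
struct sketch_node {
	struct sketch_node *child[2];
	uint32_t prefixlen;
	uint32_t flags;
	uint8_t data[];	/* prefix bytes, followed by the value for leaves */
};

/* Intermediate nodes store only the prefix bytes. */
static inline size_t im_node_size(size_t data_size)
{
	return sizeof(struct sketch_node) + data_size;
}

/* Leaf nodes store the prefix bytes plus the map value. */
static inline size_t leaf_node_size(size_t data_size, size_t value_size)
{
	return im_node_size(data_size) + value_size;
}

/* One allocator per map: every object is sized for the larger node type. */
static inline size_t single_alloc_obj_size(size_t data_size, size_t value_size)
{
	return leaf_node_size(data_size, value_size);
}

/* Bytes left unused in each intermediate node under the single allocator. */
static inline size_t im_node_waste(size_t data_size, size_t value_size)
{
	size_t obj = single_alloc_obj_size(data_size, value_size);

	assert(obj >= im_node_size(data_size));
	return obj - im_node_size(data_size);
}
```

Under these assumptions the overhead per intermediate node is exactly
value_size bytes, so it stays small whenever the map value is small.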

Not sure if Alexei's comment about too many allocators was aimed solely
at this, or if he has issues even with having a single allocator per map
as well; but in that case, that seems like something that should be
fixed for hashmaps as well?

-Toke