From: Yonghong Song <yhs@fb.com>
To: rainkin <rainkin1993@gmail.com>
Cc: bpf <bpf@vger.kernel.org>
Subject: Re: Create inner maps dynamically from ebpf kernel prog program
Date: Tue, 22 Jun 2021 08:40:25 -0700
Message-ID: <8ffd3d8a-6137-da45-b838-a965be7aa18f@fb.com>
In-Reply-To: <CAHb-xav98Hy7=aGZsaU67Vw19OnGV8fsnzD+Xp6FJkGUtmmuZA@mail.gmail.com>
On 6/21/21 11:47 PM, rainkin wrote:
>>
>>
>>
>> On 6/21/21 6:12 AM, rainkin wrote:
>>> Hi,
>>>
>>> My ebpf program is attached to kprobe/vfs_read. My use case is to store
>>> information about each file (i.e., inode) of each process by using
>>> map-in-map (e.g., the outer map is a hash map where the key is a pid and the value is an
>>> inner map where the key is an inode and the value is some stateful information I
>>> want to store).
>>> Thus I need to create a new inner map for each newly seen inode.
>>>
>>> I know there is local storage for task/inode. However, because of
>>> my kernel version (4.1x), that local storage cannot be used.
>>>
>>> I tried two methods:
>>> 1. Dynamically create a new inner map in the user-land ebpf program by
>>> following this tutorial:
>>> https://github.com/torvalds/linux/blob/master/samples/bpf/test_map_in_map_user.c
>>> Then insert the new inner map into the outer map.
>>> The limitation of this method:
>>> It requires the ebpf kernel program to send a message to the user-land program to
>>> create a new inner map.
>>> And ebpf kernel programs might access the map before the user-land program
>>> finishes the job.
>>>
>>> 2. Thus, I prefer the second method: dynamically create inner maps in
>>> the kernel ebpf program.
>>> According to the discussion in the following thread, it seems that it
>>> can be done by calling bpf_map_update_elem():
>>> https://lore.kernel.org/bpf/878sdlpv92.fsf@toke.dk/T/#e9bac624324ffd3efb0c9f600426306e3a40ec7b5
>>>> Creating a new map for map_in_map from bpf prog can be implemented.
>>>> bpf_map_update_elem() is doing memory allocation for map elements. In such a case calling
>>>> this helper on map_in_map can, in theory, create a new inner map and insert it into the outer map.
>>>
>>> However, when I call this helper to create a new inner map, the verifier returns this error:
>>> 64: (bf) r2 = r10
>>> 65: (07) r2 += -144
>>> 66: (bf) r3 = r10
>>> 67: (07) r3 += -176
>>> ; bpf_map_update_elem(&outer, &ino, &new_inner, BPF_ANY);
>>> 68: (18) r1 = 0xffff8dfb7399e400
>>> 70: (b7) r4 = 0
>>> 71: (85) call bpf_map_update_elem#2
>>> cannot pass map_type 13 into func bpf_map_update_elem#2
>>
>> This is expected based on current verifier implementation.
>> In verifier check_map_func_compatibility() function, we have
>>
>>     case BPF_MAP_TYPE_ARRAY_OF_MAPS:
>>     case BPF_MAP_TYPE_HASH_OF_MAPS:
>>             if (func_id != BPF_FUNC_map_lookup_elem)
>>                     goto error;
>>             break;
>>
>> For array/hash map-in-map, the only supported helper
>> is bpf_map_lookup_elem(). bpf_map_update_elem()
>> is not supported yet.
>
> Thanks for your answer!
> If I understand correctly, the conclusion is that (at least for now)
> an *ebpf kernel program*
> CAN only do lookups on array/hash map-in-map, CANNOT do
> add/update/delete on array/hash
> map-in-map, and CANNOT create regular hash/array maps dynamically.
Right.
>
>
>>
>> For your method #1, the bpf helper bpf_send_signal() or
>> bpf_send_signal_thread() might help to send some info
>> to user space, but I think they are not available in
>> 4.x kernels.
>>
>> Maybe a single map with key (pid, inode) may work?
>>
>>>
>>> new_inner is a structure of inner hashmap.
>>>
>>> Any suggestions?
>>> Thanks,
>>> Rainkin
>>>
>
> A single map with key (pid, inode) is OK for the above scenario. However,
> when I want to clean up all entries related to a certain pid when a
> process exits,
> a single map is NOT OK: I need to go through all the keys of the
> single map and delete the keys related
> to that pid.
I understand this. I totally agree that the cleanup is expensive.
In such cases, map_in_map is the best strategy.

Alexei recently added support for calling the bpf create_map/update_map
syscall commands from a bpf program ([1]). This needs to be a new program
type though.

In your particular case, you are doing kprobe/vfs_read, which runs
in process context at the beginning of the syscall, so it is probably
safe to call the create/update_map syscalls (I did not look at the
kernel code thoroughly). But the verifier needs to ensure it is
indeed safe. There is some ongoing compiler annotation work ([2]),
which may help annotate such functions so the verifier can do
its work effectively.

BTW, this is all future work. For now, especially if you are using
4.1x kernels, I guess (pid, inode) is probably your best shot.
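
To make the (pid, inode) trade-off concrete, here is a minimal sketch in
plain C. It is not BPF code: a small in-memory array stands in for the BPF
hash map, and every name in it (struct pid_ino_key, struct flat_map,
make_key, flat_map_update, flat_map_delete_pid) is hypothetical and chosen
only for illustration. It shows a composite key with explicit zeroed
padding, so that byte-wise key comparison behaves, and the exit-time
cleanup, which has to visit every entry to find the ones owned by one pid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Composite key for one flat map. Hypothetical layout, not from the thread. */
struct pid_ino_key {
	uint32_t pid;
	uint32_t pad;	/* explicit padding, always zero, so memcmp() is reliable */
	uint64_t ino;
};

/* In-memory stand-in for a hash map (assumption: illustration only). */
#define MAX_ENTRIES 64
struct flat_map {
	struct pid_ino_key keys[MAX_ENTRIES];
	uint64_t vals[MAX_ENTRIES];
	size_t len;
};

/* Build a key with every byte defined, including the padding field. */
static struct pid_ino_key make_key(uint32_t pid, uint64_t ino)
{
	struct pid_ino_key k;

	memset(&k, 0, sizeof(k));
	k.pid = pid;
	k.ino = ino;
	return k;
}

/* Insert or overwrite an entry. Returns 0 on success, -1 if the map is full. */
static int flat_map_update(struct flat_map *m, struct pid_ino_key k, uint64_t val)
{
	assert(m != NULL);
	for (size_t i = 0; i < m->len; i++) {
		if (memcmp(&m->keys[i], &k, sizeof(k)) == 0) {
			m->vals[i] = val;
			return 0;
		}
	}
	if (m->len == MAX_ENTRIES)
		return -1;
	m->keys[m->len] = k;
	m->vals[m->len] = val;
	m->len++;
	return 0;
}

/*
 * Process-exit cleanup: walk all entries and drop those owned by pid.
 * The cost grows with the total number of entries, not with the number
 * of entries belonging to that pid. Returns the number of entries removed.
 */
static size_t flat_map_delete_pid(struct flat_map *m, uint32_t pid)
{
	size_t kept = 0;
	size_t removed;

	assert(m != NULL);
	for (size_t i = 0; i < m->len; i++) {
		if (m->keys[i].pid == pid)
			continue;
		m->keys[kept] = m->keys[i];
		m->vals[kept] = m->vals[i];
		kept++;
	}
	removed = m->len - kept;
	m->len = kept;
	return removed;
}
```

With a real BPF hash map, the same walk would be done from user space over
the map's keys, and it is this full scan that map_in_map would avoid.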
[1] https://lore.kernel.org/bpf/20210514003623.28033-2-alexei.starovoitov@gmail.com/
[2] https://reviews.llvm.org/D103667