From: Song Liu <songliubraving@meta.com>
To: Christian Brauner <brauner@kernel.org>
Cc: Song Liu <song@kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-security-module@vger.kernel.org"
<linux-security-module@vger.kernel.org>,
Kernel Team <kernel-team@meta.com>,
"andrii@kernel.org" <andrii@kernel.org>,
"eddyz87@gmail.com" <eddyz87@gmail.com>,
"ast@kernel.org" <ast@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"martin.lau@linux.dev" <martin.lau@linux.dev>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"jack@suse.cz" <jack@suse.cz>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"mattbobrowski@google.com" <mattbobrowski@google.com>,
"amir73il@gmail.com" <amir73il@gmail.com>,
"repnop@google.com" <repnop@google.com>,
"jlayton@kernel.org" <jlayton@kernel.org>,
Josef Bacik <josef@toxicpanda.com>,
"mic@digikod.net" <mic@digikod.net>,
"gnoack@google.com" <gnoack@google.com>
Subject: Re: [PATCH bpf-next 2/4] bpf: Make bpf inode storage available to tracing program
Date: Thu, 14 Nov 2024 21:11:57 +0000 [thread overview]
Message-ID: <86C65B85-8167-4D04-BFF5-40FD4F3407A4@fb.com> (raw)
In-Reply-To: <20241113-sensation-morgen-852f49484fd8@brauner>
Hi Christian,
> On Nov 13, 2024, at 2:19 AM, Christian Brauner <brauner@kernel.org> wrote:
[...]
>> static inline void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog,
>> bpf_func_t *bpf_func)
>> {
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 3559446279c1..479097e4dd5b 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -79,6 +79,7 @@ struct fs_context;
>> struct fs_parameter_spec;
>> struct fileattr;
>> struct iomap_ops;
>> +struct bpf_local_storage;
>>
>> extern void __init inode_init(void);
>> extern void __init inode_init_early(void);
>> @@ -648,6 +649,9 @@ struct inode {
>> #ifdef CONFIG_SECURITY
>> void *i_security;
>> #endif
>> +#ifdef CONFIG_BPF_SYSCALL
>> + struct bpf_local_storage __rcu *i_bpf_storage;
>> +#endif
>
> Sorry, we're not growing struct inode for this. It just keeps getting
> bigger. Last cycle we freed up 8 bytes to shrink it and we're not going
> to waste them on special-purpose stuff. We already NAKed someone else's
> pet field here.
Per other discussions in this thread, I am implementing the following:
#ifdef CONFIG_SECURITY
void *i_security;
#elif CONFIG_BPF_SYSCALL
struct bpf_local_storage __rcu *i_bpf_storage;
#endif
However, it is a bit trickier than I thought. Specifically, we need
to deal with the following scenarios:
1. CONFIG_SECURITY=y && CONFIG_BPF_LSM=n && CONFIG_BPF_SYSCALL=y
2. CONFIG_SECURITY=y && CONFIG_BPF_LSM=y && CONFIG_BPF_SYSCALL=y but
bpf lsm is not enabled at boot time.
AFAICT, we need to modify how lsm blob are managed with
CONFIG_BPF_SYSCALL=y && CONFIG_BPF_LSM=n case. The solution, even
if it gets accepted, doesn't really save any memory. Instead of
growing struct inode by 8 bytes, the solution will allocate 8
more bytes to inode->i_security. So the total memory consumption
is the same, but the memory is more fragmented.
Therefore, I think we should really step back and consider adding
the i_bpf_storage to struct inode. While this does increase the
size of struct inode by 8 bytes, it may end up with less overall
memory consumption for the system. This is why.
When the user cannot use inode local storage, the alternative is
to use hash maps (use inode pointer as key). AFAICT, all hash maps
comes with non-trivial overhead, in memory consumption, in access
latency, and in extra code to manage the memory. OTOH, inode local
storage doesn't have these issue, and is usually much more efficient:
- memory is only allocated for inodes with actual data,
- O(1) latency,
- per inode data is freed automatically when the inode is evicted.
Please refer to [1] where Amir mentioned all the work needed to
properly manage a hash map, and I explained why we don't need to
worry about these with inode local storage.
Besides reducing memory consumption, i_bpf_storage also shortens
the pointer chain to access inode local storage. Before this set,
inode local storage is available at
inode->i_security+offset(struct bpf_storage_blob)->storage. After
this set, inode local storage is simply at inode->i_bpf_storage.
At the moment, we are using bpf local storage with
task_struct->bpf_storage, struct sock->sk_bpf_storage,
struct cgroup->bpf_cgrp_storage. All of these turned out to be
successful and helped users to use memory more efficiently. I
think we can see the same benefits with struct inode->i_bpf_storage.
I hope these make sense, and you will consider adding i_bpf_storage.
Please let me know if anything above is not clear.
Thanks,
Song
[1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxjXjjkKMa1xcPyxE5vxh1U5oGZJWtofRCwp-3ikCA6cgg@mail.gmail.com/
next prev parent reply other threads:[~2024-11-14 21:12 UTC|newest]
Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-12 8:25 [PATCH bpf-next 0/4] Make inode storage available to tracing prog Song Liu
2024-11-12 8:25 ` [PATCH bpf-next 1/4] bpf: lsm: Remove hook to bpf_task_storage_free Song Liu
2024-11-12 8:25 ` [PATCH bpf-next 2/4] bpf: Make bpf inode storage available to tracing program Song Liu
2024-11-13 10:19 ` Christian Brauner
2024-11-13 14:15 ` Song Liu
2024-11-13 18:29 ` Casey Schaufler
2024-11-13 19:00 ` Song Liu
2024-11-14 21:11 ` Song Liu [this message]
2024-11-15 11:19 ` Jan Kara
2024-11-15 17:35 ` Song Liu
2024-11-19 14:21 ` Jeff Layton
2024-11-19 15:25 ` Amir Goldstein
2024-11-19 15:30 ` Amir Goldstein
2024-11-19 21:53 ` Song Liu
2024-11-20 9:19 ` Amir Goldstein
2024-11-20 9:28 ` Christian Brauner
2024-11-20 11:19 ` Amir Goldstein
2024-11-12 8:25 ` [PATCH bpf-next 3/4] bpf: Add recursion avoid logic for inode storage Song Liu
2024-11-12 8:25 ` [PATCH bpf-next 3/4] bpf: Add recursion prevention " Song Liu
2024-11-12 8:25 ` [PATCH bpf-next 4/4] selftest/bpf: Add test for inode local storage recursion Song Liu
2024-11-12 8:26 ` [PATCH bpf-next 4/4] selftest/bpf: Test inode local storage recursion prevention Song Liu
2024-11-12 8:35 ` [PATCH bpf-next 0/4] Make inode storage available to tracing prog Song Liu
2024-11-12 18:09 ` Casey Schaufler
2024-11-12 18:44 ` Song Liu
2024-11-13 1:10 ` Casey Schaufler
2024-11-13 1:37 ` Song Liu
2024-11-13 18:06 ` Casey Schaufler
2024-11-13 18:57 ` Song Liu
2024-11-14 16:36 ` Dr. Greg
2024-11-14 17:29 ` Casey Schaufler
2024-11-14 18:08 ` Song Liu
2024-11-14 21:49 ` James Bottomley
2024-11-14 22:30 ` Song Liu
2024-11-17 22:59 ` Song Liu
2024-11-19 12:27 ` Dr. Greg
2024-11-19 18:14 ` Casey Schaufler
2024-11-19 22:35 ` Song Liu
2024-11-14 17:51 ` Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86C65B85-8167-4D04-BFF5-40FD4F3407A4@fb.com \
--to=songliubraving@meta.com \
--cc=amir73il@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=gnoack@google.com \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kernel-team@meta.com \
--cc=kpsingh@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=mattbobrowski@google.com \
--cc=mic@digikod.net \
--cc=repnop@google.com \
--cc=song@kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox