From: "Dr. Greg" <greg@enjellic.com>
To: Song Liu <songliubraving@meta.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>,
"jack@suse.cz" <jack@suse.cz>,
"brauner@kernel.org" <brauner@kernel.org>,
Casey Schaufler <casey@schaufler-ca.com>,
"Dr. Greg" <greg@enjellic.com>, Song Liu <song@kernel.org>,
"bpf@vger.kernel.org" <bpf@vger.kernel.org>,
"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"linux-security-module@vger.kernel.org"
<linux-security-module@vger.kernel.org>,
Kernel Team <kernel-team@meta.com>,
"andrii@kernel.org" <andrii@kernel.org>,
"eddyz87@gmail.com" <eddyz87@gmail.com>,
"ast@kernel.org" <ast@kernel.org>,
"daniel@iogearbox.net" <daniel@iogearbox.net>,
"martin.lau@linux.dev" <martin.lau@linux.dev>,
"viro@zeniv.linux.org.uk" <viro@zeniv.linux.org.uk>,
"kpsingh@kernel.org" <kpsingh@kernel.org>,
"mattbobrowski@google.com" <mattbobrowski@google.com>,
"amir73il@gmail.com" <amir73il@gmail.com>,
"repnop@google.com" <repnop@google.com>,
"jlayton@kernel.org" <jlayton@kernel.org>,
Josef Bacik <josef@toxicpanda.com>,
"mic@digikod.net" <mic@digikod.net>,
"gnoack@google.com" <gnoack@google.com>
Subject: Re: [PATCH bpf-next 0/4] Make inode storage available to tracing prog
Date: Tue, 19 Nov 2024 06:27:06 -0600 [thread overview]
Message-ID: <20241119122706.GA19220@wind.enjellic.com> (raw)
In-Reply-To: <A7017094-1A0C-42C8-BE9D-7352D2200ECC@fb.com>
On Sun, Nov 17, 2024 at 10:59:18PM +0000, Song Liu wrote:
> Hi Christian, James and Jan,
Good morning, I hope the day is starting well for everyone.
> > On Nov 14, 2024, at 1:49???PM, James Bottomley <James.Bottomley@HansenPartnership.com> wrote:
>
> [...]
>
> >>
> >> We can address this with something like following:
> >>
> >> #ifdef CONFIG_SECURITY
> >> void *i_security;
> >> #elif CONFIG_BPF_SYSCALL
> >> struct bpf_local_storage __rcu *i_bpf_storage;
> >> #endif
> >>
> >> This will help catch all misuse of the i_bpf_storage at compile
> >> time, as i_bpf_storage doesn't exist with CONFIG_SECURITY=y.
> >>
> >> Does this make sense?
> >
> > Got to say I'm with Casey here, this will generate horrible and failure
> > prone code.
> >
> > Since effectively you're making i_security always present anyway,
> > simply do that and also pull the allocation code out of security.c in a
> > way that it's always available? That way you don't have to special
> > case the code depending on whether CONFIG_SECURITY is defined.
> > Effectively this would give everyone a generic way to attach some
> > memory area to an inode. I know it's more complex than this because
> > there are LSM hooks that run from security_inode_alloc() but if you can
> > make it work generically, I'm sure everyone will benefit.
> On a second thought, I think making i_security generic is not
> the right solution for "BPF inode storage in tracing use cases".
>
> This is because i_security serves a very specific use case: it
> points to a piece of memory whose size is calculated at system
> boot time. If some of the supported LSMs is not enabled by the
> lsm= kernel arg, the kernel will not allocate memory in
> i_security for them. The only way to change lsm= is to reboot
> the system. BPF LSM programs can be disabled at the boot time,
> which fits well in i_security. However, BPF tracing programs
> cannot be disabled at boot time (even we change the code to
> make it possible, we are not likely to disable BPF tracing).
> IOW, as long as CONFIG_BPF_SYSCALL is enabled, we expect some
> BPF tracing programs to load at some point of time, and these
> programs may use BPF inode storage.
>
> Therefore, with CONFIG_BPF_SYSCALL enabled, some extra memory
> always will be attached to i_security (maybe under a different
> name, say, i_generic) of every inode. In this case, we should
> really add i_bpf_storage directly to the inode, because another
> pointer jump via i_generic gives nothing but overhead.
>
> Does this make sense? Or did I misunderstand the suggestion?
There is a colloquialism that seems relevant here: "Pick your poison".
In the greater interests of the kernel, it seems that a generic
mechanism for attaching per inode information is the only realistic
path forward, unless Christian changes his position on expanding
the size of struct inode.
There are two pathways forward.
1.) Attach a constant size 'blob' of storage to each inode.
This is a similar approach to what the LSM uses where each blob is
sized as follows:
S = U * sizeof(void *)
Where U is the number of sub-systems that have a desire to use inode
specific storage.
Each sub-system uses it's pointer slot to manage any additional
storage that it desires to attach to the inode.
This has the obvious advantage of O(1) cost complexity for any
sub-system that wants to access its inode specific storage.
The disadvantage, as you note, is that it wastes memory if a
sub-system does not elect to attach per inode information, for example
the tracing infrastructure.
This disadvantage is parried by the fact that it reduces the size of
the inode proper by 24 bytes (4 pointers down to 1) and allows future
extensibility without colliding with the interests and desires of the
VFS maintainers.
2.) Implement key/value mapping for inode specific storage.
The key would be a sub-system specific numeric value that returns a
pointer the sub-system uses to manage its inode specific memory for a
particular inode.
A participating sub-system in turn uses its identifier to register an
inode specific pointer for its sub-system.
This strategy loses O(1) lookup complexity but reduces total memory
consumption and only imposes memory costs for inodes when a sub-system
desires to use inode specific storage.
Approach 2 requires the introduction of generic infrastructure that
allows an inode's key/value mappings to be located, presumably based
on the inode's pointer value. We could probably just resurrect the
old IMA iint code for this purpose.
In the end it comes down to a rather standard trade-off in this
business, memory vs. execution cost.
We would posit that option 2 is the only viable scheme if the design
metric is overall good for the Linux kernel eco-system.
> Thanks,
> Song
Have a good day.
As always,
Dr. Greg
The Quixote Project - Flailing at the Travails of Cybersecurity
https://github.com/Quixote-Project
next prev parent reply other threads:[~2024-11-19 12:28 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-12 8:25 [PATCH bpf-next 0/4] Make inode storage available to tracing prog Song Liu
2024-11-12 8:25 ` [PATCH bpf-next 1/4] bpf: lsm: Remove hook to bpf_task_storage_free Song Liu
2024-11-12 8:25 ` [PATCH bpf-next 2/4] bpf: Make bpf inode storage available to tracing program Song Liu
2024-11-13 10:19 ` Christian Brauner
2024-11-13 14:15 ` Song Liu
2024-11-13 18:29 ` Casey Schaufler
2024-11-13 19:00 ` Song Liu
2024-11-21 9:04 ` Christian Brauner
2024-11-14 21:11 ` Song Liu
2024-11-15 11:19 ` Jan Kara
2024-11-15 17:35 ` Song Liu
2024-11-19 14:21 ` Jeff Layton
2024-11-19 15:25 ` Amir Goldstein
2024-11-19 15:30 ` Amir Goldstein
2024-11-19 21:53 ` Song Liu
2024-11-20 9:19 ` Amir Goldstein
2024-11-20 9:28 ` Christian Brauner
2024-11-20 11:19 ` Amir Goldstein
2024-11-21 8:43 ` Christian Brauner
2024-11-21 13:48 ` Jeff Layton
2024-11-21 8:08 ` Song Liu
2024-11-21 9:14 ` Christian Brauner
2024-11-23 0:08 ` Alexei Starovoitov
2024-11-12 8:25 ` [PATCH bpf-next 3/4] bpf: Add recursion avoid logic for inode storage Song Liu
2024-11-12 8:25 ` [PATCH bpf-next 3/4] bpf: Add recursion prevention " Song Liu
2024-11-12 8:25 ` [PATCH bpf-next 4/4] selftest/bpf: Add test for inode local storage recursion Song Liu
2024-11-12 8:26 ` [PATCH bpf-next 4/4] selftest/bpf: Test inode local storage recursion prevention Song Liu
2024-11-12 8:35 ` [PATCH bpf-next 0/4] Make inode storage available to tracing prog Song Liu
2024-11-12 18:09 ` Casey Schaufler
2024-11-12 18:44 ` Song Liu
2024-11-13 1:10 ` Casey Schaufler
2024-11-13 1:37 ` Song Liu
2024-11-13 18:06 ` Casey Schaufler
2024-11-13 18:57 ` Song Liu
2024-11-14 16:36 ` Dr. Greg
2024-11-14 17:29 ` Casey Schaufler
2024-11-14 18:08 ` Song Liu
2024-11-14 21:49 ` James Bottomley
2024-11-14 22:30 ` Song Liu
2024-11-17 22:59 ` Song Liu
2024-11-19 12:27 ` Dr. Greg [this message]
2024-11-19 18:14 ` Casey Schaufler
2024-11-19 22:35 ` Song Liu
2024-11-20 16:54 ` Dr. Greg
2024-11-21 8:28 ` Song Liu
2024-11-21 16:02 ` Dr. Greg
2024-11-21 18:11 ` Casey Schaufler
2024-11-23 17:01 ` Dr. Greg
2024-11-25 20:49 ` Casey Schaufler
2024-11-21 17:47 ` Casey Schaufler
2024-11-21 18:28 ` Song Liu
2024-11-23 19:11 ` Paul Moore
2024-11-14 17:51 ` Song Liu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241119122706.GA19220@wind.enjellic.com \
--to=greg@enjellic.com \
--cc=James.Bottomley@HansenPartnership.com \
--cc=amir73il@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=brauner@kernel.org \
--cc=casey@schaufler-ca.com \
--cc=daniel@iogearbox.net \
--cc=eddyz87@gmail.com \
--cc=gnoack@google.com \
--cc=jack@suse.cz \
--cc=jlayton@kernel.org \
--cc=josef@toxicpanda.com \
--cc=kernel-team@meta.com \
--cc=kpsingh@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=mattbobrowski@google.com \
--cc=mic@digikod.net \
--cc=repnop@google.com \
--cc=song@kernel.org \
--cc=songliubraving@meta.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).