From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A513913DDBA; Thu, 11 Apr 2024 07:12:45 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712819566; cv=none; b=uCgM2/sUJghH9PeQaOtBFqZdWAfp0kSql2bc8mkz69iq58/RAMah1iS3E/+ZuQm3ESAka+D6VwOfxiw7RFkCoXqFzmBDtSIPnw+z9/JRdD3w35dXJNQCwx0CX+IYj0ULPb7ckQreltduayKRnKFNew3S0GJoLtmi2wNXwP2AMRc= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1712819566; c=relaxed/simple; bh=qgtJ7uJlXWlZfBtuSaOH/toHDR8p7LHfHlMJuZyykcw=; h=Content-Type:Mime-Version:Subject:From:In-Reply-To:Date:Cc: Message-Id:References:To; b=kphmI/r5G9tpQ8GBszpTbwOf8GoOaCKVSj5yCpMTX2VMXzGpH2COXzKWw4Dnec8kgjkrzDUczkAJPCwqRT7QnVc1IR32Wycl3kW+r4tRVGbaPhnquCrYiUvYYXA8iRHT9Oka8s6L1dinW39a41G1yemTdrUUBoyAKmdA6J9NPJU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BhTUtteq; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="BhTUtteq" Received: by smtp.kernel.org (Postfix) with ESMTPSA id B4FAAC433C7; Thu, 11 Apr 2024 07:12:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1712819565; bh=qgtJ7uJlXWlZfBtuSaOH/toHDR8p7LHfHlMJuZyykcw=; h=Subject:From:In-Reply-To:Date:Cc:References:To:From; b=BhTUtteq5nqkJP7ZumwpATvL0UIBTb66SPUzTO5dVQx/D1dMFeq/QSU0D4QGQQ3HI nrXQ2gyuCXm/TlZc/Ybd/62msrG+Nmtzb7kwsYRSUQu5LMIgLxwupNEGj7CH8gflvo HAEffSK8IuNkClGaapm70rkTOv1RDkh7/gwpQ/oMV8dMhRVSEvzdqun6dyT/CMaEz8 7pZUUEPSYHXuIRHIGfxZV6VYs1bhk7hRW9FEZM9okT4CoWmz6GUo0hAyoRUbjm1+9/ zucNJ/pPVvQzozbboy3US3Bh2UEJJi0qz6MpfC6JAvsKN+r8jHDh1d6ePoFe94vMNE +kc/JRRnuxWzA== Content-Type: text/plain; charset=us-ascii Precedence: bulk X-Mailing-List: bpf@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.500.171.1.1\)) Subject: Re: [PATCH v9 3/4] security: Replace indirect LSM hook calls with static calls From: KP Singh In-Reply-To: Date: Thu, 11 Apr 2024 09:12:41 +0200 Cc: bpf@vger.kernel.org, linux-security-module@vger.kernel.org, Kees Cook , Casey Schaufler , song@kernel.org, daniel@iogearbox.net, ast@kernel.org, pabeni@redhat.com, andrii@kernel.org Content-Transfer-Encoding: quoted-printable Message-Id: <492036A7-4944-4225-B045-3C2F79DBEA31@kernel.org> References: <20240207124918.3498756-4-kpsingh@kernel.org> To: Paul Moore X-Mailer: Apple Mail (2.3774.500.171.1.1) > On 11 Apr 2024, at 02:38, Paul Moore wrote: >=20 > On Feb 7, 2024 KP Singh wrote: >>=20 >> LSM hooks are currently invoked from a linked list as indirect calls >> which are invoked using retpolines as a mitigation for speculative >> attacks (Branch History / Target injection) and add extra overhead = which >> is especially bad in kernel hot paths: >>=20 >> security_file_ioctl: >> 0xffffffff814f0320 <+0>: endbr64 >> 0xffffffff814f0324 <+4>: push %rbp >> 0xffffffff814f0325 <+5>: push %r15 >> 0xffffffff814f0327 <+7>: push %r14 >> 0xffffffff814f0329 <+9>: push %rbx >> 0xffffffff814f032a <+10>: mov %rdx,%rbx >> 0xffffffff814f032d <+13>: mov %esi,%ebp >> 0xffffffff814f032f <+15>: mov %rdi,%r14 >> 0xffffffff814f0332 <+18>: mov $0xffffffff834a7030,%r15 >> 0xffffffff814f0339 <+25>: mov (%r15),%r15 >> 0xffffffff814f033c <+28>: test %r15,%r15 >> 0xffffffff814f033f <+31>: je 0xffffffff814f0358 = >> 0xffffffff814f0341 <+33>: mov 0x18(%r15),%r11 >> 0xffffffff814f0345 <+37>: mov %r14,%rdi >> 0xffffffff814f0348 <+40>: mov %ebp,%esi >> 0xffffffff814f034a <+42>: mov %rbx,%rdx >>=20 >> 0xffffffff814f034d <+45>: call 0xffffffff81f742e0 = <__x86_indirect_thunk_array+352> >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ >>=20 >> Indirect calls that use retpolines leading to overhead, not just = due >> to extra instruction but also branch misses. >>=20 >> 0xffffffff814f0352 <+50>: test %eax,%eax >> 0xffffffff814f0354 <+52>: je 0xffffffff814f0339 = >> 0xffffffff814f0356 <+54>: jmp 0xffffffff814f035a = >> 0xffffffff814f0358 <+56>: xor %eax,%eax >> 0xffffffff814f035a <+58>: pop %rbx >> 0xffffffff814f035b <+59>: pop %r14 >> 0xffffffff814f035d <+61>: pop %r15 >> 0xffffffff814f035f <+63>: pop %rbp >> 0xffffffff814f0360 <+64>: jmp 0xffffffff81f747c4 = <__x86_return_thunk> >=20 > Generally I fix these up, but since there are quite a few long-ish = lines > in the description, and a respin is probably a good idea to reduce the > merge fuzz, it would be good if you could manage the line lengths a = bit > better. Aim to have the no wrapped lines in the commit description = when > you run 'git log' on a 80-char wide terminal. I'm guessing that > (re)formatting the assembly to something like this will solve most of > the problems: >=20 > 0xff...0360: jmp 0xff...47c4 <__x86_return_thunk> Good idea. Will do. >=20 >> The indirect calls are not really needed as one knows the addresses = of >> enabled LSM callbacks at boot time and only the order can possibly >> change at boot time with the lsm=3D kernel command line parameter. >>=20 >> An array of static calls is defined per LSM hook and the static calls >> are updated at boot time once the order has been determined. >>=20 >> A static key guards whether an LSM static call is enabled or not, >> without this static key, for LSM hooks that return an int, the = presence >> of the hook that returns a default value can create side-effects = which [...] >> 0xffffffff818f0d07 <+103>: int3 >> 0xffffffff818f0d08 <+104>: int3 >> 0xffffffff818f0d09 <+105>: int3 >>=20 >> While this patch uses static_branch_unlikely indicating that an LSM = hook >> is likely to be not present, a subsequent makes it configurable. >=20 > I believe the comment above needs to be updated. Done. >=20 >> In most >> cases this is still a better choice as even when an LSM with one hook = is >> added, empty slots are created for all LSM hooks (especially when = many >> LSMs that do not initialize most hooks are present on the system). >>=20 >> There are some hooks that don't use the call_int_hook and >> call_void_hook. These hooks are updated to use a new macro called >> security_for_each_hook where the lsm_callback is directly invoked as = an >> indirect call. Currently, there are no performance sensitive hooks = that >> use the security_for_each_hook macro. However, if, some performance >> sensitive hooks are discovered, these can be updated to use static = calls >> with loop unrolling as well using a custom macro. >=20 > The security_for_each_hook() macro is not present in this patch. Yeah, it was renamed to lsm_for_each_hook based on Casey's suggestion, I = missed updating the message. Updated. >=20 > Beyond that, let's find a way to use static calls in the LSM hooks > which don't use the call_{int,void}_hook() macros. If we're going to = do > this to help close some attack vectors, let's make sure we do the > conversion everywhere. This is surely doable, We can unroll the loop individually in these = separate hooks. It would need separate=20 LSM_LOOP_UNROLL(__CALL_STATIC_xfrm_state_pol_flow_match, x, xp file) Would you be okay if we do it in a follow up series? These are special = hooks and I don't want to introduce any subtle logical bugs when fixing = potential speculative side channels (Which could be fixed with = retpolines, proper flushing at privilege changes etc). >=20 >> Below are results of the relevant Unixbench system benchmarks with = BPF LSM >> and SELinux enabled with default policies enabled with and without = these >> patches. >>=20 >> Benchmark Delta(%): (+ = is better) >> = =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D >> Execl Throughput +1.9356 >> File Write 1024 bufsize 2000 maxblocks +6.5953 >> Pipe Throughput +9.5499 >> Pipe-based Context Switching +3.0209 >> Process Creation +2.3246 >> Shell Scripts (1 concurrent) +1.4975 >> System Call Overhead +2.7815 >> System Benchmarks Index Score (Partial Only): +3.4859 >>=20 >> In the best case, some syscalls like eventfd_create benefitted to = about ~10%. >>=20 >> Reviewed-by: Casey Schaufler >> Reviewed-by: Kees Cook >> Acked-by: Song Liu >> Acked-by: Andrii Nakryiko >> Signed-off-by: KP Singh >> --- >> include/linux/lsm_hooks.h | 70 +++++++++-- >> security/security.c | 244 = ++++++++++++++++++++++++-------------- >> 2 files changed, 216 insertions(+), 98 deletions(-) >>=20 [...] >> #undef LSM_HOOK >> + void *lsm_callback; >> }; >=20 > It took me a little while to figure out what you were doing with the > lsm_callback field above, can we get rid of the "callback" bit and go > with something to indicate this is a generic function address? How > about "lsm_func_addr" or similar (bikeshedding, I know ...)? >=20 > I'd also like to see a one line comment in there too. lsm_func_addr is actually better. Thanks. >=20 >> -struct security_hook_heads { >> - #define LSM_HOOK(RET, DEFAULT, NAME, ...) struct hlist_head NAME; >> - #include "lsm_hook_defs.h" >> +/* >> + * @key: static call key as defined by STATIC_CALL_KEY >> + * @trampoline: static call trampoline as defined by = STATIC_CALL_TRAMP >> + * @hl: The security_hook_list as initialized by the owning LSM. >> + * @active: Enabled when the static call has an LSM hook associated. >> + */ >> +struct lsm_static_call { >> + struct static_call_key *key; >> + void *trampoline; >> + struct security_hook_list *hl; >> + /* this needs to be true or false based on what the key defaults to = */ >=20 > Isn't this "true or false based on if @hl is valid or not"? See below, we are trying to avoid surplus branches and loads. >=20 >> + struct static_key_false *active; >> +} __randomize_layout; >> + >> +/* >> + * Table of the static calls for each LSM hook. >> + * Once the LSMs are initialized, their callbacks will be copied to = these >> + * tables such that the calls are filled backwards (from last to = first). >> + * This way, we can jump directly to the first used static call, and = execute >> + * all of them after. This essentially makes the entry point >> + * dynamic to adapt the number of static calls to the number of = callbacks. >> + */ >> +struct lsm_static_calls_table { >> + #define LSM_HOOK(RET, DEFAULT, NAME, ...) \ >> + struct lsm_static_call NAME[MAX_LSM_COUNT]; >> + #include >> #undef LSM_HOOK >> } __randomize_layout; >>=20 >> @@ -58,10 +105,14 @@ struct lsm_id { >> /* >> * Security module hook list structure. >> * For use with generic list macros for common operations. >> + * >> + * struct security_hook_list - Contents of a cacheable, mappable = object. >=20 > The comment above looks odd ... can you explain this a bit more and = what > your intention was with that line? It's odd indeed. Been a while since I wrote it, I don't think that = comemnt is actually needed, since the comment above it is explantory. Will delete it. >=20 >> + * @scalls: The beginning of the array of static calls assigned to = this hook. >> + * @hook: The callback for the hook. >> + * @lsm: The name of the lsm that owns this hook. >> */ >> struct security_hook_list { >> - struct hlist_node list; >> - struct hlist_head *head; >> + struct lsm_static_call *scalls; >> union security_list_options hook; >> const struct lsm_id *lsmid; >> } __randomize_layout; >> @@ -110,10 +161,12 @@ static inline struct xattr = *lsm_get_xattr_slot(struct xattr *xattrs, >> * care of the common case and reduces the amount of >> * text involved. >> */ >> -#define LSM_HOOK_INIT(HEAD, HOOK) \ >> - { .head =3D &security_hook_heads.HEAD, .hook =3D { .HEAD =3D HOOK } = } >> +#define LSM_HOOK_INIT(NAME, CALLBACK) \ >> + { \ >> + .scalls =3D static_calls_table.NAME, \ >> + .hook =3D { .NAME =3D CALLBACK } \ >> + } >=20 > Unless there is something that I'm missing, please just stick with the > existing "HOOK" name instead of "CALLBACK". >=20 fair, updated. >> -extern struct security_hook_heads security_hook_heads; >> extern char *lsm_names; >>=20 >> extern void security_add_hooks(struct security_hook_list *hooks, int = count, >> @@ -151,5 +204,6 @@ extern struct lsm_info __start_early_lsm_info[], = __end_early_lsm_info[]; >> __aligned(sizeof(unsigned long)) >>=20 >> extern int lsm_inode_alloc(struct inode *inode); >> +extern struct lsm_static_calls_table static_calls_table = __ro_after_init; >>=20 [...] >> /* >> @@ -846,29 +906,41 @@ int lsm_fill_user_ctx(struct lsm_ctx __user = *uctx, size_t *uctx_len, >> * call_int_hook: >> * This is a hook that returns a value. >> */ >> +#define __CALL_STATIC_VOID(NUM, HOOK, ...) \ >> +do { \ >> + if (static_branch_unlikely(&SECURITY_HOOK_ACTIVE_KEY(HOOK, NUM))) { = \ >=20 > I'm not a fan of the likely()/unlikely() style markings/macros in = cases > like this as it can vary tremendously. Drop the likely()/unlikely() > checks and just do a static_call(). >=20 =20 These are actually not the the classical likely, unlikely macros which = are just hints to the compiler: #define likely(x) __builtin_expect(!!(x), 1) #define unlikely(x) __builtin_expect(!!(x), 0 but a part of the static keys API which generates jump tables and the = code generated depends on the (default state, likelyhood). It could have = been named better, all we need is to have a jump table so that we can = optimize this extra branch in hotpaths, in one direction. https://www.kernel.org/doc/Documentation/static-keys.txt If you want I can put this behind a macro: #define LSM_HOOK_ACTIVE(HOOK, NUM) = static_branch_unlikely(&SECURITY_HOOK_ACTIVE_KEY(HOOK, NUM) the static_branch_likely / static_branch_unlikey actually does not = matter much here, because without this we have a conditional branch and = an extra load. >> + static_call(LSM_STATIC_CALL(HOOK, NUM))(__VA_ARGS__); \ >> + } \ >> +} while (0); >>=20 >> -#define call_void_hook(FUNC, ...) \ >> - do { \ >> - struct security_hook_list *P; \ >> - \ >> - hlist_for_each_entry(P, &security_hook_heads.FUNC, list) \ >> - P->hook.FUNC(__VA_ARGS__); \ >> +#define call_void_hook(FUNC, ...) \ >> + do { \ >> + LSM_LOOP_UNROLL(__CALL_STATIC_VOID, FUNC, __VA_ARGS__); \ >> } while (0) >>=20 >> -#define call_int_hook(FUNC, IRC, ...) ({ \ >> - int RC =3D IRC; \ >> - do { \ >> - struct security_hook_list *P; \ >> - \ >> - hlist_for_each_entry(P, &security_hook_heads.FUNC, list) { \ >> - RC =3D P->hook.FUNC(__VA_ARGS__); \ >> - if (RC !=3D 0) \ >> - break; \ >> - } \ >> - } while (0); \ >> - RC; \ >> +#define __CALL_STATIC_INT(NUM, R, HOOK, LABEL, ...) \ >> +do { \ >> + if (static_branch_unlikely(&SECURITY_HOOK_ACTIVE_KEY(HOOK, NUM))) { = \ >=20 > See my comments in the void sister function. See above. >=20 >> + R =3D static_call(LSM_STATIC_CALL(HOOK, NUM))(__VA_ARGS__); \ >> + if (R !=3D 0) \ >> + goto LABEL; \ >> + } \ >> +} while (0); >> + >> +#define call_int_hook(FUNC, IRC, ...) \ >> +({ \ >> + __label__ out; \ >> + int RC =3D IRC; \ >> + LSM_LOOP_UNROLL(__CALL_STATIC_INT, RC, FUNC, out, __VA_ARGS__); \ >> +out: \ >> + RC; \ >> }) >>=20 >> +#define lsm_for_each_hook(scall, NAME) \ >> + for (scall =3D static_calls_table.NAME; \ >> + scall - static_calls_table.NAME < MAX_LSM_COUNT; scall++) \ >> + if (static_key_enabled(&scall->active->key)) >> + >> /* Security operations */ >>=20 >=20 > -- > paul-moore.com