From: Mickaël Salaün
Subject: Re: [RFC v2 09/10] landlock: Handle cgroups (performance)
Date: Sat, 27 Aug 2016 21:35:14 +0200
Message-ID: <57C1EB72.2050703@digikod.net>
In-Reply-To: <20160827180642.GA38754@ast-mbp.thefacebook.com>
To: Alexei Starovoitov
Cc: linux-kernel@vger.kernel.org, Alexei Starovoitov, Andy Lutomirski, Daniel Borkmann, Daniel Mack, "David S. Miller", Kees Cook, Sargun Dhillon, kernel-hardening@lists.openwall.com, linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, netdev@vger.kernel.org, Tejun Heo, cgroups@vger.kernel.org

On 27/08/2016 20:06, Alexei Starovoitov wrote:
> On Sat, Aug 27, 2016 at 04:06:38PM +0200, Mickaël Salaün wrote:
>>
>> On 27/08/2016 01:05, Alexei Starovoitov wrote:
>>> On Fri, Aug 26, 2016 at 05:10:40PM +0200, Mickaël Salaün wrote:
>>>>
>>>>>
>>>>> - I don't think such a 'for' loop can scale. The solution needs to
>>>>>   work with thousands of containers and thousands of cgroups.
>>>>>   In patch 06/10 the proposal is to use 'current' as the holder of
>>>>>   the programs:
>>>>> +	for (prog = current->seccomp.landlock_prog;
>>>>> +			prog; prog = prog->prev) {
>>>>> +		if (prog->filter == landlock_ret->filter) {
>>>>> +			cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
>>>>> +			break;
>>>>> +		}
>>>>> +	}
>>>>> imo that's the root of the scalability issue.
>>>>> I think that to be able to scale, the bpf programs have to be
>>>>> attached to cgroups instead of tasks.
>>>>> That would be a very different api. seccomp doesn't need to be
>>>>> touched. But that is the only way I see to be able to scale.
>>>>
>>>> Landlock is inspired by seccomp, which also uses BPF programs per
>>>> thread.
>>>> For seccomp, all of a thread's BPF programs are executed for every
>>>> syscall. For Landlock, some BPF programs are executed for some LSM
>>>> hooks. I don't see why it is a scaling issue for Landlock compared
>>>> to seccomp. I also don't see why storing the BPF program list
>>>> pointer in the cgroup struct instead of the task struct changes a
>>>> lot here. The BPF program execution will be the same anyway (for
>>>> each LSM hook). Kees should probably have a better opinion on this.
>>>
>>> seccomp has its own issues and copying them doesn't make this lsm
>>> any better. Like, seccomp bpf programs are all one gigantic switch
>>> statement that looks for interesting syscall numbers. All syscalls
>>> of a task pay a non-trivial seccomp penalty due to such a design.
>>> If bpf were attached per syscall it would have been much cheaper.
>>> Of course doing it this way for seccomp is not easy, but for lsm
>>> such a facility is already there. A blanket call of a single bpf
>>> prog for all lsm hooks is unnecessary overhead that can and should
>>> be avoided.
>>
>> It's probably a misunderstanding. Contrary to seccomp, which runs all
>> the thread's BPF programs for any syscall, Landlock only runs eBPF
>> programs for the triggered LSM hooks, if their type matches. Indeed,
>> thanks to the multiple eBPF program types, and contrary to seccomp,
>> Landlock only runs an eBPF program when needed. Landlock will have
>> almost no performance overhead if the syscalls do not trigger the
>> watched LSM hooks for the current process.
>
> that's not what I see in patch 06/10:
> all lsm_hooks in 'static struct security_hook_list landlock_hooks'
> (which eventually means all lsm hooks) will call
> static inline int landlock_hook_##NAME
> which will call landlock_run_prog()
> which does:
> +	for (landlock_ret = current->seccomp.landlock_ret;
> +			landlock_ret; landlock_ret = landlock_ret->prev) {
> +		if (landlock_ret->triggered) {
> +			ctx.cookie = landlock_ret->cookie;
> +			for (prog = current->seccomp.landlock_prog;
> +					prog; prog = prog->prev) {
> +				if (prog->filter == landlock_ret->filter) {
> +					cur_ret = BPF_PROG_RUN(prog->prog, (void *)&ctx);
> +					break;
> +				}
> +			}
>
> that is unacceptable overhead and not a scalable design.
> It kinda works for the 3 lsm_hooks as in patch 6, but doesn't scale
> as soon as more lsm hooks are added.

Good catch! I forgot to check the program (sub)type in the loop, so as
to only run the programs needed for the current hook. I will fix this.

>
>> As said above, Landlock will not run an eBPF program when not
>> strictly needed. Attaching to a cgroup will have the same performance
>> impact as attaching to a process hierarchy.
>
> Having a prog per cgroup per lsm_hook is the only scalable way I
> could come up with. If you see another way, please propose.
> current->seccomp.landlock_prog is not the answer.

Hum, I don't see the difference, from a performance point of view,
between a cgroup-based and a process-hierarchy-based system. Maybe a
better option would be to use an array of pointers with N entries, one
for each supported hook, instead of a single program list: each LSM
hook would then only walk the programs registered for it. See the
sketch below.

Anyway, being able to attach an LSM hook program to a cgroup thanks to
the new BPF_PROG_ATTACH seems a good idea (while keeping the
possibility to use a process hierarchy). The downside will be handling
an LSM hook program which is not triggered by a seccomp filter, but
this should be needed anyway to handle interruptions.
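To make the per-hook array idea concrete, here is a rough sketch. All
identifiers below (landlock_hook_id, LANDLOCK_HOOK_NR, struct
landlock_hooks, the hook names) are made up for the example and do not
match the RFC patches; only struct bpf_prog and BPF_PROG_RUN() come
from the kernel:

#include <linux/filter.h>	/* struct bpf_prog, BPF_PROG_RUN() */

/* Hypothetical IDs for the hooks handled so far; one slot each. */
enum landlock_hook_id {
	LANDLOCK_HOOK_FILE_OPEN,
	LANDLOCK_HOOK_FILE_PERMISSION,
	LANDLOCK_HOOK_MMAP_FILE,
	LANDLOCK_HOOK_NR,	/* number of supported hooks */
};

struct landlock_prog_list {
	struct bpf_prog *prog;
	struct landlock_prog_list *prev;
};

struct landlock_hooks {
	/* one list head per hook: a hook only walks its own programs */
	struct landlock_prog_list *progs[LANDLOCK_HOOK_NR];
};

static int landlock_run_progs(struct landlock_hooks *hooks,
			      enum landlock_hook_id hook, void *ctx)
{
	struct landlock_prog_list *list;
	int ret = 0;

	/* Only programs registered for this hook are executed. */
	for (list = hooks->progs[hook]; list; list = list->prev) {
		ret = BPF_PROG_RUN(list->prog, ctx);
		if (ret)
			break;
	}
	return ret;
}

The point is that a hook which has no program attached costs one array
lookup and a NULL check, instead of walking (and filtering) the whole
per-task program list on every LSM hook call. The same layout would
work whether the struct hangs off the task or off a cgroup.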