From mboxrd@z Thu Jan 1 00:00:00 1970
From: Aleksa Sarai
Subject: Re: RFC(v2): Audit Kernel Container IDs
Date: Fri, 20 Oct 2017 10:11:33 +1100
Message-ID: <8f495870-dd6c-23b9-b82b-4228a441c729@suse.de>
References: <20171012141359.saqdtnodwmbz33b2@madcap2.tricolour.ca>
	<2307769.VGpzlLa4Dp@x2>
	<20171019195747.4ssujtaj3f5ipsoh@madcap2.tricolour.ca>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20171019195747.4ssujtaj3f5ipsoh-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
Content-Language: en-US
Sender: linux-api-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: Richard Guy Briggs, Steve Grubb
Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	trondmy-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org,
	Andy Lutomirski,
	jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org,
	Carlos O'Donell, Linux API, Linux Containers, Paul Moore,
	Linux Kernel, Eric Paris, Al Viro, David Howells, Linux Audit,
	Simo Sorce, Linux Network Development, Linux FS Devel,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	"Eric W. Biederman"
List-Id: linux-api@vger.kernel.org

>>> The registration is a pseudo filesystem (proc, since PID tree already
>>> exists) write of a u8[16] UUID representing the container ID to a file
>>> representing a process that will become the first process in a new
>>> container. This write might place restrictions on mount namespaces
>>> required to define a container, or at least careful checking of
>>> namespaces in the kernel to verify permissions of the orchestrator so it
>>> can't change its own container ID. A bind mount of nsfs may be
>>> necessary in the container orchestrator's mntNS.
>>> Note: Use a 128-bit scalar rather than a string to make compares faster
>>> and simpler.
>>>
>>> Require a new CAP_CONTAINER_ADMIN to be able to carry out the
>>> registration.
>>
>> Wouldn't CAP_AUDIT_WRITE be sufficient? After all, this is for auditing.
>
> No, because then any process with that capability (vsftpd) could change
> its own container ID. This is discussed more in other parts of the
> thread...

Not if we make the container ID append-only (to support nesting), or
write-once (the other idea thrown around). In that case, you can't move
"out" from a particular container ID, you can only go "deeper". These
semantics don't make sense for generic containers, but since the point
of this facility is *specifically* for audit I imagine that not being
able to move a process from a sub-container's ID is a benefit.

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/
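
To make the write-once and append-only semantics discussed above a bit
more concrete, here is a rough userspace sketch in C. It is only an
illustration under stated assumptions, not kernel code and not anything
proposed in this thread: struct contid_chain, contid_set_once(),
contid_append() and the CONTID_MAX_DEPTH limit are all hypothetical
names. The only detail taken from the RFC is the u8[16] ID size.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define CONTID_LEN       16 /* u8[16] UUID, as described in the RFC */
#define CONTID_MAX_DEPTH 8  /* hypothetical nesting limit (assumption) */

/* Hypothetical per-process record: outermost container ID first. */
struct contid_chain {
	uint8_t ids[CONTID_MAX_DEPTH][CONTID_LEN];
	unsigned int depth; /* 0 means no container ID has been set */
};

/*
 * Write-once: the ID can be set only while none is present.
 * Any later attempt to change it is refused with -EPERM.
 */
static inline int contid_set_once(struct contid_chain *c,
				  const uint8_t id[CONTID_LEN])
{
	if (c->depth != 0)
		return -EPERM;
	memcpy(c->ids[0], id, CONTID_LEN);
	c->depth = 1;
	return 0;
}

/*
 * Append-only: a new ID is always pushed as the innermost entry.
 * Existing entries are never modified or removed, so a process can
 * only go "deeper" and never move "out" of a container ID.
 */
static inline int contid_append(struct contid_chain *c,
				const uint8_t id[CONTID_LEN])
{
	if (c->depth >= CONTID_MAX_DEPTH)
		return -ENOSPC;
	memcpy(c->ids[c->depth], id, CONTID_LEN);
	c->depth++;
	return 0;
}
```

Note that the sketch deliberately has no "pop" or "clear" operation:
under either policy, a process holding the relevant capability still
cannot replace or drop an ID it already carries.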