From: Carlos O'Donell <carlos@redhat.com>
To: Richard Guy Briggs <rgb@redhat.com>,
cgroups@vger.kernel.org,
Linux Containers <containers@lists.linux-foundation.org>,
Linux API <linux-api@vger.kernel.org>,
Linux Audit <linux-audit@redhat.com>,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Linux Kernel <linux-kernel@vger.kernel.org>,
Linux Network Development <netdev@vger.kernel.org>
Cc: Aristeu Rozanski <arozansk@redhat.com>,
David Howells <dhowells@redhat.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Eric Paris <eparis@parisplace.org>,
jlayton@redhat.com, Andy Lutomirski <luto@kernel.org>,
mszeredi@redhat.com, Paul Moore <pmoore@redhat.com>,
"Serge E. Hallyn" <serge@hallyn.com>,
Steve Grubb <sgrubb@redhat.com>,
trondmy@primarydata.com, Al Viro <viro@zeniv.linux.org.uk>
Subject: Re: RFC: Audit Kernel Container IDs
Date: Wed, 13 Sep 2017 14:33:52 -0500
Message-ID: <9043cc5a-e624-10c9-1906-f29010c5f57c@redhat.com>
In-Reply-To: <20170913171328.GP3405@madcap2.tricolour.ca>
On 09/13/2017 12:13 PM, Richard Guy Briggs wrote:
> Containers are a userspace concept. The kernel knows nothing of them.
I am looking at this RFC from a userspace perspective, particularly from
the loader's point of view and the unshare syscall and the semantics that
arise from the use of it.
At a high level what you are doing is providing a way to group, without
hierarchy, processes and namespaces. The processes can move between
containers if they have CAP_CONTAINER_ADMIN and can open and write to
a special proc file.
* With unshare a thread may dissociate part of its execution context and
therefore see a distinct mount namespace. When you say "process" in this
particular RFC, do you exclude the possibility that a thread might be in a
distinct container from the rest of the threads in the process?
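
To make the thread-level concern concrete, here is a minimal sketch that groups the threads of one process by the mount namespace each of them sees, using the per-thread links under /proc/<pid>/task/<tid>/ns/mnt. The function name and the `proc_root` parameter are illustrative assumptions (the latter only makes the sketch testable against a fake tree); nothing here is part of the RFC.

```python
import os
from collections import defaultdict


def threads_by_mount_namespace(pid="self", proc_root="/proc"):
    """Group the thread IDs of one process by the mount namespace each sees."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    groups = defaultdict(list)
    for tid in sorted(os.listdir(task_dir), key=int):
        try:
            # Each link reads like "mnt:[4026531840]"; the number is the
            # inode of the namespace on nsfs.
            ns = os.readlink(os.path.join(task_dir, tid, "ns", "mnt"))
        except OSError:
            continue  # thread exited, or we may not inspect it
        groups[ns].append(int(tid))
    return dict(groups)
```

If the returned dictionary has more than one key, at least one thread has unshared its mount namespace from the others, which is the case the question above is about. Calling `threads_by_mount_namespace()` with no arguments inspects the calling process.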
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions. Audit needs the kernel's help to do
> this.
* Why does the Linux audit system need to track container provenance?
- How does it help to provide better audit messages?
- Would it be enough to list the namespaces that a process occupies?
* Why does it need the kernel's help?
- Is there a race condition that is only fixable with kernel support?
- Or is it easier with kernel help but not required?
Providing background on these questions would help clarify the
design requirements.
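
For reference, a rough sketch of what listing a process's namespaces looks like from userspace today, without any new kernel support. It stats each entry under /proc/<pid>/ns and records the (device number, inode number) pair that identifies the namespace. The function names and the `proc_root` parameter are assumptions made for illustration. Each stat is a separate call, so the result is not an atomic snapshot: the process can change namespaces between reads.

```python
import os


def namespace_ids(pid="self", proc_root="/proc"):
    """Return {namespace name: (device number, inode number)} for one process."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    ids = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            # stat() follows the link, so this describes the namespace itself.
            st = os.stat(os.path.join(ns_dir, name))
        except OSError:
            continue  # this namespace type is not readable for the caller
        ids[name] = (st.st_dev, st.st_ino)
    return ids


def share_namespace(pid_a, pid_b, name, proc_root="/proc"):
    """True if both processes are in the same namespace of the given type."""
    a = namespace_ids(pid_a, proc_root).get(name)
    b = namespace_ids(pid_b, proc_root).get(name)
    return a is not None and a == b
```

For example, `share_namespace(1, "self", "net")` compares the caller's network namespace with that of PID 1, provided the caller is allowed to inspect PID 1.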
> Since the concept of a container is entirely a userspace concept, a
> trigger signal from the userspace container orchestration system
> initiates this. This will define a point in time and a set of resources
> associated with a particular container with an audit container ID.
Please don't use the word 'signal'; I suggest 'register', since you are
writing to a filesystem.
> The trigger is a pseudo filesystem (proc, since PID tree already exists)
> write of a u64 representing the container ID to a file representing a
> process that will become the first process in a new container.
> This might place restrictions on mount namespaces required to define a
> container, or at least careful checking of namespaces in the kernel to
> verify permissions of the orchestrator so it can't change its own
> container ID.
> A bind mount of nsfs may be necessary in the container orchestrator's
> mntNS.
>
> Require a new CAP_CONTAINER_ADMIN to be able to write to the pseudo
> filesystem to have this action permitted. At that time, record the
> child container's user-supplied 64-bit container identifier along with
What is a "child container"? Containers don't have any hierarchy.
I assume that if you don't have CAP_CONTAINER_ADMIN, nothing prevents
your continued operation as we have today?
> the child container's first process (which may become the container's
> "init" process) process ID (referenced from the initial PID namespace),
> all namespace IDs (in the form of a nsfs device number and inode number
> tuple) in a new auxilliary record AUDIT_CONTAINER with a qualifying
> op=$action field.
What kind of requirement is there on the first tid/pid registering
the container ID? What if the 8th tid/pid does the registration?
Would that mean that the first process of the container did not
register? It seems like you are suggesting that the registration
by the 8th tid/pid causes a cascading registration process,
registering all tid/pids in the same grouping? Is that true?
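
To ground the discussion, the registration step as I read it could be sketched roughly as follows from the orchestrator's side. Everything specific here is an assumption: the RFC does not name the proc file (the `containerid` filename is hypothetical), it does not say whether the u64 is written as text or binary (decimal ASCII is assumed), and the `proc_root` parameter exists only so the sketch can be tried against a scratch directory.

```python
import os

U64_MAX = 2**64 - 1


def register_container(pid, container_id, proc_root="/proc", filename="containerid"):
    """Write container_id to the (hypothetical) per-process registration file."""
    if not isinstance(container_id, int) or not 0 <= container_id <= U64_MAX:
        raise ValueError("container ID must fit in an unsigned 64-bit integer")
    path = os.path.join(proc_root, str(pid), filename)
    # Assumption: the kernel would parse a decimal ASCII value; the RFC only
    # says "a u64". A write refused by the kernel (for example for lack of
    # the proposed capability) would surface here as PermissionError.
    with open(path, "w") as f:
        f.write(f"{container_id}\n")
    return path
```

The sketch takes a single pid and says nothing about what happens to the other tids/pids in the same grouping, which is exactly the part of the semantics the questions above ask about.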
> Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
>
> Forked and cloned processes inherit their parent's container ID,
> referenced in the process' audit_context struct.
So a cloned process with CLONE_NEWNS has the same container ID
as the parent process that called clone, at least until the clone
has time to change to a new container ID?
Do you foresee any case where someone might need a semantic that is
slightly different? For example wanting to set the container ID on
clone?
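
The inheritance semantics as I understand them can be written down as a toy model. This is a userspace illustration only, not kernel code: the `Task` class, its fields, and the ID counter are all invented for the example, with `container_id` standing in for the value the RFC keeps in the audit_context struct.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional

_ids = count(1)  # source of made-up PIDs and namespace IDs


@dataclass
class Task:
    """Toy stand-in for a task; container_id mirrors the audit_context field."""
    container_id: Optional[int] = None
    mnt_ns: int = field(default_factory=lambda: next(_ids))
    pid: int = field(default_factory=lambda: next(_ids))

    def clone(self, new_mount_ns: bool = False) -> "Task":
        # The child inherits the parent's container ID whether or not it
        # also receives a fresh mount namespace (the CLONE_NEWNS case).
        mnt = next(_ids) if new_mount_ns else self.mnt_ns
        return Task(container_id=self.container_id, mnt_ns=mnt)

    def register(self, container_id: int) -> None:
        # Stands in for the proc write; the same call moves a task later.
        self.container_id = container_id


parent = Task(container_id=7)
child = parent.clone(new_mount_ns=True)
# Window in question: new mount namespace, still the parent's container ID.
assert child.container_id == 7 and child.mnt_ns != parent.mnt_ns
child.register(8)
assert (parent.container_id, child.container_id) == (7, 8)
```

A "set the container ID on clone" semantic would collapse the two steps in the usage lines into one, closing the window in which the child is attributed to the parent's container.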
> Log the creation of every namespace, inheriting/adding its spawning
> process' containerID(s), if applicable. Include the spawning and
> spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process.
OK.
> Log the destruction of every namespace when it is no longer used by any
> process, include the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>
> Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
>
> A process can be moved from one container to another by using the
> container assignment method outlined above a second time.
OK.
> When a container ceases to exist because the last process in that
> container has exited and hence the last namespace has been destroyed and
> its refcount dropping to zero, log the fact.
> (This latter is likely needed for certification accountability.) A
> container object may need a list of processes and/or namespaces.
OK.
> A namespace cannot directly migrate from one container to another but
> could be assigned to a newly spawned container. A namespace can be
> moved from one container to another indirectly by having that namespace
> used in a second process in another container and then ending all the
> processes in the first container.
OK.
> Feedback please.
--
Cheers,
Carlos.