* RFC(V3): Audit Kernel Container IDs
@ 2018-01-09 12:16 ` Richard Guy Briggs
0 siblings, 0 replies; 35+ messages in thread
From: Richard Guy Briggs @ 2018-01-09 12:16 UTC (permalink / raw)
To: cgroups, Linux Containers, Linux API, Linux Audit, Linux FS Devel,
Linux Kernel, Linux Network Development
Cc: Simo Sorce, Carlos O'Donell, Aristeu Rozanski, David Howells,
Eric W. Biederman, Eric Paris, Daniel Walsh, jlayton,
Andy Lutomirski, mszeredi, Paul Moore, Serge E. Hallyn,
Steve Grubb, trondmy, Al Viro, Madz Car
Containers are a userspace concept. The kernel knows nothing of them.
The Linux audit system needs a way to be able to track the container
provenance of events and actions. Audit needs the kernel's help to do
this.
Since the concept of a container is entirely a userspace concept, a
registration from the userspace container orchestration system initiates
this. This will define a point in time and a set of resources
associated with a particular container with an audit container
identifier.
The registration is a u64 representing the audit container identifier
written to a special file in a pseudo filesystem (proc, since PID tree
already exists) representing a process that will become a parent process
in that container. This write might place restrictions on mount
namespaces required to define a container, or at least careful checking
of namespaces in the kernel to verify permissions of the orchestrator so
it can't change its own container ID. A bind mount of nsfs may be
necessary in the container orchestrator's mount namespace. This write
can only happen once per process.
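A minimal userspace sketch of the registration write described above, as an orchestrator might perform it. The file name `containerid` under `/proc/<pid>/` is a hypothetical placeholder, since this RFC does not fix a name, and the helper `register_container_id` is illustrative only. The proc root is a parameter so the sketch can be exercised against an ordinary directory.

```python
import os

U64_MAX = 2**64 - 1


def register_container_id(pid, container_id, proc_root="/proc",
                          filename="containerid"):
    """Write a u64 audit container identifier for the process `pid`.

    ASSUMPTION: `filename` is a hypothetical name for the special file;
    the RFC only says it lives in proc, per process.
    """
    if not 0 <= container_id <= U64_MAX:
        raise ValueError("audit container identifier must fit in a u64")
    path = os.path.join(proc_root, str(pid), filename)
    # A kernel implementing the RFC would be expected to reject a second
    # write to the same process; this sketch only performs the write.
    with open(path, "w") as f:
        f.write(str(container_id))
    return path
```

The range check is done in userspace only to fail early; the authoritative checks (capability, write-once) would have to live in the kernel.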
Note: The justification for using a u64 is that it minimizes the
information printed in every audit record, which reduces bandwidth, and
limits comparisons to a single u64, which will be faster and less
error-prone.
Require CAP_AUDIT_CONTROL to be able to carry out the registration. At
that time, record the target container's user-supplied audit container
identifier along with a target container's parent process (which may
become the target container's "init" process) process ID (referenced
from the initial PID namespace) in a new record AUDIT_CONTAINER with a
qualifying op=$action field.
Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
container ID present on an auditable action or event.
Forked and cloned processes inherit their parent's audit container
identifier, referenced in the process' task_struct. Since a child's
audit container identifier is inherited rather than written, it can
still be written once in the child. This will prevent tampering while
allowing nesting. (This can be implemented with an internal flag, set
upon registration, that does not get copied across a fork/clone.)
Mimic setns(2) and return an error if the process has already initiated
threading or forked since this registration should happen before the
process execution is started by the orchestrator and hence should not
yet have any threads or children. If this is deemed overly restrictive,
switch all of the target's threads and children to the new containerID.
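The write-once, inheritance, and "no threads or children yet" rules above can be modelled with a small toy class. This is only a sketch of the described semantics, not kernel code: the `Task` class, the `containerid_set` flag name, and the choice of exceptions are all assumptions made for illustration.

```python
class Task:
    """Toy model of the per-task audit container state."""

    def __init__(self, containerid=None):
        self.containerid = containerid  # inherited or registered value
        self.containerid_set = False    # hypothetical "already written" flag
        self.children = []
        self.threads = 1

    def fork(self):
        # The identifier is copied to the child, but the flag is not,
        # so the child can still be registered once (nesting).
        child = Task(self.containerid)
        self.children.append(child)
        return child

    def register(self, containerid):
        if self.containerid_set:
            raise PermissionError("audit container identifier already written")
        if self.children or self.threads > 1:
            # Mirrors the strict variant: refuse once the task has
            # started threading or has forked.
            raise RuntimeError("task already has threads or children")
        self.containerid = containerid
        self.containerid_set = True
```

The less restrictive alternative mentioned above would replace the second check with a walk over the threads and children, assigning the new identifier to each.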
Trust the orchestrator to judiciously use and restrict CAP_AUDIT_CONTROL.
When a container ceases to exist because the last process in that
container has exited, log the fact to balance the registration action.
(This is likely needed for certification accountability.)
At this point it appears unnecessary to add a container session
identifier, since this is all tracked from the loginuid and sessionid
used to communicate with the container orchestrator to spawn an
additional session into an existing container, which would be logged.
It can be added at a later date without breaking API should it be
deemed necessary.
The following namespace logging actions are not needed for certification
purposes at this point, but are helpful for tracking namespace activity.
These are auxiliary records that are associated with namespace
manipulation syscalls unshare(2), clone(2) and setns(2), so the records
will only show up if explicit syscall rules have been added to document
this activity.
Log the creation of every namespace, inheriting/adding its spawning
process' audit container identifier(s), if applicable. Include the
spawning and spawned namespace IDs (device and inode number tuples).
[AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
Note: At this point it appears only network namespaces may need to track
container IDs apart from processes since incoming packets may cause an
auditable event before being associated with a process. Since a
namespace can be shared by processes in different containers, the
namespace will need to track all containers to which it has been
assigned.
Upon registration, the target process' namespace IDs (in the form of a
nsfs device number and inode number tuple) will be recorded in an
AUDIT_NS_INFO auxiliary record.
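For reference, the (device, inode) tuple that identifies a namespace can already be read from userspace by calling stat on a namespace handle such as `/proc/<pid>/ns/net`. The helper name `namespace_id` below is illustrative; it is a sketch of how a tool might obtain the same tuple that the proposed record would carry.

```python
import os


def namespace_id(path):
    """Return the (device, inode) tuple for a namespace handle.

    `path` is expected to be something like /proc/<pid>/ns/net;
    os.stat() follows the link to the nsfs object.
    """
    st = os.stat(path)
    return (st.st_dev, st.st_ino)
```

On a Linux system, `namespace_id("/proc/self/ns/net")` would give the tuple for the caller's network namespace; two processes in the same namespace see the same tuple.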
Log the destruction of every namespace that is no longer used by any
process, including the namespace IDs (device and inode number tuples).
[AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
the parent and child namespace IDs for any changes to a process'
namespaces. [setns(2)]
Note: It may be possible to combine AUDIT_NS_* record formats and
distinguish them with an op=$action field depending on the fields
required for each message type.
The audit container identifier will need to be reaped from all
implicated namespaces upon the destruction of a container.
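A sketch of the bookkeeping this implies: each namespace keeps the set of audit container identifiers assigned to it, and destroying a container removes its identifier from every namespace that holds it. The class and method names here are hypothetical, and a dictionary keyed by the (device, inode) tuple stands in for whatever per-namespace structure the kernel would actually use.

```python
from collections import defaultdict


class NamespaceContainerMap:
    """Toy map from namespace ID to the container IDs assigned to it."""

    def __init__(self):
        # ns_id is a (device, inode) tuple; value is a set of u64 IDs.
        self._ids = defaultdict(set)

    def assign(self, ns_id, containerid):
        self._ids[ns_id].add(containerid)

    def containers(self, ns_id):
        return frozenset(self._ids.get(ns_id, ()))

    def reap(self, containerid):
        # Remove the identifier from every implicated namespace and
        # drop namespaces that no longer track any container.
        for ns_id in list(self._ids):
            self._ids[ns_id].discard(containerid)
            if not self._ids[ns_id]:
                del self._ids[ns_id]
```

With this shape, an event arriving in a shared network namespace before it is tied to a process could be reported against every identifier returned by `containers()`.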
This namespace information adds supporting information for tracking
events not attributable to specific processes.
Changelog:
(Upstream V3)
- switch back to u64 (from pmoore, can be expanded to u128 in future if
need arises without breaking API. u32 was originally proposed, up to
c36 discussed)
- write-once, but children inherit audit container identifier and can
then still be written once
- switch to CAP_AUDIT_CONTROL
- group namespace actions together, auxiliary records to namespace
operations.
(Upstream V2)
- switch from u64 to u128 UUID
- switch from "signal" and "trigger" to "register"
- restrict registration to single process or force all threads and
children into same container
- RGB
--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
* Re: RFC(V3): Audit Kernel Container IDs
[not found] ` <20180109121620.wi7dq2423ugsraqv-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
2018-01-09 16:18 ` Simo Sorce
@ 2018-01-09 16:18 ` Simo Sorce
2018-01-10 1:05 ` Eric W. Biederman
2018-02-02 22:05 ` Paul Moore
3 siblings, 0 replies; 35+ messages in thread
From: Simo Sorce @ 2018-01-09 16:18 UTC (permalink / raw)
To: Richard Guy Briggs, cgroups,
Linux Containers, Linux API, Linux Audit, Linux FS Devel,
Linux Kernel, Linux Network Development
Cc: mszeredi, Steve Grubb,
jlayton, Carlos O'Donell, Daniel Walsh,
Paul Moore, Al Viro, David Howells, Madz Car, Andy Lutomirski,
trondmy, Eric Paris, Eric W. Biederman
On Tue, 2018-01-09 at 07:16 -0500, Richard Guy Briggs wrote:
> Containers are a userspace concept. The kernel knows nothing of them.
>
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions. Audit needs the kernel's help to do
> this.
>
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this. This will define a point in time and a set of resources
> associated with a particular container with an audit container
> identifier.
>
> The registration is a u64 representing the audit container identifier
> written to a special file in a pseudo filesystem (proc, since PID tree
> already exists) representing a process that will become a parent process
> in that container. This write might place restrictions on mount
> namespaces required to define a container, or at least careful checking
> of namespaces in the kernel to verify permissions of the orchestrator so
> it can't change its own container ID. A bind mount of nsfs may be
> necessary in the container orchestrator's mount namespace. This write
> can only happen once per process.
>
> Note: The justification for using a u64 is that it minimizes the
> information printed in every audit record, reducing bandwidth and limits
> comparisons to a single u64 which will be faster and less error-prone.
>
> Require CAP_AUDIT_CONTROL to be able to carry out the registration. At
> that time, record the target container's user-supplied audit container
> identifier along with a target container's parent process (which may
> become the target container's "init" process) process ID (referenced
> from the initial PID namespace) in a new record AUDIT_CONTAINER with a
> qualifying op=$action field.
>
> Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
>
> Forked and cloned processes inherit their parent's audit container
> identifier, referenced in the process' task_struct. Since the audit
> container identifier is inherited rather than written, it can still be
> written once. This will prevent tampering while allowing nesting.
> (This can be implemented with an internal settable flag upon
> registration that does not get copied across a fork/clone.)
>
> Mimic setns(2) and return an error if the process has already initiated
> threading or forked since this registration should happen before the
> process execution is started by the orchestrator and hence should not
> yet have any threads or children. If this is deemed overly restrictive,
> switch all of the target's threads and children to the new containerID.
>
> Trust the orchestrator to judiciously use and restrict CAP_AUDIT_CONTROL.
>
> When a container ceases to exist because the last process in that
> container has exited log the fact to balance the registration action.
> (This is likely needed for certification accountability.)
>
> At this point it appears unnecessary to add a container session
> identifier since this is all tracked from loginuid and sessionid to
> communicate with the container orchestrator to spawn an additional
> session into an existing container which would be logged. It can be
> added at a later date without breaking API should it be deemed
> necessary.
>
> The following namespace logging actions are not needed for certification
> purposes at this point, but are helpful for tracking namespace activity.
> These are auxilliary records that are associated with namespace
> manipulation syscalls unshare(2), clone(2) and setns(2), so the records
> will only show up if explicit syscall rules have been added to document
> this activity.
>
> Log the creation of every namespace, inheriting/adding its spawning
> process' audit container identifier(s), if applicable. Include the
> spawning and spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process. Since a
> namespace can be shared by processes in different containers, the
> namespace will need to track all containers to which it has been
> assigned.
>
> Upon registration, the target process' namespace IDs (in the form of a
> nsfs device number and inode number tuple) will be recorded in an
> AUDIT_NS_INFO auxilliary record.
>
> Log the destruction of every namespace that is no longer used by any
> process, including the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>
> Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
>
> The audit container identifier will need to be reaped from all
> implicated namespaces upon the destruction of a container.
>
> This namespace information adds supporting information for tracking
> events not attributable to specific processes.
>
> Changelog:
>
> (Upstream V3)
> - switch back to u64 (from pmoore, can be expanded to u128 in future if
> need arises without breaking API. u32 was originally proposed, up to
> c36 discussed)
> - write-once, but children inherit audit container identifier and can
> then still be written once
> - switch to CAP_AUDIT_CONTROL
> - group namespace actions together, auxilliary records to namespace
> operations.
>
> (Upstream V2)
> - switch from u64 to u128 UUID
> - switch from "signal" and "trigger" to "register"
> - restrict registration to single process or force all threads and
> children into same container
I am trying to understand the back and forth on the ID size.
From an orchestrator POV anything that requires tracking a
node-specific ID is not ideal.
Orchestrators tend to span many nodes, and containers tend to have IDs
that are either a UUID or a hash (like SHA256).
The problem here is two-fold:
a) Your auditing requires some mapping to be useful outside of the
system.
If you aggregate audit logs outside of the system or you want to
correlate the system audit logs with other components dealing with
containers, now you need a place where you provide a mapping from your
audit u64 to the ID a container has in the rest of the system.
b) Now you need a mapping of some sort. The simplest way a container
orchestrator can go about this is to just use the UUID or hash
representing their view of the container, truncate it to a u64 and use
that for audit. This means there is some chance of a collision, with a
duplicate u64 ID being used by the orchestrator as the container ID.
What happens in that case?
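To put a rough number on the collision concern in (b), here is a small sketch that truncates a SHA256 digest to a u64 and estimates the chance of at least one collision with the usual birthday approximation. Taking the first 8 bytes of the digest is just one possible truncation, the function names are made up for illustration, and the estimate assumes the truncated values are uniformly distributed.

```python
import hashlib
import math


def truncate_to_u64(container_name: str) -> int:
    """Derive a u64 from an orchestrator-side name (illustrative only)."""
    digest = hashlib.sha256(container_name.encode()).digest()
    return int.from_bytes(digest[:8], "big")


def collision_probability(n: int, bits: int = 64) -> float:
    """Birthday approximation: 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))
```

Under that assumption the probability stays tiny for the number of containers a single node is likely to see, but it is not zero, and across a large fleet with aggregated logs it grows with the square of the number of IDs ever issued.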
Simo.
--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: RFC(V3): Audit Kernel Container IDs
@ 2018-01-09 16:18 ` Simo Sorce
0 siblings, 0 replies; 35+ messages in thread
From: Simo Sorce @ 2018-01-09 16:18 UTC (permalink / raw)
To: Richard Guy Briggs, cgroups, Linux Containers, Linux API,
Linux Audit, Linux FS Devel, Linux Kernel,
Linux Network Development
Cc: Carlos O'Donell, Aristeu Rozanski, David Howells,
Eric W. Biederman, Eric Paris, Daniel Walsh, jlayton,
Andy Lutomirski, mszeredi, Paul Moore, Serge E. Hallyn,
Steve Grubb, trondmy, Al Viro, Madz Car
On Tue, 2018-01-09 at 07:16 -0500, Richard Guy Briggs wrote:
> Containers are a userspace concept. The kernel knows nothing of them.
>
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions. Audit needs the kernel's help to do
> this.
>
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this. This will define a point in time and a set of resources
> associated with a particular container with an audit container
> identifier.
>
> The registration is a u64 representing the audit container identifier
> written to a special file in a pseudo filesystem (proc, since PID tree
> already exists) representing a process that will become a parent process
> in that container. This write might place restrictions on mount
> namespaces required to define a container, or at least careful checking
> of namespaces in the kernel to verify permissions of the orchestrator so
> it can't change its own container ID. A bind mount of nsfs may be
> necessary in the container orchestrator's mount namespace. This write
> can only happen once per process.
>
> Note: The justification for using a u64 is that it minimizes the
> information printed in every audit record, reducing bandwidth and limits
> comparisons to a single u64 which will be faster and less error-prone.
>
> Require CAP_AUDIT_CONTROL to be able to carry out the registration. At
> that time, record the target container's user-supplied audit container
> identifier along with a target container's parent process (which may
> become the target container's "init" process) process ID (referenced
> from the initial PID namespace) in a new record AUDIT_CONTAINER with a
> qualifying op=$action field.
>
> Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
>
> Forked and cloned processes inherit their parent's audit container
> identifier, referenced in the process' task_struct. Since the audit
> container identifier is inherited rather than written, it can still be
> written once. This will prevent tampering while allowing nesting.
> (This can be implemented with an internal settable flag upon
> registration that does not get copied across a fork/clone.)
>
> Mimic setns(2) and return an error if the process has already initiated
> threading or forked since this registration should happen before the
> process execution is started by the orchestrator and hence should not
> yet have any threads or children. If this is deemed overly restrictive,
> switch all of the target's threads and children to the new containerID.
>
> Trust the orchestrator to judiciously use and restrict CAP_AUDIT_CONTROL.
>
> When a container ceases to exist because the last process in that
> container has exited log the fact to balance the registration action.
> (This is likely needed for certification accountability.)
>
> At this point it appears unnecessary to add a container session
> identifier since this is all tracked from loginuid and sessionid to
> communicate with the container orchestrator to spawn an additional
> session into an existing container which would be logged. It can be
> added at a later date without breaking API should it be deemed
> necessary.
>
> The following namespace logging actions are not needed for certification
> purposes at this point, but are helpful for tracking namespace activity.
> These are auxilliary records that are associated with namespace
> manipulation syscalls unshare(2), clone(2) and setns(2), so the records
> will only show up if explicit syscall rules have been added to document
> this activity.
>
> Log the creation of every namespace, inheriting/adding its spawning
> process' audit container identifier(s), if applicable. Include the
> spawning and spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process. Since a
> namespace can be shared by processes in different containers, the
> namespace will need to track all containers to which it has been
> assigned.
>
> Upon registration, the target process' namespace IDs (in the form of a
> nsfs device number and inode number tuple) will be recorded in an
> AUDIT_NS_INFO auxilliary record.
>
> Log the destruction of every namespace that is no longer used by any
> process, including the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>
> Issue a new auxilliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
>
> The audit container identifier will need to be reaped from all
> implicated namespaces upon the destruction of a container.
>
> This namespace information adds supporting information for tracking
> events not attributable to specific processes.
>
> Changelog:
>
> (Upstream V3)
> - switch back to u64 (from pmoore, can be expanded to u128 in future if
> need arises without breaking API. u32 was originally proposed, up to
> c36 discussed)
> - write-once, but children inherit audit container identifier and can
> then still be written once
> - switch to CAP_AUDIT_CONTROL
> - group namespace actions together, auxilliary records to namespace
> operations.
>
> (Upstream V2)
> - switch from u64 to u128 UUID
> - switch from "signal" and "trigger" to "register"
> - restrict registration to single process or force all threads and
> children into same container
I am trying to understand the back and forth on the ID size.
>From an orchestrator POV anything that requires tracking a node
specific ID is not ideal.
Orchestrators tend to span many nodes, and containers tend to have IDs
that are either UUID or have a Hash (like SHA256) as identifier.
The problem here is two-fold:
a) Your auditing requires some mapping to be useful outside of the
system.
If you aggreggate audit logs outside of the system or you want to
correlate the system audit logs with other components dealing with
containers, now you need a place where you provide a mapping from your
audit u64 to the ID a container has in the rest of the system.
b) Now you need a mapping of some sort. The simplest way a container
orchestrator can go about this is to just use the UUID or Hash
representing their view of the container, truncate it to a u64 and use
that for Audit. This means there are some chances there will be a
collision and a duplicate u64 ID will be used by the orchestrator as
the container ID. What happen in that case ?
Simo.
--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RFC(V3): Audit Kernel Container IDs
@ 2018-01-09 16:18 ` Simo Sorce
0 siblings, 0 replies; 35+ messages in thread
From: Simo Sorce @ 2018-01-09 16:18 UTC (permalink / raw)
To: Richard Guy Briggs, cgroups-u79uwXL29TY76Z2rM5mHXA,
Linux Containers, Linux API, Linux Audit, Linux FS Devel,
Linux Kernel, Linux Network Development
Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Steve Grubb,
jlayton-H+wXaHxf7aLQT0dZR+AlfA, Carlos O'Donell, Daniel Walsh,
Paul Moore, Al Viro, David Howells, Madz Car, Andy Lutomirski,
trondmy-7I+n7zu2hftEKMMhf/gKZA, Eric Paris, Eric W. Biederman
On Tue, 2018-01-09 at 07:16 -0500, Richard Guy Briggs wrote:
> Containers are a userspace concept. The kernel knows nothing of them.
>
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions. Audit needs the kernel's help to do
> this.
>
> Since the concept of a container is entirely a userspace concept, a
> registration from the userspace container orchestration system initiates
> this. This will define a point in time and a set of resources
> associated with a particular container with an audit container
> identifier.
>
> The registration is a u64 representing the audit container identifier
> written to a special file in a pseudo filesystem (proc, since PID tree
> already exists) representing a process that will become a parent process
> in that container. This write might place restrictions on mount
> namespaces required to define a container, or at least careful checking
> of namespaces in the kernel to verify permissions of the orchestrator so
> it can't change its own container ID. A bind mount of nsfs may be
> necessary in the container orchestrator's mount namespace. This write
> can only happen once per process.
>
> Note: The justification for using a u64 is that it minimizes the
> information printed in every audit record, reducing bandwidth and limits
> comparisons to a single u64 which will be faster and less error-prone.
>
> Require CAP_AUDIT_CONTROL to be able to carry out the registration. At
> that time, record the target container's user-supplied audit container
> identifier along with a target container's parent process (which may
> become the target container's "init" process) process ID (referenced
> from the initial PID namespace) in a new record AUDIT_CONTAINER with a
> qualifying op=$action field.
>
> Issue a new auxilliary record AUDIT_CONTAINER_INFO for each valid
> container ID present on an auditable action or event.
>
> Forked and cloned processes inherit their parent's audit container
> identifier, referenced in the process' task_struct. Since the audit
> container identifier is inherited rather than written, it can still be
> written once. This will prevent tampering while allowing nesting.
> (This can be implemented with an internal settable flag upon
> registration that does not get copied across a fork/clone.)
>
> Mimic setns(2) and return an error if the process has already initiated
> threading or forked since this registration should happen before the
> process execution is started by the orchestrator and hence should not
> yet have any threads or children. If this is deemed overly restrictive,
> switch all of the target's threads and children to the new containerID.
>
> Trust the orchestrator to judiciously use and restrict CAP_AUDIT_CONTROL.
>
> When a container ceases to exist because the last process in that
> container has exited log the fact to balance the registration action.
> (This is likely needed for certification accountability.)
>
> At this point it appears unnecessary to add a container session
> identifier since this is all tracked from loginuid and sessionid to
> communicate with the container orchestrator to spawn an additional
> session into an existing container which would be logged. It can be
> added at a later date without breaking API should it be deemed
> necessary.
>
> The following namespace logging actions are not needed for certification
> purposes at this point, but are helpful for tracking namespace activity.
> These are auxiliary records that are associated with namespace
> manipulation syscalls unshare(2), clone(2) and setns(2), so the records
> will only show up if explicit syscall rules have been added to document
> this activity.
>
> Log the creation of every namespace, inheriting/adding its spawning
> process' audit container identifier(s), if applicable. Include the
> spawning and spawned namespace IDs (device and inode number tuples).
> [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> Note: At this point it appears only network namespaces may need to track
> container IDs apart from processes since incoming packets may cause an
> auditable event before being associated with a process. Since a
> namespace can be shared by processes in different containers, the
> namespace will need to track all containers to which it has been
> assigned.
>
> Upon registration, the target process' namespace IDs (in the form of a
> nsfs device number and inode number tuple) will be recorded in an
> AUDIT_NS_INFO auxiliary record.
>
> Log the destruction of every namespace that is no longer used by any
> process, including the namespace IDs (device and inode number tuples).
> [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>
> Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> the parent and child namespace IDs for any changes to a process'
> namespaces. [setns(2)]
> Note: It may be possible to combine AUDIT_NS_* record formats and
> distinguish them with an op=$action field depending on the fields
> required for each message type.
>
> The audit container identifier will need to be reaped from all
> implicated namespaces upon the destruction of a container.
>
> This namespace information adds supporting information for tracking
> events not attributable to specific processes.
>
> Changelog:
>
> (Upstream V3)
> - switch back to u64 (from pmoore, can be expanded to u128 in future if
> need arises without breaking API. u32 was originally proposed, up to
> c36 discussed)
> - write-once, but children inherit audit container identifier and can
> then still be written once
> - switch to CAP_AUDIT_CONTROL
> - group namespace actions together, auxiliary records to namespace
> operations.
>
> (Upstream V2)
> - switch from u64 to u128 UUID
> - switch from "signal" and "trigger" to "register"
> - restrict registration to single process or force all threads and
> children into same container
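To make the write-once and inheritance rules quoted above concrete, here is a minimal Python model; it is not kernel code. The names Task, register, RegistrationError and the contid_written flag are hypothetical and chosen only for illustration, and the RFC does not specify field names or error codes.

```python
U64_MAX = 2**64 - 1


class RegistrationError(Exception):
    """Stand-in for whatever error the kernel would return (not specified in the RFC)."""


class Task:
    """Toy stand-in for a task_struct carrying an audit container ID."""

    def __init__(self, contid=None):
        self.contid = contid          # inherited from the parent, or None if unset
        self.contid_written = False   # set on registration, never copied on fork
        self.children = []
        self.thread_count = 1

    def fork(self):
        # The child inherits the ID but not the "already written" flag,
        # so a nested orchestrator can still register it once.
        child = Task(contid=self.contid)
        self.children.append(child)
        return child


def register(has_cap_audit_control, target, contid):
    """Model of writing a u64 container ID for `target` exactly once."""
    if not has_cap_audit_control:
        raise PermissionError("CAP_AUDIT_CONTROL required")
    if not 0 <= contid <= U64_MAX:
        raise ValueError("container ID must fit in a u64")
    if target.contid_written:
        raise RegistrationError("container ID already written for this task")
    if target.children or target.thread_count > 1:
        # Mimics the setns(2)-style restriction described in the RFC.
        raise RegistrationError("target has already forked or started threads")
    target.contid = contid
    target.contid_written = True
```

In this model a child forked from a registered task starts with its parent's ID and can be registered once more (nesting), while a second write to the same task, or a write to a task that already has children, is refused.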
I am trying to understand the back and forth on the ID size.
From an orchestrator POV, anything that requires tracking a node-specific
ID is not ideal.
Orchestrators tend to span many nodes, and containers tend to have IDs
that are either UUIDs or hashes (like SHA256).
The problem here is two-fold:
a) Your auditing requires some mapping to be useful outside of the
system.
If you aggregate audit logs outside of the system or you want to
correlate the system audit logs with other components dealing with
containers, now you need a place where you provide a mapping from your
audit u64 to the ID a container has in the rest of the system.
b) Now you need a mapping of some sort. The simplest way a container
orchestrator can go about this is to just use the UUID or Hash
representing their view of the container, truncate it to a u64 and use
that for Audit. This means there are some chances there will be a
collision and a duplicate u64 ID will be used by the orchestrator as
the container ID. What happens in that case?
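
The collision risk in (b) can be estimated with the birthday bound. The sketch below is illustrative only: truncate_to_u64 and collision_probability are hypothetical helper names, keeping the low 64 bits is just one possible truncation policy, and the estimate assumes the truncated values are uniformly random (real UUIDs carry a few fixed version/variant bits, so the true odds are slightly worse).

```python
import math
import uuid

U64_MASK = (1 << 64) - 1


def truncate_to_u64(container_uuid: uuid.UUID) -> int:
    """Keep the low 64 bits of a 128-bit UUID (one hypothetical orchestrator policy)."""
    return container_uuid.int & U64_MASK


def collision_probability(n_ids: int, bits: int = 64) -> float:
    """Birthday-bound approximation of the chance that at least two of
    n_ids uniformly random `bits`-wide values are equal."""
    space = 2.0 ** bits
    # 1 - exp(-n(n-1) / (2 * space)), written with expm1 to stay accurate
    # when the probability is tiny.
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9):
        print(f"{n:>12} containers: p(collision) ~ {collision_probability(n):.3e}")
```

For n well below 2^32 the probability is roughly n^2 / 2^65, so a collision is unlikely but never excluded, and the orchestrator would still need a policy for detecting one.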
Simo.
--
Simo Sorce
Sr. Principal Software Engineer
Red Hat, Inc
* Re: RFC(V3): Audit Kernel Container IDs
2018-01-09 16:18 ` Simo Sorce
(?)
@ 2018-01-10 7:00 ` Richard Guy Briggs
-1 siblings, 0 replies; 35+ messages in thread
From: Richard Guy Briggs @ 2018-01-10 7:00 UTC (permalink / raw)
To: Simo Sorce
Cc: cgroups, Linux Containers, Linux API, Linux Audit, Linux FS Devel,
Linux Kernel, Linux Network Development, Carlos O'Donell,
Aristeu Rozanski, David Howells, Eric W. Biederman, Eric Paris,
Daniel Walsh, jlayton, Andy Lutomirski, mszeredi, Paul Moore,
Serge E. Hallyn, Steve Grubb, trondmy, Al Viro
On 2018-01-09 11:18, Simo Sorce wrote:
> On Tue, 2018-01-09 at 07:16 -0500, Richard Guy Briggs wrote:
> > Containers are a userspace concept. The kernel knows nothing of them.
> >
> > The Linux audit system needs a way to be able to track the container
> > provenance of events and actions. Audit needs the kernel's help to do
> > this.
> >
> > Since the concept of a container is entirely a userspace concept, a
> > registration from the userspace container orchestration system initiates
> > this. This will define a point in time and a set of resources
> > associated with a particular container with an audit container
> > identifier.
> >
> > The registration is a u64 representing the audit container identifier
> > written to a special file in a pseudo filesystem (proc, since PID tree
> > already exists) representing a process that will become a parent process
> > in that container. This write might place restrictions on mount
> > namespaces required to define a container, or at least careful checking
> > of namespaces in the kernel to verify permissions of the orchestrator so
> > it can't change its own container ID. A bind mount of nsfs may be
> > necessary in the container orchestrator's mount namespace. This write
> > can only happen once per process.
> >
> > Note: The justification for using a u64 is that it minimizes the
> > information printed in every audit record, reducing bandwidth and limits
> > comparisons to a single u64 which will be faster and less error-prone.
> >
> > Require CAP_AUDIT_CONTROL to be able to carry out the registration. At
> > that time, record the target container's user-supplied audit container
> > identifier along with a target container's parent process (which may
> > become the target container's "init" process) process ID (referenced
> > from the initial PID namespace) in a new record AUDIT_CONTAINER with a
> > qualifying op=$action field.
> >
> > Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> > container ID present on an auditable action or event.
> >
> > Forked and cloned processes inherit their parent's audit container
> > identifier, referenced in the process' task_struct. Since the audit
> > container identifier is inherited rather than written, it can still be
> > written once. This will prevent tampering while allowing nesting.
> > (This can be implemented with an internal settable flag upon
> > registration that does not get copied across a fork/clone.)
> >
> > Mimic setns(2) and return an error if the process has already initiated
> > threading or forked since this registration should happen before the
> > process execution is started by the orchestrator and hence should not
> > yet have any threads or children. If this is deemed overly restrictive,
> > switch all of the target's threads and children to the new containerID.
> >
> > Trust the orchestrator to judiciously use and restrict CAP_AUDIT_CONTROL.
> >
> > When a container ceases to exist because the last process in that
> > container has exited, log the fact to balance the registration action.
> > (This is likely needed for certification accountability.)
> >
> > At this point it appears unnecessary to add a container session
> > identifier since this is all tracked from loginuid and sessionid to
> > communicate with the container orchestrator to spawn an additional
> > session into an existing container which would be logged. It can be
> > added at a later date without breaking API should it be deemed
> > necessary.
> >
> > The following namespace logging actions are not needed for certification
> > purposes at this point, but are helpful for tracking namespace activity.
> > These are auxiliary records that are associated with namespace
> > manipulation syscalls unshare(2), clone(2) and setns(2), so the records
> > will only show up if explicit syscall rules have been added to document
> > this activity.
> >
> > Log the creation of every namespace, inheriting/adding its spawning
> > process' audit container identifier(s), if applicable. Include the
> > spawning and spawned namespace IDs (device and inode number tuples).
> > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> > Note: At this point it appears only network namespaces may need to track
> > container IDs apart from processes since incoming packets may cause an
> > auditable event before being associated with a process. Since a
> > namespace can be shared by processes in different containers, the
> > namespace will need to track all containers to which it has been
> > assigned.
> >
> > Upon registration, the target process' namespace IDs (in the form of a
> > nsfs device number and inode number tuple) will be recorded in an
> > AUDIT_NS_INFO auxiliary record.
> >
> > Log the destruction of every namespace that is no longer used by any
> > process, including the namespace IDs (device and inode number tuples).
> > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
> >
> > Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> > the parent and child namespace IDs for any changes to a process'
> > namespaces. [setns(2)]
> > Note: It may be possible to combine AUDIT_NS_* record formats and
> > distinguish them with an op=$action field depending on the fields
> > required for each message type.
> >
> > The audit container identifier will need to be reaped from all
> > implicated namespaces upon the destruction of a container.
> >
> > This namespace information adds supporting information for tracking
> > events not attributable to specific processes.
> >
> > Changelog:
> >
> > (Upstream V3)
> > - switch back to u64 (from pmoore, can be expanded to u128 in future if
> > need arises without breaking API. u32 was originally proposed, up to
> > c36 discussed)
> > - write-once, but children inherit audit container identifier and can
> > then still be written once
> > - switch to CAP_AUDIT_CONTROL
> > - group namespace actions together, auxiliary records to namespace
> > operations.
> >
> > (Upstream V2)
> > - switch from u64 to u128 UUID
> > - switch from "signal" and "trigger" to "register"
> > - restrict registration to single process or force all threads and
> > children into same container
>
> I am trying to understand the back and forth on the ID size.
>
> From an orchestrator POV anything that requires tracking a node
> specific ID is not ideal.
>
> Orchestrators tend to span many nodes, and containers tend to have IDs
> that are either UUID or have a Hash (like SHA256) as identifier.
>
> The problem here is two-fold:
>
> a) Your auditing requires some mapping to be useful outside of the
> system.
> If you aggregate audit logs outside of the system or you want to
> correlate the system audit logs with other components dealing with
> containers, now you need a place where you provide a mapping from your
> audit u64 to the ID a container has in the rest of the system.
>
> b) Now you need a mapping of some sort. The simplest way a container
> orchestrator can go about this is to just use the UUID or Hash
> representing their view of the container, truncate it to a u64 and use
> that for Audit. This means there are some chances there will be a
> collision and a duplicate u64 ID will be used by the orchestrator as
> the container ID. What happens in that case?
Paul, can you justify this somewhat larger inconvenience for some
relatively minor convenience on our part? u64 vs u128 is easy for us to
accommodate in terms of scalar comparisons. It doubles the information
in every container id field we print in audit records. A c36 is a
bigger step.
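
For a sense of scale, the printed width of each candidate ID format can be computed directly. This is only a sketch: field_widths is a hypothetical helper, it assumes decimal printing for the integer types, and it takes "c36" to mean a 36-character UUID string; how audit would actually format the field is not settled here.

```python
import uuid

U64_MAX = 2**64 - 1
U128_MAX = 2**128 - 1


def field_widths() -> dict:
    """Worst-case number of characters needed to print each ID format."""
    return {
        "u64 decimal": len(str(U64_MAX)),
        "u128 decimal": len(str(U128_MAX)),
        # Canonical 8-4-4-4-12 hex form with hyphens.
        "c36 string": len(str(uuid.UUID(int=U128_MAX))),
    }


if __name__ == "__main__":
    for name, width in field_widths().items():
        print(f"{name:>13}: up to {width} characters per record")
```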
> Simo.
- RGB
--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
* Re: RFC(V3): Audit Kernel Container IDs
2018-01-10 7:00 ` Richard Guy Briggs
(?)
(?)
@ 2018-02-02 21:24 ` Paul Moore
[not found] ` <CAHC9VhQ=hX55e7ftkVQCogTZTcdSm3rm-+YNOgWomabbXV_sKg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
-1 siblings, 1 reply; 35+ messages in thread
From: Paul Moore @ 2018-02-02 21:24 UTC (permalink / raw)
To: Richard Guy Briggs
Cc: Simo Sorce, David Howells, cgroups, jlayton, trondmy,
Serge E. Hallyn, mszeredi, Al Viro, Andy Lutomirski, Eric Paris,
Carlos O'Donell, Linux API, Linux Containers, Linux Kernel,
Linux Audit, Eric W. Biederman, Linux Network Development,
Linux FS Devel
On Wed, Jan 10, 2018 at 2:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-01-09 11:18, Simo Sorce wrote:
>> On Tue, 2018-01-09 at 07:16 -0500, Richard Guy Briggs wrote:
>> > Containers are a userspace concept. The kernel knows nothing of them.
>> >
>> > The Linux audit system needs a way to be able to track the container
>> > provenance of events and actions. Audit needs the kernel's help to do
>> > this.
>> >
>> > Since the concept of a container is entirely a userspace concept, a
>> > registration from the userspace container orchestration system initiates
>> > this. This will define a point in time and a set of resources
>> > associated with a particular container with an audit container
>> > identifier.
>> >
>> > The registration is a u64 representing the audit container identifier
>> > written to a special file in a pseudo filesystem (proc, since PID tree
>> > already exists) representing a process that will become a parent process
>> > in that container. This write might place restrictions on mount
>> > namespaces required to define a container, or at least careful checking
>> > of namespaces in the kernel to verify permissions of the orchestrator so
>> > it can't change its own container ID. A bind mount of nsfs may be
>> > necessary in the container orchestrator's mount namespace. This write
>> > can only happen once per process.
>> >
>> > Note: The justification for using a u64 is that it minimizes the
>> > information printed in every audit record, reducing bandwidth and limits
>> > comparisons to a single u64 which will be faster and less error-prone.
>> >
>> > Require CAP_AUDIT_CONTROL to be able to carry out the registration. At
>> > that time, record the target container's user-supplied audit container
>> > identifier along with a target container's parent process (which may
>> > become the target container's "init" process) process ID (referenced
>> > from the initial PID namespace) in a new record AUDIT_CONTAINER with a
>> > qualifying op=$action field.
>> >
>> > Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
>> > container ID present on an auditable action or event.
>> >
>> > Forked and cloned processes inherit their parent's audit container
>> > identifier, referenced in the process' task_struct. Since the audit
>> > container identifier is inherited rather than written, it can still be
>> > written once. This will prevent tampering while allowing nesting.
>> > (This can be implemented with an internal settable flag upon
>> > registration that does not get copied across a fork/clone.)
>> >
>> > Mimic setns(2) and return an error if the process has already initiated
>> > threading or forked since this registration should happen before the
>> > process execution is started by the orchestrator and hence should not
>> > yet have any threads or children. If this is deemed overly restrictive,
>> > switch all of the target's threads and children to the new containerID.
>> >
>> > Trust the orchestrator to judiciously use and restrict CAP_AUDIT_CONTROL.
>> >
>> > When a container ceases to exist because the last process in that
>> > container has exited log the fact to balance the registration action.
>> > (This is likely needed for certification accountability.)
>> >
>> > At this point it appears unnecessary to add a container session
>> > identifier since this is all tracked from loginuid and sessionid to
>> > communicate with the container orchestrator to spawn an additional
>> > session into an existing container which would be logged. It can be
>> > added at a later date without breaking API should it be deemed
>> > necessary.
>> >
>> > The following namespace logging actions are not needed for certification
>> > purposes at this point, but are helpful for tracking namespace activity.
>> > These are auxiliary records that are associated with namespace
>> > manipulation syscalls unshare(2), clone(2) and setns(2), so the records
>> > will only show up if explicit syscall rules have been added to document
>> > this activity.
>> >
>> > Log the creation of every namespace, inheriting/adding its spawning
>> > process' audit container identifier(s), if applicable. Include the
>> > spawning and spawned namespace IDs (device and inode number tuples).
>> > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
>> > Note: At this point it appears only network namespaces may need to track
>> > container IDs apart from processes since incoming packets may cause an
>> > auditable event before being associated with a process. Since a
>> > namespace can be shared by processes in different containers, the
>> > namespace will need to track all containers to which it has been
>> > assigned.
>> >
>> > Upon registration, the target process' namespace IDs (in the form of a
>> > nsfs device number and inode number tuple) will be recorded in an
>> > AUDIT_NS_INFO auxiliary record.
>> >
>> > Log the destruction of every namespace that is no longer used by any
>> > process, including the namespace IDs (device and inode number tuples).
>> > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
>> >
>> > Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
>> > the parent and child namespace IDs for any changes to a process'
>> > namespaces. [setns(2)]
>> > Note: It may be possible to combine AUDIT_NS_* record formats and
>> > distinguish them with an op=$action field depending on the fields
>> > required for each message type.
>> >
>> > The audit container identifier will need to be reaped from all
>> > implicated namespaces upon the destruction of a container.
>> >
>> > This namespace information adds supporting information for tracking
>> > events not attributable to specific processes.
>> >
>> > Changelog:
>> >
>> > (Upstream V3)
>> > - switch back to u64 (from pmoore, can be expanded to u128 in future if
>> > need arises without breaking API. u32 was originally proposed, up to
>> > c36 discussed)
>> > - write-once, but children inherit audit container identifier and can
>> > then still be written once
>> > - switch to CAP_AUDIT_CONTROL
>> > - group namespace actions together, auxiliary records to namespace
>> > operations.
>> >
>> > (Upstream V2)
>> > - switch from u64 to u128 UUID
>> > - switch from "signal" and "trigger" to "register"
>> > - restrict registration to single process or force all threads and
>> > children into same container
>>
>> I am trying to understand the back and forth on the ID size.
>>
>> From an orchestrator POV anything that requires tracking a node
>> specific ID is not ideal.
>>
>> Orchestrators tend to span many nodes, and containers tend to have IDs
>> that are either UUID or have a Hash (like SHA256) as identifier.
>>
>> The problem here is two-fold:
>>
>> a) Your auditing requires some mapping to be useful outside of the
>> system.
>> If you aggregate audit logs outside of the system or you want to
>> correlate the system audit logs with other components dealing with
>> containers, now you need a place where you provide a mapping from your
>> audit u64 to the ID a container has in the rest of the system.
>>
>> b) Now you need a mapping of some sort. The simplest way a container
>> orchestrator can go about this is to just use the UUID or Hash
>> representing their view of the container, truncate it to a u64 and use
>> that for Audit. This means there are some chances there will be a
>> collision and a duplicate u64 ID will be used by the orchestrator as
>> the container ID. What happens in that case?
>
> Paul, can you justify this somewhat larger inconvenience for some
> relatively minor convenience on our part?

Done in direct response to Simo.

But to be clear Richard, we've talked about this a few times: it's not
a "minor convenience" on our part, it's a pretty big convenience once
we start having to route audit events and make decisions based on
the audit container ID information. Audit performance is less than
awesome now, and I'm working hard to not make it worse.

> u64 vs u128 is easy for us to
> accommodate in terms of scalar comparisons. It doubles the information
> in every container id field we print in audit records.

... and slows down audit container ID checks.

> A c36 is a bigger step.

Yeah, we're not doing that, no way.
--
paul moore
www.paul-moore.com
^ permalink raw reply [flat|nested] 35+ messages in thread
[parent not found: <1515514736.3239.10.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>]
* Re: RFC(V3): Audit Kernel Container IDs
[not found] ` <1515514736.3239.10.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-01-10 7:00 ` Richard Guy Briggs
2018-02-02 21:18 ` Paul Moore
2018-02-02 21:18 ` Paul Moore
2 siblings, 0 replies; 35+ messages in thread
From: Richard Guy Briggs @ 2018-01-10 7:00 UTC (permalink / raw)
To: Simo Sorce
Cc: David Howells, cgroups-u79uwXL29TY76Z2rM5mHXA,
jlayton-H+wXaHxf7aLQT0dZR+AlfA, trondmy-7I+n7zu2hftEKMMhf/gKZA,
Steve Grubb, mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Madz Car, Al Viro,
Andy Lutomirski, Eric Paris, Carlos O'Donell, Linux API,
Linux Containers, Daniel Walsh, Linux Kernel, Paul Moore,
Linux Audit, Eric W. Biederman, Linux Network Development,
Linux FS Devel
On 2018-01-09 11:18, Simo Sorce wrote:
> On Tue, 2018-01-09 at 07:16 -0500, Richard Guy Briggs wrote:
> > Containers are a userspace concept. The kernel knows nothing of them.
> >
> > The Linux audit system needs a way to be able to track the container
> > provenance of events and actions. Audit needs the kernel's help to do
> > this.
> >
> > Since the concept of a container is entirely a userspace concept, a
> > registration from the userspace container orchestration system initiates
> > this. This will define a point in time and a set of resources
> > associated with a particular container with an audit container
> > identifier.
> >
> > The registration is a u64 representing the audit container identifier
> > written to a special file in a pseudo filesystem (proc, since PID tree
> > already exists) representing a process that will become a parent process
> > in that container. This write might place restrictions on mount
> > namespaces required to define a container, or at least careful checking
> > of namespaces in the kernel to verify permissions of the orchestrator so
> > it can't change its own container ID. A bind mount of nsfs may be
> > necessary in the container orchestrator's mount namespace. This write
> > can only happen once per process.
> >
> > Note: The justification for using a u64 is that it minimizes the
> > information printed in every audit record, reducing bandwidth and limits
> > comparisons to a single u64 which will be faster and less error-prone.
> >
> > Require CAP_AUDIT_CONTROL to be able to carry out the registration. At
> > that time, record the target container's user-supplied audit container
> > identifier along with a target container's parent process (which may
> > become the target container's "init" process) process ID (referenced
> > from the initial PID namespace) in a new record AUDIT_CONTAINER with a
> > qualifying op=$action field.
> >
> > Issue a new auxiliary record AUDIT_CONTAINER_INFO for each valid
> > container ID present on an auditable action or event.
> >
> > Forked and cloned processes inherit their parent's audit container
> > identifier, referenced in the process' task_struct. Since the audit
> > container identifier is inherited rather than written, it can still be
> > written once. This will prevent tampering while allowing nesting.
> > (This can be implemented with an internal settable flag upon
> > registration that does not get copied across a fork/clone.)
> >
> > Mimic setns(2) and return an error if the process has already initiated
> > threading or forked since this registration should happen before the
> > process execution is started by the orchestrator and hence should not
> > yet have any threads or children. If this is deemed overly restrictive,
> > switch all of the target's threads and children to the new containerID.
> >
> > Trust the orchestrator to judiciously use and restrict CAP_AUDIT_CONTROL.
> >
> > When a container ceases to exist because the last process in that
> > container has exited log the fact to balance the registration action.
> > (This is likely needed for certification accountability.)
> >
> > At this point it appears unnecessary to add a container session
> > identifier since this is all tracked from loginuid and sessionid to
> > communicate with the container orchestrator to spawn an additional
> > session into an existing container which would be logged. It can be
> > added at a later date without breaking API should it be deemed
> > necessary.
> >
> > The following namespace logging actions are not needed for certification
> > purposes at this point, but are helpful for tracking namespace activity.
> > These are auxiliary records that are associated with namespace
> > manipulation syscalls unshare(2), clone(2) and setns(2), so the records
> > will only show up if explicit syscall rules have been added to document
> > this activity.
> >
> > Log the creation of every namespace, inheriting/adding its spawning
> > process' audit container identifier(s), if applicable. Include the
> > spawning and spawned namespace IDs (device and inode number tuples).
> > [AUDIT_NS_CREATE, AUDIT_NS_DESTROY] [clone(2), unshare(2), setns(2)]
> > Note: At this point it appears only network namespaces may need to track
> > container IDs apart from processes since incoming packets may cause an
> > auditable event before being associated with a process. Since a
> > namespace can be shared by processes in different containers, the
> > namespace will need to track all containers to which it has been
> > assigned.
> >
> > Upon registration, the target process' namespace IDs (in the form of a
> > nsfs device number and inode number tuple) will be recorded in an
> > AUDIT_NS_INFO auxiliary record.
> >
> > Log the destruction of every namespace that is no longer used by any
> > process, including the namespace IDs (device and inode number tuples).
> > [AUDIT_NS_DESTROY] [process exit, unshare(2), setns(2)]
> >
> > Issue a new auxiliary record AUDIT_NS_CHANGE listing (opt: op=$action)
> > the parent and child namespace IDs for any changes to a process'
> > namespaces. [setns(2)]
> > Note: It may be possible to combine AUDIT_NS_* record formats and
> > distinguish them with an op=$action field depending on the fields
> > required for each message type.
> >
> > The audit container identifier will need to be reaped from all
> > implicated namespaces upon the destruction of a container.
> >
> > This namespace information adds supporting information for tracking
> > events not attributable to specific processes.
> >
> > Changelog:
> >
> > (Upstream V3)
> > - switch back to u64 (from pmoore, can be expanded to u128 in future if
> > need arises without breaking API. u32 was originally proposed, up to
> > c36 discussed)
> > - write-once, but children inherit audit container identifier and can
> > then still be written once
> > - switch to CAP_AUDIT_CONTROL
> > - group namespace actions together, auxiliary records to namespace
> > operations.
> >
> > (Upstream V2)
> > - switch from u64 to u128 UUID
> > - switch from "signal" and "trigger" to "register"
> > - restrict registration to single process or force all threads and
> > children into same container
>
> I am trying to understand the back and forth on the ID size.
>
> From an orchestrator POV anything that requires tracking a node
> specific ID is not ideal.
>
> Orchestrators tend to span many nodes, and containers tend to have IDs
> that are either UUID or have a Hash (like SHA256) as identifier.
>
> The problem here is two-fold:
>
> a) Your auditing requires some mapping to be useful outside of the
> system.
> If you aggregate audit logs outside of the system or you want to
> correlate the system audit logs with other components dealing with
> containers, now you need a place where you provide a mapping from your
> audit u64 to the ID a container has in the rest of the system.
>
> b) Now you need a mapping of some sort. The simplest way a container
> orchestrator can go about this is to just use the UUID or Hash
> representing their view of the container, truncate it to a u64 and use
> that for Audit. This means there are some chances there will be a
> collision and a duplicate u64 ID will be used by the orchestrator as
> the container ID. What happens in that case?

Paul, can you justify this somewhat larger inconvenience for some
relatively minor convenience on our part? u64 vs u128 is easy for us to
accommodate in terms of scalar comparisons. It doubles the information
in every container id field we print in audit records. A c36 is a
bigger step.
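
To put rough numbers on the field sizes being compared, here is a
minimal sketch. It assumes "c36" means the 36-character textual form of
a UUID, and that an ID would be printed either in decimal or in hex; the
names U64_MAX, U128_MAX, printed_width and widths are illustrative only
and do not come from any audit code.

```python
import uuid

# Largest values representable by each candidate integer width.
U64_MAX = 2**64 - 1
U128_MAX = 2**128 - 1


def printed_width(value: int, base: int = 10) -> int:
    """Return how many characters are needed to print value in base 10 or 16."""
    if base == 16:
        return len(format(value, "x"))
    return len(str(value))


# Worst-case printed size of one container ID field, per representation.
widths = {
    "u64 decimal": printed_width(U64_MAX),
    "u64 hex": printed_width(U64_MAX, 16),
    "u128 decimal": printed_width(U128_MAX),
    "u128 hex": printed_width(U128_MAX, 16),
    # Assumption: "c36" is the canonical hyphenated UUID string.
    "c36 (UUID text)": len(str(uuid.UUID(int=0))),
}

for name, width in widths.items():
    print(f"{name}: up to {width} characters")
```

In this sketch the worst-case u128 field is about twice the size of the
u64 field in either base, which matches the "doubles the information"
estimate above.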
> Simo.
- RGB
--
Richard Guy Briggs <rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RFC(V3): Audit Kernel Container IDs
[not found] ` <1515514736.3239.10.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-01-10 7:00 ` Richard Guy Briggs
@ 2018-02-02 21:18 ` Paul Moore
2018-02-02 21:18 ` Paul Moore
2 siblings, 0 replies; 35+ messages in thread
From: Paul Moore @ 2018-02-02 21:18 UTC (permalink / raw)
To: Simo Sorce
Cc: mszeredi-H+wXaHxf7aLQT0dZR+AlfA, Eric W. Biederman,
jlayton-H+wXaHxf7aLQT0dZR+AlfA, Linux API, Linux Containers,
Linux Kernel, Eric Paris, David Howells, Carlos O'Donell,
Linux Audit, Al Viro, Andy Lutomirski, Linux Network Development,
Linux FS Devel, cgroups-u79uwXL29TY76Z2rM5mHXA,
trondmy-7I+n7zu2hftEKMMhf/gKZA
On Tue, Jan 9, 2018 at 11:18 AM, Simo Sorce <simo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Tue, 2018-01-09 at 07:16 -0500, Richard Guy Briggs wrote:
...
>> Changelog:
>>
>> (Upstream V3)
>> - switch back to u64 (from pmoore, can be expanded to u128 in future if
>> need arises without breaking API. u32 was originally proposed, up to
>> c36 discussed)
>> - write-once, but children inherit audit container identifier and can
>> then still be written once
>> - switch to CAP_AUDIT_CONTROL
>> - group namespace actions together, auxiliary records to namespace
>> operations.
>>
>> (Upstream V2)
>> - switch from u64 to u128 UUID
>> - switch from "signal" and "trigger" to "register"
>> - restrict registration to single process or force all threads and
>> children into same container
>
> I am trying to understand the back and forth on the ID size.

I'm just now getting a chance to read Richard's latest draft, but I
wanted to comment on this quickly.

There are two main reasons for keeping this a 32 or 64 bit integer:

1) After the initial "be able to associate audit events with a
container" stage, we are going to look into supporting multiple audit
daemons on the system so that you could run an audit daemon inside a
container and it would collect events generated by the container
(we're tentatively calling this "phase 2", feel free to insert your
own "magic happens" joke). There are a lot of things that need to happen
in phase two; one of these things is the addition of an audit event
routing mechanism that will send audit records to the right audit
daemons (the "host" daemon will always see everything). In order to do
this we will need to be able to quickly compare audit container IDs,
which means an integer.

2) Whatever we pick for an audit container ID, it is going to be wrong
for at least one container orchestrator. There is no "one" solution
here, so we are providing a small and flexible mechanism that higher
level orchestrators can use to provide a more complete solution.
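
The routing idea in reason 1 can be sketched in userspace terms as a
lookup keyed on the integer ID. This is only an illustration of why a
fixed-size integer key is convenient, not a description of any kernel
implementation; AuditRouter, register_daemon, route and the daemon names
are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set

# Hypothetical name for the host daemon, which always sees every record.
HOST_DAEMON = "host-auditd"


class AuditRouter:
    """Toy router: maps an integer audit container ID to interested daemons."""

    def __init__(self) -> None:
        self._by_contid: Dict[int, Set[str]] = defaultdict(set)

    def register_daemon(self, contid: int, daemon: str) -> None:
        """Record that daemon wants the events generated by container contid."""
        self._by_contid[contid].add(daemon)

    def route(self, contid: Optional[int]) -> List[str]:
        """Return the daemons that should receive a record tagged with contid.

        The host daemon is always included; contid is None for records that
        are not associated with any container.
        """
        targets = {HOST_DAEMON}
        if contid is not None:
            # One hash lookup on a fixed-size integer key.
            targets |= self._by_contid.get(contid, set())
        return sorted(targets)


router = AuditRouter()
router.register_daemon(42, "container-auditd-a")
print(router.route(42))
print(router.route(7))
```

With an integer key, each routing decision is a single fixed-size
comparison or hash lookup, whereas a 36-character string key would have
to be hashed or compared byte by byte for every record.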

> From an orchestrator POV anything that requires tracking a node
> specific ID is not ideal.
>
> Orchestrators tend to span many nodes, and containers tend to have IDs
> that are either UUID or have a Hash (like SHA256) as identifier.

You're helping me prove my reason #2.

> The problem here is two-fold:
>
> a) Your auditing requires some mapping to be useful outside of the
> system.
> If you aggregate audit logs outside of the system or you want to
> correlate the system audit logs with other components dealing with
> containers, now you need a place where you provide a mapping from your
> audit u64 to the ID a container has in the rest of the system.

Yep, see my reason #2. I want us to have something that "works" for a
single system as well as something that can be leveraged by higher
level tools for large networks of machines.

I realize it's easy, and tempting, to expand the scope of this effort,
but if we are to have any success it is only going to be through some
discipline. We need to focus on a small solution which addresses the
basic needs and hopefully remains flexible enough for any potential
expansion while staying palatable to the audit folks and the general
kernel community.

> b) Now you need a mapping of some sort. The simplest way a container
> orchestrator can go about this is to just use the UUID or Hash
> representing their view of the container, truncate it to a u64 and use
> that for Audit. This means there are some chances there will be a
> collision and a duplicate u64 ID will be used by the orchestrator as
> the container ID. What happens in that case?

That is a design decision left to the different container orchestrators.
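
For a sense of scale on the truncation approach described in (b), here
is a minimal sketch of what an orchestrator might do, together with the
standard birthday-bound estimate of the collision chance. It assumes the
truncated IDs are uniformly random 64-bit values; the function names are
illustrative and are not taken from any orchestrator.

```python
import hashlib
import math
import uuid


def contid_from_sha256(hex_digest: str) -> int:
    """Truncate a SHA-256 hex digest to its first 8 bytes, read as a u64."""
    return int(hex_digest[:16], 16)


def contid_from_uuid(u: uuid.UUID) -> int:
    """Keep the upper 64 bits of a 128-bit UUID.

    Note: a version-4 UUID carries 4 fixed version bits in this half, so
    only 60 of these 64 bits are actually random.
    """
    return u.int >> 64


def collision_probability(n: int, bits: int = 64) -> float:
    """Birthday-bound approximation for n uniformly random IDs of bits width.

    P(at least one collision) ~= 1 - exp(-n * (n - 1) / 2**(bits + 1)).
    """
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))


digest = hashlib.sha256(b"example-container").hexdigest()
print(contid_from_sha256(digest))
print(contid_from_uuid(uuid.uuid4()))
print(collision_probability(1_000_000))
```

Under the uniform-randomness assumption, a million truncated 64-bit IDs
collide with a probability on the order of 1e-8, so an orchestrator that
tracks the IDs it has already issued can detect the rare duplicate and
pick another value.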
--
paul moore
www.paul-moore.com
^ permalink raw reply [flat|nested] 35+ messages in thread* Re: RFC(V3): Audit Kernel Container IDs
2018-01-09 16:18 ` Simo Sorce
@ 2018-02-02 21:18 ` Paul Moore
-1 siblings, 0 replies; 35+ messages in thread
From: Paul Moore @ 2018-02-02 21:18 UTC (permalink / raw)
To: Simo Sorce
Cc: Richard Guy Briggs, cgroups-u79uwXL29TY76Z2rM5mHXA,
Linux Containers, Linux API, Linux Audit, Linux FS Devel,
Linux Kernel, Linux Network Development,
mszeredi-H+wXaHxf7aLQT0dZR+AlfA, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
Carlos O'Donell, Al Viro, David Howells, Andy Lutomirski,
trondmy-7I+n7zu2hftEKMMhf/gKZA, Eric Paris, Serge E. Hallyn,
Eric W. Biederman
On Tue, Jan 9, 2018 at 11:18 AM, Simo Sorce <simo-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> On Tue, 2018-01-09 at 07:16 -0500, Richard Guy Briggs wrote:
...
>> Changelog:
>>
>> (Upstream V3)
>> - switch back to u64 (from pmoore, can be expanded to u128 in future if
>> need arises without breaking API. u32 was originally proposed, up to
>> c36 discussed)
>> - write-once, but children inherit audit container identifier and can
>> then still be written once
>> - switch to CAP_AUDIT_CONTROL
>> - group namespace actions together, auxilliary records to namespace
>> operations.
>>
>> (Upstream V2)
>> - switch from u64 to u128 UUID
>> - switch from "signal" and "trigger" to "register"
>> - restrict registration to single process or force all threads and
>> children into same container
>
> I am trying to understand the back and forth on the ID size.
I'm just now getting a chance to read Richard's latest draft, but I
wanted to comment on this quickly.
There are two main reasons for keeping this a 32 or 64 bit integer:
1) After the initial "be able to associate audit events with a
container" stage, we are going to look into supporting multiple audit
daemons on the system so that you could run an audit daemon inside a
container and it would collect events generated by the container
(we're tentatively calling this "phase 2", feel free to insert your
own "magic happens" joke). There are a lot things that need to happen
in phase two, one of these things is the addition of an audit event
routing mechanism that will send audit records to the right audit
daemons (the "host" daemon will always see everything), in order to do
this we will need to be able to quickly compare audit container IDs,
this means an integer.
2) Whatever we pick for an audit container ID it is going to be wrong
for at least one container orchestrator. There is no "one" solution
here, so we are providing a small and flexible mechanism that higher
level orchestrators can use to provide a more complete solution.
> >From an orchestrator POV anything that requires tracking a node
> specific ID is not ideal.
>
> Orchestrators tend to span many nodes, and containers tend to have IDs
> that are either UUID or have a Hash (like SHA256) as identifier.
You're helping me prove my reason #2.
> The problem here is two-fold:
>
> a) Your auditing requires some mapping to be useful outside of the
> system.
> If you aggreggate audit logs outside of the system or you want to
> correlate the system audit logs with other components dealing with
> containers, now you need a place where you provide a mapping from your
> audit u64 to the ID a container has in the rest of the system.
Yep, see my reason #2. I want us to have something that "works" for a
single system as well as something that can be leveraged by higher
level tools for large networks of machines.
I realize it's easy, and tempting, to expand the scope of this effort;
but if we are to have any success it is only going to be through some
discipline. We need to focus on a small solution which addresses the
basic needs and hopefully remains flexible enough for any potential
expansion while staying palatable to the audit folks and the general
kernel community.
> b) Now you need a mapping of some sort. The simplest way a container
> orchestrator can go about this is to just use the UUID or Hash
> representing their view of the container, truncate it to a u64 and use
> that for Audit. This means there are some chances there will be a
> collision and a duplicate u64 ID will be used by the orchestrator as
> the container ID. What happens in that case?
That is a design decision left to the different container orchestrators.
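To make the orchestrator-side choice concrete, here is a minimal sketch of
one possible policy, assuming the orchestrator's container ID is a hex
string without separators (for example a SHA256 digest). The names
truncate_to_u64 and ContidMap, and the "probe the next value" collision
rule, are assumptions made for this example only; nothing in the proposal
mandates them. The table doubles as the u64-to-full-ID mapping discussed
in point a).

```python
U64_MASK = (1 << 64) - 1


def truncate_to_u64(hex_id: str) -> int:
    """Keep the leading 64 bits (16 hex digits) of a hex-encoded ID."""
    return int(hex_id[:16], 16)


class ContidMap:
    """Hypothetical orchestrator-side table: audit u64 -> full container ID."""

    def __init__(self) -> None:
        self._full_id_by_contid = {}

    def assign(self, hex_id: str) -> int:
        contid = truncate_to_u64(hex_id)
        # On collision with a different container, probe the next value.
        # This is one possible policy, not a recommendation.
        while self._full_id_by_contid.get(contid, hex_id) != hex_id:
            contid = (contid + 1) & U64_MASK
        self._full_id_by_contid[contid] = hex_id
        return contid

    def lookup(self, contid: int):
        """Return the full container ID for an audit u64, or None."""
        return self._full_id_by_contid.get(contid)


# Two made-up IDs that share their first 64 bits, to force a collision.
first = "00000000000000ff" + "a" * 48
second = "00000000000000ff" + "b" * 48
table = ContidMap()
first_contid = table.assign(first)
second_contid = table.assign(second)
```

An orchestrator could equally refuse the second container or allocate IDs
from a counter instead of truncating; the kernel side does not care which.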
--
paul moore
www.paul-moore.com
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RFC(V3): Audit Kernel Container IDs
[not found] ` <20180109121620.wi7dq2423ugsraqv-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
2018-01-09 16:18 ` Simo Sorce
@ 2018-01-10 1:05 ` Eric W. Biederman
2018-01-10 1:05 ` Eric W. Biederman
2018-02-02 22:05 ` Paul Moore
3 siblings, 0 replies; 35+ messages in thread
From: Eric W. Biederman @ 2018-01-10 1:05 UTC (permalink / raw)
To: Richard Guy Briggs
Cc: Simo Sorce, David Howells, cgroups, jlayton, trondmy,
Steve Grubb, mszeredi, Madz Car, Al Viro,
Andy Lutomirski, Eric Paris, Carlos O'Donell, Linux API,
Linux Containers, Daniel Walsh, Linux Kernel, Paul Moore,
Linux Audit, Linux Network Development, Linux FS Devel
Please let's have a description of the problem you are trying to solve.
A proposed solution without talking about the problem space is useless.
Any proposed solution could potentially work.
I know to these exist. There is motivation for your work.
What is the motivation?
What problem are you trying to solve?
In particular what information are you trying to get into logs that you
can not get into the logs today?
I am going to try to give this the attention it deserves but right now I
am having to deal with half thought out patches for information leaks
from speculative code paths, so I won't be able to give this much
attention for a little bit.
Eric
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RFC(V3): Audit Kernel Container IDs
[not found] ` <87k1wqcykw.fsf-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
@ 2018-01-10 6:54 ` Richard Guy Briggs
0 siblings, 0 replies; 35+ messages in thread
From: Richard Guy Briggs @ 2018-01-10 6:54 UTC (permalink / raw)
To: Eric W. Biederman
Cc: Simo Sorce, David Howells, cgroups, jlayton, trondmy,
Steve Grubb, mszeredi, Madz Car, Al Viro,
Andy Lutomirski, Eric Paris, Carlos O'Donell, Linux API,
Linux Containers, Daniel Walsh, Linux Kernel, Paul Moore,
Linux Audit, Linux Network Development, Linux FS Devel
On 2018-01-09 19:05, Eric W. Biederman wrote:
> Please let's have a description of the problem you are trying to solve.
I thought the first sentence of the second paragraph summed it up rather
well.
Here are the elaborated motivations:
- Filter unwanted, irrelevant or unimportant messages before they fill
  the queue so important messages don't get lost. This is a
  certification requirement.
- Make security claims about containers, which requires tracking of
  actions within those containers to ensure compliance with established
  security policies.
- Route messages from events to the local audit daemon instance or the
  host audit daemon instance.
- nsIDs were tried, but proved insufficient for efficient filtering,
  routing and tracking.
> A proposed solution without talking about the problem space is useless.
> Any proposed solution could potentially work.
>
> I know to these exist. There is motivation for your work.
> What is the motivation?
> What problem are you trying to solve?
>
> In particular what information are you trying to get into logs that you
> can not get into the logs today?
>
> I am going to try to give this the attention it deserves but right now I
> am having to deal with half thought out patches for information leaks
> from speculative code paths, so I won't be able to give this much
> attention for a little bit.
>
> Eric
- RGB
--
Richard Guy Briggs <rgb@redhat.com>
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RFC(V3): Audit Kernel Container IDs
[not found] ` <20180109121620.wi7dq2423ugsraqv-bcJWsdo4jJjeVoXN4CMphl7TgLCtbB0G@public.gmane.org>
` (2 preceding siblings ...)
2018-01-10 1:05 ` Eric W. Biederman
@ 2018-02-02 22:05 ` Paul Moore
3 siblings, 0 replies; 35+ messages in thread
From: Paul Moore @ 2018-02-02 22:05 UTC (permalink / raw)
To: Richard Guy Briggs
Cc: mszeredi, Eric W. Biederman, Simo Sorce, jlayton,
Carlos O'Donell, Linux API, Linux Containers, Linux Kernel,
Eric Paris, David Howells, Linux Audit, Al Viro, Andy Lutomirski,
Linux Network Development, Linux FS Devel, cgroups, trondmy
On Tue, Jan 9, 2018 at 7:16 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> Containers are a userspace concept. The kernel knows nothing of them.
>
> The Linux audit system needs a way to be able to track the container
> provenance of events and actions. Audit needs the kernel's help to do
> this.
Two small comments below, but I tend to think we are at a point where
you can start cobbling together some prototype/RFC patches. Surely
there are going to be a few changes, and new comments, that come out
once we see an initial implementation so let's see what those are.
> The registration is a u64 representing the audit container identifier
> written to a special file in a pseudo filesystem (proc, since PID tree
> already exists) representing a process that will become a parent process
> in that container. This write might place restrictions on mount
> namespaces required to define a container, or at least careful checking
> of namespaces in the kernel to verify permissions of the orchestrator so
> it can't change its own container ID. A bind mount of nsfs may be
> necessary in the container orchestrator's mount namespace. This write
> can only happen once per process.
>
> Note: The justification for using a u64 is that it minimizes the
> information printed in every audit record, reducing bandwidth and limits
> comparisons to a single u64 which will be faster and less error-prone.
I know Steve generally worries about audit record size, which is a
perfectly valid concern in this case. I also worry about the
additional overhead when we start routing audit records to multiple
audit daemons (see my other emails in this thread).
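The record size concern can be put in rough numbers. The sketch below is
simple arithmetic, assuming a hypothetical "contid=<value>" field printed
in decimal; the field name and the decimal formatting are assumptions for
this example, since the thread does not fix the record format. It compares
the widest possible u64 against a canonical textual UUID.

```python
import uuid

U64_MAX = 2**64 - 1


def field_length(value) -> int:
    """Length in characters of a hypothetical 'contid=<value>' record field."""
    return len(f"contid={value}")


# Widest decimal u64 versus the canonical 36-character UUID form.
u64_worst_case = field_length(U64_MAX)
uuid_case = field_length(uuid.UUID(int=0))
```

A decimal u64 never exceeds 20 digits while a canonical UUID string is
always 36 characters, and the difference is paid on every record that
carries the identifier.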
> ...
> When a container ceases to exist because the last process in that
> container has exited log the fact to balance the registration action.
> (This is likely needed for certification accountability.)
On the "container ceases to exist" point, I expect this "container
dead" message to come from the orchestrator and not the kernel itself
(I don't want the kernel to have to handle that level of bookkeeping).
I imagine this should be similar to what is done for VM auditing with
libvirt.
--
paul moore
www.paul-moore.com
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: RFC(V3): Audit Kernel Container IDs
2018-02-02 22:05 ` Paul Moore
@ 2018-02-03 1:57 ` Serge E. Hallyn
[not found] ` <CAHC9VhQ5ciUZDhrsb6S4YxwuzQEY-ra2RTDceWXOdjHEBoZ0BQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
1 sibling, 0 replies; 35+ messages in thread
From: Serge E. Hallyn @ 2018-02-03 1:57 UTC (permalink / raw)
To: Paul Moore
Cc: Richard Guy Briggs, cgroups, Linux Containers, Linux API,
Linux Audit, Linux FS Devel, Linux Kernel,
Linux Network Development, mszeredi, Andy Lutomirski, jlayton,
Carlos O'Donell, Al Viro, David Howells, Simo Sorce, trondmy,
Eric Paris, Serge E. Hallyn, Eric W. Biederman
On Fri, Feb 02, 2018 at 05:05:22PM -0500, Paul Moore wrote:
> On Tue, Jan 9, 2018 at 7:16 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > Containers are a userspace concept. The kernel knows nothing of them.
> >
> > The Linux audit system needs a way to be able to track the container
> > provenance of events and actions. Audit needs the kernel's help to do
> > this.
>
> Two small comments below, but I tend to think we are at a point where
> you can start cobbling together some prototype/RFC patches. Surely
Agreed.
LGTM.
> there are going to be a few changes, and new comments, that come out
> once we see an initial implementation so let's see what those are.
thanks,
-serge
^ permalink raw reply [flat|nested] 35+ messages in thread