From: Serge Hallyn <serge.hallyn@ubuntu.com>
To: Richard Guy Briggs <rgb@redhat.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>,
	linux-audit@redhat.com, linux-kernel@vger.kernel.org,
	containers@lists.linux-foundation.org, eparis@redhat.com,
	sgrubb@redhat.com, ebiederm@xmission.com
Subject: Re: [PATCH 0/2] namespaces: log namespaces per task
Date: Fri, 2 May 2014 21:00:44 +0000
Message-ID: <20140502210044.GF2631@ubuntumail>
In-Reply-To: <20140502142851.GC24111@madcap2.tricolour.ca>

Quoting Richard Guy Briggs (rgb@redhat.com):
> On 14/05/02, Serge E. Hallyn wrote:
> > Quoting Richard Guy Briggs (rgb@redhat.com):
> > > I saw no replies to my questions when I replied a year after Aris' posting, so
> > > I don't know if it was ignored or got lost in stale threads:
> > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00020.html
> > >         https://www.redhat.com/archives/linux-audit/2013-March/msg00033.html
> > > 	(https://lists.linux-foundation.org/pipermail/containers/2013-March/032063.html)
> > >         https://www.redhat.com/archives/linux-audit/2014-January/msg00180.html
> > > 
> > > I've tried to answer a number of questions that were raised in that thread.
> > > 
> > > The goal is not quite identical to Aris' patchset.
> > > 
> > > The purpose is to track namespaces in use by logged processes from the
> > > perspective of init_*_ns.  The first patch defines a function to list them.
> > > The second patch provides an example of usage for audit_log_task_info() which
> > > is used by syscall audits, among others.  audit_log_task() and
> > > audit_common_recv_message() would be other potential use cases.
> > > 
> > > Use a serial number per namespace (unique across one boot of one kernel)
> > > instead of the inode number (the right to change it has reportedly been
> > > reserved, and it is not necessarily unique if there is more than one proc
> > > fs).  It could be argued that the inode numbers have now become a de facto
> > > interface and can't change now, but I'm proposing this approach to see if it
> > > helps address some of the objections to the earlier patchset.
> > > 
> > > Messages could also be added to track the creation and destruction of
> > > namespaces, listing the parent for hierarchical namespaces such as pidns
> > > and userns, listing other ids for non-hierarchical namespaces, and giving
> > > other information to help identify a namespace.
> > > 
> > > There has been some progress made for audit in net namespaces and pid
> > > namespaces since this previous thread.  net namespaces are now served as peers
> > > by one auditd in the init_net namespace with processes in a non-init_net
> > > namespace being able to write records if they are in the init_user_ns and have
> > > CAP_AUDIT_WRITE.  Processes in a non-init_pid_ns can now similarly write
> > > records.  As for CAP_AUDIT_READ, I just posted a patchset to check capabilities
> > > of userspace processes that try to join netlink broadcast groups.
> > > 
> > > 
> > > Questions:
> > > Is there a way to link serial numbers of namespaces involved in migration of a
> > > container to another kernel?  (I had a brief look at CRIU.)  Is there a unique
> > > identifier for each running instance of a kernel?  Or at least some identifier
> > > within the container migration realm?
> > 
> > Eric Biederman has always been adamantly opposed to adding new namespaces
> > of namespaces, so the fact that you're asking this question concerns me.
> 
> I have seen that position and I don't fully understand the justification
> for it other than added complexity.
> 
> One way that occurred to me to be able to identify a kernel instance was
> to look at CPU serial numbers or some other CPU entity intended to be
> globally unique, but that isn't universally available.

That's one issue: uniqueness of namespaces across machines.

But it gets worse if we consider that, after allowing in-container audit,
we'll have a nested container running, then have the parent container
migrated to another host (or just checkpointed and restarted).  Now the
nested container's indexes will all be changed.  Is there any way audit
can track who's who after the migration?

That's not an indictment of the serial # approach, since (a) we don't
have in-container audit yet and (b) we don't have c/r/migration of nested
containers.  But it's worth considering whether we can solve the issue
with serial #s, and, if not, whether we can solve it with any other
approach.

I guess one approach to solve it would be to allow userspace to request
a next serial #.  Which will immediately lead us to a namespace of serial
#s (since the requested # might be lower than the last used one on the
new host).
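
To make the collision concrete, here is a minimal userspace sketch.  It is
not taken from the patchset; struct ns_serial_alloc, ns_serial_next() and
ns_serial_request() are hypothetical names, and the allocator is assumed to
be a plain monotonic counter per kernel instance:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-kernel allocator state: the last serial handed out. */
struct ns_serial_alloc {
	uint64_t last;
};

/* Normal path: serials only ever increase within one boot. */
static uint64_t ns_serial_next(struct ns_serial_alloc *a)
{
	return ++a->last;
}

/*
 * Restore path (assumed interface): userspace asks for a specific serial
 * so that a migrated namespace keeps its old number.  With a single
 * monotonic counter this can only be honoured when the number is above
 * everything already handed out on this host.
 */
static bool ns_serial_request(struct ns_serial_alloc *a, uint64_t want)
{
	if (want <= a->last)
		return false;	/* may already be in use on this host */
	a->last = want;
	return true;
}
```

A destination host that has already handed out 100 serials has to refuse a
request for serial 5, which is the case that pushes us toward per-container
serial spaces.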

As you've said inode #s for /proc/self/ns/* probably aren't sufficiently
unique, though perhaps we could attach a generation # for the sake of
audit.  Then after a c/r/migration the generation # may be different,
but we may have a better shot at at least using the same ino#.
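
Roughly what I have in mind, as a sketch only.  The struct and helper names
are made up for illustration, and the generation field is an assumption,
not something the kernel exposes today:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical audit-side identifier for a namespace. */
struct ns_audit_id {
	uint64_t ino;	/* inode number as seen under /proc/self/ns/ */
	uint32_t gen;	/* assumed generation number, may change on restore */
};

/* Exact match: same inode number and same generation. */
static bool ns_audit_id_equal(struct ns_audit_id a, struct ns_audit_id b)
{
	return a.ino == b.ino && a.gen == b.gen;
}

/*
 * Looser match for correlating records across a c/r/migration, assuming
 * the restore managed to keep the inode number but not the generation.
 */
static bool ns_audit_id_same_ino(struct ns_audit_id a, struct ns_audit_id b)
{
	return a.ino == b.ino;
}
```

Whether the same ino# can really be preserved across hosts is of course the
open question; the sketch only shows how audit could tell the two cases
apart if it can.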

> Another possibility was RTC reading at time of boot, but that isn't good
> enough either.
> 
> Both are dubious in VMs anyways.
> 
> > The way things are right now, since audit belongs to the init userns,
> > we can get away with saying if a container 'migrates', the new kernel
> > will see a different set of serials, and no one should care.  However,
> > if we're going to be allowing containers to have their own audit
> > namespace/layer/whatever, then this becomes more of a concern.
> 
> Having a container have its own audit daemon (partitioned appropriately
> in the kernel) would be a long-term goal.

Agreed, fwiw.

> > That said, I'll now look at the patches while pretending that problem
> > does not exist :)  If I ack, it'll be on correctness of the code, but
> > we'll still have to deal with this issue.
> 
> Getting some discussion about this migration challenge was a significant
> motivation for posting this patch, so I'm hoping others will weigh in.
> 
> Thanks for your review, Serge.
> 
> > > What additional events should list this information?
> > > 
> > > Does this present any kind of information leak?  Only CAP_AUDIT_CONTROL (and
> > > proposed CAP_AUDIT_READ) in init_user_ns can get to this information in the
> > > init namespace at the moment.
> > > 
> > > 
> > > Proposed output format:
> > > This differs slightly from Aristeu's patch because of the label conflict with
> > > "pid=" due to including it in existing records rather than it being a separate
> > > record:
> > >         type=SYSCALL msg=audit(1398112249.996:65): arch=c000003e syscall=272 success=yes exit=0 a0=40000000 a1=ffffffffffffffff a2=0 a3=22 items=0 ppid=1 pid=566 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="(t-daemon)" exe="/usr/lib/systemd/systemd" mntns=5 netns=97 utsns=2 ipcns=1 pidns=4 userns=3 subj=system_u:system_r:init_t:s0 key=(null)
> > > 
> > > 
> > > Note: This set does not try to solve the non-init namespace audit messages and
> > > auditd problem yet.  That will come later, likely with additional auditd
> > > instances running in another namespace with a limited ability to influence the
> > > master auditd.  I echo Eric B's idea that messages destined for different
> > > namespaces would have to be tailored for that namespace with references that
> > > make sense (such as the right pid number reported to that pid namespace, and
> > > not leaking info about parents or peers).
> > > 
> > > 
> > > Richard Guy Briggs (2):
> > >   namespaces: give each namespace a serial number
> > >   audit: log namespace serial numbers
> > > 
> > >  fs/mount.h                     |    1 +
> > >  fs/namespace.c                 |    1 +
> > >  include/linux/audit.h          |    7 +++++++
> > >  include/linux/ipc_namespace.h  |    1 +
> > >  include/linux/nsproxy.h        |    8 ++++++++
> > >  include/linux/pid_namespace.h  |    1 +
> > >  include/linux/user_namespace.h |    1 +
> > >  include/linux/utsname.h        |    1 +
> > >  include/net/net_namespace.h    |    1 +
> > >  init/version.c                 |    1 +
> > >  ipc/msgutil.c                  |    1 +
> > >  ipc/namespace.c                |    2 ++
> > >  kernel/audit.c                 |   38 ++++++++++++++++++++++++++++++++++++++
> > >  kernel/nsproxy.c               |   24 ++++++++++++++++++++++++
> > >  kernel/pid.c                   |    1 +
> > >  kernel/pid_namespace.c         |    2 ++
> > >  kernel/user.c                  |    1 +
> > >  kernel/user_namespace.c        |    2 ++
> > >  kernel/utsname.c               |    2 ++
> > >  net/core/net_namespace.c       |    4 +++-
> > >  20 files changed, 99 insertions(+), 1 deletions(-)
> > > 
> > > _______________________________________________
> > > Containers mailing list
> > > Containers@lists.linux-foundation.org
> > > https://lists.linuxfoundation.org/mailman/listinfo/containers
> 
> - RGB
> 
> --
> Richard Guy Briggs <rbriggs@redhat.com>
> Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
> Remote, Ottawa, Canada
> Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545
