Linux-audit Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH 05/22] audit: Fix sleep in atomic
From: Jan Kara @ 2016-12-23 13:24 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit
In-Reply-To: <CAHC9VhSk+Y5rKpz-JgCZQSk1W3WPeCA9w7tk2HVx2NTeaJRs5g@mail.gmail.com>

On Thu 22-12-16 18:18:36, Paul Moore wrote:
> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Audit tree code was happily adding new notification marks while holding
> > spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
> > lead to sleeping while holding a spinlock, deadlocks due to lock
> > inversion, and probably other fun. Fix the problem by acquiring
> > group->mark_mutex earlier.
> >
> > CC: Paul Moore <paul@paul-moore.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  kernel/audit_tree.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> [SIDE NOTE: this patch explains your comments and my earlier concern
> about the locked/unlocked variants of fsnotify_add_mark() in
> untag_chunk()]
> 
> Ouch.  Thanks for catching this ... what is your goal with these
> patches, are you targeting this as a fix during the v4.10-rcX cycle?
> If not, any objections if I pull this patch into the audit tree and
> send this to Linus during the v4.10-rcX cycle (assuming it passes
> testing, yadda yadda)?

Sure, go ahead. I plan these patches for the next merge window. So I can
rebase the series once you merge audit fixes...

								Honza
> 
> 
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index f3130eb0a4bd..156b6a93f4fc 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -231,6 +231,7 @@ static void untag_chunk(struct node *p)
> >         if (size)
> >                 new = alloc_chunk(size);
> >
> > +       mutex_lock(&entry->group->mark_mutex);
> >         spin_lock(&entry->lock);
> >         if (chunk->dead || !entry->inode) {
> >                 spin_unlock(&entry->lock);
> > @@ -258,7 +259,8 @@ static void untag_chunk(struct node *p)
> >         if (!new)
> >                 goto Fallback;
> >
> > -       if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
> > +       if (fsnotify_add_mark_locked(&new->mark, entry->group, entry->inode,
> > +                                    NULL, 1)) {
> >                 fsnotify_put_mark(&new->mark);
> >                 goto Fallback;
> >         }
> > @@ -309,6 +311,7 @@ static void untag_chunk(struct node *p)
> >         spin_unlock(&hash_lock);
> >         spin_unlock(&entry->lock);
> >  out:
> > +       mutex_unlock(&entry->group->mark_mutex);
> >         fsnotify_put_mark(entry);
> >         spin_lock(&hash_lock);
> >  }
> > @@ -385,17 +388,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
> >
> >         chunk_entry = &chunk->mark;
> >
> > +       mutex_lock(&old_entry->group->mark_mutex);
> >         spin_lock(&old_entry->lock);
> >         if (!old_entry->inode) {
> >                 /* old_entry is being shot, lets just lie */
> >                 spin_unlock(&old_entry->lock);
> > +               mutex_unlock(&old_entry->group->mark_mutex);
> >                 fsnotify_put_mark(old_entry);
> >                 free_chunk(chunk);
> >                 return -ENOENT;
> >         }
> >
> > -       if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
> > +       if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
> > +                                    old_entry->inode, NULL, 1)) {
> >                 spin_unlock(&old_entry->lock);
> > +               mutex_unlock(&old_entry->group->mark_mutex);
> >                 fsnotify_put_mark(chunk_entry);
> >                 fsnotify_put_mark(old_entry);
> >                 return -ENOSPC;
> > @@ -411,6 +418,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
> >                 chunk->dead = 1;
> >                 spin_unlock(&chunk_entry->lock);
> >                 spin_unlock(&old_entry->lock);
> > +               mutex_unlock(&old_entry->group->mark_mutex);
> >
> >                 fsnotify_destroy_mark(chunk_entry, audit_tree_group);
> >
> > @@ -443,6 +451,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
> >         spin_unlock(&hash_lock);
> >         spin_unlock(&chunk_entry->lock);
> >         spin_unlock(&old_entry->lock);
> > +       mutex_unlock(&old_entry->group->mark_mutex);
> >         fsnotify_destroy_mark(old_entry, audit_tree_group);
> >         fsnotify_put_mark(chunk_entry); /* drop initial reference */
> >         fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
> > --
> > 2.10.2
> >
> 
> 
> 
> -- 
> paul moore
> www.paul-moore.com
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH 06/22] audit: Abstract hash key handling
From: Jan Kara @ 2016-12-23 13:27 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit
In-Reply-To: <CAHC9VhR3pwWoVzB1XnPweeQ+aqgmSKjoAmkd6A6ekaT+HQWVfw@mail.gmail.com>

On Thu 22-12-16 18:27:40, Paul Moore wrote:
> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> > Audit tree currently uses inode pointer as a key into the hash table.
> > Getting that from notification mark will be somewhat more difficult with
> > coming fsnotify changes and there's no reason we really have to use the
> > inode pointer. So abstract getting of hash key from the audit chunk and
> > inode so that we can switch to a different key easily later.
> >
> > CC: Paul Moore <paul@paul-moore.com>
> > Signed-off-by: Jan Kara <jack@suse.cz>
> > ---
> >  kernel/audit_tree.c | 39 ++++++++++++++++++++++++++++-----------
> >  1 file changed, 28 insertions(+), 11 deletions(-)
> 
> I have no objections with this patch in particular, but in patch 8,
> are you certain that inode_to_key() and chunk_to_key() will continue
> to return the same key value?

Yes, that's the intention. Or better in that patch the key will no longer
be inode pointer but instead the fsnotify_list pointer. But still it would
match for chunks attached to an inode and inode itself so comparison
results should stay the same.

								Honza 
> 
> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
> > index 156b6a93f4fc..f0859828de09 100644
> > --- a/kernel/audit_tree.c
> > +++ b/kernel/audit_tree.c
> > @@ -163,33 +163,48 @@ enum {HASH_SIZE = 128};
> >  static struct list_head chunk_hash_heads[HASH_SIZE];
> >  static __cacheline_aligned_in_smp DEFINE_SPINLOCK(hash_lock);
> >
> > -static inline struct list_head *chunk_hash(const struct inode *inode)
> > +/* Function to return search key in our hash from inode. */
> > +static unsigned long inode_to_key(const struct inode *inode)
> >  {
> > -       unsigned long n = (unsigned long)inode / L1_CACHE_BYTES;
> > +       return (unsigned long)inode;
> > +}
> > +
> > +/*
> > + * Function to return search key in our hash from chunk. Key 0 is special and
> > + * should never be present in the hash.
> > + */
> > +static unsigned long chunk_to_key(struct audit_chunk *chunk)
> > +{
> > +       return (unsigned long)chunk->mark.inode;
> > +}
> > +
> > +static inline struct list_head *chunk_hash(unsigned long key)
> > +{
> > +       unsigned long n = key / L1_CACHE_BYTES;
> >         return chunk_hash_heads + n % HASH_SIZE;
> >  }
> >
> >  /* hash_lock & entry->lock is held by caller */
> >  static void insert_hash(struct audit_chunk *chunk)
> >  {
> > -       struct fsnotify_mark *entry = &chunk->mark;
> > +       unsigned long key = chunk_to_key(chunk);
> >         struct list_head *list;
> >
> > -       if (!entry->inode)
> > +       if (!key)
> >                 return;
> > -       list = chunk_hash(entry->inode);
> > +       list = chunk_hash(key);
> >         list_add_rcu(&chunk->hash, list);
> >  }
> >
> >  /* called under rcu_read_lock */
> >  struct audit_chunk *audit_tree_lookup(const struct inode *inode)
> >  {
> > -       struct list_head *list = chunk_hash(inode);
> > +       unsigned long key = inode_to_key(inode);
> > +       struct list_head *list = chunk_hash(key);
> >         struct audit_chunk *p;
> >
> >         list_for_each_entry_rcu(p, list, hash) {
> > -               /* mark.inode may have gone NULL, but who cares? */
> > -               if (p->mark.inode == inode) {
> > +               if (chunk_to_key(p) == key) {
> >                         atomic_long_inc(&p->refs);
> >                         return p;
> >                 }
> > @@ -585,7 +600,8 @@ int audit_remove_tree_rule(struct audit_krule *rule)
> >
> >  static int compare_root(struct vfsmount *mnt, void *arg)
> >  {
> > -       return d_backing_inode(mnt->mnt_root) == arg;
> > +       return inode_to_key(d_backing_inode(mnt->mnt_root)) ==
> > +              (unsigned long)arg;
> >  }
> >
> >  void audit_trim_trees(void)
> > @@ -620,9 +636,10 @@ void audit_trim_trees(void)
> >                 list_for_each_entry(node, &tree->chunks, list) {
> >                         struct audit_chunk *chunk = find_chunk(node);
> >                         /* this could be NULL if the watch is dying else where... */
> > -                       struct inode *inode = chunk->mark.inode;
> >                         node->index |= 1U<<31;
> > -                       if (iterate_mounts(compare_root, inode, root_mnt))
> > +                       if (iterate_mounts(compare_root,
> > +                                          (void *)chunk_to_key(chunk),
> > +                                          root_mnt))
> >                                 node->index &= ~(1U<<31);
> >                 }
> >                 spin_unlock(&hash_lock);
> > --
> > 2.10.2
> >
> 
> 
> 
> -- 
> paul moore
> www.paul-moore.com
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH 04/22] fsnotify: Remove fsnotify_duplicate_mark()
From: Paul Moore @ 2016-12-23 14:01 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit
In-Reply-To: <20161223132226.GA22679@quack2.suse.cz>

On Fri, Dec 23, 2016 at 8:22 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 22-12-16 18:13:11, Paul Moore wrote:
>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > There are only two calls sites of fsnotify_duplicate_mark(). Those are
>> > in kernel/audit_tree.c and both are bogus. Vfsmount pointer is unused
>> > for audit tree, inode pointer and group gets set in
>> > fsnotify_add_mark_locked() later anyway, mask and free_mark are already
>> > set in alloc_chunk(). In fact, calling fsnotify_duplicate_mark() is
>> > actively harmful because following fsnotify_add_mark_locked() will leak
>> > group reference by overwriting the group pointer. So just remove the two
>> > calls to fsnotify_duplicate_mark() and the function.
>> >
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> >  fs/notify/mark.c                 | 12 ------------
>> >  include/linux/fsnotify_backend.h |  2 --
>> >  kernel/audit_tree.c              |  6 ++----
>> >  3 files changed, 2 insertions(+), 18 deletions(-)
>>
>> At first glance this looks reasonable, although you keep mentioning
>> "fsnotify_add_mark_locked" above when untag_chunk() is calling
>> "fsnotify_add_mark"; I just wanted to make sure you hadn't intended to
>> take the mutex in the audit code instead of relying on the locking in
>> fsnotify_add_mark().
>
> No, I didn't want to take mutex in the audit code. It is just that
> fsnotify_add_mark() is a thin wrapper around fsnotify_add_mark_locked() so
> I was speaking about that function.

Understood.  I just wanted to make sure this is what you intended
since the commit description and patch did not match.

If you end up respinning the patchset for any reason I might suggest
changing the function name in the description above.

>>
>> > diff --git a/fs/notify/mark.c b/fs/notify/mark.c
>> > index d3fea0bd89e2..6043306e8e21 100644
>> > --- a/fs/notify/mark.c
>> > +++ b/fs/notify/mark.c
>> > @@ -510,18 +510,6 @@ void fsnotify_detach_group_marks(struct fsnotify_group *group)
>> >         }
>> >  }
>> >
>> > -void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old)
>> > -{
>> > -       assert_spin_locked(&old->lock);
>> > -       new->inode = old->inode;
>> > -       new->mnt = old->mnt;
>> > -       if (old->group)
>> > -               fsnotify_get_group(old->group);
>> > -       new->group = old->group;
>> > -       new->mask = old->mask;
>> > -       new->free_mark = old->free_mark;
>> > -}
>> > -
>> >  /*
>> >   * Nothing fancy, just initialize lists and locks and counters.
>> >   */
>> > diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
>> > index 0cf34d6cc253..487246546ebe 100644
>> > --- a/include/linux/fsnotify_backend.h
>> > +++ b/include/linux/fsnotify_backend.h
>> > @@ -323,8 +323,6 @@ extern void fsnotify_init_mark(struct fsnotify_mark *mark, void (*free_mark)(str
>> >  extern struct fsnotify_mark *fsnotify_find_inode_mark(struct fsnotify_group *group, struct inode *inode);
>> >  /* find (and take a reference) to a mark associated with group and vfsmount */
>> >  extern struct fsnotify_mark *fsnotify_find_vfsmount_mark(struct fsnotify_group *group, struct vfsmount *mnt);
>> > -/* copy the values from old into new */
>> > -extern void fsnotify_duplicate_mark(struct fsnotify_mark *new, struct fsnotify_mark *old);
>> >  /* set the ignored_mask of a mark */
>> >  extern void fsnotify_set_mark_ignored_mask_locked(struct fsnotify_mark *mark, __u32 mask);
>> >  /* set the mask of a mark (might pin the object into memory */
>> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
>> > index 8b1dde96a0fa..f3130eb0a4bd 100644
>> > --- a/kernel/audit_tree.c
>> > +++ b/kernel/audit_tree.c
>> > @@ -258,8 +258,7 @@ static void untag_chunk(struct node *p)
>> >         if (!new)
>> >                 goto Fallback;
>> >
>> > -       fsnotify_duplicate_mark(&new->mark, entry);
>> > -       if (fsnotify_add_mark(&new->mark, new->mark.group, new->mark.inode, NULL, 1)) {
>> > +       if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
>> >                 fsnotify_put_mark(&new->mark);
>> >                 goto Fallback;
>> >         }
>> > @@ -395,8 +394,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>> >                 return -ENOENT;
>> >         }
>> >
>> > -       fsnotify_duplicate_mark(chunk_entry, old_entry);
>> > -       if (fsnotify_add_mark(chunk_entry, chunk_entry->group, chunk_entry->inode, NULL, 1)) {
>> > +       if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
>> >                 spin_unlock(&old_entry->lock);
>> >                 fsnotify_put_mark(chunk_entry);
>> >                 fsnotify_put_mark(old_entry);
>> > --
>> > 2.10.2
>> >
>>
>> --
>> paul moore
>> www.paul-moore.com
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR



-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH 06/22] audit: Abstract hash key handling
From: Paul Moore @ 2016-12-23 14:13 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit
In-Reply-To: <20161223132750.GC22679@quack2.suse.cz>

On Fri, Dec 23, 2016 at 8:27 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 22-12-16 18:27:40, Paul Moore wrote:
>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > Audit tree currently uses inode pointer as a key into the hash table.
>> > Getting that from notification mark will be somewhat more difficult with
>> > coming fsnotify changes and there's no reason we really have to use the
>> > inode pointer. So abstract getting of hash key from the audit chunk and
>> > inode so that we can switch to a different key easily later.
>> >
>> > CC: Paul Moore <paul@paul-moore.com>
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> >  kernel/audit_tree.c | 39 ++++++++++++++++++++++++++++-----------
>> >  1 file changed, 28 insertions(+), 11 deletions(-)
>>
>> I have no objections with this patch in particular, but in patch 8,
>> are you certain that inode_to_key() and chunk_to_key() will continue
>> to return the same key value?
>
> Yes, that's the intention. Or better in that patch the key will no longer
> be inode pointer but instead the fsnotify_list pointer. But still it would
> match for chunks attached to an inode and inode itself so comparison
> results should stay the same.

My apologies, I probably should have been more clear.

Yes, I think we are all in agreement that the *_to_key() functions
need to return a consistent value for the same object.  My concern is
that in patch 8 these functions are using different variables (granted
they may contain the same value, and therefore evaluate to the same
key) and I worry that there is a possibility of the two variables
taking on different values and breaking the hash.  What guarantees
exist that these values will be the same?  Are there any safeguards to
prevent future patches from accidentally sidestepping these
guarantees?

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH 05/22] audit: Fix sleep in atomic
From: Paul Moore @ 2016-12-23 14:17 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit
In-Reply-To: <20161223132433.GB22679@quack2.suse.cz>

On Fri, Dec 23, 2016 at 8:24 AM, Jan Kara <jack@suse.cz> wrote:
> On Thu 22-12-16 18:18:36, Paul Moore wrote:
>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>> > Audit tree code was happily adding new notification marks while holding
>> > spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
>> > lead to sleeping while holding a spinlock, deadlocks due to lock
>> > inversion, and probably other fun. Fix the problem by acquiring
>> > group->mark_mutex earlier.
>> >
>> > CC: Paul Moore <paul@paul-moore.com>
>> > Signed-off-by: Jan Kara <jack@suse.cz>
>> > ---
>> >  kernel/audit_tree.c | 13 +++++++++++--
>> >  1 file changed, 11 insertions(+), 2 deletions(-)
>>
>> [SIDE NOTE: this patch explains your comments and my earlier concern
>> about the locked/unlocked variants of fsnotify_add_mark() in
>> untag_chunk()]
>>
>> Ouch.  Thanks for catching this ... what is your goal with these
>> patches, are you targeting this as a fix during the v4.10-rcX cycle?
>> If not, any objections if I pull this patch into the audit tree and
>> send this to Linus during the v4.10-rcX cycle (assuming it passes
>> testing, yadda yadda)?
>
> Sure, go ahead. I plan these patches for the next merge window. So I can
> rebase the series once you merge audit fixes...

Okay, great.  I'll merge this patch in the audit/stable-4.10 branch
for Linus but there will likely be some delays due to
holidays/vacation on my end.

Thanks again for your help fixing this, I really appreciate it.

>> > diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
>> > index f3130eb0a4bd..156b6a93f4fc 100644
>> > --- a/kernel/audit_tree.c
>> > +++ b/kernel/audit_tree.c
>> > @@ -231,6 +231,7 @@ static void untag_chunk(struct node *p)
>> >         if (size)
>> >                 new = alloc_chunk(size);
>> >
>> > +       mutex_lock(&entry->group->mark_mutex);
>> >         spin_lock(&entry->lock);
>> >         if (chunk->dead || !entry->inode) {
>> >                 spin_unlock(&entry->lock);
>> > @@ -258,7 +259,8 @@ static void untag_chunk(struct node *p)
>> >         if (!new)
>> >                 goto Fallback;
>> >
>> > -       if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
>> > +       if (fsnotify_add_mark_locked(&new->mark, entry->group, entry->inode,
>> > +                                    NULL, 1)) {
>> >                 fsnotify_put_mark(&new->mark);
>> >                 goto Fallback;
>> >         }
>> > @@ -309,6 +311,7 @@ static void untag_chunk(struct node *p)
>> >         spin_unlock(&hash_lock);
>> >         spin_unlock(&entry->lock);
>> >  out:
>> > +       mutex_unlock(&entry->group->mark_mutex);
>> >         fsnotify_put_mark(entry);
>> >         spin_lock(&hash_lock);
>> >  }
>> > @@ -385,17 +388,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>> >
>> >         chunk_entry = &chunk->mark;
>> >
>> > +       mutex_lock(&old_entry->group->mark_mutex);
>> >         spin_lock(&old_entry->lock);
>> >         if (!old_entry->inode) {
>> >                 /* old_entry is being shot, lets just lie */
>> >                 spin_unlock(&old_entry->lock);
>> > +               mutex_unlock(&old_entry->group->mark_mutex);
>> >                 fsnotify_put_mark(old_entry);
>> >                 free_chunk(chunk);
>> >                 return -ENOENT;
>> >         }
>> >
>> > -       if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
>> > +       if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
>> > +                                    old_entry->inode, NULL, 1)) {
>> >                 spin_unlock(&old_entry->lock);
>> > +               mutex_unlock(&old_entry->group->mark_mutex);
>> >                 fsnotify_put_mark(chunk_entry);
>> >                 fsnotify_put_mark(old_entry);
>> >                 return -ENOSPC;
>> > @@ -411,6 +418,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>> >                 chunk->dead = 1;
>> >                 spin_unlock(&chunk_entry->lock);
>> >                 spin_unlock(&old_entry->lock);
>> > +               mutex_unlock(&old_entry->group->mark_mutex);
>> >
>> >                 fsnotify_destroy_mark(chunk_entry, audit_tree_group);
>> >
>> > @@ -443,6 +451,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
>> >         spin_unlock(&hash_lock);
>> >         spin_unlock(&chunk_entry->lock);
>> >         spin_unlock(&old_entry->lock);
>> > +       mutex_unlock(&old_entry->group->mark_mutex);
>> >         fsnotify_destroy_mark(old_entry, audit_tree_group);
>> >         fsnotify_put_mark(chunk_entry); /* drop initial reference */
>> >         fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
>> > --
>> > 2.10.2
>> >

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH 05/22] audit: Fix sleep in atomic
From: Paul Moore @ 2016-12-26 16:33 UTC (permalink / raw)
  To: Jan Kara
  Cc: linux-fsdevel, Amir Goldstein, Lino Sanfilippo, Miklos Szeredi,
	linux-audit
In-Reply-To: <CAHC9VhQYq37+HOwjQAr834HVZ-iKi4SQ7L5e-zTJkf39F-A08w@mail.gmail.com>

On Fri, Dec 23, 2016 at 9:17 AM, Paul Moore <paul@paul-moore.com> wrote:
> On Fri, Dec 23, 2016 at 8:24 AM, Jan Kara <jack@suse.cz> wrote:
>> On Thu 22-12-16 18:18:36, Paul Moore wrote:
>>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
>>> > Audit tree code was happily adding new notification marks while holding
>>> > spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
>>> > lead to sleeping while holding a spinlock, deadlocks due to lock
>>> > inversion, and probably other fun. Fix the problem by acquiring
>>> > group->mark_mutex earlier.
>>> >
>>> > CC: Paul Moore <paul@paul-moore.com>
>>> > Signed-off-by: Jan Kara <jack@suse.cz>
>>> > ---
>>> >  kernel/audit_tree.c | 13 +++++++++++--
>>> >  1 file changed, 11 insertions(+), 2 deletions(-)
>>>
>>> [SIDE NOTE: this patch explains your comments and my earlier concern
>>> about the locked/unlocked variants of fsnotify_add_mark() in
>>> untag_chunk()]
>>>
>>> Ouch.  Thanks for catching this ... what is your goal with these
>>> patches, are you targeting this as a fix during the v4.10-rcX cycle?
>>> If not, any objections if I pull this patch into the audit tree and
>>> send this to Linus during the v4.10-rcX cycle (assuming it passes
>>> testing, yadda yadda)?
>>
>> Sure, go ahead. I plan these patches for the next merge window. So I can
>> rebase the series once you merge audit fixes...
>
> Okay, great.  I'll merge this patch in the audit/stable-4.10 branch
> for Linus but there will likely be some delays due to
> holidays/vacation on my end.
>
> Thanks again for your help fixing this, I really appreciate it.

I merged this patch, as well as the "Remove fsnotify_duplicate_mark()"
patch (to make things cleaner when merging this patch) and did a quick
test using the audit-testsuite ... the test hung on the "file_create"
tests.  Unfortunately, I'm traveling right now for the holidays and
will not likely have a chance to debug this much further until after
the new year, but I thought I would mention it in case you had some
time to look into this failure.

For reference, here is the audit-testsuite again:

* https://github.com/linux-audit/audit-testsuite

... and if you have a Fedora test system, here is the Rawhide kernel I
used to test (it is basically my kernel-secnext test kernel with those
two patches mentioned above added on top):

* https://copr.fedorainfracloud.org/coprs/pcmoore/kernel-testing/build/492386

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* [PATCH 0/2] Begin auditing SECCOMP_RET_ERRNO return actions
From: Tyler Hicks @ 2017-01-02 16:53 UTC (permalink / raw)
  To: Paul Moore, Eric Paris, Kees Cook, Andy Lutomirski, Will Drewry
  Cc: linux-audit, linux-kernel

This patch set creates the basis for auditing information specific to a given
seccomp return action and then starts auditing SECCOMP_RET_ERRNO return
actions. The audit messages for SECCOMP_RET_ERRNO return actions include the
errno value that will be returned to userspace.

Tyler

^ permalink raw reply

* [PATCH 1/2] seccomp: Allow for auditing functionality specific to return actions
From: Tyler Hicks @ 2017-01-02 16:53 UTC (permalink / raw)
  To: Paul Moore, Eric Paris, Kees Cook, Andy Lutomirski, Will Drewry
  Cc: linux-audit, linux-kernel
In-Reply-To: <1483375990-14948-1-git-send-email-tyhicks@canonical.com>

This patch introduces the concept of auditing formats that are specific
to the action specified in a filter's return value. Initially, only
SECCOMP_RET_KILL has an auditing message that differs from other return
actions because it specifies the signal that is to be sent.

This patch causes a small functional change in that "sig=0" is not
printed when auditing seccomp actions other than SECCOMP_RET_KILL.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
---
 include/linux/audit.h | 39 +++++++++++++++++++++++++++++++++------
 kernel/auditsc.c      | 19 +++++++++++++++----
 kernel/seccomp.c      |  6 +++---
 3 files changed, 51 insertions(+), 13 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index f51fca8d..8c588c3 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -85,6 +85,11 @@ struct audit_field {
 	u32				op;
 };
 
+struct audit_seccomp_info {
+	int		code;
+	long		signr;
+};
+
 extern int is_audit_feature_set(int which);
 
 extern int __init audit_register_class(int class, unsigned *list);
@@ -243,7 +248,8 @@ extern void __audit_file(const struct file *);
 extern void __audit_inode_child(struct inode *parent,
 				const struct dentry *dentry,
 				const unsigned char type);
-extern void __audit_seccomp(unsigned long syscall, long signr, int code);
+extern void __audit_seccomp(unsigned long syscall,
+			    struct audit_seccomp_info *info);
 extern void __audit_ptrace(struct task_struct *t);
 
 static inline bool audit_dummy_context(void)
@@ -313,14 +319,31 @@ static inline void audit_inode_child(struct inode *parent,
 }
 void audit_core_dumps(long signr);
 
-static inline void audit_seccomp(unsigned long syscall, long signr, int code)
+static inline void audit_seccomp_signal(unsigned long syscall, long signr,
+					int code)
 {
 	if (!audit_enabled)
 		return;
 
 	/* Force a record to be reported if a signal was delivered. */
-	if (signr || unlikely(!audit_dummy_context()))
-		__audit_seccomp(syscall, signr, code);
+	if (signr || unlikely(!audit_dummy_context())) {
+		struct audit_seccomp_info info = { .code = code,
+						   .signr = signr };
+
+		__audit_seccomp(syscall, &info);
+	}
+}
+
+static inline void audit_seccomp_common(unsigned long syscall, int code)
+{
+	if (!audit_enabled)
+		return;
+
+	if (code || unlikely(!audit_dummy_context())) {
+		struct audit_seccomp_info info = { .code = code };
+
+		__audit_seccomp(syscall, &info);
+	}
 }
 
 static inline void audit_ptrace(struct task_struct *t)
@@ -485,9 +508,13 @@ static inline void audit_inode_child(struct inode *parent,
 { }
 static inline void audit_core_dumps(long signr)
 { }
-static inline void __audit_seccomp(unsigned long syscall, long signr, int code)
+static inline void __audit_seccomp(unsigned long syscall,
+				   struct audit_seccomp_info *info);
+{ }
+static inline void audit_seccomp_signal(unsigned long syscall, long signr,
+					int code)
 { }
-static inline void audit_seccomp(unsigned long syscall, long signr, int code)
+static inline void audit_seccomp_common(unsigned long syscall, int code)
 { }
 static inline int auditsc_get_stamp(struct audit_context *ctx,
 			      struct timespec *t, unsigned int *serial)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index cf1fa43..b3472f2 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -74,6 +74,7 @@
 #include <linux/string.h>
 #include <linux/uaccess.h>
 #include <uapi/linux/limits.h>
+#include <uapi/linux/seccomp.h>
 
 #include "audit.h"
 
@@ -2415,7 +2416,7 @@ void audit_core_dumps(long signr)
 	audit_log_end(ab);
 }
 
-void __audit_seccomp(unsigned long syscall, long signr, int code)
+void __audit_seccomp(unsigned long syscall, struct audit_seccomp_info *info)
 {
 	struct audit_buffer *ab;
 
@@ -2423,9 +2424,19 @@ void __audit_seccomp(unsigned long syscall, long signr, int code)
 	if (unlikely(!ab))
 		return;
 	audit_log_task(ab);
-	audit_log_format(ab, " sig=%ld arch=%x syscall=%ld compat=%d ip=0x%lx code=0x%x",
-			 signr, syscall_get_arch(), syscall,
-			 in_compat_syscall(), KSTK_EIP(current), code);
+
+	switch (info->code) {
+	case SECCOMP_RET_KILL:
+		audit_log_format(ab, " sig=%ld", info->signr);
+		break;
+	default:
+		break;
+	}
+
+	audit_log_format(ab,
+			 " arch=%x syscall=%ld compat=%d ip=0x%lx code=0x%x",
+			 syscall_get_arch(), syscall, in_compat_syscall(),
+			 KSTK_EIP(current), info->code);
 	audit_log_end(ab);
 }
 
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index f7ce79a..54c01b6 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -532,7 +532,7 @@ static void __secure_computing_strict(int this_syscall)
 #ifdef SECCOMP_DEBUG
 	dump_stack();
 #endif
-	audit_seccomp(this_syscall, SIGKILL, SECCOMP_RET_KILL);
+	audit_seccomp_signal(this_syscall, SIGKILL, SECCOMP_RET_KILL);
 	do_exit(SIGKILL);
 }
 
@@ -635,14 +635,14 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 
 	case SECCOMP_RET_KILL:
 	default:
-		audit_seccomp(this_syscall, SIGSYS, action);
+		audit_seccomp_signal(this_syscall, SIGSYS, action);
 		do_exit(SIGSYS);
 	}
 
 	unreachable();
 
 skip:
-	audit_seccomp(this_syscall, 0, action);
+	audit_seccomp_common(this_syscall, action);
 	return -1;
 }
 #else
-- 
2.7.4

^ permalink raw reply related

* [PATCH 2/2] seccomp: Audit SECCOMP_RET_ERRNO actions with errno values
From: Tyler Hicks @ 2017-01-02 16:53 UTC (permalink / raw)
  To: Paul Moore, Eric Paris, Kees Cook, Andy Lutomirski, Will Drewry
  Cc: linux-audit, linux-kernel
In-Reply-To: <1483375990-14948-1-git-send-email-tyhicks@canonical.com>

Generate audit records for SECCOMP_RET_ERRNO actions, which were
previously not audited.

Additionally, include the errno value that will be set in the audit
message.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
---
 include/linux/audit.h | 19 ++++++++++++++++++-
 kernel/auditsc.c      |  3 +++
 kernel/seccomp.c      |  4 +++-
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index 8c588c3..6815812 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -87,7 +87,10 @@ struct audit_field {
 
 struct audit_seccomp_info {
 	int		code;
-	long		signr;
+	union {
+		int	errno;
+		long	signr;
+	};
 };
 
 extern int is_audit_feature_set(int which);
@@ -319,6 +322,20 @@ static inline void audit_inode_child(struct inode *parent,
 }
 void audit_core_dumps(long signr);
 
+static inline void audit_seccomp_errno(unsigned long syscall, int errno,
+				       int code)
+{
+	if (!audit_enabled)
+		return;
+
+	if (errno || unlikely(!audit_dummy_context())) {
+		struct audit_seccomp_info info = { .code = code,
+						   .errno = errno };
+
+		__audit_seccomp(syscall, &info);
+	}
+}
+
 static inline void audit_seccomp_signal(unsigned long syscall, long signr,
 					int code)
 {
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index b3472f2..db5fc9d 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -2426,6 +2426,9 @@ void __audit_seccomp(unsigned long syscall, struct audit_seccomp_info *info)
 	audit_log_task(ab);
 
 	switch (info->code) {
+	case SECCOMP_RET_ERRNO:
+		audit_log_format(ab, " errno=%d", info->errno);
+		break;
 	case SECCOMP_RET_KILL:
 		audit_log_format(ab, " sig=%ld", info->signr);
 		break;
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 54c01b6..e99c566 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -576,9 +576,11 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 		/* Set low-order bits as an errno, capped at MAX_ERRNO. */
 		if (data > MAX_ERRNO)
 			data = MAX_ERRNO;
+
+		audit_seccomp_errno(this_syscall, data, action);
 		syscall_set_return_value(current, task_pt_regs(current),
 					 -data, 0);
-		goto skip;
+		return -1;
 
 	case SECCOMP_RET_TRAP:
 		/* Show the handler the original registers. */
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH 2/2] seccomp: Audit SECCOMP_RET_ERRNO actions with errno values
From: Steve Grubb @ 2017-01-02 17:20 UTC (permalink / raw)
  To: linux-audit; +Cc: Will Drewry, linux-kernel, Andy Lutomirski
In-Reply-To: <1483375990-14948-3-git-send-email-tyhicks@canonical.com>

On Monday, January 2, 2017 4:53:10 PM EST Tyler Hicks wrote:
> Generate audit records for SECCOMP_RET_ERRNO actions, which were
> previously not audited.
> 
> Additionally, include the errno value that will be set in the audit
> message.
> 
> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
> ---
>  include/linux/audit.h | 19 ++++++++++++++++++-
>  kernel/auditsc.c      |  3 +++
>  kernel/seccomp.c      |  4 +++-
>  3 files changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/audit.h b/include/linux/audit.h
> index 8c588c3..6815812 100644
> --- a/include/linux/audit.h
> +++ b/include/linux/audit.h
> @@ -87,7 +87,10 @@ struct audit_field {
> 
>  struct audit_seccomp_info {
>  	int		code;
> -	long		signr;
> +	union {
> +		int	errno;
> +		long	signr;
> +	};
>  };
> 
>  extern int is_audit_feature_set(int which);
> @@ -319,6 +322,20 @@ static inline void audit_inode_child(struct inode
> *parent, }
>  void audit_core_dumps(long signr);
> 
> +static inline void audit_seccomp_errno(unsigned long syscall, int errno,
> +				       int code)
> +{
> +	if (!audit_enabled)
> +		return;
> +
> +	if (errno || unlikely(!audit_dummy_context())) {
> +		struct audit_seccomp_info info = { .code = code,
> +						   .errno = errno };
> +
> +		__audit_seccomp(syscall, &info);
> +	}
> +}
> +
>  static inline void audit_seccomp_signal(unsigned long syscall, long signr,
>  					int code)
>  {
> diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> index b3472f2..db5fc9d 100644
> --- a/kernel/auditsc.c
> +++ b/kernel/auditsc.c
> @@ -2426,6 +2426,9 @@ void __audit_seccomp(unsigned long syscall, struct
> audit_seccomp_info *info) audit_log_task(ab);
> 
>  	switch (info->code) {
> +	case SECCOMP_RET_ERRNO:
> +		audit_log_format(ab, " errno=%d", info->errno);
> +		break;

"exit" is the field name that syscalls use to return errno to user space. I'd
rather not see another field created that maps to the same thing. You can check
the translation with the auformat utility:

http://people.redhat.com/sgrubb/files/auformat.tar.gz

$ ausearch --start today --just-one -m syscall -sv no --raw | ./auformat "%EXIT\n"

Also, I am working to normalize all the records. That mean every event record
of the same type has the same fields, in the same order, with the same
representation. I would think "exit" could be added to the current record after
syscall so that its ordered similarly to a syscall record.

-Steve

>  	case SECCOMP_RET_KILL:
>  		audit_log_format(ab, " sig=%ld", info->signr);
>  		break;
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 54c01b6..e99c566 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -576,9 +576,11 @@ static int __seccomp_filter(int this_syscall, const
> struct seccomp_data *sd, /* Set low-order bits as an errno, capped at
> MAX_ERRNO. */
>  		if (data > MAX_ERRNO)
>  			data = MAX_ERRNO;
> +
> +		audit_seccomp_errno(this_syscall, data, action);
>  		syscall_set_return_value(current, task_pt_regs(current),
>  					 -data, 0);
> -		goto skip;
> +		return -1;
> 
>  	case SECCOMP_RET_TRAP:
>  		/* Show the handler the original registers. */

^ permalink raw reply

* [PATCH 0/2] Support auditing while still allowing a syscall
From: Tyler Hicks @ 2017-01-02 17:26 UTC (permalink / raw)
  To: Paul Moore, Eric Paris, Kees Cook, Andy Lutomirski, Will Drewry
  Cc: linux-audit, linux-kernel

Allow application authors to opt into auditing a syscall before allowing it.
This differs slightly from SECCOMP_RET_ALLOW in that an audit message is
generated for the syscall.

It can be useful when initially setting up a seccomp sandbox for your
application if you set the default action to audit instead of, for example,
kill when the application makes a restricted system call. The application
author can easily compile a list of syscalls that need to be allowed instead of
iteratively building the application, testing the application, updating the
filter, and repeating until all appropriate system calls have been allowed.

This patch set depends on the following patch set:

  https://lkml.kernel.org/r/1483375990-14948-1-git-send-email-tyhicks@canonical.com

(At least, I hope it shows up at that link soon because marc.info doesn't yet
 know of the message-id.)

The corresponding libseccomp changes can be found here:

  https://github.com/seccomp/libseccomp/pull/64

Thanks!

Tyler

^ permalink raw reply

* [PATCH 1/2] seccomp: Create an action to audit before allowing
From: Tyler Hicks @ 2017-01-02 17:26 UTC (permalink / raw)
  To: Paul Moore, Eric Paris, Kees Cook, Andy Lutomirski, Will Drewry
  Cc: linux-audit, linux-kernel
In-Reply-To: <1483377999-15019-1-git-send-email-tyhicks@canonical.com>

Add a new action, SECCOMP_RET_AUDIT, which is identical to
SECCOMP_RET_ALLOW with the exception that an audit message is logged
rather than quietly allowing the syscall.

This can be very useful when initially developing a seccomp filter for
an application because the list of syscalls needed, which aren't marked
for SECCOMP_RET_ALLOW, can be easily lifted from the audit log reports
after exercising the application. This provides a more friendly
experience than seeing the application get killed, then updating the
filter and rebuilding the app, seeing the application get killed due to
a different syscall, then updating the filter and rebuilding the app,
etc.

SECCOMP_RET_AUDIT is considered to be slightly more restrictive than
SECCOMP_RET_ALLOW. The reason is because 'audit before allowing' is more
restrictive than 'silently allowing'.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
---
 Documentation/prctl/seccomp_filter.txt | 4 ++++
 include/uapi/linux/seccomp.h           | 1 +
 kernel/seccomp.c                       | 4 ++++
 3 files changed, 9 insertions(+)

diff --git a/Documentation/prctl/seccomp_filter.txt b/Documentation/prctl/seccomp_filter.txt
index 1e469ef..61169d3 100644
--- a/Documentation/prctl/seccomp_filter.txt
+++ b/Documentation/prctl/seccomp_filter.txt
@@ -138,6 +138,10 @@ SECCOMP_RET_TRACE:
 	allow use of ptrace, even of other sandboxed processes, without
 	extreme care; ptracers can use this mechanism to escape.)
 
+SECCOMP_RET_AUDIT:
+	Results in the system call being executed after an audit log record is
+	emitted.
+
 SECCOMP_RET_ALLOW:
 	Results in the system call being executed.
 
diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
index 0f238a4..551f099 100644
--- a/include/uapi/linux/seccomp.h
+++ b/include/uapi/linux/seccomp.h
@@ -29,6 +29,7 @@
 #define SECCOMP_RET_TRAP	0x00030000U /* disallow and force a SIGSYS */
 #define SECCOMP_RET_ERRNO	0x00050000U /* returns an errno */
 #define SECCOMP_RET_TRACE	0x7ff00000U /* pass to a tracer or disallow */
+#define SECCOMP_RET_AUDIT	0x7ffe0000U /* allow with an audit message */
 #define SECCOMP_RET_ALLOW	0x7fff0000U /* allow */
 
 /* Masks for the return value sections. */
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index e99c566..2c0ed54 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -632,6 +632,10 @@ static int __seccomp_filter(int this_syscall, const struct seccomp_data *sd,
 
 		return 0;
 
+	case SECCOMP_RET_AUDIT:
+		audit_seccomp_common(this_syscall, action);
+		return 0;
+
 	case SECCOMP_RET_ALLOW:
 		return 0;
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH 2/2] seccomp: Add tests for SECCOMP_RET_AUDIT
From: Tyler Hicks @ 2017-01-02 17:26 UTC (permalink / raw)
  To: Paul Moore, Eric Paris, Kees Cook, Andy Lutomirski, Will Drewry
  Cc: linux-audit, linux-kernel
In-Reply-To: <1483377999-15019-1-git-send-email-tyhicks@canonical.com>

Extend the kernel selftests for seccomp to test the newly added
SECCOMP_RET_AUDIT action. The added tests follow the example of existing
tests.

Unfortunately, the tests are not capable of inspecting the audit log to
verify that an audit message was emitted.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
---
 tools/testing/selftests/seccomp/seccomp_bpf.c | 94 +++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 03f1fa4..9cff9fa 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -87,6 +87,10 @@ struct seccomp_data {
 };
 #endif
 
+#ifndef SECCOMP_RET_AUDIT
+#define SECCOMP_RET_AUDIT       0x7ffe0000U /* allow with an audit message */
+#endif
+
 #if __BYTE_ORDER == __LITTLE_ENDIAN
 #define syscall_arg(_n) (offsetof(struct seccomp_data, args[_n]))
 #elif __BYTE_ORDER == __BIG_ENDIAN
@@ -342,6 +346,28 @@ TEST(empty_prog)
 	EXPECT_EQ(EINVAL, errno);
 }
 
+TEST(AUDIT_all)
+{
+	struct sock_filter filter[] = {
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_AUDIT),
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)ARRAY_SIZE(filter),
+		.filter = filter,
+	};
+	long ret;
+	pid_t parent = getppid();
+
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
+	ASSERT_EQ(0, ret);
+
+	/* getppid() should succeed and be audited (no check for auditing) */
+	EXPECT_EQ(parent, syscall(__NR_getppid));
+}
+
 TEST_SIGNAL(unknown_ret_is_kill_inside, SIGSYS)
 {
 	struct sock_filter filter[] = {
@@ -735,6 +761,7 @@ TEST_F(TRAP, handler)
 
 FIXTURE_DATA(precedence) {
 	struct sock_fprog allow;
+	struct sock_fprog audit;
 	struct sock_fprog trace;
 	struct sock_fprog error;
 	struct sock_fprog trap;
@@ -746,6 +773,13 @@ FIXTURE_SETUP(precedence)
 	struct sock_filter allow_insns[] = {
 		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
 	};
+	struct sock_filter audit_insns[] = {
+		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
+			offsetof(struct seccomp_data, nr)),
+		BPF_JUMP(BPF_JMP|BPF_JEQ|BPF_K, __NR_getpid, 1, 0),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_ALLOW),
+		BPF_STMT(BPF_RET|BPF_K, SECCOMP_RET_AUDIT),
+	};
 	struct sock_filter trace_insns[] = {
 		BPF_STMT(BPF_LD|BPF_W|BPF_ABS,
 			offsetof(struct seccomp_data, nr)),
@@ -782,6 +816,7 @@ FIXTURE_SETUP(precedence)
 	memcpy(self->_x.filter, &_x##_insns, sizeof(_x##_insns)); \
 	self->_x.len = (unsigned short)ARRAY_SIZE(_x##_insns)
 	FILTER_ALLOC(allow);
+	FILTER_ALLOC(audit);
 	FILTER_ALLOC(trace);
 	FILTER_ALLOC(error);
 	FILTER_ALLOC(trap);
@@ -792,6 +827,7 @@ FIXTURE_TEARDOWN(precedence)
 {
 #define FILTER_FREE(_x) if (self->_x.filter) free(self->_x.filter)
 	FILTER_FREE(allow);
+	FILTER_FREE(audit);
 	FILTER_FREE(trace);
 	FILTER_FREE(error);
 	FILTER_FREE(trap);
@@ -809,6 +845,8 @@ TEST_F(precedence, allow_ok)
 
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
@@ -833,6 +871,8 @@ TEST_F_SIGNAL(precedence, kill_is_highest, SIGSYS)
 
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
@@ -864,6 +904,8 @@ TEST_F_SIGNAL(precedence, kill_is_highest_in_any_order, SIGSYS)
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trap);
@@ -885,6 +927,8 @@ TEST_F_SIGNAL(precedence, trap_is_second, SIGSYS)
 
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
@@ -910,6 +954,8 @@ TEST_F_SIGNAL(precedence, trap_is_second_in_any_order, SIGSYS)
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trap);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
@@ -931,6 +977,8 @@ TEST_F(precedence, errno_is_third)
 
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
@@ -949,6 +997,8 @@ TEST_F(precedence, errno_is_third_in_any_order)
 	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
 	ASSERT_EQ(0, ret);
 
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->error);
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
@@ -971,6 +1021,8 @@ TEST_F(precedence, trace_is_fourth)
 
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->trace);
 	ASSERT_EQ(0, ret);
 	/* Should work just fine. */
@@ -992,12 +1044,54 @@ TEST_F(precedence, trace_is_fourth_in_any_order)
 	ASSERT_EQ(0, ret);
 	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
 	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
 	/* Should work just fine. */
 	EXPECT_EQ(parent, syscall(__NR_getppid));
 	/* No ptracer */
 	EXPECT_EQ(-1, syscall(__NR_getpid));
 }
 
+TEST_F(precedence, audit_is_fifth)
+{
+	pid_t mypid, parent;
+	long ret;
+
+	mypid = getpid();
+	parent = getppid();
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
+	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
+	/* Should work just fine. */
+	EXPECT_EQ(parent, syscall(__NR_getppid));
+	/* Should also work just fine */
+	EXPECT_EQ(mypid, syscall(__NR_getpid));
+}
+
+TEST_F(precedence, audit_is_fifth_in_any_order)
+{
+	pid_t mypid, parent;
+	long ret;
+
+	mypid = getpid();
+	parent = getppid();
+	ret = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
+	ASSERT_EQ(0, ret);
+
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->audit);
+	ASSERT_EQ(0, ret);
+	ret = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &self->allow);
+	ASSERT_EQ(0, ret);
+	/* Should work just fine. */
+	EXPECT_EQ(parent, syscall(__NR_getppid));
+	/* Should also work just fine */
+	EXPECT_EQ(mypid, syscall(__NR_getpid));
+}
+
 #ifndef PTRACE_O_TRACESECCOMP
 #define PTRACE_O_TRACESECCOMP	0x00000080
 #endif
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH 2/2] seccomp: Audit SECCOMP_RET_ERRNO actions with errno values
From: Tyler Hicks @ 2017-01-02 17:42 UTC (permalink / raw)
  To: Steve Grubb; +Cc: Will Drewry, linux-kernel, Andy Lutomirski, linux-audit
In-Reply-To: <5284369.V7krsaxZyN@x2>


[-- Attachment #1.1: Type: text/plain, Size: 4154 bytes --]

On 2017-01-02 12:20:53, Steve Grubb wrote:
> On Monday, January 2, 2017 4:53:10 PM EST Tyler Hicks wrote:
> > Generate audit records for SECCOMP_RET_ERRNO actions, which were
> > previously not audited.
> > 
> > Additionally, include the errno value that will be set in the audit
> > message.
> > 
> > Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
> > ---
> >  include/linux/audit.h | 19 ++++++++++++++++++-
> >  kernel/auditsc.c      |  3 +++
> >  kernel/seccomp.c      |  4 +++-
> >  3 files changed, 24 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > index 8c588c3..6815812 100644
> > --- a/include/linux/audit.h
> > +++ b/include/linux/audit.h
> > @@ -87,7 +87,10 @@ struct audit_field {
> > 
> >  struct audit_seccomp_info {
> >  	int		code;
> > -	long		signr;
> > +	union {
> > +		int	errno;
> > +		long	signr;
> > +	};
> >  };
> > 
> >  extern int is_audit_feature_set(int which);
> > @@ -319,6 +322,20 @@ static inline void audit_inode_child(struct inode
> > *parent, }
> >  void audit_core_dumps(long signr);
> > 
> > +static inline void audit_seccomp_errno(unsigned long syscall, int errno,
> > +				       int code)
> > +{
> > +	if (!audit_enabled)
> > +		return;
> > +
> > +	if (errno || unlikely(!audit_dummy_context())) {
> > +		struct audit_seccomp_info info = { .code = code,
> > +						   .errno = errno };
> > +
> > +		__audit_seccomp(syscall, &info);
> > +	}
> > +}
> > +
> >  static inline void audit_seccomp_signal(unsigned long syscall, long signr,
> >  					int code)
> >  {
> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > index b3472f2..db5fc9d 100644
> > --- a/kernel/auditsc.c
> > +++ b/kernel/auditsc.c
> > @@ -2426,6 +2426,9 @@ void __audit_seccomp(unsigned long syscall, struct
> > audit_seccomp_info *info) audit_log_task(ab);
> > 
> >  	switch (info->code) {
> > +	case SECCOMP_RET_ERRNO:
> > +		audit_log_format(ab, " errno=%d", info->errno);
> > +		break;
> 
> "exit" is the field name that syscalls use to return errno to user space. I'd
> rather not see another field created that maps to the same thing. You can check
> the translation with the auformat utility:

Thanks for having a look at the field name I was using. Although I
prefer "errno" over "exit" in terms of clarity, I agree that it makes
sense to be consistent with the field names across record types. "exit"
works for me.

> 
> http://people.redhat.com/sgrubb/files/auformat.tar.gz
> 
> $ ausearch --start today --just-one -m syscall -sv no --raw | ./auformat "%EXIT\n"
> 
> Also, I am working to normalize all the records. That mean every event record
> of the same type has the same fields, in the same order, with the same
> representation. I would think "exit" could be added to the current record after
> syscall so that its ordered similarly to a syscall record.

This patch goes against your normalization efforts in more ways than
just the placement of the "exit" field. If the action is
SECCOMP_RET_KILL, a "sig" field is present but if the action is
SECCOMP_RET_ERRNO, the "sig" field will not be present but the "errno"
field will be present. This happens all within the AUDIT_SECCOMP record
type. How would you suggest normalizing AUDIT_SECCOMP records for
different seccomp return actions?

Tyler

> 
> -Steve
> 
> >  	case SECCOMP_RET_KILL:
> >  		audit_log_format(ab, " sig=%ld", info->signr);
> >  		break;
> > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > index 54c01b6..e99c566 100644
> > --- a/kernel/seccomp.c
> > +++ b/kernel/seccomp.c
> > @@ -576,9 +576,11 @@ static int __seccomp_filter(int this_syscall, const
> > struct seccomp_data *sd, /* Set low-order bits as an errno, capped at
> > MAX_ERRNO. */
> >  		if (data > MAX_ERRNO)
> >  			data = MAX_ERRNO;
> > +
> > +		audit_seccomp_errno(this_syscall, data, action);
> >  		syscall_set_return_value(current, task_pt_regs(current),
> >  					 -data, 0);
> > -		goto skip;
> > +		return -1;
> > 
> >  	case SECCOMP_RET_TRAP:
> >  		/* Show the handler the original registers. */
> 
> 

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply

* Re: [PATCH 05/22] audit: Fix sleep in atomic
From: Jan Kara @ 2017-01-02 18:21 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit
In-Reply-To: <CAHC9VhQ=V=BNKhaHt3e9_JdV8D5VwbWVKBuBBxS=oA-FjPjQ8A@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3016 bytes --]

On Mon 26-12-16 11:33:10, Paul Moore wrote:
> On Fri, Dec 23, 2016 at 9:17 AM, Paul Moore <paul@paul-moore.com> wrote:
> > On Fri, Dec 23, 2016 at 8:24 AM, Jan Kara <jack@suse.cz> wrote:
> >> On Thu 22-12-16 18:18:36, Paul Moore wrote:
> >>> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> >>> > Audit tree code was happily adding new notification marks while holding
> >>> > spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
> >>> > lead to sleeping while holding a spinlock, deadlocks due to lock
> >>> > inversion, and probably other fun. Fix the problem by acquiring
> >>> > group->mark_mutex earlier.
> >>> >
> >>> > CC: Paul Moore <paul@paul-moore.com>
> >>> > Signed-off-by: Jan Kara <jack@suse.cz>
> >>> > ---
> >>> >  kernel/audit_tree.c | 13 +++++++++++--
> >>> >  1 file changed, 11 insertions(+), 2 deletions(-)
> >>>
> >>> [SIDE NOTE: this patch explains your comments and my earlier concern
> >>> about the locked/unlocked variants of fsnotify_add_mark() in
> >>> untag_chunk()]
> >>>
> >>> Ouch.  Thanks for catching this ... what is your goal with these
> >>> patches, are you targeting this as a fix during the v4.10-rcX cycle?
> >>> If not, any objections if I pull this patch into the audit tree and
> >>> send this to Linus during the v4.10-rcX cycle (assuming it passes
> >>> testing, yadda yadda)?
> >>
> >> Sure, go ahead. I plan these patches for the next merge window. So I can
> >> rebase the series once you merge audit fixes...
> >
> > Okay, great.  I'll merge this patch in the audit/stable-4.10 branch
> > for Linus but there will likely be some delays due to
> > holidays/vacation on my end.
> >
> > Thanks again for your help fixing this, I really appreciate it.
> 
> I merged this patch, as well as the "Remove fsnotify_duplicate_mark()"
> patch (to make things cleaner when merging this patch) and did a quick
> test using the audit-testsuite ... the test hung on the "file_create"
> tests.  Unfortunately, I'm traveling right now for the holidays and
> will not likely have a chance to debug this much further until after
> the new year, but I thought I would mention it in case you had some
> time to look into this failure.
> 
> For reference, here is the audit-testsuite again:
> 
> * https://github.com/linux-audit/audit-testsuite
> 
> ... and if you have a Fedora test system, here is the Rawhide kernel I
> used to test (it is basically my kernel-secnext test kernel with those
> two patches mentioned above added on top):
> 
> * https://copr.fedorainfracloud.org/coprs/pcmoore/kernel-testing/build/492386

So I found where the problem was. Attached is a new version of the patch.
Tests from audit-testsuite fail for me but do not hang anymore. I guess the
failing is because I don't have audit or selinux configured in any way and
I'm using SUSE I guess (if there's some easy way to do that, I'd be
interested) - runtests.pl complains that I have to be root although I am...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

[-- Attachment #2: 0001-audit-Fix-sleep-in-atomic.patch --]
[-- Type: text/x-patch, Size: 3764 bytes --]

>From 300e01735a0b5dbf9cd32c932b77d9dc77258489 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 14 Dec 2016 14:40:05 +0100
Subject: [PATCH] audit: Fix sleep in atomic

Audit tree code was happily adding new notification marks while holding
spinlocks. Since fsnotify_add_mark() acquires group->mark_mutex this can
lead to sleeping while holding a spinlock, deadlocks due to lock
inversion, and probably other fun. Fix the problem by acquiring
group->mark_mutex earlier.

CC: Paul Moore <paul@paul-moore.com>
Signed-off-by: Jan Kara <jack@suse.cz>
---
 kernel/audit_tree.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index f3130eb0a4bd..7b44195da81b 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -231,9 +231,11 @@ static void untag_chunk(struct node *p)
 	if (size)
 		new = alloc_chunk(size);
 
+	mutex_lock(&entry->group->mark_mutex);
 	spin_lock(&entry->lock);
 	if (chunk->dead || !entry->inode) {
 		spin_unlock(&entry->lock);
+		mutex_unlock(&entry->group->mark_mutex);
 		if (new)
 			free_chunk(new);
 		goto out;
@@ -251,6 +253,7 @@ static void untag_chunk(struct node *p)
 		list_del_rcu(&chunk->hash);
 		spin_unlock(&hash_lock);
 		spin_unlock(&entry->lock);
+		mutex_unlock(&entry->group->mark_mutex);
 		fsnotify_destroy_mark(entry, audit_tree_group);
 		goto out;
 	}
@@ -258,7 +261,8 @@ static void untag_chunk(struct node *p)
 	if (!new)
 		goto Fallback;
 
-	if (fsnotify_add_mark(&new->mark, entry->group, entry->inode, NULL, 1)) {
+	if (fsnotify_add_mark_locked(&new->mark, entry->group, entry->inode,
+				     NULL, 1)) {
 		fsnotify_put_mark(&new->mark);
 		goto Fallback;
 	}
@@ -292,6 +296,7 @@ static void untag_chunk(struct node *p)
 		owner->root = new;
 	spin_unlock(&hash_lock);
 	spin_unlock(&entry->lock);
+	mutex_unlock(&entry->group->mark_mutex);
 	fsnotify_destroy_mark(entry, audit_tree_group);
 	fsnotify_put_mark(&new->mark);	/* drop initial reference */
 	goto out;
@@ -308,6 +313,7 @@ static void untag_chunk(struct node *p)
 	put_tree(owner);
 	spin_unlock(&hash_lock);
 	spin_unlock(&entry->lock);
+	mutex_unlock(&entry->group->mark_mutex);
 out:
 	fsnotify_put_mark(entry);
 	spin_lock(&hash_lock);
@@ -385,17 +391,21 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 
 	chunk_entry = &chunk->mark;
 
+	mutex_lock(&old_entry->group->mark_mutex);
 	spin_lock(&old_entry->lock);
 	if (!old_entry->inode) {
 		/* old_entry is being shot, lets just lie */
 		spin_unlock(&old_entry->lock);
+		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(old_entry);
 		free_chunk(chunk);
 		return -ENOENT;
 	}
 
-	if (fsnotify_add_mark(chunk_entry, old_entry->group, old_entry->inode, NULL, 1)) {
+	if (fsnotify_add_mark_locked(chunk_entry, old_entry->group,
+				     old_entry->inode, NULL, 1)) {
 		spin_unlock(&old_entry->lock);
+		mutex_unlock(&old_entry->group->mark_mutex);
 		fsnotify_put_mark(chunk_entry);
 		fsnotify_put_mark(old_entry);
 		return -ENOSPC;
@@ -411,6 +421,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 		chunk->dead = 1;
 		spin_unlock(&chunk_entry->lock);
 		spin_unlock(&old_entry->lock);
+		mutex_unlock(&old_entry->group->mark_mutex);
 
 		fsnotify_destroy_mark(chunk_entry, audit_tree_group);
 
@@ -443,6 +454,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	spin_unlock(&hash_lock);
 	spin_unlock(&chunk_entry->lock);
 	spin_unlock(&old_entry->lock);
+	mutex_unlock(&old_entry->group->mark_mutex);
 	fsnotify_destroy_mark(old_entry, audit_tree_group);
 	fsnotify_put_mark(chunk_entry);	/* drop initial reference */
 	fsnotify_put_mark(old_entry); /* pair to fsnotify_find mark_entry */
-- 
2.10.2


^ permalink raw reply related

* Re: [PATCH 2/2] seccomp: Audit SECCOMP_RET_ERRNO actions with errno values
From: Steve Grubb @ 2017-01-02 18:49 UTC (permalink / raw)
  To: Tyler Hicks; +Cc: Will Drewry, linux-kernel, Andy Lutomirski, linux-audit
In-Reply-To: <20170102174246.GA17677@sec>

On Monday, January 2, 2017 5:42:47 PM EST Tyler Hicks wrote:
> On 2017-01-02 12:20:53, Steve Grubb wrote:
> > On Monday, January 2, 2017 4:53:10 PM EST Tyler Hicks wrote:
> > > Generate audit records for SECCOMP_RET_ERRNO actions, which were
> > > previously not audited.
> > > 
> > > Additionally, include the errno value that will be set in the audit
> > > message.
> > > 
> > > Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
> > > ---
> > > 
> > >  include/linux/audit.h | 19 ++++++++++++++++++-
> > >  kernel/auditsc.c      |  3 +++
> > >  kernel/seccomp.c      |  4 +++-
> > >  3 files changed, 24 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/include/linux/audit.h b/include/linux/audit.h
> > > index 8c588c3..6815812 100644
> > > --- a/include/linux/audit.h
> > > +++ b/include/linux/audit.h
> > > @@ -87,7 +87,10 @@ struct audit_field {
> > > 
> > >  struct audit_seccomp_info {
> > >  
> > >  	int		code;
> > > 
> > > -	long		signr;
> > > +	union {
> > > +		int	errno;
> > > +		long	signr;
> > > +	};
> > > 
> > >  };
> > >  
> > >  extern int is_audit_feature_set(int which);
> > > 
> > > @@ -319,6 +322,20 @@ static inline void audit_inode_child(struct inode
> > > *parent, }
> > > 
> > >  void audit_core_dumps(long signr);
> > > 
> > > +static inline void audit_seccomp_errno(unsigned long syscall, int
> > > errno,
> > > +				       int code)
> > > +{
> > > +	if (!audit_enabled)
> > > +		return;
> > > +
> > > +	if (errno || unlikely(!audit_dummy_context())) {
> > > +		struct audit_seccomp_info info = { .code = code,
> > > +						   .errno = errno };
> > > +
> > > +		__audit_seccomp(syscall, &info);
> > > +	}
> > > +}
> > > +
> > > 
> > >  static inline void audit_seccomp_signal(unsigned long syscall, long
> > >  signr,
> > >  
> > >  					int code)
> > >  
> > >  {
> > > 
> > > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
> > > index b3472f2..db5fc9d 100644
> > > --- a/kernel/auditsc.c
> > > +++ b/kernel/auditsc.c
> > > @@ -2426,6 +2426,9 @@ void __audit_seccomp(unsigned long syscall, struct
> > > audit_seccomp_info *info) audit_log_task(ab);
> > > 
> > >  	switch (info->code) {
> > > 
> > > +	case SECCOMP_RET_ERRNO:
> > > +		audit_log_format(ab, " errno=%d", info->errno);
> > > +		break;
> > 
> > "exit" is the field name that syscalls use to return errno to user space.
> > I'd rather not see another field created that maps to the same thing. You
> > can check
> > the translation with the auformat utility:
> Thanks for having a look at the field name I was using. Although I
> prefer "errno" over "exit" in terms of clarity, I agree that it makes
> sense to be consistent with the field names across record types. "exit"
> works for me.
> 
> > http://people.redhat.com/sgrubb/files/auformat.tar.gz
> > 
> > $ ausearch --start today --just-one -m syscall -sv no --raw | ./auformat
> > "%EXIT\n"
> > 
> > Also, I am working to normalize all the records. That mean every event
> > record of the same type has the same fields, in the same order, with the
> > same representation. I would think "exit" could be added to the current
> > record after syscall so that its ordered similarly to a syscall record.
> 
> This patch goes against your normalization efforts in more ways than
> just the placement of the "exit" field. If the action is
> SECCOMP_RET_KILL, a "sig" field is present but if the action is
> SECCOMP_RET_ERRNO, the "sig" field will not be present but the "errno"
> field will be present. This happens all within the AUDIT_SECCOMP record
> type. How would you suggest normalizing AUDIT_SECCOMP records for
> different seccomp return actions?

Typically when the layout has to change, we just give it a new record type.

 -Steve

 
> > >  	case SECCOMP_RET_KILL:
> > >  		audit_log_format(ab, " sig=%ld", info->signr);
> > >  		break;
> > > 
> > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> > > index 54c01b6..e99c566 100644
> > > --- a/kernel/seccomp.c
> > > +++ b/kernel/seccomp.c
> > > @@ -576,9 +576,11 @@ static int __seccomp_filter(int this_syscall, const
> > > struct seccomp_data *sd, /* Set low-order bits as an errno, capped at
> > > MAX_ERRNO. */
> > > 
> > >  		if (data > MAX_ERRNO)
> > >  		
> > >  			data = MAX_ERRNO;
> > > 
> > > +
> > > +		audit_seccomp_errno(this_syscall, data, action);
> > > 
> > >  		syscall_set_return_value(current, task_pt_regs(current),
> > >  		
> > >  					 -data, 0);
> > > 
> > > -		goto skip;
> > > +		return -1;
> > > 
> > >  	case SECCOMP_RET_TRAP:
> > >  		/* Show the handler the original registers. */

^ permalink raw reply

* Re: [PATCH 1/2] seccomp: Create an action to audit before allowing
From: kbuild test robot @ 2017-01-02 20:32 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: kbuild-all, Paul Moore, Eric Paris, Kees Cook, Andy Lutomirski,
	Will Drewry, linux-audit, linux-kernel
In-Reply-To: <1483377999-15019-2-git-send-email-tyhicks@canonical.com>

[-- Attachment #1: Type: text/plain, Size: 1341 bytes --]

Hi Tyler,

[auto build test ERROR on linus/master]
[also build test ERROR on v4.10-rc2 next-20161224]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Tyler-Hicks/seccomp-Create-an-action-to-audit-before-allowing/20170103-041342
config: i386-randconfig-x003-201701 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   kernel/seccomp.c: In function '__seccomp_filter':
>> kernel/seccomp.c:634:3: error: implicit declaration of function 'audit_seccomp_common' [-Werror=implicit-function-declaration]
      audit_seccomp_common(this_syscall, action);
      ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors

vim +/audit_seccomp_common +634 kernel/seccomp.c

   628			if (__seccomp_filter(this_syscall, NULL, true))
   629				return -1;
   630	
   631			return 0;
   632	
   633		case SECCOMP_RET_AUDIT:
 > 634			audit_seccomp_common(this_syscall, action);
   635			return 0;
   636	
   637		case SECCOMP_RET_ALLOW:

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25534 bytes --]

^ permalink raw reply

* Re: [PATCH 0/2] Begin auditing SECCOMP_RET_ERRNO return actions
From: Paul Moore @ 2017-01-02 22:47 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: Eric Paris, Kees Cook, Andy Lutomirski, Will Drewry, linux-audit,
	linux-kernel
In-Reply-To: <1483375990-14948-1-git-send-email-tyhicks@canonical.com>

On Mon, Jan 2, 2017 at 11:53 AM, Tyler Hicks <tyhicks@canonical.com> wrote:
> This patch set creates the basis for auditing information specific to a given
> seccomp return action and then starts auditing SECCOMP_RET_ERRNO return
> actions. The audit messages for SECCOMP_RET_ERRNO return actions include the
> errno value that will be returned to userspace.

I'm replying to this patchset posting because it his my inbox first,
but my comments here apply to both this patchset and the other
seccomp/audit patchset you posted.

In my experience, we have two or three problems (the count varies
depending on perspective) when it comes to seccomp filter reporting:

1. Inability to log all filter actions.
2. Inability to selectively enable filtering; e.g. devs want noisy
logging, users want relative quiet.
3. Consistent behavior with audit enabled and disabled.

My current thinking - forgive me, this has been kicking around in my
head for the better part of six months (longer?) and I haven't
attempted to code it up - is to create a sysctl knob for a system wide
seccomp logging threshold that would be applied to the high 16-bits of
*every* triggered action: if the action was at/below the threshold a
record would be emitted, otherwise silence.  This should resolve
problems #1 and #2, and the code should be relatively straightforward
and small.

As part of the code above, I expect that all seccomp logging would get
routed through a single logging function (sort of like a better
implementation of the existing audit_seccomp()) that would check the
threshold and trigger the logging if needed.  This function could be
augmented to check for CONFIG_AUDIT and in the case where audit was
not built into the kernel, a simple printk could be used to log the
seccomp event; solving problem #3.

We could also add a SECCOMP_RET_AUDIT, or similar, if we still feel
that is important (I personally waffle on this), but I think that is
independent of the ideas above.

Thoughts?

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH 2/2] seccomp: Audit SECCOMP_RET_ERRNO actions with errno values
From: Paul Moore @ 2017-01-02 22:55 UTC (permalink / raw)
  To: Steve Grubb
  Cc: Tyler Hicks, linux-audit, Eric Paris, Kees Cook, Andy Lutomirski,
	Will Drewry, linux-kernel
In-Reply-To: <1540151.sullFKCz8n@x2>

On Mon, Jan 2, 2017 at 1:49 PM, Steve Grubb <sgrubb@redhat.com> wrote:
> On Monday, January 2, 2017 5:42:47 PM EST Tyler Hicks wrote:
>> On 2017-01-02 12:20:53, Steve Grubb wrote:
>> > On Monday, January 2, 2017 4:53:10 PM EST Tyler Hicks wrote:

...

>> Thanks for having a look at the field name I was using. Although I
>> prefer "errno" over "exit" in terms of clarity, I agree that it makes
>> sense to be consistent with the field names across record types. "exit"
>> works for me.

FWIW, we have a nice (searchable due to GitHub CSV magic) audit field
database at the link below.  I will admit that it may be a bit crusty
in places, but we are making a new effort to keep it updated, if you
notice anything wrong, send email and/or a PR.

* https://github.com/linux-audit/audit-documentation/blob/master/specs/fields/field-dictionary.csv

>> > http://people.redhat.com/sgrubb/files/auformat.tar.gz
>> >
>> > $ ausearch --start today --just-one -m syscall -sv no --raw | ./auformat
>> > "%EXIT\n"
>> >
>> > Also, I am working to normalize all the records. That mean every event
>> > record of the same type has the same fields, in the same order, with the
>> > same representation. I would think "exit" could be added to the current
>> > record after syscall so that its ordered similarly to a syscall record.
>>
>> This patch goes against your normalization efforts in more ways than
>> just the placement of the "exit" field. If the action is
>> SECCOMP_RET_KILL, a "sig" field is present but if the action is
>> SECCOMP_RET_ERRNO, the "sig" field will not be present but the "errno"
>> field will be present. This happens all within the AUDIT_SECCOMP record
>> type. How would you suggest normalizing AUDIT_SECCOMP records for
>> different seccomp return actions?
>
> Typically when the layout has to change, we just give it a new record type.

I'm going to be very loathe to accept any new record types that *only*
reorder fields; if you need to add a new field, simply add it to the
end of the record.  From my perspective new record types are really
only an option if we need to remove a field that is bogus/confusing or
some other similar case that is not easily solved.  New record types
are a last resort.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH 0/2] Begin auditing SECCOMP_RET_ERRNO return actions
From: Andy Lutomirski @ 2017-01-03  5:56 UTC (permalink / raw)
  To: Paul Moore
  Cc: Tyler Hicks, Eric Paris, Kees Cook, Will Drewry, linux-audit,
	linux-kernel@vger.kernel.org
In-Reply-To: <CAHC9VhSOwbY9WEOKZGsx4mf=MAXeudTnQF9nXmKu+OoAs0SDsQ@mail.gmail.com>

On Mon, Jan 2, 2017 at 2:47 PM, Paul Moore <paul@paul-moore.com> wrote:
> On Mon, Jan 2, 2017 at 11:53 AM, Tyler Hicks <tyhicks@canonical.com> wrote:
>> This patch set creates the basis for auditing information specific to a given
>> seccomp return action and then starts auditing SECCOMP_RET_ERRNO return
>> actions. The audit messages for SECCOMP_RET_ERRNO return actions include the
>> errno value that will be returned to userspace.
>
> I'm replying to this patchset posting because it his my inbox first,
> but my comments here apply to both this patchset and the other
> seccomp/audit patchset you posted.
>
> In my experience, we have two or three problems (the count varies
> depending on perspective) when it comes to seccomp filter reporting:
>
> 1. Inability to log all filter actions.
> 2. Inability to selectively enable filtering; e.g. devs want noisy
> logging, users want relative quiet.
> 3. Consistent behavior with audit enabled and disabled.
>
> My current thinking - forgive me, this has been kicking around in my
> head for the better part of six months (longer?) and I haven't
> attempted to code it up - is to create a sysctl knob for a system wide
> seccomp logging threshold that would be applied to the high 16-bits of
> *every* triggered action: if the action was at/below the threshold a
> record would be emitted, otherwise silence.  This should resolve
> problems #1 and #2, and the code should be relatively straightforward
> and small.
>
> As part of the code above, I expect that all seccomp logging would get
> routed through a single logging function (sort of like a better
> implementation of the existing audit_seccomp()) that would check the
> threshold and trigger the logging if needed.  This function could be
> augmented to check for CONFIG_AUDIT and in the case where audit was
> not built into the kernel, a simple printk could be used to log the
> seccomp event; solving problem #3.

Would this not be doable with a seccomp tracepoint and a BPF filter?

--Andy

^ permalink raw reply

* Re: [PATCH 0/2] Begin auditing SECCOMP_RET_ERRNO return actions
From: Andy Lutomirski @ 2017-01-03  5:57 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: Paul Moore, Eric Paris, Kees Cook, Will Drewry, linux-audit,
	linux-kernel@vger.kernel.org
In-Reply-To: <1483375990-14948-1-git-send-email-tyhicks@canonical.com>

On Mon, Jan 2, 2017 at 8:53 AM, Tyler Hicks <tyhicks@canonical.com> wrote:
> This patch set creates the basis for auditing information specific to a given
> seccomp return action and then starts auditing SECCOMP_RET_ERRNO return
> actions. The audit messages for SECCOMP_RET_ERRNO return actions include the
> errno value that will be returned to userspace.
>

Not that I'm opposed to the idea, but what's the intended purpose?

^ permalink raw reply

* Re: [PATCH 0/2] Begin auditing SECCOMP_RET_ERRNO return actions
From: Tyler Hicks @ 2017-01-03 13:31 UTC (permalink / raw)
  To: Paul Moore; +Cc: Will Drewry, linux-kernel, Andy Lutomirski, linux-audit
In-Reply-To: <CAHC9VhSOwbY9WEOKZGsx4mf=MAXeudTnQF9nXmKu+OoAs0SDsQ@mail.gmail.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 2872 bytes --]

On 01/02/2017 04:47 PM, Paul Moore wrote:
> On Mon, Jan 2, 2017 at 11:53 AM, Tyler Hicks <tyhicks@canonical.com> wrote:
>> This patch set creates the basis for auditing information specific to a given
>> seccomp return action and then starts auditing SECCOMP_RET_ERRNO return
>> actions. The audit messages for SECCOMP_RET_ERRNO return actions include the
>> errno value that will be returned to userspace.
> 
> I'm replying to this patchset posting because it his my inbox first,
> but my comments here apply to both this patchset and the other
> seccomp/audit patchset you posted.
> 
> In my experience, we have two or three problems (the count varies
> depending on perspective) when it comes to seccomp filter reporting:
> 
> 1. Inability to log all filter actions.
> 2. Inability to selectively enable filtering; e.g. devs want noisy
> logging, users want relative quiet.
> 3. Consistent behavior with audit enabled and disabled.

Agreed. Those three logging issues are what have been nagging me the most.

> My current thinking - forgive me, this has been kicking around in my
> head for the better part of six months (longer?) and I haven't
> attempted to code it up - is to create a sysctl knob for a system wide
> seccomp logging threshold that would be applied to the high 16-bits of
> *every* triggered action: if the action was at/below the threshold a
> record would be emitted, otherwise silence.  This should resolve
> problems #1 and #2, and the code should be relatively straightforward
> and small.

I like that idea quite a bit. To be completely honest, for #1, I
personally only care about logging SECCOMP_RET_ERRNO actions but this
idea solves it in a nice and general way.

> As part of the code above, I expect that all seccomp logging would get
> routed through a single logging function (sort of like a better
> implementation of the existing audit_seccomp()) that would check the
> threshold and trigger the logging if needed.  This function could be
> augmented to check for CONFIG_AUDIT and in the case where audit was
> not built into the kernel, a simple printk could be used to log the
> seccomp event; solving problem #3.

That doesn't fully solve #3 for me. In Ubuntu (and I think Debian), we
build with CONFIG_AUDIT enabled but don't ship auditd by default so
audit_enabled is false. In that default configuration, we still want
seccomp audit messages to be printk'ed. I'll need to figure out how to
cleanly allow opting into seccomp audit messages when CONFIG_AUDIT is
enabled and audit_enabled is false.

> We could also add a SECCOMP_RET_AUDIT, or similar, if we still feel
> that is important (I personally waffle on this), but I think that is
> independent of the ideas above.

I agree that it is independent but SECCOMP_RET_AUDIT would still be
important to Ubuntu.

Tyler


[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply

* Re: [PATCH 0/2] Begin auditing SECCOMP_RET_ERRNO return actions
From: Tyler Hicks @ 2017-01-03 13:53 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Paul Moore, Eric Paris, Kees Cook, Will Drewry, linux-audit,
	linux-kernel@vger.kernel.org
In-Reply-To: <CALCETrW7sYuUG12DQzgj1pui16KufdBU_3kMw7s3g5oaPpebDg@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1888 bytes --]

On 01/02/2017 11:57 PM, Andy Lutomirski wrote:
> On Mon, Jan 2, 2017 at 8:53 AM, Tyler Hicks <tyhicks@canonical.com> wrote:
>> This patch set creates the basis for auditing information specific to a given
>> seccomp return action and then starts auditing SECCOMP_RET_ERRNO return
>> actions. The audit messages for SECCOMP_RET_ERRNO return actions include the
>> errno value that will be returned to userspace.
>>
> 
> Not that I'm opposed to the idea, but what's the intended purpose?

Ubuntu has a security sandbox, which includes seccomp as a part of the
confinement strategy, that we're using to confine untrusted third-party
applications. Today, we're using SECCOMP_RET_KILL as the default action
when the applications make a call to a syscall that is not allowed by
the sandbox. It is great from a security perspective but not so great
from the perspective of the application developer as their application
(or in some cases, an interpretor) may work fine without the illegal
syscall but it doesn't get the chance to because it is killed.

In the near future, we want to switch over to using SECCOMP_RET_ERRNO
(the errno is still TBD) as the default action to improve the
application developer experience. The largest remaining blocker is that
there are no audit messages when a SECCOMP_RET_ERRNO action is taken.
Therefore, we can't suggest (to the application developer or to the
user) which sandbox knobs need to be turned to better suite their
application, we can't let the application developer know that a syscall
they're using is illegal outside of them having to debug an odd errno
value, and we can't let the user know of a potentially subverted process
that's under confinement of the sandbox. All of that could be addressed
if SECCOMP_RET_ERRNO actions generated audit messages.

I hope that helps to understand the use case.

Tyler



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 801 bytes --]

^ permalink raw reply

* Re: [PATCH 06/22] audit: Abstract hash key handling
From: Jan Kara @ 2017-01-03 17:34 UTC (permalink / raw)
  To: Paul Moore
  Cc: Jan Kara, linux-fsdevel, Amir Goldstein, Lino Sanfilippo,
	Miklos Szeredi, linux-audit
In-Reply-To: <CAHC9VhT3EbWgfR_G39ZV2UUm21+CUDnemNDUPgwYP_vj556D-g@mail.gmail.com>

On Fri 23-12-16 09:13:55, Paul Moore wrote:
> On Fri, Dec 23, 2016 at 8:27 AM, Jan Kara <jack@suse.cz> wrote:
> > On Thu 22-12-16 18:27:40, Paul Moore wrote:
> >> On Thu, Dec 22, 2016 at 4:15 AM, Jan Kara <jack@suse.cz> wrote:
> >> > Audit tree currently uses inode pointer as a key into the hash table.
> >> > Getting that from notification mark will be somewhat more difficult with
> >> > coming fsnotify changes and there's no reason we really have to use the
> >> > inode pointer. So abstract getting of hash key from the audit chunk and
> >> > inode so that we can switch to a different key easily later.
> >> >
> >> > CC: Paul Moore <paul@paul-moore.com>
> >> > Signed-off-by: Jan Kara <jack@suse.cz>
> >> > ---
> >> >  kernel/audit_tree.c | 39 ++++++++++++++++++++++++++++-----------
> >> >  1 file changed, 28 insertions(+), 11 deletions(-)
> >>
> >> I have no objections with this patch in particular, but in patch 8,
> >> are you certain that inode_to_key() and chunk_to_key() will continue
> >> to return the same key value?
> >
> > Yes, that's the intention. Or better in that patch the key will no longer
> > be inode pointer but instead the fsnotify_list pointer. But still it would
> > match for chunks attached to an inode and inode itself so comparison
> > results should stay the same.
> 
> My apologies, I probably should have been more clear.
> 
> Yes, I think we are all in agreement that the *_to_key() functions
> need to return a consistent value for the same object.  My concern is
> that in patch 8 these functions are using different variables (granted
> they may contain the same value, and therefore evaluate to the same
> key) and I worry that there is a possibility of the two variables
> taking on different values and breaking the hash.  What guarantees
> exist that these values will be the same?  Are there any safeguards to
> prevent future patches from accidentally sidestepping these
> guarantees?

Ah, OK, so this is more about patch 8 than patch 6. So far audit uses inode
pointer as a key - now with patch 8, there is a fsnotify_mark_list attached
to an inode if and only if there is any fsnotify_mark for that inode and
both inode->i_fsnotify_marks (used as a key in inode_to_key()) and
mark->obj_list_head (used as a key in chunk_to_key()) point to it. So keys
for an inode and chunk match if and only if the fsnotify mark in the chunk
is attached to the inode - the same as before patch 8.

The only reason why I changed audit to use a different pointer for the key
is that you need some lock protection to do mark->obj_list_head->inode
dereference and this seemed the easiest. Actually now that all the lifetime
rules have worked out, I can see we can actually use inode pointer as a key
relatively easily since mark->obj_list_head is stable once you hold a mark
reference so locking would be only intermediate step until this gets
established in the series. Would you prefer me to do that?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply

* Re: [PATCH 0/2] Begin auditing SECCOMP_RET_ERRNO return actions
From: Paul Moore @ 2017-01-03 19:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Tyler Hicks, Eric Paris, Kees Cook, Will Drewry, linux-audit,
	linux-kernel@vger.kernel.org
In-Reply-To: <CALCETrViORew2PzXSPrCS+aqUnVTsatr85b05DPr9eG7RSGT+Q@mail.gmail.com>

On Tue, Jan 3, 2017 at 12:56 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Mon, Jan 2, 2017 at 2:47 PM, Paul Moore <paul@paul-moore.com> wrote:
>> On Mon, Jan 2, 2017 at 11:53 AM, Tyler Hicks <tyhicks@canonical.com> wrote:
>>> This patch set creates the basis for auditing information specific to a given
>>> seccomp return action and then starts auditing SECCOMP_RET_ERRNO return
>>> actions. The audit messages for SECCOMP_RET_ERRNO return actions include the
>>> errno value that will be returned to userspace.
>>
>> I'm replying to this patchset posting because it his my inbox first,
>> but my comments here apply to both this patchset and the other
>> seccomp/audit patchset you posted.
>>
>> In my experience, we have two or three problems (the count varies
>> depending on perspective) when it comes to seccomp filter reporting:
>>
>> 1. Inability to log all filter actions.
>> 2. Inability to selectively enable filtering; e.g. devs want noisy
>> logging, users want relative quiet.
>> 3. Consistent behavior with audit enabled and disabled.
>>
>> My current thinking - forgive me, this has been kicking around in my
>> head for the better part of six months (longer?) and I haven't
>> attempted to code it up - is to create a sysctl knob for a system wide
>> seccomp logging threshold that would be applied to the high 16-bits of
>> *every* triggered action: if the action was at/below the threshold a
>> record would be emitted, otherwise silence.  This should resolve
>> problems #1 and #2, and the code should be relatively straightforward
>> and small.
>>
>> As part of the code above, I expect that all seccomp logging would get
>> routed through a single logging function (sort of like a better
>> implementation of the existing audit_seccomp()) that would check the
>> threshold and trigger the logging if needed.  This function could be
>> augmented to check for CONFIG_AUDIT and in the case where audit was
>> not built into the kernel, a simple printk could be used to log the
>> seccomp event; solving problem #3.
>
> Would this not be doable with a seccomp tracepoint and a BPF filter?

One of the motivations for the above idea is to make it easier for
admins/users to customize the seccomp logging on their own systems,
it's not just to make devs lives easier.  I feel okay providing
guidance to an admin/user the involves setting a sysctl variable, I
can't say the same about asking them to write their own BPF ;)

-- 
paul moore
www.paul-moore.com

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox