All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Englander <max.englander@gmail.com>
To: Paul Moore <paul@paul-moore.com>
Cc: linux-audit@redhat.com
Subject: Re: [PATCH v2] audit: report audit wait metric in audit status reply
Date: Fri, 3 Jul 2020 22:31:22 +0000	[thread overview]
Message-ID: <20200703223121.GA10378@linux-kernel-dev> (raw)
In-Reply-To: <CAHC9VhTctHCCrm4Q1cPdFX-6NXEtmjEPmw6rvUoxOq8UUmycxA@mail.gmail.com>

On Thu, Jul 02, 2020 at 04:42:13PM -0400, Paul Moore wrote:
> On Wed, Jul 1, 2020 at 5:32 PM Max Englander <max.englander@gmail.com> wrote:
> >
> > In environments where the preservation of audit events and predictable
> > usage of system memory are prioritized, admins may use a combination of
> > --backlog_wait_time and -b options at the risk of degraded performance
> > resulting from backlog waiting. In some cases, this risk may be
> > preferred to lost events or unbounded memory usage. Ideally, this risk
> > can be mitigated by making adjustments when backlog waiting is detected.
> >
> > However, detection can be diffult using the currently available metrics.
> > For example, an admin attempting to debug degraded performance may
> > falsely believe a full backlog indicates backlog waiting. It may turn
> > out the backlog frequently fills up but drains quickly.
> >
> > To make it easier to reliably track degraded performance to backlog
> > waiting, this patch makes the following changes:
> >
> > Add a new field backlog_wait_sum to the audit status reply. Initialize
> > this field to zero. Add to this field the total time spent by the
> > current task on scheduled timeouts while the backlog limit is exceeded.
> >
> > Tested on Ubuntu 18.04 using complementary changes to the audit
> > userspace: https://github.com/linux-audit/audit-userspace/pull/134.
> >
> > Signed-off-by: Max Englander <max.englander@gmail.com>
> > ---
> >  Patch changelogs between v1 and v2:
> >  - Instead of printing a warning when backlog waiting occurs, add
> >    duration of backlog waiting to cumulative sum, and report this
> >    sum in audit status reply.
> >
> >  include/uapi/linux/audit.h | 7 ++++++-
> >  kernel/audit.c             | 9 +++++++++
> >  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> Hi Max,
> 
> In general this looks better than the previous approach, but I do have
> a few specific comments (inline).  It also important that in addition

Thanks for your feedback and comments, Paul. I've prepared a v3 patch
addressing all of your comments, with corresponding changes to the
audit-userspace, which I'll submit once I get a working change to
the test suite. I'll use this thread to respond to some things and ask
a question.

> to the requisite userspace patch, we also need a test added to the
> audit-testsuite project so we can verify this functionality in the
> future.
> 
> * https://github.com/linux-audit/audit-testsuite
> 

I downloaded this test suite and attempted to run it on Ubuntu 18.04,
with the latest audit kernel tree and latest audit-userspace. Many tests
were failing, I assume because I have some issues with my environment.
May I ask what environment (OS, tree, commit) you recommend for running
the test suite? Happy to move this question over to GitHub if that's a
better venue.

> > diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
> > index a534d71e689a..ea0cc364beca 100644
> > --- a/include/uapi/linux/audit.h
> > +++ b/include/uapi/linux/audit.h
> > @@ -340,6 +340,7 @@ enum {
> >  #define AUDIT_STATUS_BACKLOG_LIMIT     0x0010
> >  #define AUDIT_STATUS_BACKLOG_WAIT_TIME 0x0020
> >  #define AUDIT_STATUS_LOST              0x0040
> > +#define AUDIT_STATUS_BACKLOG_WAIT_SUM  0x0080
> 
> Sooo ... you've defined this, but I don't see any of the corresponding
> AUDIT_SET code that I would expect, was that an oversight?  If not, it
> is something we should support in the kernel as I'm sure admins will
> want to reset this value at some point.

To be honest I had based this patch off v1 which included a flag for
setting the backlog warn time threshold, but didn't remove it from the
v2 patch (an oversight). I wasn't thinking about admins' need to reset
the value, but since you suggested it I've included support for that in
the v3 patch.

> 
> >  #define AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT     0x00000001
> >  #define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME 0x00000002
> > @@ -348,6 +349,7 @@ enum {
> >  #define AUDIT_FEATURE_BITMAP_SESSIONID_FILTER  0x00000010
> >  #define AUDIT_FEATURE_BITMAP_LOST_RESET                0x00000020
> >  #define AUDIT_FEATURE_BITMAP_FILTER_FS         0x00000040
> > +#define AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM  0x00000080
> 
> In an effort not to exhaust the feature bitmap too quickly, I've been
> restricting it to only those features that would cause breakage with
> userspace.  I haven't looked closely at Steve's userspace in quite a
> while, but I'm guessing it can key off the structure size and doesn't
> need this entry in the bitmap, right?  Let me rephrase, if userspace
> needs to key off anything, it *should* key off the structure size and
> not a new flag in the bitmask ;)

I've removed this flag from the v3 patch, but if it's alright with
you I'll key off the absence/presence of
AUDIT_STATUS_BACKLOG_WAIT_TIME_ACTUAL, which follows the pattern
currently used throughout the audit-userspace, and which Richard suggests
in separate message.

> 
> Also, I'm assuming that older userspace doesn't blow-up if it sees the
> larger structure size?  That's even more important.
> 
> >  #define AUDIT_FEATURE_BITMAP_ALL (AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT | \
> >                                   AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME | \
> > @@ -355,12 +357,14 @@ enum {
> >                                   AUDIT_FEATURE_BITMAP_EXCLUDE_EXTEND | \
> >                                   AUDIT_FEATURE_BITMAP_SESSIONID_FILTER | \
> >                                   AUDIT_FEATURE_BITMAP_LOST_RESET | \
> > -                                 AUDIT_FEATURE_BITMAP_FILTER_FS)
> > +                                 AUDIT_FEATURE_BITMAP_FILTER_FS | \
> > +                                 AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM)
> >
> >  /* deprecated: AUDIT_VERSION_* */
> >  #define AUDIT_VERSION_LATEST           AUDIT_FEATURE_BITMAP_ALL
> >  #define AUDIT_VERSION_BACKLOG_LIMIT    AUDIT_FEATURE_BITMAP_BACKLOG_LIMIT
> >  #define AUDIT_VERSION_BACKLOG_WAIT_TIME        AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_TIME
> > +#define AUDIT_VERSION_BACKLOG_WAIT_SUM AUDIT_FEATURE_BITMAP_BACKLOG_WAIT_SUM
> >
> >                                 /* Failure-to-log actions */
> >  #define AUDIT_FAIL_SILENT      0
> > @@ -466,6 +470,7 @@ struct audit_status {
> >                 __u32   feature_bitmap; /* bitmap of kernel audit features */
> >         };
> >         __u32           backlog_wait_time;/* message queue wait timeout */
> > +       __u32           backlog_wait_sum;/* time spent waiting while message limit exceeded */
> 
> This is very nitpicky, but how about a rename to 'backlog_wait_time_actual'?

Sure, will do.

> 
> > diff --git a/kernel/audit.c b/kernel/audit.c
> > index 87f31bf1f0a0..301ea4f3d750 100644
> > --- a/kernel/audit.c
> > +++ b/kernel/audit.c
> > @@ -136,6 +136,11 @@ u32                audit_sig_sid = 0;
> >  */
> >  static atomic_t        audit_lost = ATOMIC_INIT(0);
> >
> > +/* Monotonically increasing sum of time the kernel has spent
> > + * waiting while the backlog limit is exceeded.
> > + */
> > +static atomic_t audit_backlog_wait_sum = ATOMIC_INIT(0);
> 
> Needless to say, this should be renamed too so we don't go crazy.

Of course :)

> 
> >  /* Hash for inode-based rules */
> >  struct list_head audit_inode_hash[AUDIT_INODE_BUCKETS];
> >
> > @@ -1204,6 +1209,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
> >                 s.backlog               = skb_queue_len(&audit_queue);
> >                 s.feature_bitmap        = AUDIT_FEATURE_BITMAP_ALL;
> >                 s.backlog_wait_time     = audit_backlog_wait_time;
> > +               s.backlog_wait_sum      = atomic_read(&audit_backlog_wait_sum);
> >                 audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s));
> >                 break;
> >         }
> > @@ -1794,6 +1800,9 @@ struct audit_buffer *audit_log_start(struct audit_context *ctx, gfp_t gfp_mask,
> >                                 return NULL;
> >                         }
> >                 }
> > +
> > +               if (stime != audit_backlog_wait_time)
> > +                       atomic_add(audit_backlog_wait_time - stime, &audit_backlog_wait_sum);
> 
> Since stime can only be different in one place in the code above
> (after the schedule_timeout() call), why not move the atomic_add() up
> there and drop the "if"?  Yes there is the potential of calling
> atomic_add() multiple times in this case, but the thread is waiting
> anyway and this way we don't impact other code paths.

My concern was calling atomic_add() more than necessary, but happy to
move it up as you suggest.

> 
> >         }
> >
> >         ab = audit_buffer_alloc(ctx, gfp_mask, type);
> > --
> > 2.17.1
> 
> -- 
> paul moore
> www.paul-moore.com

--
Linux-audit mailing list
Linux-audit@redhat.com
https://www.redhat.com/mailman/listinfo/linux-audit


  parent reply	other threads:[~2020-07-03 22:34 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-01 21:32 [PATCH v2] audit: report audit wait metric in audit status reply Max Englander
2020-07-02 20:42 ` Paul Moore
2020-07-03 21:29   ` Richard Guy Briggs
2020-07-03 22:36     ` Max Englander
2020-07-03 22:31   ` Max Englander [this message]
2020-12-03  3:52   ` Steve Grubb
2020-12-03  4:12     ` Paul Moore
2020-12-03 12:37       ` Richard Guy Briggs
2020-12-03 15:37         ` Paul Moore
2020-12-03 23:10           ` Richard Guy Briggs
2020-12-03 23:43             ` Paul Moore
2020-12-03 23:55               ` Steve Grubb
2020-12-04  2:16                 ` Paul Moore
2020-12-04  2:47                   ` Steve Grubb
2020-12-04 20:41                     ` Paul Moore
2020-12-07 21:13                       ` Max Englander
2020-12-07 21:17                         ` Paul Moore
2020-12-07 21:21                         ` Richard Guy Briggs
2020-12-07 21:28                           ` Max Englander
2020-12-07 23:28                             ` Steve Grubb
2020-12-08  1:34                               ` Richard Guy Briggs
2020-12-08  3:34                                 ` Steve Grubb
2020-12-08 13:20                                   ` Richard Guy Briggs
2020-12-08 13:44                                     ` Steve Grubb
2020-12-08 23:08                                 ` Paul Moore
2020-12-03 13:31       ` Steve Grubb
2020-12-07 19:43   ` Lenny Bruzenak
2020-12-07 21:14     ` Paul Moore
2020-12-03  4:33 ` Joe Wulf
2020-12-07 21:48   ` Max Englander
2020-12-08 16:57 ` Lenny Bruzenak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200703223121.GA10378@linux-kernel-dev \
    --to=max.englander@gmail.com \
    --cc=linux-audit@redhat.com \
    --cc=paul@paul-moore.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.