Re: [PATCH] audit: add backlog high water mark metric


From: Steve Grubb <sgrubb@redhat.com>
To: Ricardo Robaina <rrobaina@redhat.com>, Paul Moore <paul@paul-moore.com>
Cc: audit@vger.kernel.org, linux-kernel@vger.kernel.org, eparis@redhat.com
Subject: Re: [PATCH] audit: add backlog high water mark metric
Date: Tue, 14 Apr 2026 23:45:00 -0400
Message-ID: <2213475.9o76ZdvQCi@x2>
In-Reply-To: <CAHC9VhQTGfuOYev_UNzwSqUoYX-y+RYyMZgWWG2eGzM62eVm9A@mail.gmail.com>

Hello Paul,

On Friday, April 10, 2026 5:34:08 PM Eastern Daylight Time Paul Moore wrote:
> On Mon, Mar 23, 2026 at 11:07 AM Ricardo Robaina <rrobaina@redhat.com> wrote:
> > Currently, determining the optimal `audit_backlog_limit` relies on
> > instantaneous polling of the queue size. This misses transient
> > micro-bursts, making it difficult for system administrators to know
> > if their queue is adequately sized or if they are at risk of
> > dropping events.
> > 
> > This patch introduces `backlog_max_depth`, a high-water mark metric
> > that tracks the maximum number of buffers in the audit queue since
> > the system was booted or the metric was last reset. To minimize
> > performance overhead in the fast-path, the metric is updated using
> > a lockless cmpxchg loop in `__audit_log_end()`.
> > 
> > Userspace can read-and-clear this metric by sending an `AUDIT_SET`
> > message with the `AUDIT_STATUS_BACKLOG_MAX_DEPTH` mask. To support
> > periodic telemetry polling (e.g., statsd, Prometheus), the reset
> > operation atomically returns the snapshot of the high-water mark
> > right before zeroing it, ensuring no peaks are lost between polls.
> > 
> > Link: https://github.com/linux-audit/audit-kernel/issues/63
> > Suggested-by: Steve Grubb <sgrubb@redhat.com>
> > Signed-off-by: Ricardo Robaina <rrobaina@redhat.com>
> > ---
> > 
> >  include/linux/audit.h      |  3 ++-
> >  include/uapi/linux/audit.h |  2 ++
> >  kernel/audit.c             | 32 ++++++++++++++++++++++++++++++++
> >  3 files changed, 36 insertions(+), 1 deletion(-)
> 
> I sat on this for a bit because I wanted to think on it for a while.
> While I agree audit could benefit from better statistics around
> queue/backlog status, I'm not sure a single "max" value alone is worth
> a bit in the audit_status bitmask.  My concern is that the max queue
> length only provides a single snapshot of what the queue looked like,
> it doesn't give any indication of the average queue length over a span
> of time.  Some audit users are willing to live with occasional drops,
> and the max size doesn't help them arrive at a good balance.
> 
> As for the users who can't tolerate any audit record drops?  They
> shouldn't be running with a backlog limit anyway so the maximum queue
> value will be of limited use.

The existing audit_lost counter tells administrators they have already 
failed; the proposed backlog_max_depth tells them they are at risk of 
failing. These are different signals serving different operational needs. The 
dominant real-world deployment (compliance-driven systems that must use a 
finite backlog limit for memory safety but cannot tolerate dropped events) 
has no existing mechanism to verify that its limit is correctly sized between 
polling intervals. Instantaneous backlog polling is blind to sub-second 
bursts. Only a high-water mark, atomically reset at each poll, closes this 
gap. The average queue length cannot answer the question 'did I ever come 
close to the limit?'; only the maximum can.

On the bitmask concern: the last bit added was 
AUDIT_STATUS_BACKLOG_WAIT_TIME_ACTUAL, six years ago, so these bits are 
hardly being used up quickly.

If you don't think this closes the gap on what people need, the patch could 
be amended to include backlog_lost_since_reset (drops since the last poll) 
alongside the max, so that you get two metrics for the price of one bit. But 
this is absolutely needed because people are flying blind without it.

-Steve


Thread overview: 10+ messages
2026-03-23 15:07 [PATCH] audit: add backlog high water mark metric Ricardo Robaina
2026-03-23 16:48 ` Steve Grubb
2026-04-10 21:34 ` Paul Moore
2026-04-15  3:45   ` Steve Grubb [this message]
2026-04-15 15:19     ` Paul Moore
2026-04-15 15:21       ` Paul Moore
2026-04-16 20:33         ` Steve Grubb
2026-04-16 20:51           ` Paul Moore
2026-04-16 20:58             ` Paul Moore
2026-04-17 13:02               ` Ricardo Robaina
