From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steve Grubb
Subject: Re: Lost events during boot
Date: Mon, 20 Mar 2017 11:08:56 -0400
Message-ID: <2742334.zvR4i4OIcv@x2>
References: <3997070.g5Zg3o8xPs@x2>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
To: Paul Moore
Cc: Richard Briggs, linux-audit@redhat.com
List-Id: linux-audit@redhat.com

On Monday, March 20, 2017 10:55:43 AM EDT Paul Moore wrote:
> On Mon, Mar 20, 2017 at 10:44 AM, Paul Moore wrote:
> > On Mon, Mar 20, 2017 at 8:08 AM, Paul Moore wrote:
> >> On Sun, Mar 19, 2017 at 9:46 PM, Steve Grubb wrote:
> >>> Hello Richard and Paul,
> >>>
> >>> I was going to do a blog write up about booting the system with
> >>> audit_backlog_limit=8192 for STIG users and have stumbled on to a
> >>> mystery. The kernel initializes the variable to 64 at power on. During
> >>> boot, if audit == 1, then it holds events in the hopes that an audit
> >>> daemon will show up later and drain all the events. Anything over 64
> >>> events should fall off the end and increment the lost counter and put a
> >>> notice in syslog.
> >>>
> >>> However, when booting with audit_backlog_limit=8192, as soon as I log in
> >>> and run "auditctl -s", I can see I've lost 73 events. Then I run
> >>> "aureport --start boot" and I see 644 total events. This is nowhere near
> >>> the 8192 limit that I asked for. So, why am I losing events?
> >>>
> >>> Additionally, I checked the logs and there is absolutely no message in
> >>> syslog showing that I've lost events. This is with failure mode set to
> >>> 1 - which is default at power on. And this is in spite of the fact
> >>> that the source code seems to show that it should have printk'ed
> >>> something.
> >>>
> >>> Any ideas? Can you replicate this finding?
> >>
> >> It's funny, I just noticed this for the first time on Friday (the
> >> exact same lost count too), although it was a development kernel build
> >> with a *heavily* modified audit subsystem so I just assumed I had
> >> broken something with the queuing, the lost counter, or both. It's
> >> possible I still may have broken something in the v4.10 queue rework,
> >> or something broke a long time ago and we are just noticing it now.
> >>
> >> First off, can you create a GitHub issue for this and include your
> >> kernel build (e.g. 'uname -r')? Second, if you are seeing this on a
> >> +v4.10 kernel, do you see the same results with a +v4.9 kernel?
> >
> > Quick follow-up, and completely untested, but it would appear that the
> > problem lies in kauditd_hold_skb()/kauditd_print_skb();
> > kauditd_print_skb() registers a false lost record when the printk
> > ratelimit is tripped. The fix is rather simple, and I'll include that
> > in an upcoming patchset.
>
> ... and a quick question, if the kernel is booted without "audit=1" do
> we want to count lost records in the case where the backlog overflows?

If audit == 0, then we should not care because auditing may never be enabled.
If for some reason audit == 2, then I suppose we should care.

-Steve
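To make the suspected failure mode concrete, here is a rough userspace sketch of a bounded hold queue whose print path can also bump the lost counter. It is a toy model, not the kernel code: HoldQueue, simulate, print_budget and count_print_drops are all made-up names, and print_budget is a crude stand-in for the printk ratelimit (which in reality is time based). The event count of 644 and the limits of 64 and 8192 come from the report above; the budget of 10 is arbitrary.

```python
from collections import deque


class HoldQueue:
    """Toy model of a boot-time hold queue; NOT the kernel implementation."""

    def __init__(self, backlog_limit=64, print_budget=10, count_print_drops=True):
        self.backlog_limit = backlog_limit          # plays the role of audit_backlog_limit
        self.print_budget = print_budget            # hypothetical stand-in for the printk ratelimit
        self.count_print_drops = count_print_drops  # True models the suspected false "lost" count
        self.held = deque()
        self.lost = 0

    def _print(self, record):
        # Echo the record to the console while the budget lasts.
        if self.print_budget > 0:
            self.print_budget -= 1
            return
        # Ratelimit tripped: the record was not printed, but it can still be held.
        if self.count_print_drops:
            self.lost += 1

    def hold(self, record):
        self._print(record)
        if len(self.held) < self.backlog_limit:
            self.held.append(record)
        else:
            self.lost += 1  # genuinely dropped: the queue is full


def simulate(events, **kwargs):
    """Feed `events` records through a fresh queue; return (held, lost)."""
    q = HoldQueue(**kwargs)
    for i in range(events):
        q.hold(f"event-{i}")
    return len(q.held), q.lost


if __name__ == "__main__":
    print(simulate(644, backlog_limit=8192, count_print_drops=True))
    print(simulate(644, backlog_limit=8192, count_print_drops=False))
```

In this model, with count_print_drops=True the lost counter is nonzero even though every event is still sitting in the queue, which has the same shape as the symptom described above. With it set to False, records are only counted as lost once the backlog limit is actually exceeded. The specific numbers it prints are illustrative and are not expected to match the 73 seen on the real system.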