From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.7 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,INCLUDES_PATCH,MAILING_LIST_MULTI, MIME_HTML_MOSTLY,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F616C433E0 for ; Mon, 4 Jan 2021 20:13:45 +0000 (UTC) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4D851221F8 for ; Mon, 4 Jan 2021 20:13:43 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4D851221F8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=iinet.net.au Authentication-Results: mail.kernel.org; spf=tempfail smtp.mailfrom=linux-audit-bounces@redhat.com Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-209-6BAKJOC8N3-OPFiFwgFNyQ-1; Mon, 04 Jan 2021 15:13:38 -0500 X-MC-Unique: 6BAKJOC8N3-OPFiFwgFNyQ-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 1FE211005513; Mon, 4 Jan 2021 20:13:35 +0000 (UTC) Received: from colo-mx.corp.redhat.com (colo-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.20]) by smtp.corp.redhat.com (Postfix) with ESMTPS id E312410016FE; Mon, 4 Jan 2021 20:13:33 +0000 (UTC) Received: from lists01.pubmisc.prod.ext.phx2.redhat.com (lists01.pubmisc.prod.ext.phx2.redhat.com [10.5.19.33]) by colo-mx.corp.redhat.com (Postfix) with ESMTP id D17AC180954D; Mon, 4 Jan 2021 20:13:31 +0000 (UTC) Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.rdu2.redhat.com [10.11.54.3]) by lists01.pubmisc.prod.ext.phx2.redhat.com (8.13.8/8.13.8) with ESMTP id 104KDUpQ030686 for ; Mon, 4 Jan 2021 15:13:30 -0500 Received: by smtp.corp.redhat.com (Postfix) id F36F4111286C; Mon, 4 Jan 2021 20:13:29 +0000 (UTC) Received: from mimecast-mx02.redhat.com (mimecast06.extmail.prod.ext.rdu2.redhat.com [10.11.55.22]) by smtp.corp.redhat.com (Postfix) with ESMTPS id EF8B2111286A for ; Mon, 4 Jan 2021 20:13:27 +0000 (UTC) Received: from us-smtp-1.mimecast.com (us-smtp-delivery-1.mimecast.com [207.211.31.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-SHA384 (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 7CDF6185A7BC for ; Mon, 4 Jan 2021 20:13:27 +0000 (UTC) Received: from icp-osb-irony-out4.external.iinet.net.au (icp-osb-irony-out4.external.iinet.net.au [203.59.1.220]) by relay.mimecast.com with ESMTP id us-mta-107-HOYpHB-IOy6stSt8LJMsAQ-1; Mon, 04 Jan 2021 15:13:23 -0500 X-MC-Unique: HOYpHB-IOy6stSt8LJMsAQ-1 X-SMTP-MATCH: 1 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: =?us-ascii?q?A2BoAwCfdfNf/1enWttiGwEBAQEBAQE?= =?us-ascii?q?BBQEBARIBAQEDAwEBAUeBSIF8gXyFIIkEhhaCFQM4TYMfimiLT4FcCwEBAQE?= =?us-ascii?q?BAQEBATUBAgQBAYREBAICgXAmOBMCEAEBAQUBAQEBAQYDAYZehXMBAQEBAgE?= =?us-ascii?q?jBC8zCwgGChwOAgJXBgEtgkBLglYBAQ8frjt2fzOFRhOCFAaBDYFCgTiJN4N?= =?us-ascii?q?yNYFNP4ERNoIuNT6HVoJgBIFUBmQHFBcEXBMhEQaBEQEYEQ4BgQaPFINPiHO?= =?us-ascii?q?Ka5ERLAeCeYEXBQuGT5NiIqJQhh0YjVacOTkahj2BejMaH4IWGIEQTxkNVo5?= =?us-ascii?q?wAQgBjRkVMGcCBgoBAQMJWQEBinksghoBAQ?= X-IPAS-Result: =?us-ascii?q?A2BoAwCfdfNf/1enWttiGwEBAQEBAQEBBQEBARIBAQEDA?= =?us-ascii?q?wEBAUeBSIF8gXyFIIkEhhaCFQM4TYMfimiLT4FcCwEBAQEBAQEBATUBAgQBA?= =?us-ascii?q?YREBAICgXAmOBMCEAEBAQUBAQEBAQYDAYZehXMBAQEBAgEjBC8zCwgGChwOA?= =?us-ascii?q?gJXBgEtgkBLglYBAQ8frjt2fzOFRhOCFAaBDYFCgTiJN4NyNYFNP4ERNoIuN?= =?us-ascii?q?T6HVoJgBIFUBmQHFBcEXBMhEQaBEQEYEQ4BgQaPFINPiHOKa5ERLAeCeYEXB?= =?us-ascii?q?QuGT5NiIqJQhh0YjVacOTkahj2BejMaH4IWGIEQTxkNVo5wAQgBjRkVMGcCB?= =?us-ascii?q?goBAQMJWQEBinksghoBAQ?= X-IronPort-AV: E=Sophos;i="5.78,474,1599494400"; d="scan'208,217";a="286836158" Received: from 219-90-167-87.ip.adam.com.au (HELO swtf.swtf.dyndns.org) ([219.90.167.87]) by icp-osb-irony-out4.iinet.net.au with ESMTP; 05 Jan 2021 04:13:01 +0800 Message-ID: <8368b5c67757f2de8f664bc2eef5f5ca153a0c87.camel@iinet.net.au> Subject: Re: Occasional delayed output of events From: Burn Alting To: Steve Grubb , Linux Audit Date: Tue, 05 Jan 2021 07:12:51 +1100 In-Reply-To: <2756596.e9J7NaK4W3@x2> References: <30c5dbc14368a1919717e2f39d2d4c29463c3108.camel@iinet.net.au> <5426593.DvuYhMxLoT@x2> <1340951117184f479e4f5dc75766fba91ed2c085.camel@iinet.net.au> <2756596.e9J7NaK4W3@x2> Mime-Version: 1.0 X-Mimecast-Impersonation-Protect: Policy=CLT - Impersonation Protection Definition; Similar Internal Domain=false; Similar Monitored External Domain=false; Custom External Domain=false; Mimecast External Domain=false; Newly Observed Domain=false; Internal User Name=false; Custom Display Name List=false; Reply-to Address Mismatch=false; Targeted Threat Dictionary=false; Mimecast Threat Dictionary=false; Custom Threat Dictionary=false X-Scanned-By: MIMEDefang 2.78 on 10.11.54.3 X-loop: linux-audit@redhat.com X-BeenThere: linux-audit@redhat.com X-Mailman-Version: 2.1.12 Precedence: junk Reply-To: burn@swtf.dyndns.org List-Id: Linux Audit Discussion List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: linux-audit-bounces@redhat.com Errors-To: linux-audit-bounces@redhat.com X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=linux-audit-bounces@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: multipart/mixed; boundary="===============6760371851207658487==" --===============6760371851207658487== Content-Type: multipart/alternative; boundary="=-nGQ6u4avYmk3vF/Y2htB" --=-nGQ6u4avYmk3vF/Y2htB Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 7bit On Mon, 2021-01-04 at 09:46 -0500, Steve Grubb wrote: > On Monday, January 4, 2021 2:55:25 AM EST Burn Alting wrote: > > On Sun, 2021-01-03 at 10:41 -0500, Steve Grubb wrote: > > > On Friday, January 1, 2021 4:22:33 PM EST Burn Alting wrote: > > > > Sometimes, events recorded in /var/log/audit/audit.log appear some > > > > seconds past co- located events which results in > > > > auparse:au_check_events() marking these events complete before they > > > > are. An example of this can be seen below with the offending event id > > > > 44609. > > > > > > > > This has been plaguing me for a year or two and this morning was the > > > > first time I still had access to the raw audit.log files (I monitor a lot > > > > of event types and the log files roll over fairly quickly). > > > > The example below is from a fully patched Centos 7 but I have also seen > > > > this on a patched Fedora 32. > > > > > > > > Has this been seen before? Do we need to re-evaluate how auparse > > > > 'completes' an event (ie 2 seconds is too quick). > > > > > > I have never seen this. But on the way to disk, auditd only does light > > > processing of the event. If the format is enriched, it looks things up > > > on a record by record basis. It does not collect events until they are > > > complete - it dumps it to disk as soon as it can tack on the extra > > > information. > > > > > > So, the question would be, does this delay happen on the way to disk? Or > > > is this an artifact of post processing the logs with an auparse based > > > utility? Can this be observed repeatedly on the same raw logs? If so, > > > then maybe auparse does have some issue. But if this is a post > > > processing issue, then the wall clock doesn't matter because this event > > > should have collected up together. > > > > > > I'd say this merits some investigation. > > > > OK. I think this needs to be addressed on two fronts. There may be more. > > A. Within post processing ... a 2 second timeout is not sufficient. I > > would suggest we modify auparse.c:au_check_events() to i) perform the > > event type checks first, then > > ii) increase the timeout of 2 seconds to be a larger value based on > > empirical tests. > > In the post processing, there are 2 use cases. The first is events that are on > disk. In this usage, the 2 second timeout does not come into effect because > the events are run through probably within nanoseconds or microseconds at the > worst. The only time it would come into effect is if the terminating record > is missing. In this first case, the 2 second timeout is on the event's time, not the 'processing time'. See ausearch-lol.c:check_events() and auparse.c:au_get_ready_event(). And I use the checkpointing code to avoid the incomplete event issue. In my case, I have not lost records, it's just that an event has arrived on disk with an event time more than 2 seconds after the previously written event. Basically, a. The event was delayed getting to auditd and we look to the kernel for a solution. b. The event arrived at a reasonable point in time at auditd and for some reason auditd delayed it's printing (by the way I tend to use RAW log format, not enriched. In either case, I believe ausearch-lol.c:check_events() and auparse.c:au_get_ready_event() do need to be changed as we have complete events written by auditd which these two routines fail to process properly. > > The other use case is realtime processing as an audispd plugin. In this use > case the wall clock could matter because records could potentially get lost > due to overflows or another plugin taking too long. This is the use case where > the wall clock matters. And again, it matters when records get lost or > delayed in transit. As long as everything is flowing, it should not factor > into event processing. > > > B. I will build a temporary auditd daemon to perform some empirical testing > > to see how long events can reside within the daemon. I may need some > > advice on this. I assume that the code that sets the timestamp is in > > src/auditd.c:send_audit_event(). > > This is only for audit daemon's internal events. For all "real" events, it's > set in the kernel. If that is the case and the kernel is establishing the timestamp, then either the kernel has delayed the events arrival at the daemon or the daemon has delayed it's writing. > > > If so, I will see if I can put orchestration debug code in to monitor an > > event's 'time in daemon' until this point. I will then report on this. > > > > I believe given that AUDIT_PROCTITLE and AUDIT_EOE is fairly widespread, > > then the testing switch in A. will not be a big issue (time cost wise). It > > will also mean that if we over compensate the timeout that would cause > > additional memory cost in auparse() then this is mittigated. > > I'd suggest breaking up the event completion tests so that an exact > collection termination reason code could be associated to the event. > > > With respect to 'There may be more' fronts. Are there other points in the > > 'audit ecosystem' that makes use of the '2 second timeout'. > > Ausearch/report has its own special copy of the event collection logic. It > should be nearly identical to what auparse does. They appear identical ... ausearch-lol.c:check_events() and auparse.c:au_get_ready_event(). > > > I will start work on this, this coming weekend if the above makes sense. > > One other thought, the current shipping code is audit-3.0, doing a diff > between it and audit-2.8.5 for the auparse directory does show some > differences in event collection/grouping/next_event. A lot of the differences > are cosmetic to fix extra whitespace or indentation. But if you skip all that, > there are some real changes that probably were because of bug reports. For > example, I will go through these, although this occurs on my Centos 7's (audit-2.8.5-4.el7.x86_64) as well as my 8's (audit-3.0-0.17.20191104git1c2f876.el8.x86_64). > > @@ -259,15 +260,6 @@ static event_list_t *au_get_ready_event( > if (lowest && lowest->status == EBS_COMPLETE) { > lowest->status = EBS_EMPTY; > au->au_ready--; > - // Try to consolidate the array so that we iterate > - // over a smaller portion next time > - if (lowest == &lol->array[lol->maxi]) { > - au_lolnode *ptr = lowest; > - while (ptr->status == EBS_EMPTY && lol->maxi > 0) { > - lol->maxi--; > - ptr = &lol->array[lol->maxi]; > - } > - } > return lowest->l; > } > > and > > @@ -1536,6 +1550,13 @@ static int au_auparse_next_event(auparse > aup_list_create(l); > aup_list_set_event(l, &e); > aup_list_append(l, au->cur_buf, au->list_idx, au- > > line_number); > + // Eat standalone EOE - main event was already marked > complete > + if (l->head->type == AUDIT_EOE) { > + au->cur_buf = NULL; > + aup_list_clear(l); > + free(l); > + continue; > + } > if (au_lol_append(au->au_lo, l) == NULL) { > free((char *)e.host); > #ifdef LOL_EVENTS_DEBUG01 > > I don't know if those have an effect on what you are seeing. But that is the > only substantial changes that I can see. > > -Steve > > --=-nGQ6u4avYmk3vF/Y2htB Content-Type: text/html; charset="utf-8" Content-Transfer-Encoding: quoted-printable
On Mon, 2021-01-04 at 09:46 -0500, Steve Grubb wrote:
On Monday, January 4, 2021 2:55:25 AM EST Burn Alt=
ing wrote:
On Sun, 2021-01-03 at 10:41=
 -0500, Steve Grubb wrote:
On Friday, =
January 1, 2021 4:22:33 PM EST Burn Alting wrote:
Sometimes, events recorded in /var/log/audit/audit.log appear s=
ome
seconds past co- located events which results in
au=
parse:au_check_events() marking  these events complete before they
are. An example of this can be seen below with the offending event id
44609.

This has been plaguing me for a year=
 or two and this morning was the
first time I still had access to=
 the raw audit.log files (I monitor a lot
of event types and the =
log files roll over fairly quickly).
The example below is from a =
fully patched Centos 7 but I have also seen
this on a patched Fed=
ora 32.

Has this been seen before? Do we need to r=
e-evaluate how auparse
'completes' an event (ie 2 seconds is too =
quick).

I have never seen this. But o=
n the way to disk, auditd only does light
processing of the event=
.  If the format is enriched, it looks things up
on a record by r=
ecord basis. It does not collect events until they are
complete -=
 it dumps it to disk as soon as it can tack on the extra
informat=
ion.

So, the question would be, does this delay ha=
ppen on the way to disk? Or
is this an artifact of post processin=
g the logs with an auparse based
utility? Can this be observed re=
peatedly on the same raw logs? If so,
then maybe auparse does hav=
e some issue. But if this is a post
processing issue, then the wa=
ll clock doesn't matter because this event
should have collected =
up together.

I'd say this merits some investigatio=
n.

OK. I think this needs to be addre=
ssed on two fronts. There may be more.
A.  Within post processing=
 ... a 2 second timeout is not sufficient. I
would suggest we mod=
ify auparse.c:au_check_events() to i) perform the
event type chec=
ks first, then
  ii) increase the timeout of 2 seconds to be a la=
rger value based on
empirical tests.

<= /pre>
In the post processing, there are 2 use cases. The first is event=
s that are on 
disk. In this usage, the 2 second timeout does not=
 come into effect because 
the events are run through probably wi=
thin nanoseconds or microseconds at the 
worst. The only time it =
would come into effect is if the terminating record 
is missing.<=
/pre>

In this first case, the 2 second time= out is on the event's time, not the 'processing time'. See ausearch-lol.c:= check_events() and auparse.c:au_get_ready_event().
And I use the = checkpointing code to avoid the incomplete event issue.

In my case, I have not lost records, it's just that an event has arri= ved on disk with an event time more than 2 seconds after the previously wri= tten event.
Basically,
a. The event was delayed getting= to auditd and we look to the kernel for a solution.
b. The event= arrived at a reasonable point in time at auditd and for some reason auditd= delayed it's printing (by the way I tend to use RAW log format, not enrich= ed.

In either case, I believe ausearch-lol.c:check= _events() and auparse.c:au_get_ready_event() do need to be changed as we ha= ve complete events written by auditd
which these two routines fai= l to process properly.


The other use case is realtime processing as an audispd plu=
gin. In this use 
case the wall clock could matter because record=
s could potentially get lost 
due to overflows or another plugin =
taking too long. This is the use case where 
the wall clock matte=
rs. And again, it matters when records get lost or 
delayed in tr=
ansit. As long as everything is flowing, it should not factor 
in=
to event processing.

B.=
 I will build a temporary auditd daemon to perform some empirical testing
to see how long events can reside within the daemon. I may need so=
me
advice on this. I assume that the code that sets the timestamp=
 is in
src/auditd.c:send_audit_event().
This is only for audit daemon's internal events. For all "real=
" events, it's 
set in the kernel.

If that is the case and the kernel is establishing the timestamp, t= hen either the kernel has delayed the events
arrival at the daemo= n or the daemon has delayed it's writing.


If so, I will =
see if I can put orchestration debug code in to monitor an
event'=
s 'time in daemon' until this point. I will then report on this.
=

I believe given that AUDIT_PROCTITLE and AUDIT_EOE is fairly=
 widespread,
then the testing switch in A. will not be a big issu=
e (time cost wise). It
will also mean that if we over compensate =
the timeout that would cause
additional memory cost in auparse() =
then this is mittigated.

I'd suggest =
breaking up the event completion tests so that an exact 
collecti=
on termination reason code could be associated to the event.

=
With respect to 'There may be more' f=
ronts. Are there other points in the
'audit ecosystem' that makes=
 use of the '2 second timeout'.

Ausea=
rch/report has its own special copy of the event collection logic. It 
should be nearly identical to what auparse does.

They appear identical ... ausearch-lol.c:check_events() = and auparse.c:au_get_ready_event().
I will start work on this, this com=
ing weekend if the above makes sense.

One other thought, the current shipping code is audit-3.0, doing a diff
between it and audit-2.8.5 for the auparse directory does show som=
e 
differences in event collection/grouping/next_event. A lot of =
the differences 
are cosmetic to fix extra whitespace or indentat=
ion. But if you skip all that, 
there are some real changes that =
probably were because of bug reports. For 
example,

I will go through these, although this occurs on m=
y Centos 7's (audit-2.8.5-4.el7.x86_64) as well as my 8's (audit-3.0-0.17.2=
0191104git1c2f876.el8.x86_64).


@@ -259,15 +260,6 @@ static event_list_t *au_get_rea=
dy_event(
        if (lowest && lowest->status =3D=3D =
EBS_COMPLETE) {
                lowest->status =3D EBS_EMPTY;<=
/pre>
                au->au_ready--;
-               // T=
ry to consolidate the array so that we iterate
-               //=
 over a smaller portion next time
-               if (lowest =3D=
=3D &lol->array[lol->maxi]) {
-                       a=
u_lolnode *ptr =3D lowest;
-                       while (ptr->=
;status =3D=3D EBS_EMPTY && lol->maxi > 0) {
-     =
                          lol->maxi--;
-                      =
         ptr =3D &lol->array[lol->maxi];
-             =
          }
-               }
                return lo=
west->l;
        }

and

=
@@ -1536,6 +1550,13 @@ static int au_auparse_next_event(auparse<=
/pre>
                aup_list_create(l);
                aup=
_list_set_event(l, &e);
                aup_list_append(l, au=
->cur_buf, au->list_idx, au-
lin=
e_number);
+               // Eat standalone EOE - m=
ain event was already marked 
complete
+               =
if (l->head->type =3D=3D AUDIT_EOE) {
+                    =
   au->cur_buf =3D NULL;
+                       aup_list_clea=
r(l);
+                       free(l);
+               =
        continue;
+               }
                if =
(au_lol_append(au->au_lo, l) =3D=3D NULL) {
                  =
      free((char *)e.host);
 #ifdef LOL_EVENTS_DEBUG01
=

I don't know if those have an effect on what you are seeing.=
 But that is the 
only substantial changes that I can see.
<= pre>
-Steve


<= /body> --=-nGQ6u4avYmk3vF/Y2htB-- --===============6760371851207658487== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline -- Linux-audit mailing list Linux-audit@redhat.com https://www.redhat.com/mailman/listinfo/linux-audit --===============6760371851207658487==--