From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kay Hayen <kayhayen@gmx.de>
Subject: Re: Audit for live supervision
Date: Fri, 15 Aug 2008 08:43:49 +0200
Message-ID: <200808150843.49680.kayhayen@gmx.de>
References: <200808140914.07779.kayhayen@gmx.de>
	<200808141004.08678.sgrubb@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain;
  charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Return-path: <linux-audit-bounces@redhat.com>
Received: from mx3.redhat.com (mx3.redhat.com [172.16.48.32])
	by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id m7F6ieIv014891
	for <linux-audit@redhat.com>; Fri, 15 Aug 2008 02:44:40 -0400
Received: from mail.gmx.net (mail.gmx.net [213.165.64.20])
	by mx3.redhat.com (8.13.8/8.13.8) with SMTP id m7F6hvt9024732
	for <linux-audit@redhat.com>; Fri, 15 Aug 2008 02:43:58 -0400
In-Reply-To: <200808141004.08678.sgrubb@redhat.com>
Content-Disposition: inline
List-Unsubscribe: <https://www.redhat.com/mailman/listinfo/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=unsubscribe>
List-Archive: <https://www.redhat.com/archives/linux-audit>
List-Post: <mailto:linux-audit@redhat.com>
List-Help: <mailto:linux-audit-request@redhat.com?subject=help>
List-Subscribe: <https://www.redhat.com/mailman/listinfo/linux-audit>,
	<mailto:linux-audit-request@redhat.com?subject=subscribe>
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
To: Steve Grubb <sgrubb@redhat.com>
Cc: linux-audit@redhat.com
List-Id: linux-audit@redhat.com


Hello Steve,

thanks for your reply, very helpful. :-)

> > Now one issue, I see is that the times that we get from auditd throug=
h
> > the socket from its child daemon may not match the start_date exactly=
.
>
> All time hacks in the audit logs come from the kernel at the instant th=
e
> record is created. They all start by calling audit_log_start, and right
> here is where time is written:
>
> http://lxr.linux.no/linux+v2.6.26.2/kernel/audit.c#L1194
>
> The source that is used is current_kernel_time();

[...]

> The audit system has millisecond resolution.This was considered adequat=
e
> due to system ticks being < 1000 Hz. The current_kernel_time90 is a bro=
ken
> down time struct similar to pselect's. This is how its used:
>
>  audit_log_format(ab, "audit(%lu.%03lu:%u): ",
> 			t.tv_sec, t.tv_nsec/1000000, serial);

>>From what I checked, it seems that current_kernel_time() is indeed fed e=
xactly=20
by the jiffy/system ticks since boot (at least did I find comments that=20
suggest so). I still have to verify how it is translated. There still is =
the=20
issue of translating ticks into seconds since 1970. So far I have only=20
achieved to get hands on system boot time in a granularity of one second.

I have no clue if that is the same time used in the kernel to offset the =
ticks=20
value. I will make a delta test once I can. But suspect would be that the=
=20
real time clock has better time than one second, which is all that I get=20
from /proc/stat btime: field.

More importantly, and somewhat blocking my tests: With the improved rules=
 I=20
get this when compiling quite well reproducible:

type=3DSYSCALL msg=3Daudit(1218773075.500:118620): arch=3Dc000003e syscal=
l=3D59=20
success=3Dyes exit=3D0 a0=3D7fff6f78cf90 a1=3D7fff6f78cf40 a2=3D7fff6f78f=
068 a3=3D0=20
items=3D2 pp
id=3D11412 pid=3D11421 auid=3D4294967295 uid=3D1000 gid=3D1000 euid=3D100=
0 suid=3D1000=20
fsuid=3D1000 egid=3D1000 sgid=3D1000 fsgid=3D1000 tty=3Dpts3 ses=3D429496=
7295=20
comm=3D"gcc-4.3"
exe=3D"/usr/bin/gcc-4.3" key=3D(null)

[...]
type=3DSYSCALL msg=3Daudit(1218773075.496:118624): arch=3Dc000003e syscal=
l=3D56=20
success=3Dyes exit=3D11421 a0=3D1200011 a1=3D0 a2=3D0 a3=3D7fc067776770 i=
tems=3D0=20
ppid=3D11407 pid
=3D11412 auid=3D4294967295 uid=3D1000 gid=3D1000 euid=3D1000 suid=3D1000 =
fsuid=3D1000=20
egid=3D1000 sgid=3D1000 fsgid=3D1000 tty=3Dpts3 ses=3D4294967295 comm=3D"=
gnatchop"=20
exe=3D"/usr/b
in/gnatchop" key=3D(null)

Please note the _ascending_ sequence number but _descending_ time. This i=
s=20
pasted from my audit.log and quite surprising. It triggers an assertion i=
n=20
our code, because we also seem receive things in the wrong order from the=
=20
socket. There was not FORK before the EXECVE that refers to the pid. We=20
tolerate that (obviously there are going to be processes that we didn't s=
ee=20
fork once we start, we don't do an initial scan yet), but we don't tolera=
te=20
that fork returns an existing pid.=20

Seems like a bug? Can you have a look at it? I was using for that:

-a entry,always -F arch=3Db32 -S clone -S fork -S vfork
-a entry,always -F arch=3Db64 -S clone -S fork -S vfork
-a entry,always -S execve
-a entry,always -S exit_group -S exit

I didn't apply the arch to exit, etc. yet, but there wasn't an pid rollov=
er=20
yet, so I don't think that missing an exit is really the issue here.=20

Plus I still did't fully grasp why that arch filter was necessary in the =
first=20
place. I mean, after all, I was simply expecting that per default no filt=
er=20
should give all arches. Is that filter actually a selector? Does it have =
to=20
do with the fact that syscall numbers are arch dependent?

> > The other thing is sequence numbers. We see in the output sequence
> > numbers for each audit event. Very nice. But can you confirm where th=
ese
> > sequence numbers are created? Are they done in the kernel, in auditd =
or
> > in its child daemon?
>
> They are done in the kernel and are incremented for each audit_log_star=
t so
> that no 2 audit events within the same millisecond have the same serial
> number. Their source is here:
>
> http://lxr.linux.no/linux+v2.6.26.2/kernel/audit.c#L1085

Thanks for the pointer. That looks indeed like nothing can go wrong on th=
e=20
reliability side of time and serial. The more astounding my report from a=
bove=20
is.

> > The underlying question is, how safe can we be that we didn't miss
> > anything when sequence numbers don't suggest so. We would like to use=
 the
> > lossless mode of auditd. Does that simply mean that auditd may get be=
hind
> > in worst case?
>
> Yes. You would want to do a couple things. Increase the kernel backlog,
> increase auditd priority, & increase audispd's internal queue.

That's fine with us. We just wouldn't want to be in an inconistent state =
after=20
a load peek. I think with a decicated core for our supervision processes,=
=20
everything except benchmarks of clone performance (aka fork bombs) will b=
e=20
able to trigger any actual issue.

> > Can you confirm that a type=3DEOE delimits every event (is that even
> > the correct term to use, audit trace, how is it called).
>
> It delimits every multipart event. you can use something like this to
> determine if you have an event:
>
	> if ( r->type =3D=3D AUDIT_EOE || r->type < AUDIT_FIRST_EVENT ||
>                                 r->type >=3D AUDIT_FIRST_ANOM_MSG) {
>   have full event...
> }

I will have to check if this affects our intended process tracing. The pa=
rsing=20
is certainly not simplified by it, for a possibly unrelated reason.

We read from a socket, and in data chunks, which are then processed. For =
that,=20
we want to identify the end of a message before processing it. Right now,=
 we=20
have switched to waiting for type=3DEOE and its new line. Without that, t=
he=20
fact that type=3Dnon-EOE may be a multi-line thing. I think parameters ar=
e on=20
new lines and that makes it a bit hard to detect the end of a complete ty=
pe=3D=20
trace. I don't see that in the audit.log, so probably it's a problem of t=
he=20
sub-daemon:

 type=3DEXECVE msg=3Daudit(1218774568.173:129440): argc=3D4 a0=3D"/bin/sh=
"
a1=3D"..."
a2=3D"..."
a3=3D"..."

Without a very stateful message parser, one that e.g. knows how many line=
s are=20
to follow an EXECVE, we don't know when to forward it the part that shoul=
d=20
process it.

What we first, once we got a message is the following code:
        # 1. Some lines are split across multiple lines. The good thing i=
s=20
that these never start
        #    with whitespace and so we can make them back into single lin=
es.=20
This makes the next
        #    part easier.

        lines =3D []

        for line in message.split( "\n" ):
            if line.strip() =3D=3D "":
                pass
            elif line.startswith( " type=3D" ):
                lines.append( line )
            else:
                assert line[0] !=3D ' '

                lines[-1] =3D lines[-1] + ' ' + line


This is in hope that indeed continued lines always start with a non-space=
 and=20
type lines always start with a space. Would you consider this format wort=
hy=20
and possible to change?=20

I have no idea how much it represents and existing external interface, bu=
t I=20
can imagine you can't change it (easily). Probably the end of type=3D mus=
t be=20
detected by terminating empty line in case of those that can be continued=
.=20
But it would be very ugly to have to know the event types that have this =
so=20
early in the decoding process.

> > BTW: Release-wise, will RHEL 5.3 include the latest auditd?
>
> That is the plan. But there will be a point where audit development
> continues and bugfixes are backported rather than new version. At a
> minimum, audit-1.7.5 will be in RHEL5.3. Maybe 1.7.6 if we have another
> quick release.

That's OK for us. And until then self-compiled will do.

> If you are on amd64, I would suggest:
>
> -a entry,always -F arch=3Db32 -S clone -S fork -S vfork
> -a entry,always -F arch=3Db64 -S clone -S fork -S vfork

Actually that solved the trouble of not seeing anything on "Debian". The =
fact=20
is that we are using RHEL x86 for production and Debian (or Ubuntu) amd64=
 on=20
our development machines. So we never checked RHEL amd64, which likely wo=
uld=20
show the same thing.

But see above, this possibly a bug?

> There might be tunables that different distros can used with glibc. str=
ace
> is your friend...and having both 32/64 bit rules if amd64 is the target
> platform.

We did that of course. And what was confusing us was that the audit.log d=
id=20
actually seem to show the calls. Can that even be?

> > Does audit not  (yet?) use other tracing interface like SystemTap, et=
c.
> > where people try to have 0 cost for inactive traces.
>
> They have a cost. :)  Also, systemtap while good for some things not go=
od
> for auditing. For one, systemtap recompiles the kernel to make new modu=
les.
> You may not want that in your environment. It also has not been tested =
for
> CAPP/LSPP compilance.
>
> > Also on a general basis. Do you recommend using the sub-daemon for th=
e
> > job or should we rather use libaudit for the task instead? Any insigh=
t is
> > welcome here.
>
> It really depends on what your environment allows. Do you need an audit
> trail? With search tools? And reporting tools? Do you need the system t=
o
> halt if auditing problems occur? Do you need any certifications?

I see. Luckily we are not into security, but only "safety". I can't find=20
anything on Wikipedia about it, so I will try to explain it briefly, plea=
se=20
forgive my limited understanding of it. :-)

Increasingly we in the ATC need to provide a "Software Assurance Level" (=
SWAL)=20
to our customers and those to their regulators. That's one of many standa=
rds=20
to define "safety". Depending on the level, you may not only need to docu=
ment=20
the requirements of your software or you have to relate each line of code=
=20
with them and ultimately prove the correctness of compiler output. I beli=
eve=20
they do this for fighter planes.

We currently target level 3, which means that we will have to prove that =
we=20
have concerned ourselves with all possible hazards and their combinations=
 and=20
if that would lead to "safety" relevant things. For the running system, w=
e=20
don't have to prove anything. There are "legal recordings" of the input a=
nd=20
output data, but no complete logs of the operation are required per se,=20
although of course, as we support the analysis of trouble reports, that i=
s=20
ver welcome. That level is acceptable for non-airborne systems.

So if we are running and monitoring a system, and if e.g. certain process=
 must=20
run to prevent planes from colliding, we must provide explanations (and=20
tests) of why we are at any time sure _if_ (not "that") the process is=20
running or how the problem will be noted by the software and ultimately t=
he=20
operators through other means, if that fails.

On a general level, redundancy is normally the solution to hazards. Every=
thing=20
is doubled, and you normally have 2 trackers, etc. so you use not only on=
e=20
software, but multiple implementations.=20

When something fails, we generally stop the software unless we are sure i=
t's=20
an isolated event. That is because data corruption is worse than no data =
at=20
all. An unseen flight or a falsely reported flight is a lot worse than a=20
crashed system that only needs a restart and will be back in operation wi=
thin=20
time limits.

Our concern is not at all to prevent or detect malicious use of the syste=
m. If=20
somebody wants to run a binary in a hidden way and takes effort to achiev=
e=20
that, other processes have failed already. Our systems always run in an=20
environment where security is already solved through other means. The sys=
tems=20
e.g. do allow rsh login, without a password, as root. I hope you can cont=
inue=20
breathing now. :-)

It certainly will be very helpful to have the audit log and it searchable=
 and=20
I understand we get that automatic by leaving audit enabled, but configur=
ed=20
correctly. In the past we have disabled it, because it caused a full disk=
 and=20
boot failure on RHEL 3 after only a month or so. I think it complained ab=
out=20
the UDP echo packets that we use to check our internal LAN operations, bu=
t it=20
could have been SELinux too.

So, what we will do with audit is to look at source code, identify design=
=20
principles that make errors impossible, write that down, define our=20
requirements of it, test these. And ultimately we may have to provide an=20
alternative implementation without audit at all, with probably worse late=
ncy,=20
and the ability to detect and compensate (simulated) audit bugs. And test=
=20
that as well.

And in some figure, we will say: If the tracker process exits, we will de=
tect=20
that on avarage after 1 ms, and at latest after 50ms. And that will play =
a=20
role in the time it takes to switch to the standby tracker process on ano=
ther=20
hardware. And that will e.g. have the requirement to be able to take over=
 in=20
1 or 3 seconds, and that will be possible, because the process that hands=
=20
over started early enough and takes itself a limited time.

The ultimate benefits of auditd would be to us:

a) Achieve very low values in latency, being able to observe an "exit" as=
 it=20
happens, not just when a periodic timer makes a test. We can only achieve=
=20
these kinds of performance through child process SIGCHLD so far and signa=
ls=20
are not an ideal interface. Being able to monitor non-childs is very good=
 for=20
us, actually what made us interested in audit in the first place.

b) Robust operation. In theory the audit approach should be able to recei=
ve=20
assurances that certain hazards just cannot happen that's very nice and=20
certainly increases the assurances we can give for our process supervisio=
n, a=20
key stone building part for every system.

c) Our middleware may suddenly offer to monitor processes like ntpd, cron=
=20
jobs, etc. very easily and without system changes. There are systems that=
=20
work with "ps", "grep" and "kill" to achieve this, but as you can imagine=
,=20
that only goes so far.

Security, in the sense of intrusion detection, authorization, etc. doesn'=
t=20
play a role to us. So we don't need audit trails for our live supervision=
,=20
we "only want to know" what was running, what is running.

> > 2. We don't want to poll periodically, but rather only wake up (and t=
hen
> > with minimal latency) when something interesting happened. We would w=
ant
> > to poll a periodic check that forks are still reported, so we would
> > detect a loss of service from audit.
>
> You might write a audispd plugin for this.

Did you mean for the periodic check, or for the whole job, that means our=
=20
supervision process? We certainly would prefer the plugin approach for su=
ch a=20
test, esp. if there is hope that you accept it into your code.=20

The closing of the socket in case of loss of service would be sufficient=20
signal to us.

Regarding performance I would like to say, you are likely right in that i=
t's a=20
non-issue. It has something of a bike-shed to me though. :-) I think I st=
ill=20
have http://lwn.net/Articles/290428/ on my mind, where I had the impressi=
on=20
that kernel markers would only require a few noop instructions as place=20
holders for a jumps that would cause audit code to run. I was wondering w=
hy=20
audit wouldn't use that. Is that historic (didn't exist, nobody made a pa=
tch=20
for it) or conscious decision (too difficult, not worth it). Just curious=
=20
here and of course the comment could be read as a bit scary, because it=20
actually means we will have to benchmark the impact...

Best regards,
Kay Hayen