From mboxrd@z Thu Jan 1 00:00:00 1970 From: Kay Hayen Subject: Re: Audit for live supervision Date: Fri, 15 Aug 2008 08:43:49 +0200 Message-ID: <200808150843.49680.kayhayen@gmx.de> References: <200808140914.07779.kayhayen@gmx.de> <200808141004.08678.sgrubb@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Return-path: Received: from mx3.redhat.com (mx3.redhat.com [172.16.48.32]) by int-mx1.corp.redhat.com (8.13.1/8.13.1) with ESMTP id m7F6ieIv014891 for ; Fri, 15 Aug 2008 02:44:40 -0400 Received: from mail.gmx.net (mail.gmx.net [213.165.64.20]) by mx3.redhat.com (8.13.8/8.13.8) with SMTP id m7F6hvt9024732 for ; Fri, 15 Aug 2008 02:43:58 -0400 In-Reply-To: <200808141004.08678.sgrubb@redhat.com> Content-Disposition: inline List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: linux-audit-bounces@redhat.com Errors-To: linux-audit-bounces@redhat.com To: Steve Grubb Cc: linux-audit@redhat.com List-Id: linux-audit@redhat.com Hello Steve, thanks for your reply, very helpful. :-) > > Now one issue, I see is that the times that we get from auditd throug= h > > the socket from its child daemon may not match the start_date exactly= . > > All time hacks in the audit logs come from the kernel at the instant th= e > record is created. They all start by calling audit_log_start, and right > here is where time is written: > > http://lxr.linux.no/linux+v2.6.26.2/kernel/audit.c#L1194 > > The source that is used is current_kernel_time(); [...] > The audit system has millisecond resolution.This was considered adequat= e > due to system ticks being < 1000 Hz. The current_kernel_time90 is a bro= ken > down time struct similar to pselect's. This is how its used: > > audit_log_format(ab, "audit(%lu.%03lu:%u): ", > t.tv_sec, t.tv_nsec/1000000, serial); >>From what I checked, it seems that current_kernel_time() is indeed fed e= xactly=20 by the jiffy/system ticks since boot (at least did I find comments that=20 suggest so). I still have to verify how it is translated. There still is = the=20 issue of translating ticks into seconds since 1970. So far I have only=20 achieved to get hands on system boot time in a granularity of one second. I have no clue if that is the same time used in the kernel to offset the = ticks=20 value. I will make a delta test once I can. But suspect would be that the= =20 real time clock has better time than one second, which is all that I get=20 from /proc/stat btime: field. More importantly, and somewhat blocking my tests: With the improved rules= I=20 get this when compiling quite well reproducible: type=3DSYSCALL msg=3Daudit(1218773075.500:118620): arch=3Dc000003e syscal= l=3D59=20 success=3Dyes exit=3D0 a0=3D7fff6f78cf90 a1=3D7fff6f78cf40 a2=3D7fff6f78f= 068 a3=3D0=20 items=3D2 pp id=3D11412 pid=3D11421 auid=3D4294967295 uid=3D1000 gid=3D1000 euid=3D100= 0 suid=3D1000=20 fsuid=3D1000 egid=3D1000 sgid=3D1000 fsgid=3D1000 tty=3Dpts3 ses=3D429496= 7295=20 comm=3D"gcc-4.3" exe=3D"/usr/bin/gcc-4.3" key=3D(null) [...] type=3DSYSCALL msg=3Daudit(1218773075.496:118624): arch=3Dc000003e syscal= l=3D56=20 success=3Dyes exit=3D11421 a0=3D1200011 a1=3D0 a2=3D0 a3=3D7fc067776770 i= tems=3D0=20 ppid=3D11407 pid =3D11412 auid=3D4294967295 uid=3D1000 gid=3D1000 euid=3D1000 suid=3D1000 = fsuid=3D1000=20 egid=3D1000 sgid=3D1000 fsgid=3D1000 tty=3Dpts3 ses=3D4294967295 comm=3D"= gnatchop"=20 exe=3D"/usr/b in/gnatchop" key=3D(null) Please note the _ascending_ sequence number but _descending_ time. This i= s=20 pasted from my audit.log and quite surprising. It triggers an assertion i= n=20 our code, because we also seem receive things in the wrong order from the= =20 socket. There was not FORK before the EXECVE that refers to the pid. We=20 tolerate that (obviously there are going to be processes that we didn't s= ee=20 fork once we start, we don't do an initial scan yet), but we don't tolera= te=20 that fork returns an existing pid.=20 Seems like a bug? Can you have a look at it? I was using for that: -a entry,always -F arch=3Db32 -S clone -S fork -S vfork -a entry,always -F arch=3Db64 -S clone -S fork -S vfork -a entry,always -S execve -a entry,always -S exit_group -S exit I didn't apply the arch to exit, etc. yet, but there wasn't an pid rollov= er=20 yet, so I don't think that missing an exit is really the issue here.=20 Plus I still did't fully grasp why that arch filter was necessary in the = first=20 place. I mean, after all, I was simply expecting that per default no filt= er=20 should give all arches. Is that filter actually a selector? Does it have = to=20 do with the fact that syscall numbers are arch dependent? > > The other thing is sequence numbers. We see in the output sequence > > numbers for each audit event. Very nice. But can you confirm where th= ese > > sequence numbers are created? Are they done in the kernel, in auditd = or > > in its child daemon? > > They are done in the kernel and are incremented for each audit_log_star= t so > that no 2 audit events within the same millisecond have the same serial > number. Their source is here: > > http://lxr.linux.no/linux+v2.6.26.2/kernel/audit.c#L1085 Thanks for the pointer. That looks indeed like nothing can go wrong on th= e=20 reliability side of time and serial. The more astounding my report from a= bove=20 is. > > The underlying question is, how safe can we be that we didn't miss > > anything when sequence numbers don't suggest so. We would like to use= the > > lossless mode of auditd. Does that simply mean that auditd may get be= hind > > in worst case? > > Yes. You would want to do a couple things. Increase the kernel backlog, > increase auditd priority, & increase audispd's internal queue. That's fine with us. We just wouldn't want to be in an inconistent state = after=20 a load peek. I think with a decicated core for our supervision processes,= =20 everything except benchmarks of clone performance (aka fork bombs) will b= e=20 able to trigger any actual issue. > > Can you confirm that a type=3DEOE delimits every event (is that even > > the correct term to use, audit trace, how is it called). > > It delimits every multipart event. you can use something like this to > determine if you have an event: > > if ( r->type =3D=3D AUDIT_EOE || r->type < AUDIT_FIRST_EVENT || > r->type >=3D AUDIT_FIRST_ANOM_MSG) { > have full event... > } I will have to check if this affects our intended process tracing. The pa= rsing=20 is certainly not simplified by it, for a possibly unrelated reason. We read from a socket, and in data chunks, which are then processed. For = that,=20 we want to identify the end of a message before processing it. Right now,= we=20 have switched to waiting for type=3DEOE and its new line. Without that, t= he=20 fact that type=3Dnon-EOE may be a multi-line thing. I think parameters ar= e on=20 new lines and that makes it a bit hard to detect the end of a complete ty= pe=3D=20 trace. I don't see that in the audit.log, so probably it's a problem of t= he=20 sub-daemon: type=3DEXECVE msg=3Daudit(1218774568.173:129440): argc=3D4 a0=3D"/bin/sh= " a1=3D"..." a2=3D"..." a3=3D"..." Without a very stateful message parser, one that e.g. knows how many line= s are=20 to follow an EXECVE, we don't know when to forward it the part that shoul= d=20 process it. What we first, once we got a message is the following code: # 1. Some lines are split across multiple lines. The good thing i= s=20 that these never start # with whitespace and so we can make them back into single lin= es.=20 This makes the next # part easier. lines =3D [] for line in message.split( "\n" ): if line.strip() =3D=3D "": pass elif line.startswith( " type=3D" ): lines.append( line ) else: assert line[0] !=3D ' ' lines[-1] =3D lines[-1] + ' ' + line This is in hope that indeed continued lines always start with a non-space= and=20 type lines always start with a space. Would you consider this format wort= hy=20 and possible to change?=20 I have no idea how much it represents and existing external interface, bu= t I=20 can imagine you can't change it (easily). Probably the end of type=3D mus= t be=20 detected by terminating empty line in case of those that can be continued= .=20 But it would be very ugly to have to know the event types that have this = so=20 early in the decoding process. > > BTW: Release-wise, will RHEL 5.3 include the latest auditd? > > That is the plan. But there will be a point where audit development > continues and bugfixes are backported rather than new version. At a > minimum, audit-1.7.5 will be in RHEL5.3. Maybe 1.7.6 if we have another > quick release. That's OK for us. And until then self-compiled will do. > If you are on amd64, I would suggest: > > -a entry,always -F arch=3Db32 -S clone -S fork -S vfork > -a entry,always -F arch=3Db64 -S clone -S fork -S vfork Actually that solved the trouble of not seeing anything on "Debian". The = fact=20 is that we are using RHEL x86 for production and Debian (or Ubuntu) amd64= on=20 our development machines. So we never checked RHEL amd64, which likely wo= uld=20 show the same thing. But see above, this possibly a bug? > There might be tunables that different distros can used with glibc. str= ace > is your friend...and having both 32/64 bit rules if amd64 is the target > platform. We did that of course. And what was confusing us was that the audit.log d= id=20 actually seem to show the calls. Can that even be? > > Does audit not (yet?) use other tracing interface like SystemTap, et= c. > > where people try to have 0 cost for inactive traces. > > They have a cost. :) Also, systemtap while good for some things not go= od > for auditing. For one, systemtap recompiles the kernel to make new modu= les. > You may not want that in your environment. It also has not been tested = for > CAPP/LSPP compilance. > > > Also on a general basis. Do you recommend using the sub-daemon for th= e > > job or should we rather use libaudit for the task instead? Any insigh= t is > > welcome here. > > It really depends on what your environment allows. Do you need an audit > trail? With search tools? And reporting tools? Do you need the system t= o > halt if auditing problems occur? Do you need any certifications? I see. Luckily we are not into security, but only "safety". I can't find=20 anything on Wikipedia about it, so I will try to explain it briefly, plea= se=20 forgive my limited understanding of it. :-) Increasingly we in the ATC need to provide a "Software Assurance Level" (= SWAL)=20 to our customers and those to their regulators. That's one of many standa= rds=20 to define "safety". Depending on the level, you may not only need to docu= ment=20 the requirements of your software or you have to relate each line of code= =20 with them and ultimately prove the correctness of compiler output. I beli= eve=20 they do this for fighter planes. We currently target level 3, which means that we will have to prove that = we=20 have concerned ourselves with all possible hazards and their combinations= and=20 if that would lead to "safety" relevant things. For the running system, w= e=20 don't have to prove anything. There are "legal recordings" of the input a= nd=20 output data, but no complete logs of the operation are required per se,=20 although of course, as we support the analysis of trouble reports, that i= s=20 ver welcome. That level is acceptable for non-airborne systems. So if we are running and monitoring a system, and if e.g. certain process= must=20 run to prevent planes from colliding, we must provide explanations (and=20 tests) of why we are at any time sure _if_ (not "that") the process is=20 running or how the problem will be noted by the software and ultimately t= he=20 operators through other means, if that fails. On a general level, redundancy is normally the solution to hazards. Every= thing=20 is doubled, and you normally have 2 trackers, etc. so you use not only on= e=20 software, but multiple implementations.=20 When something fails, we generally stop the software unless we are sure i= t's=20 an isolated event. That is because data corruption is worse than no data = at=20 all. An unseen flight or a falsely reported flight is a lot worse than a=20 crashed system that only needs a restart and will be back in operation wi= thin=20 time limits. Our concern is not at all to prevent or detect malicious use of the syste= m. If=20 somebody wants to run a binary in a hidden way and takes effort to achiev= e=20 that, other processes have failed already. Our systems always run in an=20 environment where security is already solved through other means. The sys= tems=20 e.g. do allow rsh login, without a password, as root. I hope you can cont= inue=20 breathing now. :-) It certainly will be very helpful to have the audit log and it searchable= and=20 I understand we get that automatic by leaving audit enabled, but configur= ed=20 correctly. In the past we have disabled it, because it caused a full disk= and=20 boot failure on RHEL 3 after only a month or so. I think it complained ab= out=20 the UDP echo packets that we use to check our internal LAN operations, bu= t it=20 could have been SELinux too. So, what we will do with audit is to look at source code, identify design= =20 principles that make errors impossible, write that down, define our=20 requirements of it, test these. And ultimately we may have to provide an=20 alternative implementation without audit at all, with probably worse late= ncy,=20 and the ability to detect and compensate (simulated) audit bugs. And test= =20 that as well. And in some figure, we will say: If the tracker process exits, we will de= tect=20 that on avarage after 1 ms, and at latest after 50ms. And that will play = a=20 role in the time it takes to switch to the standby tracker process on ano= ther=20 hardware. And that will e.g. have the requirement to be able to take over= in=20 1 or 3 seconds, and that will be possible, because the process that hands= =20 over started early enough and takes itself a limited time. The ultimate benefits of auditd would be to us: a) Achieve very low values in latency, being able to observe an "exit" as= it=20 happens, not just when a periodic timer makes a test. We can only achieve= =20 these kinds of performance through child process SIGCHLD so far and signa= ls=20 are not an ideal interface. Being able to monitor non-childs is very good= for=20 us, actually what made us interested in audit in the first place. b) Robust operation. In theory the audit approach should be able to recei= ve=20 assurances that certain hazards just cannot happen that's very nice and=20 certainly increases the assurances we can give for our process supervisio= n, a=20 key stone building part for every system. c) Our middleware may suddenly offer to monitor processes like ntpd, cron= =20 jobs, etc. very easily and without system changes. There are systems that= =20 work with "ps", "grep" and "kill" to achieve this, but as you can imagine= ,=20 that only goes so far. Security, in the sense of intrusion detection, authorization, etc. doesn'= t=20 play a role to us. So we don't need audit trails for our live supervision= ,=20 we "only want to know" what was running, what is running. > > 2. We don't want to poll periodically, but rather only wake up (and t= hen > > with minimal latency) when something interesting happened. We would w= ant > > to poll a periodic check that forks are still reported, so we would > > detect a loss of service from audit. > > You might write a audispd plugin for this. Did you mean for the periodic check, or for the whole job, that means our= =20 supervision process? We certainly would prefer the plugin approach for su= ch a=20 test, esp. if there is hope that you accept it into your code.=20 The closing of the socket in case of loss of service would be sufficient=20 signal to us. Regarding performance I would like to say, you are likely right in that i= t's a=20 non-issue. It has something of a bike-shed to me though. :-) I think I st= ill=20 have http://lwn.net/Articles/290428/ on my mind, where I had the impressi= on=20 that kernel markers would only require a few noop instructions as place=20 holders for a jumps that would cause audit code to run. I was wondering w= hy=20 audit wouldn't use that. Is that historic (didn't exist, nobody made a pa= tch=20 for it) or conscious decision (too difficult, not worth it). Just curious= =20 here and of course the comment could be read as a bit scary, because it=20 actually means we will have to benchmark the impact... Best regards, Kay Hayen