On 10/22/2014 04:29 PM, Paul Moore wrote:
> Well, like I said, it's probably safer that way as the code will work
> regardless. Time to break bad habits :)

I hear you. But there's working and there's working well. As long as we don't suffer search response degradation by changing the assumed order, then as I said, I'm OK with going back and reworking code. If it makes searching real data unusable, it has broken some operational stuff.

As it is, I already have to move the last 24 hours off to a different area and make searching go day-by-day only; otherwise the logs are too big. I chose a 100MB file size a while back as a sweet spot, based on the observation that each file could be parsed in a time acceptable to security managers.

Some info on the amount of data I've seen in real-world examples: on a normally busy site, we generate ~1-4GB of events each day. It used to be much more, but our team has managed to keep it down. I'd say that often more time is spent reducing the events than analyzing them; if not, they grow to a size you simply cannot parse in people-time. I used to have far more, but we've become more adept at spotting and preventing event storms.

Here is a current test machine's audit log directory. I still have to go look to see what went wrong (obviously something did), but this kind of thing can and does happen in the real world. Note the time stamps: 100MB every minute or two is moving right along. We push all audit data off to a collector machine so it doesn't dominate the disks or otherwise hog the business server's resources.

BTW - funny that I just happened to log in to this machine to check its status. Whatever was running amok stopped soon after 00:57. This sometimes just happens. I've seen it go on for hours more in fielded systems. Almost always something is really wrong, but the system is supposed to maintain security and remain usable through these times as well.

[root@audit audit]# ls -alt
total 4179016
-rw-------.
 1 root root  84835735 Oct 23 02:49 audit.log
drwxr-x---. 2 root root      4096 Oct 23 00:57 .
-r--------. 1 root root 104857789 Oct 23 00:57 audit.log.1
-r--------. 1 root root 104857663 Oct 23 00:56 audit.log.2
-r--------. 1 root root 104857639 Oct 23 00:55 audit.log.3
-r--------. 1 root root 104857608 Oct 23 00:54 audit.log.4
-r--------. 1 root root 104857874 Oct 23 00:52 audit.log.5
-r--------. 1 root root 104857864 Oct 23 00:51 audit.log.6
-r--------. 1 root root 104857670 Oct 23 00:50 audit.log.7
-r--------. 1 root root 104857740 Oct 23 00:49 audit.log.8
-r--------. 1 root root 104857644 Oct 23 00:48 audit.log.9
-r--------. 1 root root 104857871 Oct 23 00:47 audit.log.10
-r--------. 1 root root 104857602 Oct 23 00:46 audit.log.11
-r--------. 1 root root 104857755 Oct 23 00:45 audit.log.12
-r--------. 1 root root 104857907 Oct 23 00:44 audit.log.13
-r--------. 1 root root 104857973 Oct 23 00:43 audit.log.14
-r--------. 1 root root 104857632 Oct 23 00:42 audit.log.15
-r--------. 1 root root 104857843 Oct 23 00:42 audit.log.16
-r--------. 1 root root 104857769 Oct 23 00:41 audit.log.17
-r--------. 1 root root 104857864 Oct 23 00:40 audit.log.18
-r--------. 1 root root 104857862 Oct 23 00:39 audit.log.19
-r--------. 1 root root 104857757 Oct 23 00:38 audit.log.20
-r--------. 1 root root 104857601 Oct 23 00:36 audit.log.21
-r--------. 1 root root 104857663 Oct 23 00:35 audit.log.22
-r--------. 1 root root 104857711 Oct 23 00:34 audit.log.23
-r--------. 1 root root 104857806 Oct 23 00:33 audit.log.24
-r--------. 1 root root 104857722 Oct 23 00:32 audit.log.25
-r--------. 1 root root 104857684 Oct 23 00:31 audit.log.26
-r--------. 1 root root 104857761 Oct 23 00:30 audit.log.27
-r--------. 1 root root 104857975 Oct 23 00:29 audit.log.28
-r--------. 1 root root 104857702 Oct 23 00:28 audit.log.29
-r--------. 1 root root 104857771 Oct 23 00:27 audit.log.30
-r--------. 1 root root 104857990 Oct 23 00:25 audit.log.31
-r--------.
 1 root root 104857629 Oct 23 00:24 audit.log.32
-r--------. 1 root root 104857663 Oct 23 00:23 audit.log.33
-r--------. 1 root root 104857852 Oct 23 00:22 audit.log.34
-r--------. 1 root root 104857841 Oct 23 00:21 audit.log.35
-r--------. 1 root root 104857634 Oct 23 00:20 audit.log.36
-r--------. 1 root root 104857690 Oct 23 00:19 audit.log.37
-r--------. 1 root root 104857623 Oct 23 00:18 audit.log.38
-r--------. 1 root root 104857711 Oct 23 00:17 audit.log.39
-r--------. 1 root root 104857867 Oct 23 00:02 audit.log.40

I started an "aureport -i --summary". I realize my aureport and ausearch don't use the auparse library, but they will soon, if not already, in the newer code. I just wanted to give you a feel for the data amounts in your considerations. I also have designs for some new tools I want to deploy which definitely will use libauparse, if it's usable.

At 4GB+ of audit data, this report isn't returning yet. The nice thing is it isn't leaking memory - or if it is, it isn't gushing (from "top"):

31485 root      20   0  217m 104m  89m R 99.7  0.4  22:29.91 aureport

So I guess from a real-user perspective I'm just worried that fixing an order dependency will have adverse effects on retrieval. My vote would be to promote anything which makes searching/parsing faster. Or we need a different paradigm entirely.

I've been doing some other stuff while writing this. The aureport isn't done after 37+ minutes:

31485 root      20   0  220m  96m  78m R 100.0  0.4  37:22.38 aureport

I realize if I were sitting inside a Google infrastructure this would be different, but I'm not. So when guys try to understand what happened during, in this case, the first hour of the day, the time it takes to return the answer can make them think it's broken. This is why, to me, speed of parse/retrieval is important.

The aureport is still cranking:

31485 root      20   0  221m  76m  57m R 100.0  0.3  44:21.15 aureport

Now if a few queries happen at the same time - and they can - we start seeing some real work on this machine.
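In case anyone wants to reproduce this kind of leak check without staring at top, here is a rough sketch. The helper name "mem_watch" is made up, not part of the audit tooling; it just samples a process's resident set size so a slow leak shows up as steady growth:

```shell
# Hypothetical helper (not part of audit's tools): sample the RSS of a
# long-running process; steady growth across samples suggests a leak.
mem_watch() {
    pid=$1
    interval=${2:-60}                 # seconds between samples
    while kill -0 "$pid" 2>/dev/null; do
        ps -o rss= -p "$pid"          # resident set size, in kilobytes
        sleep "$interval"
    done
}

# e.g. against a running report: mem_watch "$(pgrep -o aureport)" 60
```

Redirect it to a file and you have a crude growth curve for the whole run.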
It's a reasonable machine - dual quad-core Xeon with 24GB of RAM - dedicated to storage and retrieval of system audit data (from "cat /proc/cpuinfo"):

...
processor	: 7
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU X5677 @ 3.47GHz
stepping	: 2
cpu MHz		: 1600.000
cache size	: 12288 KB

So hopefully this gives you a sense of the way audit data can stack up; to me, every millisecond/event saved is potentially hours earned. The report is still running, btw:

31485 root      20   0  222m 103m  83m R 99.9  0.4  56:18.64 aureport

From what I've read on the list, there are other users with similar loads. I'm not sure whether this makes any real difference in your thought process, and you probably knew most of it anyway; I just wanted to throw out some data points for consideration in case they matter.

I waited until this morning to send this so I could see how long the aureport took:

[root@audit audit]# time aureport -i --summary

Summary Report
======================
Range of time in logs: 10/23/2014 00:01:02.305 - 10/23/2014 09:58:40.745
Selected time for report: 10/23/2014 00:01:02 - 10/23/2014 09:58:40.745
Number of changes in configuration: 2892
Number of changes to accounts, groups, or roles: 0
Number of logins: 47
Number of failed logins: 2
Number of authentications: 2712
Number of failed authentications: 355
Number of users: 8
Number of terminals: 41
Number of host names: 7
Number of executables: 112
Number of files: 536772
Number of AVC's: 5791
Number of MAC events: 3045
Number of failed syscalls: 2502700
Number of anomaly events: 525
Number of responses to anomaly events: 0
Number of crypto events: 14
Number of keys: 36
Number of process IDs: 30494
Number of events: 8526658

real	428m11.405s
user	427m21.833s
sys	0m4.014s

Thank you!
LCB

-- 
LC (Lenny) Bruzenak
lenny@magitekltd.com
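P.S. For anyone who wants to sanity-check the "milliseconds become hours" claim, a rough cut using only the figures from the report above (8,526,658 events in 428m11s of essentially all-CPU time):

```shell
# Back-of-the-envelope from the summary report above.
awk 'BEGIN {
    events = 8526658
    secs   = 428 * 60 + 11.405          # "real" time of the aureport run
    printf "%.0f events/sec parsed\n", events / secs
    # If parsing got 1 ms/event faster, this run would shrink by:
    printf "%.1f hours per run\n", events * 0.001 / 3600
}'
# prints: 332 events/sec parsed
#         2.4 hours per run
```

So at this volume, a single millisecond saved per event really does buy back hours on a single report.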