From mboxrd@z Thu Jan 1 00:00:00 1970
From: LC Bruzenak
Subject: Re: reactive audit question
Date: Fri, 19 Nov 2010 12:05:41 -0600
Message-ID: <1290189941.2017.57.camel@lcb>
In-Reply-To: <201011191120.16426.sgrubb@redhat.com>
References: <1289582683.2136.36.camel@lcb> <201011191120.16426.sgrubb@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
To: Steve Grubb
Cc: linux-audit@redhat.com
List-Id: linux-audit@redhat.com

On Fri, 2010-11-19 at 11:20 -0500, Steve Grubb wrote:

> I didn't answer right away because I didn't have a good answer for you. If the storm
> is large enough to overrun the kernel queue, the rate limiting needs to be in the
> kernel. If auditd is able to handle the load, then perhaps you need an analysis plugin
> that performs whatever action you deem best.

Steve,

I understand; it isn't a straightforward thing, and I appreciate you thinking about it.

I think I have settled on a workable solution. I am using the audisp af_unix builtin and sampling the AVC events. I've got a non-blocking mechanism whereby I can count the AVCs from a very small number of senders, and then take action against the offenders (kill). It's not perfect and has issues, but it might be satisfactory.

I'm still testing this sampling approach, making certain I don't introduce any blocking points, which would aggravate the issue. And while this may work on a single process sending thousands of AVCs in a tight loop, it wouldn't work on one which gets respawned, unless I look at the ppid or do something more clever.

> What is the general source of the problem right now? Was it just that the app was
> doing something that policy didn't know it could do? Or were there attacks under way
> where someone was trying something bad?
> Or was it just an admin mistake where
> something didn't have the right label? Each of these has a different solution.

Mostly the first scenario you mention - that the 3rd-party application hit an execution path we had not seen in testing. But of course it doesn't have to be a 3rd-party app. Even ones we create can run amok with AVCs if all code paths are not exercised under all data conditions. Basically untestable in finite time by humans. :)

Some things you never know the code will do. For example, in one error-recovery case I believe some process (or a library it uses) decides to go look at a different running process and then wants to figure out which connections it has. Well, it never gets an answer, because of course policy doesn't allow it to see the /proc details or some such thing; it generates AVCs and sits in a loop until it gets an answer (forever).

Or things which normally work fine on targeted-policy systems get confused on MLS systems because they cannot connect to the server when they are invoked for a process running at a higher/lower/incomparable MLS level. Then they retry a few million times or so...

Or a process decides to see which files it can access in a big data store. All the ones it cannot access, for MAC level (MLS) reasons, generate AVCs. A few hundred isn't a big deal; a few million is.

Funny things happen to systems when you subject them to the real world and real users. :)

> I think this is a complex problem and controls might be needed at several spots. I'd
> be open to hearing ideas on this too. I've also been wondering if the audit daemon
> might want to use control groups as a means of keeping itself scheduled for very busy
> systems. But I'd like to hear other people's thoughts.

I agree on the complexity. At the very least, though, I'd think adding a syslog-like function whereby it can assimilate same-event audits and then submit one summary event like "1000 similar events like this" would be good. Likely 1000 isn't even enough.
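Something along those lines could look roughly like this. Just a toy sketch, not anything auditd does today - the "same event" heuristic (ignore the msg=audit(ts:serial) stamp) and the 1000-event summary interval are assumptions I made up for illustration:

```python
#!/usr/bin/env python
# Toy sketch of a syslog-style "N similar events" squasher for audit
# records. The normalization heuristic and the flush interval are
# illustrative assumptions, not actual auditd behavior.
import re

class Squasher:
    def __init__(self, emit, flush_every=1000):
        self.emit = emit                # callback that takes one output line
        self.flush_every = flush_every
        self.last_key = None
        self.count = 0

    def _key(self, line):
        # Two records count as "the same event" if they match after the
        # timestamp:serial stamp is blanked out -- an assumed heuristic.
        return re.sub(r'msg=audit\([\d.:]+\)', 'msg=audit(...)', line)

    def feed(self, line):
        key = self._key(line)
        if key == self.last_key:
            self.count += 1
            if self.count % self.flush_every == 0:
                # periodic summary so a never-ending storm still reports
                self.emit("%d similar events like this" % self.count)
            return
        self.flush()
        self.last_key = key
        self.count = 1
        self.emit(line)

    def flush(self):
        # Emit a trailing summary for whatever was being suppressed.
        if self.count > 1:
            self.emit("last event repeated %d times" % (self.count - 1))
        self.last_key = None
        self.count = 0
```

The catch, of course, is that pure suppression still costs the kernel and auditd the work of generating and passing every record; it only saves log volume downstream.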
At one point we were getting well over 1500 AVCs/second over a period of days. On a weekend, of course. :) We were actually able to process that volume; I have no data on the number of drops. It tends to add right up. And this is just one sending host (there are others, but they are not as busy). If I had multiple hosts that busy, the aggregating machine would be overrun. As processors/hardware get faster, I assume the AVC error rates will keep pace.

In my case, the concern is that a valuable event will be dropped off the queue because others like the ones I described are taking all the resources. Even though I have increased the audispd queue size and the priorities, at some point saturation will inevitably occur.

Thanks again!
LCB

-- 
LC (Lenny) Bruzenak
lenny@magitekltd.com
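P.S. For the curious, the per-sender counting I mentioned is roughly this shape. A stripped-down sketch, not the actual plugin: the window length, the threshold, and the kill-on-threshold policy are illustrative, and a real version would read events non-blocking off the af_unix builtin's socket and do something smarter about respawned offenders (ppid):

```python
#!/usr/bin/env python
# Rough sketch of per-sender AVC rate tracking with a kill action.
# Window, threshold, and the decision to SIGKILL are all illustrative
# assumptions, not settled values.
import os
import re
import signal
import time
from collections import defaultdict, deque

PID_RE = re.compile(r'\bpid=(\d+)\b')

class AvcRateTracker:
    def __init__(self, window_secs=10, max_avcs=500, kill=os.kill):
        self.window = window_secs
        self.max_avcs = max_avcs
        self.kill = kill                  # injectable so tests don't kill anything
        self.events = defaultdict(deque)  # pid -> deque of event timestamps

    def feed(self, line, now=None):
        """Feed one audit record; returns the pid killed, or None."""
        if 'type=AVC' not in line:
            return None
        m = PID_RE.search(line)
        if not m:
            return None
        pid = int(m.group(1))
        now = time.time() if now is None else now
        q = self.events[pid]
        q.append(now)
        # Drop timestamps that have aged out of the sliding window.
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) > self.max_avcs:
            # This sender exceeded the AVC budget: take action.
            self.kill(pid, signal.SIGKILL)
            del self.events[pid]
            return pid
        return None
```

In practice the input would come from the af_unix builtin's socket (/var/run/audispd_events by default, if I remember right) rather than from function calls like this.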