Auditd errors on busy hosts when rolling over log files

linux-audit.redhat.com archive mirror

* Auditd errors on busy hosts when rolling over log files
From: Burn Alting @ 2013-11-04  8:46 UTC
  To: linux-audit

Hi,

I have some quite busy hosts that emit the following errors when I
request that the audit log file be rolled over (via a kill -s USR1
auditdpid).

  Error receiving audit netlink packet(No buffer space available)
  Error sending signal_info request (No buffer space available)

From reading earlier posts (circa 2009) it would appear my options are

a. Increase backlog buffer (currently 32768)
b. Increase priority_boost (currently 4)
c. Reduce the number of log files (currently 9)

Does anyone have a feel for which of the above should offer the best
return?

Are there other configuration parameters I could adjust (aside from
changing my ruleset in audit.rules)?

Thanks in advance

Burn


* Re: Auditd errors on busy hosts when rolling over log files
From: Steve Grubb @ 2013-11-04 13:24 UTC
  To: burn; +Cc: linux-audit

On Monday, November 04, 2013 07:46:18 PM Burn Alting wrote:
> Hi,
> 
> I have some quite busy hosts that emit the following errors when I
> request that the audit log file be rolled over (via a kill -s USR1
> auditdpid).
> 
>   Error receiving audit netlink packet(No buffer space available)
>   Error sending signal_info request (No buffer space available)
> 
> From reading earlier posts (circa 2009) it would appear my options are
> 
> a. Increase backlog buffer (currently 32768)
> b. Increase priority_boost (currently 4)
> c. Reduce the number of log files (currently 9)

A corollary to (c) is that you can increase the file size and decrease
the total number of files, which would help on rotation.

> Does anyone have a feel for which of the above should offer the best
> return?

There are 2 more options:

1) Review the rules to make sure you are not getting events that you really do 
not need. If you have a lot of false positives, then you might add some 
arguments that better narrow the results. For example, perhaps you have this 
rule:

-a always,exit -F arch=b64 -S clock_settime -k time-change

This can give a lot of false positives. The one that really matters is when a 
program sets CLOCK_REALTIME (the wall clock). So, the rule can be re-written 
as:

-a always,exit -F arch=b64 -S clock_settime -F a0=0 -k time-change

which narrows its scope.
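
One way to find the noisiest rules before narrowing them is to tally logged records by rule key. The following is a minimal sketch, assuming each keyed record carries a single key in the form key="name" (as the -k time-change rules above would produce). count_by_key is a hypothetical helper, not part of the audit package, and /var/log/audit/audit.log is only the usual default path, not something stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: tally audit records per rule key, busiest first.
# Assumes each keyed record contains a field of the form key="name".
count_by_key() {
    # $1: path to an audit log file
    sed -n 's/.* key="\([^"]*\)".*/\1/p' "$1" | sort | uniq -c | sort -rn
}

# Example (the path is the common default; adjust as needed):
# count_by_key /var/log/audit/audit.log | head
```

Keys that dominate the output are the first candidates for extra -F filters such as the a0=0 test shown above.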

2) You might experiment with cgroups.

> Are there other configuration parameters I could adjust (aside from
> changing my ruleset in audit.rules)?

There might be general disk tuning parameters in sysctl that could help as 
well. Choice of file system also has performance impacts. I haven't done any 
experimenting on the performance side, but I know there are people here that 
also have very busy systems.

-Steve


* Re: Auditd errors on busy hosts when rolling over log files
From: Burn Alting @ 2013-11-05 11:07 UTC
  To: Steve Grubb; +Cc: linux-audit

Thanks Steve.
 
I did a little experimentation today.
 
On a system that generates around 7500 audit events every five minutes I
changed, without success, the following:

In auditd.conf
- changed num_logs from 9 to 5, although I didn't expect a change, as I
move out the rolled-over (audit.log.?) log files as part of the
processing, so there shouldn't be a big file rename impost
- changed priority_boost from 4 to 8
 
In audit.rules
- changed backlog from 32K to 64K to 96K to 128K
- changed rules to reduce the recorded events per 5 minute interval from
7500 to 500-600 for the same period.
 
This particular system is running audit-1.8.2-el5 but I see a similar
problem on a RHEL 6.4 box which I believe is running audit-2.2-2.el6.
 
I did note that if I executed the sync(1) command before signaling
auditd to roll over (i.e. execute /bin/kill -s USR1 pid), the error
SOMETIMES did not appear.
 
So I am a little bit lost.
 
I believe that the actual effect is just
- the cost of two additional lines in /var/log/messages
- the loss of a few logs
 
My actual process is to
a. roll over the log file
b. run an ausearch --interpret-like command
 
Perhaps my alternative is to modify my ausearch-like command to be
stateful and have it process only new events, as per a patch I made to
ausearch some time back

        Subject: 	[PATCH] ausearch: Add checkpoint capability and have
        incomplete logs carry forward when processing multiple audit.log
        files
        Date: 	05/11/2013 03:59:34 PM


Am open to any suggestions ... I think the key issue is that I reduced
the events written to audit.log from 7500 to 600 per five-minute
interval but I still see the error.

Rgds


* Re: Auditd errors on busy hosts when rolling over log files
From: Steve Grubb @ 2013-11-05 13:59 UTC
  To: burn; +Cc: linux-audit

Hello,

On Tuesday, November 05, 2013 10:07:08 PM Burn Alting wrote:
> I did a little experimentation today.
> 
> On a system that generates around 7500 audit events every five minutes I
> changed, without success, the following:
> 
> In auditd.conf
> - changed num_logs from 9 to 5, although I didn't expect a change, as I
> move out the rolled-over (audit.log.?) log files as part of the
> processing, so there shouldn't be a big file rename impost

This should have helped a little since you dropped 4 syscalls.

> - changed priority_boost from 4 to 8
> 
> In audit.rules
> - changed backlog from 32K to 64K to 96K to 128K

This should only help to the extent of your constant fill rate. What happens is 
your events are coming in and auditd is unable to attend to them during the 
rotation because it has to start with audit.log.9 and delete it, then move all 
logs up one number leaving no audit.log. At that point it can open a new one. 
So, the backlog needs to be big enough to handle the overflow during that brief 
time. 
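
The delete-then-shift sequence described above can be sketched in a scratch directory. This is an illustration only, under the assumption that files are named audit.log and audit.log.N. rotate_sketch and its arguments are hypothetical, and this is not auditd's actual implementation.

```shell
#!/bin/sh
# Illustrative only: mimics the rotation order described above and
# prints how many file operations (deletes plus renames) it performed.
rotate_sketch() {
    dir=$1        # directory holding the logs (hypothetical)
    highest=$2    # highest numeric suffix kept, e.g. 9
    ops=0

    # 1. Delete the oldest file, if present.
    if [ -e "$dir/audit.log.$highest" ]; then
        rm -f "$dir/audit.log.$highest"
        ops=$((ops + 1))
    fi

    # 2. Shift every remaining numbered file up by one, oldest first.
    i=$((highest - 1))
    while [ "$i" -ge 1 ]; do
        if [ -e "$dir/audit.log.$i" ]; then
            mv "$dir/audit.log.$i" "$dir/audit.log.$((i + 1))"
            ops=$((ops + 1))
        fi
        i=$((i - 1))
    done

    # 3. Move the active log aside; only now can a fresh one be opened.
    if [ -e "$dir/audit.log" ]; then
        mv "$dir/audit.log" "$dir/audit.log.1"
        ops=$((ops + 1))
    fi
    : > "$dir/audit.log"

    echo "$ops"
}
```

In this sketch, a full set of files with suffixes 1 through 9 costs ten file operations per rotation, while suffixes 1 through 5 cost six. Events arriving during any of those steps have to wait in the kernel backlog.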

I would expect rotation takes 10 milliseconds at the most. But just for the
sake of argument, let's say it took 1 whole second. At your fill rate (7500
events per five minutes), you should be receiving 25 events in that second.
Some of these events may be compound, meaning they have supporting records
besides the syscall record, such as PATH or CWD. Let's assume you have 4
supporting records per event. You now have roughly 100 incoming records
during that one second. It sounds like setting the backlog to 32k should
be sufficient...unless the system is about to fall over anyway.
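
The estimate above can be reproduced with shell arithmetic. This is a rough sketch using only the figures from this thread. The one-second rotation time and the four records per event are the deliberately pessimistic assumptions made above, not measurements.

```shell
#!/bin/sh
# Back-of-the-envelope backlog estimate using the thread's figures.
events_per_window=7500   # events per five minutes, as reported
window_seconds=300       # five minutes
records_per_event=4      # assumed in the estimate above
rotation_seconds=1       # pessimistic; 10 ms was the expected upper bound
backlog_limit=32768      # current backlog setting

events_per_second=$((events_per_window / window_seconds))
records_during_rotation=$((events_per_second * records_per_event * rotation_seconds))
headroom=$((backlog_limit / records_during_rotation))

echo "events per second: $events_per_second"
echo "records queued during rotation: $records_during_rotation"
echo "backlog headroom factor: $headroom"
```

A backlog limit hundreds of times larger than the records that should queue during rotation suggests the errors are not explained by the fill rate alone.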

You might try running:

while true; do auditctl -s; sleep 5; done

and see if your system is never able to catch up (the backlog value keeps
climbing). If that's the case, you need to do something about the audit
daemon's priority or scheduling. You can boost the priority way up, to 20.
You might even add the 'chrt' command to the init script to see if you can
put auditd on a different scheduler.
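
To watch just the relevant counters rather than the full status line, the output of auditctl -s can be filtered. The following is a sketch that assumes the status is printed either as name=value pairs on one line or as one "name value" pair per line. The exact layout depends on the audit version, and audit_status_field is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: extract one field from `auditctl -s` output on stdin.
# Accepts both "name=value" pairs on a single line and "name value" lines.
audit_status_field() {
    awk -v key="$1" '{
        for (i = 1; i <= NF; i++) {
            if ($i == key && i < NF) { print $(i + 1); exit }
            if (split($i, kv, "=") == 2 && kv[1] == key) { print kv[2]; exit }
        }
    }'
}

# Example loop (needs root and a running auditd, so it is left commented out):
# while true; do
#     s=$(auditctl -s)
#     echo "backlog=$(echo "$s" | audit_status_field backlog)" \
#          "lost=$(echo "$s" | audit_status_field lost)"
#     sleep 5
# done
```

A backlog that never returns to a low value between samples, or a lost counter that keeps increasing, would indicate that auditd is not keeping up.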

> - changed rules to reduce the recorded events per 5 minute interval from
> 7500 to 500-600 for the same period.

That should help both the backlog before rotation as well as the fill rate 
during rotation.

> This particular system is running audit-1.8.2-el5 but I see a similar
> problem on a RHEL 6.4 box which I believe is running audit-2.2-2.el6.

I think there was one change to normal processing that replaced a syscall to
stat the disk with simple arithmetic. I don't know if that one patch would
help or not. It would allow auditd to keep the backlog lower prior to
rotation.

> I did note that if I executed the sync(1) command before signaling
> auditd to roll over (i.e. execute /bin/kill -s USR1 pid), the error
> SOMETIMES did not appear.
> 
> So I am a little bit lost.

You might also experiment with the disk flushing in auditd.conf.
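
The disk flushing behaviour is most likely governed by the flush and freq keys in auditd.conf. Check the auditd.conf man page for the values your version accepts. The following is a small sketch for inspecting the current settings before changing them. show_flush_settings is a hypothetical helper, and /etc/audit/auditd.conf is the usual location rather than one given in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the flush-related settings from an auditd.conf file.
show_flush_settings() {
    # $1: path to auditd.conf
    grep -E '^[[:space:]]*(flush|freq)[[:space:]]*=' "$1"
}

# Example (usual location; adjust as needed):
# show_flush_settings /etc/audit/auditd.conf
```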

> I believe that the actual effect is just
> - the cost of two additional lines in /var/log/messages
> - the loss of a few logs
> 
> My actual process is to
> a. roll over the log file
> b. run an ausearch --interpret-like command

Running the command shouldn't interfere.

> Perhaps my alternative is to modify my ausearch-like command to be
> stateful and have it process only new events, as per a patch I made to
> ausearch some time back
> 
>         Subject: 	[PATCH] ausearch: Add checkpoint capability and have
>         incomplete logs carry forward when processing multiple audit.log
>         files
>         Date: 	05/11/2013 03:59:34 PM
> 
> 
> Am open to any suggestions ... I think the key issue is that I reduced
> the events written to audit.log from 7500 to 600 per five-minute
> interval but I still see the error.

I think it's several things. Dropping the fill rate will help. But something
else is going on. Maybe some of these hints can help you investigate the 
problem.

-Steve


