Performance of libauparse

public inbox for linux-audit@redhat.com
 help / color / mirror / Atom feed

* Performance of libauparse
@ 2008-09-30 13:34 Matthew Booth
  2008-09-30 18:34 ` Steve Grubb
  2008-09-30 19:18 ` John Dennis
  0 siblings, 2 replies; 11+ messages in thread
From: Matthew Booth @ 2008-09-30 13:34 UTC (permalink / raw)
  To: linux-audit

I have been investigating using libauparse in my austream replacement 
audit daemon to do some inline data enhancement[1]. austream is 
essentially a very thin wrapper which pulls audit records out of the 
kernel, wraps them in a UDP syslog packet and sends them to the network. 
It is very simple and very fast.

To measure the overhead of libauparse on austream I initialised auparse 
  as AUSOURCE_FEED, fed each received record into it, and spat them out 
unmodified on receiving the AUPARSE_CB_EVENT_READY event. This added 
more than an order of magnitude to the time austream spends in 
userspace. A brief look at this overhead shows that about 40% is spent 
in malloc()/free(), and 25% is spent in strlen, strdup, memcpy, memmove 
and friends. I suspect that very substantial gains could be made in the 
performance of libauparse by reworking the way it uses memory, and 
passing the length of strings around with the strings. Unfortunately, I 
suspect this would amount to a substantial rewrite.

Is this something anybody else is interested in? I guess performance 
isn't so important if you're just scanning log files in non-real time.

Matt

[1] What I'd really like is a well-defined binary format from the kernel.
-- 
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services

M:       +44 (0)7977 267231
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-09-30 13:34 Performance of libauparse Matthew Booth
@ 2008-09-30 18:34 ` Steve Grubb
  2008-09-30 19:18 ` John Dennis
  1 sibling, 0 replies; 11+ messages in thread
From: Steve Grubb @ 2008-09-30 18:34 UTC (permalink / raw)
  To: linux-audit

On Tuesday 30 September 2008 09:34:00 Matthew Booth wrote:
> To measure the overhead of libauparse on austream I initialised auparse
>   as AUSOURCE_FEED, fed each received record into it, and spat them out
> unmodified on receiving the AUPARSE_CB_EVENT_READY event. 

This is not an apples to apples comparison. libauparse fully parses each 
field. So, it is doing significantly more work. You could add a strtok_r loop 
to your code to come closer to a direct comparison.

> This added more than an order of magnitude to the time austream spends in
> userspace. A brief look at this overhead shows that about 40% is spent
> in malloc()/free(), 

Yep. The main idea was really to just get it working and then optimize in a 
future release. The memory management is the low hanging fruit. I've been 
thinking to fix it so that each field is not malloc'ed individually, that 
there are string pointers and lengths stored and used internally.

> and 25% is spent in strlen, strdup, memcpy, memmove and friends. I suspect
> that very substantial gains could be made in the performance of libauparse
> by reworking the way it uses memory, and passing the length of strings
> around with the strings. Unfortunately, I suspect this would amount to a
> substantial rewrite.

Possibly. But I wasn't planning to do this until after solving the interlaced 
record problem. I'd rather it be the current speed and correct than faster 
and still wrong.

> Is this something anybody else is interested in? I guess performance
> isn't so important if you're just scanning log files in non-real time.

Yes, after the next release. In the mean time it might not hurt to add some 
tests to the auparse_test programs so that any re-write induced regression 
has a chance of being found.

> [1] What I'd really like is a well-defined binary format from the kernel.

Not likely to ever happen.

-Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-09-30 13:34 Performance of libauparse Matthew Booth
  2008-09-30 18:34 ` Steve Grubb
@ 2008-09-30 19:18 ` John Dennis
  2008-09-30 22:18   ` Matthew Booth
  2008-10-01 13:15   ` Eric Paris
  1 sibling, 2 replies; 11+ messages in thread
From: John Dennis @ 2008-09-30 19:18 UTC (permalink / raw)
  To: Matthew Booth; +Cc: linux-audit

Matthew Booth wrote:
> I have been investigating using libauparse in my austream replacement 
> audit daemon to do some inline data enhancement[1]. austream is 
> essentially a very thin wrapper which pulls audit records out of the 
> kernel, wraps them in a UDP syslog packet and sends them to the 
> network. It is very simple and very fast.
>
> To measure the overhead of libauparse on austream I initialised 
> auparse  as AUSOURCE_FEED, fed each received record into it, and spat 
> them out unmodified on receiving the AUPARSE_CB_EVENT_READY event. 
> This added more than an order of magnitude to the time austream spends 
> in userspace. A brief look at this overhead shows that about 40% is 
> spent in malloc()/free(), and 25% is spent in strlen, strdup, memcpy, 
> memmove and friends. I suspect that very substantial gains could be 
> made in the performance of libauparse by reworking the way it uses 
> memory, and passing the length of strings around with the strings. 
> Unfortunately, I suspect this would amount to a substantial rewrite.
>
> Is this something anybody else is interested in? I guess performance 
> isn't so important if you're just scanning log files in non-real time.
>
> Matt
>
> [1] What I'd really like is a well-defined binary format from the kernel.

auparse is very inefficient in how it handles data. I noticed this when 
reading the source code and mentioned to Steve the large number of 
strdup's used to assemble an event. This is compounded by the fact the 
processed data only persists for the period of time the record/event is 
current. I am not surprised to see that profiling reveals a significant 
proportion of time is spent repeatedly creating and destroying strings.

I also agree the data stream which emerges from audit is rather 
difficult to work with. Eric likes to point out we can't change the 
kernel, so maybe what we really need (and has been proposed) is for 
auditd to reformat the data before emitting it or writing it do disk 
(e.g. assemble records into events, decode strings which have been 
hexified, etc.) Currently auparse is responsible for much of this as 
part of a post processing step which has to be repeated every time audit 
data is read instead of just once as it emerges from the kernel. If 
instead the auparse user level code was folded into auditd which then 
became responsible for formatting the ad hoc data received from the 
kernel the final output from audit could be much more friendly and much 
of the rationale for auparse would evaporate.

-- 
John Dennis <jdennis@redhat.com>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-09-30 19:18 ` John Dennis
@ 2008-09-30 22:18   ` Matthew Booth
  2008-10-01 13:15   ` Eric Paris
  1 sibling, 0 replies; 11+ messages in thread
From: Matthew Booth @ 2008-09-30 22:18 UTC (permalink / raw)
  To: John Dennis; +Cc: linux-audit

John Dennis wrote:
> I also agree the data stream which emerges from audit is rather 
> difficult to work with. Eric likes to point out we can't change the 
> kernel, so maybe what we really need (and has been proposed) is for 
> auditd to reformat the data before emitting it or writing it do disk 
> (e.g. assemble records into events, decode strings which have been 
> hexified, etc.) Currently auparse is responsible for much of this as 
> part of a post processing step which has to be repeated every time audit 
> data is read instead of just once as it emerges from the kernel. If 
> instead the auparse user level code was folded into auditd which then 
> became responsible for formatting the ad hoc data received from the 
> kernel the final output from audit could be much more friendly and much 
> of the rationale for auparse would evaporate.

I was going to request going the other way with libauparse, i.e. to 
entirely separate it from auditd. As I mentioned, I'm not using auditd 
because it wasn't really written with my customer's requirements in mind 
(high volume, no local storage). My audit daemon needs to run on RHEL 3 
(it has a LAuS backend too) and RHEL 4. I don't see anything 
architecturally which ties libauparse to auditd, so if it was a separate 
library I could recompile it for RHEL 4 without replacing the RHEL 4 
audit-libs, etc. I can certainly see the efficiency in auditd parsing 
data before handing it off to dispatchers, but it's not hard to 
construct non-auditd uses for it either. Of course, it would need some 
performance work first for my use case, but I wouldn't want to duplicate 
the effort unnecessarily.

On the more general topic of the format of data emitted by the kernel, I 
see 2 serious threads of problem presented by the above, and by the 
current solution (even though they are currently the most pragmatic):

1. libauparse only exists to reverse engineer a really bad protocol.
2. The existing protocol has already broken userspace many times.

On that second point, the changes since the protocol was introduced 
(pre-git history, so I can't work out when) have been such that any tool 
written at the time of 2.6.12 couldn't possibly expect to continue to 
function correctly if you updated the kernel underneath it. Some examples:

bccf6ae083318ea08094d6ab185fdf7c49906b3a
"audit_rate_limit=%d old=%d by auid %u" -> "audit_rate_limit=%d old=%d 
by auid=%u"

9e45eeac867d51ff3395dcf3d7aedf5ac2812c8
Add escaping to comm field

a6c043a887a9db32a545539426ddfc8cc2c28f8f
Add tty field without quotes or escaping of value

ac03221a4fdda9bfdabf99bcd129847f20fc1d80
Remove qbytes field from IPC record
Change iuid, igid field names

5b9a4262232d632c28990fcdf4f36d0e0ade5f18
Convert some hex IPC records to octal

de6bbd1d30e5912620d25dd15e3f180ac7f9fcef
Change to format of EXECVE messages

Auditd only continues to function because it has been updated in step 
with the kernel: it is 'special'. Upstream's opinion on this is fairly 
clear. Note this isn't an argument in favour of a binary format 
specifically (although I favour that for efficiency), but it does 
highlight the requirement for a new, well-designed format.

Matt
-- 
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services

M:       +44 (0)7977 267231
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-09-30 19:18 ` John Dennis
  2008-09-30 22:18   ` Matthew Booth
@ 2008-10-01 13:15   ` Eric Paris
  2008-10-01 16:08     ` Matthew Booth
  2008-10-01 18:38     ` Paul Moore
  1 sibling, 2 replies; 11+ messages in thread
From: Eric Paris @ 2008-10-01 13:15 UTC (permalink / raw)
  To: John Dennis; +Cc: linux-audit

On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
> Eric likes to point out we can't change the 
> kernel

Close, but not quite.  I say we can't change the kernel without complete
backwards compatibility.  Show me the right solution and we can get
there, we just can't throw away what's already there.

-Eric

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-10-01 13:15   ` Eric Paris
@ 2008-10-01 16:08     ` Matthew Booth
  2008-10-01 18:46       ` Steve Grubb
  2008-10-01 18:38     ` Paul Moore
  1 sibling, 1 reply; 11+ messages in thread
From: Matthew Booth @ 2008-10-01 16:08 UTC (permalink / raw)
  To: Eric Paris; +Cc: linux-audit

Eric Paris wrote:
> On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
>> Eric likes to point out we can't change the 
>> kernel
> 
> Close, but not quite.  I say we can't change the kernel without complete
> backwards compatibility.  Show me the right solution and we can get
> there, we just can't throw away what's already there.

My other mail listed 6 ways in which audit *has already broken* 
userspace through non-backwards compatibility. That list came from a 
very quick search, and were only the changes which unarguably broke 
userspace. There are undoubtedly far more. If you look at those 6 
changes, each has been a genuine improvement but broke userspace 
nonetheless. The situation is still very messy, and this will continue 
to happen because the protocol has evolved organically rather than 
through deliberate design, and was not designed for extensibility.

The next time somebody suggests breaking userspace you could take the 
opportunity to implement a new protocol instead. The current protocol 
could be frozen, and the new protocol implemented in parallel. It seems 
to me that the biggest chunk of work to do this would be in the protocol 
design. As the same data will likely be output in the same places, most 
of the coding should be donkey work to change the format. As far as 
kernel infrastructure changes go, this wouldn't be a big one.

Matt
-- 
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services

M:       +44 (0)7977 267231
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-10-01 13:15   ` Eric Paris
  2008-10-01 16:08     ` Matthew Booth
@ 2008-10-01 18:38     ` Paul Moore
  2008-10-01 19:20       ` LC Bruzenak
  1 sibling, 1 reply; 11+ messages in thread
From: Paul Moore @ 2008-10-01 18:38 UTC (permalink / raw)
  To: linux-audit

On Wednesday 01 October 2008 9:15:27 am Eric Paris wrote:
> On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
> > Eric likes to point out we can't change the
> > kernel
>
> Close, but not quite.  I say we can't change the kernel without
> complete backwards compatibility.  Show me the right solution and we
> can get there, we just can't throw away what's already there.

Not really aimed at anyone in particular, just throwing out a possible 
solution ...

 1. By default kernel starts up and emits existing string format, legacy
    audit daemons function normally
 2. If a new audit daemon starts it sends a message to the kernel
    indicating that it can handle the new format and the kernel starts
    emitting newly formatted records[1]
 3. The new audit daemon records the audit records in whatever format it
    is configured to so: legacy string format, raw binary format, and/or
    some wacky format yet to be invented[2]

[1] The new record format should probably a binary format which makes 
use of netlink attributes, this would avoid much of the string parsing 
and versioning problems we have seen previously.  There is ample 
evidence of kernel subsystems using netlink in a similar fashion 
successfully.

[2] If done carefully, we might be able to allow administrators to 
create their own on-disk string formats without the need to write an 
entire dispatcher plug-in.

-- 
paul moore
linux @ hp

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-10-01 16:08     ` Matthew Booth
@ 2008-10-01 18:46       ` Steve Grubb
  0 siblings, 0 replies; 11+ messages in thread
From: Steve Grubb @ 2008-10-01 18:46 UTC (permalink / raw)
  To: linux-audit

On Wednesday 01 October 2008 12:08:44 Matthew Booth wrote:
> > Close, but not quite.  I say we can't change the kernel without complete
> > backwards compatibility.  Show me the right solution and we can get
> > there, we just can't throw away what's already there.
>
> My other mail listed 6 ways in which audit *has already broken*
> userspace through non-backwards compatibility.

Are they verified broken or just that something changed? Backwards 
compatibility was worked in wherever possible.

>The situation is still very messy, and this will continue to happen because
>the protocol has evolved organically rather than  through deliberate design,
>and was not designed for extensibility.

There was a deliberate design. Compactness and extensibility are sometimes at 
odds, though. But this is straying way away from your original post about 
performance improvements - which I would find to be topic worth talking 
about. I will not participate in any rehash of past discussions about parsing 
or representation of data.

-Steve

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-10-01 18:38     ` Paul Moore
@ 2008-10-01 19:20       ` LC Bruzenak
  2008-10-01 19:31         ` LC Bruzenak
  2008-10-01 21:19         ` Paul Moore
  0 siblings, 2 replies; 11+ messages in thread
From: LC Bruzenak @ 2008-10-01 19:20 UTC (permalink / raw)
  To: linux-audit


On Wed, 2008-10-01 at 14:38 -0400, Paul Moore wrote:
> On Wednesday 01 October 2008 9:15:27 am Eric Paris wrote:
> > On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
> > > Eric likes to point out we can't change the
> > > kernel
> >
> > Close, but not quite.  I say we can't change the kernel without
> > complete backwards compatibility.  Show me the right solution and we
> > can get there, we just can't throw away what's already there.
> 
> Not really aimed at anyone in particular, just throwing out a possible 
> solution ...
> 
>  1. By default kernel starts up and emits existing string format, legacy
>     audit daemons function normally
>  2. If a new audit daemon starts it sends a message to the kernel
>     indicating that it can handle the new format and the kernel starts
>     emitting newly formatted records[1]
>  3. The new audit daemon records the audit records in whatever format it
>     is configured to so: legacy string format, raw binary format, and/or
>     some wacky format yet to be invented[2]
> 
> [1] The new record format should probably a binary format which makes 
> use of netlink attributes, this would avoid much of the string parsing 
> and versioning problems we have seen previously.  There is ample 
> evidence of kernel subsystems using netlink in a similar fashion 
> successfully.
> 
> [2] If done carefully, we might be able to allow administrators to 
> create their own on-disk string formats without the need to write an 
> entire dispatcher plug-in.
> 

This isn't a vote against (since I haven't fielded yet), but I could see
it could throw the user-space tools a curve (especially option [2])
regarding legacy data.
Might have to register the format spec inside the log file?

LCB.

-- 
LC (Lenny) Bruzenak
lenny@magitekltd.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-10-01 19:20       ` LC Bruzenak
@ 2008-10-01 19:31         ` LC Bruzenak
  2008-10-01 21:19         ` Paul Moore
  1 sibling, 0 replies; 11+ messages in thread
From: LC Bruzenak @ 2008-10-01 19:31 UTC (permalink / raw)
  To: linux-audit

Oh, another thing which would (potentially) get harder is aggregation.
Since we have aggregated audit data sent from one audisp-remote to the
event loop of the aggregating auditd, both systems (kernels) would need
to be on the same data format page.
Otherwise, the formats would be interwoven in the same on-disk log.

LCB.

-- 
LC (Lenny) Bruzenak
lenny@magitekltd.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Performance of libauparse
  2008-10-01 19:20       ` LC Bruzenak
  2008-10-01 19:31         ` LC Bruzenak
@ 2008-10-01 21:19         ` Paul Moore
  1 sibling, 0 replies; 11+ messages in thread
From: Paul Moore @ 2008-10-01 21:19 UTC (permalink / raw)
  To: linux-audit

On Wednesday 01 October 2008 3:20:13 pm LC Bruzenak wrote:
> On Wed, 2008-10-01 at 14:38 -0400, Paul Moore wrote:
> > On Wednesday 01 October 2008 9:15:27 am Eric Paris wrote:
> > > On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
> > > > Eric likes to point out we can't change the
> > > > kernel
> > >
> > > Close, but not quite.  I say we can't change the kernel without
> > > complete backwards compatibility.  Show me the right solution and
> > > we can get there, we just can't throw away what's already there.
> >
> > Not really aimed at anyone in particular, just throwing out a
> > possible solution ...
> >
> >  1. By default kernel starts up and emits existing string format,
> > legacy audit daemons function normally
> >  2. If a new audit daemon starts it sends a message to the kernel
> >     indicating that it can handle the new format and the kernel
> > starts emitting newly formatted records[1]
> >  3. The new audit daemon records the audit records in whatever
> > format it is configured to so: legacy string format, raw binary
> > format, and/or some wacky format yet to be invented[2]
> >
> > [1] The new record format should probably a binary format which
> > makes use of netlink attributes, this would avoid much of the
> > string parsing and versioning problems we have seen previously. 
> > There is ample evidence of kernel subsystems using netlink in a
> > similar fashion successfully.
> >
> > [2] If done carefully, we might be able to allow administrators to
> > create their own on-disk string formats without the need to write
> > an entire dispatcher plug-in.
>
> This isn't a vote against (since I haven't fielded yet), but I could
> see it could throw the user-space tools a curve (especially option
> [2]) regarding legacy data.

Exactly why the system needs to be backwards compatible; you stick with 
the old format until your tools support a newer format.  However, if 
you are talking the base audit userspace tools, I would expect those to 
be updated in sync with an update audit daemon so it should be more of 
3rd party issue.

The nice bit about shipping binary events to userspace is you can create 
whatever format you want.  Ideally there would be a way for audit 
plugins to see the events in binary format which opens things up even 
more.

> Might have to register the format spec inside the log file?

Sure.  Why not.  Once you abstract the representation of the data from 
the data itself you can do whatever you want with much less fuss.  We 
can even stop worrying about all this string escaping sillyness that 
seems to never die.

[NOTE: I'm grabbing your other mail and replying to it here]

> Oh, another thing which would (potentially) get harder is
> aggregation. Since we have aggregated audit data sent from one
> audisp-remote to the event loop of the aggregating auditd, both
> systems (kernels) would need to be on the same data format page.

Yep.  Or you just pass around the binary format and you never have to 
worry about formatting, just new field types and honestly there are 
easy solutions around that as well.

-- 
paul moore
linux @ hp

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-10-01 21:19 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-30 13:34 Performance of libauparse Matthew Booth
2008-09-30 18:34 ` Steve Grubb
2008-09-30 19:18 ` John Dennis
2008-09-30 22:18   ` Matthew Booth
2008-10-01 13:15   ` Eric Paris
2008-10-01 16:08     ` Matthew Booth
2008-10-01 18:46       ` Steve Grubb
2008-10-01 18:38     ` Paul Moore
2008-10-01 19:20       ` LC Bruzenak
2008-10-01 19:31         ` LC Bruzenak
2008-10-01 21:19         ` Paul Moore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox