* Performance of libauparse
@ 2008-09-30 13:34 Matthew Booth
2008-09-30 18:34 ` Steve Grubb
2008-09-30 19:18 ` John Dennis
0 siblings, 2 replies; 11+ messages in thread
From: Matthew Booth @ 2008-09-30 13:34 UTC (permalink / raw)
To: linux-audit
I have been investigating using libauparse in my austream replacement
audit daemon to do some inline data enhancement[1]. austream is
essentially a very thin wrapper which pulls audit records out of the
kernel, wraps them in a UDP syslog packet and sends them to the network.
It is very simple and very fast.
To measure the overhead of libauparse on austream I initialised auparse
as AUSOURCE_FEED, fed each received record into it, and spat them out
unmodified on receiving the AUPARSE_CB_EVENT_READY event. This added
more than an order of magnitude to the time austream spends in
userspace. A brief look at this overhead shows that about 40% is spent
in malloc()/free(), and 25% is spent in strlen, strdup, memcpy, memmove
and friends. I suspect that very substantial gains could be made in the
performance of libauparse by reworking the way it uses memory, and
passing the length of strings around with the strings. Unfortunately, I
suspect this would amount to a substantial rewrite.
Is this something anybody else is interested in? I guess performance
isn't so important if you're just scanning log files in non-real time.
Matt
[1] What I'd really like is a well-defined binary format from the kernel.
--
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services
M: +44 (0)7977 267231
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-09-30 13:34 Performance of libauparse Matthew Booth
@ 2008-09-30 18:34 ` Steve Grubb
2008-09-30 19:18 ` John Dennis
1 sibling, 0 replies; 11+ messages in thread
From: Steve Grubb @ 2008-09-30 18:34 UTC (permalink / raw)
To: linux-audit
On Tuesday 30 September 2008 09:34:00 Matthew Booth wrote:
> To measure the overhead of libauparse on austream I initialised auparse
> as AUSOURCE_FEED, fed each received record into it, and spat them out
> unmodified on receiving the AUPARSE_CB_EVENT_READY event.
This is not an apples to apples comparison. libauparse fully parses each
field. So, it is doing significantly more work. You could add a strtok_r loop
to your code to come closer to a direct comparison.
> This added more than an order of magnitude to the time austream spends in
> userspace. A brief look at this overhead shows that about 40% is spent
> in malloc()/free(),
Yep. The main idea was really to just get it working and then optimize in a
future release. The memory management is the low hanging fruit. I've been
thinking to fix it so that each field is not malloc'ed individually, that
there are string pointers and lengths stored and used internally.
> and 25% is spent in strlen, strdup, memcpy, memmove and friends. I suspect
> that very substantial gains could be made in the performance of libauparse
> by reworking the way it uses memory, and passing the length of strings
> around with the strings. Unfortunately, I suspect this would amount to a
> substantial rewrite.
Possibly. But I wasn't planning to do this until after solving the interlaced
record problem. I'd rather it be the current speed and correct than faster
and still wrong.
> Is this something anybody else is interested in? I guess performance
> isn't so important if you're just scanning log files in non-real time.
Yes, after the next release. In the mean time it might not hurt to add some
tests to the auparse_test programs so that any re-write induced regression
has a chance of being found.
> [1] What I'd really like is a well-defined binary format from the kernel.
Not likely to ever happen.
-Steve
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-09-30 13:34 Performance of libauparse Matthew Booth
2008-09-30 18:34 ` Steve Grubb
@ 2008-09-30 19:18 ` John Dennis
2008-09-30 22:18 ` Matthew Booth
2008-10-01 13:15 ` Eric Paris
1 sibling, 2 replies; 11+ messages in thread
From: John Dennis @ 2008-09-30 19:18 UTC (permalink / raw)
To: Matthew Booth; +Cc: linux-audit
Matthew Booth wrote:
> I have been investigating using libauparse in my austream replacement
> audit daemon to do some inline data enhancement[1]. austream is
> essentially a very thin wrapper which pulls audit records out of the
> kernel, wraps them in a UDP syslog packet and sends them to the
> network. It is very simple and very fast.
>
> To measure the overhead of libauparse on austream I initialised
> auparse as AUSOURCE_FEED, fed each received record into it, and spat
> them out unmodified on receiving the AUPARSE_CB_EVENT_READY event.
> This added more than an order of magnitude to the time austream spends
> in userspace. A brief look at this overhead shows that about 40% is
> spent in malloc()/free(), and 25% is spent in strlen, strdup, memcpy,
> memmove and friends. I suspect that very substantial gains could be
> made in the performance of libauparse by reworking the way it uses
> memory, and passing the length of strings around with the strings.
> Unfortunately, I suspect this would amount to a substantial rewrite.
>
> Is this something anybody else is interested in? I guess performance
> isn't so important if you're just scanning log files in non-real time.
>
> Matt
>
> [1] What I'd really like is a well-defined binary format from the kernel.
auparse is very inefficient in how it handles data. I noticed this when
reading the source code and mentioned to Steve the large number of
strdup's used to assemble an event. This is compounded by the fact the
processed data only persists for the period of time the record/event is
current. I am not surprised to see that profiling reveals a significant
proportion of time is spent repeatedly creating and destroying strings.
I also agree the data stream which emerges from audit is rather
difficult to work with. Eric likes to point out we can't change the
kernel, so maybe what we really need (and has been proposed) is for
auditd to reformat the data before emitting it or writing it do disk
(e.g. assemble records into events, decode strings which have been
hexified, etc.) Currently auparse is responsible for much of this as
part of a post processing step which has to be repeated every time audit
data is read instead of just once as it emerges from the kernel. If
instead the auparse user level code was folded into auditd which then
became responsible for formatting the ad hoc data received from the
kernel the final output from audit could be much more friendly and much
of the rationale for auparse would evaporate.
--
John Dennis <jdennis@redhat.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-09-30 19:18 ` John Dennis
@ 2008-09-30 22:18 ` Matthew Booth
2008-10-01 13:15 ` Eric Paris
1 sibling, 0 replies; 11+ messages in thread
From: Matthew Booth @ 2008-09-30 22:18 UTC (permalink / raw)
To: John Dennis; +Cc: linux-audit
John Dennis wrote:
> I also agree the data stream which emerges from audit is rather
> difficult to work with. Eric likes to point out we can't change the
> kernel, so maybe what we really need (and has been proposed) is for
> auditd to reformat the data before emitting it or writing it do disk
> (e.g. assemble records into events, decode strings which have been
> hexified, etc.) Currently auparse is responsible for much of this as
> part of a post processing step which has to be repeated every time audit
> data is read instead of just once as it emerges from the kernel. If
> instead the auparse user level code was folded into auditd which then
> became responsible for formatting the ad hoc data received from the
> kernel the final output from audit could be much more friendly and much
> of the rationale for auparse would evaporate.
I was going to request going the other way with libauparse, i.e. to
entirely separate it from auditd. As I mentioned, I'm not using auditd
because it wasn't really written with my customer's requirements in mind
(high volume, no local storage). My audit daemon needs to run on RHEL 3
(it has a LAuS backend too) and RHEL 4. I don't see anything
architecturally which ties libauparse to auditd, so if it was a separate
library I could recompile it for RHEL 4 without replacing the RHEL 4
audit-libs, etc. I can certainly see the efficiency in auditd parsing
data before handing it off to dispatchers, but it's not hard to
construct non-auditd uses for it either. Of course, it would need some
performance work first for my use case, but I wouldn't want to duplicate
the effort unnecessarily.
On the more general topic of the format of data emitted by the kernel, I
see 2 serious threads of problem presented by the above, and by the
current solution (even though they are currently the most pragmatic):
1. libauparse only exists to reverse engineer a really bad protocol.
2. The existing protocol has already broken userspace many times.
On that second point, the changes since the protocol was introduced
(pre-git history, so I can't work out when) have been such that any tool
written at the time of 2.6.12 couldn't possibly expect to continue to
function correctly if you updated the kernel underneath it. Some examples:
bccf6ae083318ea08094d6ab185fdf7c49906b3a
"audit_rate_limit=%d old=%d by auid %u" -> "audit_rate_limit=%d old=%d
by auid=%u"
9e45eeac867d51ff3395dcf3d7aedf5ac2812c8
Add escaping to comm field
a6c043a887a9db32a545539426ddfc8cc2c28f8f
Add tty field without quotes or escaping of value
ac03221a4fdda9bfdabf99bcd129847f20fc1d80
Remove qbytes field from IPC record
Change iuid, igid field names
5b9a4262232d632c28990fcdf4f36d0e0ade5f18
Convert some hex IPC records to octal
de6bbd1d30e5912620d25dd15e3f180ac7f9fcef
Change to format of EXECVE messages
Auditd only continues to function because it has been updated in step
with the kernel: it is 'special'. Upstream's opinion on this is fairly
clear. Note this isn't an argument in favour of a binary format
specifically (although I favour that for efficiency), but it does
highlight the requirement for a new, well-designed format.
Matt
--
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services
M: +44 (0)7977 267231
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-09-30 19:18 ` John Dennis
2008-09-30 22:18 ` Matthew Booth
@ 2008-10-01 13:15 ` Eric Paris
2008-10-01 16:08 ` Matthew Booth
2008-10-01 18:38 ` Paul Moore
1 sibling, 2 replies; 11+ messages in thread
From: Eric Paris @ 2008-10-01 13:15 UTC (permalink / raw)
To: John Dennis; +Cc: linux-audit
On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
> Eric likes to point out we can't change the
> kernel
Close, but not quite. I say we can't change the kernel without complete
backwards compatibility. Show me the right solution and we can get
there, we just can't throw away what's already there.
-Eric
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-10-01 13:15 ` Eric Paris
@ 2008-10-01 16:08 ` Matthew Booth
2008-10-01 18:46 ` Steve Grubb
2008-10-01 18:38 ` Paul Moore
1 sibling, 1 reply; 11+ messages in thread
From: Matthew Booth @ 2008-10-01 16:08 UTC (permalink / raw)
To: Eric Paris; +Cc: linux-audit
Eric Paris wrote:
> On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
>> Eric likes to point out we can't change the
>> kernel
>
> Close, but not quite. I say we can't change the kernel without complete
> backwards compatibility. Show me the right solution and we can get
> there, we just can't throw away what's already there.
My other mail listed 6 ways in which audit *has already broken*
userspace through non-backwards compatibility. That list came from a
very quick search, and were only the changes which unarguably broke
userspace. There are undoubtedly far more. If you look at those 6
changes, each has been a genuine improvement but broke userspace
nonetheless. The situation is still very messy, and this will continue
to happen because the protocol has evolved organically rather than
through deliberate design, and was not designed for extensibility.
The next time somebody suggests breaking userspace you could take the
opportunity to implement a new protocol instead. The current protocol
could be frozen, and the new protocol implemented in parallel. It seems
to me that the biggest chunk of work to do this would be in the protocol
design. As the same data will likely be output in the same places, most
of the coding should be donkey work to change the format. As far as
kernel infrastructure changes go, this wouldn't be a big one.
Matt
--
Matthew Booth, RHCA, RHCSS
Red Hat, Global Professional Services
M: +44 (0)7977 267231
GPG ID: D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-10-01 13:15 ` Eric Paris
2008-10-01 16:08 ` Matthew Booth
@ 2008-10-01 18:38 ` Paul Moore
2008-10-01 19:20 ` LC Bruzenak
1 sibling, 1 reply; 11+ messages in thread
From: Paul Moore @ 2008-10-01 18:38 UTC (permalink / raw)
To: linux-audit
On Wednesday 01 October 2008 9:15:27 am Eric Paris wrote:
> On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
> > Eric likes to point out we can't change the
> > kernel
>
> Close, but not quite. I say we can't change the kernel without
> complete backwards compatibility. Show me the right solution and we
> can get there, we just can't throw away what's already there.
Not really aimed at anyone in particular, just throwing out a possible
solution ...
1. By default kernel starts up and emits existing string format, legacy
audit daemons function normally
2. If a new audit daemon starts it sends a message to the kernel
indicating that it can handle the new format and the kernel starts
emitting newly formatted records[1]
3. The new audit daemon records the audit records in whatever format it
is configured to so: legacy string format, raw binary format, and/or
some wacky format yet to be invented[2]
[1] The new record format should probably a binary format which makes
use of netlink attributes, this would avoid much of the string parsing
and versioning problems we have seen previously. There is ample
evidence of kernel subsystems using netlink in a similar fashion
successfully.
[2] If done carefully, we might be able to allow administrators to
create their own on-disk string formats without the need to write an
entire dispatcher plug-in.
--
paul moore
linux @ hp
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-10-01 16:08 ` Matthew Booth
@ 2008-10-01 18:46 ` Steve Grubb
0 siblings, 0 replies; 11+ messages in thread
From: Steve Grubb @ 2008-10-01 18:46 UTC (permalink / raw)
To: linux-audit
On Wednesday 01 October 2008 12:08:44 Matthew Booth wrote:
> > Close, but not quite. I say we can't change the kernel without complete
> > backwards compatibility. Show me the right solution and we can get
> > there, we just can't throw away what's already there.
>
> My other mail listed 6 ways in which audit *has already broken*
> userspace through non-backwards compatibility.
Are they verified broken or just that something changed? Backwards
compatibility was worked in wherever possible.
>The situation is still very messy, and this will continue to happen because
>the protocol has evolved organically rather than through deliberate design,
>and was not designed for extensibility.
There was a deliberate design. Compactness and extensibility are sometimes at
odds, though. But this is straying way away from your original post about
performance improvements - which I would find to be topic worth talking
about. I will not participate in any rehash of past discussions about parsing
or representation of data.
-Steve
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-10-01 18:38 ` Paul Moore
@ 2008-10-01 19:20 ` LC Bruzenak
2008-10-01 19:31 ` LC Bruzenak
2008-10-01 21:19 ` Paul Moore
0 siblings, 2 replies; 11+ messages in thread
From: LC Bruzenak @ 2008-10-01 19:20 UTC (permalink / raw)
To: linux-audit
On Wed, 2008-10-01 at 14:38 -0400, Paul Moore wrote:
> On Wednesday 01 October 2008 9:15:27 am Eric Paris wrote:
> > On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
> > > Eric likes to point out we can't change the
> > > kernel
> >
> > Close, but not quite. I say we can't change the kernel without
> > complete backwards compatibility. Show me the right solution and we
> > can get there, we just can't throw away what's already there.
>
> Not really aimed at anyone in particular, just throwing out a possible
> solution ...
>
> 1. By default kernel starts up and emits existing string format, legacy
> audit daemons function normally
> 2. If a new audit daemon starts it sends a message to the kernel
> indicating that it can handle the new format and the kernel starts
> emitting newly formatted records[1]
> 3. The new audit daemon records the audit records in whatever format it
> is configured to so: legacy string format, raw binary format, and/or
> some wacky format yet to be invented[2]
>
> [1] The new record format should probably a binary format which makes
> use of netlink attributes, this would avoid much of the string parsing
> and versioning problems we have seen previously. There is ample
> evidence of kernel subsystems using netlink in a similar fashion
> successfully.
>
> [2] If done carefully, we might be able to allow administrators to
> create their own on-disk string formats without the need to write an
> entire dispatcher plug-in.
>
This isn't a vote against (since I haven't fielded yet), but I could see
it could throw the user-space tools a curve (especially option [2])
regarding legacy data.
Might have to register the format spec inside the log file?
LCB.
--
LC (Lenny) Bruzenak
lenny@magitekltd.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-10-01 19:20 ` LC Bruzenak
@ 2008-10-01 19:31 ` LC Bruzenak
2008-10-01 21:19 ` Paul Moore
1 sibling, 0 replies; 11+ messages in thread
From: LC Bruzenak @ 2008-10-01 19:31 UTC (permalink / raw)
To: linux-audit
Oh, another thing which would (potentially) get harder is aggregation.
Since we have aggregated audit data sent from one audisp-remote to the
event loop of the aggregating auditd, both systems (kernels) would need
to be on the same data format page.
Otherwise, the formats would be interwoven in the same on-disk log.
LCB.
--
LC (Lenny) Bruzenak
lenny@magitekltd.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Performance of libauparse
2008-10-01 19:20 ` LC Bruzenak
2008-10-01 19:31 ` LC Bruzenak
@ 2008-10-01 21:19 ` Paul Moore
1 sibling, 0 replies; 11+ messages in thread
From: Paul Moore @ 2008-10-01 21:19 UTC (permalink / raw)
To: linux-audit
On Wednesday 01 October 2008 3:20:13 pm LC Bruzenak wrote:
> On Wed, 2008-10-01 at 14:38 -0400, Paul Moore wrote:
> > On Wednesday 01 October 2008 9:15:27 am Eric Paris wrote:
> > > On Tue, 2008-09-30 at 15:18 -0400, John Dennis wrote:
> > > > Eric likes to point out we can't change the
> > > > kernel
> > >
> > > Close, but not quite. I say we can't change the kernel without
> > > complete backwards compatibility. Show me the right solution and
> > > we can get there, we just can't throw away what's already there.
> >
> > Not really aimed at anyone in particular, just throwing out a
> > possible solution ...
> >
> > 1. By default kernel starts up and emits existing string format,
> > legacy audit daemons function normally
> > 2. If a new audit daemon starts it sends a message to the kernel
> > indicating that it can handle the new format and the kernel
> > starts emitting newly formatted records[1]
> > 3. The new audit daemon records the audit records in whatever
> > format it is configured to so: legacy string format, raw binary
> > format, and/or some wacky format yet to be invented[2]
> >
> > [1] The new record format should probably a binary format which
> > makes use of netlink attributes, this would avoid much of the
> > string parsing and versioning problems we have seen previously.
> > There is ample evidence of kernel subsystems using netlink in a
> > similar fashion successfully.
> >
> > [2] If done carefully, we might be able to allow administrators to
> > create their own on-disk string formats without the need to write
> > an entire dispatcher plug-in.
>
> This isn't a vote against (since I haven't fielded yet), but I could
> see it could throw the user-space tools a curve (especially option
> [2]) regarding legacy data.
Exactly why the system needs to be backwards compatible; you stick with
the old format until your tools support a newer format. However, if
you are talking the base audit userspace tools, I would expect those to
be updated in sync with an update audit daemon so it should be more of
3rd party issue.
The nice bit about shipping binary events to userspace is you can create
whatever format you want. Ideally there would be a way for audit
plugins to see the events in binary format which opens things up even
more.
> Might have to register the format spec inside the log file?
Sure. Why not. Once you abstract the representation of the data from
the data itself you can do whatever you want with much less fuss. We
can even stop worrying about all this string escaping sillyness that
seems to never die.
[NOTE: I'm grabbing your other mail and replying to it here]
> Oh, another thing which would (potentially) get harder is
> aggregation. Since we have aggregated audit data sent from one
> audisp-remote to the event loop of the aggregating auditd, both
> systems (kernels) would need to be on the same data format page.
Yep. Or you just pass around the binary format and you never have to
worry about formatting, just new field types and honestly there are
easy solutions around that as well.
--
paul moore
linux @ hp
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2008-10-01 21:19 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-09-30 13:34 Performance of libauparse Matthew Booth
2008-09-30 18:34 ` Steve Grubb
2008-09-30 19:18 ` John Dennis
2008-09-30 22:18 ` Matthew Booth
2008-10-01 13:15 ` Eric Paris
2008-10-01 16:08 ` Matthew Booth
2008-10-01 18:46 ` Steve Grubb
2008-10-01 18:38 ` Paul Moore
2008-10-01 19:20 ` LC Bruzenak
2008-10-01 19:31 ` LC Bruzenak
2008-10-01 21:19 ` Paul Moore
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox