From: Steve Grubb
Subject: Re: Performance of libauparse
Date: Tue, 30 Sep 2008 14:34:06 -0400
Message-ID: <200809301434.07079.sgrubb@redhat.com>
In-Reply-To: <48E22AC8.8010906@redhat.com>
References: <48E22AC8.8010906@redhat.com>
To: linux-audit@redhat.com

On Tuesday 30 September 2008 09:34:00 Matthew Booth wrote:
> To measure the overhead of libauparse on austream, I initialised auparse
> as AUSOURCE_FEED, fed each received record into it, and spat them out
> unmodified on receiving the AUPARSE_CB_EVENT_READY event.

This is not an apples-to-apples comparison. libauparse fully parses each
field, so it is doing significantly more work. You could add a strtok_r loop
to your code to come closer to a direct comparison.

> This added more than an order of magnitude to the time austream spends in
> userspace. A brief look at this overhead shows that about 40% is spent
> in malloc()/free(),

Yep. The main idea was really just to get it working and then optimize in a
future release. The memory management is the low-hanging fruit. I've been
thinking of fixing it so that each field is not malloc'ed individually, but
rather string pointers and lengths are stored and used internally.

> and 25% is spent in strlen, strdup, memcpy, memmove and friends. I suspect
> that very substantial gains could be made in the performance of libauparse
> by reworking the way it uses memory, and passing the length of strings
> around with the strings. Unfortunately, I suspect this would amount to a
> substantial rewrite.

Possibly.
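[For illustration, the pointer-plus-length field representation described above could be sketched roughly like this. The struct and function names are hypothetical, not the actual libauparse internals:]

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-plus-length field view: instead of strdup'ing
 * every field out of the record buffer, keep a pointer into the buffer
 * plus a length. No per-field malloc, and no strlen on every access. */
struct field_view {
    const char *ptr;   /* points into the original record buffer */
    size_t len;        /* field length, carried along with the pointer */
};

/* Compare a view against a NUL-terminated string without copying. */
static int field_view_eq(const struct field_view *f, const char *s)
{
    size_t n = strlen(s);
    return f->len == n && memcmp(f->ptr, s, n) == 0;
}
```

Because the views only borrow from the record buffer, freeing an event would become a single free of the buffer rather than one free per field.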
But I wasn't planning to do this until after solving the interlaced record
problem. I'd rather it be the current speed and correct than faster and
still wrong.

> Is this something anybody else is interested in? I guess performance
> isn't so important if you're just scanning log files in non-real time.

Yes, after the next release. In the meantime, it might not hurt to add some
tests to the auparse_test programs so that any rewrite-induced regression
has a chance of being found.

> [1] What I'd really like is a well-defined binary format from the kernel.

Not likely to ever happen.

-Steve