public inbox for linux-audit@redhat.com
* aggregation/viewer question
@ 2008-10-13 17:34 LC Bruzenak
  2008-10-13 20:24 ` John Dennis
  0 siblings, 1 reply; 2+ messages in thread
From: LC Bruzenak @ 2008-10-13 17:34 UTC (permalink / raw)
  To: Linux Audit

Has anyone been thinking about how to store/maintain the aggregated
audit data long-term?

In my setup, I will be sending data from several machines to one central
log host.

After a while, the volume of log data will grow large. With hundreds of
files, rotation takes more time and the audit-viewer "select
source" option becomes tedious. Most of my searches involve
time/host/user. Using the prelude plugin helps a lot, because it
highlights what is otherwise hidden in the data pool. But pulling a
given record out of a selection of log files isn't currently intuitive.

I would think we'd put these into an RDB or structure them in a
time-based directory hierarchy, something like year/month/week ... or maybe
something else entirely. I'm also thinking about ease of backup/restore
with incoming records. I'd hate to shut down all the sending clients
just to backup or restore my audit data, so that part will need to
operate asynchronously.
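The time-based layout above might look something like the following sketch
(the path scheme and function name here are hypothetical, just one way to
map a record's timestamp onto year/month/week directories):

```python
from datetime import datetime, timezone
from pathlib import Path

def log_path(base: str, ts: float) -> Path:
    """Map a record's epoch timestamp to a year/month/week directory.

    Hypothetical layout: the ISO week number is used for the 'week'
    level so each directory holds at most seven days of logs.
    """
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    year, week, _ = dt.isocalendar()
    return Path(base) / f"{year:04d}" / f"{dt.month:02d}" / f"week-{week:02d}"

# A record from 2008-10-13 17:34 UTC lands in .../2008/10/week-42
print(log_path("/var/log/audit-archive", 1223919240.0))
```

With a scheme like this, backup and archival can operate on closed
week directories while new records keep flowing into the current one,
which avoids having to quiesce the sending clients.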

Before striking out on my own I thought I'd ask the list and see if
there are any such plans already in the works.

As a suggestion, the prewikka viewer seems like a workable model. I
realize that viewer is built around the IDS structure, but as an event
search tool it is pretty good and mostly complete. Having network access
to it is also a nice feature.

So right now I think that feeding the events into a DB and then using a
tool with the same capabilities as are in the prewikka viewer would be a
viable option. Others? Ideas?

Thanks in advance,
LCB.

-- 
LC (Lenny) Bruzenak
lenny@magitekltd.com


* Re: aggregation/viewer question
  2008-10-13 17:34 aggregation/viewer question LC Bruzenak
@ 2008-10-13 20:24 ` John Dennis
  0 siblings, 0 replies; 2+ messages in thread
From: John Dennis @ 2008-10-13 20:24 UTC (permalink / raw)
  To: LC Bruzenak; +Cc: Linux Audit

LC Bruzenak wrote:
> Has anyone been thinking about how to store/maintain the aggregated
> audit data long-term?
>
> In my setup, I will be sending data from several machines to one central
> log host.
>
> After a while, the number of logs/data will grow large. With hundreds of
> files, the rotate will take more time and the audit-viewer "select
> source" option becomes tedious. Most of my searches involve
> time/host/user. Using the prelude plugin helps a lot, because it
> highlights what is otherwise hidden in the data pool. But pulling out
> that record from a selection of log files isn't currently intuitive.
>
> I would think we'd put these into a RDB or structure them by time
> directory structure something like year/month/week ... or maybe
> something else entirely. I'm thinking also about ease of backup/restore
> with incoming records. I'd hate to shut down all the sending clients
> just to backup or restore my audit data, so that part will need to
> operate asynchronously.
>
> Before striking out on my own I thought I'd ask the list and see if
> there are any such plans already in the works.
>   

Yes, we plan on addressing many of these issues in IPA, not just for 
kernel audit data but for all log data (e.g. Apache error logs, Kerberos 
access logs, SMTP logs, etc.). The basic idea is that there will be a 
central server which accepts log data from individual nodes. The log 
data can be signed for authenticity and will be robustly transported via 
AMQP with failover and guaranteed delivery. The log data will be 
compressed. You can specify which logs you want collected and their 
collection interval, along with record-level filtering.

Once on the server, the log metadata is entered into a "catalogue" (a 
relational database) which, along with the metadata, stores where the 
actual log data can be found on disk. The disk files will be optimized 
for compression and access. The catalogue manager will be able to 
reconstruct any portion of a log file (stream) from any node within a 
time interval. This can be used for external analysis tools, compliance 
reporting, etc. The catalogue will be capable of intelligently archiving 
old log data and restoring it back into a "live catalogue". This is what 
is planned for v2 of IPA, which is anticipated to be about 1 year from 
now.

In v3 of IPA the audit catalogue will support search and reporting 
on *all* the log data in the catalogue (not just audit.log but all log 
data). In v3, when data arrives at the catalogue it will be indexed for 
fast search and retrieval. Search will be based on tokens and key/value 
pairs and will accept constraints on nodes, time intervals, users, etc. 
(Note: a relational database will NOT be used to support searching; 
rather, searches will be performed via optimized reverse indexes on 
textual tokens. The RDB will only be used for managing the collection 
of log files.)
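To make the reverse-index idea concrete, here is a toy sketch (all names
are hypothetical; this is not the planned IPA implementation). Tokens and
key=value pairs both map to the set of record ids that contain them, and a
search intersects the postings for its terms:

```python
from collections import defaultdict

class ReverseIndex:
    """Toy inverted index over textual tokens (hypothetical sketch).

    Each token, including key=value pairs like uid=0, maps to the set
    of record ids whose text contains it.
    """
    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, record_id, text):
        # Whitespace tokenization; key=value pairs are indexed whole,
        # so searches can constrain on e.g. uid=0 or node=loghost1.
        for token in text.lower().split():
            self.postings[token].add(record_id)

    def search(self, *tokens):
        """Return ids of records containing all given tokens (AND)."""
        sets = [self.postings.get(t.lower(), set()) for t in tokens]
        return set.intersection(*sets) if sets else set()

idx = ReverseIndex()
idx.add(1, "node=loghost1 uid=0 type=USER_LOGIN")
idx.add(2, "node=loghost2 uid=500 type=USER_LOGIN")
print(idx.search("type=user_login", "uid=0"))  # -> {1}
```

The point of keeping the RDB out of the search path is that postings
lists like these answer conjunctive token queries directly, without
joins over a large relational record table.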

A note about vocabulary: in "IPA land" when we say "audit data" or an 
"audit catalogue" or "audit search" the term "audit" refers to any log 
data, of which kernel audit data is just one subset.
> As a suggestion, the prewikka viewer seems like a workable model. I
> realize that viewer is built around the IDS structure, but as an event
> search tool it is pretty good and mostly complete. Having network access
> to it is also a nice feature.
>
> So right now I think that feeding the events into a DB and then using a
> tool with the same capabilities as are in the prewikka viewer would be a
> viable option. Others? Ideas?
>
> Thanks in advance,
> LCB.
>
>   


-- 
John Dennis <jdennis@redhat.com>

