* Filesystem access statistics
From: Rudi Chiarito @ 2006-04-12 15:23 UTC
To: linux-audit
Hi,
I subscribed to the list after checking with Steve that it was not an
outlandish place to ask my questions.
I need to look at a portion of the filesystem namespace and maintain
aggregate statistics on access patterns. In other words, I have a large
filesystem and would like to find out which are the hot spots. I don't
need to keep track of every single file access: since the file count is
in the order of millions, that would swamp the actual I/O, the
analysis and the people looking at the final data. It would make sense
to just group accesses by looking at the top N levels (anything
accessed at levels N+1, N+2, etc. would be coalesced into the parent
directory at level N).
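Collapsing an access into its level-N ancestor is a small string operation. The following C sketch shows one way to do it; the function name truncate_to_depth() is hypothetical, and it assumes absolute paths as they would appear in audit PATH records (it does not normalize "." or ".." components).

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the first `depth` components of `path` into `out`, so that an
 * access anywhere below level `depth` is attributed to its ancestor at
 * that level. Returns 0 on success, -1 if `out` is too small.
 */
int truncate_to_depth(const char *path, int depth, char *out, size_t outlen)
{
    size_t i = 0, end = 0;   /* end: one past the last kept component */
    int level = 0;

    assert(path != NULL && out != NULL);

    while (level < depth) {
        while (path[i] == '/')          /* skip separator(s) */
            i++;
        if (path[i] == '\0')            /* path is shallower than depth */
            break;
        while (path[i] != '\0' && path[i] != '/')
            i++;                        /* consume one component */
        end = i;
        level++;
    }

    if (end == 0) {                     /* depth 0, or path is just "/" */
        if (outlen < 2)
            return -1;
        out[0] = '/';
        out[1] = '\0';
        return 0;
    }
    if (end + 1 > outlen)               /* no room for the terminator */
        return -1;
    memcpy(out, path, end);
    out[end] = '\0';
    return 0;
}
```

With depth 2, an access to /data/proj/src/main.c is counted against /data/proj; paths shallower than the requested depth are kept as they are (minus any trailing slash).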
I think that I can't be the only one with such a need. In my case, the
information is going to be used to change the way the tree is going to
be laid out in the future, as well as determining when parts of it can
be made read-only (after an inactivity period). I can also see the
information being useful for selective incremental backups - just look
at the hot spots - or for smarter ordering during a disaster recovery
restore (if you're recovering from random access storage, not tape).
Maybe even locate/slocate/rlocate/mlocate could take advantage of it.
What would be the best approach to this? Inotify doesn't seem to cut it,
because it can't handle recursive watches, and I can't afford to place
watches all over the place. Given the sheer number of operations being
tracked, it looks like I'd need some custom code that audits all
file/directory operations, determines if there's a match (I'm only
interested in a specific tree, not everything under /), increments
internal counters and throws the event away. Is there code I could look
at for ideas?
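The "is this in my tree" check can be a prefix comparison that respects component boundaries, so that /data2 is not mistaken for a subdirectory of /data. This is a minimal sketch with a hypothetical is_under_tree() helper; it assumes both paths are already absolute and normalized (no "..", no symlinks to resolve, no repeated slashes), which a real filter would have to confirm for the records it actually receives.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if `path` is `root` itself or lies below it, 0 otherwise.
   The comparison is purely textual (see the assumptions above). */
int is_under_tree(const char *path, const char *root)
{
    size_t rlen;

    assert(path != NULL && root != NULL);
    rlen = strlen(root);
    while (rlen > 1 && root[rlen - 1] == '/')   /* ignore trailing slashes on root */
        rlen--;
    if (strncmp(path, root, rlen) != 0)
        return 0;
    /* The prefix matched; also require a component boundary, so that
       "/data2" is not treated as being under "/data". A root of "/"
       matches every absolute path. */
    return path[rlen] == '\0' || path[rlen] == '/' ||
           (rlen == 1 && root[0] == '/');
}
```

Events for which is_under_tree() returns 0 can be discarded immediately, before any counting is done.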
Thanks in advance for any help.
--
Rudi
* Re: Filesystem access statistics
From: Steve Grubb @ 2006-04-12 16:26 UTC
To: linux-audit
On Wednesday 12 April 2006 11:23, Rudi Chiarito wrote:
> I need to look at a portion of the filesystem namespace and maintain
> aggregate statistics on access patterns. In other words, I have a large
> filesystem and would like to find out which are the hot spots. I don't
> need to keep track of every single file access: since the file count is
> in the order of millions, that would swamp the actual I/O, the
> analysis and the people looking at the final data. It would make sense
> to just group accesses by looking at the top N levels (anything
> accessed at levels N+1, N+2, etc. would be coalesced into the parent
> directory at level N).
I would think that you could write a program to do this via the audit
dispatcher interface. In auditd.conf,
dispatcher = /usr/bin/your-program
log_format = nolog
This will tell the audit daemon that you don't want the records written to disk
and that it should pass events to the dispatcher. You can find example code in
skeleton.c (rpm -ql audit | grep skel). Around line 123 of skeleton.c is where
you would add your code to examine the PATH records. You would likely want to do
something like
if (hdr.type == AUDIT_PATH) {
    /* process the path */
}
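so that you only look at the right kind of records. If you have too many
records coming out of the audit system, you might be able to suppress the
others in the kernel with exclude rules. The -F fields of a single rule are
ANDed together, so excluding everything except PATH takes two rules:
-a always,exclude -F 'msgtype<PATH'
-a always,exclude -F 'msgtype>PATH'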
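Inside the if (hdr.type == AUDIT_PATH) block, the file name still has to be pulled out of the record text. A sketch of that step, assuming the record is available as a NUL-terminated string containing a field such as name="/tmp/foo"; the helper extract_path_name() is hypothetical and not part of libaudit. The kernel may log names containing spaces or other special characters as a bare hex string instead of a quoted one, so the sketch decodes that form too and rejects any value it does not recognize.

```c
#include <assert.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Copy the value of the name= field of a PATH record into `out`.
   Returns 0 on success, -1 if there is no usable name field or the
   value does not fit in `out`. */
int extract_path_name(const char *rec, char *out, size_t outlen)
{
    const char *p = rec;
    size_t n = 0;

    assert(rec != NULL && out != NULL && outlen > 0);

    /* Find "name=" at the start of a field, so that e.g. "xname=" is skipped. */
    for (;;) {
        p = strstr(p, "name=");
        if (p == NULL)
            return -1;
        if (p == rec || p[-1] == ' ')
            break;
        p += 5;
    }
    p += 5;

    if (*p == '"') {                    /* quoted form: name="/tmp/foo" */
        const char *end = strchr(++p, '"');
        if (end == NULL)
            return -1;
        n = (size_t)(end - p);
        if (n + 1 > outlen)
            return -1;
        memcpy(out, p, n);
        out[n] = '\0';
        return 0;
    }

    /* Hex form: pairs of hex digits up to the next space or the end. */
    while (p[0] != '\0' && p[0] != ' ') {
        int hi = hexval((unsigned char)p[0]);
        int lo = hexval((unsigned char)p[1]);   /* p[1] may be '\0' -> -1 */
        if (hi < 0 || lo < 0 || n + 1 >= outlen)
            return -1;
        out[n++] = (char)(hi * 16 + lo);
        p += 2;
    }
    if (n == 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```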
You can then set audit rules for whatever you want to measure. If all you
want to measure is opens:
-a always,exit -S open
You can use devmajor and devminor fields to limit the audit system to
reporting opens on an exact partition. This is highly recommended. On my
system, I would do something like:
-a always,exit -S open -F devmajor=3 -F devminor=6
to watch the /tmp partition.
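The devmajor/devminor values for a given directory can be read with stat(2) and the major()/minor() macros, which glibc declares in <sys/sysmacros.h>. A sketch that formats the corresponding rule line; format_open_rule() is a hypothetical helper. Note that stat() reports the device of the filesystem that actually holds the directory, so the rule only narrows down to one tree when that tree is its own mount.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>   /* major(), minor() */

/* Write "-a always,exit -S open -F devmajor=M -F devminor=m" for the
   device holding `dir` into `buf`. Returns snprintf's result (the length
   the full rule needs), or -1 if stat() fails. */
int format_open_rule(const char *dir, char *buf, size_t buflen)
{
    struct stat st;

    assert(dir != NULL && buf != NULL);
    if (stat(dir, &st) != 0)
        return -1;
    return snprintf(buf, buflen,
                    "-a always,exit -S open -F devmajor=%u -F devminor=%u",
                    (unsigned)major(st.st_dev), (unsigned)minor(st.st_dev));
}
```

The resulting line can then be loaded with auditctl in place of hard-coding the numbers for one machine.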
> I think that I can't be the only one with such a need.
Bootchart.org used the audit system to examine the boot order
http://bootchart.org/misc/filemon/filemond
> In my case, the information is going to be used to change the way the tree
> is going to be laid out in the future, as well as determining when parts of
> it can be made read-only (after an inactivity period). I can also see the
> information being useful for selective incremental backups - just look
> at the hot spots - or for smarter ordering during a disaster recovery
> restore (if you're recovering from random access storage, not tape).
> Maybe even locate/slocate/rlocate/mlocate could take advantage of it.
Sounds interesting.
> What would be the best approach to this?
I think we've laid out an approach above.
1) set custom rules to watch just the syscalls you want on the exact
partitions you want.
2) put the analysis in a program hanging off of audit event dispatcher and
turn off audit logging to disk.
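For step 2, the analysis program mostly needs a table mapping each collapsed directory path to a hit count. A minimal sketch with a fixed-size, linear-probing hash table; TABLE_SIZE, count_hit() and hit_count() are hypothetical names, and the FNV-1a hash is just one reasonable choice. Because accesses are coalesced at level N, the number of distinct keys is bounded by the number of directories at or above that level, which is why a fixed capacity may be acceptable here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 4096          /* hypothetical capacity; power of two */

struct counter {
    char *key;                   /* collapsed directory path; NULL = empty slot */
    unsigned long hits;
};

static struct counter table[TABLE_SIZE];   /* zero-initialized */

static uint32_t hash_str(const char *s)
{
    uint32_t h = 2166136261u;    /* FNV-1a offset basis */
    while (*s != '\0') {
        h ^= (unsigned char)*s++;
        h *= 16777619u;          /* FNV-1a prime */
    }
    return h;
}

/* Add one hit for `key`. Returns the new count, or 0 if the table is
   full or memory runs out. Keys are copied; there is no deletion. */
unsigned long count_hit(const char *key)
{
    size_t i, probes;

    assert(key != NULL);
    i = hash_str(key) & (TABLE_SIZE - 1);
    for (probes = 0; probes < TABLE_SIZE; probes++) {
        struct counter *c = &table[i];
        if (c->key == NULL) {                   /* first hit for this key */
            size_t len = strlen(key) + 1;
            c->key = malloc(len);
            if (c->key == NULL)
                return 0;
            memcpy(c->key, key, len);
            c->hits = 1;
            return 1;
        }
        if (strcmp(c->key, key) == 0)
            return ++c->hits;
        i = (i + 1) & (TABLE_SIZE - 1);         /* linear probing */
    }
    return 0;
}

/* Current count for `key`, or 0 if it has never been seen. */
unsigned long hit_count(const char *key)
{
    size_t i, probes;

    assert(key != NULL);
    i = hash_str(key) & (TABLE_SIZE - 1);
    for (probes = 0; probes < TABLE_SIZE && table[i].key != NULL; probes++) {
        if (strcmp(table[i].key, key) == 0)
            return table[i].hits;
        i = (i + 1) & (TABLE_SIZE - 1);
    }
    return 0;
}
```

A dispatcher program built this way would call count_hit() once per matching PATH record and dump the table periodically, instead of keeping the raw events.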
You need audit-1.1 or higher. audit-1.1 has some ABI changes that make it
incompatible with audit-1.0 systems. Let me know if that is a problem. Also,
you will need a 2.6.17 or later kernel if you use the '<' or '>' rule operators.
-Steve
* Re: Filesystem access statistics
From: Rudi Chiarito @ 2006-04-12 20:12 UTC
To: linux-audit
On Wed, Apr 12, 2006 at 12:26:29PM -0400, Steve Grubb wrote:
> I would think that you could write a program to do this via the audit
> dispatcher interface. In auditd.conf,
> dispatcher = /usr/bin/your-program
> log_format = nolog
Will that preempt any other audit users that might be looking for
events downstream? Sounds a bit too drastic, although I guess I am not
the typical case, so an application as "intrusive" as mine won't be
needed on the average system.
> if (hdr.type == AUDIT_PATH) {
libaudit.h from audit-libs-devel 1.1.5-1 only has AUDIT_FS_INODE. Is
this new in 1.2 or a typo? I saw mention of a new filesystem API in the
audit RPM changelog. Is that part of it?
> You can then set the audit rules for whatever you want to measure, if all you
> want to measure is the opens,
That's a very good question by itself. Anything that peeks into a
directory should do, I guess. That would mean not just opens, but also
directory traversals, unlink calls, etc. Are there aliases of any kind?
The kernel just gained a bunch of new *at() syscalls. If I had written
this a month or two ago, I would have most likely missed them. Is there
a way to look for present and future syscalls dealing with files/inodes?
> You can use devmajor and devminor fields to limit the audit system to
> reporting opens on an exact partition. This is highly recommended. On my
That's a good idea where applicable. Thanks.
--
Rudi
* Re: Filesystem access statistics
From: Steve Grubb @ 2006-04-13 13:50 UTC
To: linux-audit
On Wednesday 12 April 2006 16:12, Rudi Chiarito wrote:
> On Wed, Apr 12, 2006 at 12:26:29PM -0400, Steve Grubb wrote:
> > I would think that you could write a program to do this via the audit
> > dispatcher interface. In auditd.conf,
> > dispatcher = /usr/bin/your-program
> > log_format = nolog
>
> Will that preempt any other audit users that might be looking for
> events downstream?
Yes, it will, but the audit event dispatcher is not ready yet. So if you want
something today, you can just take over the interface, then rewrite your
program as a plugin once the event dispatcher is finished.
> > if (hdr.type == AUDIT_PATH) {
>
> libaudit.h from audit-libs-devel 1.1.5-1 only has AUDIT_FS_INODE.
libaudit.h includes linux/audit.h which defines the AUDIT_PATH message type.
> Is this new in 1.2 or a typo?
It's been around for at least a year.
> I saw mention of a new filesystem API in the audit RPM changelog. Is that
> part of it?
No.
> > You can then set the audit rules for whatever you want to measure, if all
> > you want to measure is the opens,
>
> That's a very good question by itself. Anything that peeks into a
> directory should do, I guess. That would mean not just opens, but also
> directory traversals, unlink calls, etc. Are there aliases of any kind?
No.
> The kernel just gained a bunch of new *at() syscalls. If I had written
> this a month or two ago, I would have most likely missed them. Is there
> a way to look for present and future syscalls dealing with files/inodes?
Not at the moment.
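Without such a facility, the fallback is a hand-maintained list of syscalls, updated whenever the kernel gains new ones. A sketch that turns such a list into a single rule, relying on auditctl accepting several -S options in one rule. The list below is illustrative rather than complete, build_rule() is hypothetical, and syscall names are architecture-specific, so each one has to exist on the architecture the rule is loaded on.

```c
#include <assert.h>
#include <stdio.h>

/* Hand-maintained, deliberately incomplete list of file-related syscalls. */
static const char *const file_syscalls[] = {
    "open", "openat", "creat", "unlink", "unlinkat",
    "rename", "renameat", "mkdir", "mkdirat", "rmdir", "chdir",
};

/* Write one rule into `buf` with a -S option per listed syscall.
   Returns 0 on success, -1 if `buf` is too small. */
int build_rule(char *buf, size_t buflen)
{
    size_t used, i;
    int n;

    assert(buf != NULL);
    n = snprintf(buf, buflen, "-a always,exit");
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    used = (size_t)n;
    for (i = 0; i < sizeof file_syscalls / sizeof file_syscalls[0]; i++) {
        n = snprintf(buf + used, buflen - used, " -S %s", file_syscalls[i]);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```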
-Steve