From mboxrd@z Thu Jan 1 00:00:00 1970
From: Rudi Chiarito
Subject: Filesystem access statistics
Date: Wed, 12 Apr 2006 17:23:25 +0200
Message-ID: <20060412152325.GA1399@plain.rackshack.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Received: from mx3.redhat.com (mx3.redhat.com [172.16.48.32])
	by int-mx1.corp.redhat.com (8.12.11.20060308/8.11.6) with ESMTP
	id k3CFQWML022683 for ; Wed, 12 Apr 2006 11:26:32 -0400
Received: from server4.8080.it (ev1s-207-44-234-28.ev1servers.net
	[207.44.234.28] (may be forged))
	by mx3.redhat.com (8.13.1/8.13.1) with ESMTP
	id k3CFQQge022566 for ; Wed, 12 Apr 2006 11:26:26 -0400
Content-Disposition: inline
Sender: linux-audit-bounces@redhat.com
Errors-To: linux-audit-bounces@redhat.com
To: linux-audit@redhat.com
List-Id: linux-audit@redhat.com

Hi,

I subscribed to the list after checking with Steve that it was not an
outlandish place to ask my questions.

I need to look at a portion of the filesystem namespace and maintain
aggregate statistics on access patterns. In other words, I have a large
filesystem and would like to find out where the hot spots are.

I don't need to keep track of every single file access: since the file
count is on the order of millions, that would swamp the actual I/O, the
analysis and the people looking at the final data. It would make sense
to just group accesses by looking at the top N levels (anything accessed
at levels N+1, N+2, etc. would be coalesced into the parent directory at
level N).

I think that I can't be the only one with such a need. In my case, the
information is going to be used to change the way the tree is laid out
in the future, as well as to determine when parts of it can be made
read-only (after an inactivity period).
I can also see the information being useful for selective incremental
backups - just look at the hot spots - or for smarter ordering during a
disaster recovery restore (if you're recovering from random access
storage, not tape). Maybe even locate/slocate/rlocate/mlocate could take
advantage of it.

What would be the best approach to this? Inotify doesn't seem to cut it,
because it can't handle recursive watches. I can't afford placing
watches all over the place.

Given the sheer number of operations being tracked, it looks like I'd
need some custom code that audits all file/directory operations,
determines if there's a match (I'm only interested in a specific tree,
not everything under /), increments internal counters and throws the
event away.

Is there code I could look at for ideas?

Thanks in advance for any help.

--
Rudi
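
To make the idea concrete, here is a rough sketch of the matching and
coalescing step described above: check whether a path falls under the
tree of interest, cut it down to the top N levels, bump a counter and
forget the event. It is only an illustration, not code from the audit
subsystem. The names coalesce and HotSpotCounter and the /srv/data tree
are made up, and it assumes that whatever delivers the events hands over
absolute, already normalized paths.

```python
from collections import Counter
from pathlib import PurePosixPath


def coalesce(path, root, depth):
    """Map `path` to its ancestor at most `depth` levels below `root`.

    Returns None when `path` is not inside `root`. Paths that are
    shallower than `depth` are returned unchanged.
    Assumes absolute, normalized paths (no ".." components).
    """
    root_path = PurePosixPath(root)
    try:
        relative = PurePosixPath(path).relative_to(root_path)
    except ValueError:
        return None  # outside the tree we care about
    return str(PurePosixPath(root_path, *relative.parts[:depth]))


class HotSpotCounter:
    """Hypothetical in-memory aggregate: one counter per coalesced path."""

    def __init__(self, root, depth):
        self.root = root
        self.depth = depth
        self.counts = Counter()

    def record(self, path):
        """Count one access, then drop the event."""
        key = coalesce(path, self.root, self.depth)
        if key is not None:
            self.counts[key] += 1

    def hottest(self, n=10):
        """Return the n most frequently accessed coalesced paths."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    stats = HotSpotCounter("/srv/data", depth=2)
    for event_path in (
        "/srv/data/a/b/c/file1",
        "/srv/data/a/b/file2",
        "/srv/data/x/y/z/file3",
        "/etc/passwd",  # ignored: not under /srv/data
    ):
        stats.record(event_path)
    for directory, hits in stats.hottest():
        print(hits, directory)
```

With N = 2, the first two events land on the same counter for
/srv/data/a/b, so memory use is bounded by the number of directories in
the top N levels rather than by the number of files. Where the events
would come from is exactly the open question above.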