From mboxrd@z Thu Jan 1 00:00:00 1970
From: "J. Bruce Fields"
Subject: Re: Beagle and logging inotify events
Date: Wed, 14 Nov 2007 14:30:02 -0500
Message-ID: <20071114193002.GL14254@fieldses.org>
References: <9e4733910711131604k1290d4e1s5ee9808cbb61c2b6@mail.gmail.com>
	<45578746-916A-4F59-9A92-E7CEEFBC09B0@oracle.com>
	<9e4733910711140544l3f311868n96d753ce0b70cee5@mail.gmail.com>
	<20071114190919.GJ14254@fieldses.org>
	<9e4733910711141122gb6f99b2t7209ba9e43acaee5@mail.gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Andi Kleen , Chuck Lever , linux-fsdevel@vger.kernel.org
To: Jon Smirl
Return-path:
Received: from mail.fieldses.org ([66.93.2.214]:39361 "EHLO fieldses.org"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1752186AbXKNTaG (ORCPT );
	Wed, 14 Nov 2007 14:30:06 -0500
Content-Disposition: inline
In-Reply-To: <9e4733910711141122gb6f99b2t7209ba9e43acaee5@mail.gmail.com>
Sender: linux-fsdevel-owner@vger.kernel.org
List-Id: linux-fsdevel.vger.kernel.org

On Wed, Nov 14, 2007 at 02:22:51PM -0500, Jon Smirl wrote:
> On 11/14/07, J. Bruce Fields wrote:
> > On Wed, Nov 14, 2007 at 04:30:16PM +0100, Andi Kleen wrote:
> > > "Jon Smirl" writes:
> > >
> > > > On 11/14/07, Chuck Lever wrote:
> > > >> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
> > > >> > Is it feasible to do something like this in the linux file system
> > > >> > architecture?
> > > >> >
> > > >> > Beagle beats on my disk for an hour when I reboot. Of course I don't
> > > >> > like that and I shut Beagle off.
> > > >>
> > > >> Leopard, by the way, does exactly this: it has a daemon that starts
> > > >> at boot time and taps FSEvents then journals file system changes to a
> > > >> well-known file on local disk.
> > > >
> > > > Logging file systems have all of the needed info.
> > >
> > > Actually most journaling file systems in Linux use block logging and
> > > it would be probably hard to get specific file names out of a random
> > > collection of logged blocks.
> > > And even if you could they would
> > > hit a lot of false positives since everything is rounded up
> > > to block level.
> > >
> > > With intent logging like in XFS/JFS it would be easier, but even
> > > then costly :- e.g. they might log changes to the inode but
> > > there is no back pointer to the file name short of searching the
> > > whole directory tree.
> >
> > So it seems the best approach given the current api's would be just to
> > cache all the stat data, and stat every file on reboot.
> >
> > I don't understand why beagle is reading the entire filesystem data. I
> > understand why even just doing the stat's could be prohibitive, though.
>
> I believe Beagle is looking at the mtimes on the files. It uses xattrs
> to store the last mtime it checked and then compares it to the current
> mtime. It also stores a hash of the file in an xattr. So even if the

You meant "only if", not "even if"?

> mtimes don't match it recomputes the hash and only if the hashes
> differ do it update its free text search index.

OK, that makes a little more sense.  (Though it seems unfortunate to use
xattrs instead of caching the data elsewhere.  Git and nfs e.g. both use
the ctime to decide when a file changes, so you're invalidating their
caches unnecessarily.)

--b.
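
For illustration, here is a rough sketch of the mtime-then-hash check described above, with the cached state kept outside the files rather than in xattrs, so that scanning never touches ctime. This is not Beagle's code: the function names, the in-memory `cache` dict, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
import os


def file_digest(path):
    """Return the SHA-256 hex digest of the file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_reindex(path, cache):
    """Decide whether `path` changed since the last scan.

    `cache` (hypothetical) maps path -> (mtime_ns, digest) and is
    updated in place. Nothing is written to the file itself.
    """
    mtime = os.stat(path).st_mtime_ns
    cached = cache.get(path)
    if cached is not None and cached[0] == mtime:
        return False  # mtime unchanged: skip reading the file entirely
    # mtime differs (or the file is new): only now pay for the hash
    digest = file_digest(path)
    cache[path] = (mtime, digest)
    # Reindex only if the contents really changed
    return cached is None or cached[1] != digest
```

A real indexer would have to persist `cache` across reboots (for example in a small database), and it would still need one stat() per file at startup, which is the cost the thread is worried about.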