linux-fsdevel.vger.kernel.org archive mirror
From: Chuck Lever <chuck.lever@oracle.com>
To: Jon Smirl <jonsmirl@gmail.com>
Cc: linux-fsdevel@vger.kernel.org
Subject: Re: Beagle and logging inotify events
Date: Wed, 14 Nov 2007 09:41:12 -0500
Message-ID: <473B0908.1060304@oracle.com>
In-Reply-To: <9e4733910711140544l3f311868n96d753ce0b70cee5@mail.gmail.com>

Jon Smirl wrote:
> On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
>> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
>>> Is it feasible to do something like this in the linux file system
>>> architecture?
>>>
>>> Beagle beats on my disk for an hour when I reboot. Of course I don't
>>> like that and I shut Beagle off.
>> Leopard, by the way, does exactly this: it has a daemon that starts
>> at boot time and taps FSEvents then journals file system changes to a
>> well-known file on local disk.
> 
> Logging file systems have all of the needed info. Plus they know what
> is going on with rollback/replay after a crash.

True, but not all file systems have a journal.  Consider ext2 or FAT32, 
both of which are still common.

> How about a fs API
> where Beagle has a token for a checkpoint, and then it can ask for a
> recreation of inotify events from that point forward.  It's always
> possible for the file system to say I can't do that and trigger a full
> rebuild from Beagle. Daemons that aren't coordinated with the file
> system have a window during crash/reboot where they can get confused.

A reasonably effective solution can be implemented in user space without 
changes to the file system APIs or implementations.  In other words, we 
already have the tools to make something useful.

For example, you don't need to record every file system event to get 
most of the benefit.  Listing only directory-level changes (i.e., "some 
file in this directory has changed") is enough to prune most of Beagle's 
work when it starts up.
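
To make the directory-level idea concrete, here is a minimal sketch in
Python.  It is not Beagle's code or any existing daemon's; the class
name DirtyDirJournal, the one-JSON-record-per-line file format, and the
integer checkpoint token are all assumptions made for illustration.  A
watcher (fed by inotify, say) would call record() for each change, and
the indexer would call changes_since() at startup.  When the journal
cannot cover the requested checkpoint it returns None, which stands for
"do a full rescan".

```python
import json
import os
import tempfile


class DirtyDirJournal:
    """Hypothetical user-space journal of directories that have changed."""

    def __init__(self, path):
        self.path = path
        self.seq = 0       # last sequence number issued (the checkpoint token)
        self.entries = []  # (seq, directory) pairs loaded from or added to the log
        if os.path.exists(path):
            with open(path) as f:
                for line in f:
                    seq, directory = json.loads(line)
                    self.entries.append((seq, directory))
                    self.seq = max(self.seq, seq)

    def record(self, changed_path):
        """Log only the parent directory of the changed file."""
        self.seq += 1
        entry = (self.seq, os.path.dirname(changed_path))
        self.entries.append(entry)
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())  # push the record to disk before returning
        return self.seq

    def changes_since(self, checkpoint):
        """Return the set of directories dirtied after `checkpoint`.

        Returns None when the checkpoint is newer than anything in the
        journal (for example, the log was lost), meaning the caller has
        to fall back to a full rescan.
        """
        if checkpoint > self.seq:
            return None
        return {d for seq, d in self.entries if seq > checkpoint}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        journal = DirtyDirJournal(os.path.join(tmp, "dirty.log"))
        checkpoint = journal.record("/home/jon/src/a.c")
        journal.record("/home/jon/src/b.c")
        journal.record("/home/jon/mail/inbox")

        # Simulate a restart: reload the journal from disk.
        reloaded = DirtyDirJournal(journal.path)
        print(sorted(reloaded.changes_since(checkpoint)))
```

Two changes in the same directory collapse into one entry in the
result, which is where the pruning comes from.  The sketch does not
close the crash window discussed above: a change that lands on disk
just before the daemon logs it would still be missed.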

> Without low level support like this Beagle is forced to do a rescan on
> every boot. Since I crash my machine all of the time the disk load
> from rebooting is intolerable and I turn Beagle off. Even just turning
> the machine on in the morning generates an annoyingly large load on
> the disk.

Understood.  The need is clear.

My Dad's WinXP system takes 10 minutes after every start-up before it's 
usable, simply because the virus scanner has to check every file in the 
system.  Same problem!

>> I don't see why this couldn't be done on Linux as well.
>>
>>> ---------- Forwarded message ----------
>>> From: Jon Smirl <jonsmirl@gmail.com>
>>> Date: Nov 13, 2007 4:44 PM
>>> Subject: Re: Strange "beagle" interaction..
>>> To: Linus Torvalds <torvalds@linux-foundation.org>
>>> Cc: "J. Bruce Fields" <bfields@fieldses.org>, Junio C Hamano
>>> <gitster@pobox.com>, Git Mailing List <git@vger.kernel.org>, Johannes
>>> Schindelin <Johannes.Schindelin@gmx.de>
>>>
>>>
>>> On 11/13/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>>
>>>> On Tue, 13 Nov 2007, J. Bruce Fields wrote:
>>>>> Last I ran across this, I believe I found it was adding extended
>>>>> attributes to the file.
>>>> Yeah, I just straced it and found the same thing. It's saving
>>>> fingerprints
>>>> and mtimes to files in the extended attributes.
>>> Things like Beagle need a guaranteed log of global inotify events.
>>> That would let them efficiently find changes made since the last time
>>> they updated their index.
>>>
>>> Right now every time Beagle starts it hasn't got a clue what has
>>> changed in the file system since it was last run. This forces Beagle
>>> to rescan the entire filesystem every time it is started. The xattrs
>>> are used as cache to reduce this load somewhat.
>>>
>>> A better solution would be for the kernel to log inotify events to
>>> disk in a manner that survives reboots. When Beagle starts it would
>>> locate its last checkpoint and then process the logged inotify events
>>> from that time forward. This inotify logging needs to be bullet proof
>>> or it will mess up your Beagle index.
>>>
>>> Logged file systems already contain the logged inotify data (in their
>>> own internal form). There's just no universal API for retrieving it in
>>> a file system independent manner.
>>>
>>>>> Yeah, I just turned off beagle.  It looked to me like it was doing
>>>>> something wrongheaded.
>>>> Gaah. The problem is, setting xattrs does actually change ctime.
>>>> Which
>>>> means that if we want to make git play nice with beagle, I guess
>>>> we have
>>>> to just remove the comparison of ctime.
>>>>
>>>> Oh, well. Git doesn't *require* it, but I like the notion of
>>>> checking the
>>>> inode really really carefully. But it looks like it may not be an
>>>> option,
>>>> because of file indexers hiding stuff behind our backs.
>>>>
>>>> Or we could just tell people not to run beagle on their git trees,
>>>> but I
>>>> suspect some people will actually *want* to. Even if it flushes
>>>> their disk
>>>> caches.
>>>>
>>>>                 Linus
>>>
>>> --
>>> Jon Smirl
>>> jonsmirl@gmail.com
>>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>
> 
> 
