* Beagle and logging inotify events
@ 2007-11-14 0:04 Jon Smirl
2007-11-14 13:29 ` Chuck Lever
0 siblings, 1 reply; 16+ messages in thread
From: Jon Smirl @ 2007-11-14 0:04 UTC (permalink / raw)
To: linux-fsdevel
Is it feasible to do something like this in the linux file system architecture?
Beagle beats on my disk for an hour when I reboot. Of course I don't
like that and I shut Beagle off.
---------- Forwarded message ----------
From: Jon Smirl <jonsmirl@gmail.com>
Date: Nov 13, 2007 4:44 PM
Subject: Re: Strange "beagle" interaction..
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>, Junio C Hamano
<gitster@pobox.com>, Git Mailing List <git@vger.kernel.org>, Johannes
Schindelin <Johannes.Schindelin@gmx.de>
On 11/13/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Tue, 13 Nov 2007, J. Bruce Fields wrote:
> >
> > Last I ran across this, I believe I found it was adding extended
> > attributes to the file.
>
> Yeah, I just straced it and found the same thing. It's saving fingerprints
> and mtimes to files in the extended attributes.
Things like Beagle need a guaranteed log of global inotify events.
That would let them efficiently find changes made since the last time
they updated their index.
Right now every time Beagle starts it hasn't got a clue what has
changed in the file system since it was last run. This forces Beagle
to rescan the entire filesystem every time it is started. The xattrs
are used as cache to reduce this load somewhat.
A better solution would be for the kernel to log inotify events to
disk in a manner that survives reboots. When Beagle starts it would
locate its last checkpoint and then process the logged inotify events
from that time forward. This inotify logging needs to be bullet proof
or it will mess up your Beagle index.
Logged files systems already contain the logged inotify data (in their
own internal form). There's just no universal API for retrieving it in
a file system independent manner.
>
> > Yeah, I just turned off beagle. It looked to me like it was doing
> > something wrongheaded.
>
> Gaah. The problem is, setting xattrs does actually change ctime. Which
> means that if we want to make git play nice with beagle, I guess we have
> to just remove the comparison of ctime.
>
> Oh, well. Git doesn't *require* it, but I like the notion of checking the
> inode really really carefully. But it looks like it may not be an option,
> because of file indexers hiding stuff behind our backs.
>
> Or we could just tell people not to run beagle on their git trees, but I
> suspect some people will actually *want* to. Even if it flushes their disk
> caches.
>
> Linus
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Jon Smirl
jonsmirl@gmail.com
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 0:04 Beagle and logging inotify events Jon Smirl
@ 2007-11-14 13:29 ` Chuck Lever
2007-11-14 13:44 ` Jon Smirl
0 siblings, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2007-11-14 13:29 UTC (permalink / raw)
To: Jon Smirl; +Cc: linux-fsdevel
On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
> Is it feasible to do something like this in the linux file system
> architecture?
>
> Beagle beats on my disk for an hour when I reboot. Of course I don't
> like that and I shut Beagle off.
Leopard, by the way, does exactly this: it has a daemon that starts
at boot time and taps FSEvents then journals file system changes to a
well-known file on local disk.
I don't see why this couldn't be done on Linux as well.
> ---------- Forwarded message ----------
> From: Jon Smirl <jonsmirl@gmail.com>
> Date: Nov 13, 2007 4:44 PM
> Subject: Re: Strange "beagle" interaction..
> To: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: "J. Bruce Fields" <bfields@fieldses.org>, Junio C Hamano
> <gitster@pobox.com>, Git Mailing List <git@vger.kernel.org>, Johannes
> Schindelin <Johannes.Schindelin@gmx.de>
>
>
> On 11/13/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>
>>
>> On Tue, 13 Nov 2007, J. Bruce Fields wrote:
>>>
>>> Last I ran across this, I believe I found it was adding extended
>>> attributes to the file.
>>
>> Yeah, I just straced it and found the same thing. It's saving
>> fingerprints
>> and mtimes to files in the extended attributes.
>
> Things like Beagle need a guaranteed log of global inotify events.
> That would let them efficiently find changes made since the last time
> they updated their index.
>
> Right now every time Beagle starts it hasn't got a clue what has
> changed in the file system since it was last run. This forces Beagle
> to rescan the entire filesystem every time it is started. The xattrs
> are used as cache to reduce this load somewhat.
>
> A better solution would be for the kernel to log inotify events to
> disk in a manner that survives reboots. When Beagle starts it would
> locate its last checkpoint and then process the logged inotify events
> from that time forward. This inotify logging needs to be bullet proof
> or it will mess up your Beagle index.
>
> Logged files systems already contain the logged inotify data (in their
> own internal form). There's just no universal API for retrieving it in
> a file system independent manner.
>
>>
>>> Yeah, I just turned off beagle. It looked to me like it was doing
>>> something wrongheaded.
>>
>> Gaah. The problem is, setting xattrs does actually change ctime.
>> Which
>> means that if we want to make git play nice with beagle, I guess
>> we have
>> to just remove the comparison of ctime.
>>
>> Oh, well. Git doesn't *require* it, but I like the notion of
>> checking the
>> inode really really carefully. But it looks like it may not be an
>> option,
>> because of file indexers hiding stuff behind our backs.
>>
>> Or we could just tell people not to run beagle on their git trees,
>> but I
>> suspect some people will actually *want* to. Even if it flushes
>> their disk
>> caches.
>>
>> Linus
>> -
>> To unsubscribe from this list: send the line "unsubscribe git" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
>
> --
> Jon Smirl
> jonsmirl@gmail.com
>
>
> --
> Jon Smirl
> jonsmirl@gmail.com
> -
> To unsubscribe from this list: send the line "unsubscribe linux-
> fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 13:29 ` Chuck Lever
@ 2007-11-14 13:44 ` Jon Smirl
2007-11-14 14:41 ` Chuck Lever
2007-11-14 15:30 ` Andi Kleen
0 siblings, 2 replies; 16+ messages in thread
From: Jon Smirl @ 2007-11-14 13:44 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-fsdevel
On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
> > Is it feasible to do something like this in the linux file system
> > architecture?
> >
> > Beagle beats on my disk for an hour when I reboot. Of course I don't
> > like that and I shut Beagle off.
>
> Leopard, by the way, does exactly this: it has a daemon that starts
> at boot time and taps FSEvents then journals file system changes to a
> well-known file on local disk.
Logging file systems have all of the needed info. Plus they know what
is going on with rollback/replay after a crash. How about a fs API
where Beagle has a token for a checkpoint, and then it can ask for a
recreation of inotify events from that point forward. It's always
possible for the file system to say I can't do that and trigger a full
rebuild from Beagle. Daemons that aren't coordinated with the file
system have a window during crash/reboot where they can get confused.
Without low level support like this Beagle is forced to do a rescan on
every boot. Since I crash my machine all of the time the disk load
from rebooting is intolerable and I turn Beagle off. Even just turning
the machine on in the morning generates an annoyingly large load on
the disk.
>
> I don't see why this couldn't be done on Linux as well.
>
> > ---------- Forwarded message ----------
> > From: Jon Smirl <jonsmirl@gmail.com>
> > Date: Nov 13, 2007 4:44 PM
> > Subject: Re: Strange "beagle" interaction..
> > To: Linus Torvalds <torvalds@linux-foundation.org>
> > Cc: "J. Bruce Fields" <bfields@fieldses.org>, Junio C Hamano
> > <gitster@pobox.com>, Git Mailing List <git@vger.kernel.org>, Johannes
> > Schindelin <Johannes.Schindelin@gmx.de>
> >
> >
> > On 11/13/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>
> >>
> >> On Tue, 13 Nov 2007, J. Bruce Fields wrote:
> >>>
> >>> Last I ran across this, I believe I found it was adding extended
> >>> attributes to the file.
> >>
> >> Yeah, I just straced it and found the same thing. It's saving
> >> fingerprints
> >> and mtimes to files in the extended attributes.
> >
> > Things like Beagle need a guaranteed log of global inotify events.
> > That would let them efficiently find changes made since the last time
> > they updated their index.
> >
> > Right now every time Beagle starts it hasn't got a clue what has
> > changed in the file system since it was last run. This forces Beagle
> > to rescan the entire filesystem every time it is started. The xattrs
> > are used as cache to reduce this load somewhat.
> >
> > A better solution would be for the kernel to log inotify events to
> > disk in a manner that survives reboots. When Beagle starts it would
> > locate its last checkpoint and then process the logged inotify events
> > from that time forward. This inotify logging needs to be bullet proof
> > or it will mess up your Beagle index.
> >
> > Logged files systems already contain the logged inotify data (in their
> > own internal form). There's just no universal API for retrieving it in
> > a file system independent manner.
> >
> >>
> >>> Yeah, I just turned off beagle. It looked to me like it was doing
> >>> something wrongheaded.
> >>
> >> Gaah. The problem is, setting xattrs does actually change ctime.
> >> Which
> >> means that if we want to make git play nice with beagle, I guess
> >> we have
> >> to just remove the comparison of ctime.
> >>
> >> Oh, well. Git doesn't *require* it, but I like the notion of
> >> checking the
> >> inode really really carefully. But it looks like it may not be an
> >> option,
> >> because of file indexers hiding stuff behind our backs.
> >>
> >> Or we could just tell people not to run beagle on their git trees,
> >> but I
> >> suspect some people will actually *want* to. Even if it flushes
> >> their disk
> >> caches.
> >>
> >> Linus
> >> -
> >> To unsubscribe from this list: send the line "unsubscribe git" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>
> >
> >
> > --
> > Jon Smirl
> > jonsmirl@gmail.com
> >
> >
> > --
> > Jon Smirl
> > jonsmirl@gmail.com
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-
> > fsdevel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 13:44 ` Jon Smirl
@ 2007-11-14 14:41 ` Chuck Lever
2007-11-14 15:01 ` Jon Smirl
2007-11-14 15:30 ` Andi Kleen
1 sibling, 1 reply; 16+ messages in thread
From: Chuck Lever @ 2007-11-14 14:41 UTC (permalink / raw)
To: Jon Smirl; +Cc: linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 5213 bytes --]
Jon Smirl wrote:
> On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
>> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
>>> Is it feasible to do something like this in the linux file system
>>> architecture?
>>>
>>> Beagle beats on my disk for an hour when I reboot. Of course I don't
>>> like that and I shut Beagle off.
>> Leopard, by the way, does exactly this: it has a daemon that starts
>> at boot time and taps FSEvents then journals file system changes to a
>> well-known file on local disk.
>
> Logging file systems have all of the needed info. Plus they know what
> is going on with rollback/replay after a crash.
True, but not all file systems have a journal. Consider ext2 or FAT32,
both of which are still common.
> How about a fs API
> where Beagle has a token for a checkpoint, and then it can ask for a
> recreation of inotify events from that point forward. It's always
> possible for the file system to say I can't do that and trigger a full
> rebuild from Beagle. Daemons that aren't coordinated with the file
> system have a window during crash/reboot where they can get confused.
A reasonably effective solution can be implemented in user space without
changes to the file system APIs or implementations. IOW we already have
the tools to make something useful.
For example, you don't need to record every file system event to make
this useful. Listing only directory-level changes (ie "some file in
this directory has changed") is enough to prune most of Beagle's work
when it starts up.
> Without low level support like this Beagle is forced to do a rescan on
> every boot. Since I crash my machine all of the time the disk load
> from rebooting is intolerable and I turn Beagle off. Even just turning
> the machine on in the morning generates an annoyingly large load on
> the disk.
Understood. The need is clear.
My Dad's WinXP system takes 10 minutes after every start-up before it's
usable, simply because the virus scanner has to check every file in the
system. Same problem!
>> I don't see why this couldn't be done on Linux as well.
>>
>>> ---------- Forwarded message ----------
>>> From: Jon Smirl <jonsmirl@gmail.com>
>>> Date: Nov 13, 2007 4:44 PM
>>> Subject: Re: Strange "beagle" interaction..
>>> To: Linus Torvalds <torvalds@linux-foundation.org>
>>> Cc: "J. Bruce Fields" <bfields@fieldses.org>, Junio C Hamano
>>> <gitster@pobox.com>, Git Mailing List <git@vger.kernel.org>, Johannes
>>> Schindelin <Johannes.Schindelin@gmx.de>
>>>
>>>
>>> On 11/13/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>>
>>>> On Tue, 13 Nov 2007, J. Bruce Fields wrote:
>>>>> Last I ran across this, I believe I found it was adding extended
>>>>> attributes to the file.
>>>> Yeah, I just straced it and found the same thing. It's saving
>>>> fingerprints
>>>> and mtimes to files in the extended attributes.
>>> Things like Beagle need a guaranteed log of global inotify events.
>>> That would let them efficiently find changes made since the last time
>>> they updated their index.
>>>
>>> Right now every time Beagle starts it hasn't got a clue what has
>>> changed in the file system since it was last run. This forces Beagle
>>> to rescan the entire filesystem every time it is started. The xattrs
>>> are used as cache to reduce this load somewhat.
>>>
>>> A better solution would be for the kernel to log inotify events to
>>> disk in a manner that survives reboots. When Beagle starts it would
>>> locate its last checkpoint and then process the logged inotify events
>>> from that time forward. This inotify logging needs to be bullet proof
>>> or it will mess up your Beagle index.
>>>
>>> Logged files systems already contain the logged inotify data (in their
>>> own internal form). There's just no universal API for retrieving it in
>>> a file system independent manner.
>>>
>>>>> Yeah, I just turned off beagle. It looked to me like it was doing
>>>>> something wrongheaded.
>>>> Gaah. The problem is, setting xattrs does actually change ctime.
>>>> Which
>>>> means that if we want to make git play nice with beagle, I guess
>>>> we have
>>>> to just remove the comparison of ctime.
>>>>
>>>> Oh, well. Git doesn't *require* it, but I like the notion of
>>>> checking the
>>>> inode really really carefully. But it looks like it may not be an
>>>> option,
>>>> because of file indexers hiding stuff behind our backs.
>>>>
>>>> Or we could just tell people not to run beagle on their git trees,
>>>> but I
>>>> suspect some people will actually *want* to. Even if it flushes
>>>> their disk
>>>> caches.
>>>>
>>>> Linus
>>>> -
>>>> To unsubscribe from this list: send the line "unsubscribe git" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>
>>> --
>>> Jon Smirl
>>> jonsmirl@gmail.com
>>>
>>>
>>> --
>>> Jon Smirl
>>> jonsmirl@gmail.com
>>> -
>>> To unsubscribe from this list: send the line "unsubscribe linux-
>>> fsdevel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>
>
>
[-- Attachment #2: chuck.lever.vcf --]
[-- Type: text/x-vcard, Size: 259 bytes --]
begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
version:2.1
end:vcard
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 14:41 ` Chuck Lever
@ 2007-11-14 15:01 ` Jon Smirl
2007-11-14 16:32 ` Chuck Lever
0 siblings, 1 reply; 16+ messages in thread
From: Jon Smirl @ 2007-11-14 15:01 UTC (permalink / raw)
To: chuck.lever; +Cc: linux-fsdevel
On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
> Jon Smirl wrote:
> > On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
> >> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
> >>> Is it feasible to do something like this in the linux file system
> >>> architecture?
> >>>
> >>> Beagle beats on my disk for an hour when I reboot. Of course I don't
> >>> like that and I shut Beagle off.
> >> Leopard, by the way, does exactly this: it has a daemon that starts
> >> at boot time and taps FSEvents then journals file system changes to a
> >> well-known file on local disk.
> >
> > Logging file systems have all of the needed info. Plus they know what
> > is going on with rollback/replay after a crash.
>
> True, but not all file systems have a journal. Consider ext2 or FAT32,
> both of which are still common.
ext2/FAT32 can use the deamon approach you describe below which also
works as a short term solution. The Beagle people do have a deamon but
it can be turned off. Holes where you don't record the inotify events
and update the index are really bad because they can make files that
you know are on the disk disappear from the index. I don't believe
Beagle distinguishes between someone turning it off for a day and then
turning it back on, vs a reboot. In both cases it says there was a
window where untracked changes could have happened and it triggers a
full rescan.
The root problem here is needing a bullet proof inotify log with no
windows. The only place that is going to happen is inside the file
system logs. We just need an API to say recreate the inotify stream
from this checkpoint forward. Things like FAT/ext2 will always return
a no data available error from this API.
>
> > How about a fs API
> > where Beagle has a token for a checkpoint, and then it can ask for a
> > recreation of inotify events from that point forward. It's always
> > possible for the file system to say I can't do that and trigger a full
> > rebuild from Beagle. Daemons that aren't coordinated with the file
> > system have a window during crash/reboot where they can get confused.
>
> A reasonably effective solution can be implemented in user space without
> changes to the file system APIs or implementations. IOW we already have
> the tools to make something useful.
>
> For example, you don't need to record every file system event to make
> this useful. Listing only directory-level changes (ie "some file in
> this directory has changed") is enough to prune most of Beagle's work
> when it starts up.
>
> > Without low level support like this Beagle is forced to do a rescan on
> > every boot. Since I crash my machine all of the time the disk load
> > from rebooting is intolerable and I turn Beagle off. Even just turning
> > the machine on in the morning generates an annoyingly large load on
> > the disk.
>
> Understood. The need is clear.
>
> My Dad's WinXP system takes 10 minutes after every start-up before it's
> usable, simply because the virus scanner has to check every file in the
> system. Same problem!
>
> >> I don't see why this couldn't be done on Linux as well.
> >>
> >>> ---------- Forwarded message ----------
> >>> From: Jon Smirl <jonsmirl@gmail.com>
> >>> Date: Nov 13, 2007 4:44 PM
> >>> Subject: Re: Strange "beagle" interaction..
> >>> To: Linus Torvalds <torvalds@linux-foundation.org>
> >>> Cc: "J. Bruce Fields" <bfields@fieldses.org>, Junio C Hamano
> >>> <gitster@pobox.com>, Git Mailing List <git@vger.kernel.org>, Johannes
> >>> Schindelin <Johannes.Schindelin@gmx.de>
> >>>
> >>>
> >>> On 11/13/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>>>
> >>>> On Tue, 13 Nov 2007, J. Bruce Fields wrote:
> >>>>> Last I ran across this, I believe I found it was adding extended
> >>>>> attributes to the file.
> >>>> Yeah, I just straced it and found the same thing. It's saving
> >>>> fingerprints
> >>>> and mtimes to files in the extended attributes.
> >>> Things like Beagle need a guaranteed log of global inotify events.
> >>> That would let them efficiently find changes made since the last time
> >>> they updated their index.
> >>>
> >>> Right now every time Beagle starts it hasn't got a clue what has
> >>> changed in the file system since it was last run. This forces Beagle
> >>> to rescan the entire filesystem every time it is started. The xattrs
> >>> are used as cache to reduce this load somewhat.
> >>>
> >>> A better solution would be for the kernel to log inotify events to
> >>> disk in a manner that survives reboots. When Beagle starts it would
> >>> locate its last checkpoint and then process the logged inotify events
> >>> from that time forward. This inotify logging needs to be bullet proof
> >>> or it will mess up your Beagle index.
> >>>
> >>> Logged files systems already contain the logged inotify data (in their
> >>> own internal form). There's just no universal API for retrieving it in
> >>> a file system independent manner.
> >>>
> >>>>> Yeah, I just turned off beagle. It looked to me like it was doing
> >>>>> something wrongheaded.
> >>>> Gaah. The problem is, setting xattrs does actually change ctime.
> >>>> Which
> >>>> means that if we want to make git play nice with beagle, I guess
> >>>> we have
> >>>> to just remove the comparison of ctime.
> >>>>
> >>>> Oh, well. Git doesn't *require* it, but I like the notion of
> >>>> checking the
> >>>> inode really really carefully. But it looks like it may not be an
> >>>> option,
> >>>> because of file indexers hiding stuff behind our backs.
> >>>>
> >>>> Or we could just tell people not to run beagle on their git trees,
> >>>> but I
> >>>> suspect some people will actually *want* to. Even if it flushes
> >>>> their disk
> >>>> caches.
> >>>>
> >>>> Linus
> >>>> -
> >>>> To unsubscribe from this list: send the line "unsubscribe git" in
> >>>> the body of a message to majordomo@vger.kernel.org
> >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>
> >>> --
> >>> Jon Smirl
> >>> jonsmirl@gmail.com
> >>>
> >>>
> >>> --
> >>> Jon Smirl
> >>> jonsmirl@gmail.com
> >>> -
> >>> To unsubscribe from this list: send the line "unsubscribe linux-
> >>> fsdevel" in
> >>> the body of a message to majordomo@vger.kernel.org
> >>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >> --
> >> Chuck Lever
> >> chuck[dot]lever[at]oracle[dot]com
> >>
> >>
> >>
> >>
> >
> >
>
>
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 13:44 ` Jon Smirl
2007-11-14 14:41 ` Chuck Lever
@ 2007-11-14 15:30 ` Andi Kleen
2007-11-14 19:09 ` J. Bruce Fields
1 sibling, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2007-11-14 15:30 UTC (permalink / raw)
To: Jon Smirl; +Cc: Chuck Lever, linux-fsdevel
"Jon Smirl" <jonsmirl@gmail.com> writes:
> On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
>> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
>> > Is it feasible to do something like this in the linux file system
>> > architecture?
>> >
>> > Beagle beats on my disk for an hour when I reboot. Of course I don't
>> > like that and I shut Beagle off.
>>
>> Leopard, by the way, does exactly this: it has a daemon that starts
>> at boot time and taps FSEvents then journals file system changes to a
>> well-known file on local disk.
>
> Logging file systems have all of the needed info.
Actually most journaling file systems in Linux use block logging and
it would be probably hard to get specific file names out of a random
collection of logged blocks. And even if you could they would
hit a lot of false positives since everything is rounded up
to block level.
With intent logging like in XFS/JFS it would be easier, but even
then costly :- e.g. they might log changes to the inode but
there is no back pointer to the file name short of searching the
whole directory tree.
-Andi
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 15:01 ` Jon Smirl
@ 2007-11-14 16:32 ` Chuck Lever
2007-11-14 17:46 ` Jon Smirl
2007-11-14 19:32 ` Andreas Dilger
0 siblings, 2 replies; 16+ messages in thread
From: Chuck Lever @ 2007-11-14 16:32 UTC (permalink / raw)
To: Jon Smirl; +Cc: linux-fsdevel
[-- Attachment #1: Type: text/plain, Size: 7471 bytes --]
Jon Smirl wrote:
> On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
>> Jon Smirl wrote:
>>> On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
>>>> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
>>>>> Is it feasible to do something like this in the linux file system
>>>>> architecture?
>>>>>
>>>>> Beagle beats on my disk for an hour when I reboot. Of course I don't
>>>>> like that and I shut Beagle off.
>>>> Leopard, by the way, does exactly this: it has a daemon that starts
>>>> at boot time and taps FSEvents then journals file system changes to a
>>>> well-known file on local disk.
>>> Logging file systems have all of the needed info. Plus they know what
>>> is going on with rollback/replay after a crash.
>> True, but not all file systems have a journal. Consider ext2 or FAT32,
>> both of which are still common.
>
> ext2/FAT32 can use the deamon approach you describe below which also
> works as a short term solution. The Beagle people do have a deamon but
> it can be turned off. Holes where you don't record the inotify events
> and update the index are really bad because they can make files that
> you know are on the disk disappear from the index. I don't believe
> Beagle distinguishes between someone turning it off for a day and then
> turning it back on, vs a reboot. In both cases it says there was a
> window where untracked changes could have happened and it triggers a
> full rescan.
>
> The root problem here is needing a bullet proof inotify log with no
> windows.
I disagree: we don't need a "bullet-proof" log. We can get a
significant performance improvement even with a permanent dnotify log
implemented in user-space. We already have well-defined fallback
behavior if such a log is missing or incomplete.
The problem with a permanent inotify log is that it can become
unmanageably enormous, and a performance problem to boot. Recording at
that level of detail makes it more likely that the logger won't be able
to keep up with file system activity.
A lightweight solution gets us most of the way there, is simple to
implement, and doesn't introduce many new issues. As long as it can
tell us precisely where the holes are, it shouldn't be a problem.
> The only place that is going to happen is inside the file
> system logs.
As Andi points out, existing block-based journaling implementations
won't easily provide this. And most fs journals are actually pretty
limited in size.
Alternately, you could insert a stackable file system layer between the
VFS and the on-disk fs to provide more seamless information about updates.
> We just need an API to say recreate the inotify stream
> from this checkpoint forward. Things like FAT/ext2 will always return
> a no data available error from this API.
>
>>> How about a fs API
>>> where Beagle has a token for a checkpoint, and then it can ask for a
>>> recreation of inotify events from that point forward. It's always
>>> possible for the file system to say I can't do that and trigger a full
>>> rebuild from Beagle. Daemons that aren't coordinated with the file
>>> system have a window during crash/reboot where they can get confused.
>> A reasonably effective solution can be implemented in user space without
>> changes to the file system APIs or implementations. IOW we already have
>> the tools to make something useful.
>>
>> For example, you don't need to record every file system event to make
>> this useful. Listing only directory-level changes (ie "some file in
>> this directory has changed") is enough to prune most of Beagle's work
>> when it starts up.
>>
>>> Without low level support like this Beagle is forced to do a rescan on
>>> every boot. Since I crash my machine all of the time the disk load
>>> from rebooting is intolerable and I turn Beagle off. Even just turning
>>> the machine on in the morning generates an annoyingly large load on
>>> the disk.
>> Understood. The need is clear.
>>
>> My Dad's WinXP system takes 10 minutes after every start-up before it's
>> usable, simply because the virus scanner has to check every file in the
>> system. Same problem!
>>
>>>> I don't see why this couldn't be done on Linux as well.
>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Jon Smirl <jonsmirl@gmail.com>
>>>>> Date: Nov 13, 2007 4:44 PM
>>>>> Subject: Re: Strange "beagle" interaction..
>>>>> To: Linus Torvalds <torvalds@linux-foundation.org>
>>>>> Cc: "J. Bruce Fields" <bfields@fieldses.org>, Junio C Hamano
>>>>> <gitster@pobox.com>, Git Mailing List <git@vger.kernel.org>, Johannes
>>>>> Schindelin <Johannes.Schindelin@gmx.de>
>>>>>
>>>>>
>>>>> On 11/13/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>>>>> On Tue, 13 Nov 2007, J. Bruce Fields wrote:
>>>>>>> Last I ran across this, I believe I found it was adding extended
>>>>>>> attributes to the file.
>>>>>> Yeah, I just straced it and found the same thing. It's saving
>>>>>> fingerprints
>>>>>> and mtimes to files in the extended attributes.
>>>>> Things like Beagle need a guaranteed log of global inotify events.
>>>>> That would let them efficiently find changes made since the last time
>>>>> they updated their index.
>>>>>
>>>>> Right now every time Beagle starts it hasn't got a clue what has
>>>>> changed in the file system since it was last run. This forces Beagle
>>>>> to rescan the entire filesystem every time it is started. The xattrs
>>>>> are used as cache to reduce this load somewhat.
>>>>>
>>>>> A better solution would be for the kernel to log inotify events to
>>>>> disk in a manner that survives reboots. When Beagle starts it would
>>>>> locate its last checkpoint and then process the logged inotify events
>>>>> from that time forward. This inotify logging needs to be bullet proof
>>>>> or it will mess up your Beagle index.
>>>>>
>>>>> Logged files systems already contain the logged inotify data (in their
>>>>> own internal form). There's just no universal API for retrieving it in
>>>>> a file system independent manner.
>>>>>
>>>>>>> Yeah, I just turned off beagle. It looked to me like it was doing
>>>>>>> something wrongheaded.
>>>>>> Gaah. The problem is, setting xattrs does actually change ctime.
>>>>>> Which
>>>>>> means that if we want to make git play nice with beagle, I guess
>>>>>> we have
>>>>>> to just remove the comparison of ctime.
>>>>>>
>>>>>> Oh, well. Git doesn't *require* it, but I like the notion of
>>>>>> checking the
>>>>>> inode really really carefully. But it looks like it may not be an
>>>>>> option,
>>>>>> because of file indexers hiding stuff behind our backs.
>>>>>>
>>>>>> Or we could just tell people not to run beagle on their git trees,
>>>>>> but I
>>>>>> suspect some people will actually *want* to. Even if it flushes
>>>>>> their disk
>>>>>> caches.
>>>>>>
>>>>>> Linus
>>>>>> -
>>>>>> To unsubscribe from this list: send the line "unsubscribe git" in
>>>>>> the body of a message to majordomo@vger.kernel.org
>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>> --
>>>>> Jon Smirl
>>>>> jonsmirl@gmail.com
>>>>>
>>>>>
>>>>> --
>>>>> Jon Smirl
>>>>> jonsmirl@gmail.com
>>>>> -
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-
>>>>> fsdevel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>> --
>>>> Chuck Lever
>>>> chuck[dot]lever[at]oracle[dot]com
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
>
[-- Attachment #2: chuck.lever.vcf --]
[-- Type: text/x-vcard, Size: 259 bytes --]
begin:vcard
fn:Chuck Lever
n:Lever;Chuck
org:Oracle Corporation;Corporate Architecture: Linux Projects Group
adr:;;1015 Granger Avenue;Ann Arbor;MI;48104;USA
title:Principal Member of Staff
tel;work:+1 248 614 5091
x-mozilla-html:FALSE
version:2.1
end:vcard
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 16:32 ` Chuck Lever
@ 2007-11-14 17:46 ` Jon Smirl
2007-11-14 19:32 ` Andreas Dilger
1 sibling, 0 replies; 16+ messages in thread
From: Jon Smirl @ 2007-11-14 17:46 UTC (permalink / raw)
To: chuck.lever; +Cc: linux-fsdevel
On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
> Jon Smirl wrote:
> > On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
> >> Jon Smirl wrote:
> >>> On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
> >>>> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
> >>>>> Is it feasible to do something like this in the linux file system
> >>>>> architecture?
> >>>>>
> >>>>> Beagle beats on my disk for an hour when I reboot. Of course I don't
> >>>>> like that and I shut Beagle off.
> >>>> Leopard, by the way, does exactly this: it has a daemon that starts
> >>>> at boot time and taps FSEvents then journals file system changes to a
> >>>> well-known file on local disk.
> >>> Logging file systems have all of the needed info. Plus they know what
> >>> is going on with rollback/replay after a crash.
> >> True, but not all file systems have a journal. Consider ext2 or FAT32,
> >> both of which are still common.
> >
> > ext2/FAT32 can use the deamon approach you describe below which also
> > works as a short term solution. The Beagle people do have a deamon but
> > it can be turned off. Holes where you don't record the inotify events
> > and update the index are really bad because they can make files that
> > you know are on the disk disappear from the index. I don't believe
> > Beagle distinguishes between someone turning it off for a day and then
> > turning it back on, vs a reboot. In both cases it says there was a
> > window where untracked changes could have happened and it triggers a
> > full rescan.
> >
> > The root problem here is needing a bullet proof inotify log with no
> > windows.
>
> I disagree: we don't need a "bullet-proof" log. We can get a
> significant performance improvement even with a permanent dnotify log
> implemented in user-space. We already have well-defined fallback
> behavior if such a log is missing or incomplete.
>
> The problem with a permanent inotify log is that it can become
> unmanageably enormous, and a performance problem to boot. Recording at
> that level of detail makes it more likely that the logger won't be able
> to keep up with file system activity.
It doesn't have to become enormous, if the checkpoint request is too
old just return no-data and trigger a full scan in Beagle. 50K of log
data would probably be enough. The main thing you need to cover is the
reboot process and files that get touched after the beagle shuts down
or before it gets started. For example the log could checkpoint once a
minute, in that case you wouldn't need more than two minutes worth of
log. Beagle would just remember the last checkpoint it processed and
apply reapply changes after it.
If someone turns Beagle off for a couple of days it should be expected
that they will need a full scan when they turn it back on.
>
> A lightweight solution gets us most of the way there, is simple to
> implement, and doesn't introduce many new issues. As long as it can
> tell us precisely where the holes are, it shouldn't be a problem.
>
> > The only place that is going to happen is inside the file
> > system logs.
>
> As Andi points out, existing block-based journaling implementations
> won't easily provide this. And most fs journals are actually pretty
> limited in size.
>
> Alternately, you could insert a stackable file system layer between the
> VFS and the on-disk fs to provide more seamless information about updates.
>
> > We just need an API to say recreate the inotify stream
> > from this checkpoint forward. Things like FAT/ext2 will always return
> > a no data available error from this API.
> >
> >>> How about a fs API
> >>> where Beagle has a token for a checkpoint, and then it can ask for a
> >>> recreation of inotify events from that point forward. It's always
> >>> possible for the file system to say I can't do that and trigger a full
> >>> rebuild from Beagle. Daemons that aren't coordinated with the file
> >>> system have a window during crash/reboot where they can get confused.
> >> A reasonably effective solution can be implemented in user space without
> >> changes to the file system APIs or implementations. IOW we already have
> >> the tools to make something useful.
> >>
> >> For example, you don't need to record every file system event to make
> >> this useful. Listing only directory-level changes (ie "some file in
> >> this directory has changed") is enough to prune most of Beagle's work
> >> when it starts up.
> >>
> >>> Without low level support like this Beagle is forced to do a rescan on
> >>> every boot. Since I crash my machine all of the time the disk load
> >>> from rebooting is intolerable and I turn Beagle off. Even just turning
> >>> the machine on in the morning generates an annoyingly large load on
> >>> the disk.
> >> Understood. The need is clear.
> >>
> >> My Dad's WinXP system takes 10 minutes after every start-up before it's
> >> usable, simply because the virus scanner has to check every file in the
> >> system. Same problem!
> >>
> >>>> I don't see why this couldn't be done on Linux as well.
> >>>>
> >>>>> ---------- Forwarded message ----------
> >>>>> From: Jon Smirl <jonsmirl@gmail.com>
> >>>>> Date: Nov 13, 2007 4:44 PM
> >>>>> Subject: Re: Strange "beagle" interaction..
> >>>>> To: Linus Torvalds <torvalds@linux-foundation.org>
> >>>>> Cc: "J. Bruce Fields" <bfields@fieldses.org>, Junio C Hamano
> >>>>> <gitster@pobox.com>, Git Mailing List <git@vger.kernel.org>, Johannes
> >>>>> Schindelin <Johannes.Schindelin@gmx.de>
> >>>>>
> >>>>>
> >>>>> On 11/13/07, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >>>>>> On Tue, 13 Nov 2007, J. Bruce Fields wrote:
> >>>>>>> Last I ran across this, I believe I found it was adding extended
> >>>>>>> attributes to the file.
> >>>>>> Yeah, I just straced it and found the same thing. It's saving
> >>>>>> fingerprints
> >>>>>> and mtimes to files in the extended attributes.
> >>>>> Things like Beagle need a guaranteed log of global inotify events.
> >>>>> That would let them efficiently find changes made since the last time
> >>>>> they updated their index.
> >>>>>
> >>>>> Right now every time Beagle starts it hasn't got a clue what has
> >>>>> changed in the file system since it was last run. This forces Beagle
> >>>>> to rescan the entire filesystem every time it is started. The xattrs
> >>>>> are used as cache to reduce this load somewhat.
> >>>>>
> >>>>> A better solution would be for the kernel to log inotify events to
> >>>>> disk in a manner that survives reboots. When Beagle starts it would
> >>>>> locate its last checkpoint and then process the logged inotify events
> >>>>> from that time forward. This inotify logging needs to be bullet proof
> >>>>> or it will mess up your Beagle index.
> >>>>>
> >>>>> Logged files systems already contain the logged inotify data (in their
> >>>>> own internal form). There's just no universal API for retrieving it in
> >>>>> a file system independent manner.
> >>>>>
> >>>>>>> Yeah, I just turned off beagle. It looked to me like it was doing
> >>>>>>> something wrongheaded.
> >>>>>> Gaah. The problem is, setting xattrs does actually change ctime.
> >>>>>> Which
> >>>>>> means that if we want to make git play nice with beagle, I guess
> >>>>>> we have
> >>>>>> to just remove the comparison of ctime.
> >>>>>>
> >>>>>> Oh, well. Git doesn't *require* it, but I like the notion of
> >>>>>> checking the
> >>>>>> inode really really carefully. But it looks like it may not be an
> >>>>>> option,
> >>>>>> because of file indexers hiding stuff behind our backs.
> >>>>>>
> >>>>>> Or we could just tell people not to run beagle on their git trees,
> >>>>>> but I
> >>>>>> suspect some people will actually *want* to. Even if it flushes
> >>>>>> their disk
> >>>>>> caches.
> >>>>>>
> >>>>>> Linus
> >>>>>> -
> >>>>>> To unsubscribe from this list: send the line "unsubscribe git" in
> >>>>>> the body of a message to majordomo@vger.kernel.org
> >>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>>>>
> >>>>> --
> >>>>> Jon Smirl
> >>>>> jonsmirl@gmail.com
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Jon Smirl
> >>>>> jonsmirl@gmail.com
> >>>>> -
> >>>>> To unsubscribe from this list: send the line "unsubscribe linux-
> >>>>> fsdevel" in
> >>>>> the body of a message to majordomo@vger.kernel.org
> >>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
> >>>> --
> >>>> Chuck Lever
> >>>> chuck[dot]lever[at]oracle[dot]com
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>
> >
> >
>
>
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 15:30 ` Andi Kleen
@ 2007-11-14 19:09 ` J. Bruce Fields
2007-11-14 19:22 ` Jon Smirl
0 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2007-11-14 19:09 UTC (permalink / raw)
To: Andi Kleen; +Cc: Jon Smirl, Chuck Lever, linux-fsdevel
On Wed, Nov 14, 2007 at 04:30:16PM +0100, Andi Kleen wrote:
> "Jon Smirl" <jonsmirl@gmail.com> writes:
>
> > On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
> >> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
> >> > Is it feasible to do something like this in the linux file system
> >> > architecture?
> >> >
> >> > Beagle beats on my disk for an hour when I reboot. Of course I don't
> >> > like that and I shut Beagle off.
> >>
> >> Leopard, by the way, does exactly this: it has a daemon that starts
> >> at boot time and taps FSEvents then journals file system changes to a
> >> well-known file on local disk.
> >
> > Logging file systems have all of the needed info.
>
> Actually most journaling file systems in Linux use block logging and
> it would be probably hard to get specific file names out of a random
> collection of logged blocks. And even if you could they would
> hit a lot of false positives since everything is rounded up
> to block level.
>
> With intent logging like in XFS/JFS it would be easier, but even
> then costly :- e.g. they might log changes to the inode but
> there is no back pointer to the file name short of searching the
> whole directory tree.
So it seems the best approach given the current api's would be just to
cache all the stat data, and stat every file on reboot.
I don't understand why beagle is reading the entire filesystem data. I
understand why even just doing the stat's could be prohibitive, though.
--b.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 19:09 ` J. Bruce Fields
@ 2007-11-14 19:22 ` Jon Smirl
2007-11-14 19:30 ` J. Bruce Fields
0 siblings, 1 reply; 16+ messages in thread
From: Jon Smirl @ 2007-11-14 19:22 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Andi Kleen, Chuck Lever, linux-fsdevel
On 11/14/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> On Wed, Nov 14, 2007 at 04:30:16PM +0100, Andi Kleen wrote:
> > "Jon Smirl" <jonsmirl@gmail.com> writes:
> >
> > > On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
> > >> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
> > >> > Is it feasible to do something like this in the linux file system
> > >> > architecture?
> > >> >
> > >> > Beagle beats on my disk for an hour when I reboot. Of course I don't
> > >> > like that and I shut Beagle off.
> > >>
> > >> Leopard, by the way, does exactly this: it has a daemon that starts
> > >> at boot time and taps FSEvents then journals file system changes to a
> > >> well-known file on local disk.
> > >
> > > Logging file systems have all of the needed info.
> >
> > Actually most journaling file systems in Linux use block logging and
> > it would be probably hard to get specific file names out of a random
> > collection of logged blocks. And even if you could they would
> > hit a lot of false positives since everything is rounded up
> > to block level.
> >
> > With intent logging like in XFS/JFS it would be easier, but even
> > then costly :- e.g. they might log changes to the inode but
> > there is no back pointer to the file name short of searching the
> > whole directory tree.
>
> So it seems the best approach given the current api's would be just to
> cache all the stat data, and stat every file on reboot.
>
> I don't understand why beagle is reading the entire filesystem data. I
> understand why even just doing the stat's could be prohibitive, though.
I believe Beagle is looking at the mtimes on the files. It uses xattrs
to store the last mtime it checked and then compares it to the current
mtime. It also stores a hash of the file in an xattr. So even if the
mtimes don't match it recomputes the hash and only if the hashes
differ do it update its free text search index.
>
> --b.
>
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 19:22 ` Jon Smirl
@ 2007-11-14 19:30 ` J. Bruce Fields
0 siblings, 0 replies; 16+ messages in thread
From: J. Bruce Fields @ 2007-11-14 19:30 UTC (permalink / raw)
To: Jon Smirl; +Cc: Andi Kleen, Chuck Lever, linux-fsdevel
On Wed, Nov 14, 2007 at 02:22:51PM -0500, Jon Smirl wrote:
> On 11/14/07, J. Bruce Fields <bfields@fieldses.org> wrote:
> > On Wed, Nov 14, 2007 at 04:30:16PM +0100, Andi Kleen wrote:
> > > "Jon Smirl" <jonsmirl@gmail.com> writes:
> > >
> > > > On 11/14/07, Chuck Lever <chuck.lever@oracle.com> wrote:
> > > >> On Nov 13, 2007, at 7:04 PM, Jon Smirl wrote:
> > > >> > Is it feasible to do something like this in the linux file system
> > > >> > architecture?
> > > >> >
> > > >> > Beagle beats on my disk for an hour when I reboot. Of course I don't
> > > >> > like that and I shut Beagle off.
> > > >>
> > > >> Leopard, by the way, does exactly this: it has a daemon that starts
> > > >> at boot time and taps FSEvents then journals file system changes to a
> > > >> well-known file on local disk.
> > > >
> > > > Logging file systems have all of the needed info.
> > >
> > > Actually most journaling file systems in Linux use block logging and
> > > it would be probably hard to get specific file names out of a random
> > > collection of logged blocks. And even if you could they would
> > > hit a lot of false positives since everything is rounded up
> > > to block level.
> > >
> > > With intent logging like in XFS/JFS it would be easier, but even
> > > then costly :- e.g. they might log changes to the inode but
> > > there is no back pointer to the file name short of searching the
> > > whole directory tree.
> >
> > So it seems the best approach given the current api's would be just to
> > cache all the stat data, and stat every file on reboot.
> >
> > I don't understand why beagle is reading the entire filesystem data. I
> > understand why even just doing the stat's could be prohibitive, though.
>
> I believe Beagle is looking at the mtimes on the files. It uses xattrs
> to store the last mtime it checked and then compares it to the current
> mtime. It also stores a hash of the file in an xattr. So even if the
You meant "only if", not "even if"?
> mtimes don't match it recomputes the hash and only if the hashes
> differ do it update its free text search index.
OK, that makes a little more sense. (Though it seems unfortunate to use
xattrs instead of caching the data elsewhere. Git and nfs e.g. both use
the ctime to decide when a file changes, so you're invalidating their
caches unnecessarily.)
--b.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 16:32 ` Chuck Lever
2007-11-14 17:46 ` Jon Smirl
@ 2007-11-14 19:32 ` Andreas Dilger
2007-11-14 19:38 ` J. Bruce Fields
1 sibling, 1 reply; 16+ messages in thread
From: Andreas Dilger @ 2007-11-14 19:32 UTC (permalink / raw)
To: Chuck Lever; +Cc: Jon Smirl, linux-fsdevel, Jan Kara
On Nov 14, 2007 11:32 -0500, Chuck Lever wrote:
> I disagree: we don't need a "bullet-proof" log. We can get a significant
> performance improvement even with a permanent dnotify log implemented in
> user-space. We already have well-defined fallback behavior if such a log
> is missing or incomplete.
>
> The problem with a permanent inotify log is that it can become unmanageably
> enormous, and a performance problem to boot. Recording at that level of
> detail makes it more likely that the logger won't be able to keep up with
> file system activity.
>
> A lightweight solution gets us most of the way there, is simple to
> implement, and doesn't introduce many new issues. As long as it can tell
> us precisely where the holes are, it shouldn't be a problem.
Jan Kara is working on a patch for ext4 which would store a recursive
timestamp for each directory that gives the latest time that a file in
that directory was modified. ZFS has a similar mechanism by virtue of
doing full-tree updates during COW of all the metadata blocks and storing
the most recent transaction number in each block. I suspect btrfs could
do the same thing easily.
That would allow recursive-descent filesystem traversal to be much more
efficient because whole chunks of the filesystem tree can be ignored during
scans.
Cheers, Andreas
--
Andreas Dilger
Sr. Software Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 19:32 ` Andreas Dilger
@ 2007-11-14 19:38 ` J. Bruce Fields
2007-11-15 19:59 ` Jan Kara
0 siblings, 1 reply; 16+ messages in thread
From: J. Bruce Fields @ 2007-11-14 19:38 UTC (permalink / raw)
To: Chuck Lever, Jon Smirl, linux-fsdevel, Jan Kara
On Wed, Nov 14, 2007 at 12:32:45PM -0700, Andreas Dilger wrote:
> On Nov 14, 2007 11:32 -0500, Chuck Lever wrote:
> > I disagree: we don't need a "bullet-proof" log. We can get a significant
> > performance improvement even with a permanent dnotify log implemented in
> > user-space. We already have well-defined fallback behavior if such a log
> > is missing or incomplete.
> >
> > The problem with a permanent inotify log is that it can become unmanageably
> > enormous, and a performance problem to boot. Recording at that level of
> > detail makes it more likely that the logger won't be able to keep up with
> > file system activity.
> >
> > A lightweight solution gets us most of the way there, is simple to
> > implement, and doesn't introduce many new issues. As long as it can tell
> > us precisely where the holes are, it shouldn't be a problem.
>
> Jan Kara is working on a patch for ext4 which would store a recursive
> timestamp for each directory that gives the latest time that a file in
> that directory was modified. ZFS has a similar mechanism by virtue of
> doing full-tree updates during COW of all the metadata blocks and storing
> the most recent transaction number in each block. I suspect btrfs could
> do the same thing easily.
>
> That would allow recursive-descent filesystem traversal to be much more
> efficient because whole chunks of the filesystem tree can be ignored during
> scans.
The problem is that people may not be happy with the random behavior of
hardlinks, right?
--b.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-14 19:38 ` J. Bruce Fields
@ 2007-11-15 19:59 ` Jan Kara
2007-11-15 20:14 ` J. Bruce Fields
2007-11-15 20:14 ` Jon Smirl
0 siblings, 2 replies; 16+ messages in thread
From: Jan Kara @ 2007-11-15 19:59 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Chuck Lever, Jon Smirl, linux-fsdevel
On Wed 14-11-07 14:38:05, J. Bruce Fields wrote:
> On Wed, Nov 14, 2007 at 12:32:45PM -0700, Andreas Dilger wrote:
> > On Nov 14, 2007 11:32 -0500, Chuck Lever wrote:
> > > I disagree: we don't need a "bullet-proof" log. We can get a significant
> > > performance improvement even with a permanent dnotify log implemented in
> > > user-space. We already have well-defined fallback behavior if such a log
> > > is missing or incomplete.
> > >
> > > The problem with a permanent inotify log is that it can become unmanageably
> > > enormous, and a performance problem to boot. Recording at that level of
> > > detail makes it more likely that the logger won't be able to keep up with
> > > file system activity.
> > >
> > > A lightweight solution gets us most of the way there, is simple to
> > > implement, and doesn't introduce many new issues. As long as it can tell
> > > us precisely where the holes are, it shouldn't be a problem.
> >
> > Jan Kara is working on a patch for ext4 which would store a recursive
> > timestamp for each directory that gives the latest time that a file in
> > that directory was modified. ZFS has a similar mechanism by virtue of
> > doing full-tree updates during COW of all the metadata blocks and storing
> > the most recent transaction number in each block. I suspect btrfs could
> > do the same thing easily.
> >
> > That would allow recursive-descent filesystem traversal to be much more
> > efficient because whole chunks of the filesystem tree can be ignored during
> > scans.
>
> The problem is that people may not be happy with the random behavior of
> hardlinks, right?
The kernel part has this non-determinism with hardlinks but it can be
worked-around in userspace (and actually if you watch the whole filesystem
you don't care about the non-determinism at all because you are guaranteed
there is *at least one* path which indicates the file was modified). I'm
planning to write a userspace library which would mostly hide the
problems with hardlinks (and also problems with the fact that scanner may
not have enough rights to set inode flags pointed out by Ted) from
applications...
Honza
BTW: Where did this discussion started? Googling the subject gives me just
one news message...
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-15 19:59 ` Jan Kara
@ 2007-11-15 20:14 ` J. Bruce Fields
2007-11-15 20:14 ` Jon Smirl
1 sibling, 0 replies; 16+ messages in thread
From: J. Bruce Fields @ 2007-11-15 20:14 UTC (permalink / raw)
To: Jan Kara; +Cc: Chuck Lever, Jon Smirl, linux-fsdevel
On Thu, Nov 15, 2007 at 08:59:46PM +0100, Jan Kara wrote:
> BTW: Where did this discussion started? Googling the subject gives me just
> one news message...
Here:
http://marc.info/?l=linux-fsdevel&m=119499881822672&w=2
and before that, here:
http://marc.info/?l=git&m=119498755206826&w=2
--b.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Beagle and logging inotify events
2007-11-15 19:59 ` Jan Kara
2007-11-15 20:14 ` J. Bruce Fields
@ 2007-11-15 20:14 ` Jon Smirl
1 sibling, 0 replies; 16+ messages in thread
From: Jon Smirl @ 2007-11-15 20:14 UTC (permalink / raw)
To: Jan Kara; +Cc: J. Bruce Fields, Chuck Lever, linux-fsdevel
On 11/15/07, Jan Kara <jack@suse.cz> wrote:
> On Wed 14-11-07 14:38:05, J. Bruce Fields wrote:
> > On Wed, Nov 14, 2007 at 12:32:45PM -0700, Andreas Dilger wrote:
> > > On Nov 14, 2007 11:32 -0500, Chuck Lever wrote:
> > > > I disagree: we don't need a "bullet-proof" log. We can get a significant
> > > > performance improvement even with a permanent dnotify log implemented in
> > > > user-space. We already have well-defined fallback behavior if such a log
> > > > is missing or incomplete.
> > > >
> > > > The problem with a permanent inotify log is that it can become unmanageably
> > > > enormous, and a performance problem to boot. Recording at that level of
> > > > detail makes it more likely that the logger won't be able to keep up with
> > > > file system activity.
> > > >
> > > > A lightweight solution gets us most of the way there, is simple to
> > > > implement, and doesn't introduce many new issues. As long as it can tell
> > > > us precisely where the holes are, it shouldn't be a problem.
> > >
> > > Jan Kara is working on a patch for ext4 which would store a recursive
> > > timestamp for each directory that gives the latest time that a file in
> > > that directory was modified. ZFS has a similar mechanism by virtue of
> > > doing full-tree updates during COW of all the metadata blocks and storing
> > > the most recent transaction number in each block. I suspect btrfs could
> > > do the same thing easily.
> > >
> > > That would allow recursive-descent filesystem traversal to be much more
> > > efficient because whole chunks of the filesystem tree can be ignored during
> > > scans.
> >
> > The problem is that people may not be happy with the random behavior of
> > hardlinks, right?
> The kernel part has this non-determinism with hardlinks but it can be
> worked-around in userspace (and actually if you watch the whole filesystem
> you don't care about the non-determinism at all because you are guaranteed
> there is *at least one* path which indicates the file was modified). I'm
> planning to write a userspace library which would mostly hide the
> problems with hardlinks (and also problems with the fact that scanner may
> not have enough rights to set inode flags pointed out by Ted) from
> applications...
>
> Honza
>
> BTW: Where did this discussion started? Googling the subject gives me just
> one news message...
Linus made comments about it on the git list which restarted it.
Beagle's writing to the xattrs makes git think the file has been
changed. It was really a fs question not git so I forwarded it to this
group. Git wouldn't have a problem if there was a way for Beagle to
avoid using the xattrs.
Discussion of the general problem have been going on for a long time
and has caused the creation of dnofity and inotify. But the problem is
still not completely solved since there are these windows where files
can change and the tracker (beagle or other apps) is not aware of the
change.
My thoughts are that the kernel implementation of inotify should be
extended to support checkpoints and replay of events after the
checkpoint. How the file systems or kernel implement this is a harder
problem.
I do know for sure that many people are greatly annoyed by Beagle
beating away on their disk after each boot.
> --
> Jan Kara <jack@suse.cz>
> SUSE Labs, CR
>
--
Jon Smirl
jonsmirl@gmail.com
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2007-11-15 20:14 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-11-14 0:04 Beagle and logging inotify events Jon Smirl
2007-11-14 13:29 ` Chuck Lever
2007-11-14 13:44 ` Jon Smirl
2007-11-14 14:41 ` Chuck Lever
2007-11-14 15:01 ` Jon Smirl
2007-11-14 16:32 ` Chuck Lever
2007-11-14 17:46 ` Jon Smirl
2007-11-14 19:32 ` Andreas Dilger
2007-11-14 19:38 ` J. Bruce Fields
2007-11-15 19:59 ` Jan Kara
2007-11-15 20:14 ` J. Bruce Fields
2007-11-15 20:14 ` Jon Smirl
2007-11-14 15:30 ` Andi Kleen
2007-11-14 19:09 ` J. Bruce Fields
2007-11-14 19:22 ` Jon Smirl
2007-11-14 19:30 ` J. Bruce Fields
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).