* Recursive modfied-timestamp?
@ 2004-12-31 9:47 Fred Schaettgen
2004-12-31 22:49 ` David Masover
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Fred Schaettgen @ 2004-12-31 9:47 UTC (permalink / raw)
To: reiserfs-list
Hi,
Does reiser4 support something like recursive last-modified-timestamps? What I
mean is an attribute which contains the latest modification date of all
subdirectories and files below a given directory.
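For reference, a minimal userspace sketch of the value being asked for, computed the slow way by walking the whole tree. The function name recursive_mtime is hypothetical; nothing here is a reiser4 feature. It only shows the full scan that a file-system-maintained attribute would avoid.

```python
import os

def recursive_mtime(root):
    """Return the newest st_mtime of root and of everything below it.

    Hypothetical helper: this is the O(tree size) scan that a
    file-system-maintained recursive timestamp would make unnecessary.
    """
    newest = os.lstat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except FileNotFoundError:
                continue  # entry vanished while we were walking
            newest = max(newest, st.st_mtime)
    return newest
```

An indexer could compare this value with the time of its last run and skip the tree entirely when nothing is newer.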
Actually I am also curious whether any other Linux file systems
support that. The reason I'm asking on the reiserfs mailing list is that
reiser4 seems to be the filesystem which is most open for new features.
Could this be implemented as some sort of plugin for reiser4? Or does/will
reiser4 support any other concepts which can be used for that purpose?
The purpose btw. is to find all modified files in a tree as fast as possible.
There are quite a lot of applications which would benefit from it: desktop
search engines, locate, build systems, tools which visualize contents of a
file system (like fsview in KDE), backup tools etc.
I know that modifying an attribute recursively on every update of the stat data
would have a huge performance impact, but there are many things that could be
done to keep the extra load low for most of the time.
It seems very likely that this is an idea which was discussed over and over
again already, but I really didn't find much about it. As a KDE developer,
I'm not much involved in filesystems, so maybe I'm just looking for the wrong
keywords?
Fred
--
Fred Schaettgen
kde.Sch@ttgen.net
* Re: Recursive modfied-timestamp?
2004-12-31 9:47 Fred Schaettgen
@ 2004-12-31 22:49 ` David Masover
2005-01-01 0:51 ` Alexander G. M. Smith
2005-01-01 21:49 ` Hans Reiser
2 siblings, 0 replies; 7+ messages in thread
From: David Masover @ 2004-12-31 22:49 UTC (permalink / raw)
To: Fred Schaettgen; +Cc: reiserfs-list
Fred Schaettgen wrote:
| Hi,
|
| Does reiser4 support something like recursive last-modified-timestamps?
| What I mean is an attribute which contains the latest modification date of
| all subdirectories and files below a given directory.
Actually, I'm not sure about that, but reiser4 supports plugins. Maybe
there's a kind of plugin which does what you want. Or maybe you haven't
defined "what you want" properly? (see below)
[...]
| The purpose btw. is to find all modified files in a tree as fast as possible.
| There are quite a lot of applications which would benefit from it: desktop
| search engines, locate, build systems, tools which visualize contents of a
| file system (like fsview in KDE), backup tools etc.
Seems like all of those are really problems of caching/metadata, or more
accurately, "things which Make would understand". How about some more
general way of caching or cache invalidation?
Here's how I would do it. I'd make a standard for object dependencies
within the filesystem, some way like "make". This is the same thing I
ranted about as a way for accessing the contents of zipfiles as part of
the filesystem, without a performance hit. (cat foo.zip/bar.txt)
For instance, your search engine needs an index, which depends on (is
built from) all the files in the filesystem except itself. Thus you
might have an index for each folder (starting with /). Each index
depends on the indices of its subdirectories. When a search is run,
everything has to be rebuilt, in "make"-like fashion, but it gives you
one global place to add the "many things that could be done" to improve
performance for all systems that do this kind of thing -- search engines,
locate, build systems, fsview, and backup tools.
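A toy sketch of the per-directory index idea described above, with the file system replaced by an in-memory tree. Everything here (the Dir class, write, get_index, the rebuilds counter) is hypothetical and only illustrates the make-like rule: a directory's index is rebuilt only when it or something below it has changed, and otherwise the cached copy is reused.

```python
class Dir:
    """Toy in-memory directory, a stand-in for a real file system object."""

    rebuilds = 0  # counts index rebuilds, for demonstration only

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.files = {}    # file name -> text content
        self.subdirs = {}  # directory name -> Dir
        self.index = None  # cached word -> set of paths; None means stale
        if parent is not None:
            parent.subdirs[name] = self
            parent._invalidate()

    def path(self):
        if self.parent is None:
            return "/"
        return self.parent.path() + self.name + "/"

    def _invalidate(self):
        # Mark this directory and every ancestor stale (cost: tree depth).
        node = self
        while node is not None:
            node.index = None
            node = node.parent

    def write(self, fname, text):
        self.files[fname] = text
        self._invalidate()

    def get_index(self):
        # Make-like rule: rebuild only if stale, reusing fresh child indices.
        if self.index is None:
            Dir.rebuilds += 1
            index = {}
            for fname, text in self.files.items():
                for word in text.split():
                    index.setdefault(word, set()).add(self.path() + fname)
            for sub in self.subdirs.values():
                for word, paths in sub.get_index().items():
                    index.setdefault(word, set()).update(paths)
            self.index = index
        return self.index
```

After a write under one subdirectory, a query at the root rebuilds only the indices on the path from the root to that subdirectory; sibling subtrees keep their cached index.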
| I know that modifying an attribute recursively on every update of the stat
| data would have a huge performance impact, but there are many things that
| could be done to keep the extra load low for most of the time.
Which of these things benefits from being _in_ the filesystem? Not that
I don't like your approach (see above), I just want you to think harder
about it.
| It seems very likely that this is an idea which was discussed over and over
| again already, but I really didn't find much about it. As a KDE developer,
| I'm not much involved in filesystems, so maybe I'm just looking for the wrong
| keywords?
Maybe. Seems like people use things like FAM nowadays. But you're
right, there needs to be a better way. For instance, your desktop
search engine should only rebuild even the stat data when a user enters
a query, but it should be able to do it quickly (without searching the
whole tree).
* Re: Recursive modfied-timestamp?
2004-12-31 9:47 Fred Schaettgen
2004-12-31 22:49 ` David Masover
@ 2005-01-01 0:51 ` Alexander G. M. Smith
2005-01-01 21:49 ` Hans Reiser
2 siblings, 0 replies; 7+ messages in thread
From: Alexander G. M. Smith @ 2005-01-01 0:51 UTC (permalink / raw)
To: Fred Schaettgen; +Cc: reiserfs-list
Fred Schaettgen wrote on Fri, 31 Dec 2004 10:47:14 +0100:
> The purpose btw. is to find all modified files in a tree as fast as possible.
> There are quite a lot of applications which would benefit from it: desktop
> search engines, locate, build systems, tools which visualize contents of a
> file system (like fsview in KDE), backup tools etc.
Does it have to be recursive? BeOS has an index for the last modified date
of all files so it's easy to find all files modified in a given range of
dates. I expect that modern file systems could have something similar.
However, the BeOS index system is global to a disk volume, so finding
recently changed files in a tree means finding recent files then throwing
out the ones outside the tree. That awkwardness has grated against the
nerves of many a BeOS user. But nobody has sat down to figure out a
better solution to the underlying problem (indices stored per directory?).
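A small sketch of the awkwardness described above, assuming the volume-wide index can be modeled as a list of (mtime, path) pairs sorted by mtime. This is not the BeOS API; the function name and data layout are made up for illustration. The date-range lookup is cheap, but restricting it to one tree means filtering every recent hit by path prefix.

```python
import bisect

def modified_in_tree(index, since, tree):
    """Return paths under `tree` with mtime >= `since`.

    `index` is a hypothetical volume-wide index: a list of (mtime, path)
    tuples sorted by mtime. The bisect step is the cheap range query; the
    prefix filter is the part that throws away hits outside the tree.
    """
    prefix = tree.rstrip("/") + "/"
    start = bisect.bisect_left(index, (since,))
    return [path for mtime, path in index[start:] if path.startswith(prefix)]
```

The cost of the filter grows with the number of recently changed files on the whole volume, not with the number changed inside the tree of interest.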
- Alex
* Re: Recursive modfied-timestamp?
@ 2005-01-01 2:04 Fred Schaettgen
2005-01-02 4:27 ` David Masover
0 siblings, 1 reply; 7+ messages in thread
From: Fred Schaettgen @ 2005-01-01 2:04 UTC (permalink / raw)
To: reiserfs-list
On Saturday 01 January 2005 01:51, you wrote:
> Fred Schaettgen wrote on Fri, 31 Dec 2004 10:47:14 +0100:
> > The purpose btw. is to find all modified files in a tree as fast as
> > possible. There are quite a lot of applications which would benefit from
> > it: desktop search engines, locate, build systems, tools which visualize
> > contents of a file system (like fsview in KDE), backup tools etc.
>
> Does it have to be recursive? BeOS has an index for the last modified date
> of all files so it's easy to find all files modified in a given range of
> dates. I expect that modern file systems could have something similar.
>
> However, the BeOS index system is global to a disk volume, so finding
> recently changed files in a tree means finding recent files then throwing
> out the ones outside the tree. That awkwardness has grated against the
> nerves of many a BeOS user. But nobody has sat down to figure out a
> better solution to the underlying problem (indices stored per directory?).
I see.. I didn't know about BeOS' file system, thanks :)
Having an index over various attributes is certainly a powerful feature. But
wouldn't it be better if we could extend the file system in a *minimal* way
which still makes it possible to create such indices efficiently in
userspace?
Moving too much logic into the file system has lots of drawbacks. It makes
the file system complicated, so it will be less likely to be implemented at
all. And if it's implemented, it's much harder to keep it up to date than with
userspace programs. It's harder to debug and it's harder to accept for
people who want to keep the file systems pure.
I'm not sure if my proposal in my other post in this thread would be more
efficient or easier to implement than a global index for the modification
times, but I guess it's more or less the same in the end.
I don't know how the BeOS indices work, but it sounds like the index is
updated each time a file is modified, which is most likely more time
consuming than my proposal, where the changed-file-list is only updated when
a file is changed for the first time after the recursive mtime was requested
for it. So the performance for frequently updated files won't suffer much.
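A rough sketch of the changed-file-list behaviour described in the paragraph above. The details of the original proposal are in another post that is not quoted here, so the class and method names (ChangeTracker, on_modify, drain) and the exact bookkeeping are assumptions. The point shown is only that a file costs one list update per query interval, however often it is written.

```python
class ChangeTracker:
    """Hypothetical model of a changed-file list that records each file once."""

    def __init__(self):
        self.listed = set()   # files already on the changed list
        self.changed = []     # changed list, in order of first modification
        self.list_writes = 0  # how often the list itself had to be updated

    def on_modify(self, path):
        # Only the first change since the last drain touches the list;
        # later writes to the same file are a cheap set lookup.
        if path not in self.listed:
            self.listed.add(path)
            self.changed.append(path)
            self.list_writes += 1

    def drain(self):
        # Called when userspace asks what changed; resets the bookkeeping
        # so the next modification of each file is recorded again.
        changed, self.changed = self.changed, []
        self.listed.clear()
        return changed
```

A log file written a thousand times between two queries appears on the list once, which is where the claimed advantage over updating an index on every modification would come from.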
But from an application point of view, a BeOS-style mtime-index would be just
as good, especially if there is a userspace layer in between, which allows
per-directory mtime range requests or similar.
The changes to the file system itself should just be so simple that we don't
have to fight a never-ending war for a whole new paradigm.
Fred
--
Fred Schaettgen
Sch@ttgen.net
* Re: Recursive modfied-timestamp?
2004-12-31 9:47 Fred Schaettgen
2004-12-31 22:49 ` David Masover
2005-01-01 0:51 ` Alexander G. M. Smith
@ 2005-01-01 21:49 ` Hans Reiser
2 siblings, 0 replies; 7+ messages in thread
From: Hans Reiser @ 2005-01-01 21:49 UTC (permalink / raw)
To: Fred Schaettgen; +Cc: reiserfs-list
Fred Schaettgen wrote:
>Hi,
>
>Does reiser4 support something like recursive last-modified-timestamps? What I
>mean is an attribute which contains the latest modification date of all
>subdirectories and files below a given directory.
>
>Actually I am also curious whether any other Linux file systems
>support that. The reason I'm asking on the reiserfs mailing list is that
>reiser4 seems to be the filesystem which is most open for new features.
>Could this be implemented as some sort of plugin for reiser4? Or does/will
>reiser4 support any other concepts which can be used for that purpose?
>
>The purpose btw. is to find all modified files in a tree as fast as possible.
>There are quite a lot of applications which would benefit from it: desktop
>search engines, locate, build systems, tools which visualize contents of a
>file system (like fsview in KDE), backup tools etc.
>
>I know that modifying an attribute recursively on every update of the stat data
>would have a huge performance impact, but there are many things that could be
>done to keep the extra load low for most of the time.
>It seems very likely that this is an idea which was discussed over and over
>again already, but I really didn't find much about it. As a KDE developer,
>I'm not much involved in filesystems, so maybe I'm just looking for the wrong
>keywords?
>
>Fred
>
>
>
We intend to implement inheritance of metadata, which could be made to
accomplish what you are asking for I think. Nobody is coding that at
the moment though....
We are indeed open to semantic enhancements.
* Re: Recursive modfied-timestamp?
2005-01-01 2:04 Recursive modfied-timestamp? Fred Schaettgen
@ 2005-01-02 4:27 ` David Masover
2005-01-02 12:08 ` Fred Schaettgen
0 siblings, 1 reply; 7+ messages in thread
From: David Masover @ 2005-01-02 4:27 UTC (permalink / raw)
To: Fred Schaettgen; +Cc: reiserfs-list
Fred Schaettgen wrote:
> On Saturday 01 January 2005 01:51, you wrote:
>
>>Fred Schaettgen wrote on Fri, 31 Dec 2004 10:47:14 +0100:
>>
>>>The purpose btw. is to find all modified files in a tree as fast as
>>>possible. There are quite a lot of applications which would benefit from
>>>it: desktop search engines, locate, build systems, tools which visualize
>>>contents of a file system (like fsview in KDE), backup tools etc.
>>
>>Does it have to be recursive? BeOS has an index for the last modified date
>>of all files so it's easy to find all files modified in a given range of
>>dates. I expect that modern file systems could have something similar.
>>
>>However, the BeOS index system is global to a disk volume, so finding
>>recently changed files in a tree means finding recent files then throwing
>>out the ones outside the tree. That awkwardness has grated against the
>>nerves of many a BeOS user. But nobody has sat down to figure out a
>>better solution to the underlying problem (indices stored per directory?).
>
>
> Moving too much logic into the file system has lots of drawbacks. It makes
> the file system complicated, so it will be less likely to be implemented at
> all. And if it's implemented, it's much harder to keep it up to date than with
> userspace programs. It's harder to debug and it's harder to accept for
> people who want to keep the file systems pure.
It also has the advantage of being faster, more universal, and more
complete as a solution. Remind me why you wanted your mtimes in the kernel?
> I don't know how the BeOS indices work, but it sounds like the index is
> updated each time a file is modified, which is most likely more time
> consuming than my proposal, where the changed-file-list is only updated when
> a file is changed for the first time after the recursive mtime was requested
> for it. So the performance for frequently updated files won't suffer much.
Speaking of which, how do you make this atomic without more help from
the filesystem?
> The changes to the file system itself should just be so simple that we don't
> have to fight a never-ending war for a whole new paradigm.
Which is why I like the caching idea. See my last post. Support for
simple userland plugins, combined with intelligent caching by the
kernel, means we don't have to touch the kernel or the filesystem for
most kinds of customizable things we want to do. Your mtime idea is
nice -- it can be done with just those two things in the kernel. What
about a zipfile which is built from a directory tree every time it's
read, but only if files in that tree have changed? Not possible with
only recursive-mtime support (though it would require it).
* Re: Recursive modfied-timestamp?
2005-01-02 4:27 ` David Masover
@ 2005-01-02 12:08 ` Fred Schaettgen
0 siblings, 0 replies; 7+ messages in thread
From: Fred Schaettgen @ 2005-01-02 12:08 UTC (permalink / raw)
To: reiserfs-list
On Sunday 02 January 2005 05:27, David Masover wrote:
> > Moving too much logic into the file system has lots of drawbacks. It
> > makes the file system complicated, so it will be less likely to be
> > implemented at all. And if it's implemented, it's much harder to keep it up
> > to date than with userspace programs. It's harder to debug and it's
> > harder to accept for people who want to keep the file systems pure.
>
> It also has the advantage of being faster, more universal, and more
> complete as a solution. Remind me why you wanted your mtimes in the
> kernel?
I was just asking if such a feature is supported already. As I said before,
the mtimes themselves could be provided by a userspace library and it doesn't
matter if they are stored in EAs or a userspace database or whatever.
Actually I'm not concerned about the mtimes themselves, but I want to invalidate
items in a metadata cache if a file changes. The simplest feature (from a
user's point of view) a filesystem could provide to make this cache
invalidation more efficient seemed to be recursive mtimes, and so I was
asking.
> > I don't know how the BeOS indices work, but it sounds like the index is
> > updated each time a file is modified, which is most likely more time
> > consuming than my proposal, where the changed-file-list is only updated
> > when a file is changed for the first time after the recursive mtime was
> > requested for it. So the performance for frequently updated files won't
> > suffer much.
>
> Speaking of which, how do you make this atomic without more help from
> the filesystem?
I don't know. I never worked on a file system before ;) But it doesn't need to
be fully atomic. If a file is changed, then the entry in the changed-list
must be created, but it would be acceptable if a file was listed as changed
without the change really happening for some reason. This will result in an
increased overhead when the mtimes are rebuilt, but it won't break anything.
I don't know if this weaker requirement is much easier to fulfill though.
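One way to picture the weaker requirement is to write the changed-list entry before touching the file, so a failure in between leaves a false positive but never a missed change. The sketch below does this in userspace with an append-only log; the function names and the log format are assumptions, and a real file system would do the equivalent internally rather than through fsync on a side file.

```python
import os

def record_then_write(log_path, target, data):
    """Log `target` as changed, make the log entry durable, then write it.

    Hypothetical helper. If the write fails (or the machine crashes) after
    the log entry is on disk, the file is listed as changed although it is
    not. That costs an unnecessary re-scan later but loses no change.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(target + "\n")
        log.flush()
        os.fsync(log.fileno())
    with open(target, "wb") as f:
        f.write(data)

def read_changed(log_path):
    """Return the set of paths recorded in the changed-list log."""
    with open(log_path, encoding="utf-8") as log:
        return {line.rstrip("\n") for line in log}
```

Doing the two steps in the opposite order would be the unsafe variant: a failure between them could leave a modified file that never shows up on the list.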
> > The changes to the file system itself should just be so simple that we don't
> > have to fight a never-ending war for a whole new paradigm.
>
> Which is why I like the caching idea. See my last post. Support for
> simple userland plugins, combined with intelligent caching by the
> kernel, means we don't have to touch the kernel or the filesystem for
> most kinds of customizable things we want to do. Your mtime idea is
> nice -- it can be done with just those two things in the kernel. What
> about a zipfile which is built from a directory tree every time it's
> read, but only if files in that tree have changed? Not possible with
> only recursive-mtime support (though it would require it).
This is in fact possible with recursive mtimes, but just like you need
userspace support for the mtimes themselves in my proposal, you would also need
userspace support to update your zip files. It wouldn't be transparent, but
it would still be efficient.
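A sketch of the non-transparent userspace variant described above: the archive is rebuilt only when something in the tree is newer than the archive. Here the recursive mtime is computed by a full walk (newest_mtime, a hypothetical helper), which is exactly the step a file-system-provided recursive mtime would make cheap. The archive is assumed to live outside the tree it packs.

```python
import os
import zipfile

def newest_mtime(tree):
    """Newest mtime of `tree` and everything below it (full walk)."""
    newest = os.lstat(tree).st_mtime
    for dirpath, dirnames, filenames in os.walk(tree):
        for name in dirnames + filenames:
            newest = max(newest, os.lstat(os.path.join(dirpath, name)).st_mtime)
    return newest

def refresh_zip(tree, zip_path):
    """Rebuild zip_path from tree if the tree changed; return True if rebuilt.

    zip_path must not be inside tree, or the archive would pack itself.
    """
    if os.path.exists(zip_path) and os.path.getmtime(zip_path) >= newest_mtime(tree):
        return False  # archive is at least as new as everything in the tree
    with zipfile.ZipFile(zip_path, "w") as zf:
        for dirpath, _dirnames, filenames in os.walk(tree):
            for name in filenames:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, tree))
    return True
```

A caller would run refresh_zip before reading the archive, which is the explicit step that a transparent, file-system-driven version would hide.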
We are talking about different goals here.
You want the file system to do things automatically that would otherwise have
to be done by the user, probably by having the FS call back to
userspace.
All I would like to have is help from the file system to do certain things
(updating userspace indices) much more efficiently than it is possible today.
The two concepts you need for your zipfile scenario are metadata-cache
invalidation by the FS and hooks to call back to userspace. I am not
concerned about the latter, but I really want a solution to invalidate cached
metadata. You want to have the FS do that. By doing that you take away the
choice of how to store the cached metadata. If the FS just reports the
changed files somehow, you can decide what metadata needs to be invalidated
in userspace.
And since you have to call back to userspace for the zipfile scenario anyway,
why not let userspace do the metadata cache invalidation too?
Fred
--
Fred Schaettgen
Sch@ttgen.net