* Recursive modfied-timestamp?
@ 2004-12-31 9:47 Fred Schaettgen
2004-12-31 22:49 ` David Masover
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Fred Schaettgen @ 2004-12-31 9:47 UTC (permalink / raw)
To: reiserfs-list
Hi,
Does reiser4 support something like recursive last-modified-timestamps? What I
mean is an attribute which contains the latest modification date of all
subdirectories and files below a given directory.
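For reference, without filesystem support this attribute can only be computed by walking the whole tree. A minimal Python sketch of that baseline follows; `recursive_mtime` is a name made up for illustration, not a reiser4 or kernel interface.

```python
import os


def recursive_mtime(root):
    """Return the newest st_mtime of root and of everything below it."""
    newest = os.lstat(root).st_mtime
    # os.walk visits every directory once; symlinks are not followed.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            st = os.lstat(os.path.join(dirpath, name))
            newest = max(newest, st.st_mtime)
    return newest
```

The cost is proportional to the number of entries in the tree, which is exactly what a maintained attribute would avoid.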
Actually I am also curious whether any other Linux file systems support
that. The reason I'm asking on the reiserfs mailing list is that
reiser4 seems to be the filesystem which is most open to new features.
Could this be implemented as some sort of plugin for reiser4? Or does/will
reiser4 support any other concepts which can be used for that purpose?
The purpose btw. is to find all modified files in a tree as fast as possible.
There are quite a lot of applications which would benefit from it: desktop
search engines, locate, build systems, tools which visualize contents of a
file system (like fsview in KDE), backup tools etc.
I know that modifying an attribute recursively on every update of the stat data
would have a huge performance impact, but there are many things that could be
done to keep the extra load low for most of the time.
It seems very likely that this is an idea which was discussed over and over
again already, but I really didn't find much about it. As a KDE developer,
I'm not much involved in filesystems, so maybe I'm just looking for the wrong
keywords?
Fred
--
Fred Schaettgen
kde.Sch@ttgen.net
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modfied-timestamp?
2004-12-31 9:47 Recursive modfied-timestamp? Fred Schaettgen
@ 2004-12-31 22:49 ` David Masover
2005-01-01 0:43 ` Recursive modified-timestamp? Fred Schaettgen
2005-01-01 0:51 ` Recursive modfied-timestamp? Alexander G. M. Smith
2005-01-01 21:49 ` Hans Reiser
2 siblings, 1 reply; 15+ messages in thread
From: David Masover @ 2004-12-31 22:49 UTC (permalink / raw)
To: Fred Schaettgen; +Cc: reiserfs-list
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Fred Schaettgen wrote:
| Hi,
|
| Does reiser4 support something like recursive last-modified-timestamps?
| What I mean is an attribute which contains the latest modification date
| of all subdirectories and files below a given directory.
Actually, I'm not sure about that, but reiser4 supports plugins. Maybe
there's a kind of plugin which does what you want. Or maybe you haven't
defined "what you want" properly? (see below)
[...]
| The purpose btw. is to find all modified files in a tree as fast as
| possible. There are quite a lot of applications which would benefit from
| it: desktop search engines, locate, build systems, tools which visualize
| contents of a file system (like fsview in KDE), backup tools etc.
Seems like all of those are really problems of caching/metadata, or more
accurately, "things which Make would understand". How about some more
general way of caching or cache invalidation?
Here's how I would do it. I'd make a standard for object dependencies
within the filesystem, some way like "make". This is the same thing I
ranted about as a way for accessing the contents of zipfiles as part of
the filesystem, without a performance hit. (cat foo.zip/bar.txt)
For instance, your search engine needs an index, which depends on (is
built from) all the files in the filesystem except itself. Thus you
might have an index for each folder (starting with /). Each index
depends on the indices of its subdirectories. When a search is run,
everything has to be rebuilt, in "make"-like fashion, but it gives you
one global place to add the "many things that could be done" to improve
performance for all systems that do this kind of thing -- search engines,
locate, build systems, fsview, and backup tools.
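The make-like rule described above could be sketched roughly as follows in Python. It assumes each directory keeps its index in a file named `.index`; that name and both helper functions are hypothetical and only illustrate the dependency check, not any existing reiser4 feature.

```python
import os


def _mtime(path):
    """Return st_mtime of path, or None if it does not exist."""
    try:
        return os.stat(path).st_mtime
    except FileNotFoundError:
        return None


def is_stale(target, dependencies):
    """Make-style rule: stale if missing, or if any dependency is missing or newer."""
    target_mtime = _mtime(target)
    if target_mtime is None:
        return True
    for dep in dependencies:
        dep_mtime = _mtime(dep)
        if dep_mtime is None or dep_mtime > target_mtime:
            return True
    return False


def directory_dependencies(directory, index_name=".index"):
    """A directory's index depends on its files and on its subdirectories' indices."""
    deps = []
    for entry in os.scandir(directory):
        if entry.is_dir(follow_symlinks=False):
            deps.append(os.path.join(entry.path, index_name))
        elif entry.name != index_name:
            deps.append(entry.path)
    return deps
```

Note that checking the root index this way still touches every directory once, which is the traversal cost the rest of the thread is trying to avoid.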
| I know that modifying an attribute recursively on every update of the
| stat data would have a huge performance impact, but there are many things
| that could be done to keep the extra load low for most of the time.
Which of these things benefits from being _in_ the filesystem? Not that
I don't like your approach (see above), I just want you to think harder
about it.
| It seems very likely that this is an idea which was discussed over and
| over again already, but I really didn't find much about it. As a KDE
| developer, I'm not much involved in filesystems, so maybe I'm just
| looking for the wrong keywords?
Maybe. Seems like people use things like FAM nowadays. But you're
right, there needs to be a better way. For instance, your desktop
search engine should only rebuild even the stat data when a user enters
a query, but it should be able to do it quickly (without searching the
whole tree).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org
iQIVAwUBQdXXYngHNmZLgCUhAQJN2hAAkSk54jLWiKm6fhSp5+/gdhkps6LjsIHA
FOuKX62YQdUm+3oNfM+dm+r0Unkx5+NDbojxDujcezy1DHxUJKb1syhU3lE+IngE
XLIy3+GhoJSX0d8VLP9CALMpYVqlJbmvp9Xj6bSpqErTOKxeY18hHqG7ZljVQQfT
jQjg99pE4uDRQXVfJzygCep6sbjcB6aFFrfwDOmFpv6Qfp5Dho/Ladqm/v85S45H
NEuTeYVwyzuvSah8BqMQJTmtdfY2GdwcKAfQ6g3i/ATC0GdDrou1R+2YDdBkTYvM
uGw+P8qKmQw+q/WgXJjx0WFnAZHqHVayXMqdwPr4bONXdUPb5IHR7PXjxjB2acui
WuzsQ9tLupuBOpr0tiDbJlm7+ozHudShydbPRRQTop0FbZKecLrw1aA+MLg+krRs
waX9Shs24JWh/3MXZlO4I3os4nFLnhgOiHuNRVv4iZt7aAurvWYmWR5iCELvzwil
Sv6pxpHfu8F0sNzhnoKloj75zYCvNjzsINSepckqlt3zuBmlExXKpLf1pRWkNaA2
Q6oewc9ppFwhErD9+Tn177HIDZMiWhwDopMxyWp8CcNvcY7M9p5uGVAyq6/vSQcc
yky8clLnpU9NTMNDrp7WIA0srpUP8DZYyFqzzQC+ePREO9n3LnB1RU3CNqGT8xoR
f8TIvSw26zU=
=v/lu
-----END PGP SIGNATURE-----
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modified-timestamp?
2004-12-31 22:49 ` David Masover
@ 2005-01-01 0:43 ` Fred Schaettgen
2005-01-01 3:12 ` Alexander G. M. Smith
0 siblings, 1 reply; 15+ messages in thread
From: Fred Schaettgen @ 2005-01-01 0:43 UTC (permalink / raw)
To: reiserfs-list
On Friday 31 December 2004 23:49, David Masover wrote:
...
> Seems like all of those are really problems of caching/metadata, or more
> accurately, "things which Make would understand". How about some more
> general way of caching or cache invalidation?
An entry in a metadata cache must become invalid if the corresponding file
changes. That's exactly what my question was about. I don't want the
filesystem to manage the metadata, just an efficient way to find files with
outdated metadata. From an application's point of view, the recursive
modified-timestamps look like the most intuitive solution to me.
> Here's how I would do it. I'd make a standard for object dependencies
> within the filesystem, some way like "make". This is the same thing I
> ranted about as a way for accessing the contents of zipfiles as part of
> the filesystem, without a performance hit. (cat foo.zip/bar.txt)
I don't want to see that much in the file system itself. I wouldn't even care
if these timestamps had to be retrieved with the help of a userspace daemon
and a library. But without some help of the filesystem itself you always have
to traverse the whole directory tree to find modified files.
> For instance, your search engine needs an index, which depends on (is
> built from) all the files in the filesystem except itself. Thus you
> might have an index for each folder (starting with /). Each index
> depends on the indices of its subdirectories. When a search is run,
> everything has to be rebuilt, in "make"-like fashion, but it gives you
> one global place to add the "many things that could be done" to improve
> performance for all systems that do this kind of thing -- search engines,
> locate, build systems, fsview, and backup tools.
How would the filesystem help in that scenario? It could invalidate or delete
the (sub)index or metadata cache if one of the files it depends on changes,
ok. But can't you do that just as efficiently in userspace if the filesystem
just provides the recursive timestamps?
...
> Seems like people use things like FAM nowadays. But you're
> right, there needs to be a better way. For instance, your desktop
> search engine should only rebuild even the stat data when a user enters
> a query, but it should be able to do it quickly (without searching the
> whole tree).
Yes, this is the problem. And recursively propagating modification timestamps
looks like a good solution to me. I am not saying that the file system should
do that itself. Timestamps with these modified semantics would just exist as an
interface to the applications. But the filesystem must help to keep these
timestamps up to date.
The file system itself could help for instance by providing a new
"change-monitor"-flag for a file. This flag would be set only from userspace
and reset when the file is modified. If the flag is still set when the file
is being modified, the filesystem would then create a symlink or something
like it for the file in a special directory.
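A toy Python model of this flag may make the intended behaviour easier to discuss. Everything in it is an assumption for illustration: `ToyFileSystem`, `arm` and `write` are invented names, and the `changed` list merely stands in for the special directory of symlinks.

```python
class ToyFileSystem:
    """Toy model of the proposed change-monitor flag; not a real filesystem API."""

    def __init__(self):
        self.flag = {}     # path -> change-monitor flag (hypothetical)
        self.changed = []  # stands in for the special directory of symlinks

    def arm(self, path):
        """Userspace sets the flag, for example after reading the file's state."""
        self.flag[path] = True

    def write(self, path):
        """Every write tests the flag; only the first write after arm() is logged."""
        if self.flag.get(path, False):
            self.flag[path] = False
            self.changed.append(path)
```

In this model an unmonitored or already-logged file costs one dictionary lookup per write and nothing more.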
The contents of this changed-files-directory will then be collected and
removed by a daemon, which manages the recursive-mtime-database (no matter if
they are stored as extended attributes or in a Berkeley DB or whatever).
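What such a daemon would do with the collected entries can be sketched in a few lines of Python. The sketch assumes absolute, normalized POSIX paths and an in-memory dict in place of the real database; `propagate` and `rec_mtime` are hypothetical names.

```python
import posixpath


def propagate(changed, rec_mtime):
    """Apply (path, mtime) records to the path and to each ancestor up to '/'.

    Assumes rec_mtime already satisfies the invariant that every directory is
    at least as new as anything below it (an empty dict trivially does).
    """
    for path, mtime in changed:
        node = path
        while True:
            if rec_mtime.get(node, 0) >= mtime:
                break  # this node and all its ancestors are already new enough
            rec_mtime[node] = mtime
            if node == "/":
                break
            node = posixpath.dirname(node)
    changed.clear()  # the daemon removes the entries it has consumed
```

The early exit means a burst of changes in one directory mostly stops at that directory instead of climbing to the root each time.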
Now each application which has to manage a metadata cache could ask that
daemon for the rec-mtime of / first and descend deeper if the rec-mtime is
more recent than a stored timestamp etc.
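The descent described here could look roughly like this in Python. The two dicts stand in for whatever the daemon would actually answer; `rec_mtime`, `children` and `changed_since` are names assumed for this sketch.

```python
def changed_since(rec_mtime, children, root, since):
    """Yield leaf paths newer than `since`, skipping subtrees that are unchanged.

    rec_mtime: path -> recursive mtime
    children:  directory path -> list of child paths (files have no entry)
    """
    stack = [root]
    while stack:
        node = stack.pop()
        if rec_mtime.get(node, 0) <= since:
            continue  # nothing below this node changed after `since`
        kids = children.get(node)
        if kids:
            stack.extend(kids)
        else:
            yield node  # a file (or an empty directory) that changed
```

Only directories on the path to a changed file are visited, so the work depends on the number of changes rather than on the size of the tree.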
Actually the "flag" would have to be something like a list of path names,
since a file can be hard linked, but that doesn't change much (I hope).
With this approach, most of the work can be delayed until an application
actually asks for rec-mtimes. The overhead while writing to a file (when the
stat data is updated) would be to check if the change monitor flag is set and
only if it is, remove it and put one - or sometimes a few - symlinks into the
special folder with links to changed files.
Until this point there is no propagation of changes up to "/". That would all
be done by a userspace daemon at a later time.
If just the test for the existence of the change monitor flag could be made
efficient enough, then the overhead during regular operation would be
negligible.
I hope that this outline was clear enough to let you tell me if this is
possible or why it isn't :)
bye and a happy new year to one half of the world!
Fred
--
Fred Schaettgen
Sch@ttgen.net
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modified-timestamp?
2005-01-01 0:43 ` Recursive modified-timestamp? Fred Schaettgen
@ 2005-01-01 3:12 ` Alexander G. M. Smith
2005-01-01 11:56 ` Fred Schaettgen
2005-01-01 12:28 ` Piotr Neuman
0 siblings, 2 replies; 15+ messages in thread
From: Alexander G. M. Smith @ 2005-01-01 3:12 UTC (permalink / raw)
To: Fred Schaettgen; +Cc: reiserfs-list
Fred Schaettgen wrote on Sat, 1 Jan 2005 01:43:48 +0100:
> The file system itself could help for instance by providing a new
> "change-monitor"-flag for a file. This flag would be set only from userspace
> and reset when the file is modified. If the flag is still set when the file
> is being modified, the filesystem would then create a symlink or something
> like it for the file in a special directory.
That reminds me that the other thing BeOS had was a change notification
system using messaging. If you requested monitoring of a directory or
file (with flags to say which kind of changes are of interest) then
it would send your program a BMessage with the details (such as a file
being added to a directory).
This was also extended to monitor changes to the indices: you gave it a
query expression and the kernel/file system then would send notification
messages if a change to a file (or its attributes) added or removed that
file from the set of files matching the query. Seems like a lot of
overhead, but it wasn't that noticeable and did make the OS a lot more
useful (cooler too, directory windows or even complex Find results were
always up to date even as files changed).
But I don't think that's quite what you wanted (and isn't as economical
as your tree of percolated up modification times). Though it would be nifty
(but useless?) to have a build system (make-like) operating in real time -
change a source file and the system automatically recompiles it immediately.
> Moving too much logic into the file system has lots of drawbacks. It
> makes the file system complicated, so it will be less likely to be
> implemented at all.
True. That's why I think query evaluation should be outside the file
system, with just the indices in the kernel / file system API. But
that's another story.
- Alex
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modified-timestamp?
2005-01-01 3:12 ` Alexander G. M. Smith
@ 2005-01-01 11:56 ` Fred Schaettgen
2005-01-01 12:28 ` Piotr Neuman
1 sibling, 0 replies; 15+ messages in thread
From: Fred Schaettgen @ 2005-01-01 11:56 UTC (permalink / raw)
To: reiserfs-list
On Saturday 01 January 2005 04:12, you wrote:
...
> That reminds me that the other thing BeOS had was a change notification
> system using messaging. If you requested monitoring of a directory or
> file (with flags to say which kind of changes are of interest) then
> it would send your program a BMessage with the details (such as a file
> being added to a directory).
Do you know if this service was actually provided by the file system, or was it
just a clever use of a simpler feature of the fs which allows one to do that?
There are certainly a lot of things which could be done, if changed files can
be found quickly, but those things don't need to go into fs itself.
> But I don't think that's quite what you wanted (and isn't as economical
> as your tree of percolated up modification times). Though it would be
> nifty (but useless?) to have a build system (make-like) operating in real
> time - change a source file and the system automatically recompiles it
> immediately.
With "change notification flag" I didn't mean to have the fs send messages, but
to put links to the file into a folder, so a daemon can poll for changes. I
guess polling is better in this case, since it limits the overhead even in the
presence of many changes.
It's certainly not most important for build systems, at least as long as the
source tree is reasonably small. Just an example among others.
> > Moving too much logic into the file system has lots of drawbacks. It
> > makes the file system complicated, so it will be less likely to be
> > implemented at all.
>
> True. That's why I think query evaluation should be outside the file
> system, with just the indices in the kernel / file system API. But
> that's another story.
No, that's exactly the story. You want to index various attributes of files
so that the index can be quickly updated when it's needed. We don't have to
go into details about what you do with those indices in userspace. What we
need to discuss is what changes to the file system would be necessary, so
that everything you have in mind could be done efficiently.
I claim that the approach I described...
- ...allows all these things to be done efficiently in userspace.
- ...is the smallest change to the fs necessary for it.
- ...could be implemented in reiser4 without significant performance losses.
Of course I'm not at all sure about these claims, so that's why I'm asking ;)
bye and a Happy new year (to everyone this time)
Fred
--
Fred Schaettgen
Sch@ttgen.net
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modified-timestamp?
2005-01-01 3:12 ` Alexander G. M. Smith
2005-01-01 11:56 ` Fred Schaettgen
@ 2005-01-01 12:28 ` Piotr Neuman
2005-01-01 13:20 ` Fred Schaettgen
1 sibling, 1 reply; 15+ messages in thread
From: Piotr Neuman @ 2005-01-01 12:28 UTC (permalink / raw)
To: reiserfs-list
On Saturday 01 January 2005 04:12, Alexander G. M. Smith wrote:
> Fred Schaettgen wrote on Sat, 1 Jan 2005 01:43:48 +0100:
> > The file system itself could help for instance by providing a new
> > "change-monitor"-flag for a file. This flag would be set only from
> > userspace and reset when the file is modified. If the flag is still set
> > when the file is being modified, the filesystem would then create a
> > symlink or something like it for the file in a special directory.
>
> That reminds me that the other thing BeOS had was a change notification
> system using messaging. If you requested monitoring of a directory or
> file (with flags to say which kind of changes are of interest) then
> it would send your program a BMessage with the details (such as a file
> being added to a directory).
Linux has both inotify and dnotify. I really love the kind of threads where
nobody cares to do the research into existing solutions/approaches and
everybody is talking about their ideas (which is of course easier than
searching on Google).
ps. and yup the GNOME folks are busy coding for inotify instead of imagining
"something that may or may not exist". Just because you send stuff to a
technical mailing list, doesn't mean you have a clue...
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modified-timestamp?
2005-01-01 12:28 ` Piotr Neuman
@ 2005-01-01 13:20 ` Fred Schaettgen
2005-01-01 17:08 ` Piotr Neuman
0 siblings, 1 reply; 15+ messages in thread
From: Fred Schaettgen @ 2005-01-01 13:20 UTC (permalink / raw)
To: reiserfs-list
On Saturday 01 January 2005 13:28, Piotr Neuman wrote:
> Linux has both inotify and dnotify. I really love the kind of threads where
> nobody cares to do the research into existing solutions/approaches and
> everybody is talking about their ideas (which is of course easier than
> searching on Google).
I don't see how dnotify or inotify would help here. You can't monitor whole
directory trees with it. inotify (which I admittedly didn't know about) seems
to allow monitoring just about ~8000 files per device. And you don't have any
guarantee that none of the files changed before the program using inotify
starts up.
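To put a number on the first objection: inotify watches are per directory and do not recurse, so covering a tree takes one watch for every directory in it. A small Python sketch that counts how many that would be; `watches_needed` is an illustrative helper, and the roughly 8000 limit to compare against is the figure quoted above, not something the code checks.

```python
import os


def watches_needed(root):
    """Count the directories under root, including root: one watch for each."""
    # os.walk yields exactly one tuple per directory and skips symlinked dirs.
    return sum(1 for _ in os.walk(root))
```

Running it on a home directory or a source tree gives a quick feel for how far a per-directory scheme is from a per-device limit in the thousands.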
It seems like I'm not the only one knowing about the existing solutions, or
why else do I have to wait for several minutes each day until a cronjob has
updated the locatedb?
> ps. and yup the GNOME folks are busy coding for inotify instead of
> imagining "something that may or may not exist". Just because you send stuff
> to a technical mailing list, doesn't mean you have a clue...
So do the KDE folks (not sure if it's dnotify or inotify or simply resorting
to whatever FAM uses). But that's just a solution to a slightly different
problem. So with respect to the question I was asking (after searching on
google) the comment of Alexander about BeOS was much more relevant than
yours. I don't know if this was directed towards me, but it's true, I don't
have a clue about file systems. Maybe that's the reason I had to ask my
question here. Or where else am I allowed to ask such a question in your
opinion?
Fred
--
Fred Schaettgen
Sch@ttgen.net
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modified-timestamp?
2005-01-01 13:20 ` Fred Schaettgen
@ 2005-01-01 17:08 ` Piotr Neuman
2005-01-01 18:18 ` Fred Schaettgen
0 siblings, 1 reply; 15+ messages in thread
From: Piotr Neuman @ 2005-01-01 17:08 UTC (permalink / raw)
To: reiserfs-list
On Saturday 01 January 2005 14:20, Fred Schaettgen wrote:
> On Saturday 01 January 2005 13:28, Piotr Neuman wrote:
> > Linux has both inotify and dnotify. I really love the kind of threads
> > where nobody cares to do the research into existing solutions/approaches
> > and everybody is talking about their ideas (which is of course easier
> > than searching on Google).
>
> I don't see how dnotify or inotify would help here. You can't monitor whole
> directory trees with it. inotify (which I admittedly didn't know about)
> seems to allow monitoring just about ~8000 files per device. And you don't
> have any guarantee that none of the files changed before the program using
> inotify starts up.
You have the source code, move that limit up, and change data structures if
it's necessary for efficient working with large trees...
Inotify has the big advantage of being filesystem agnostic, while reiser4
plugins are not.
> > ps. and yup the GNOME folks are busy coding for inotify instead of
> > imagining "something that may or may not exist". Just because you send
> > stuff to a technical mailing list, doesn't mean you have a clue...
>
> So do the KDE folks (not sure if it's dnotify or inotify or simply
> resorting to whatever FAM uses). But that's just a solution to a slightly
> different problem. So with respect to the question I was asking (after
> searching on google) the comment of Alexander about BeOS was much more
> relevant than yours. I don't know if this was directed towards me, but it's
> true, I don't have a clue about file systems. Maybe that's the reason I had
> to ask my question here. Or where else am I allowed to ask such a question
> in your opinion?
FAM uses the outdated dnotify (just browse lkml.org for info on inotify
advantages). One of the goals of GNOME now is to have some platform to
compete with the "fabled" Microsoft WinFS. Inotify could replace the "not so
small" FAM, for example read:
http://www.ussg.iu.edu/hypermail/linux/kernel/0407.2/0359.html
http://www.gnome.org/~veillard/gamin/overview.html
I hope that low footprint, inotify based solutions will become the standard
for Linux desktop.
Inotify does not support such queries as BeOS did, but knowing the reluctance
of kernel developers to do anything that may increase bloat and could be done
in userspace anyway, I don't think it will be supported.
Good luck on your searches for a new KDE file notification support/system.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modified-timestamp?
2005-01-01 17:08 ` Piotr Neuman
@ 2005-01-01 18:18 ` Fred Schaettgen
0 siblings, 0 replies; 15+ messages in thread
From: Fred Schaettgen @ 2005-01-01 18:18 UTC (permalink / raw)
To: reiserfs-list
On Saturday 01 January 2005 18:08, Piotr Neuman wrote:
> > I don't see how dnotify or inotify would help here. You can't monitor
> > whole directory trees with it. inotify (which I admittedly didn't know
> > about) seems to allow monitoring just about ~8000 files per device. And
> > you don't have any guarantee that none of the files changed before the
> > program using inotify starts up.
>
> You have the source code, move that limit up, and change data structures if
> it's necessary for efficient working with large trees...
The model of inotify doesn't fit for this application. It's made to get
instant feedback when a small number of files or directories are changed. If
a daemon would have to consume change notifications for the whole filesystem,
it's hard to imagine that this might work under heavy load. What do you do if
the change buffer runs full because the daemon has problems catching up?
Increasing the buffer? What do you do after a system restart, when you can't
be sure that no files have been touched before a daemon starts to use inotify
to monitor changes? Scan the whole filesystem on each startup?
At the moment I can't imagine how this could be solved *reliably* without the
help of the file system. Can you?
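The overflow concern can be made concrete with a toy model in Python. This is not inotify's API; `BoundedEventQueue` is an invented class that only shows why a full buffer forces the consumer back to a complete rescan.

```python
from collections import deque


class BoundedEventQueue:
    """Toy change-event buffer with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        self.overflowed = False

    def push(self, path):
        """Record a change, or note that one was lost when the buffer is full."""
        if len(self.events) >= self.capacity:
            self.overflowed = True  # the event itself is gone
        else:
            self.events.append(path)

    def drain(self):
        """Return (paths, must_rescan) and reset the queue."""
        paths, lost = list(self.events), self.overflowed
        self.events.clear()
        self.overflowed = False
        return paths, lost
```

Whenever `must_rescan` is true the daemon no longer knows which files changed, which is the same position it is in after a restart.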
> FAM uses the outdated dnotify (just browse lkml.org for info on inotify
> advantages). One of the goals of GNOME now is to have some platform to
> compete with the "fabled" Microsoft WinFS. Inotify could replace the "not
> so small" FAM, for example read:
>
> http://www.ussg.iu.edu/hypermail/linux/kernel/0407.2/0359.html
> http://www.gnome.org/~veillard/gamin/overview.html
I can't talk about WinFS, because all I know about it is that it will be great
and late. But that Beagle thing, which uses inotify for indexing looks
interesting. My bet is that it depends heavily on heuristics to monitor the
right files. If this is true, then it's certainly not the best solution.
But before bashing it I have to take a look at it first. Thanks for the links.
...
> Inotify does not support such queries as BeOS did, but knowing the
> reluctance of kernel developers to do anything that may increase bloat and
> could be done in userspace anyway, I don't think it will be supported.
*g* You may be right.
But I thought it might still be worth trying. Especially since reiser4 with
its heavily advertised plugin system seems to be somewhat more open to
extensions. And I'm sure that a much smaller extension than those of the BeOS
fs would be sufficient.
Fred
--
Fred Schaettgen
Sch@ttgen.net
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modfied-timestamp?
2004-12-31 9:47 Recursive modfied-timestamp? Fred Schaettgen
2004-12-31 22:49 ` David Masover
@ 2005-01-01 0:51 ` Alexander G. M. Smith
2005-01-01 21:49 ` Hans Reiser
2 siblings, 0 replies; 15+ messages in thread
From: Alexander G. M. Smith @ 2005-01-01 0:51 UTC (permalink / raw)
To: Fred Schaettgen; +Cc: reiserfs-list
Fred Schaettgen wrote on Fri, 31 Dec 2004 10:47:14 +0100:
> The purpose btw. is to find all modified files in a tree as fast as possible.
> There are quite a lot of applications which would benefit from it: desktop
> search engines, locate, build systems, tools which visualize contents of a
> file system (like fsview in KDE), backup tools etc.
Does it have to be recursive? BeOS has an index for the last modified date
of all files so it's easy to find all files modified in a given range of
dates. I expect that modern file systems could have something similar.
However, the BeOS index system is global to a disk volume, so finding
recently changed files in a tree means finding recent files then throwing
out the ones outside the tree. That awkwardness has grated against the
nerves of many a BeOS user. But nobody has sat down to figure out a
better solution to the underlying problem (indices stored per directory?).
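The awkwardness is easy to see in a small Python model of a volume-wide index. This is only a sketch of the idea, not how the BeOS file system stores its indices; `MtimeIndex` and its methods are names invented here.

```python
import bisect


class MtimeIndex:
    """Toy volume-wide index: a list of (mtime, path) kept sorted by mtime."""

    def __init__(self):
        self._entries = []

    def add(self, path, mtime):
        bisect.insort(self._entries, (mtime, path))

    def modified_between(self, start, end, tree="/"):
        """Range query over the whole volume, then discard paths outside `tree`."""
        lo = bisect.bisect_left(self._entries, (start, ""))
        prefix = tree.rstrip("/") + "/"
        result = []
        for mtime, path in self._entries[lo:]:
            if mtime > end:
                break
            if path.startswith(prefix):  # the "throwing out" step
                result.append(path)
        return result
```

The range lookup is cheap, but the filter still touches every recently changed file on the volume, however small the tree of interest is.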
- Alex
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modfied-timestamp?
2004-12-31 9:47 Recursive modfied-timestamp? Fred Schaettgen
2004-12-31 22:49 ` David Masover
2005-01-01 0:51 ` Recursive modfied-timestamp? Alexander G. M. Smith
@ 2005-01-01 21:49 ` Hans Reiser
2005-01-02 4:22 ` AMD64/Reiser4 testing and problems Isaac Chanin
2 siblings, 1 reply; 15+ messages in thread
From: Hans Reiser @ 2005-01-01 21:49 UTC (permalink / raw)
To: Fred Schaettgen; +Cc: reiserfs-list
Fred Schaettgen wrote:
>Hi,
>
>Does reiser4 support something like recursive last-modified-timestamps? What I
>mean is an attribute which contains the latest modification date of all
>subdirectories and files below a given directory.
>
>Actually I am also curious whether any other Linux file systems support
>that. The reason I'm asking on the reiserfs mailing list is that
>reiser4 seems to be the filesystem which is most open to new features.
>Could this be implemented as some sort of plugin for reiser4? Or does/will
>reiser4 support any other concepts which can be used for that purpose?
>
>The purpose btw. is to find all modified files in a tree as fast as possible.
>There are quite a lot of applications which would benefit from it: desktop
>search engines, locate, build systems, tools which visualize contents of a
>file system (like fsview in KDE), backup tools etc.
>
>I know that modifying an attribute recursively on every update of the stat data
>would have a huge performance impact, but there are many things that could be
>done to keep the extra load low for most of the time.
>It seems very likely that this is an idea which was discussed over and over
>again already, but I really didn't find much about it. As a KDE developer,
>I'm not much involved in filesystems, so maybe I'm just looking for the wrong
>keywords?
>
>Fred
>
>
>
We intend to implement inheritance of metadata, which could be made to
accomplish what you are asking for I think. Nobody is coding that at
the moment though....
We are indeed open to semantic enhancements.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: AMD64/Reiser4 testing and problems
2005-01-01 21:49 ` Hans Reiser
@ 2005-01-02 4:22 ` Isaac Chanin
0 siblings, 0 replies; 15+ messages in thread
From: Isaac Chanin @ 2005-01-02 4:22 UTC (permalink / raw)
To: reiserfs-list
Hello all,
Just responding to my previous messages a bit more. Not too much new to
say, aside from a bunch of new bug report/error messages.
If you're interested they're at http://users.wpi.edu/~chanin/r4more.txt.
The old 'random' bug is still popping up. Definitely looks like it has
something to do with the reiser4_find_next_zero_bit function in bitmap.c.
I've looked through the file (and includes) and haven't found anything
obvious - but my C skills aren't quite what they should be for debugging
something like this.
Also, there appears to be a new bug, or perhaps simple fluke event that
resulted in some random file corruption - I've yet to formulate
uninformed opinions about what caused that one, however.
Finally, if there's no need for more bug reports - apparently my last one
did not warrant a patch or response (or some people just enjoy the season
more than I do) - feel free to tell me. I do recall reading that an
x86_64 machine would be on its way to namesys soon.
Thanks,
Isaac
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modfied-timestamp?
@ 2005-01-01 2:04 Fred Schaettgen
2005-01-02 4:27 ` David Masover
0 siblings, 1 reply; 15+ messages in thread
From: Fred Schaettgen @ 2005-01-01 2:04 UTC (permalink / raw)
To: reiserfs-list
On Saturday 01 January 2005 01:51, you wrote:
> Fred Schaettgen wrote on Fri, 31 Dec 2004 10:47:14 +0100:
> > The purpose btw. is to find all modified files in a tree as fast as
> > possible. There are quite a lot of applications which would benefit from
> > it: desktop search engines, locate, build systems, tools which visualize
> > contents of a file system (like fsview in KDE), backup tools etc.
>
> Does it have to be recursive? BeOS has an index for the last modified date
> of all files so it's easy to find all files modified in a given range of
> dates. I expect that modern file systems could have something similar.
>
> However, the BeOS index system is global to a disk volume, so finding
> recently changed files in a tree means finding recent files then throwing
> out the ones outside the tree. That awkwardness has grated against the
> nerves of many a BeOS user. But nobody has sat down to figure out a
> better solution to the underlying problem (indices stored per directory?).
I see.. I didn't know about BeOS' file system, thanks :)
Having an index over various attributes is certainly a powerful feature. But
wouldn't it be better if we could extend the file system in a *minimal* way
which still makes it possible to create such indices efficiently in
userspace?
Moving too much logic into the file system has lots of drawbacks. It makes
the file system complicated, so it will be less likely to be implemented at
all. And if it's implemented, it's much harder to keep it up to date than with
userspace programs. It's harder to debug and it's harder to accept for
people who want to keep the file systems pure.
I'm not sure if my proposal in my other post in this thread would be more
efficient or easier to implement than a global index for the modification
times, but I guess it's more or less the same in the end.
I don't know how the BeOS indices work, but it sounds like the index is
updated each time a file is modified, which is most likely more time
consuming than my proposal, where the changed-file-list is only updated when
a file is changed for the first time after the recursive mtime was requested
for it. So the performance for frequently updated files won't suffer much.
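A rough way to compare the two bookkeeping costs for a single file, as a Python sketch. It assumes, as guessed above, that an always-current index is touched on every write; `bookkeeping_cost` and the event names are made up for this illustration.

```python
def bookkeeping_cost(events):
    """Count bookkeeping operations for one file under both schemes.

    events: sequence of "write" and "query" strings in time order.
    Returns (index_updates, changed_list_appends).
    """
    index_updates = 0
    list_appends = 0
    armed = False
    for event in events:
        if event == "query":
            armed = True         # userspace re-arms the flag when it asks
        elif event == "write":
            index_updates += 1   # per-write index: touched every time
            if armed:
                list_appends += 1  # flag scheme: first write after a query only
                armed = False
    return index_updates, list_appends
```

For a log file written a thousand times between two queries this gives 1000 index updates against a single changed-list entry.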
But from an application point of view, a BeOS-style mtime-index would be just
as good, especially if there is a userspace layer in between, which allows
per-directory mtime range requests or similar.
The changes to the file system itself should just be so simple that we don't
have to fight a never ending war for a whole new paradigm.
Fred
--
Fred Schaettgen
Sch@ttgen.net
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Recursive modfied-timestamp?
2005-01-01 2:04 Recursive modfied-timestamp? Fred Schaettgen
@ 2005-01-02 4:27 ` David Masover
2005-01-02 12:08 ` Fred Schaettgen
0 siblings, 1 reply; 15+ messages in thread
From: David Masover @ 2005-01-02 4:27 UTC (permalink / raw)
To: Fred Schaettgen; +Cc: reiserfs-list
Fred Schaettgen wrote:
> On Saturday 01 January 2005 01:51, you wrote:
>
>>Fred Schaettgen wrote on Fri, 31 Dec 2004 10:47:14 +0100:
>>
>>>The purpose btw. is to find all modified files in a tree as fast as
>>>possible. There are quite a lot of application which would benefit from
>>>it: desktop search engines, locate, build systems, tools which visualize
>>>contents of a file system (like fsview in KDE), backup tools etc.
>>
>>Does it have to be recursive? BeOS has an index for the last modified date
>>of all files so it's easy to find all files modified in a given range of
>>dates. I expect that modern file systems could have something similar.
>>
>>However, the BeOS index system is global to a disk volume, so finding
>>recently changed files in a tree means finding recent files then throwing
>>out the ones outside the tree. That awkwardness has grated against the
>>nerves of many a BeOS user. But nobody has sat down to figure out a
>>better solution to the underlying problem (indices stored per directory?).
>
>
> Moving too much logic into the file system has lots of drawbacks. It makes
> the file system complicated, so it will be less likely to be implemented at
> all. And if it's implemented, it's much harder to keep it up to date than with
> userspace programs. It's harder to debug and it's harder to accept for
> people who want to keep the file systems pure.
It also has the advantage of being faster, more universal, and more
complete as a solution. Remind me why you wanted your mtimes in the kernel?
> I don't know how the BeOS indices work, but it sounds like the index is
> updated each time a file is modified, which is most likely more time
> consuming than my proposal, where the changed-file-list is only updated when
> a file is changed for the first time after the recursive mtime was requested
> for it. So the performance for frequently updated files won't suffer much.
Speaking of which, how do you make this atomic without more help from
the filesystem?
> The changes to the file system itself should just be so simple that we don't
> have to fight a never-ending war for a whole new paradigm.
Which is why I like the caching idea. See my last post. Support for
simple userland plugins, combined with intelligent caching by the
kernel, means we don't have to touch the kernel or the filesystem for
most kinds of customizable things we want to do. Your mtime idea is
nice -- it can be done with just those two things in the kernel. What
about a zipfile which is built from a directory tree every time it's
read, but only if files in that tree have changed? Not possible with
only recursive-mtime support (though it would require it).
* Re: Recursive modfied-timestamp?
2005-01-02 4:27 ` David Masover
@ 2005-01-02 12:08 ` Fred Schaettgen
0 siblings, 0 replies; 15+ messages in thread
From: Fred Schaettgen @ 2005-01-02 12:08 UTC (permalink / raw)
To: reiserfs-list
On Sunday 02 January 2005 05:27, David Masover wrote:
> > Moving too much logic into the file system has lots of drawbacks. It
> > makes the file system complicated, so it will be less likely to be
> > implemented at all. And if it's implemented, it's much harder to keep it up
> > to date than with userspace programs. It's harder to debug and it's
> > harder to accept for people who want to keep the file systems pure.
>
> It also has the advantage of being faster, more universal, and more
> complete as a solution. Remind me why you wanted your mtimes in the
> kernel?
I was just asking whether such a feature is supported already. As I said before,
the mtimes themselves could be provided by a userspace library, and it doesn't
matter whether they are stored in EAs, a userspace database or whatever.
Actually I'm not concerned about the mtimes themselves; what I want is to
invalidate items in a metadata cache when a file changes. The simplest feature
(from a user's point of view) a filesystem could provide to make this cache
invalidation more efficient seemed to be recursive mtimes, and so I was
asking.
> > I don't know how the BeOS indices work, but it sounds like the index is
> > updated each time a file is modified, which is most likely more time
> > consuming than my proposal, where the changed-file-list is only updated
> > when a file is changed for the first time after the recursive mtime was
> > requested for it. So the performance for frequently updated files won't
> > suffer much.
>
> Speaking of which, how do you make this atomic without more help from
> the filesystem?
I don't know. I never worked on a file system before ;) But it doesn't need to
be fully atomic. If a file is changed, then the entry in the changed-list
must be created, but it would be acceptable if a file was listed as changed
without the change really happening for some reason. This will result in
increased overhead when the mtimes are rebuilt, but it won't break anything.
I don't know if this weaker requirement is much easier to fulfill though.
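One way to get this weaker guarantee is plain ordering: record the file in the changed-list first, and only then perform the write. The sketch below simulates that in Python; tracked_write, the set and the dict are stand-ins I made up, and a real file system would of course need the list entry to reach the disk before the data does.

```python
def tracked_write(changed_list, store, path, data, fail=False):
    """Toy write path: mark first, then modify (hypothetical helper)."""
    # Step 1: record the path. If we crash after this line, the file is
    # listed as changed although nothing happened: a harmless false positive.
    changed_list.add(path)
    if fail:
        raise OSError("simulated crash before the data was written")
    # Step 2: the actual modification. Because the entry already exists,
    # a change can never go unreported.
    store[path] = data


changed, store = set(), {}
tracked_write(changed, store, "/a", b"new contents")
try:
    tracked_write(changed, store, "/b", b"lost", fail=True)
except OSError:
    pass
```

Here "/b" ends up in the changed-list without having been modified, so the next rebuild does a little unnecessary work, but no real change is ever missed.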
> > The changes to the file system itself should just be so simple that we don't
> > have to fight a never-ending war for a whole new paradigm.
>
> Which is why I like the caching idea. See my last post. Support for
> simple userland plugins, combined with intelligent caching by the
> kernel, means we don't have to touch the kernel or the filesystem for
> most kinds of customizable things we want to do. Your mtime idea is
> nice -- it can be done with just those two things in the kernel. What
> about a zipfile which is built from a directory tree every time it's
> read, but only if files in that tree have changed? Not possible with
> only recursive-mtime support (though it would require it).
This is in fact possible with recursive mtimes, but just like you need
userspace support for the mtimes themselves in my proposal, you would also need
userspace support to update your zip files. It wouldn't be transparent, but
it would still be efficient.
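A rough userspace sketch of that zipfile case, in Python: since no file system offers a recursive mtime today, recursive_mtime below fakes it with a full tree walk, which is exactly the expensive part a real recursive mtime would replace. The function names are mine, and the zip file is assumed to live outside the tree it is built from.

```python
import os
import zipfile


def recursive_mtime(root):
    """Stand-in for a real recursive mtime: newest mtime in the tree."""
    latest = os.stat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            latest = max(latest, os.lstat(full).st_mtime)
    return latest


def refresh_zip(root, zip_path):
    """Rebuild zip_path from root only if the tree is newer than the zip.
    Returns True if the archive was rebuilt."""
    if (os.path.exists(zip_path)
            and os.path.getmtime(zip_path) >= recursive_mtime(root)):
        return False   # nothing below root changed since the last build
    with zipfile.ZipFile(zip_path, "w") as zf:
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, root))
    return True
```

The application has to call refresh_zip itself before reading the archive, so it is not transparent, but with a cheap recursive mtime the check would cost a single stat instead of a walk over the whole tree.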
We are talking about different goals here.
You want the file system to do things automatically which would otherwise have
to be done by the user - probably by having the FS call back to
userspace.
All I would like to have is help from the file system to do certain things
(updating userspace indices) much more efficiently than it is possible today.
The two concepts you need for your zipfile scenario are metadata-cache
invalidation by the FS and hooks to call back to userspace. I am not
concerned about the latter, but I really want a solution to invalidate cached
metadata. You want to have the FS do that. By doing that you take away the
choice of how to store the cached metadata. If the FS just reports the
changed files somehow, you can decide what metadata needs to be invalidated
in userspace.
And since you have to call back to userspace for the zipfile scenario anyway,
why not let userspace do the metadata cache invalidation too?
Fred
--
Fred Schaettgen
Sch@ttgen.net