From: Fred Schaettgen <namesys.Sch@ttgen.net>
To: reiserfs-list@namesys.com
Subject: Re: Recursive modified-timestamp?
Date: Sat, 1 Jan 2005 01:43:48 +0100
Message-ID: <200501010143.48592.namesys.Sch@ttgen.net>
In-Reply-To: <41D5D762.8050603@slaphack.com>
On Friday 31 December 2004 23:49, David Masover wrote:
...
> Seems like all of those are really problems of caching/metadata, or more
> accurately, "things which Make would understand". How about some more
> general way of caching or cache invalidation?
An entry in a metadata cache must become invalid if the corresponding file
changes. That's exactly what my question was about. I don't want the
filesystem to manage the metadata, just an efficient way to find files with
outdated metadata. From an application's point of view, the recursive
modified-timestamps look like the most intuitive solution to me.
> Here's how I would do it. I'd make a standard for object dependencies
> within the filesystem, some way like "make". This is the same thing I
> ranted about as a way for accessing the contents of zipfiles as part of
> the filesystem, without a performance hit. (cat foo.zip/bar.txt)
I don't want to see that much in the file system itself. I wouldn't even care
if these timestamps had to be retrieved with the help of a userspace daemon
and a library. But without some help from the filesystem itself you always have
to traverse the whole directory tree to find modified files.
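For comparison, this is roughly what an application has to do today without any help from the filesystem. It is only a sketch: the function name recursive_mtime is made up for illustration, and it simply takes the newest mtime found anywhere below a directory, at the cost of one lstat() per entry on every call.

```python
import os

def recursive_mtime(root: str) -> float:
    """Return the newest mtime found under root (root itself included).

    Hypothetical helper: without filesystem support this needs one
    lstat() per entry, i.e. a full traversal of the tree on every call.
    """
    newest = os.lstat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                entry_mtime = os.lstat(os.path.join(dirpath, name)).st_mtime
            except FileNotFoundError:
                continue  # entry vanished while we were walking
            newest = max(newest, entry_mtime)
    return newest
```

A metadata cache would compare the result against the time of its last scan, but it pays for the whole walk even when nothing has changed.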
> For instance, your search engine needs an index, which depends on (is
> built from) all the files in the filesystem except itself. Thus you
> might have an index for each folder (starting with /). Each index
> depends on the indices of its subdirectories. When a search is run,
> everything has to be rebuilt, in "make"-like fashion, but it gives you
> one global place to add the "many things that could be done" to improve
> performance for all systems that do this kind of thing -- search engines,
> locate, build systems, fsview, and backup tools.
How would the filesystem help in that scenario? It could invalidate or delete
the (sub)index or metadata cache if one of the files it depends on changes,
ok. But can't you do that just as efficiently in userspace if the filesystem
just provides the recursive timestamps?
...
> Seems like people use things like FAM nowadays. But you're
> right, there needs to be a better way. For instance, your desktop
> search engine should only rebuild even the stat data when a user enters
> a query, but it should be able to do it quickly (without searching the
> whole tree).
Yes, this is the problem. And recursively propagated modification timestamps
look like a good solution to me. I am not saying that the file system should
do that itself. Timestamps with these modified semantics would just exist as an
interface to the applications. But the filesystem must help to keep these
timestamps up to date.
The file system itself could help for instance by providing a new
"change-monitor"-flag for a file. This flag would be set only from userspace
and reset when the file is modified. If the flag is still set when the file
is being modified, the filesystem would then create a symlink or something
like that for the file in a special directory.
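The flag semantics described above can be modelled in a few lines. This is a toy in-memory model, not filesystem code: the class ToyFS and its method names are invented for illustration, and a plain set stands in for the special directory of symlinks.

```python
class ToyFS:
    """Hypothetical in-memory stand-in for the filesystem side of the idea."""

    def __init__(self):
        self.monitored = set()  # paths whose change-monitor flag is set
        self.changed = set()    # stands in for the special directory of symlinks

    def set_monitor_flag(self, path):
        # Only userspace sets the flag.
        self.monitored.add(path)

    def write(self, path):
        # On modification: if the flag is set, clear it and record the path.
        # Later writes to the same file find no flag and do nothing extra.
        if path in self.monitored:
            self.monitored.discard(path)
            self.changed.add(path)
```

Under this model the extra cost of a write is a single membership test, and a file shows up in the changed set at most once until userspace sets its flag again.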
The contents of this changed-files directory would then be collected and
removed by a daemon, which manages the recursive-mtime database (no matter whether
the rec-mtimes are stored as extended attributes, in a Berkeley DB or whatever).
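The daemon's job could look roughly like the sketch below. Everything here is an assumption made for illustration: the function name propagate, a dict used as the rec-mtime database, and a second dict (path to mtime) standing in for the changed-files directory after the daemon has stat()ed each entry. Paths are assumed to be absolute.

```python
import posixpath

def propagate(rec_mtime: dict, changed: dict) -> None:
    """Drain `changed` (path -> mtime) and raise the rec-mtime of every ancestor.

    Relies on the invariant that a directory's rec-mtime is never older
    than that of anything below it, which holds as long as the database
    is only ever updated through this function.
    """
    while changed:
        path, mtime = changed.popitem()  # "collect and remove" one entry
        while rec_mtime.get(path, 0.0) < mtime:
            rec_mtime[path] = mtime
            if path == "/":
                break
            path = posixpath.dirname(path)  # step up to the parent
```

The inner loop stops early as soon as it reaches an ancestor that is already at least as new, so many changes below the same directory cost little more than one.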
Now each application which has to manage a metadata cache could ask that
daemon for the rec-mtime of / first and descend deeper if the rec-mtime is
more recent than a stored timestamp, and so on.
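From the application's side, the descent could be sketched like this. The names are hypothetical: rec_mtime is whatever the daemon reports per path, children maps each directory to its entries (files do not appear as keys), and last_scan is the timestamp the application stored after its previous run.

```python
def stale_files(rec_mtime, children, last_scan, path="/"):
    """Yield files whose rec-mtime is newer than last_scan.

    Subtrees whose rec-mtime is not newer than last_scan are skipped
    without being entered at all.
    """
    if rec_mtime.get(path, 0.0) <= last_scan:
        return  # nothing below this point has changed since the last scan
    if path not in children:
        yield path  # a file: its cached metadata is outdated
        return
    for child in children[path]:
        yield from stale_files(rec_mtime, children, last_scan, child)
```

Only the directories on the way to a changed file are visited, which is the whole point compared to walking the complete tree.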
Actually the "flag" would have to be something like a list of path names,
since a file can be hard linked, but that doesn't change much (I hope).
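A possible shape for that list-of-path-names variant, again only as a toy model with invented names: the flag is kept per inode and holds every path name under which userspace asked for the file to be watched.

```python
class HardLinkAwareFlags:
    """Hypothetical model: the 'flag' of an inode is a set of path names."""

    def __init__(self):
        self.watched = {}     # inode number -> set of path names
        self.changed = set()  # stands in for the special directory

    def set_flag(self, inode, path):
        self.watched.setdefault(inode, set()).add(path)

    def on_write(self, inode):
        # Clearing the flag and recording the change are one step:
        # every registered name of the inode goes to the changed set.
        self.changed |= self.watched.pop(inode, set())
```

A write to a file with two watched hard links then produces two entries, one for each name.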
With this approach, most of the work can be delayed until an application
actually asks for rec-mtimes. The overhead while writing to a file (when the
stat data is updated) would be to check if the change monitor flag is set and
only if it is, clear it and put one - or sometimes a few - symlinks to the
changed file into the special folder.
Up to this point, no changes are propagated up to "/". That would all
be done by a userspace daemon at a later time.
If just the test for the existence of the change monitor flag could be made
efficient enough, then the overhead during regular operation would be
negligible.
I hope that this outline was clear enough to let you tell me if this is
possible or why it isn't :)
bye and a happy new year to one half of the world!
Fred
--
Fred Schaettgen
Sch@ttgen.net