From: Fred Schaettgen <namesys.Sch@ttgen.net>
To: reiserfs-list@namesys.com
Subject: Re: Recursive modified-timestamp?
Date: Sat, 1 Jan 2005 01:43:48 +0100
Message-ID: <200501010143.48592.namesys.Sch@ttgen.net>
In-Reply-To: <41D5D762.8050603@slaphack.com>

On Friday 31 December 2004 23:49, David Masover wrote:
...
> Seems like all of those are really problems of caching/metadata, or more
> accurately, "things which Make would understand".  How about some more
> general way of caching or cache invalidation?

An entry in a metadata cache must become invalid if the corresponding file
changes. That's exactly what my question was about. I don't want the
filesystem to manage the metadata; I just want an efficient way to find files
with outdated metadata. From an application's point of view, recursive
modified-timestamps look like the most intuitive solution to me.

> Here's how I would do it.  I'd make a standard for object dependencies
> within the filesystem, some way like "make".  This is the same thing I
> ranted about as a way for accessing the contents of zipfiles as part of
> the filesystem, without a performance hit.  (cat foo.zip/bar.txt)

I don't want to see that much in the file system itself. I wouldn't even care 
if these timestamps had to be retrieved with the help of a userspace daemon 
and a library. But without some help from the filesystem itself you always
have to traverse the whole directory tree to find modified files.
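
To make the cost concrete, here is a minimal sketch of what an application has
to do today without any filesystem help: stat every file in the tree and
compare against a stored timestamp. The function name find_modified and the
parameter since are only illustrative.

```python
import os

def find_modified(root, since):
    """Return files under root whose mtime is newer than `since`.

    This visits every directory and stats every file, no matter how
    few of them actually changed.
    """
    changed = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime > since:
                    changed.append(path)
            except OSError:
                # The file vanished between listing and stat; skip it.
                continue
    return changed
```

The work is proportional to the size of the tree rather than to the number of
changes, which is the part a recursive timestamp is meant to avoid.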

> For instance, your search engine needs an index, which depends on (is
> built from) all the files in the filesystem except itself.  Thus you
> might have an index for each folder (starting with /).  Each index
> depends on the indices of its subdirectories.  When a search is run,
> everything has to be rebuilt, in "make"-like fashion, but it gives you
> one global place to add the "many things that could be done" to improve
> performance for all systems that do this kind of thing -- search engines,
> locate, build systems, fsview, and backup tools.

How would the filesystem help in that scenario? It could invalidate or delete 
the (sub)index or metadata cache if one of the files it depends on changes, 
ok. But can't you do that just as efficiently in userspace if the filesystem
just provides the recursive timestamps?

...
> Seems like people use things like FAM nowadays.  But you're
> right, there needs to be a better way.  For instance, your desktop
> search engine should only rebuild even the stat data when a user enters
> a query, but it should be able to do it quickly (without searching the
> whole tree).

Yes, this is the problem. And recursively propagated modification timestamps
look like a good solution to me. I am not saying that the file system should
do that itself. Timestamps with these modified semantics would just exist as
an interface to the applications. But the filesystem must help to keep these
timestamps up to date.

The file system itself could help, for instance, by providing a new
"change-monitor" flag for a file. This flag would be set only from userspace
and reset when the file is modified. If the flag is still set when the file
is being modified, the filesystem would then create a symlink or something
like it for the file in a special directory.
The contents of this changed-files directory would then be collected and
removed by a daemon, which manages the recursive-mtime database (no matter if
the rec-mtimes are stored as extended attributes or in a Berkeley DB or
whatever).
Now each application which has to manage a metadata cache could ask that
daemon for the rec-mtime of / first and descend deeper if the rec-mtime is
more recent than a stored timestamp etc.
Actually the "flag" would have to be something like a list of path names,
since a file can be hard linked, but that doesn't change much (I hope).
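
The daemon and application sides of this could look roughly like the sketch
below. Everything here is an assumption made for illustration: the rec-mtime
database is a plain dict keyed by absolute path, the directory layout is given
as a dict named children, and the function names propagate and stale_entries
are made up.

```python
import posixpath

def propagate(rec_mtime, changed_path, mtime):
    """Daemon side: raise the rec-mtime of a path and of every ancestor up to "/"."""
    if not changed_path.startswith("/"):
        raise ValueError("expected an absolute path")
    path = changed_path
    while True:
        if rec_mtime.get(path, 0) < mtime:
            rec_mtime[path] = mtime
        if path == "/":
            break
        path = posixpath.dirname(path)

def stale_entries(rec_mtime, children, path, last_seen):
    """Application side: descend only into subtrees changed after last_seen."""
    if rec_mtime.get(path, 0) <= last_seen:
        return []                 # nothing below this point is newer
    kids = children.get(path)
    if kids is None:              # not a known directory: treat as a file
        return [path]
    found = []
    for kid in kids:
        found.extend(stale_entries(rec_mtime, children, kid, last_seen))
    return found
```

An application would call stale_entries on "/" with the timestamp of its last
scan; unchanged subtrees are cut off after a single lookup.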

With this approach, most of the work can be delayed until an application
actually asks for rec-mtimes. The overhead while writing to a file (when the
stat data is updated) would be to check if the change-monitor flag is set and,
only if it is, remove it and put one - or sometimes a few - symlinks to the
changed file into the special folder.
Up to this point, no changes are propagated up to "/". That would all
be done by a userspace daemon at a later time.
If just the test for the existence of the change-monitor flag could be made
efficient enough, then the overhead during regular operation would be
negligible.
I hope that this outline was clear enough to let you tell me if this is
possible or why it isn't :)
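
As a toy model of the write path described above (not a real kernel or
filesystem interface; the class ChangeMonitor and its method names are
invented for this sketch), the per-write cost is one flag test, and a
changed-files entry is created only on the first write after the flag was set:

```python
class ChangeMonitor:
    """Userspace simulation of the proposed change-monitor flag."""

    def __init__(self):
        self.flagged = set()   # files whose change-monitor flag is set
        self.changed = []      # stands in for the changed-files directory

    def arm(self, path):
        """Userspace sets the flag on a file it wants to watch."""
        self.flagged.add(path)

    def on_write(self, path):
        """Write path: a single flag test, extra work only if it is set."""
        if path in self.flagged:
            self.flagged.remove(path)
            self.changed.append(path)

    def drain(self):
        """Daemon collects and removes the queued entries."""
        batch, self.changed = self.changed, []
        return batch
```

Repeated writes to the same file add nothing further until the daemon has
drained the entry and set the flag again.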

bye and a happy new year to one half of the world!
Fred

-- 
Fred Schaettgen
Sch@ttgen.net
