From: Jamie Lokier <jamie@shareable.org>
To: Sage Weil <sage@newdream.net>
Cc: "J. Bruce Fields" <bfields@fieldses.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
ceph-devel@lists.sf.net
Subject: Re: Recursive directory accounting for size, ctime, etc.
Date: Tue, 15 Jul 2008 22:56:26 +0100
Message-ID: <20080715215626.GB9222@shareable.org>
In-Reply-To: <Pine.LNX.4.64.0807151326130.1976@cobra.newdream.net>
Sage Weil wrote:
> Having fully up to date values would definitely be nice, but unfortunately
> doesn't play nice with the fact that different parts of the directory
> hierarchy may be managed by different metadata servers. A primary goal in
> implementing this was to minimize any impact on performance. The uses I
> had in mind were more in line with quota-based accounting than cache
> validation.
>
> I think I can adjust the propagation heuristics/timeouts to make updates
> seem more or less immediate to a user in most cases, but that won't be
> sufficient for a tool like git that needs to reliably identify very recent
> updates. For backup software wanting a consistent file system image, it
> should really be operating on a snapshot as well, in which case a delay
> between taking the snapshot and starting the scan for changes would allow
> those values to propagate.
I have a similar thing in a distributed database (with some
filesystem-like characteristics) I'm working on.
The way I handle propagating compound values that are derived from
state on multiple metadata servers, like these, is with leases (similar
to Linux fcntl F_SETLEASE/F_GETLEASE leases, Windows oplocks, and the
MESI cache-coherence protocol used by CPUs).
E.g., when a server is about to modify a file, it first grabs a lease
covering the metadata for that file _plus_ leases on the aggregated
values of all its parent directories. The first modification is
delayed briefly while this happens, but subsequent modifications,
including to other files under the same directories, proceed
instantly because the server already holds those leases. It can renew
them asynchronously as needed.
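A minimal sketch of the writer side, in Python, assuming a single
server that tracks the leases it holds as a plain set. The names
WriterLeases and prepare_write are hypothetical, and acquisition,
renewal and expiry are reduced to set operations. In this simplified
model a write to a sibling file still needs its own per-file lease;
only the directory aggregate leases are reused.

```python
from pathlib import PurePosixPath


class WriterLeases:
    """Leases held by one metadata server (hypothetical model, not a real API)."""

    def __init__(self):
        # Keys look like ("file", "/a/b.c") or ("agg", "/a"); each stands for one lease.
        self.held = set()

    def prepare_write(self, path):
        """Acquire any leases still missing before `path` may be modified.

        Returns the keys that had to be acquired. In a real system each one
        would cost a round trip, so an empty list means the write proceeds
        immediately.
        """
        p = PurePosixPath(path)
        needed = [("file", str(p))]
        # Aggregate leases for every ancestor directory, up to and including "/".
        needed += [("agg", str(parent)) for parent in p.parents]
        missing = [key for key in needed if key not in self.held]
        self.held.update(missing)  # stand-in for the slow acquisition step
        return missing
```

For example, the first write to /home/u/proj/a.c acquires five leases
(the file plus /home/u/proj, /home/u, /home and /), a second write to
the same file acquires none, and a write to /home/u/proj/b.c acquires
only the lease for b.c itself.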
When a client wants the aggregate values for a directory (i.e. the
total size of all files recursively under it), it acquires a lease on
that directory only. To do that, it has to query every metadata server
that currently holds a lease covering that directory.
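A sketch of the reader side, under further simplifying assumptions:
each file is managed by exactly one server, a hypothetical
LeaseRegistry records which servers hold an aggregate lease on each
directory, and leases never expire. Because every writer leases all
ancestors before writing, summing the contributions of the current
holders of a directory's lease gives the exact recursive size, and
servers that never wrote under that directory are not contacted. The
client's own lease, which would keep the answer valid afterwards, is
not modeled.

```python
from pathlib import PurePosixPath


class LeaseRegistry:
    """Hypothetical record of which servers hold an aggregate lease on which directory."""

    def __init__(self):
        self.holders = {}  # directory path -> set of MetadataServer

    def grant(self, directory, server):
        self.holders.setdefault(directory, set()).add(server)


class MetadataServer:
    def __init__(self, name, registry):
        self.name = name
        self.registry = registry
        self.sizes = {}  # file path -> size, for files this server manages

    def write(self, path, size):
        # Writer side: lease the aggregate of every ancestor directory first.
        for parent in PurePosixPath(path).parents:
            self.registry.grant(str(parent), self)
        self.sizes[path] = size

    def contribution(self, directory):
        """Total size of this server's files lying anywhere under `directory`."""
        d = PurePosixPath(directory)
        return sum(size for f, size in self.sizes.items()
                   if d in PurePosixPath(f).parents)


def aggregate_size(registry, directory):
    """Reader side: ask only the servers holding a lease covering `directory`."""
    holders = registry.holders.get(directory, set())
    return sum(server.contribution(directory) for server in holders)
```

With server A writing /home/u/proj/a.c (100 bytes) and server B
writing /home/u/proj/sub/b.c (50 bytes) and /home/v/c.c (7 bytes),
a query on /home/u/proj sums to 150 from both servers, while a query
on /home/v only contacts B.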
The net effect is that you can use the results for cache validation,
as in the git example. There is network ping-pong if someone
alternately modifies a file under the tree and reads the aggregate
value of a parent directory elsewhere, but at least the values are
always consistent. Usually there is no ping-pong, because that is not
a common access pattern.
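The ping-pong can be illustrated by counting lease recalls, assuming
(as a simplification) that each directory aggregate lease has a single
exclusive owner at a time. LeaseTable and its methods are
hypothetical, and each counted transfer stands for one network round
trip.

```python
from pathlib import PurePosixPath


class LeaseTable:
    """Hypothetical single-owner lease per directory aggregate."""

    def __init__(self):
        self.owner = {}     # directory -> name of the current lease holder
        self.transfers = 0  # recalls from a previous holder (one round trip each)

    def take(self, directory, who):
        current = self.owner.get(directory)
        if current != who:
            if current is not None:
                self.transfers += 1  # lease must be recalled from its holder
            self.owner[directory] = who

    def write(self, who, path):
        # A writer needs the aggregate lease of every ancestor directory.
        for parent in PurePosixPath(path).parents:
            self.take(str(parent), who)

    def read(self, who, directory):
        # A reader needs the lease on the queried directory only.
        self.take(directory, who)
```

Three writes to /p/x followed by one read of /p cost a single
transfer, whereas alternating write, read, write, read, write, read
costs five, one for every operation after the first.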
(In my project, you can also specify that some queries are allowed to
be slightly out of date, avoiding lease acquisition delays when a
fast but possibly inaccurate result is more useful. That's handy for
GUIs, but not suitable for git-like cache validation.)
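One way to express the "allowed to be a little out of date" option is
a per-query staleness bound. This is a hypothetical client-side cache,
not part of any existing API: max_staleness=None forces the exact,
lease-acquiring path that git-style validation needs, while a numeric
bound lets a GUI accept a cached value fetched within that many
seconds. fetch_exact stands in for the slow lease-based query.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AggregateCache:
    """Hypothetical client-side cache of directory aggregate values."""

    def __init__(self, fetch_exact: Callable[[str], int],
                 clock: Callable[[], float] = time.monotonic):
        self.fetch_exact = fetch_exact  # slow path: acquires leases, exact answer
        self.clock = clock
        self.cached: Dict[str, Tuple[int, float]] = {}  # dir -> (value, fetch time)

    def query(self, directory: str, max_staleness: Optional[float] = None) -> int:
        """None demands an exact answer; otherwise accept a cached value
        fetched no more than `max_staleness` seconds ago."""
        if max_staleness is not None and directory in self.cached:
            value, fetched_at = self.cached[directory]
            if self.clock() - fetched_at <= max_staleness:
                return value  # fast, but possibly out of date
        value = self.fetch_exact(directory)
        self.cached[directory] = (value, self.clock())
        return value
```

The design choice is that callers opt in to staleness per query, so
the same cache serves both a GUI and a cache validator without the
validator ever seeing an out-of-date value.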
-- Jamie