From: "J. Bruce Fields" <bfields@fieldses.org>
To: Sage Weil <sage@newdream.net>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	ceph-devel@lists.sf.net
Subject: Re: Recursive directory accounting for size, ctime, etc.
Date: Tue, 15 Jul 2008 16:48:12 -0400
Message-ID: <20080715204812.GD25803@fieldses.org>
In-Reply-To: <Pine.LNX.4.64.0807151326130.1976@cobra.newdream.net>

On Tue, Jul 15, 2008 at 01:41:25PM -0700, Sage Weil wrote:
> On Tue, 15 Jul 2008, J. Bruce Fields wrote:
> > >  - There is some built-in delay before statistics fully propagate up 
> > > toward the root of the hierarchy.  Changes are propagated 
> > > opportunistically when lock/lease state allows, with an upper bound of (by 
> > > default) ~30 seconds for each level of directory nesting.
> > 
> > That makes it less useful, e.g., for somebody with cached data trying to
> > validate their cache, or for something like git trying to check a
> > directory tree for changes.
> 
> Having fully up-to-date values would definitely be nice, but unfortunately 
> doesn't play nice with the fact that different parts of the directory 
> hierarchy may be managed by different metadata servers.  A primary goal in 
> implementing this was to minimize any impact on performance.  The uses I 
> had in mind were more in line with quota-based accounting than cache 
> validation.

Fair enough.
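
To make the staleness bound concrete, here is a rough sketch of the
worst case implied by "~30 seconds for each level of directory nesting".
The function name and the assumption that the directory containing the
change is itself current (so only the hops above it add delay) are
illustrative, not taken from Ceph:

```python
from pathlib import PurePosixPath


def worst_case_delay_s(changed_dir, ancestor, per_level_s=30.0):
    """Upper bound (seconds) before a change in changed_dir shows up in
    the recursive stats of ancestor.

    Assumption: each level of nesting between the two directories adds
    at most per_level_s seconds (the ~30 s default quoted above).
    """
    changed = PurePosixPath(changed_dir)
    anc = PurePosixPath(ancestor)
    # relative_to() raises ValueError if changed_dir is not under ancestor.
    levels = len(changed.relative_to(anc).parts)
    return levels * per_level_s


# A change in /a/b/c needs three hops to reach the root.
print(worst_case_delay_s("/a/b/c", "/"))
```

A cache validator polling the root would therefore have to tolerate a
window that grows with tree depth, which is the concern for cache
validation and for tools like git.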

> I think I can adjust the propagation heuristics/timeouts to make updates 
> seem more or less immediate to a user in most cases, but that won't be 
> sufficient for a tool like git that needs to reliably identify very recent 
> updates.  For backup software wanting a consistent file system image, it 
> should really be operating on a snapshot as well, in which case a delay 
> between taking the snapshot and starting the scan for changes would allow 
> those values to propagate.
> 
> > >  - Ceph internally distinguishes between multiple links to the same file 
> > > (there is a single 'primary' link, and then zero or more 'remote' links).  
> > > Only the primary link contributes toward the 'rbytes' total.
> > 
> > Is that only true for 'rbytes'?
> 
> The same goes for rctime.  As far as the recursive stats go, the other 
> stats (file/directory counts) aren't affected.  The primary/remote 
> hard link distinction is fundamental to the way metadata is internally 
> managed and stored by the MDS, though, if that's what you mean (inode 
> content is embedded with the primary link's directory metadata).

I just wonder how one would explain to users (or application writers)
why changes to a file are reflected in the parent's rctime in one case,
and not in another, especially if the primary link is otherwise
indistinguishable from the others.  The symptoms could be a bit
mysterious from their point of view.
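
A toy model of the asymmetry, as I understand the description above.
The class and field names are made up for illustration and say nothing
about how the MDS actually stores this; the only rule taken from the
thread is that just the primary link contributes to rbytes and rctime:

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    size: int
    ctime: int


@dataclass
class Directory:
    ctime: int = 0
    # name -> (inode, is_primary); hypothetical layout for this sketch
    links: dict = field(default_factory=dict)
    subdirs: dict = field(default_factory=dict)

    def rbytes(self):
        # Only primary links count toward the recursive size.
        own = sum(ino.size for ino, primary in self.links.values() if primary)
        return own + sum(d.rbytes() for d in self.subdirs.values())

    def rctime(self):
        # Only primary links count toward the recursive ctime.
        times = [self.ctime]
        times += [ino.ctime for ino, primary in self.links.values() if primary]
        times += [d.rctime() for d in self.subdirs.values()]
        return max(times)


shared = Inode(size=4096, ctime=100)
a = Directory(ctime=100)
a.links["f"] = (shared, True)        # primary link
b = Directory(ctime=100)
b.links["f-link"] = (shared, False)  # remote link to the same inode

shared.ctime = 200                   # the file changes
print(a.rctime(), b.rctime())        # 200 100
```

From the application's side both names refer to the same file, yet only
one parent directory sees the change.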

--b.
