From: "J. Bruce Fields" <bfields@fieldses.org>
To: Sage Weil <sage@newdream.net>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	ceph-devel@lists.sourceforge.net
Subject: Re: Recursive directory accounting for size, ctime, etc.
Date: Tue, 15 Jul 2008 16:48:12 -0400
Message-ID: <20080715204812.GD25803@fieldses.org>
In-Reply-To: <Pine.LNX.4.64.0807151326130.1976@cobra.newdream.net>

On Tue, Jul 15, 2008 at 01:41:25PM -0700, Sage Weil wrote:
> On Tue, 15 Jul 2008, J. Bruce Fields wrote:
> > >  - There is some built-in delay before statistics fully propagate up 
> > > toward the root of the hierarchy.  Changes are propagated 
> > > opportunistically when lock/lease state allows, with an upper bound of (by 
> > > default) ~30 seconds for each level of directory nesting.
> > 
> > That makes it less useful, e.g., for somebody with cached data trying to
> > validate their cache, or for something like git trying to check a
> > directory tree for changes.
> 
> Having fully up to date values would definitely be nice, but unfortunately 
> doesn't play nice with the fact that different parts of the directory 
> hierarchy may be managed by different metadata servers.  A primary goal in 
> implementing this was to minimize any impact on performance.  The uses I 
> had in mind were more in line with quota-based accounting than cache 
> validation.

Fair enough.
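
To put a number on the bound quoted above, here is a rough sketch of the
worst-case lag before a change is visible at the root. It assumes that
"each level of directory nesting" means one hop from a directory to its
parent, and that the ~30 second default applies uniformly to every hop;
the function name and parameter are hypothetical, not part of Ceph.

```python
import posixpath


def worst_case_delay(file_path, per_level_seconds=30.0):
    """Upper bound (seconds) for a change at file_path to reach the root.

    Assumption: the containing directory sees the change immediately, and
    each hop from a directory to its parent costs at most per_level_seconds.
    """
    parent = posixpath.dirname(posixpath.normpath(file_path))
    # Count the directory components between the root and the parent.
    depth = len([c for c in parent.split("/") if c])
    return depth * per_level_seconds


if __name__ == "__main__":
    print(worst_case_delay("/a/b/c/file"))  # 3 hops: c -> b -> a -> /
    print(worst_case_delay("/file"))        # already in the root directory
```

Under those assumptions a file three directories deep could take up to
about 90 seconds to show up in the root's totals, which is why this is a
poor fit for cache validation.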

> I think I can adjust the propagation heuristics/timeouts to make updates 
> seem more or less immediate to a user in most cases, but that won't be 
> sufficient for a tool like git that needs to reliably identify very recent 
> updates.  For backup software wanting a consistent file system image, it 
> should really be operating on a snapshot as well, in which case a delay 
> between taking the snapshot and starting the scan for changes would allow 
> those values to propagate.
> 
> > >  - Ceph internally distinguishes between multiple links to the same file 
> > > (there is a single 'primary' link, and then zero or more 'remote' links).  
> > > Only the primary link contributes toward the 'rbytes' total.
> > 
> > Is that only true for 'rbytes'?
> 
> The same goes for rctime.  As far as the recursive stats go, the other 
> stats (file/directory counts) aren't affected.  The primary/remote 
> hard link distinction is fundamental to the way metadata is internally 
> managed and stored by the MDS, though, if that's what you mean (inode 
> content is embedded with the primary link's directory metadata).

I just wonder how one would explain to users (or application writers)
why changes to a file are reflected in the parent's rctime in one case,
and not in another, especially if the primary link is otherwise
indistinguishable from the others.  The symptoms could be a bit
mysterious from their point of view.
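
A toy model of the asymmetry, purely as an illustration: the classes and
method names below are made up and say nothing about how the MDS actually
stores metadata. It models a single directory level only, and assumes
that rbytes and rctime are computed from primary links alone, as
described above.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    size: int
    ctime: float


@dataclass
class Directory:
    # name -> (inode, is_primary); hypothetical layout for illustration
    links: dict = field(default_factory=dict)

    def link(self, name, inode, primary):
        self.links[name] = (inode, primary)

    def rbytes(self):
        # Only primary links contribute to the byte total.
        return sum(ino.size for ino, primary in self.links.values() if primary)

    def rctime(self):
        # Likewise, only primary links feed the recursive ctime.
        return max(
            (ino.ctime for ino, primary in self.links.values() if primary),
            default=0.0,
        )


if __name__ == "__main__":
    f = Inode(size=100, ctime=1.0)
    a, b = Directory(), Directory()
    a.link("f", f, primary=True)        # primary link lives in a
    b.link("f-link", f, primary=False)  # remote (hard) link lives in b
    f.size, f.ctime = 250, 2.0          # modify the file via either name
    print(a.rbytes(), a.rctime())
    print(b.rbytes(), b.rctime())
```

Directory a picks up the new size and ctime while b does not, even though
both names refer to the same file and look identical to the user.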

--b.
