Date: Thu, 10 Apr 2008 14:34:32 +1000
From: David Chinner
To: David Chinner
Cc: Christoph Hellwig, xfs-dev, xfs-oss
Subject: Re: [Patch] unique per-AG inode generation number initialisation
Message-ID: <20080410043432.GH108924158@sgi.com>
References: <20080401231815.GW103491721@sgi.com> <20080407125738.GD27350@infradead.org> <20080407215203.GB108924158@sgi.com>
In-Reply-To: <20080407215203.GB108924158@sgi.com>
List-Id: xfs

Ping? Any further concerns on this? I'd like to get this resolved quickly.

Cheers,

Dave.

On Tue, Apr 08, 2008 at 07:52:03AM +1000, David Chinner wrote:
> On Mon, Apr 07, 2008 at 08:57:38AM -0400, Christoph Hellwig wrote:
> > I don't really like this. The chance to hit a previously used generation
> > seems too high.
>
> The chance to hit an existing generation number is almost non-existent.
>
> The counter is incremented on every allocation and not just when
> inode chunks are allocated on disk. Hence a series of "allocate
> chunk, unlink + free chunk, realloc chunk" is guaranteed to get a
> higher generation number on reallocation, as is the sequence "allocate
> a chunk, while [1] {allocate; unlink}, unlink chunk, reallocate
> chunk." These are the issues that are causing us problems right
> now.
>
> The generation number won't get reused at all until it wraps at 2^32
> allocations within the AG, and then you've got to have a chunk of inodes
> get freed and reallocated at the same time the counter matches an inode
> generation number.
> While not impossible, it'll be pretty rare.
>
> > What about making the first few bits of each generation
> > number a per-AG counter that's incremented anytime we deallocate an inode
> > cluster?
>
> That was the first thing I considered: incrementing on chunk freeing is
> not a sufficient guarantee of short-term uniqueness. To guarantee
> short-term uniqueness, the generation number used to initialise the inode
> chunk if it is immediately reallocated needs to be greater than the
> maximum used by any inode in the chunk that got freed. Now the "counter"
> becomes a "maximum generation number used in the AG" value. This
> also adds significant complexity to xfs_icluster_free() as we have to
> look at every inode in the chunk and not just the ones that are
> in-core.
>
> FWIW, the biggest complexity with this approach is wrapping: how do
> you tell what the highest generation number in the inode chunk being
> freed is when some have wrapped through zero?
>
> I basically gave up on this approach because of the extra complexity
> and nasty, untestable corner cases it introduced into code that is
> already complex. A simple incrementing counter solves the short-term
> uniqueness problem while still making it very hard to get duplicates in
> the long term. If you really, really need long-term uniqueness, then
> use 'ikeep'.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> Principal Engineer
> SGI Australian Software Group

--
Dave Chinner
Principal Engineer
SGI Australian Software Group
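A minimal simulation of the short-term uniqueness argument in the quoted reply: because the per-AG counter advances on every inode allocation, a chunk that is freed and immediately reallocated starts above every generation already handed out from it. This is not XFS code. `struct ag_state`, `struct chunk`, `realloc_gens_are_new()`, the 64-inode chunk size and the assumption that unlinking an inode bumps its generation are all hypothetical simplifications, and wrap at 2^32 is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNK_INODES 64 /* hypothetical chunk size, for illustration only */

/* Hypothetical per-AG state: the incrementing generation counter. */
struct ag_state {
	uint32_t gen_counter;
};

/* Hypothetical inode chunk: only each inode's on-disk generation. */
struct chunk {
	uint32_t gen[CHUNK_INODES];
};

/* Stamp every inode in a newly (re)allocated chunk with 'start'. */
static void chunk_init(struct chunk *c, uint32_t start)
{
	for (int i = 0; i < CHUNK_INODES; i++)
		c->gen[i] = start;
}

/* Allocate inode i. The AG counter moves on every inode allocation,
 * not only when a chunk is allocated. Returns the generation handed
 * out to users of the inode (e.g. embedded in a file handle). */
static uint32_t inode_alloc(struct ag_state *ag, const struct chunk *c, int i)
{
	assert(i >= 0 && i < CHUNK_INODES);
	ag->gen_counter++;
	return c->gen[i];
}

/* Unlink inode i; assumed here to bump its generation for the next user. */
static void inode_free(struct chunk *c, int i)
{
	assert(i >= 0 && i < CHUNK_INODES);
	c->gen[i]++;
}

/* Run "allocate chunk, repeat {allocate; unlink}, free chunk, reallocate
 * chunk". Returns true if every generation in the reallocated chunk is
 * greater than any generation handed out before. With use_counter false
 * the chunk is reinitialised to a fixed 0 instead, for contrast.
 * Counter wrap at 2^32 is ignored. */
bool realloc_gens_are_new(uint32_t cycles, bool use_counter)
{
	struct ag_state ag = { .gen_counter = 0 };
	struct chunk c;
	int64_t max_seen = -1; /* -1 means nothing handed out yet */

	chunk_init(&c, ag.gen_counter);
	for (uint32_t k = 0; k < cycles; k++) {
		uint32_t g = inode_alloc(&ag, &c, 0);
		if ((int64_t)g > max_seen)
			max_seen = g;
		inode_free(&c, 0);
	}

	/* Chunk freed, then immediately reallocated in the same AG. */
	chunk_init(&c, use_counter ? ag.gen_counter : 0);
	for (int i = 0; i < CHUNK_INODES; i++)
		if ((int64_t)c.gen[i] <= max_seen)
			return false;
	return true;
}
```

With 1000 allocate/unlink cycles on one inode the counter reaches 1000 while the largest generation handed out is 999, so `realloc_gens_are_new(1000, true)` holds; reinitialising to a fixed value (`use_counter == false`) reissues generation 0 and fails the check.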
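To illustrate the wrapping problem raised against the alternative "maximum generation number in the chunk" scheme, this sketch compares a plain unsigned maximum with RFC 1982 style serial-number comparison. The helper names are hypothetical, and serial-number arithmetic is a generic technique swapped in for illustration, not something the patch or XFS is known to use here. It only gives a consistent answer when all generations lie within 2^31 of each other, which nothing guarantees for the inodes of a freed chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Naive "highest generation": plain unsigned maximum. */
uint32_t naive_max_gen(const uint32_t *g, size_t n)
{
	assert(n > 0);
	uint32_t m = g[0];
	for (size_t i = 1; i < n; i++)
		if (g[i] > m)
			m = g[i];
	return m;
}

/* Serial-number comparison in the style of RFC 1982: a is "after" b
 * if stepping forward from b reaches a in fewer than 2^31 steps. */
bool gen_after(uint32_t a, uint32_t b)
{
	uint32_t d = a - b; /* unsigned subtraction wraps modulo 2^32 */
	return d != 0 && d < UINT32_C(0x80000000);
}

/* Wrap-aware "highest generation"; only meaningful when all values
 * lie within 2^31 of each other, otherwise the result depends on the
 * order in which the values are scanned. */
uint32_t serial_max_gen(const uint32_t *g, size_t n)
{
	assert(n > 0);
	uint32_t m = g[0];
	for (size_t i = 1; i < n; i++)
		if (gen_after(g[i], m))
			m = g[i];
	return m;
}
```

For `{0xFFFFFFF0, 0x5}`, where 0x5 is taken to have been issued after the counter wrapped, the plain maximum picks 0xFFFFFFF0 while `serial_max_gen()` picks 0x5. Once the values span more than 2^31 the answer depends on scan order: `{0, 0x60000000, 0xC0000000}` yields 0xC0000000, but the same values ordered `{0xC0000000, 0, 0x60000000}` yield 0x60000000.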