public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* review: increase bulkstat readahead window
@ 2006-07-25  3:50 Nathan Scott
  2006-07-25  9:40 ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Scott @ 2006-07-25  3:50 UTC (permalink / raw)
  To: vapo; +Cc: xfs

Hi all,
    
We limit the amount of bulkstat readahead we can issue based on 
the size of the array of inode cluster records (irbuf), which we
allocate on each bulkstat call.  Increasing the size of this array
has shown noticable performance improvements, and given bulkstat
is always called to scan the filesystem from one end to the other,
we're going to have to issue that IO at some point, may as well do
it up front.  We don't want to get silly in sizing this buffer, 
though, as it needs to be a contiguous chunk of memory.  Here I've
increased it from 1 page to 4 pages, with some logic to halve the
size incrementally if we cant allocate that successfully (as we do
in one or two other places in XFS, for other things).

cheers.

-- 
Nathan


Index: xfs-linux/xfs_itable.c
===================================================================
--- xfs-linux.orig/xfs_itable.c	2006-07-25 11:59:26.144649250 +1000
+++ xfs-linux/xfs_itable.c	2006-07-25 12:01:53.734832500 +1000
@@ -325,6 +325,8 @@ xfs_bulkstat(
 	xfs_agino_t		gino;	/* current btree rec's start inode */
 	int			i;	/* loop index */
 	int			icount;	/* count of inodes good in irbuf */
+	int			irbsize; /* size of irec buffer in bytes */
+	unsigned int		kmflags; /* flags for allocating irec buffer */
 	xfs_ino_t		ino;	/* inode number (filesystem) */
 	xfs_inobt_rec_incore_t	*irbp;	/* current irec buffer pointer */
 	xfs_inobt_rec_incore_t	*irbuf;	/* start of irec buffer */
@@ -370,12 +372,20 @@ xfs_bulkstat(
 	nimask = ~(nicluster - 1);
 	nbcluster = nicluster >> mp->m_sb.sb_inopblog;
 	/*
-	 * Allocate a page-sized buffer for inode btree records.
-	 * We could try allocating something smaller, but for normal
-	 * calls we'll always (potentially) need the whole page.
+	 * Allocate a local buffer for inode cluster btree records.
+	 * This caps our maximum readahead window (so don't be stingy)
+	 * but we must handle the case where we can't get a contiguous
+	 * multi-page buffer, so we drop back toward pagesize; the end
+	 * case we ensure succeeds, via appropriate allocation flags.
 	 */
-	irbuf = kmem_alloc(NBPC, KM_SLEEP);
-	nirbuf = NBPC / sizeof(*irbuf);
+	irbsize = NBPP * 4;
+	kmflags = KM_SLEEP | KM_MAYFAIL;
+	while (!(irbuf = kmem_alloc(irbsize, kmflags))) {
+		if ((irbsize >>= 1) <= NBPP)
+			kmflags = KM_SLEEP;
+	}
+	nirbuf = irbsize / sizeof(*irbuf);
+
 	/*
 	 * Loop over the allocation groups, starting from the last
 	 * inode returned; 0 means start of the allocation group.
@@ -673,7 +683,7 @@ xfs_bulkstat(
 	/*
 	 * Done, we're either out of filesystem or space to put the data.
 	 */
-	kmem_free(irbuf, NBPC);
+	kmem_free(irbuf, irbsize);
 	*ubcountp = ubelem;
 	if (agno >= mp->m_sb.sb_agcount) {
 		/*

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: review: increase bulkstat readahead window
  2006-07-25  3:50 review: increase bulkstat readahead window Nathan Scott
@ 2006-07-25  9:40 ` Christoph Hellwig
  2006-07-25 22:37   ` Nathan Scott
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2006-07-25  9:40 UTC (permalink / raw)
  To: Nathan Scott; +Cc: vapo, xfs

On Tue, Jul 25, 2006 at 01:50:04PM +1000, Nathan Scott wrote:
> Hi all,
>     
> We limit the amount of bulkstat readahead we can issue based on 
> the size of the array of inode cluster records (irbuf), which we
> allocate on each bulkstat call.  Increasing the size of this array
> has shown noticable performance improvements, and given bulkstat
> is always called to scan the filesystem from one end to the other,
> we're going to have to issue that IO at some point, may as well do
> it up front.  We don't want to get silly in sizing this buffer, 
> though, as it needs to be a contiguous chunk of memory.  Here I've
> increased it from 1 page to 4 pages, with some logic to halve the
> size incrementally if we cant allocate that successfully (as we do
> in one or two other places in XFS, for other things).

ok.  I wonder whether we should add a generic kmalloc_leastmost routine
(with a name better than that of course..)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: review: increase bulkstat readahead window
  2006-07-25  9:40 ` Christoph Hellwig
@ 2006-07-25 22:37   ` Nathan Scott
  2006-07-26 10:25     ` Christoph Hellwig
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Scott @ 2006-07-25 22:37 UTC (permalink / raw)
  To: Christoph Hellwig, cw; +Cc: vapo, xfs

On Tue, Jul 25, 2006 at 10:40:04AM +0100, Christoph Hellwig wrote:
> On Tue, Jul 25, 2006 at 01:50:04PM +1000, Nathan Scott wrote:
> > .. it up front.  We don't want to get silly in sizing this buffer, 
> > though, as it needs to be a contiguous chunk of memory.  Here I've
> > increased it from 1 page to 4 pages, with some logic to halve the
> > size incrementally if we cant allocate that successfully (as we do
> > in one or two other places in XFS, for other things).
> 
> ok.  I wonder whether we should add a generic kmalloc_leastmost routine
> (with a name better than that of course..)

Yeah, Chris suggested the same thing - probably we should, since two
people suggested it now. :)  The XFS users I know of are the inode
hash, the dquot hash, and this bulkstat code.  Oh, and probably the
attr_multi ioctl code should use this for its buffer too.  If you can
suggest a good interface, I'll have at it.

Semi-related, I have another patch which instruments our local memory
allocation routines to add a KM_LARGE flag - I've been using this to
locate and annotate the few remaining places where we will do multi-
page allocations inside XFS... any interest in this patch?  I've been
tossing up whether or not to merge it (its debug only, so no runtime
cost is added for usual case), just so we can always easily see where
the large allocations are, and trap any inadvertantly introduced new
ones... thoughts?

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: review: increase bulkstat readahead window
  2006-07-25 22:37   ` Nathan Scott
@ 2006-07-26 10:25     ` Christoph Hellwig
  2006-07-27 23:17       ` Nathan Scott
  0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2006-07-26 10:25 UTC (permalink / raw)
  To: Nathan Scott; +Cc: cw, vapo, xfs

On Wed, Jul 26, 2006 at 08:37:09AM +1000, Nathan Scott wrote:
> On Tue, Jul 25, 2006 at 10:40:04AM +0100, Christoph Hellwig wrote:
> > On Tue, Jul 25, 2006 at 01:50:04PM +1000, Nathan Scott wrote:
> > > .. it up front.  We don't want to get silly in sizing this buffer, 
> > > though, as it needs to be a contiguous chunk of memory.  Here I've
> > > increased it from 1 page to 4 pages, with some logic to halve the
> > > size incrementally if we cant allocate that successfully (as we do
> > > in one or two other places in XFS, for other things).
> > 
> > ok.  I wonder whether we should add a generic kmalloc_leastmost routine
> > (with a name better than that of course..)
> 
> Yeah, Chris suggested the same thing - probably we should, since two
> people suggested it now. :)  The XFS users I know of are the inode
> hash, the dquot hash, and this bulkstat code.  Oh, and probably the
> attr_multi ioctl code should use this for its buffer too.  If you can
> suggest a good interface, I'll have at it.

Maybe we can start with a common XFS routine first:


kmem_alloc_greedy(size_t *size, size_t minsize, unsigned int flags)
{
	void *ptr;

	while (!(ptr = kmem_alloc(*size, flags))) {
		if ((*size >>= 1) <= minsize)
			flags = KM_SLEEP;
	}

	return ptr;
}

> Semi-related, I have another patch which instruments our local memory
> allocation routines to add a KM_LARGE flag - I've been using this to
> locate and annotate the few remaining places where we will do multi-
> page allocations inside XFS... any interest in this patch?  I've been
> tossing up whether or not to merge it (its debug only, so no runtime
> cost is added for usual case), just so we can always easily see where
> the large allocations are, and trap any inadvertantly introduced new
> ones... thoughts?

I'm happy with that flag, but I'd prefer it only allocations with
KM_LARGE would go to vmalloc so that we can start to untangle the
allocator wrapping mess.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: review: increase bulkstat readahead window
  2006-07-26 10:25     ` Christoph Hellwig
@ 2006-07-27 23:17       ` Nathan Scott
  2006-07-28  1:58         ` Review: xfs_repair fixes for dir2 corruption Barry Naujok
  0 siblings, 1 reply; 12+ messages in thread
From: Nathan Scott @ 2006-07-27 23:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cw, vapo, xfs

On Wed, Jul 26, 2006 at 11:25:51AM +0100, Christoph Hellwig wrote:
> Maybe we can start with a common XFS routine first:
> 
> kmem_alloc_greedy(size_t *size, size_t minsize, unsigned int flags)

Looks good, I'll create a patch & send out later today.  Actually,
we could say minsize is 1 page, and simplify - for our uses in XFS,
that will suffice... or dya want that extra parameter for the more
generic version later?

> > cost is added for usual case), just so we can always easily see where
> > the large allocations are, and trap any inadvertantly introduced new
> > ones... thoughts?
> 
> I'm happy with that flag, but I'd prefer it only allocations with
> KM_LARGE would go to vmalloc so that we can start to untangle the
> allocator wrapping mess.

Yep - that was the other reason why I started down that path - I'm not
convinced we really need the vmalloc calls anymore, we don't get near
the max slab cutoff on my i386 test box anyway, AFAICT.  Long term, we
should be removing the vmalloc call, but needs more beating to be sure
nothing can get there nowadays - ultimately we should just BUG_ON over
a max size (below the current vmalloc cutoff).

cheers.

-- 
Nathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Review: xfs_repair fixes for dir2 corruption
  2006-07-27 23:17       ` Nathan Scott
@ 2006-07-28  1:58         ` Barry Naujok
  2006-07-28  8:10           ` Nathan Scott
                             ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Barry Naujok @ 2006-07-28  1:58 UTC (permalink / raw)
  To: xfs

This patch addresses the following xfs_repair issues:
  - Can rebuild most corrupted directories. Some will be 
    beyond repair and the contents of those will end up 
    in lost+found.
  - Fixed potential problems with the duplicate name 
    checking by properly referencing the buffers where
    the names are stored.
  - Unified the two hash lists used in directory checks.
  - Fixed a case where incorrectly reference counted and
    dirty buffers were never written to disk (most 
    common observation of this is an inaccessible 
    lost+found directory after a repair).

For those that are keen to fix the 16777216 problem, give this patch a go.




===========================================================================
xfsprogs/include/cache.h
===========================================================================

--- a/xfsprogs/include/cache.h	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/include/cache.h	2006-07-27 16:17:47.804986322 +1000
@@ -30,6 +30,7 @@ struct cache_node;
 typedef void *cache_key_t;
 typedef void (*cache_walk_t)(struct cache_node *);
 typedef struct cache_node * (*cache_node_alloc_t)(void);
+typedef void (*cache_node_flush_t)(struct cache_node *);
 typedef void (*cache_node_relse_t)(struct cache_node *);
 typedef unsigned int (*cache_node_hash_t)(cache_key_t, unsigned int);
 typedef int (*cache_node_compare_t)(struct cache_node *, cache_key_t);
@@ -38,6 +39,7 @@ typedef unsigned int (*cache_bulk_relse_
 struct cache_operations {
 	cache_node_hash_t	hash;
 	cache_node_alloc_t	alloc;
+	cache_node_flush_t	flush;
 	cache_node_relse_t	relse;
 	cache_node_compare_t	compare;
 	cache_bulk_relse_t	bulkrelse;	/* optional */
@@ -49,6 +51,7 @@ struct cache {
 	pthread_mutex_t		c_mutex;	/* node count mutex */
 	cache_node_hash_t	hash;		/* node hash function */
 	cache_node_alloc_t	alloc;		/* allocation function */
+	cache_node_flush_t	flush;		/* flush dirty data function */
 	cache_node_relse_t	relse;		/* memory free function */
 	cache_node_compare_t	compare;	/* comparison routine */
 	cache_bulk_relse_t	bulkrelse;	/* bulk release routine */
@@ -75,6 +78,7 @@ struct cache *cache_init(unsigned int, s
 void cache_destroy(struct cache *);
 void cache_walk(struct cache *, cache_walk_t);
 void cache_purge(struct cache *);
+void cache_flush(struct cache *);
 
 int cache_node_get(struct cache *, cache_key_t, struct cache_node **);
 void cache_node_put(struct cache_node *);

===========================================================================
xfsprogs/include/libxfs.h
===========================================================================

--- a/xfsprogs/include/libxfs.h	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/include/libxfs.h	2006-07-27 16:34:45.308344014 +1000
@@ -257,6 +257,7 @@ extern int	libxfs_writebuf_int (xfs_buf_
 extern struct cache	*libxfs_bcache;
 extern struct cache_operations	libxfs_bcache_operations;
 extern void	libxfs_bcache_purge (void);
+extern void	libxfs_bcache_flush (void);
 extern xfs_buf_t	*libxfs_getbuf (dev_t, xfs_daddr_t, int);
 extern void	libxfs_putbuf (xfs_buf_t *);
 extern void	libxfs_purgebuf (xfs_buf_t *);
@@ -467,6 +468,8 @@ extern int	libxfs_bmap_finish (xfs_trans
 				xfs_fsblock_t, int *);
 extern int	libxfs_bmap_next_offset (xfs_trans_t *, xfs_inode_t *,
 				xfs_fileoff_t *, int);
+extern int	libxfs_bmap_last_offset(xfs_trans_t *, xfs_inode_t *, 
+				xfs_fileoff_t *, int);
 extern int	libxfs_bunmapi (xfs_trans_t *, xfs_inode_t *, xfs_fileoff_t,
 				xfs_filblks_t, int, xfs_extnum_t,
 				xfs_fsblock_t *, xfs_bmap_free_t *, int *);

===========================================================================
xfsprogs/libxfs/cache.c
===========================================================================

--- a/xfsprogs/libxfs/cache.c	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/libxfs/cache.c	2006-07-27 17:42:43.812685388 +1000
@@ -60,6 +60,7 @@ cache_init(
 	cache->c_hashsize = hashsize;
 	cache->hash = cache_operations->hash;
 	cache->alloc = cache_operations->alloc;
+	cache->flush = cache_operations->flush;
 	cache->relse = cache_operations->relse;
 	cache->compare = cache_operations->compare;
 	cache->bulkrelse = cache_operations->bulkrelse ?
@@ -422,6 +423,39 @@ cache_purge(
 		cache_abort();
 	}
 #endif
+	/* flush any remaining nodes to disk */
+	cache_flush(cache);
+}
+
+/*
+ * Flush all nodes in the cache to disk. 
+ */
+void
+cache_flush(
+	struct cache *		cache)
+{
+	struct cache_hash *	hash;
+	struct list_head *	head;
+	struct list_head *	pos;
+	struct cache_node *	node;
+	int			i;
+	
+	if (!cache->flush)
+		return;
+	
+	for (i = 0; i < cache->c_hashsize; i++) {
+		hash = &cache->c_hash[i];
+		
+		pthread_mutex_lock(&hash->ch_mutex);
+		head = &hash->ch_list;
+		for (pos = head->next; pos != head; pos = pos->next) {
+			node = (struct cache_node *)pos;
+			pthread_mutex_lock(&node->cn_mutex);
+			cache->flush(node);
+			pthread_mutex_unlock(&node->cn_mutex);
+		}
+		pthread_mutex_unlock(&hash->ch_mutex);
+	}
 }
 
 #define	HASH_REPORT	(3*HASH_CACHE_RATIO)

===========================================================================
xfsprogs/libxfs/rdwr.c
===========================================================================

--- a/xfsprogs/libxfs/rdwr.c	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/libxfs/rdwr.c	2006-07-27 16:40:56.612373938 +1000
@@ -416,6 +416,15 @@ libxfs_iomove(xfs_buf_t *bp, uint boff, 
 }
 
 static void
+libxfs_bflush(struct cache_node *node)
+{
+	xfs_buf_t		*bp = (xfs_buf_t *)node;
+
+	if ((bp != NULL) && (bp->b_flags & LIBXFS_B_DIRTY))
+		libxfs_writebufr(bp);
+}
+
+static void
 libxfs_brelse(struct cache_node *node)
 {
 	xfs_buf_t		*bp = (xfs_buf_t *)node;
@@ -442,9 +451,16 @@ libxfs_bcache_purge(void)
 	cache_purge(libxfs_bcache);
 }
 
+void 
+libxfs_bcache_flush(void)
+{
+	cache_flush(libxfs_bcache);
+}
+
 struct cache_operations libxfs_bcache_operations = {
 	/* .hash */	libxfs_bhash,
 	/* .alloc */	libxfs_balloc,
+	/* .flush */	libxfs_bflush,
 	/* .relse */	libxfs_brelse,
 	/* .compare */	libxfs_bcompare,
 	/* .bulkrelse */ NULL	/* TODO: lio_listio64 interface? */
@@ -649,6 +665,7 @@ libxfs_icache_purge(void)
 struct cache_operations libxfs_icache_operations = {
 	/* .hash */	libxfs_ihash,
 	/* .alloc */	libxfs_ialloc,
+	/* .flush */	NULL,
 	/* .relse */	libxfs_irelse,
 	/* .compare */	libxfs_icompare,
 	/* .bulkrelse */ NULL

===========================================================================
xfsprogs/libxfs/xfs.h
===========================================================================

--- a/xfsprogs/libxfs/xfs.h	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/libxfs/xfs.h	2006-07-25 12:05:31.917896892 +1000
@@ -98,6 +98,7 @@
 #define xfs_bmapi_single		libxfs_bmapi_single
 #define xfs_bmap_finish			libxfs_bmap_finish
 #define xfs_bmap_del_free		libxfs_bmap_del_free
+#define xfs_bmap_last_offset		libxfs_bmap_last_offset
 #define xfs_bunmapi			libxfs_bunmapi
 #define xfs_free_extent			libxfs_free_extent
 #define xfs_rtfree_extent		libxfs_rtfree_extent

===========================================================================
xfsprogs/repair/phase6.c
===========================================================================

--- a/xfsprogs/repair/phase6.c	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/repair/phase6.c	2006-07-28 11:30:50.033905530 +1000
@@ -36,43 +36,35 @@ static int orphanage_entered;
 
 /*
  * Data structures and routines to keep track of directory entries
- * and whether their leaf entry has been seen
+ * and whether their leaf entry has been seen. Also used for name
+ * duplicate checking and rebuilding step if required.
  */
 typedef struct dir_hash_ent {
-	struct dir_hash_ent	*next;	/* pointer to next entry */
+	struct dir_hash_ent	*nextbyaddr;/* pointer to next entry */
+	struct dir_hash_ent	*nextbyhash;
+	struct dir_hash_ent	*nextbyorder;
 	xfs_dir2_leaf_entry_t	ent;	/* address and hash value */
+	xfs_ino_t 		inum;	/* inode of name */
 	short			junkit;	/* name starts with / */
 	short			seen;	/* have seen leaf entry */
+	int	  	    	namelen;/* length of name */
+	uchar_t    	    	*name;	/* pointer to name (no NULL) */
 } dir_hash_ent_t;
 
 typedef struct dir_hash_tab {
 	int			size;	/* size of hash table */
-	dir_hash_ent_t		*tab[1];/* actual hash table, variable size */
+	int			names_duped;
+	dir_hash_ent_t		*first;
+	dir_hash_ent_t		*last;
+	dir_hash_ent_t		**byhash;/* actual hash table, variable size */
+	dir_hash_ent_t		**byaddr;/* actual hash table, variable size */
 } dir_hash_tab_t;
+
 #define	DIR_HASH_TAB_SIZE(n)	\
-	(offsetof(dir_hash_tab_t, tab) + (sizeof(dir_hash_ent_t *) * (n)))
+	(sizeof(dir_hash_tab_t) + (sizeof(dir_hash_ent_t *) * (n) * 2))
 #define	DIR_HASH_FUNC(t,a)	((a) % (t)->size)
 
 /*
- * Track names to check for duplicates in a directory.
- */
-
-typedef struct name_hash_ent {
-	struct name_hash_ent	*next;	/* pointer to next entry */
-	xfs_dahash_t		hashval;/* hash value of name */
-	int	  	    	namelen;/* length of name */
-	uchar_t    	    	*name;	/* pointer to name (no NULL) */
-} name_hash_ent_t;		
-
-typedef struct name_hash_tab {
-	int			size;	/* size of hash table */
-	name_hash_ent_t		*tab[1];/* actual hash table, variable size */
-} name_hash_tab_t;
-#define	NAME_HASH_TAB_SIZE(n)	\
-	(offsetof(name_hash_tab_t, tab) + (sizeof(name_hash_ent_t *) * (n)))
-#define	NAME_HASH_FUNC(t,a)	((a) % (t)->size)
-
-/*
  * Track the contents of the freespace table in a directory.
  */
 typedef struct freetab {
@@ -94,28 +86,75 @@ typedef struct freetab {
 #define	DIR_HASH_CK_BADSTALE	5
 #define	DIR_HASH_CK_TOTAL	6
 
-static void
+/*
+ * Returns 0 if the name already exists (ie. a duplicate)
+ */
+static int
 dir_hash_add(
 	dir_hash_tab_t		*hashtab,
-	xfs_dahash_t		hash,
-	xfs_dir2_dataptr_t	addr,
-	int			junk)
-{
-	int			i;
+	xfs_dir2_dataptr_t	addr,	
+	xfs_ino_t		inum,
+	int			namelen,
+	uchar_t			*name)
+{
+	xfs_dahash_t		hash = 0;
+	int			byaddr;
+	int			byhash = 0;
 	dir_hash_ent_t		*p;
-
-	i = DIR_HASH_FUNC(hashtab, addr);
+	int			dup;
+	short			junk;
+	
+	junk = name[0] == '/';
+	byaddr = DIR_HASH_FUNC(hashtab, addr);
+	dup = 0;
+
+	if (!junk) {
+		hash = libxfs_da_hashname(name, namelen);
+		byhash = DIR_HASH_FUNC(hashtab, hash);
+
+		/* 
+		 * search hash bucket for existing name.
+		 */
+		for (p = hashtab->byhash[byhash]; p; p = p->nextbyhash) {
+			if (p->ent.hashval == hash && p->namelen == namelen) {
+				if (memcmp(p->name, name, namelen) == 0) {
+					dup = 1;
+					break;
+				}
+			}
+		}
+	}
+	
 	if ((p = malloc(sizeof(*p))) == NULL)
 		do_error(_("malloc failed in dir_hash_add (%u bytes)\n"),
 			sizeof(*p));
-	p->next = hashtab->tab[i];
-	hashtab->tab[i] = p;
-	if (!(p->junkit = junk))
+	
+	p->nextbyaddr = hashtab->byaddr[byaddr];
+	hashtab->byaddr[byaddr] = p;
+	if (hashtab->last) 
+		hashtab->last->nextbyorder = p;
+	else
+		hashtab->first = p;
+	p->nextbyorder = NULL;
+	hashtab->last = p;
+	
+	if (!(p->junkit = junk)) {
 		p->ent.hashval = hash;
+		p->nextbyhash = hashtab->byhash[byhash];
+		hashtab->byhash[byhash] = p;
+	}
 	p->ent.address = addr;
+	p->inum = inum;
 	p->seen = 0;
+	p->namelen = namelen;
+	p->name = name;
+	
+	return !dup;
 }
 
+/*
+ * checks to see if any data entries are not in the leaf blocks 
+ */
 static int
 dir_hash_unseen(
 	dir_hash_tab_t	*hashtab)
@@ -124,7 +163,7 @@ dir_hash_unseen(
 	dir_hash_ent_t	*p;
 
 	for (i = 0; i < hashtab->size; i++) {
-		for (p = hashtab->tab[i]; p; p = p->next) {
+		for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
 			if (p->seen == 0)
 				return 1;
 		}
@@ -173,8 +212,10 @@ dir_hash_done(
 	dir_hash_ent_t	*p;
 
 	for (i = 0; i < hashtab->size; i++) {
-		for (p = hashtab->tab[i]; p; p = n) {
-			n = p->next;
+		for (p = hashtab->byaddr[i]; p; p = n) {
+			n = p->nextbyaddr;
+			if (hashtab->names_duped)
+				free(p->name);
 			free(p);
 		}
 	}
@@ -196,6 +237,10 @@ dir_hash_init(
 	if ((hashtab = calloc(DIR_HASH_TAB_SIZE(hsize), 1)) == NULL)
 		do_error(_("calloc failed in dir_hash_init\n"));
 	hashtab->size = hsize;
+	hashtab->byhash = (dir_hash_ent_t**)((char *)hashtab + 
+		sizeof(dir_hash_tab_t));
+	hashtab->byaddr = (dir_hash_ent_t**)((char *)hashtab + 
+		sizeof(dir_hash_tab_t) + sizeof(dir_hash_ent_t*) * hsize);
 	return hashtab;
 }
 
@@ -209,7 +254,7 @@ dir_hash_see(
 	dir_hash_ent_t		*p;
 
 	i = DIR_HASH_FUNC(hashtab, addr);
-	for (p = hashtab->tab[i]; p; p = p->next) {
+	for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
 		if (p->ent.address != addr)
 			continue;
 		if (p->seen)
@@ -222,6 +267,10 @@ dir_hash_see(
 	return DIR_HASH_CK_NODATA;
 }
 
+/*
+ * checks to make sure leafs match a data entry, and that the stale
+ * count is valid.
+ */
 static int
 dir_hash_see_all(
 	dir_hash_tab_t		*hashtab,
@@ -246,81 +295,25 @@ dir_hash_see_all(
 }
 
 /*
- * Returns 0 if the name already exists (ie. a duplicate)
+ * Convert name pointers into locally allocated memory
  */
-static int
-name_hash_add(
-	name_hash_tab_t		*nametab,
-	uchar_t			*name,
-	int			namelen)
+static void
+dir_hash_dup_names(dir_hash_tab_t *hashtab)
 {
-	xfs_dahash_t		hash;
-	int			i;
-	name_hash_ent_t		*p;
-
-	hash = libxfs_da_hashname(name, namelen);
-			
-	i = NAME_HASH_FUNC(nametab, hash);
-	
-	/* 
-	 * search hash bucket for existing name.
-	 */
-	for (p = nametab->tab[i]; p; p = p->next) {
-		if (p->hashval == hash && p->namelen == namelen) {
-			if (memcmp(p->name, name, namelen) == 0) 
-				return 0; /* exists */
-		}
-	}
-	
-	if ((p = malloc(sizeof(*p))) == NULL)
-		do_error(_("malloc failed in name_hash_add (%u bytes)\n"),
-			sizeof(*p));
+	uchar_t			*name;
+	dir_hash_ent_t		*p;
 	
-	p->next = nametab->tab[i];
-	p->hashval = hash;
-	p->name = name;
-	p->namelen = namelen;
-	nametab->tab[i] = p;
+	if (hashtab->names_duped)
+		return;
 	
-	return 1;	/* success, no duplicate */
-}
-
-static name_hash_tab_t *
-name_hash_init(
-	xfs_fsize_t	size)
-{
-	name_hash_tab_t	*nametab;
-	int		hsize;
-
-	hsize = size / (16 * 4);
-	if (hsize > 1024)
-		hsize = 1024;
-	else if (hsize < 16)
-		hsize = 16;
-	if ((nametab = calloc(NAME_HASH_TAB_SIZE(hsize), 1)) == NULL)
-		do_error(_("calloc failed in name_hash_init\n"));
-	nametab->size = hsize;
-	return nametab;
-}
-
-static void
-name_hash_done(
-	name_hash_tab_t	*nametab)
-{
-	int		i;
-	name_hash_ent_t	*n;
-	name_hash_ent_t	*p;
-
-	for (i = 0; i < nametab->size; i++) {
-		for (p = nametab->tab[i]; p; p = n) {
-			n = p->next;
-			free(p);
-		}
+	for (p = hashtab->first; p; p = p->nextbyorder) {
+		name = malloc(p->namelen);
+		memcpy(name, p->name, p->namelen);
+		p->name = name;
 	}
-	free(nametab);
+	hashtab->names_duped = 1;
 }
 
-
 /*
  * Version 1 or 2 directory routine wrappers
 */
@@ -1385,7 +1378,8 @@ lf_block_dir_entry_check(xfs_mount_t		*m
 			dir_stack_t		*stack,
 			ino_tree_node_t		*current_irec,
 			int			current_ino_offset,
-			name_hash_tab_t		*nametab)
+			dir_hash_tab_t		*hashtab,
+			xfs_dablk_t		da_bno)
 {
 	xfs_dir_leaf_entry_t	*entry;
 	ino_tree_node_t		*irec;
@@ -1545,7 +1539,9 @@ lf_block_dir_entry_check(xfs_mount_t		*m
 		/*
 		 * check for duplicate names in directory.
 		 */ 
-		if (!name_hash_add(nametab, namest->name, entry->namelen)) {
+		if (!dir_hash_add(hashtab, (da_bno << mp->m_sb.sb_blocklog) + 
+						entry->nameidx, 
+				lino, entry->namelen, namest->name)) {
 			do_warn(
 		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
 				fname, lino, ino);
@@ -1635,7 +1631,7 @@ longform_dir_entry_check(xfs_mount_t	*mp
 			dir_stack_t	*stack,
 			ino_tree_node_t	*irec,
 			int		ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_dir_leafblock_t	*leaf;
 	xfs_buf_t		*bp;
@@ -1677,8 +1673,6 @@ longform_dir_entry_check(xfs_mount_t	*mp
 
 		leaf = (xfs_dir_leafblock_t *)XFS_BUF_PTR(bp);
 
-		da_bno = INT_GET(leaf->hdr.info.forw, ARCH_CONVERT);
-
 		if (INT_GET(leaf->hdr.info.magic, ARCH_CONVERT) !=
 		    XFS_DIR_LEAF_MAGIC)  {
 			if (!no_modify)  {
@@ -1699,9 +1693,11 @@ _("bad magic # (0x%x) for dir ino %llu l
 		}
 
 		if (!skipit)
-			lf_block_dir_entry_check(mp, ino, leaf, &dirty,
-						num_illegal, need_dot, stack,
-						irec, ino_offset, nametab);
+			lf_block_dir_entry_check(mp, ino, leaf, &dirty, 
+					num_illegal, need_dot, stack, irec, 
+					ino_offset, hashtab, da_bno);
+
+		da_bno = INT_GET(leaf->hdr.info.forw, ARCH_CONVERT);
 
 		ASSERT(dirty == 0 || (dirty && !no_modify));
 
@@ -1745,6 +1741,127 @@ _("can't map leaf block %d in dir %llu, 
 }
 
 /*
+ * Unexpected failure during the rebuild will leave the entries in
+ * lost+found on the next run
+ */
+
+static void 
+longform_dir2_rebuild(
+	xfs_mount_t	*mp,
+	xfs_ino_t	ino,
+	xfs_inode_t	*ip,
+	dir_hash_tab_t	*hashtab)
+{
+	int			error;
+	int			nres;
+	xfs_trans_t		*tp;
+	xfs_fileoff_t		lastblock;
+	xfs_fsblock_t		firstblock;
+	xfs_bmap_free_t		flist;
+	xfs_ino_t		parentino;
+	xfs_inode_t		*pip;
+	int			byhash;
+	dir_hash_ent_t		*p;
+	int			committed;
+	int			done;
+	
+	/* 
+	 * trash directory completely and rebuild from scratch using the
+	 * name/inode pairs in the hash table
+	 */
+	 
+	do_warn(_("rebuilding directory inode %llu\n"), ino);
+	
+	/* 
+	 * first attempt to locate the parent inode, if it can't be found,
+	 * we'll use the lost+found inode 
+	 */
+	byhash = DIR_HASH_FUNC(hashtab, libxfs_da_hashname((uchar_t*)"..", 2));
+	parentino = orphanage_ino;
+	for (p = hashtab->byhash[byhash]; p; p = p->nextbyhash) {
+		if (p->namelen == 2 && p->name[0] == '.' && p->name[1] == '.') {
+			parentino = p->inum;
+			break;
+		}
+	}
+
+	XFS_BMAP_INIT(&flist, &firstblock);
+		
+	tp = libxfs_trans_alloc(mp, 0);
+	nres = XFS_REMOVE_SPACE_RES(mp);
+	error = libxfs_trans_reserve(tp, nres, XFS_REMOVE_LOG_RES(mp), 0,
+			XFS_TRANS_PERM_LOG_RES, XFS_REMOVE_LOG_COUNT);
+	if (error)
+		res_failed(error);
+	libxfs_trans_ijoin(tp, ip, 0);
+	libxfs_trans_ihold(tp, ip);
+	
+	if ((error = libxfs_bmap_last_offset(tp, ip, &lastblock, 
+						XFS_DATA_FORK)))
+		do_error(_("xfs_bmap_last_offset failed -- error - %d\n"), 
+			error);
+	
+	/* re-init the directory to shortform */
+	if ((error = libxfs_trans_iget(mp, tp, parentino, 0, 0, &pip)))
+		do_error(
+		_("couldn't iget parent inode %llu -- error - %d\n"),
+			parentino, error);
+
+	/* free all data, leaf, node and freespace blocks */
+	
+	if ((error = libxfs_bunmapi(tp, ip, 0, lastblock, 
+			XFS_BMAPI_METADATA, 0, &firstblock, &flist,
+			&done))) 
+		do_error(_("xfs_bunmapi failed -- error - %d\n"), error);
+	ASSERT(done);
+
+	libxfs_dir2_init(tp, ip, pip);
+	
+	error = libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
+				
+	libxfs_trans_commit(tp, 
+			XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_SYNC, 0);
+		
+	/* go through the hash list and re-add the inodes */
+
+	for (p = hashtab->first; p; p = p->nextbyorder) {
+		
+		if (p->name[0] == '/' || (p->name[0] == '.' && (p->namelen == 1 
+				|| (p->namelen == 2 && p->name[1] == '.'))))
+			continue;
+		
+		tp = libxfs_trans_alloc(mp, 0);
+		nres = XFS_CREATE_SPACE_RES(mp, p->namelen);
+		if ((error = libxfs_trans_reserve(tp, nres, 
+				XFS_CREATE_LOG_RES(mp), 0,
+				XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT)))
+			do_error(
+	_("space reservation failed (%d), filesystem may be out of space\n"),
+				error);
+
+		libxfs_trans_ijoin(tp, ip, 0);
+		libxfs_trans_ihold(tp, ip);
+
+		XFS_BMAP_INIT(&flist, &firstblock);
+		if ((error = libxfs_dir2_createname(tp, ip, (char*)p->name, 
+				p->namelen, p->inum, &firstblock, &flist, nres)))
+			do_error(
+_("name create failed in ino %llu (%d), filesystem may be out of space\n"),
+				ino, error);
+
+		if ((error = libxfs_bmap_finish(&tp, &flist, firstblock, 
+				&committed)))
+			do_error(
+	_("bmap finish failed (%d), filesystem may be out of space\n"),
+				error);
+
+		libxfs_trans_commit(tp, 
+				XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_SYNC, 0);
+	}
+}
+
+
+/*
  * Kill a block in a version 2 inode.
  * Makes its own transaction.
  */
@@ -1807,7 +1924,6 @@ longform_dir2_entry_check_data(
 	xfs_dabuf_t		**bpp,
 	dir_hash_tab_t		*hashtab,
 	freetab_t		**freetabp,
-	name_hash_tab_t		*nametab,
 	xfs_dablk_t		da_bno,
 	int			isblock)
 {
@@ -1828,6 +1944,7 @@ longform_dir2_entry_check_data(
 	freetab_t		*freetab;
 	int			i;
 	int			ino_offset;
+	xfs_ino_t		inum;
 	ino_tree_node_t		*irec;
 	int			junkit;
 	int			lastfree;
@@ -1956,8 +2073,7 @@ longform_dir2_entry_check_data(
 	libxfs_trans_ijoin(tp, ip, 0);
 	libxfs_trans_ihold(tp, ip);
 	libxfs_da_bjoin(tp, bp);
-	if (isblock)
-		libxfs_da_bhold(tp, bp);
+	libxfs_da_bhold(tp, bp);
 	XFS_BMAP_INIT(&flist, &firstblock);
 	if (INT_GET(d->hdr.magic, ARCH_CONVERT) != wantmagic) {
 		do_warn(_("bad directory block magic # %#x for directory inode "
@@ -1987,7 +2103,7 @@ longform_dir2_entry_check_data(
 	while (ptr < endptr) {
 		dup = (xfs_dir2_data_unused_t *)ptr;
 		if (INT_GET(dup->freetag, ARCH_CONVERT) ==
-		    XFS_DIR2_DATA_FREE_TAG) {
+		    				XFS_DIR2_DATA_FREE_TAG) {
 			if (lastfree) {
 				do_warn(_("directory inode %llu block %u has "
 					  "consecutive free entries: "),
@@ -2011,10 +2127,24 @@ longform_dir2_entry_check_data(
 		addr = XFS_DIR2_DB_OFF_TO_DATAPTR(mp, db, ptr - (char *)d);
 		dep = (xfs_dir2_data_entry_t *)ptr;
 		ptr += XFS_DIR2_DATA_ENTSIZE(dep->namelen);
+		inum = INT_GET(dep->inumber, ARCH_CONVERT);
 		lastfree = 0;
-		dir_hash_add(hashtab,
-			libxfs_da_hashname((uchar_t *)dep->name, dep->namelen),
-			addr, dep->name[0] == '/');
+		if (!dir_hash_add(hashtab, addr, inum, dep->namelen, 
+				dep->name)) {
+			do_warn(
+		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
+				fname, inum, ip->i_ino);
+			if (!no_modify) {
+				if (verbose)
+					do_warn(
+					_(", marking entry to be junked\n"));
+				else
+					do_warn("\n");
+			} else {
+				do_warn(_(", would junk entry\n"));
+			}
+			dep->name[0] = '/';
+		}
 		/*
 		 * skip bogus entries (leading '/').  they'll be deleted
 		 * later.  must still log it, else we leak references to
@@ -2029,7 +2159,7 @@ longform_dir2_entry_check_data(
 		junkit = 0;
 		bcopy(dep->name, fname, dep->namelen);
 		fname[dep->namelen] = '\0';
-		ASSERT(INT_GET(dep->inumber, ARCH_CONVERT) != NULLFSINO);
+		ASSERT(inum != NULLFSINO);
 		/*
 		 * skip the '..' entry since it's checked when the
 		 * directory is reached by something else.  if it never
@@ -2039,7 +2169,7 @@ longform_dir2_entry_check_data(
 		if (dep->namelen == 2 && dep->name[0] == '.' &&
 		    dep->name[1] == '.')
 			continue;
-		ASSERT(no_modify || !verify_inum(mp, INT_GET(dep->inumber, ARCH_CONVERT)));
+		ASSERT(no_modify || !verify_inum(mp, inum));
 		/*
 		 * special case the . entry.  we know there's only one
 		 * '.' and only '.' points to itself because bogus entries
@@ -2049,7 +2179,7 @@ longform_dir2_entry_check_data(
 		 * '..' is already accounted for or will be taken care
 		 * of when directory is moved to orphanage.
 		 */
-		if (ip->i_ino == INT_GET(dep->inumber, ARCH_CONVERT))  {
+		if (ip->i_ino == inum)  {
 			ASSERT(dep->name[0] == '.' && dep->namelen == 1);
 			add_inode_ref(current_irec, current_ino_offset);
 			*need_dot = 0;
@@ -2062,23 +2192,18 @@ longform_dir2_entry_check_data(
 		 * just skip it.  no need to process it and it's ..
 		 * link is already accounted for.
 		 */
-		if (INT_GET(dep->inumber, ARCH_CONVERT) == orphanage_ino &&
-		    strcmp(fname, ORPHANAGE) == 0)
+		if (inum == orphanage_ino && strcmp(fname, ORPHANAGE) == 0)
 			continue;
 		/*
 		 * skip entries with bogus inumbers if we're in no modify mode
 		 */
-		if (no_modify &&
-		    verify_inum(mp, INT_GET(dep->inumber, ARCH_CONVERT)))
+		if (no_modify && verify_inum(mp, inum))
 			continue;
 		/*
 		 * ok, now handle the rest of the cases besides '.' and '..'
 		 */
-		irec = find_inode_rec(
-			XFS_INO_TO_AGNO(mp,
-				INT_GET(dep->inumber, ARCH_CONVERT)),
-			XFS_INO_TO_AGINO(mp,
-				INT_GET(dep->inumber, ARCH_CONVERT)));
+		irec = find_inode_rec(XFS_INO_TO_AGNO(mp, inum),
+					XFS_INO_TO_AGINO(mp, inum));
 		if (irec == NULL)  {
 			nbad++;
 			do_warn(_("entry \"%s\" in directory inode %llu points "
@@ -2093,9 +2218,7 @@ longform_dir2_entry_check_data(
 			}
 			continue;
 		}
-		ino_offset = XFS_INO_TO_AGINO(mp,
-				INT_GET(dep->inumber, ARCH_CONVERT)) -
-					irec->ino_startnum;
+		ino_offset = XFS_INO_TO_AGINO(mp, inum) - irec->ino_startnum;
 		/*
 		 * if it's a free inode, blow out the entry.
 		 * by now, any inode that we think is free
@@ -2106,18 +2229,13 @@ longform_dir2_entry_check_data(
 			 * don't complain if this entry points to the old
 			 * and now-free lost+found inode
 			 */
-			if (verbose || no_modify ||
-			    INT_GET(dep->inumber, ARCH_CONVERT) !=
-			    old_orphanage_ino)
+			if (verbose || no_modify || inum != old_orphanage_ino)
 				do_warn(
 	_("entry \"%s\" in directory inode %llu points to free inode %llu"),
-					fname, ip->i_ino,
-					INT_GET(dep->inumber, ARCH_CONVERT));
+					fname, ip->i_ino, inum);
 			nbad++;
 			if (!no_modify)  {
-				if (verbose ||
-				    INT_GET(dep->inumber, ARCH_CONVERT) !=
-				    old_orphanage_ino)
+				if (verbose || inum != old_orphanage_ino)
 					do_warn(
 					_(", marking entry to be junked\n"));
 				else
@@ -2130,28 +2248,6 @@ longform_dir2_entry_check_data(
 			continue;
 		}
 		/*
-		 * check for duplicate names in directory.
-		 */ 
-		if (!name_hash_add(nametab, dep->name, dep->namelen)) {
-			do_warn(
-		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
-				fname, INT_GET(dep->inumber, ARCH_CONVERT),
-				ip->i_ino);
-			nbad++;
-			if (!no_modify) {
-				if (verbose)
-					do_warn(
-					_(", marking entry to be junked\n"));
-				else
-					do_warn("\n");
-				dep->name[0] = '/';
-				libxfs_dir2_data_log_entry(tp, bp, dep);
-			} else {
-				do_warn(_(", would junk entry\n"));
-			}
-			continue;
-		}
-		/*
 		 * check easy case first, regular inode, just bump
 		 * the link count and continue
 		 */
@@ -2172,22 +2268,17 @@ longform_dir2_entry_check_data(
 			junkit = 1;
 			do_warn(
 _("entry \"%s\" in dir %llu points to an already connected directory inode %llu,\n"),
-				fname, ip->i_ino,
-				INT_GET(dep->inumber, ARCH_CONVERT));
+				fname, ip->i_ino, inum);
 		} else if (parent == ip->i_ino)  {
 			add_inode_reached(irec, ino_offset);
 			add_inode_ref(current_irec, current_ino_offset);
-			if (!is_inode_refchecked(
-				INT_GET(dep->inumber, ARCH_CONVERT), irec,
-					ino_offset))
-				push_dir(stack,
-					INT_GET(dep->inumber, ARCH_CONVERT));
+			if (!is_inode_refchecked(inum, irec, ino_offset))
+				push_dir(stack, inum);
 		} else  {
 			junkit = 1;
 			do_warn(
 _("entry \"%s\" in dir inode %llu inconsistent with .. value (%llu) in ino %llu,\n"),
-				fname, ip->i_ino, parent,
-				INT_GET(dep->inumber, ARCH_CONVERT));
+				fname, ip->i_ino, parent, inum);
 		}
 		if (junkit)  {
 			junkit = 0;
@@ -2195,9 +2286,7 @@ _("entry \"%s\" in dir inode %llu incons
 			if (!no_modify)  {
 				dep->name[0] = '/';
 				libxfs_dir2_data_log_entry(tp, bp, dep);
-				if (verbose ||
-				    INT_GET(dep->inumber, ARCH_CONVERT) !=
-				    old_orphanage_ino)
+				if (verbose || inum != old_orphanage_ino)
 					do_warn(
 					_("\twill clear entry \"%s\"\n"),
 						fname);
@@ -2212,8 +2301,6 @@ _("entry \"%s\" in dir inode %llu incons
 		libxfs_dir2_data_freescan(mp, d, &needlog, NULL);
 	if (needlog)
 		libxfs_dir2_data_log_header(tp, bp);
-	else if (!isblock && !nbad)
-		libxfs_da_brelse(tp, bp);
 	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
 	libxfs_trans_commit(tp, 0, 0);
 	freetab->ents[db].v = INT_GET(d->hdr.bestfree[0].length, ARCH_CONVERT);
@@ -2306,19 +2393,19 @@ longform_dir2_check_node(
 	xfs_fileoff_t		next_da_bno;
 	int			seeval = 0;
 	int			used;
-
+	
 	for (da_bno = mp->m_dirleafblk, next_da_bno = 0;
 	     next_da_bno != NULLFILEOFF && da_bno < mp->m_dirfreeblk;
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
+		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK)) 
 			break;
 		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
 				XFS_DATA_FORK)) {
-			do_error(
-			_("can't read block %u for directory inode %llu\n"),
+			do_warn(
+			_("can't read leaf block %u for directory inode %llu\n"),
 				da_bno, ip->i_ino);
-			/* NOTREACHED */
+			return 1;
 		}
 		leaf = bp->data;
 		if (INT_GET(leaf->hdr.info.magic, ARCH_CONVERT) !=
@@ -2348,23 +2435,24 @@ longform_dir2_check_node(
 		seeval = dir_hash_see_all(hashtab, leaf->ents, INT_GET(leaf->hdr.count, ARCH_CONVERT),
 			INT_GET(leaf->hdr.stale, ARCH_CONVERT));
 		libxfs_da_brelse(NULL, bp);
-		if (seeval != DIR_HASH_CK_OK)
+		if (seeval != DIR_HASH_CK_OK) 
 			return 1;
 	}
-	if (dir_hash_check(hashtab, ip, seeval))
+	if (dir_hash_check(hashtab, ip, seeval)) 
 		return 1;
+	
 	for (da_bno = mp->m_dirfreeblk, next_da_bno = 0;
 	     next_da_bno != NULLFILEOFF;
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
+		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK)) 
 			break;
 		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
 				XFS_DATA_FORK)) {
-			do_error(_("can't read block %u for directory inode "
-				   "%llu\n"),
+			do_warn(
+		_("can't read freespace block %u for directory inode %llu\n"),
 				da_bno, ip->i_ino);
-			/* NOTREACHED */
+			return 1;
 		}
 		free = bp->data;
 		fdb = XFS_DIR2_DA_TO_DB(mp, da_bno);
@@ -2418,388 +2506,9 @@ longform_dir2_check_node(
 }
 
 /*
- * Rebuild a directory: set up.
- * Turn it into a node-format directory with no contents in the
- * upper area.  Also has correct freespace blocks.
- */
-void
-longform_dir2_rebuild_setup(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	xfs_inode_t		*ip,
-	freetab_t		*freetab)
-{
-	xfs_da_args_t		args;
-	int			committed;
-	xfs_dir2_data_t		*data = NULL;
-	xfs_dabuf_t		*dbp;
-	int			error;
-	xfs_dir2_db_t		fbno;
-	xfs_dabuf_t		*fbp;
-	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
-	xfs_dir2_free_t		*free;
-	int			i;
-	int			j;
-	xfs_dablk_t		lblkno;
-	xfs_dabuf_t		*lbp;
-	xfs_dir2_leaf_t		*leaf;
-	int			nres;
-	xfs_trans_t		*tp;
-
-	/* read first directory block */
-	tp = libxfs_trans_alloc(mp, 0);
-	nres = XFS_DAENTER_SPACE_RES(mp, XFS_DATA_FORK);
-	error = libxfs_trans_reserve(tp,
-		nres, XFS_CREATE_LOG_RES(mp), 0, XFS_TRANS_PERM_LOG_RES,
-		XFS_CREATE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	if (libxfs_da_read_buf(tp, ip, mp->m_dirdatablk, -2, &dbp,
-			XFS_DATA_FORK)) {
-		do_error(_("can't read block %u for directory inode %llu\n"),
-			mp->m_dirdatablk, ino);
-		/* NOTREACHED */
-	}
-
-	if (dbp)
-		data = dbp->data;
-
-	/* check for block format directory */
-	if (data &&
-	    INT_GET((data)->hdr.magic, ARCH_CONVERT) == XFS_DIR2_BLOCK_MAGIC) {
-		xfs_dir2_block_t	*block;
-		xfs_dir2_leaf_entry_t	*blp;
-		xfs_dir2_block_tail_t	*btp;
-		int			needlog;
-		int			needscan;
-
-		/* convert directory block from block format to data format */
-		INT_SET(data->hdr.magic, ARCH_CONVERT, XFS_DIR2_DATA_MAGIC);
-
-		/* construct freelist */
-		block = (xfs_dir2_block_t *)data;
-		btp = XFS_DIR2_BLOCK_TAIL_P(mp, block);
-		blp = XFS_DIR2_BLOCK_LEAF_P(btp);
-		needlog = needscan = 0;
-		libxfs_dir2_data_make_free(tp, dbp, (char *)blp - (char *)block,
-			(char *)block + mp->m_dirblksize - (char *)blp,
-			&needlog, &needscan);
-		if (needscan)
-			libxfs_dir2_data_freescan(mp, data, &needlog, NULL);
-		libxfs_da_log_buf(tp, dbp, 0, mp->m_dirblksize - 1);
-	} else if (dbp) {
-		libxfs_da_brelse(tp, dbp);
-	}
-
-	/* allocate blocks for btree */
-	bzero(&args, sizeof(args));
-	args.trans = tp;
-	args.dp = ip;
-	args.whichfork = XFS_DATA_FORK;
-	args.firstblock = &firstblock;
-	args.flist = &flist;
-	args.total = nres;
-	if ((error = libxfs_da_grow_inode(&args, &lblkno)) ||
-	    (error = libxfs_da_get_buf(tp, ip, lblkno, -1, &lbp, XFS_DATA_FORK))) {
-		do_error(_("can't add btree block to directory inode %llu\n"),
-			ino);
-		/* NOTREACHED */
-	}
-	leaf = lbp->data;
-	bzero(leaf, mp->m_dirblksize);
-	INT_SET(leaf->hdr.info.magic, ARCH_CONVERT, XFS_DIR2_LEAFN_MAGIC);
-	libxfs_da_log_buf(tp, lbp, 0, mp->m_dirblksize - 1);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-
-	for (i = 0; i < freetab->nents; i += XFS_DIR2_MAX_FREE_BESTS(mp)) {
-		tp = libxfs_trans_alloc(mp, 0);
-		nres = XFS_DAENTER_SPACE_RES(mp, XFS_DATA_FORK);
-		error = libxfs_trans_reserve(tp,
-			nres, XFS_CREATE_LOG_RES(mp), 0, XFS_TRANS_PERM_LOG_RES,
-			XFS_CREATE_LOG_COUNT);
-		if (error)
-			res_failed(error);
-		libxfs_trans_ijoin(tp, ip, 0);
-		libxfs_trans_ihold(tp, ip);
-		XFS_BMAP_INIT(&flist, &firstblock);
-		bzero(&args, sizeof(args));
-		args.trans = tp;
-		args.dp = ip;
-		args.whichfork = XFS_DATA_FORK;
-		args.firstblock = &firstblock;
-		args.flist = &flist;
-		args.total = nres;
-		if ((error = libxfs_dir2_grow_inode(&args, XFS_DIR2_FREE_SPACE,
-						 &fbno)) ||
-		    (error = libxfs_da_get_buf(tp, ip, XFS_DIR2_DB_TO_DA(mp, fbno),
-					    -1, &fbp, XFS_DATA_FORK))) {
-			do_error(_("can't add free block to directory inode "
-				   "%llu\n"),
-				ino);
-			/* NOTREACHED */
-		}
-		free = fbp->data;
-		bzero(free, mp->m_dirblksize);
-		INT_SET(free->hdr.magic, ARCH_CONVERT, XFS_DIR2_FREE_MAGIC);
-		INT_SET(free->hdr.firstdb, ARCH_CONVERT, i);
-		INT_SET(free->hdr.nvalid, ARCH_CONVERT, XFS_DIR2_MAX_FREE_BESTS(mp));
-		if (i + INT_GET(free->hdr.nvalid, ARCH_CONVERT) > freetab->nents)
-			INT_SET(free->hdr.nvalid, ARCH_CONVERT, freetab->nents - i);
-		for (j = 0; j < INT_GET(free->hdr.nvalid, ARCH_CONVERT); j++) {
-			INT_SET(free->bests[j], ARCH_CONVERT, freetab->ents[i + j].v);
-			if (INT_GET(free->bests[j], ARCH_CONVERT) != NULLDATAOFF)
-				INT_MOD(free->hdr.nused, ARCH_CONVERT, +1);
-		}
-		libxfs_da_log_buf(tp, fbp, 0, mp->m_dirblksize - 1);
-		libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-		libxfs_trans_commit(tp, 0, 0);
-	}
-}
-
-/*
- * Rebuild the entries from a single data block.
- */
-void
-longform_dir2_rebuild_data(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	xfs_inode_t		*ip,
-	xfs_dablk_t		da_bno)
-{
-	xfs_dabuf_t		*bp;
-	xfs_dir2_block_tail_t	*btp;
-	int			committed;
-	xfs_dir2_data_t		*data;
-	xfs_dir2_db_t		dbno;
-	xfs_dir2_data_entry_t	*dep;
-	xfs_dir2_data_unused_t	*dup;
-	char			*endptr;
-	int			error;
-	xfs_dir2_free_t		*fblock;
-	xfs_dabuf_t		*fbp;
-	xfs_dir2_db_t		fdb;
-	int			fi;
-	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
-	int			needlog;
-	int			needscan;
-	int			nres;
-	char			*ptr;
-	xfs_trans_t		*tp;
-
-	if (libxfs_da_read_buf(NULL, ip, da_bno, da_bno == 0 ? -2 : -1, &bp,
-			XFS_DATA_FORK)) {
-		do_error(_("can't read block %u for directory inode %llu\n"),
-			da_bno, ino);
-		/* NOTREACHED */
-	}
-	if (da_bno == 0 && bp == NULL)
-		/*
-		 * The block was punched out.
-		 */
-		return;
-	ASSERT(bp);
-	dbno = XFS_DIR2_DA_TO_DB(mp, da_bno);
-	fdb = XFS_DIR2_DB_TO_FDB(mp, dbno);
-	if (libxfs_da_read_buf(NULL, ip, XFS_DIR2_DB_TO_DA(mp, fdb), -1, &fbp,
-			XFS_DATA_FORK)) {
-		do_error(_("can't read block %u for directory inode %llu\n"),
-			XFS_DIR2_DB_TO_DA(mp, fdb), ino);
-		/* NOTREACHED */
-	}
-	data = malloc(mp->m_dirblksize);
-	if (!data) {
-		do_error(
-		_("malloc failed in longform_dir2_rebuild_data (%u bytes)\n"),
-			mp->m_dirblksize);
-		exit(1);
-	}
-	bcopy(bp->data, data, mp->m_dirblksize);
-	ptr = (char *)data->u;
-	if (INT_GET(data->hdr.magic, ARCH_CONVERT) == XFS_DIR2_BLOCK_MAGIC) {
-		btp = XFS_DIR2_BLOCK_TAIL_P(mp, (xfs_dir2_block_t *)data);
-		endptr = (char *)XFS_DIR2_BLOCK_LEAF_P(btp);
-	} else
-		endptr = (char *)data + mp->m_dirblksize;
-	fblock = fbp->data;
-	fi = XFS_DIR2_DB_TO_FDINDEX(mp, dbno);
-	tp = libxfs_trans_alloc(mp, 0);
-	error = libxfs_trans_reserve(tp, 0, XFS_CREATE_LOG_RES(mp), 0,
-		XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	libxfs_da_bjoin(tp, bp);
-	libxfs_da_bhold(tp, bp);
-	libxfs_da_bjoin(tp, fbp);
-	libxfs_da_bhold(tp, fbp);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	needlog = needscan = 0;
-	bzero(((xfs_dir2_data_t *)(bp->data))->hdr.bestfree,
-		sizeof(data->hdr.bestfree));
-	libxfs_dir2_data_make_free(tp, bp, (xfs_dir2_data_aoff_t)sizeof(data->hdr),
-		mp->m_dirblksize - sizeof(data->hdr), &needlog, &needscan);
-	ASSERT(needscan == 0);
-	libxfs_dir2_data_log_header(tp, bp);
-	INT_SET(fblock->bests[fi], ARCH_CONVERT,
-		INT_GET(((xfs_dir2_data_t *)(bp->data))->hdr.bestfree[0].length, ARCH_CONVERT));
-	libxfs_dir2_free_log_bests(tp, fbp, fi, fi);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-
-	while (ptr < endptr) {
-		dup = (xfs_dir2_data_unused_t *)ptr;
-		if (INT_GET(dup->freetag, ARCH_CONVERT) == XFS_DIR2_DATA_FREE_TAG) {
-			ptr += INT_GET(dup->length, ARCH_CONVERT);
-			continue;
-		}
-		dep = (xfs_dir2_data_entry_t *)ptr;
-		ptr += XFS_DIR2_DATA_ENTSIZE(dep->namelen);
-		if (dep->name[0] == '/')
-			continue;
-		tp = libxfs_trans_alloc(mp, 0);
-		nres = XFS_CREATE_SPACE_RES(mp, dep->namelen);
-		error = libxfs_trans_reserve(tp, nres, XFS_CREATE_LOG_RES(mp), 0,
-			XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
-		if (error)
-			res_failed(error);
-		libxfs_trans_ijoin(tp, ip, 0);
-		libxfs_trans_ihold(tp, ip);
-		libxfs_da_bjoin(tp, bp);
-		libxfs_da_bhold(tp, bp);
-		libxfs_da_bjoin(tp, fbp);
-		libxfs_da_bhold(tp, fbp);
-		XFS_BMAP_INIT(&flist, &firstblock);
-		error = dir_createname(mp, tp, ip, (char *)dep->name,
-			dep->namelen, INT_GET(dep->inumber, ARCH_CONVERT),
-			&firstblock, &flist, nres);
-		ASSERT(error == 0);
-		libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-		libxfs_trans_commit(tp, 0, 0);
-	}
-	libxfs_da_brelse(NULL, bp);
-	libxfs_da_brelse(NULL, fbp);
-	free(data);
-}
-
-/*
- * Finish the rebuild of a directory.
- * Stuff / in and then remove it, this forces the directory to end
- * up in the right format.
- */
-void
-longform_dir2_rebuild_finish(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	xfs_inode_t		*ip)
-{
-	int			committed;
-	int			error;
-	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
-	int			nres;
-	xfs_trans_t		*tp;
-
-	tp = libxfs_trans_alloc(mp, 0);
-	nres = XFS_CREATE_SPACE_RES(mp, 1);
-	error = libxfs_trans_reserve(tp, nres, XFS_CREATE_LOG_RES(mp), 0,
-		XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	error = dir_createname(mp, tp, ip, "/", 1, ino,
-			&firstblock, &flist, nres);
-	ASSERT(error == 0);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-
-	/* could kill trailing empty data blocks here */
-
-	tp = libxfs_trans_alloc(mp, 0);
-	nres = XFS_REMOVE_SPACE_RES(mp);
-	error = libxfs_trans_reserve(tp, nres, XFS_REMOVE_LOG_RES(mp), 0,
-		XFS_TRANS_PERM_LOG_RES, XFS_REMOVE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	error = dir_removename(mp, tp, ip, "/", 1, ino,
-			&firstblock, &flist, nres);
-	ASSERT(error == 0);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-}
-
-/*
- * Rebuild a directory.
- * Remove all the non-data blocks.
- * Re-initialize to (empty) node form.
- * Loop over the data blocks reinserting each entry.
- * Force the directory into the right format.
- */
-void
-longform_dir2_rebuild(
-	xfs_mount_t	*mp,
-	xfs_ino_t	ino,
-	xfs_inode_t	*ip,
-	int		*num_illegal,
-	freetab_t	*freetab,
-	int		isblock)
-{
-	xfs_dabuf_t	*bp;
-	xfs_dablk_t	da_bno;
-	xfs_fileoff_t	next_da_bno;
-
-	do_warn(_("rebuilding directory inode %llu\n"), ino);
-
-	/* kill leaf blocks */
-	for (da_bno = mp->m_dirleafblk, next_da_bno = isblock ? NULLFILEOFF : 0;
-	     next_da_bno != NULLFILEOFF;
-	     da_bno = (xfs_dablk_t)next_da_bno) {
-		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
-			break;
-		if (libxfs_da_get_buf(NULL, ip, da_bno, -1, &bp, XFS_DATA_FORK)) {
-			do_error(_("can't get block %u for directory inode "
-				   "%llu\n"),
-				da_bno, ino);
-			/* NOTREACHED */
-		}
-		dir2_kill_block(mp, ip, da_bno, bp);
-	}
-
-	/* rebuild empty btree and freelist */
-	longform_dir2_rebuild_setup(mp, ino, ip, freetab);
-
-	/* rebuild directory */
-	for (da_bno = mp->m_dirdatablk, next_da_bno = 0;
-	     da_bno < mp->m_dirleafblk && next_da_bno != NULLFILEOFF;
-	     da_bno = (xfs_dablk_t)next_da_bno) {
-		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
-			break;
-		longform_dir2_rebuild_data(mp, ino, ip, da_bno);
-	}
-
-	/* put the directory in the appropriate on-disk format */
-	longform_dir2_rebuild_finish(mp, ino, ip);
-	*num_illegal = 0;
-}
-
-/*
- * succeeds or dies, inode never gets dirtied since all changes
- * happen in file blocks.  the inode size and other core info
- * is already correct, it's just the leaf entries that get altered.
- * XXX above comment is wrong for v2 - need to see why it matters
+ * If a directory is corrupt, we need to read in as many entries as possible,
+ * destroy the entry and create a new one with recovered name/inode pairs.
+ * (ie. get libxfs to do all the grunt work)
  */
 void
 longform_dir2_entry_check(xfs_mount_t	*mp,
@@ -2810,15 +2519,14 @@ longform_dir2_entry_check(xfs_mount_t	*m
 			dir_stack_t	*stack,
 			ino_tree_node_t	*irec,
 			int		ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_dir2_block_t	*block;
 	xfs_dir2_leaf_entry_t	*blp;
-	xfs_dabuf_t		*bp;
+	xfs_dabuf_t		**bplist;
 	xfs_dir2_block_tail_t	*btp;
 	xfs_dablk_t		da_bno;
 	freetab_t		*freetab;
-	dir_hash_tab_t		*hashtab;
 	int			i;
 	int			isblock;
 	int			isleaf;
@@ -2840,6 +2548,7 @@ longform_dir2_entry_check(xfs_mount_t	*m
 		freetab->ents[i].v = NULLDATAOFF;
 		freetab->ents[i].s = 0;
 	}
+	bplist = calloc(freetab->naents, sizeof(xfs_dabuf_t*));
 	/* is this a block, leaf, or node directory? */
 	libxfs_dir2_isblock(NULL, ip, &isblock);
 	libxfs_dir2_isleaf(NULL, ip, &isleaf);
@@ -2847,50 +2556,58 @@ longform_dir2_entry_check(xfs_mount_t	*m
 	if (do_prefetch && !isblock)
 		prefetch_p6_dir2(mp, ip);
 
-	/* check directory data */
-	hashtab = dir_hash_init(ip->i_d.di_size);
+	/* check directory "data" blocks (ie. name/inode pairs) */
 	for (da_bno = 0, next_da_bno = 0;
 	     next_da_bno != NULLFILEOFF && da_bno < mp->m_dirleafblk;
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
+		ASSERT(XFS_DIR2_DA_TO_DB(mp, da_bno) < freetab->naents);
 		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
 			break;
-		if (libxfs_da_read_bufr(NULL, ip, da_bno,
-				da_bno == 0 ? -2 : -1, &bp, XFS_DATA_FORK)) {
-			do_error(_("can't read block %u for directory inode "
-				   "%llu\n"),
+		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, 
+				&bplist[XFS_DIR2_DA_TO_DB(mp, da_bno)], 
+				XFS_DATA_FORK)) {
+			do_warn(_(
+			"can't read data block %u for directory inode %llu\n"),
 				da_bno, ino);
-			/* NOTREACHED */
+			*num_illegal++;
+			continue;	/* try and read all "data" blocks */
 		}
-		/* is there a hole at the start? */
-		if (da_bno == 0 && bp == NULL)
-			continue;
 		longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot,
-			stack, irec, ino_offset, &bp, hashtab, &freetab, 
-			nametab, da_bno, isblock);
-		/* it releases the buffer unless isblock is set */
+				stack, irec, ino_offset, 
+				&bplist[XFS_DIR2_DA_TO_DB(mp, da_bno)], hashtab,  
+				&freetab, da_bno, isblock);
 	}
 	fixit = (*num_illegal != 0) || dir2_is_badino(ino);
 
 	/* check btree and freespace */
 	if (isblock) {
-		ASSERT(bp);
-		block = bp->data;
+		block = bplist[0]->data;
 		btp = XFS_DIR2_BLOCK_TAIL_P(mp, block);
 		blp = XFS_DIR2_BLOCK_LEAF_P(btp);
-		seeval = dir_hash_see_all(hashtab, blp, INT_GET(btp->count, ARCH_CONVERT), INT_GET(btp->stale,
ARCH_CONVERT));
+		seeval = dir_hash_see_all(hashtab, blp, 
+				INT_GET(btp->count, ARCH_CONVERT), 
+				INT_GET(btp->stale, ARCH_CONVERT));
 		if (dir_hash_check(hashtab, ip, seeval))
 			fixit |= 1;
-		libxfs_da_brelse(NULL, bp);
 	} else if (isleaf) {
 		fixit |= longform_dir2_check_leaf(mp, ip, hashtab, freetab);
 	} else {
 		fixit |= longform_dir2_check_node(mp, ip, hashtab, freetab);
 	}
-	dir_hash_done(hashtab);
-	if (!no_modify && fixit)
-		longform_dir2_rebuild(mp, ino, ip, num_illegal, freetab,
-			isblock);
+	if (!no_modify && fixit) {
+		dir_hash_dup_names(hashtab);
+		for (i = 0; i < freetab->naents; i++) 
+			if (bplist[i])
+				libxfs_da_brelse(NULL, bplist[i]);
+		longform_dir2_rebuild(mp, ino, ip, hashtab);
+		*num_illegal = 0;
+	} else {
+		for (i = 0; i < freetab->naents; i++) 
+			if (bplist[i])
+				libxfs_da_brelse(NULL, bplist[i]);
+	}
+	
 	free(freetab);
 }
 
@@ -2906,7 +2623,7 @@ shortform_dir_entry_check(xfs_mount_t	*m
 			dir_stack_t	*stack,
 			ino_tree_node_t	*current_irec,
 			int		current_ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_ino_t		lino;
 	xfs_ino_t		parent;
@@ -3044,7 +2761,7 @@ _("entry \"%s\" in shortform dir %llu re
 		ASSERT(irec != NULL);
 
 		ino_offset = XFS_INO_TO_AGINO(mp, lino) - irec->ino_startnum;
-
+		
 		/*
 		 * if it's a free inode, blow out the entry.
 		 * by now, any inode that we think is free
@@ -3066,8 +2783,9 @@ _("entry \"%s\" in shortform dir inode %
 				do_warn(_("would junk entry \"%s\"\n"),
 					fname);
 			}
-		} else if (!name_hash_add(nametab, sf_entry->name, 
-					sf_entry->namelen)) {
+		} else if (!dir_hash_add(hashtab, 
+				(xfs_dir2_dataptr_t)(sf_entry - &sf->list[0]),
+				lino, sf_entry->namelen, sf_entry->name)) {
 			/*
 			 * check for duplicate names in directory.
 			 */ 
@@ -3311,7 +3029,7 @@ shortform_dir2_entry_check(xfs_mount_t	*
 			dir_stack_t	*stack,
 			ino_tree_node_t	*current_irec,
 			int		current_ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_ino_t		lino;
 	xfs_ino_t		parent;
@@ -3484,7 +3202,9 @@ shortform_dir2_entry_check(xfs_mount_t	*
 				do_warn(_("would junk entry \"%s\"\n"),
 					fname);
 			}
-		} else if (!name_hash_add(nametab, sfep->name, sfep->namelen)) {
+		} else if (!dir_hash_add(hashtab, (xfs_dir2_dataptr_t)
+					(sfep - XFS_DIR2_SF_FIRSTENTRY(sfp)),
+				lino, sfep->namelen, sfep->name)) {
 			/*
 			 * check for duplicate names in directory.
 			 */ 
@@ -3650,7 +3370,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
 	xfs_trans_t		*tp;
 	xfs_dahash_t		hashval;
 	ino_tree_node_t		*irec;
-	name_hash_tab_t		*nametab;
+	dir_hash_tab_t		*hashtab;
 	int			ino_offset, need_dot, committed;
 	int			dirty, num_illegal, error, nres;
 
@@ -3731,7 +3451,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
 
 		add_inode_refchecked(ino, irec, ino_offset);
 
-		nametab = name_hash_init(ip->i_d.di_size);
+		hashtab = dir_hash_init(ip->i_d.di_size);
 
 		/*
 		 * look for bogus entries
@@ -3750,13 +3470,13 @@ process_dirstack(xfs_mount_t *mp, dir_st
 							&num_illegal, &need_dot,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 			else
 				longform_dir_entry_check(mp, ino, ip,
 							&num_illegal, &need_dot,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 			break;
 		case XFS_DINODE_FMT_LOCAL:
 			tp = libxfs_trans_alloc(mp, 0);
@@ -3781,12 +3501,12 @@ process_dirstack(xfs_mount_t *mp, dir_st
 				shortform_dir2_entry_check(mp, ino, ip, &dirty,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 			else
 				shortform_dir_entry_check(mp, ino, ip, &dirty,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 
 			ASSERT(dirty == 0 || (dirty && !no_modify));
 			if (dirty)  {
@@ -3801,7 +3521,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
 		default:
 			break;
 		}
-		name_hash_done(nametab);
+		dir_hash_done(hashtab);
 
 		hashval = 0;
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Review: xfs_repair fixes for dir2 corruption
  2006-07-28  1:58         ` Review: xfs_repair fixes for dir2 corruption Barry Naujok
@ 2006-07-28  8:10           ` Nathan Scott
  2006-07-28 14:45             ` Madan Valluri
  2006-07-30  5:19           ` christian
  2006-08-01 21:50           ` Adam Sjøgren
  2 siblings, 1 reply; 12+ messages in thread
From: Nathan Scott @ 2006-07-28  8:10 UTC (permalink / raw)
  To: Barry Naujok, mvalluri; +Cc: xfs

On Fri, Jul 28, 2006 at 11:58:52AM +1000, Barry Naujok wrote:
> This patch addresses the following xfs_repair issues:

The libxfs cache stuff looks good to me.  Maybe Madan can cast
an eye over the repair changes for ya?

cheers.

>   - Can rebuild most corrupted directories. Some will be 
>     beyond repair and the contents of those will end up 
>     in lost+found.
>   - Fixed potential problems with the duplicate name 
>     checking by properly referencing the buffers where
>     the names are stored.
>   - Unified the two hash lists used in directory checks.
>   - Fixed a case where incorrectly reference counted and
>     dirty buffers were never written to disk (most 
>     common observation of this is an inaccessible 
>     lost+found directory after a repair).
> 
> For those that are keen to fix the 16777216 problem, give this patch a go.
> 
> 
> 
> 
> ===========================================================================
> xfsprogs/include/cache.h
> ===========================================================================
> 
> --- a/xfsprogs/include/cache.h	2006-07-28 11:46:09.000000000 +1000
> +++ b/xfsprogs/include/cache.h	2006-07-27 16:17:47.804986322 +1000
> @@ -30,6 +30,7 @@ struct cache_node;
>  typedef void *cache_key_t;
>  typedef void (*cache_walk_t)(struct cache_node *);
>  typedef struct cache_node * (*cache_node_alloc_t)(void);
> +typedef void (*cache_node_flush_t)(struct cache_node *);
>  typedef void (*cache_node_relse_t)(struct cache_node *);
>  typedef unsigned int (*cache_node_hash_t)(cache_key_t, unsigned int);
>  typedef int (*cache_node_compare_t)(struct cache_node *, cache_key_t);
> @@ -38,6 +39,7 @@ typedef unsigned int (*cache_bulk_relse_
>  struct cache_operations {
>  	cache_node_hash_t	hash;
>  	cache_node_alloc_t	alloc;
> +	cache_node_flush_t	flush;
>  	cache_node_relse_t	relse;
>  	cache_node_compare_t	compare;
>  	cache_bulk_relse_t	bulkrelse;	/* optional */
> @@ -49,6 +51,7 @@ struct cache {
>  	pthread_mutex_t		c_mutex;	/* node count mutex */
>  	cache_node_hash_t	hash;		/* node hash function */
>  	cache_node_alloc_t	alloc;		/* allocation function */
> +	cache_node_flush_t	flush;		/* flush dirty data function */
>  	cache_node_relse_t	relse;		/* memory free function */
>  	cache_node_compare_t	compare;	/* comparison routine */
>  	cache_bulk_relse_t	bulkrelse;	/* bulk release routine */
> @@ -75,6 +78,7 @@ struct cache *cache_init(unsigned int, s
>  void cache_destroy(struct cache *);
>  void cache_walk(struct cache *, cache_walk_t);
>  void cache_purge(struct cache *);
> +void cache_flush(struct cache *);
>  
>  int cache_node_get(struct cache *, cache_key_t, struct cache_node **);
>  void cache_node_put(struct cache_node *);
> 
> ===========================================================================
> xfsprogs/include/libxfs.h
> ===========================================================================
> 
> --- a/xfsprogs/include/libxfs.h	2006-07-28 11:46:09.000000000 +1000
> +++ b/xfsprogs/include/libxfs.h	2006-07-27 16:34:45.308344014 +1000
> @@ -257,6 +257,7 @@ extern int	libxfs_writebuf_int (xfs_buf_
>  extern struct cache	*libxfs_bcache;
>  extern struct cache_operations	libxfs_bcache_operations;
>  extern void	libxfs_bcache_purge (void);
> +extern void	libxfs_bcache_flush (void);
>  extern xfs_buf_t	*libxfs_getbuf (dev_t, xfs_daddr_t, int);
>  extern void	libxfs_putbuf (xfs_buf_t *);
>  extern void	libxfs_purgebuf (xfs_buf_t *);
> @@ -467,6 +468,8 @@ extern int	libxfs_bmap_finish (xfs_trans
>  				xfs_fsblock_t, int *);
>  extern int	libxfs_bmap_next_offset (xfs_trans_t *, xfs_inode_t *,
>  				xfs_fileoff_t *, int);
> +extern int	libxfs_bmap_last_offset(xfs_trans_t *, xfs_inode_t *, 
> +				xfs_fileoff_t *, int);
>  extern int	libxfs_bunmapi (xfs_trans_t *, xfs_inode_t *, xfs_fileoff_t,
>  				xfs_filblks_t, int, xfs_extnum_t,
>  				xfs_fsblock_t *, xfs_bmap_free_t *, int *);
> 
> ===========================================================================
> xfsprogs/libxfs/cache.c
> ===========================================================================
> 
> --- a/xfsprogs/libxfs/cache.c	2006-07-28 11:46:09.000000000 +1000
> +++ b/xfsprogs/libxfs/cache.c	2006-07-27 17:42:43.812685388 +1000
> @@ -60,6 +60,7 @@ cache_init(
>  	cache->c_hashsize = hashsize;
>  	cache->hash = cache_operations->hash;
>  	cache->alloc = cache_operations->alloc;
> +	cache->flush = cache_operations->flush;
>  	cache->relse = cache_operations->relse;
>  	cache->compare = cache_operations->compare;
>  	cache->bulkrelse = cache_operations->bulkrelse ?
> @@ -422,6 +423,39 @@ cache_purge(
>  		cache_abort();
>  	}
>  #endif
> +	/* flush any remaining nodes to disk */
> +	cache_flush(cache);
> +}
> +
> +/*
> + * Flush all nodes in the cache to disk. 
> + */
> +void
> +cache_flush(
> +	struct cache *		cache)
> +{
> +	struct cache_hash *	hash;
> +	struct list_head *	head;
> +	struct list_head *	pos;
> +	struct cache_node *	node;
> +	int			i;
> +	
> +	if (!cache->flush)
> +		return;
> +	
> +	for (i = 0; i < cache->c_hashsize; i++) {
> +		hash = &cache->c_hash[i];
> +		
> +		pthread_mutex_lock(&hash->ch_mutex);
> +		head = &hash->ch_list;
> +		for (pos = head->next; pos != head; pos = pos->next) {
> +			node = (struct cache_node *)pos;
> +			pthread_mutex_lock(&node->cn_mutex);
> +			cache->flush(node);
> +			pthread_mutex_unlock(&node->cn_mutex);
> +		}
> +		pthread_mutex_unlock(&hash->ch_mutex);
> +	}
>  }
>  
>  #define	HASH_REPORT	(3*HASH_CACHE_RATIO)
> 
> ===========================================================================
> xfsprogs/libxfs/rdwr.c
> ===========================================================================
> 
> --- a/xfsprogs/libxfs/rdwr.c	2006-07-28 11:46:09.000000000 +1000
> +++ b/xfsprogs/libxfs/rdwr.c	2006-07-27 16:40:56.612373938 +1000
> @@ -416,6 +416,15 @@ libxfs_iomove(xfs_buf_t *bp, uint boff, 
>  }
>  
>  static void
> +libxfs_bflush(struct cache_node *node)
> +{
> +	xfs_buf_t		*bp = (xfs_buf_t *)node;
> +
> +	if ((bp != NULL) && (bp->b_flags & LIBXFS_B_DIRTY))
> +		libxfs_writebufr(bp);
> +}
> +
> +static void
>  libxfs_brelse(struct cache_node *node)
>  {
>  	xfs_buf_t		*bp = (xfs_buf_t *)node;
> @@ -442,9 +451,16 @@ libxfs_bcache_purge(void)
>  	cache_purge(libxfs_bcache);
>  }
>  
> +void 
> +libxfs_bcache_flush(void)
> +{
> +	cache_flush(libxfs_bcache);
> +}
> +
>  struct cache_operations libxfs_bcache_operations = {
>  	/* .hash */	libxfs_bhash,
>  	/* .alloc */	libxfs_balloc,
> +	/* .flush */	libxfs_bflush,
>  	/* .relse */	libxfs_brelse,
>  	/* .compare */	libxfs_bcompare,
>  	/* .bulkrelse */ NULL	/* TODO: lio_listio64 interface? */
> @@ -649,6 +665,7 @@ libxfs_icache_purge(void)
>  struct cache_operations libxfs_icache_operations = {
>  	/* .hash */	libxfs_ihash,
>  	/* .alloc */	libxfs_ialloc,
> +	/* .flush */	NULL,
>  	/* .relse */	libxfs_irelse,
>  	/* .compare */	libxfs_icompare,
>  	/* .bulkrelse */ NULL
> 
> ===========================================================================
> xfsprogs/libxfs/xfs.h
> ===========================================================================
> 
> --- a/xfsprogs/libxfs/xfs.h	2006-07-28 11:46:09.000000000 +1000
> +++ b/xfsprogs/libxfs/xfs.h	2006-07-25 12:05:31.917896892 +1000
> @@ -98,6 +98,7 @@
>  #define xfs_bmapi_single		libxfs_bmapi_single
>  #define xfs_bmap_finish			libxfs_bmap_finish
>  #define xfs_bmap_del_free		libxfs_bmap_del_free
> +#define xfs_bmap_last_offset		libxfs_bmap_last_offset
>  #define xfs_bunmapi			libxfs_bunmapi
>  #define xfs_free_extent			libxfs_free_extent
>  #define xfs_rtfree_extent		libxfs_rtfree_extent
> 
> ===========================================================================
> xfsprogs/repair/phase6.c
> ===========================================================================
> 
> --- a/xfsprogs/repair/phase6.c	2006-07-28 11:46:09.000000000 +1000
> +++ b/xfsprogs/repair/phase6.c	2006-07-28 11:30:50.033905530 +1000
> @@ -36,43 +36,35 @@ static int orphanage_entered;
>  
>  /*
>   * Data structures and routines to keep track of directory entries
> - * and whether their leaf entry has been seen
> + * and whether their leaf entry has been seen. Also used for name
> + * duplicate checking and rebuilding step if required.
>   */
>  typedef struct dir_hash_ent {
> -	struct dir_hash_ent	*next;	/* pointer to next entry */
> +	struct dir_hash_ent	*nextbyaddr;/* pointer to next entry */
> +	struct dir_hash_ent	*nextbyhash;
> +	struct dir_hash_ent	*nextbyorder;
>  	xfs_dir2_leaf_entry_t	ent;	/* address and hash value */
> +	xfs_ino_t 		inum;	/* inode of name */
>  	short			junkit;	/* name starts with / */
>  	short			seen;	/* have seen leaf entry */
> +	int	  	    	namelen;/* length of name */
> +	uchar_t    	    	*name;	/* pointer to name (no NULL) */
>  } dir_hash_ent_t;
>  
>  typedef struct dir_hash_tab {
>  	int			size;	/* size of hash table */
> -	dir_hash_ent_t		*tab[1];/* actual hash table, variable size */
> +	int			names_duped;
> +	dir_hash_ent_t		*first;
> +	dir_hash_ent_t		*last;
> +	dir_hash_ent_t		**byhash;/* actual hash table, variable size */
> +	dir_hash_ent_t		**byaddr;/* actual hash table, variable size */
>  } dir_hash_tab_t;
> +
>  #define	DIR_HASH_TAB_SIZE(n)	\
> -	(offsetof(dir_hash_tab_t, tab) + (sizeof(dir_hash_ent_t *) * (n)))
> +	(sizeof(dir_hash_tab_t) + (sizeof(dir_hash_ent_t *) * (n) * 2))
>  #define	DIR_HASH_FUNC(t,a)	((a) % (t)->size)
>  
>  /*
> - * Track names to check for duplicates in a directory.
> - */
> -
> -typedef struct name_hash_ent {
> -	struct name_hash_ent	*next;	/* pointer to next entry */
> -	xfs_dahash_t		hashval;/* hash value of name */
> -	int	  	    	namelen;/* length of name */
> -	uchar_t    	    	*name;	/* pointer to name (no NULL) */
> -} name_hash_ent_t;		
> -
> -typedef struct name_hash_tab {
> -	int			size;	/* size of hash table */
> -	name_hash_ent_t		*tab[1];/* actual hash table, variable size */
> -} name_hash_tab_t;
> -#define	NAME_HASH_TAB_SIZE(n)	\
> -	(offsetof(name_hash_tab_t, tab) + (sizeof(name_hash_ent_t *) * (n)))
> -#define	NAME_HASH_FUNC(t,a)	((a) % (t)->size)
> -
> -/*
>   * Track the contents of the freespace table in a directory.
>   */
>  typedef struct freetab {
> @@ -94,28 +86,75 @@ typedef struct freetab {
>  #define	DIR_HASH_CK_BADSTALE	5
>  #define	DIR_HASH_CK_TOTAL	6
>  
> -static void
> +/*
> + * Returns 0 if the name already exists (ie. a duplicate)
> + */
> +static int
>  dir_hash_add(
>  	dir_hash_tab_t		*hashtab,
> -	xfs_dahash_t		hash,
> -	xfs_dir2_dataptr_t	addr,
> -	int			junk)
> -{
> -	int			i;
> +	xfs_dir2_dataptr_t	addr,	
> +	xfs_ino_t		inum,
> +	int			namelen,
> +	uchar_t			*name)
> +{
> +	xfs_dahash_t		hash = 0;
> +	int			byaddr;
> +	int			byhash = 0;
>  	dir_hash_ent_t		*p;
> -
> -	i = DIR_HASH_FUNC(hashtab, addr);
> +	int			dup;
> +	short			junk;
> +	
> +	junk = name[0] == '/';
> +	byaddr = DIR_HASH_FUNC(hashtab, addr);
> +	dup = 0;
> +
> +	if (!junk) {
> +		hash = libxfs_da_hashname(name, namelen);
> +		byhash = DIR_HASH_FUNC(hashtab, hash);
> +
> +		/* 
> +		 * search hash bucket for existing name.
> +		 */
> +		for (p = hashtab->byhash[byhash]; p; p = p->nextbyhash) {
> +			if (p->ent.hashval == hash && p->namelen == namelen) {
> +				if (memcmp(p->name, name, namelen) == 0) {
> +					dup = 1;
> +					break;
> +				}
> +			}
> +		}
> +	}
> +	
>  	if ((p = malloc(sizeof(*p))) == NULL)
>  		do_error(_("malloc failed in dir_hash_add (%u bytes)\n"),
>  			sizeof(*p));
> -	p->next = hashtab->tab[i];
> -	hashtab->tab[i] = p;
> -	if (!(p->junkit = junk))
> +	
> +	p->nextbyaddr = hashtab->byaddr[byaddr];
> +	hashtab->byaddr[byaddr] = p;
> +	if (hashtab->last) 
> +		hashtab->last->nextbyorder = p;
> +	else
> +		hashtab->first = p;
> +	p->nextbyorder = NULL;
> +	hashtab->last = p;
> +	
> +	if (!(p->junkit = junk)) {
>  		p->ent.hashval = hash;
> +		p->nextbyhash = hashtab->byhash[byhash];
> +		hashtab->byhash[byhash] = p;
> +	}
>  	p->ent.address = addr;
> +	p->inum = inum;
>  	p->seen = 0;
> +	p->namelen = namelen;
> +	p->name = name;
> +	
> +	return !dup;
>  }
>  
> +/*
> + * checks to see if any data entries are not in the leaf blocks 
> + */
>  static int
>  dir_hash_unseen(
>  	dir_hash_tab_t	*hashtab)
> @@ -124,7 +163,7 @@ dir_hash_unseen(
>  	dir_hash_ent_t	*p;
>  
>  	for (i = 0; i < hashtab->size; i++) {
> -		for (p = hashtab->tab[i]; p; p = p->next) {
> +		for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
>  			if (p->seen == 0)
>  				return 1;
>  		}
> @@ -173,8 +212,10 @@ dir_hash_done(
>  	dir_hash_ent_t	*p;
>  
>  	for (i = 0; i < hashtab->size; i++) {
> -		for (p = hashtab->tab[i]; p; p = n) {
> -			n = p->next;
> +		for (p = hashtab->byaddr[i]; p; p = n) {
> +			n = p->nextbyaddr;
> +			if (hashtab->names_duped)
> +				free(p->name);
>  			free(p);
>  		}
>  	}
> @@ -196,6 +237,10 @@ dir_hash_init(
>  	if ((hashtab = calloc(DIR_HASH_TAB_SIZE(hsize), 1)) == NULL)
>  		do_error(_("calloc failed in dir_hash_init\n"));
>  	hashtab->size = hsize;
> +	hashtab->byhash = (dir_hash_ent_t**)((char *)hashtab + 
> +		sizeof(dir_hash_tab_t));
> +	hashtab->byaddr = (dir_hash_ent_t**)((char *)hashtab + 
> +		sizeof(dir_hash_tab_t) + sizeof(dir_hash_ent_t*) * hsize);
>  	return hashtab;
>  }
>  
> @@ -209,7 +254,7 @@ dir_hash_see(
>  	dir_hash_ent_t		*p;
>  
>  	i = DIR_HASH_FUNC(hashtab, addr);
> -	for (p = hashtab->tab[i]; p; p = p->next) {
> +	for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
>  		if (p->ent.address != addr)
>  			continue;
>  		if (p->seen)
> @@ -222,6 +267,10 @@ dir_hash_see(
>  	return DIR_HASH_CK_NODATA;
>  }
>  
> +/*
> + * checks to make sure leafs match a data entry, and that the stale
> + * count is valid.
> + */
>  static int
>  dir_hash_see_all(
>  	dir_hash_tab_t		*hashtab,
> @@ -246,81 +295,25 @@ dir_hash_see_all(
>  }
>  
>  /*
> - * Returns 0 if the name already exists (ie. a duplicate)
> + * Convert name pointers into locally allocated memory
>   */
> -static int
> -name_hash_add(
> -	name_hash_tab_t		*nametab,
> -	uchar_t			*name,
> -	int			namelen)
> +static void
> +dir_hash_dup_names(dir_hash_tab_t *hashtab)
>  {
> -	xfs_dahash_t		hash;
> -	int			i;
> -	name_hash_ent_t		*p;
> -
> -	hash = libxfs_da_hashname(name, namelen);
> -			
> -	i = NAME_HASH_FUNC(nametab, hash);
> -	
> -	/* 
> -	 * search hash bucket for existing name.
> -	 */
> -	for (p = nametab->tab[i]; p; p = p->next) {
> -		if (p->hashval == hash && p->namelen == namelen) {
> -			if (memcmp(p->name, name, namelen) == 0) 
> -				return 0; /* exists */
> -		}
> -	}
> -	
> -	if ((p = malloc(sizeof(*p))) == NULL)
> -		do_error(_("malloc failed in name_hash_add (%u bytes)\n"),
> -			sizeof(*p));
> +	uchar_t			*name;
> +	dir_hash_ent_t		*p;
>  	
> -	p->next = nametab->tab[i];
> -	p->hashval = hash;
> -	p->name = name;
> -	p->namelen = namelen;
> -	nametab->tab[i] = p;
> +	if (hashtab->names_duped)
> +		return;
>  	
> -	return 1;	/* success, no duplicate */
> -}
> -
> -static name_hash_tab_t *
> -name_hash_init(
> -	xfs_fsize_t	size)
> -{
> -	name_hash_tab_t	*nametab;
> -	int		hsize;
> -
> -	hsize = size / (16 * 4);
> -	if (hsize > 1024)
> -		hsize = 1024;
> -	else if (hsize < 16)
> -		hsize = 16;
> -	if ((nametab = calloc(NAME_HASH_TAB_SIZE(hsize), 1)) == NULL)
> -		do_error(_("calloc failed in name_hash_init\n"));
> -	nametab->size = hsize;
> -	return nametab;
> -}
> -
> -static void
> -name_hash_done(
> -	name_hash_tab_t	*nametab)
> -{
> -	int		i;
> -	name_hash_ent_t	*n;
> -	name_hash_ent_t	*p;
> -
> -	for (i = 0; i < nametab->size; i++) {
> -		for (p = nametab->tab[i]; p; p = n) {
> -			n = p->next;
> -			free(p);
> -		}
> +	for (p = hashtab->first; p; p = p->nextbyorder) {
> +		name = malloc(p->namelen);
> +		memcpy(name, p->name, p->namelen);
> +		p->name = name;
>  	}
> -	free(nametab);
> +	hashtab->names_duped = 1;
>  }
>  
> -
>  /*
>   * Version 1 or 2 directory routine wrappers
>  */
> @@ -1385,7 +1378,8 @@ lf_block_dir_entry_check(xfs_mount_t		*m
>  			dir_stack_t		*stack,
>  			ino_tree_node_t		*current_irec,
>  			int			current_ino_offset,
> -			name_hash_tab_t		*nametab)
> +			dir_hash_tab_t		*hashtab,
> +			xfs_dablk_t		da_bno)
>  {
>  	xfs_dir_leaf_entry_t	*entry;
>  	ino_tree_node_t		*irec;
> @@ -1545,7 +1539,9 @@ lf_block_dir_entry_check(xfs_mount_t		*m
>  		/*
>  		 * check for duplicate names in directory.
>  		 */ 
> -		if (!name_hash_add(nametab, namest->name, entry->namelen)) {
> +		if (!dir_hash_add(hashtab, (da_bno << mp->m_sb.sb_blocklog) + 
> +						entry->nameidx, 
> +				lino, entry->namelen, namest->name)) {
>  			do_warn(
>  		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
>  				fname, lino, ino);
> @@ -1635,7 +1631,7 @@ longform_dir_entry_check(xfs_mount_t	*mp
>  			dir_stack_t	*stack,
>  			ino_tree_node_t	*irec,
>  			int		ino_offset,
> -			name_hash_tab_t	*nametab)
> +			dir_hash_tab_t	*hashtab)
>  {
>  	xfs_dir_leafblock_t	*leaf;
>  	xfs_buf_t		*bp;
> @@ -1677,8 +1673,6 @@ longform_dir_entry_check(xfs_mount_t	*mp
>  
>  		leaf = (xfs_dir_leafblock_t *)XFS_BUF_PTR(bp);
>  
> -		da_bno = INT_GET(leaf->hdr.info.forw, ARCH_CONVERT);
> -
>  		if (INT_GET(leaf->hdr.info.magic, ARCH_CONVERT) !=
>  		    XFS_DIR_LEAF_MAGIC)  {
>  			if (!no_modify)  {
> @@ -1699,9 +1693,11 @@ _("bad magic # (0x%x) for dir ino %llu l
>  		}
>  
>  		if (!skipit)
> -			lf_block_dir_entry_check(mp, ino, leaf, &dirty,
> -						num_illegal, need_dot, stack,
> -						irec, ino_offset, nametab);
> +			lf_block_dir_entry_check(mp, ino, leaf, &dirty, 
> +					num_illegal, need_dot, stack, irec, 
> +					ino_offset, hashtab, da_bno);
> +
> +		da_bno = INT_GET(leaf->hdr.info.forw, ARCH_CONVERT);
>  
>  		ASSERT(dirty == 0 || (dirty && !no_modify));
>  
> @@ -1745,6 +1741,127 @@ _("can't map leaf block %d in dir %llu, 
>  }
>  
>  /*
> + * Unexpected failure during the rebuild will leave the entries in
> + * lost+found on the next run
> + */
> +
> +static void 
> +longform_dir2_rebuild(
> +	xfs_mount_t	*mp,
> +	xfs_ino_t	ino,
> +	xfs_inode_t	*ip,
> +	dir_hash_tab_t	*hashtab)
> +{
> +	int			error;
> +	int			nres;
> +	xfs_trans_t		*tp;
> +	xfs_fileoff_t		lastblock;
> +	xfs_fsblock_t		firstblock;
> +	xfs_bmap_free_t		flist;
> +	xfs_ino_t		parentino;
> +	xfs_inode_t		*pip;
> +	int			byhash;
> +	dir_hash_ent_t		*p;
> +	int			committed;
> +	int			done;
> +	
> +	/* 
> +	 * trash directory completely and rebuild from scratch using the
> +	 * name/inode pairs in the hash table
> +	 */
> +	 
> +	do_warn(_("rebuilding directory inode %llu\n"), ino);
> +	
> +	/* 
> +	 * first attempt to locate the parent inode, if it can't be found,
> +	 * we'll use the lost+found inode 
> +	 */
> +	byhash = DIR_HASH_FUNC(hashtab, libxfs_da_hashname((uchar_t*)"..", 2));
> +	parentino = orphanage_ino;
> +	for (p = hashtab->byhash[byhash]; p; p = p->nextbyhash) {
> +		if (p->namelen == 2 && p->name[0] == '.' && p->name[1] == '.') {
> +			parentino = p->inum;
> +			break;
> +		}
> +	}
> +
> +	XFS_BMAP_INIT(&flist, &firstblock);
> +		
> +	tp = libxfs_trans_alloc(mp, 0);
> +	nres = XFS_REMOVE_SPACE_RES(mp);
> +	error = libxfs_trans_reserve(tp, nres, XFS_REMOVE_LOG_RES(mp), 0,
> +			XFS_TRANS_PERM_LOG_RES, XFS_REMOVE_LOG_COUNT);
> +	if (error)
> +		res_failed(error);
> +	libxfs_trans_ijoin(tp, ip, 0);
> +	libxfs_trans_ihold(tp, ip);
> +	
> +	if ((error = libxfs_bmap_last_offset(tp, ip, &lastblock, 
> +						XFS_DATA_FORK)))
> +		do_error(_("xfs_bmap_last_offset failed -- error - %d\n"), 
> +			error);
> +	
> +	/* re-init the directory to shortform */
> +	if ((error = libxfs_trans_iget(mp, tp, parentino, 0, 0, &pip)))
> +		do_error(
> +		_("couldn't iget parent inode %llu -- error - %d\n"),
> +			parentino, error);
> +
> +	/* free all data, leaf, node and freespace blocks */
> +	
> +	if ((error = libxfs_bunmapi(tp, ip, 0, lastblock, 
> +			XFS_BMAPI_METADATA, 0, &firstblock, &flist,
> +			&done))) 
> +		do_error(_("xfs_bunmapi failed -- error - %d\n"), error);
> +	ASSERT(done);
> +
> +	libxfs_dir2_init(tp, ip, pip);
> +	
> +	error = libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
> +				
> +	libxfs_trans_commit(tp, 
> +			XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_SYNC, 0);
> +		
> +	/* go through the hash list and re-add the inodes */
> +
> +	for (p = hashtab->first; p; p = p->nextbyorder) {
> +		
> +		if (p->name[0] == '/' || (p->name[0] == '.' && (p->namelen == 1 
> +				|| (p->namelen == 2 && p->name[1] == '.'))))
> +			continue;
> +		
> +		tp = libxfs_trans_alloc(mp, 0);
> +		nres = XFS_CREATE_SPACE_RES(mp, p->namelen);
> +		if ((error = libxfs_trans_reserve(tp, nres, 
> +				XFS_CREATE_LOG_RES(mp), 0,
> +				XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT)))
> +			do_error(
> +	_("space reservation failed (%d), filesystem may be out of space\n"),
> +				error);
> +
> +		libxfs_trans_ijoin(tp, ip, 0);
> +		libxfs_trans_ihold(tp, ip);
> +
> +		XFS_BMAP_INIT(&flist, &firstblock);
> +		if ((error = libxfs_dir2_createname(tp, ip, (char*)p->name, 
> +				p->namelen, p->inum, &firstblock, &flist, nres)))
> +			do_error(
> +_("name create failed in ino %llu (%d), filesystem may be out of space\n"),
> +				ino, error);
> +
> +		if ((error = libxfs_bmap_finish(&tp, &flist, firstblock, 
> +				&committed)))
> +			do_error(
> +	_("bmap finish failed (%d), filesystem may be out of space\n"),
> +				error);
> +
> +		libxfs_trans_commit(tp, 
> +				XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_SYNC, 0);
> +	}
> +}
> +
> +
> +/*
>   * Kill a block in a version 2 inode.
>   * Makes its own transaction.
>   */
> @@ -1807,7 +1924,6 @@ longform_dir2_entry_check_data(
>  	xfs_dabuf_t		**bpp,
>  	dir_hash_tab_t		*hashtab,
>  	freetab_t		**freetabp,
> -	name_hash_tab_t		*nametab,
>  	xfs_dablk_t		da_bno,
>  	int			isblock)
>  {
> @@ -1828,6 +1944,7 @@ longform_dir2_entry_check_data(
>  	freetab_t		*freetab;
>  	int			i;
>  	int			ino_offset;
> +	xfs_ino_t		inum;
>  	ino_tree_node_t		*irec;
>  	int			junkit;
>  	int			lastfree;
> @@ -1956,8 +2073,7 @@ longform_dir2_entry_check_data(
>  	libxfs_trans_ijoin(tp, ip, 0);
>  	libxfs_trans_ihold(tp, ip);
>  	libxfs_da_bjoin(tp, bp);
> -	if (isblock)
> -		libxfs_da_bhold(tp, bp);
> +	libxfs_da_bhold(tp, bp);
>  	XFS_BMAP_INIT(&flist, &firstblock);
>  	if (INT_GET(d->hdr.magic, ARCH_CONVERT) != wantmagic) {
>  		do_warn(_("bad directory block magic # %#x for directory inode "
> @@ -1987,7 +2103,7 @@ longform_dir2_entry_check_data(
>  	while (ptr < endptr) {
>  		dup = (xfs_dir2_data_unused_t *)ptr;
>  		if (INT_GET(dup->freetag, ARCH_CONVERT) ==
> -		    XFS_DIR2_DATA_FREE_TAG) {
> +		    				XFS_DIR2_DATA_FREE_TAG) {
>  			if (lastfree) {
>  				do_warn(_("directory inode %llu block %u has "
>  					  "consecutive free entries: "),
> @@ -2011,10 +2127,24 @@ longform_dir2_entry_check_data(
>  		addr = XFS_DIR2_DB_OFF_TO_DATAPTR(mp, db, ptr - (char *)d);
>  		dep = (xfs_dir2_data_entry_t *)ptr;
>  		ptr += XFS_DIR2_DATA_ENTSIZE(dep->namelen);
> +		inum = INT_GET(dep->inumber, ARCH_CONVERT);
>  		lastfree = 0;
> -		dir_hash_add(hashtab,
> -			libxfs_da_hashname((uchar_t *)dep->name, dep->namelen),
> -			addr, dep->name[0] == '/');
> +		if (!dir_hash_add(hashtab, addr, inum, dep->namelen, 
> +				dep->name)) {
> +			do_warn(
> +		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
> +				fname, inum, ip->i_ino);
> +			if (!no_modify) {
> +				if (verbose)
> +					do_warn(
> +					_(", marking entry to be junked\n"));
> +				else
> +					do_warn("\n");
> +			} else {
> +				do_warn(_(", would junk entry\n"));
> +			}
> +			dep->name[0] = '/';
> +		}
>  		/*
>  		 * skip bogus entries (leading '/').  they'll be deleted
>  		 * later.  must still log it, else we leak references to
> @@ -2029,7 +2159,7 @@ longform_dir2_entry_check_data(
>  		junkit = 0;
>  		bcopy(dep->name, fname, dep->namelen);
>  		fname[dep->namelen] = '\0';
> -		ASSERT(INT_GET(dep->inumber, ARCH_CONVERT) != NULLFSINO);
> +		ASSERT(inum != NULLFSINO);
>  		/*
>  		 * skip the '..' entry since it's checked when the
>  		 * directory is reached by something else.  if it never
> @@ -2039,7 +2169,7 @@ longform_dir2_entry_check_data(
>  		if (dep->namelen == 2 && dep->name[0] == '.' &&
>  		    dep->name[1] == '.')
>  			continue;
> -		ASSERT(no_modify || !verify_inum(mp, INT_GET(dep->inumber, ARCH_CONVERT)));
> +		ASSERT(no_modify || !verify_inum(mp, inum));
>  		/*
>  		 * special case the . entry.  we know there's only one
>  		 * '.' and only '.' points to itself because bogus entries
> @@ -2049,7 +2179,7 @@ longform_dir2_entry_check_data(
>  		 * '..' is already accounted for or will be taken care
>  		 * of when directory is moved to orphanage.
>  		 */
> -		if (ip->i_ino == INT_GET(dep->inumber, ARCH_CONVERT))  {
> +		if (ip->i_ino == inum)  {
>  			ASSERT(dep->name[0] == '.' && dep->namelen == 1);
>  			add_inode_ref(current_irec, current_ino_offset);
>  			*need_dot = 0;
> @@ -2062,23 +2192,18 @@ longform_dir2_entry_check_data(
>  		 * just skip it.  no need to process it and it's ..
>  		 * link is already accounted for.
>  		 */
> -		if (INT_GET(dep->inumber, ARCH_CONVERT) == orphanage_ino &&
> -		    strcmp(fname, ORPHANAGE) == 0)
> +		if (inum == orphanage_ino && strcmp(fname, ORPHANAGE) == 0)
>  			continue;
>  		/*
>  		 * skip entries with bogus inumbers if we're in no modify mode
>  		 */
> -		if (no_modify &&
> -		    verify_inum(mp, INT_GET(dep->inumber, ARCH_CONVERT)))
> +		if (no_modify && verify_inum(mp, inum))
>  			continue;
>  		/*
>  		 * ok, now handle the rest of the cases besides '.' and '..'
>  		 */
> -		irec = find_inode_rec(
> -			XFS_INO_TO_AGNO(mp,
> -				INT_GET(dep->inumber, ARCH_CONVERT)),
> -			XFS_INO_TO_AGINO(mp,
> -				INT_GET(dep->inumber, ARCH_CONVERT)));
> +		irec = find_inode_rec(XFS_INO_TO_AGNO(mp, inum),
> +					XFS_INO_TO_AGINO(mp, inum));
>  		if (irec == NULL)  {
>  			nbad++;
>  			do_warn(_("entry \"%s\" in directory inode %llu points "
> @@ -2093,9 +2218,7 @@ longform_dir2_entry_check_data(
>  			}
>  			continue;
>  		}
> -		ino_offset = XFS_INO_TO_AGINO(mp,
> -				INT_GET(dep->inumber, ARCH_CONVERT)) -
> -					irec->ino_startnum;
> +		ino_offset = XFS_INO_TO_AGINO(mp, inum) - irec->ino_startnum;
>  		/*
>  		 * if it's a free inode, blow out the entry.
>  		 * by now, any inode that we think is free
> @@ -2106,18 +2229,13 @@ longform_dir2_entry_check_data(
>  			 * don't complain if this entry points to the old
>  			 * and now-free lost+found inode
>  			 */
> -			if (verbose || no_modify ||
> -			    INT_GET(dep->inumber, ARCH_CONVERT) !=
> -			    old_orphanage_ino)
> +			if (verbose || no_modify || inum != old_orphanage_ino)
>  				do_warn(
>  	_("entry \"%s\" in directory inode %llu points to free inode %llu"),
> -					fname, ip->i_ino,
> -					INT_GET(dep->inumber, ARCH_CONVERT));
> +					fname, ip->i_ino, inum);
>  			nbad++;
>  			if (!no_modify)  {
> -				if (verbose ||
> -				    INT_GET(dep->inumber, ARCH_CONVERT) !=
> -				    old_orphanage_ino)
> +				if (verbose || inum != old_orphanage_ino)
>  					do_warn(
>  					_(", marking entry to be junked\n"));
>  				else
> @@ -2130,28 +2248,6 @@ longform_dir2_entry_check_data(
>  			continue;
>  		}
>  		/*
> -		 * check for duplicate names in directory.
> -		 */ 
> -		if (!name_hash_add(nametab, dep->name, dep->namelen)) {
> -			do_warn(
> -		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
> -				fname, INT_GET(dep->inumber, ARCH_CONVERT),
> -				ip->i_ino);
> -			nbad++;
> -			if (!no_modify) {
> -				if (verbose)
> -					do_warn(
> -					_(", marking entry to be junked\n"));
> -				else
> -					do_warn("\n");
> -				dep->name[0] = '/';
> -				libxfs_dir2_data_log_entry(tp, bp, dep);
> -			} else {
> -				do_warn(_(", would junk entry\n"));
> -			}
> -			continue;
> -		}
> -		/*
>  		 * check easy case first, regular inode, just bump
>  		 * the link count and continue
>  		 */
> @@ -2172,22 +2268,17 @@ longform_dir2_entry_check_data(
>  			junkit = 1;
>  			do_warn(
>  _("entry \"%s\" in dir %llu points to an already connected directory inode %llu,\n"),
> -				fname, ip->i_ino,
> -				INT_GET(dep->inumber, ARCH_CONVERT));
> +				fname, ip->i_ino, inum);
>  		} else if (parent == ip->i_ino)  {
>  			add_inode_reached(irec, ino_offset);
>  			add_inode_ref(current_irec, current_ino_offset);
> -			if (!is_inode_refchecked(
> -				INT_GET(dep->inumber, ARCH_CONVERT), irec,
> -					ino_offset))
> -				push_dir(stack,
> -					INT_GET(dep->inumber, ARCH_CONVERT));
> +			if (!is_inode_refchecked(inum, irec, ino_offset))
> +				push_dir(stack, inum);
>  		} else  {
>  			junkit = 1;
>  			do_warn(
>  _("entry \"%s\" in dir inode %llu inconsistent with .. value (%llu) in ino %llu,\n"),
> -				fname, ip->i_ino, parent,
> -				INT_GET(dep->inumber, ARCH_CONVERT));
> +				fname, ip->i_ino, parent, inum);
>  		}
>  		if (junkit)  {
>  			junkit = 0;
> @@ -2195,9 +2286,7 @@ _("entry \"%s\" in dir inode %llu incons
>  			if (!no_modify)  {
>  				dep->name[0] = '/';
>  				libxfs_dir2_data_log_entry(tp, bp, dep);
> -				if (verbose ||
> -				    INT_GET(dep->inumber, ARCH_CONVERT) !=
> -				    old_orphanage_ino)
> +				if (verbose || inum != old_orphanage_ino)
>  					do_warn(
>  					_("\twill clear entry \"%s\"\n"),
>  						fname);
> @@ -2212,8 +2301,6 @@ _("entry \"%s\" in dir inode %llu incons
>  		libxfs_dir2_data_freescan(mp, d, &needlog, NULL);
>  	if (needlog)
>  		libxfs_dir2_data_log_header(tp, bp);
> -	else if (!isblock && !nbad)
> -		libxfs_da_brelse(tp, bp);
>  	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
>  	libxfs_trans_commit(tp, 0, 0);
>  	freetab->ents[db].v = INT_GET(d->hdr.bestfree[0].length, ARCH_CONVERT);
> @@ -2306,19 +2393,19 @@ longform_dir2_check_node(
>  	xfs_fileoff_t		next_da_bno;
>  	int			seeval = 0;
>  	int			used;
> -
> +	
>  	for (da_bno = mp->m_dirleafblk, next_da_bno = 0;
>  	     next_da_bno != NULLFILEOFF && da_bno < mp->m_dirfreeblk;
>  	     da_bno = (xfs_dablk_t)next_da_bno) {
>  		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
> -		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
> +		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK)) 
>  			break;
>  		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
>  				XFS_DATA_FORK)) {
> -			do_error(
> -			_("can't read block %u for directory inode %llu\n"),
> +			do_warn(
> +			_("can't read leaf block %u for directory inode %llu\n"),
>  				da_bno, ip->i_ino);
> -			/* NOTREACHED */
> +			return 1;
>  		}
>  		leaf = bp->data;
>  		if (INT_GET(leaf->hdr.info.magic, ARCH_CONVERT) !=
> @@ -2348,23 +2435,24 @@ longform_dir2_check_node(
>  		seeval = dir_hash_see_all(hashtab, leaf->ents, INT_GET(leaf->hdr.count, ARCH_CONVERT),
>  			INT_GET(leaf->hdr.stale, ARCH_CONVERT));
>  		libxfs_da_brelse(NULL, bp);
> -		if (seeval != DIR_HASH_CK_OK)
> +		if (seeval != DIR_HASH_CK_OK) 
>  			return 1;
>  	}
> -	if (dir_hash_check(hashtab, ip, seeval))
> +	if (dir_hash_check(hashtab, ip, seeval)) 
>  		return 1;
> +	
>  	for (da_bno = mp->m_dirfreeblk, next_da_bno = 0;
>  	     next_da_bno != NULLFILEOFF;
>  	     da_bno = (xfs_dablk_t)next_da_bno) {
>  		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
> -		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
> +		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK)) 
>  			break;
>  		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
>  				XFS_DATA_FORK)) {
> -			do_error(_("can't read block %u for directory inode "
> -				   "%llu\n"),
> +			do_warn(
> +		_("can't read freespace block %u for directory inode %llu\n"),
>  				da_bno, ip->i_ino);
> -			/* NOTREACHED */
> +			return 1;
>  		}
>  		free = bp->data;
>  		fdb = XFS_DIR2_DA_TO_DB(mp, da_bno);
> @@ -2418,388 +2506,9 @@ longform_dir2_check_node(
>  }
>  
>  /*
> - * Rebuild a directory: set up.
> - * Turn it into a node-format directory with no contents in the
> - * upper area.  Also has correct freespace blocks.
> - */
> -void
> -longform_dir2_rebuild_setup(
> -	xfs_mount_t		*mp,
> -	xfs_ino_t		ino,
> -	xfs_inode_t		*ip,
> -	freetab_t		*freetab)
> -{
> -	xfs_da_args_t		args;
> -	int			committed;
> -	xfs_dir2_data_t		*data = NULL;
> -	xfs_dabuf_t		*dbp;
> -	int			error;
> -	xfs_dir2_db_t		fbno;
> -	xfs_dabuf_t		*fbp;
> -	xfs_fsblock_t		firstblock;
> -	xfs_bmap_free_t		flist;
> -	xfs_dir2_free_t		*free;
> -	int			i;
> -	int			j;
> -	xfs_dablk_t		lblkno;
> -	xfs_dabuf_t		*lbp;
> -	xfs_dir2_leaf_t		*leaf;
> -	int			nres;
> -	xfs_trans_t		*tp;
> -
> -	/* read first directory block */
> -	tp = libxfs_trans_alloc(mp, 0);
> -	nres = XFS_DAENTER_SPACE_RES(mp, XFS_DATA_FORK);
> -	error = libxfs_trans_reserve(tp,
> -		nres, XFS_CREATE_LOG_RES(mp), 0, XFS_TRANS_PERM_LOG_RES,
> -		XFS_CREATE_LOG_COUNT);
> -	if (error)
> -		res_failed(error);
> -	libxfs_trans_ijoin(tp, ip, 0);
> -	libxfs_trans_ihold(tp, ip);
> -	XFS_BMAP_INIT(&flist, &firstblock);
> -	if (libxfs_da_read_buf(tp, ip, mp->m_dirdatablk, -2, &dbp,
> -			XFS_DATA_FORK)) {
> -		do_error(_("can't read block %u for directory inode %llu\n"),
> -			mp->m_dirdatablk, ino);
> -		/* NOTREACHED */
> -	}
> -
> -	if (dbp)
> -		data = dbp->data;
> -
> -	/* check for block format directory */
> -	if (data &&
> -	    INT_GET((data)->hdr.magic, ARCH_CONVERT) == XFS_DIR2_BLOCK_MAGIC) {
> -		xfs_dir2_block_t	*block;
> -		xfs_dir2_leaf_entry_t	*blp;
> -		xfs_dir2_block_tail_t	*btp;
> -		int			needlog;
> -		int			needscan;
> -
> -		/* convert directory block from block format to data format */
> -		INT_SET(data->hdr.magic, ARCH_CONVERT, XFS_DIR2_DATA_MAGIC);
> -
> -		/* construct freelist */
> -		block = (xfs_dir2_block_t *)data;
> -		btp = XFS_DIR2_BLOCK_TAIL_P(mp, block);
> -		blp = XFS_DIR2_BLOCK_LEAF_P(btp);
> -		needlog = needscan = 0;
> -		libxfs_dir2_data_make_free(tp, dbp, (char *)blp - (char *)block,
> -			(char *)block + mp->m_dirblksize - (char *)blp,
> -			&needlog, &needscan);
> -		if (needscan)
> -			libxfs_dir2_data_freescan(mp, data, &needlog, NULL);
> -		libxfs_da_log_buf(tp, dbp, 0, mp->m_dirblksize - 1);
> -	} else if (dbp) {
> -		libxfs_da_brelse(tp, dbp);
> -	}
> -
> -	/* allocate blocks for btree */
> -	bzero(&args, sizeof(args));
> -	args.trans = tp;
> -	args.dp = ip;
> -	args.whichfork = XFS_DATA_FORK;
> -	args.firstblock = &firstblock;
> -	args.flist = &flist;
> -	args.total = nres;
> -	if ((error = libxfs_da_grow_inode(&args, &lblkno)) ||
> -	    (error = libxfs_da_get_buf(tp, ip, lblkno, -1, &lbp, XFS_DATA_FORK))) {
> -		do_error(_("can't add btree block to directory inode %llu\n"),
> -			ino);
> -		/* NOTREACHED */
> -	}
> -	leaf = lbp->data;
> -	bzero(leaf, mp->m_dirblksize);
> -	INT_SET(leaf->hdr.info.magic, ARCH_CONVERT, XFS_DIR2_LEAFN_MAGIC);
> -	libxfs_da_log_buf(tp, lbp, 0, mp->m_dirblksize - 1);
> -	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
> -	libxfs_trans_commit(tp, 0, 0);
> -
> -	for (i = 0; i < freetab->nents; i += XFS_DIR2_MAX_FREE_BESTS(mp)) {
> -		tp = libxfs_trans_alloc(mp, 0);
> -		nres = XFS_DAENTER_SPACE_RES(mp, XFS_DATA_FORK);
> -		error = libxfs_trans_reserve(tp,
> -			nres, XFS_CREATE_LOG_RES(mp), 0, XFS_TRANS_PERM_LOG_RES,
> -			XFS_CREATE_LOG_COUNT);
> -		if (error)
> -			res_failed(error);
> -		libxfs_trans_ijoin(tp, ip, 0);
> -		libxfs_trans_ihold(tp, ip);
> -		XFS_BMAP_INIT(&flist, &firstblock);
> -		bzero(&args, sizeof(args));
> -		args.trans = tp;
> -		args.dp = ip;
> -		args.whichfork = XFS_DATA_FORK;
> -		args.firstblock = &firstblock;
> -		args.flist = &flist;
> -		args.total = nres;
> -		if ((error = libxfs_dir2_grow_inode(&args, XFS_DIR2_FREE_SPACE,
> -						 &fbno)) ||
> -		    (error = libxfs_da_get_buf(tp, ip, XFS_DIR2_DB_TO_DA(mp, fbno),
> -					    -1, &fbp, XFS_DATA_FORK))) {
> -			do_error(_("can't add free block to directory inode "
> -				   "%llu\n"),
> -				ino);
> -			/* NOTREACHED */
> -		}
> -		free = fbp->data;
> -		bzero(free, mp->m_dirblksize);
> -		INT_SET(free->hdr.magic, ARCH_CONVERT, XFS_DIR2_FREE_MAGIC);
> -		INT_SET(free->hdr.firstdb, ARCH_CONVERT, i);
> -		INT_SET(free->hdr.nvalid, ARCH_CONVERT, XFS_DIR2_MAX_FREE_BESTS(mp));
> -		if (i + INT_GET(free->hdr.nvalid, ARCH_CONVERT) > freetab->nents)
> -			INT_SET(free->hdr.nvalid, ARCH_CONVERT, freetab->nents - i);
> -		for (j = 0; j < INT_GET(free->hdr.nvalid, ARCH_CONVERT); j++) {
> -			INT_SET(free->bests[j], ARCH_CONVERT, freetab->ents[i + j].v);
> -			if (INT_GET(free->bests[j], ARCH_CONVERT) != NULLDATAOFF)
> -				INT_MOD(free->hdr.nused, ARCH_CONVERT, +1);
> -		}
> -		libxfs_da_log_buf(tp, fbp, 0, mp->m_dirblksize - 1);
> -		libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
> -		libxfs_trans_commit(tp, 0, 0);
> -	}
> -}
> -
> -/*
> - * Rebuild the entries from a single data block.
> - */
> -void
> -longform_dir2_rebuild_data(
> -	xfs_mount_t		*mp,
> -	xfs_ino_t		ino,
> -	xfs_inode_t		*ip,
> -	xfs_dablk_t		da_bno)
> -{
> -	xfs_dabuf_t		*bp;
> -	xfs_dir2_block_tail_t	*btp;
> -	int			committed;
> -	xfs_dir2_data_t		*data;
> -	xfs_dir2_db_t		dbno;
> -	xfs_dir2_data_entry_t	*dep;
> -	xfs_dir2_data_unused_t	*dup;
> -	char			*endptr;
> -	int			error;
> -	xfs_dir2_free_t		*fblock;
> -	xfs_dabuf_t		*fbp;
> -	xfs_dir2_db_t		fdb;
> -	int			fi;
> -	xfs_fsblock_t		firstblock;
> -	xfs_bmap_free_t		flist;
> -	int			needlog;
> -	int			needscan;
> -	int			nres;
> -	char			*ptr;
> -	xfs_trans_t		*tp;
> -
> -	if (libxfs_da_read_buf(NULL, ip, da_bno, da_bno == 0 ? -2 : -1, &bp,
> -			XFS_DATA_FORK)) {
> -		do_error(_("can't read block %u for directory inode %llu\n"),
> -			da_bno, ino);
> -		/* NOTREACHED */
> -	}
> -	if (da_bno == 0 && bp == NULL)
> -		/*
> -		 * The block was punched out.
> -		 */
> -		return;
> -	ASSERT(bp);
> -	dbno = XFS_DIR2_DA_TO_DB(mp, da_bno);
> -	fdb = XFS_DIR2_DB_TO_FDB(mp, dbno);
> -	if (libxfs_da_read_buf(NULL, ip, XFS_DIR2_DB_TO_DA(mp, fdb), -1, &fbp,
> -			XFS_DATA_FORK)) {
> -		do_error(_("can't read block %u for directory inode %llu\n"),
> -			XFS_DIR2_DB_TO_DA(mp, fdb), ino);
> -		/* NOTREACHED */
> -	}
> -	data = malloc(mp->m_dirblksize);
> -	if (!data) {
> -		do_error(
> -		_("malloc failed in longform_dir2_rebuild_data (%u bytes)\n"),
> -			mp->m_dirblksize);
> -		exit(1);
> -	}
> -	bcopy(bp->data, data, mp->m_dirblksize);
> -	ptr = (char *)data->u;
> -	if (INT_GET(data->hdr.magic, ARCH_CONVERT) == XFS_DIR2_BLOCK_MAGIC) {
> -		btp = XFS_DIR2_BLOCK_TAIL_P(mp, (xfs_dir2_block_t *)data);
> -		endptr = (char *)XFS_DIR2_BLOCK_LEAF_P(btp);
> -	} else
> -		endptr = (char *)data + mp->m_dirblksize;
> -	fblock = fbp->data;
> -	fi = XFS_DIR2_DB_TO_FDINDEX(mp, dbno);
> -	tp = libxfs_trans_alloc(mp, 0);
> -	error = libxfs_trans_reserve(tp, 0, XFS_CREATE_LOG_RES(mp), 0,
> -		XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
> -	if (error)
> -		res_failed(error);
> -	libxfs_trans_ijoin(tp, ip, 0);
> -	libxfs_trans_ihold(tp, ip);
> -	libxfs_da_bjoin(tp, bp);
> -	libxfs_da_bhold(tp, bp);
> -	libxfs_da_bjoin(tp, fbp);
> -	libxfs_da_bhold(tp, fbp);
> -	XFS_BMAP_INIT(&flist, &firstblock);
> -	needlog = needscan = 0;
> -	bzero(((xfs_dir2_data_t *)(bp->data))->hdr.bestfree,
> -		sizeof(data->hdr.bestfree));
> -	libxfs_dir2_data_make_free(tp, bp, (xfs_dir2_data_aoff_t)sizeof(data->hdr),
> -		mp->m_dirblksize - sizeof(data->hdr), &needlog, &needscan);
> -	ASSERT(needscan == 0);
> -	libxfs_dir2_data_log_header(tp, bp);
> -	INT_SET(fblock->bests[fi], ARCH_CONVERT,
> -		INT_GET(((xfs_dir2_data_t *)(bp->data))->hdr.bestfree[0].length, ARCH_CONVERT));
> -	libxfs_dir2_free_log_bests(tp, fbp, fi, fi);
> -	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
> -	libxfs_trans_commit(tp, 0, 0);
> -
> -	while (ptr < endptr) {
> -		dup = (xfs_dir2_data_unused_t *)ptr;
> -		if (INT_GET(dup->freetag, ARCH_CONVERT) == XFS_DIR2_DATA_FREE_TAG) {
> -			ptr += INT_GET(dup->length, ARCH_CONVERT);
> -			continue;
> -		}
> -		dep = (xfs_dir2_data_entry_t *)ptr;
> -		ptr += XFS_DIR2_DATA_ENTSIZE(dep->namelen);
> -		if (dep->name[0] == '/')
> -			continue;
> -		tp = libxfs_trans_alloc(mp, 0);
> -		nres = XFS_CREATE_SPACE_RES(mp, dep->namelen);
> -		error = libxfs_trans_reserve(tp, nres, XFS_CREATE_LOG_RES(mp), 0,
> -			XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
> -		if (error)
> -			res_failed(error);
> -		libxfs_trans_ijoin(tp, ip, 0);
> -		libxfs_trans_ihold(tp, ip);
> -		libxfs_da_bjoin(tp, bp);
> -		libxfs_da_bhold(tp, bp);
> -		libxfs_da_bjoin(tp, fbp);
> -		libxfs_da_bhold(tp, fbp);
> -		XFS_BMAP_INIT(&flist, &firstblock);
> -		error = dir_createname(mp, tp, ip, (char *)dep->name,
> -			dep->namelen, INT_GET(dep->inumber, ARCH_CONVERT),
> -			&firstblock, &flist, nres);
> -		ASSERT(error == 0);
> -		libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
> -		libxfs_trans_commit(tp, 0, 0);
> -	}
> -	libxfs_da_brelse(NULL, bp);
> -	libxfs_da_brelse(NULL, fbp);
> -	free(data);
> -}
> -
> -/*
> - * Finish the rebuild of a directory.
> - * Stuff / in and then remove it, this forces the directory to end
> - * up in the right format.
> - */
> -void
> -longform_dir2_rebuild_finish(
> -	xfs_mount_t		*mp,
> -	xfs_ino_t		ino,
> -	xfs_inode_t		*ip)
> -{
> -	int			committed;
> -	int			error;
> -	xfs_fsblock_t		firstblock;
> -	xfs_bmap_free_t		flist;
> -	int			nres;
> -	xfs_trans_t		*tp;
> -
> -	tp = libxfs_trans_alloc(mp, 0);
> -	nres = XFS_CREATE_SPACE_RES(mp, 1);
> -	error = libxfs_trans_reserve(tp, nres, XFS_CREATE_LOG_RES(mp), 0,
> -		XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
> -	if (error)
> -		res_failed(error);
> -	libxfs_trans_ijoin(tp, ip, 0);
> -	libxfs_trans_ihold(tp, ip);
> -	XFS_BMAP_INIT(&flist, &firstblock);
> -	error = dir_createname(mp, tp, ip, "/", 1, ino,
> -			&firstblock, &flist, nres);
> -	ASSERT(error == 0);
> -	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
> -	libxfs_trans_commit(tp, 0, 0);
> -
> -	/* could kill trailing empty data blocks here */
> -
> -	tp = libxfs_trans_alloc(mp, 0);
> -	nres = XFS_REMOVE_SPACE_RES(mp);
> -	error = libxfs_trans_reserve(tp, nres, XFS_REMOVE_LOG_RES(mp), 0,
> -		XFS_TRANS_PERM_LOG_RES, XFS_REMOVE_LOG_COUNT);
> -	if (error)
> -		res_failed(error);
> -	libxfs_trans_ijoin(tp, ip, 0);
> -	libxfs_trans_ihold(tp, ip);
> -	XFS_BMAP_INIT(&flist, &firstblock);
> -	error = dir_removename(mp, tp, ip, "/", 1, ino,
> -			&firstblock, &flist, nres);
> -	ASSERT(error == 0);
> -	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
> -	libxfs_trans_commit(tp, 0, 0);
> -}
> -
> -/*
> - * Rebuild a directory.
> - * Remove all the non-data blocks.
> - * Re-initialize to (empty) node form.
> - * Loop over the data blocks reinserting each entry.
> - * Force the directory into the right format.
> - */
> -void
> -longform_dir2_rebuild(
> -	xfs_mount_t	*mp,
> -	xfs_ino_t	ino,
> -	xfs_inode_t	*ip,
> -	int		*num_illegal,
> -	freetab_t	*freetab,
> -	int		isblock)
> -{
> -	xfs_dabuf_t	*bp;
> -	xfs_dablk_t	da_bno;
> -	xfs_fileoff_t	next_da_bno;
> -
> -	do_warn(_("rebuilding directory inode %llu\n"), ino);
> -
> -	/* kill leaf blocks */
> -	for (da_bno = mp->m_dirleafblk, next_da_bno = isblock ? NULLFILEOFF : 0;
> -	     next_da_bno != NULLFILEOFF;
> -	     da_bno = (xfs_dablk_t)next_da_bno) {
> -		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
> -		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
> -			break;
> -		if (libxfs_da_get_buf(NULL, ip, da_bno, -1, &bp, XFS_DATA_FORK)) {
> -			do_error(_("can't get block %u for directory inode "
> -				   "%llu\n"),
> -				da_bno, ino);
> -			/* NOTREACHED */
> -		}
> -		dir2_kill_block(mp, ip, da_bno, bp);
> -	}
> -
> -	/* rebuild empty btree and freelist */
> -	longform_dir2_rebuild_setup(mp, ino, ip, freetab);
> -
> -	/* rebuild directory */
> -	for (da_bno = mp->m_dirdatablk, next_da_bno = 0;
> -	     da_bno < mp->m_dirleafblk && next_da_bno != NULLFILEOFF;
> -	     da_bno = (xfs_dablk_t)next_da_bno) {
> -		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
> -		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
> -			break;
> -		longform_dir2_rebuild_data(mp, ino, ip, da_bno);
> -	}
> -
> -	/* put the directory in the appropriate on-disk format */
> -	longform_dir2_rebuild_finish(mp, ino, ip);
> -	*num_illegal = 0;
> -}
> -
> -/*
> - * succeeds or dies, inode never gets dirtied since all changes
> - * happen in file blocks.  the inode size and other core info
> - * is already correct, it's just the leaf entries that get altered.
> - * XXX above comment is wrong for v2 - need to see why it matters
> + * If a directory is corrupt, we need to read in as many entries as possible,
> + * destroy the entry and create a new one with recovered name/inode pairs.
> + * (ie. get libxfs to do all the grunt work)
>   */
>  void
>  longform_dir2_entry_check(xfs_mount_t	*mp,
> @@ -2810,15 +2519,14 @@ longform_dir2_entry_check(xfs_mount_t	*m
>  			dir_stack_t	*stack,
>  			ino_tree_node_t	*irec,
>  			int		ino_offset,
> -			name_hash_tab_t	*nametab)
> +			dir_hash_tab_t	*hashtab)
>  {
>  	xfs_dir2_block_t	*block;
>  	xfs_dir2_leaf_entry_t	*blp;
> -	xfs_dabuf_t		*bp;
> +	xfs_dabuf_t		**bplist;
>  	xfs_dir2_block_tail_t	*btp;
>  	xfs_dablk_t		da_bno;
>  	freetab_t		*freetab;
> -	dir_hash_tab_t		*hashtab;
>  	int			i;
>  	int			isblock;
>  	int			isleaf;
> @@ -2840,6 +2548,7 @@ longform_dir2_entry_check(xfs_mount_t	*m
>  		freetab->ents[i].v = NULLDATAOFF;
>  		freetab->ents[i].s = 0;
>  	}
> +	bplist = calloc(freetab->naents, sizeof(xfs_dabuf_t*));
>  	/* is this a block, leaf, or node directory? */
>  	libxfs_dir2_isblock(NULL, ip, &isblock);
>  	libxfs_dir2_isleaf(NULL, ip, &isleaf);
> @@ -2847,50 +2556,58 @@ longform_dir2_entry_check(xfs_mount_t	*m
>  	if (do_prefetch && !isblock)
>  		prefetch_p6_dir2(mp, ip);
>  
> -	/* check directory data */
> -	hashtab = dir_hash_init(ip->i_d.di_size);
> +	/* check directory "data" blocks (ie. name/inode pairs) */
>  	for (da_bno = 0, next_da_bno = 0;
>  	     next_da_bno != NULLFILEOFF && da_bno < mp->m_dirleafblk;
>  	     da_bno = (xfs_dablk_t)next_da_bno) {
>  		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
> +		ASSERT(XFS_DIR2_DA_TO_DB(mp, da_bno) < freetab->naents);
>  		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
>  			break;
> -		if (libxfs_da_read_bufr(NULL, ip, da_bno,
> -				da_bno == 0 ? -2 : -1, &bp, XFS_DATA_FORK)) {
> -			do_error(_("can't read block %u for directory inode "
> -				   "%llu\n"),
> +		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, 
> +				&bplist[XFS_DIR2_DA_TO_DB(mp, da_bno)], 
> +				XFS_DATA_FORK)) {
> +			do_warn(_(
> +			"can't read data block %u for directory inode %llu\n"),
>  				da_bno, ino);
> -			/* NOTREACHED */
> +			*num_illegal++;
> +			continue;	/* try and read all "data" blocks */
>  		}
> -		/* is there a hole at the start? */
> -		if (da_bno == 0 && bp == NULL)
> -			continue;
>  		longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot,
> -			stack, irec, ino_offset, &bp, hashtab, &freetab, 
> -			nametab, da_bno, isblock);
> -		/* it releases the buffer unless isblock is set */
> +				stack, irec, ino_offset, 
> +				&bplist[XFS_DIR2_DA_TO_DB(mp, da_bno)], hashtab,  
> +				&freetab, da_bno, isblock);
>  	}
>  	fixit = (*num_illegal != 0) || dir2_is_badino(ino);
>  
>  	/* check btree and freespace */
>  	if (isblock) {
> -		ASSERT(bp);
> -		block = bp->data;
> +		block = bplist[0]->data;
>  		btp = XFS_DIR2_BLOCK_TAIL_P(mp, block);
>  		blp = XFS_DIR2_BLOCK_LEAF_P(btp);
> -		seeval = dir_hash_see_all(hashtab, blp, INT_GET(btp->count, ARCH_CONVERT), INT_GET(btp->stale,
> ARCH_CONVERT));
> +		seeval = dir_hash_see_all(hashtab, blp, 
> +				INT_GET(btp->count, ARCH_CONVERT), 
> +				INT_GET(btp->stale, ARCH_CONVERT));
>  		if (dir_hash_check(hashtab, ip, seeval))
>  			fixit |= 1;
> -		libxfs_da_brelse(NULL, bp);
>  	} else if (isleaf) {
>  		fixit |= longform_dir2_check_leaf(mp, ip, hashtab, freetab);
>  	} else {
>  		fixit |= longform_dir2_check_node(mp, ip, hashtab, freetab);
>  	}
> -	dir_hash_done(hashtab);
> -	if (!no_modify && fixit)
> -		longform_dir2_rebuild(mp, ino, ip, num_illegal, freetab,
> -			isblock);
> +	if (!no_modify && fixit) {
> +		dir_hash_dup_names(hashtab);
> +		for (i = 0; i < freetab->naents; i++) 
> +			if (bplist[i])
> +				libxfs_da_brelse(NULL, bplist[i]);
> +		longform_dir2_rebuild(mp, ino, ip, hashtab);
> +		*num_illegal = 0;
> +	} else {
> +		for (i = 0; i < freetab->naents; i++) 
> +			if (bplist[i])
> +				libxfs_da_brelse(NULL, bplist[i]);
> +	}
> +	
>  	free(freetab);
>  }
>  
> @@ -2906,7 +2623,7 @@ shortform_dir_entry_check(xfs_mount_t	*m
>  			dir_stack_t	*stack,
>  			ino_tree_node_t	*current_irec,
>  			int		current_ino_offset,
> -			name_hash_tab_t	*nametab)
> +			dir_hash_tab_t	*hashtab)
>  {
>  	xfs_ino_t		lino;
>  	xfs_ino_t		parent;
> @@ -3044,7 +2761,7 @@ _("entry \"%s\" in shortform dir %llu re
>  		ASSERT(irec != NULL);
>  
>  		ino_offset = XFS_INO_TO_AGINO(mp, lino) - irec->ino_startnum;
> -
> +		
>  		/*
>  		 * if it's a free inode, blow out the entry.
>  		 * by now, any inode that we think is free
> @@ -3066,8 +2783,9 @@ _("entry \"%s\" in shortform dir inode %
>  				do_warn(_("would junk entry \"%s\"\n"),
>  					fname);
>  			}
> -		} else if (!name_hash_add(nametab, sf_entry->name, 
> -					sf_entry->namelen)) {
> +		} else if (!dir_hash_add(hashtab, 
> +				(xfs_dir2_dataptr_t)(sf_entry - &sf->list[0]),
> +				lino, sf_entry->namelen, sf_entry->name)) {
>  			/*
>  			 * check for duplicate names in directory.
>  			 */ 
> @@ -3311,7 +3029,7 @@ shortform_dir2_entry_check(xfs_mount_t	*
>  			dir_stack_t	*stack,
>  			ino_tree_node_t	*current_irec,
>  			int		current_ino_offset,
> -			name_hash_tab_t	*nametab)
> +			dir_hash_tab_t	*hashtab)
>  {
>  	xfs_ino_t		lino;
>  	xfs_ino_t		parent;
> @@ -3484,7 +3202,9 @@ shortform_dir2_entry_check(xfs_mount_t	*
>  				do_warn(_("would junk entry \"%s\"\n"),
>  					fname);
>  			}
> -		} else if (!name_hash_add(nametab, sfep->name, sfep->namelen)) {
> +		} else if (!dir_hash_add(hashtab, (xfs_dir2_dataptr_t)
> +					(sfep - XFS_DIR2_SF_FIRSTENTRY(sfp)),
> +				lino, sfep->namelen, sfep->name)) {
>  			/*
>  			 * check for duplicate names in directory.
>  			 */ 
> @@ -3650,7 +3370,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
>  	xfs_trans_t		*tp;
>  	xfs_dahash_t		hashval;
>  	ino_tree_node_t		*irec;
> -	name_hash_tab_t		*nametab;
> +	dir_hash_tab_t		*hashtab;
>  	int			ino_offset, need_dot, committed;
>  	int			dirty, num_illegal, error, nres;
>  
> @@ -3731,7 +3451,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
>  
>  		add_inode_refchecked(ino, irec, ino_offset);
>  
> -		nametab = name_hash_init(ip->i_d.di_size);
> +		hashtab = dir_hash_init(ip->i_d.di_size);
>  
>  		/*
>  		 * look for bogus entries
> @@ -3750,13 +3470,13 @@ process_dirstack(xfs_mount_t *mp, dir_st
>  							&num_illegal, &need_dot,
>  							stack, irec,
>  							ino_offset,
> -							nametab);
> +							hashtab);
>  			else
>  				longform_dir_entry_check(mp, ino, ip,
>  							&num_illegal, &need_dot,
>  							stack, irec,
>  							ino_offset,
> -							nametab);
> +							hashtab);
>  			break;
>  		case XFS_DINODE_FMT_LOCAL:
>  			tp = libxfs_trans_alloc(mp, 0);
> @@ -3781,12 +3501,12 @@ process_dirstack(xfs_mount_t *mp, dir_st
>  				shortform_dir2_entry_check(mp, ino, ip, &dirty,
>  							stack, irec,
>  							ino_offset,
> -							nametab);
> +							hashtab);
>  			else
>  				shortform_dir_entry_check(mp, ino, ip, &dirty,
>  							stack, irec,
>  							ino_offset,
> -							nametab);
> +							hashtab);
>  
>  			ASSERT(dirty == 0 || (dirty && !no_modify));
>  			if (dirty)  {
> @@ -3801,7 +3521,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
>  		default:
>  			break;
>  		}
> -		name_hash_done(nametab);
> +		dir_hash_done(hashtab);
>  
>  		hashval = 0;
>  
> 
> 
> 

-- 
Nathan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Review: xfs_repair fixes for dir2 corruption
  2006-07-28  8:10           ` Nathan Scott
@ 2006-07-28 14:45             ` Madan Valluri
  2006-07-31  7:18               ` Barry Naujok
  0 siblings, 1 reply; 12+ messages in thread
From: Madan Valluri @ 2006-07-28 14:45 UTC (permalink / raw)
  To: Nathan Scott; +Cc: Barry Naujok, xfs

Nathan Scott wrote:
> On Fri, Jul 28, 2006 at 11:58:52AM +1000, Barry Naujok wrote:
>   
>> This patch addresses the following xfs_repair issues:
>>     
>
> The libxfs cache stuff looks good to me.  Maybe Madan can cast
> an eye over the repair changes for ya?
>
> cheers.
>
>   
>>  
1) Since dir_hash_add can be called for both V1 and V2 directories its 
second parameter type xfs_dir2_dataptr_t should be neutral.
2) In dir_hash_add when dup is set, do you still need to add to the 
nextbyhash by list?
3) The following statement in   longform_dir2_rebuild looks odd. 
Besides, FWIW, you can match "/."

                if (p->name[0] == '/' || (p->name[0] == '.' && 
(p->namelen == 1
                                || (p->namelen == 2 && p->name[1] == '.'))))
                        continue;

Consider:

               if (((p->name[0] == '/' || p->name[0] == '.') && 
p->namelen == 1) ||
                        (p->name[0] == '.' && p->name[1] == '.' && 
p->namelen == 2))
                        continue;

4) Related to items 2&3, shouldn't the code be skipping duplicate entries?
5) Can we do anything to minimize the do_error calls in 
longform_dir2_rebuild? Seems like on a full file system, while 
rebuilding say the root directory, matters can get wacky - The directory 
is being rebuilt and we have no further room. Sounds like that this how 
it has been....

Thanks.

/Madan

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Review: xfs_repair fixes for dir2 corruption
  2006-07-28  1:58         ` Review: xfs_repair fixes for dir2 corruption Barry Naujok
  2006-07-28  8:10           ` Nathan Scott
@ 2006-07-30  5:19           ` christian
  2006-08-01 21:50           ` Adam Sjøgren
  2 siblings, 0 replies; 12+ messages in thread
From: christian @ 2006-07-30  5:19 UTC (permalink / raw)
  To: Barry Naujok; +Cc: xfs

[-- Attachment #1: Type: TEXT/PLAIN, Size: 553 bytes --]

Hello Barry,

On Fri, 28 Jul 2006, Barry Naujok wrote:
> For those that are keen to fix the 16777216 problem, give this patch a go.

thanks for the patch, worked for me ;-)

(somehow your patch here was linewrapped around line 1389 or so, so I
  attach it again)

I've compiled xfsprogs from cvs with your patch applied on x86_64 to
work on a saved copy of a corrupt xfs (from a i386 box).

for the record, here is how it went:
http://nerdbynature.de/bits/2.6.18-rc2/sda5_prinz64.log.gz

Thanks again,
Christian.
-- 
BOFH excuse #158:

Defunct processes

[-- Attachment #2: Type: TEXT/plain, Size: 48692 bytes --]

#
# This patch addresses the following xfs_repair issues:
#  - Can rebuild most corrupted directories. Some will be 
#    beyond repair and the contents of those will end up 
#    in lost+found.
#  - Fixed potential problems with the duplicate name 
#    checking by properly referencing the buffers where
#    the names are stored.
#  - Unified the two hash lists used in directory checks.
#  - Fixed a case where incorrectly reference counted and
#    dirty buffers were never written to disk (most 
#    common observation of this is an inaccessible 
#    lost+found directory after a repair).
#
# For those that are keen to fix the 16777216 problem, give this patch a go.
#
--- a/xfsprogs/include/cache.h	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/include/cache.h	2006-07-27 16:17:47.804986322 +1000
@@ -30,6 +30,7 @@ struct cache_node;
 typedef void *cache_key_t;
 typedef void (*cache_walk_t)(struct cache_node *);
 typedef struct cache_node * (*cache_node_alloc_t)(void);
+typedef void (*cache_node_flush_t)(struct cache_node *);
 typedef void (*cache_node_relse_t)(struct cache_node *);
 typedef unsigned int (*cache_node_hash_t)(cache_key_t, unsigned int);
 typedef int (*cache_node_compare_t)(struct cache_node *, cache_key_t);
@@ -38,6 +39,7 @@ typedef unsigned int (*cache_bulk_relse_
 struct cache_operations {
 	cache_node_hash_t	hash;
 	cache_node_alloc_t	alloc;
+	cache_node_flush_t	flush;
 	cache_node_relse_t	relse;
 	cache_node_compare_t	compare;
 	cache_bulk_relse_t	bulkrelse;	/* optional */
@@ -49,6 +51,7 @@ struct cache {
 	pthread_mutex_t		c_mutex;	/* node count mutex */
 	cache_node_hash_t	hash;		/* node hash function */
 	cache_node_alloc_t	alloc;		/* allocation function */
+	cache_node_flush_t	flush;		/* flush dirty data function */
 	cache_node_relse_t	relse;		/* memory free function */
 	cache_node_compare_t	compare;	/* comparison routine */
 	cache_bulk_relse_t	bulkrelse;	/* bulk release routine */
@@ -75,6 +78,7 @@ struct cache *cache_init(unsigned int, s
 void cache_destroy(struct cache *);
 void cache_walk(struct cache *, cache_walk_t);
 void cache_purge(struct cache *);
+void cache_flush(struct cache *);
 
 int cache_node_get(struct cache *, cache_key_t, struct cache_node **);
 void cache_node_put(struct cache_node *);

--- a/xfsprogs/include/libxfs.h	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/include/libxfs.h	2006-07-27 16:34:45.308344014 +1000
@@ -257,6 +257,7 @@ extern int	libxfs_writebuf_int (xfs_buf_
 extern struct cache	*libxfs_bcache;
 extern struct cache_operations	libxfs_bcache_operations;
 extern void	libxfs_bcache_purge (void);
+extern void	libxfs_bcache_flush (void);
 extern xfs_buf_t	*libxfs_getbuf (dev_t, xfs_daddr_t, int);
 extern void	libxfs_putbuf (xfs_buf_t *);
 extern void	libxfs_purgebuf (xfs_buf_t *);
@@ -467,6 +468,8 @@ extern int	libxfs_bmap_finish (xfs_trans
 				xfs_fsblock_t, int *);
 extern int	libxfs_bmap_next_offset (xfs_trans_t *, xfs_inode_t *,
 				xfs_fileoff_t *, int);
+extern int	libxfs_bmap_last_offset(xfs_trans_t *, xfs_inode_t *, 
+				xfs_fileoff_t *, int);
 extern int	libxfs_bunmapi (xfs_trans_t *, xfs_inode_t *, xfs_fileoff_t,
 				xfs_filblks_t, int, xfs_extnum_t,
 				xfs_fsblock_t *, xfs_bmap_free_t *, int *);

--- a/xfsprogs/libxfs/cache.c	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/libxfs/cache.c	2006-07-27 17:42:43.812685388 +1000
@@ -60,6 +60,7 @@ cache_init(
 	cache->c_hashsize = hashsize;
 	cache->hash = cache_operations->hash;
 	cache->alloc = cache_operations->alloc;
+	cache->flush = cache_operations->flush;
 	cache->relse = cache_operations->relse;
 	cache->compare = cache_operations->compare;
 	cache->bulkrelse = cache_operations->bulkrelse ?
@@ -422,6 +423,39 @@ cache_purge(
 		cache_abort();
 	}
 #endif
+	/* flush any remaining nodes to disk */
+	cache_flush(cache);
+}
+
+/*
+ * Flush all nodes in the cache to disk. 
+ */
+void
+cache_flush(
+	struct cache *		cache)
+{
+	struct cache_hash *	hash;
+	struct list_head *	head;
+	struct list_head *	pos;
+	struct cache_node *	node;
+	int			i;
+	
+	if (!cache->flush)
+		return;
+	
+	for (i = 0; i < cache->c_hashsize; i++) {
+		hash = &cache->c_hash[i];
+		
+		pthread_mutex_lock(&hash->ch_mutex);
+		head = &hash->ch_list;
+		for (pos = head->next; pos != head; pos = pos->next) {
+			node = (struct cache_node *)pos;
+			pthread_mutex_lock(&node->cn_mutex);
+			cache->flush(node);
+			pthread_mutex_unlock(&node->cn_mutex);
+		}
+		pthread_mutex_unlock(&hash->ch_mutex);
+	}
 }
 
 #define	HASH_REPORT	(3*HASH_CACHE_RATIO)

--- a/xfsprogs/libxfs/rdwr.c	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/libxfs/rdwr.c	2006-07-27 16:40:56.612373938 +1000
@@ -416,6 +416,15 @@ libxfs_iomove(xfs_buf_t *bp, uint boff, 
 }
 
 static void
+libxfs_bflush(struct cache_node *node)
+{
+	xfs_buf_t		*bp = (xfs_buf_t *)node;
+
+	if ((bp != NULL) && (bp->b_flags & LIBXFS_B_DIRTY))
+		libxfs_writebufr(bp);
+}
+
+static void
 libxfs_brelse(struct cache_node *node)
 {
 	xfs_buf_t		*bp = (xfs_buf_t *)node;
@@ -442,9 +451,16 @@ libxfs_bcache_purge(void)
 	cache_purge(libxfs_bcache);
 }
 
+void 
+libxfs_bcache_flush(void)
+{
+	cache_flush(libxfs_bcache);
+}
+
 struct cache_operations libxfs_bcache_operations = {
 	/* .hash */	libxfs_bhash,
 	/* .alloc */	libxfs_balloc,
+	/* .flush */	libxfs_bflush,
 	/* .relse */	libxfs_brelse,
 	/* .compare */	libxfs_bcompare,
 	/* .bulkrelse */ NULL	/* TODO: lio_listio64 interface? */
@@ -649,6 +665,7 @@ libxfs_icache_purge(void)
 struct cache_operations libxfs_icache_operations = {
 	/* .hash */	libxfs_ihash,
 	/* .alloc */	libxfs_ialloc,
+	/* .flush */	NULL,
 	/* .relse */	libxfs_irelse,
 	/* .compare */	libxfs_icompare,
 	/* .bulkrelse */ NULL

--- a/xfsprogs/libxfs/xfs.h	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/libxfs/xfs.h	2006-07-25 12:05:31.917896892 +1000
@@ -98,6 +98,7 @@
 #define xfs_bmapi_single		libxfs_bmapi_single
 #define xfs_bmap_finish			libxfs_bmap_finish
 #define xfs_bmap_del_free		libxfs_bmap_del_free
+#define xfs_bmap_last_offset		libxfs_bmap_last_offset
 #define xfs_bunmapi			libxfs_bunmapi
 #define xfs_free_extent			libxfs_free_extent
 #define xfs_rtfree_extent		libxfs_rtfree_extent

--- a/xfsprogs/repair/phase6.c	2006-07-28 11:46:09.000000000 +1000
+++ b/xfsprogs/repair/phase6.c	2006-07-28 11:30:50.033905530 +1000
@@ -36,43 +36,35 @@ static int orphanage_entered;
 
 /*
  * Data structures and routines to keep track of directory entries
- * and whether their leaf entry has been seen
+ * and whether their leaf entry has been seen. Also used for name
+ * duplicate checking and rebuilding step if required.
  */
 typedef struct dir_hash_ent {
-	struct dir_hash_ent	*next;	/* pointer to next entry */
+	struct dir_hash_ent	*nextbyaddr;/* pointer to next entry */
+	struct dir_hash_ent	*nextbyhash;
+	struct dir_hash_ent	*nextbyorder;
 	xfs_dir2_leaf_entry_t	ent;	/* address and hash value */
+	xfs_ino_t 		inum;	/* inode of name */
 	short			junkit;	/* name starts with / */
 	short			seen;	/* have seen leaf entry */
+	int	  	    	namelen;/* length of name */
+	uchar_t    	    	*name;	/* pointer to name (no NULL) */
 } dir_hash_ent_t;
 
 typedef struct dir_hash_tab {
 	int			size;	/* size of hash table */
-	dir_hash_ent_t		*tab[1];/* actual hash table, variable size */
+	int			names_duped;
+	dir_hash_ent_t		*first;
+	dir_hash_ent_t		*last;
+	dir_hash_ent_t		**byhash;/* actual hash table, variable size */
+	dir_hash_ent_t		**byaddr;/* actual hash table, variable size */
 } dir_hash_tab_t;
+
 #define	DIR_HASH_TAB_SIZE(n)	\
-	(offsetof(dir_hash_tab_t, tab) + (sizeof(dir_hash_ent_t *) * (n)))
+	(sizeof(dir_hash_tab_t) + (sizeof(dir_hash_ent_t *) * (n) * 2))
 #define	DIR_HASH_FUNC(t,a)	((a) % (t)->size)
 
 /*
- * Track names to check for duplicates in a directory.
- */
-
-typedef struct name_hash_ent {
-	struct name_hash_ent	*next;	/* pointer to next entry */
-	xfs_dahash_t		hashval;/* hash value of name */
-	int	  	    	namelen;/* length of name */
-	uchar_t    	    	*name;	/* pointer to name (no NULL) */
-} name_hash_ent_t;		
-
-typedef struct name_hash_tab {
-	int			size;	/* size of hash table */
-	name_hash_ent_t		*tab[1];/* actual hash table, variable size */
-} name_hash_tab_t;
-#define	NAME_HASH_TAB_SIZE(n)	\
-	(offsetof(name_hash_tab_t, tab) + (sizeof(name_hash_ent_t *) * (n)))
-#define	NAME_HASH_FUNC(t,a)	((a) % (t)->size)
-
-/*
  * Track the contents of the freespace table in a directory.
  */
 typedef struct freetab {
@@ -94,28 +86,75 @@ typedef struct freetab {
 #define	DIR_HASH_CK_BADSTALE	5
 #define	DIR_HASH_CK_TOTAL	6
 
-static void
+/*
+ * Returns 0 if the name already exists (ie. a duplicate)
+ */
+static int
 dir_hash_add(
 	dir_hash_tab_t		*hashtab,
-	xfs_dahash_t		hash,
-	xfs_dir2_dataptr_t	addr,
-	int			junk)
-{
-	int			i;
+	xfs_dir2_dataptr_t	addr,	
+	xfs_ino_t		inum,
+	int			namelen,
+	uchar_t			*name)
+{
+	xfs_dahash_t		hash = 0;
+	int			byaddr;
+	int			byhash = 0;
 	dir_hash_ent_t		*p;
-
-	i = DIR_HASH_FUNC(hashtab, addr);
+	int			dup;
+	short			junk;
+	
+	junk = name[0] == '/';
+	byaddr = DIR_HASH_FUNC(hashtab, addr);
+	dup = 0;
+
+	if (!junk) {
+		hash = libxfs_da_hashname(name, namelen);
+		byhash = DIR_HASH_FUNC(hashtab, hash);
+
+		/* 
+		 * search hash bucket for existing name.
+		 */
+		for (p = hashtab->byhash[byhash]; p; p = p->nextbyhash) {
+			if (p->ent.hashval == hash && p->namelen == namelen) {
+				if (memcmp(p->name, name, namelen) == 0) {
+					dup = 1;
+					break;
+				}
+			}
+		}
+	}
+	
 	if ((p = malloc(sizeof(*p))) == NULL)
 		do_error(_("malloc failed in dir_hash_add (%u bytes)\n"),
 			sizeof(*p));
-	p->next = hashtab->tab[i];
-	hashtab->tab[i] = p;
-	if (!(p->junkit = junk))
+	
+	p->nextbyaddr = hashtab->byaddr[byaddr];
+	hashtab->byaddr[byaddr] = p;
+	if (hashtab->last) 
+		hashtab->last->nextbyorder = p;
+	else
+		hashtab->first = p;
+	p->nextbyorder = NULL;
+	hashtab->last = p;
+	
+	if (!(p->junkit = junk)) {
 		p->ent.hashval = hash;
+		p->nextbyhash = hashtab->byhash[byhash];
+		hashtab->byhash[byhash] = p;
+	}
 	p->ent.address = addr;
+	p->inum = inum;
 	p->seen = 0;
+	p->namelen = namelen;
+	p->name = name;
+	
+	return !dup;
 }
 
+/*
+ * checks to see if any data entries are not in the leaf blocks 
+ */
 static int
 dir_hash_unseen(
 	dir_hash_tab_t	*hashtab)
@@ -124,7 +163,7 @@ dir_hash_unseen(
 	dir_hash_ent_t	*p;
 
 	for (i = 0; i < hashtab->size; i++) {
-		for (p = hashtab->tab[i]; p; p = p->next) {
+		for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
 			if (p->seen == 0)
 				return 1;
 		}
@@ -173,8 +212,10 @@ dir_hash_done(
 	dir_hash_ent_t	*p;
 
 	for (i = 0; i < hashtab->size; i++) {
-		for (p = hashtab->tab[i]; p; p = n) {
-			n = p->next;
+		for (p = hashtab->byaddr[i]; p; p = n) {
+			n = p->nextbyaddr;
+			if (hashtab->names_duped)
+				free(p->name);
 			free(p);
 		}
 	}
@@ -196,6 +237,10 @@ dir_hash_init(
 	if ((hashtab = calloc(DIR_HASH_TAB_SIZE(hsize), 1)) == NULL)
 		do_error(_("calloc failed in dir_hash_init\n"));
 	hashtab->size = hsize;
+	hashtab->byhash = (dir_hash_ent_t**)((char *)hashtab + 
+		sizeof(dir_hash_tab_t));
+	hashtab->byaddr = (dir_hash_ent_t**)((char *)hashtab + 
+		sizeof(dir_hash_tab_t) + sizeof(dir_hash_ent_t*) * hsize);
 	return hashtab;
 }
 
@@ -209,7 +254,7 @@ dir_hash_see(
 	dir_hash_ent_t		*p;
 
 	i = DIR_HASH_FUNC(hashtab, addr);
-	for (p = hashtab->tab[i]; p; p = p->next) {
+	for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
 		if (p->ent.address != addr)
 			continue;
 		if (p->seen)
@@ -222,6 +267,10 @@ dir_hash_see(
 	return DIR_HASH_CK_NODATA;
 }
 
+/*
+ * checks to make sure leafs match a data entry, and that the stale
+ * count is valid.
+ */
 static int
 dir_hash_see_all(
 	dir_hash_tab_t		*hashtab,
@@ -246,81 +295,25 @@ dir_hash_see_all(
 }
 
 /*
- * Returns 0 if the name already exists (ie. a duplicate)
+ * Convert name pointers into locally allocated memory
  */
-static int
-name_hash_add(
-	name_hash_tab_t		*nametab,
-	uchar_t			*name,
-	int			namelen)
+static void
+dir_hash_dup_names(dir_hash_tab_t *hashtab)
 {
-	xfs_dahash_t		hash;
-	int			i;
-	name_hash_ent_t		*p;
-
-	hash = libxfs_da_hashname(name, namelen);
-			
-	i = NAME_HASH_FUNC(nametab, hash);
-	
-	/* 
-	 * search hash bucket for existing name.
-	 */
-	for (p = nametab->tab[i]; p; p = p->next) {
-		if (p->hashval == hash && p->namelen == namelen) {
-			if (memcmp(p->name, name, namelen) == 0) 
-				return 0; /* exists */
-		}
-	}
-	
-	if ((p = malloc(sizeof(*p))) == NULL)
-		do_error(_("malloc failed in name_hash_add (%u bytes)\n"),
-			sizeof(*p));
+	uchar_t			*name;
+	dir_hash_ent_t		*p;
 	
-	p->next = nametab->tab[i];
-	p->hashval = hash;
-	p->name = name;
-	p->namelen = namelen;
-	nametab->tab[i] = p;
+	if (hashtab->names_duped)
+		return;
 	
-	return 1;	/* success, no duplicate */
-}
-
-static name_hash_tab_t *
-name_hash_init(
-	xfs_fsize_t	size)
-{
-	name_hash_tab_t	*nametab;
-	int		hsize;
-
-	hsize = size / (16 * 4);
-	if (hsize > 1024)
-		hsize = 1024;
-	else if (hsize < 16)
-		hsize = 16;
-	if ((nametab = calloc(NAME_HASH_TAB_SIZE(hsize), 1)) == NULL)
-		do_error(_("calloc failed in name_hash_init\n"));
-	nametab->size = hsize;
-	return nametab;
-}
-
-static void
-name_hash_done(
-	name_hash_tab_t	*nametab)
-{
-	int		i;
-	name_hash_ent_t	*n;
-	name_hash_ent_t	*p;
-
-	for (i = 0; i < nametab->size; i++) {
-		for (p = nametab->tab[i]; p; p = n) {
-			n = p->next;
-			free(p);
-		}
+	for (p = hashtab->first; p; p = p->nextbyorder) {
+		name = malloc(p->namelen);
+		memcpy(name, p->name, p->namelen);
+		p->name = name;
 	}
-	free(nametab);
+	hashtab->names_duped = 1;
 }
 
-
 /*
  * Version 1 or 2 directory routine wrappers
 */
@@ -1385,7 +1378,8 @@ lf_block_dir_entry_check(xfs_mount_t		*m
 			dir_stack_t		*stack,
 			ino_tree_node_t		*current_irec,
 			int			current_ino_offset,
-			name_hash_tab_t		*nametab)
+			dir_hash_tab_t		*hashtab,
+			xfs_dablk_t		da_bno)
 {
 	xfs_dir_leaf_entry_t	*entry;
 	ino_tree_node_t		*irec;
@@ -1545,7 +1539,9 @@ lf_block_dir_entry_check(xfs_mount_t		*m
 		/*
 		 * check for duplicate names in directory.
 		 */ 
-		if (!name_hash_add(nametab, namest->name, entry->namelen)) {
+		if (!dir_hash_add(hashtab, (da_bno << mp->m_sb.sb_blocklog) + 
+						entry->nameidx, 
+				lino, entry->namelen, namest->name)) {
 			do_warn(
 		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
 				fname, lino, ino);
@@ -1635,7 +1631,7 @@ longform_dir_entry_check(xfs_mount_t	*mp
 			dir_stack_t	*stack,
 			ino_tree_node_t	*irec,
 			int		ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_dir_leafblock_t	*leaf;
 	xfs_buf_t		*bp;
@@ -1677,8 +1673,6 @@ longform_dir_entry_check(xfs_mount_t	*mp
 
 		leaf = (xfs_dir_leafblock_t *)XFS_BUF_PTR(bp);
 
-		da_bno = INT_GET(leaf->hdr.info.forw, ARCH_CONVERT);
-
 		if (INT_GET(leaf->hdr.info.magic, ARCH_CONVERT) !=
 		    XFS_DIR_LEAF_MAGIC)  {
 			if (!no_modify)  {
@@ -1699,9 +1693,11 @@ _("bad magic # (0x%x) for dir ino %llu l
 		}
 
 		if (!skipit)
-			lf_block_dir_entry_check(mp, ino, leaf, &dirty,
-						num_illegal, need_dot, stack,
-						irec, ino_offset, nametab);
+			lf_block_dir_entry_check(mp, ino, leaf, &dirty, 
+					num_illegal, need_dot, stack, irec, 
+					ino_offset, hashtab, da_bno);
+
+		da_bno = INT_GET(leaf->hdr.info.forw, ARCH_CONVERT);
 
 		ASSERT(dirty == 0 || (dirty && !no_modify));
 
@@ -1745,6 +1741,127 @@ _("can't map leaf block %d in dir %llu, 
 }
 
 /*
+ * Unexpected failure during the rebuild will leave the entries in
+ * lost+found on the next run
+ */
+
+static void 
+longform_dir2_rebuild(
+	xfs_mount_t	*mp,
+	xfs_ino_t	ino,
+	xfs_inode_t	*ip,
+	dir_hash_tab_t	*hashtab)
+{
+	int			error;
+	int			nres;
+	xfs_trans_t		*tp;
+	xfs_fileoff_t		lastblock;
+	xfs_fsblock_t		firstblock;
+	xfs_bmap_free_t		flist;
+	xfs_ino_t		parentino;
+	xfs_inode_t		*pip;
+	int			byhash;
+	dir_hash_ent_t		*p;
+	int			committed;
+	int			done;
+	
+	/* 
+	 * trash directory completely and rebuild from scratch using the
+	 * name/inode pairs in the hash table
+	 */
+	 
+	do_warn(_("rebuilding directory inode %llu\n"), ino);
+	
+	/* 
+	 * first attempt to locate the parent inode, if it can't be found,
+	 * we'll use the lost+found inode 
+	 */
+	byhash = DIR_HASH_FUNC(hashtab, libxfs_da_hashname((uchar_t*)"..", 2));
+	parentino = orphanage_ino;
+	for (p = hashtab->byhash[byhash]; p; p = p->nextbyhash) {
+		if (p->namelen == 2 && p->name[0] == '.' && p->name[1] == '.') {
+			parentino = p->inum;
+			break;
+		}
+	}
+
+	XFS_BMAP_INIT(&flist, &firstblock);
+		
+	tp = libxfs_trans_alloc(mp, 0);
+	nres = XFS_REMOVE_SPACE_RES(mp);
+	error = libxfs_trans_reserve(tp, nres, XFS_REMOVE_LOG_RES(mp), 0,
+			XFS_TRANS_PERM_LOG_RES, XFS_REMOVE_LOG_COUNT);
+	if (error)
+		res_failed(error);
+	libxfs_trans_ijoin(tp, ip, 0);
+	libxfs_trans_ihold(tp, ip);
+	
+	if ((error = libxfs_bmap_last_offset(tp, ip, &lastblock, 
+						XFS_DATA_FORK)))
+		do_error(_("xfs_bmap_last_offset failed -- error - %d\n"), 
+			error);
+	
+	/* re-init the directory to shortform */
+	if ((error = libxfs_trans_iget(mp, tp, parentino, 0, 0, &pip)))
+		do_error(
+		_("couldn't iget parent inode %llu -- error - %d\n"),
+			parentino, error);
+
+	/* free all data, leaf, node and freespace blocks */
+	
+	if ((error = libxfs_bunmapi(tp, ip, 0, lastblock, 
+			XFS_BMAPI_METADATA, 0, &firstblock, &flist,
+			&done))) 
+		do_error(_("xfs_bunmapi failed -- error - %d\n"), error);
+	ASSERT(done);
+
+	libxfs_dir2_init(tp, ip, pip);
+	
+	error = libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
+				
+	libxfs_trans_commit(tp, 
+			XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_SYNC, 0);
+		
+	/* go through the hash list and re-add the inodes */
+
+	for (p = hashtab->first; p; p = p->nextbyorder) {
+		
+		if (p->name[0] == '/' || (p->name[0] == '.' && (p->namelen == 1 
+				|| (p->namelen == 2 && p->name[1] == '.'))))
+			continue;
+		
+		tp = libxfs_trans_alloc(mp, 0);
+		nres = XFS_CREATE_SPACE_RES(mp, p->namelen);
+		if ((error = libxfs_trans_reserve(tp, nres, 
+				XFS_CREATE_LOG_RES(mp), 0,
+				XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT)))
+			do_error(
+	_("space reservation failed (%d), filesystem may be out of space\n"),
+				error);
+
+		libxfs_trans_ijoin(tp, ip, 0);
+		libxfs_trans_ihold(tp, ip);
+
+		XFS_BMAP_INIT(&flist, &firstblock);
+		if ((error = libxfs_dir2_createname(tp, ip, (char*)p->name, 
+				p->namelen, p->inum, &firstblock, &flist, nres)))
+			do_error(
+_("name create failed in ino %llu (%d), filesystem may be out of space\n"),
+				ino, error);
+
+		if ((error = libxfs_bmap_finish(&tp, &flist, firstblock, 
+				&committed)))
+			do_error(
+	_("bmap finish failed (%d), filesystem may be out of space\n"),
+				error);
+
+		libxfs_trans_commit(tp, 
+				XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_SYNC, 0);
+	}
+}
+
+
+/*
  * Kill a block in a version 2 inode.
  * Makes its own transaction.
  */
@@ -1807,7 +1924,6 @@ longform_dir2_entry_check_data(
 	xfs_dabuf_t		**bpp,
 	dir_hash_tab_t		*hashtab,
 	freetab_t		**freetabp,
-	name_hash_tab_t		*nametab,
 	xfs_dablk_t		da_bno,
 	int			isblock)
 {
@@ -1828,6 +1944,7 @@ longform_dir2_entry_check_data(
 	freetab_t		*freetab;
 	int			i;
 	int			ino_offset;
+	xfs_ino_t		inum;
 	ino_tree_node_t		*irec;
 	int			junkit;
 	int			lastfree;
@@ -1956,8 +2073,7 @@ longform_dir2_entry_check_data(
 	libxfs_trans_ijoin(tp, ip, 0);
 	libxfs_trans_ihold(tp, ip);
 	libxfs_da_bjoin(tp, bp);
-	if (isblock)
-		libxfs_da_bhold(tp, bp);
+	libxfs_da_bhold(tp, bp);
 	XFS_BMAP_INIT(&flist, &firstblock);
 	if (INT_GET(d->hdr.magic, ARCH_CONVERT) != wantmagic) {
 		do_warn(_("bad directory block magic # %#x for directory inode "
@@ -1987,7 +2103,7 @@ longform_dir2_entry_check_data(
 	while (ptr < endptr) {
 		dup = (xfs_dir2_data_unused_t *)ptr;
 		if (INT_GET(dup->freetag, ARCH_CONVERT) ==
-		    XFS_DIR2_DATA_FREE_TAG) {
+		    				XFS_DIR2_DATA_FREE_TAG) {
 			if (lastfree) {
 				do_warn(_("directory inode %llu block %u has "
 					  "consecutive free entries: "),
@@ -2011,10 +2127,24 @@ longform_dir2_entry_check_data(
 		addr = XFS_DIR2_DB_OFF_TO_DATAPTR(mp, db, ptr - (char *)d);
 		dep = (xfs_dir2_data_entry_t *)ptr;
 		ptr += XFS_DIR2_DATA_ENTSIZE(dep->namelen);
+		inum = INT_GET(dep->inumber, ARCH_CONVERT);
 		lastfree = 0;
-		dir_hash_add(hashtab,
-			libxfs_da_hashname((uchar_t *)dep->name, dep->namelen),
-			addr, dep->name[0] == '/');
+		if (!dir_hash_add(hashtab, addr, inum, dep->namelen, 
+				dep->name)) {
+			do_warn(
+		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
+				fname, inum, ip->i_ino);
+			if (!no_modify) {
+				if (verbose)
+					do_warn(
+					_(", marking entry to be junked\n"));
+				else
+					do_warn("\n");
+			} else {
+				do_warn(_(", would junk entry\n"));
+			}
+			dep->name[0] = '/';
+		}
 		/*
 		 * skip bogus entries (leading '/').  they'll be deleted
 		 * later.  must still log it, else we leak references to
@@ -2029,7 +2159,7 @@ longform_dir2_entry_check_data(
 		junkit = 0;
 		bcopy(dep->name, fname, dep->namelen);
 		fname[dep->namelen] = '\0';
-		ASSERT(INT_GET(dep->inumber, ARCH_CONVERT) != NULLFSINO);
+		ASSERT(inum != NULLFSINO);
 		/*
 		 * skip the '..' entry since it's checked when the
 		 * directory is reached by something else.  if it never
@@ -2039,7 +2169,7 @@ longform_dir2_entry_check_data(
 		if (dep->namelen == 2 && dep->name[0] == '.' &&
 		    dep->name[1] == '.')
 			continue;
-		ASSERT(no_modify || !verify_inum(mp, INT_GET(dep->inumber, ARCH_CONVERT)));
+		ASSERT(no_modify || !verify_inum(mp, inum));
 		/*
 		 * special case the . entry.  we know there's only one
 		 * '.' and only '.' points to itself because bogus entries
@@ -2049,7 +2179,7 @@ longform_dir2_entry_check_data(
 		 * '..' is already accounted for or will be taken care
 		 * of when directory is moved to orphanage.
 		 */
-		if (ip->i_ino == INT_GET(dep->inumber, ARCH_CONVERT))  {
+		if (ip->i_ino == inum)  {
 			ASSERT(dep->name[0] == '.' && dep->namelen == 1);
 			add_inode_ref(current_irec, current_ino_offset);
 			*need_dot = 0;
@@ -2062,23 +2192,18 @@ longform_dir2_entry_check_data(
 		 * just skip it.  no need to process it and it's ..
 		 * link is already accounted for.
 		 */
-		if (INT_GET(dep->inumber, ARCH_CONVERT) == orphanage_ino &&
-		    strcmp(fname, ORPHANAGE) == 0)
+		if (inum == orphanage_ino && strcmp(fname, ORPHANAGE) == 0)
 			continue;
 		/*
 		 * skip entries with bogus inumbers if we're in no modify mode
 		 */
-		if (no_modify &&
-		    verify_inum(mp, INT_GET(dep->inumber, ARCH_CONVERT)))
+		if (no_modify && verify_inum(mp, inum))
 			continue;
 		/*
 		 * ok, now handle the rest of the cases besides '.' and '..'
 		 */
-		irec = find_inode_rec(
-			XFS_INO_TO_AGNO(mp,
-				INT_GET(dep->inumber, ARCH_CONVERT)),
-			XFS_INO_TO_AGINO(mp,
-				INT_GET(dep->inumber, ARCH_CONVERT)));
+		irec = find_inode_rec(XFS_INO_TO_AGNO(mp, inum),
+					XFS_INO_TO_AGINO(mp, inum));
 		if (irec == NULL)  {
 			nbad++;
 			do_warn(_("entry \"%s\" in directory inode %llu points "
@@ -2093,9 +2218,7 @@ longform_dir2_entry_check_data(
 			}
 			continue;
 		}
-		ino_offset = XFS_INO_TO_AGINO(mp,
-				INT_GET(dep->inumber, ARCH_CONVERT)) -
-					irec->ino_startnum;
+		ino_offset = XFS_INO_TO_AGINO(mp, inum) - irec->ino_startnum;
 		/*
 		 * if it's a free inode, blow out the entry.
 		 * by now, any inode that we think is free
@@ -2106,18 +2229,13 @@ longform_dir2_entry_check_data(
 			 * don't complain if this entry points to the old
 			 * and now-free lost+found inode
 			 */
-			if (verbose || no_modify ||
-			    INT_GET(dep->inumber, ARCH_CONVERT) !=
-			    old_orphanage_ino)
+			if (verbose || no_modify || inum != old_orphanage_ino)
 				do_warn(
 	_("entry \"%s\" in directory inode %llu points to free inode %llu"),
-					fname, ip->i_ino,
-					INT_GET(dep->inumber, ARCH_CONVERT));
+					fname, ip->i_ino, inum);
 			nbad++;
 			if (!no_modify)  {
-				if (verbose ||
-				    INT_GET(dep->inumber, ARCH_CONVERT) !=
-				    old_orphanage_ino)
+				if (verbose || inum != old_orphanage_ino)
 					do_warn(
 					_(", marking entry to be junked\n"));
 				else
@@ -2130,28 +2248,6 @@ longform_dir2_entry_check_data(
 			continue;
 		}
 		/*
-		 * check for duplicate names in directory.
-		 */ 
-		if (!name_hash_add(nametab, dep->name, dep->namelen)) {
-			do_warn(
-		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
-				fname, INT_GET(dep->inumber, ARCH_CONVERT),
-				ip->i_ino);
-			nbad++;
-			if (!no_modify) {
-				if (verbose)
-					do_warn(
-					_(", marking entry to be junked\n"));
-				else
-					do_warn("\n");
-				dep->name[0] = '/';
-				libxfs_dir2_data_log_entry(tp, bp, dep);
-			} else {
-				do_warn(_(", would junk entry\n"));
-			}
-			continue;
-		}
-		/*
 		 * check easy case first, regular inode, just bump
 		 * the link count and continue
 		 */
@@ -2172,22 +2268,17 @@ longform_dir2_entry_check_data(
 			junkit = 1;
 			do_warn(
 _("entry \"%s\" in dir %llu points to an already connected directory inode %llu,\n"),
-				fname, ip->i_ino,
-				INT_GET(dep->inumber, ARCH_CONVERT));
+				fname, ip->i_ino, inum);
 		} else if (parent == ip->i_ino)  {
 			add_inode_reached(irec, ino_offset);
 			add_inode_ref(current_irec, current_ino_offset);
-			if (!is_inode_refchecked(
-				INT_GET(dep->inumber, ARCH_CONVERT), irec,
-					ino_offset))
-				push_dir(stack,
-					INT_GET(dep->inumber, ARCH_CONVERT));
+			if (!is_inode_refchecked(inum, irec, ino_offset))
+				push_dir(stack, inum);
 		} else  {
 			junkit = 1;
 			do_warn(
 _("entry \"%s\" in dir inode %llu inconsistent with .. value (%llu) in ino %llu,\n"),
-				fname, ip->i_ino, parent,
-				INT_GET(dep->inumber, ARCH_CONVERT));
+				fname, ip->i_ino, parent, inum);
 		}
 		if (junkit)  {
 			junkit = 0;
@@ -2195,9 +2286,7 @@ _("entry \"%s\" in dir inode %llu incons
 			if (!no_modify)  {
 				dep->name[0] = '/';
 				libxfs_dir2_data_log_entry(tp, bp, dep);
-				if (verbose ||
-				    INT_GET(dep->inumber, ARCH_CONVERT) !=
-				    old_orphanage_ino)
+				if (verbose || inum != old_orphanage_ino)
 					do_warn(
 					_("\twill clear entry \"%s\"\n"),
 						fname);
@@ -2212,8 +2301,6 @@ _("entry \"%s\" in dir inode %llu incons
 		libxfs_dir2_data_freescan(mp, d, &needlog, NULL);
 	if (needlog)
 		libxfs_dir2_data_log_header(tp, bp);
-	else if (!isblock && !nbad)
-		libxfs_da_brelse(tp, bp);
 	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
 	libxfs_trans_commit(tp, 0, 0);
 	freetab->ents[db].v = INT_GET(d->hdr.bestfree[0].length, ARCH_CONVERT);
@@ -2306,19 +2393,19 @@ longform_dir2_check_node(
 	xfs_fileoff_t		next_da_bno;
 	int			seeval = 0;
 	int			used;
-
+	
 	for (da_bno = mp->m_dirleafblk, next_da_bno = 0;
 	     next_da_bno != NULLFILEOFF && da_bno < mp->m_dirfreeblk;
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
+		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK)) 
 			break;
 		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
 				XFS_DATA_FORK)) {
-			do_error(
-			_("can't read block %u for directory inode %llu\n"),
+			do_warn(
+			_("can't read leaf block %u for directory inode %llu\n"),
 				da_bno, ip->i_ino);
-			/* NOTREACHED */
+			return 1;
 		}
 		leaf = bp->data;
 		if (INT_GET(leaf->hdr.info.magic, ARCH_CONVERT) !=
@@ -2348,23 +2435,24 @@ longform_dir2_check_node(
 		seeval = dir_hash_see_all(hashtab, leaf->ents, INT_GET(leaf->hdr.count, ARCH_CONVERT),
 			INT_GET(leaf->hdr.stale, ARCH_CONVERT));
 		libxfs_da_brelse(NULL, bp);
-		if (seeval != DIR_HASH_CK_OK)
+		if (seeval != DIR_HASH_CK_OK) 
 			return 1;
 	}
-	if (dir_hash_check(hashtab, ip, seeval))
+	if (dir_hash_check(hashtab, ip, seeval)) 
 		return 1;
+	
 	for (da_bno = mp->m_dirfreeblk, next_da_bno = 0;
 	     next_da_bno != NULLFILEOFF;
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
+		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK)) 
 			break;
 		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
 				XFS_DATA_FORK)) {
-			do_error(_("can't read block %u for directory inode "
-				   "%llu\n"),
+			do_warn(
+		_("can't read freespace block %u for directory inode %llu\n"),
 				da_bno, ip->i_ino);
-			/* NOTREACHED */
+			return 1;
 		}
 		free = bp->data;
 		fdb = XFS_DIR2_DA_TO_DB(mp, da_bno);
@@ -2418,388 +2506,9 @@ longform_dir2_check_node(
 }
 
 /*
- * Rebuild a directory: set up.
- * Turn it into a node-format directory with no contents in the
- * upper area.  Also has correct freespace blocks.
- */
-void
-longform_dir2_rebuild_setup(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	xfs_inode_t		*ip,
-	freetab_t		*freetab)
-{
-	xfs_da_args_t		args;
-	int			committed;
-	xfs_dir2_data_t		*data = NULL;
-	xfs_dabuf_t		*dbp;
-	int			error;
-	xfs_dir2_db_t		fbno;
-	xfs_dabuf_t		*fbp;
-	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
-	xfs_dir2_free_t		*free;
-	int			i;
-	int			j;
-	xfs_dablk_t		lblkno;
-	xfs_dabuf_t		*lbp;
-	xfs_dir2_leaf_t		*leaf;
-	int			nres;
-	xfs_trans_t		*tp;
-
-	/* read first directory block */
-	tp = libxfs_trans_alloc(mp, 0);
-	nres = XFS_DAENTER_SPACE_RES(mp, XFS_DATA_FORK);
-	error = libxfs_trans_reserve(tp,
-		nres, XFS_CREATE_LOG_RES(mp), 0, XFS_TRANS_PERM_LOG_RES,
-		XFS_CREATE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	if (libxfs_da_read_buf(tp, ip, mp->m_dirdatablk, -2, &dbp,
-			XFS_DATA_FORK)) {
-		do_error(_("can't read block %u for directory inode %llu\n"),
-			mp->m_dirdatablk, ino);
-		/* NOTREACHED */
-	}
-
-	if (dbp)
-		data = dbp->data;
-
-	/* check for block format directory */
-	if (data &&
-	    INT_GET((data)->hdr.magic, ARCH_CONVERT) == XFS_DIR2_BLOCK_MAGIC) {
-		xfs_dir2_block_t	*block;
-		xfs_dir2_leaf_entry_t	*blp;
-		xfs_dir2_block_tail_t	*btp;
-		int			needlog;
-		int			needscan;
-
-		/* convert directory block from block format to data format */
-		INT_SET(data->hdr.magic, ARCH_CONVERT, XFS_DIR2_DATA_MAGIC);
-
-		/* construct freelist */
-		block = (xfs_dir2_block_t *)data;
-		btp = XFS_DIR2_BLOCK_TAIL_P(mp, block);
-		blp = XFS_DIR2_BLOCK_LEAF_P(btp);
-		needlog = needscan = 0;
-		libxfs_dir2_data_make_free(tp, dbp, (char *)blp - (char *)block,
-			(char *)block + mp->m_dirblksize - (char *)blp,
-			&needlog, &needscan);
-		if (needscan)
-			libxfs_dir2_data_freescan(mp, data, &needlog, NULL);
-		libxfs_da_log_buf(tp, dbp, 0, mp->m_dirblksize - 1);
-	} else if (dbp) {
-		libxfs_da_brelse(tp, dbp);
-	}
-
-	/* allocate blocks for btree */
-	bzero(&args, sizeof(args));
-	args.trans = tp;
-	args.dp = ip;
-	args.whichfork = XFS_DATA_FORK;
-	args.firstblock = &firstblock;
-	args.flist = &flist;
-	args.total = nres;
-	if ((error = libxfs_da_grow_inode(&args, &lblkno)) ||
-	    (error = libxfs_da_get_buf(tp, ip, lblkno, -1, &lbp, XFS_DATA_FORK))) {
-		do_error(_("can't add btree block to directory inode %llu\n"),
-			ino);
-		/* NOTREACHED */
-	}
-	leaf = lbp->data;
-	bzero(leaf, mp->m_dirblksize);
-	INT_SET(leaf->hdr.info.magic, ARCH_CONVERT, XFS_DIR2_LEAFN_MAGIC);
-	libxfs_da_log_buf(tp, lbp, 0, mp->m_dirblksize - 1);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-
-	for (i = 0; i < freetab->nents; i += XFS_DIR2_MAX_FREE_BESTS(mp)) {
-		tp = libxfs_trans_alloc(mp, 0);
-		nres = XFS_DAENTER_SPACE_RES(mp, XFS_DATA_FORK);
-		error = libxfs_trans_reserve(tp,
-			nres, XFS_CREATE_LOG_RES(mp), 0, XFS_TRANS_PERM_LOG_RES,
-			XFS_CREATE_LOG_COUNT);
-		if (error)
-			res_failed(error);
-		libxfs_trans_ijoin(tp, ip, 0);
-		libxfs_trans_ihold(tp, ip);
-		XFS_BMAP_INIT(&flist, &firstblock);
-		bzero(&args, sizeof(args));
-		args.trans = tp;
-		args.dp = ip;
-		args.whichfork = XFS_DATA_FORK;
-		args.firstblock = &firstblock;
-		args.flist = &flist;
-		args.total = nres;
-		if ((error = libxfs_dir2_grow_inode(&args, XFS_DIR2_FREE_SPACE,
-						 &fbno)) ||
-		    (error = libxfs_da_get_buf(tp, ip, XFS_DIR2_DB_TO_DA(mp, fbno),
-					    -1, &fbp, XFS_DATA_FORK))) {
-			do_error(_("can't add free block to directory inode "
-				   "%llu\n"),
-				ino);
-			/* NOTREACHED */
-		}
-		free = fbp->data;
-		bzero(free, mp->m_dirblksize);
-		INT_SET(free->hdr.magic, ARCH_CONVERT, XFS_DIR2_FREE_MAGIC);
-		INT_SET(free->hdr.firstdb, ARCH_CONVERT, i);
-		INT_SET(free->hdr.nvalid, ARCH_CONVERT, XFS_DIR2_MAX_FREE_BESTS(mp));
-		if (i + INT_GET(free->hdr.nvalid, ARCH_CONVERT) > freetab->nents)
-			INT_SET(free->hdr.nvalid, ARCH_CONVERT, freetab->nents - i);
-		for (j = 0; j < INT_GET(free->hdr.nvalid, ARCH_CONVERT); j++) {
-			INT_SET(free->bests[j], ARCH_CONVERT, freetab->ents[i + j].v);
-			if (INT_GET(free->bests[j], ARCH_CONVERT) != NULLDATAOFF)
-				INT_MOD(free->hdr.nused, ARCH_CONVERT, +1);
-		}
-		libxfs_da_log_buf(tp, fbp, 0, mp->m_dirblksize - 1);
-		libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-		libxfs_trans_commit(tp, 0, 0);
-	}
-}
-
-/*
- * Rebuild the entries from a single data block.
- */
-void
-longform_dir2_rebuild_data(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	xfs_inode_t		*ip,
-	xfs_dablk_t		da_bno)
-{
-	xfs_dabuf_t		*bp;
-	xfs_dir2_block_tail_t	*btp;
-	int			committed;
-	xfs_dir2_data_t		*data;
-	xfs_dir2_db_t		dbno;
-	xfs_dir2_data_entry_t	*dep;
-	xfs_dir2_data_unused_t	*dup;
-	char			*endptr;
-	int			error;
-	xfs_dir2_free_t		*fblock;
-	xfs_dabuf_t		*fbp;
-	xfs_dir2_db_t		fdb;
-	int			fi;
-	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
-	int			needlog;
-	int			needscan;
-	int			nres;
-	char			*ptr;
-	xfs_trans_t		*tp;
-
-	if (libxfs_da_read_buf(NULL, ip, da_bno, da_bno == 0 ? -2 : -1, &bp,
-			XFS_DATA_FORK)) {
-		do_error(_("can't read block %u for directory inode %llu\n"),
-			da_bno, ino);
-		/* NOTREACHED */
-	}
-	if (da_bno == 0 && bp == NULL)
-		/*
-		 * The block was punched out.
-		 */
-		return;
-	ASSERT(bp);
-	dbno = XFS_DIR2_DA_TO_DB(mp, da_bno);
-	fdb = XFS_DIR2_DB_TO_FDB(mp, dbno);
-	if (libxfs_da_read_buf(NULL, ip, XFS_DIR2_DB_TO_DA(mp, fdb), -1, &fbp,
-			XFS_DATA_FORK)) {
-		do_error(_("can't read block %u for directory inode %llu\n"),
-			XFS_DIR2_DB_TO_DA(mp, fdb), ino);
-		/* NOTREACHED */
-	}
-	data = malloc(mp->m_dirblksize);
-	if (!data) {
-		do_error(
-		_("malloc failed in longform_dir2_rebuild_data (%u bytes)\n"),
-			mp->m_dirblksize);
-		exit(1);
-	}
-	bcopy(bp->data, data, mp->m_dirblksize);
-	ptr = (char *)data->u;
-	if (INT_GET(data->hdr.magic, ARCH_CONVERT) == XFS_DIR2_BLOCK_MAGIC) {
-		btp = XFS_DIR2_BLOCK_TAIL_P(mp, (xfs_dir2_block_t *)data);
-		endptr = (char *)XFS_DIR2_BLOCK_LEAF_P(btp);
-	} else
-		endptr = (char *)data + mp->m_dirblksize;
-	fblock = fbp->data;
-	fi = XFS_DIR2_DB_TO_FDINDEX(mp, dbno);
-	tp = libxfs_trans_alloc(mp, 0);
-	error = libxfs_trans_reserve(tp, 0, XFS_CREATE_LOG_RES(mp), 0,
-		XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	libxfs_da_bjoin(tp, bp);
-	libxfs_da_bhold(tp, bp);
-	libxfs_da_bjoin(tp, fbp);
-	libxfs_da_bhold(tp, fbp);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	needlog = needscan = 0;
-	bzero(((xfs_dir2_data_t *)(bp->data))->hdr.bestfree,
-		sizeof(data->hdr.bestfree));
-	libxfs_dir2_data_make_free(tp, bp, (xfs_dir2_data_aoff_t)sizeof(data->hdr),
-		mp->m_dirblksize - sizeof(data->hdr), &needlog, &needscan);
-	ASSERT(needscan == 0);
-	libxfs_dir2_data_log_header(tp, bp);
-	INT_SET(fblock->bests[fi], ARCH_CONVERT,
-		INT_GET(((xfs_dir2_data_t *)(bp->data))->hdr.bestfree[0].length, ARCH_CONVERT));
-	libxfs_dir2_free_log_bests(tp, fbp, fi, fi);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-
-	while (ptr < endptr) {
-		dup = (xfs_dir2_data_unused_t *)ptr;
-		if (INT_GET(dup->freetag, ARCH_CONVERT) == XFS_DIR2_DATA_FREE_TAG) {
-			ptr += INT_GET(dup->length, ARCH_CONVERT);
-			continue;
-		}
-		dep = (xfs_dir2_data_entry_t *)ptr;
-		ptr += XFS_DIR2_DATA_ENTSIZE(dep->namelen);
-		if (dep->name[0] == '/')
-			continue;
-		tp = libxfs_trans_alloc(mp, 0);
-		nres = XFS_CREATE_SPACE_RES(mp, dep->namelen);
-		error = libxfs_trans_reserve(tp, nres, XFS_CREATE_LOG_RES(mp), 0,
-			XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
-		if (error)
-			res_failed(error);
-		libxfs_trans_ijoin(tp, ip, 0);
-		libxfs_trans_ihold(tp, ip);
-		libxfs_da_bjoin(tp, bp);
-		libxfs_da_bhold(tp, bp);
-		libxfs_da_bjoin(tp, fbp);
-		libxfs_da_bhold(tp, fbp);
-		XFS_BMAP_INIT(&flist, &firstblock);
-		error = dir_createname(mp, tp, ip, (char *)dep->name,
-			dep->namelen, INT_GET(dep->inumber, ARCH_CONVERT),
-			&firstblock, &flist, nres);
-		ASSERT(error == 0);
-		libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-		libxfs_trans_commit(tp, 0, 0);
-	}
-	libxfs_da_brelse(NULL, bp);
-	libxfs_da_brelse(NULL, fbp);
-	free(data);
-}
-
-/*
- * Finish the rebuild of a directory.
- * Stuff / in and then remove it, this forces the directory to end
- * up in the right format.
- */
-void
-longform_dir2_rebuild_finish(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	xfs_inode_t		*ip)
-{
-	int			committed;
-	int			error;
-	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
-	int			nres;
-	xfs_trans_t		*tp;
-
-	tp = libxfs_trans_alloc(mp, 0);
-	nres = XFS_CREATE_SPACE_RES(mp, 1);
-	error = libxfs_trans_reserve(tp, nres, XFS_CREATE_LOG_RES(mp), 0,
-		XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	error = dir_createname(mp, tp, ip, "/", 1, ino,
-			&firstblock, &flist, nres);
-	ASSERT(error == 0);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-
-	/* could kill trailing empty data blocks here */
-
-	tp = libxfs_trans_alloc(mp, 0);
-	nres = XFS_REMOVE_SPACE_RES(mp);
-	error = libxfs_trans_reserve(tp, nres, XFS_REMOVE_LOG_RES(mp), 0,
-		XFS_TRANS_PERM_LOG_RES, XFS_REMOVE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	error = dir_removename(mp, tp, ip, "/", 1, ino,
-			&firstblock, &flist, nres);
-	ASSERT(error == 0);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-}
-
-/*
- * Rebuild a directory.
- * Remove all the non-data blocks.
- * Re-initialize to (empty) node form.
- * Loop over the data blocks reinserting each entry.
- * Force the directory into the right format.
- */
-void
-longform_dir2_rebuild(
-	xfs_mount_t	*mp,
-	xfs_ino_t	ino,
-	xfs_inode_t	*ip,
-	int		*num_illegal,
-	freetab_t	*freetab,
-	int		isblock)
-{
-	xfs_dabuf_t	*bp;
-	xfs_dablk_t	da_bno;
-	xfs_fileoff_t	next_da_bno;
-
-	do_warn(_("rebuilding directory inode %llu\n"), ino);
-
-	/* kill leaf blocks */
-	for (da_bno = mp->m_dirleafblk, next_da_bno = isblock ? NULLFILEOFF : 0;
-	     next_da_bno != NULLFILEOFF;
-	     da_bno = (xfs_dablk_t)next_da_bno) {
-		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
-			break;
-		if (libxfs_da_get_buf(NULL, ip, da_bno, -1, &bp, XFS_DATA_FORK)) {
-			do_error(_("can't get block %u for directory inode "
-				   "%llu\n"),
-				da_bno, ino);
-			/* NOTREACHED */
-		}
-		dir2_kill_block(mp, ip, da_bno, bp);
-	}
-
-	/* rebuild empty btree and freelist */
-	longform_dir2_rebuild_setup(mp, ino, ip, freetab);
-
-	/* rebuild directory */
-	for (da_bno = mp->m_dirdatablk, next_da_bno = 0;
-	     da_bno < mp->m_dirleafblk && next_da_bno != NULLFILEOFF;
-	     da_bno = (xfs_dablk_t)next_da_bno) {
-		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
-			break;
-		longform_dir2_rebuild_data(mp, ino, ip, da_bno);
-	}
-
-	/* put the directory in the appropriate on-disk format */
-	longform_dir2_rebuild_finish(mp, ino, ip);
-	*num_illegal = 0;
-}
-
-/*
- * succeeds or dies, inode never gets dirtied since all changes
- * happen in file blocks.  the inode size and other core info
- * is already correct, it's just the leaf entries that get altered.
- * XXX above comment is wrong for v2 - need to see why it matters
+ * If a directory is corrupt, we need to read in as many entries as possible,
+ * destroy the entry and create a new one with recovered name/inode pairs.
+ * (ie. get libxfs to do all the grunt work)
  */
 void
 longform_dir2_entry_check(xfs_mount_t	*mp,
@@ -2810,15 +2519,14 @@ longform_dir2_entry_check(xfs_mount_t	*m
 			dir_stack_t	*stack,
 			ino_tree_node_t	*irec,
 			int		ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_dir2_block_t	*block;
 	xfs_dir2_leaf_entry_t	*blp;
-	xfs_dabuf_t		*bp;
+	xfs_dabuf_t		**bplist;
 	xfs_dir2_block_tail_t	*btp;
 	xfs_dablk_t		da_bno;
 	freetab_t		*freetab;
-	dir_hash_tab_t		*hashtab;
 	int			i;
 	int			isblock;
 	int			isleaf;
@@ -2840,6 +2548,7 @@ longform_dir2_entry_check(xfs_mount_t	*m
 		freetab->ents[i].v = NULLDATAOFF;
 		freetab->ents[i].s = 0;
 	}
+	bplist = calloc(freetab->naents, sizeof(xfs_dabuf_t*));
 	/* is this a block, leaf, or node directory? */
 	libxfs_dir2_isblock(NULL, ip, &isblock);
 	libxfs_dir2_isleaf(NULL, ip, &isleaf);
@@ -2847,50 +2556,58 @@ longform_dir2_entry_check(xfs_mount_t	*m
 	if (do_prefetch && !isblock)
 		prefetch_p6_dir2(mp, ip);
 
-	/* check directory data */
-	hashtab = dir_hash_init(ip->i_d.di_size);
+	/* check directory "data" blocks (ie. name/inode pairs) */
 	for (da_bno = 0, next_da_bno = 0;
 	     next_da_bno != NULLFILEOFF && da_bno < mp->m_dirleafblk;
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
+		ASSERT(XFS_DIR2_DA_TO_DB(mp, da_bno) < freetab->naents);
 		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
 			break;
-		if (libxfs_da_read_bufr(NULL, ip, da_bno,
-				da_bno == 0 ? -2 : -1, &bp, XFS_DATA_FORK)) {
-			do_error(_("can't read block %u for directory inode "
-				   "%llu\n"),
+		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, 
+				&bplist[XFS_DIR2_DA_TO_DB(mp, da_bno)], 
+				XFS_DATA_FORK)) {
+			do_warn(_(
+			"can't read data block %u for directory inode %llu\n"),
 				da_bno, ino);
-			/* NOTREACHED */
+			*num_illegal++;
+			continue;	/* try and read all "data" blocks */
 		}
-		/* is there a hole at the start? */
-		if (da_bno == 0 && bp == NULL)
-			continue;
 		longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot,
-			stack, irec, ino_offset, &bp, hashtab, &freetab, 
-			nametab, da_bno, isblock);
-		/* it releases the buffer unless isblock is set */
+				stack, irec, ino_offset, 
+				&bplist[XFS_DIR2_DA_TO_DB(mp, da_bno)], hashtab,  
+				&freetab, da_bno, isblock);
 	}
 	fixit = (*num_illegal != 0) || dir2_is_badino(ino);
 
 	/* check btree and freespace */
 	if (isblock) {
-		ASSERT(bp);
-		block = bp->data;
+		block = bplist[0]->data;
 		btp = XFS_DIR2_BLOCK_TAIL_P(mp, block);
 		blp = XFS_DIR2_BLOCK_LEAF_P(btp);
-		seeval = dir_hash_see_all(hashtab, blp, INT_GET(btp->count, ARCH_CONVERT), INT_GET(btp->stale, ARCH_CONVERT));
+		seeval = dir_hash_see_all(hashtab, blp, 
+				INT_GET(btp->count, ARCH_CONVERT), 
+				INT_GET(btp->stale, ARCH_CONVERT));
 		if (dir_hash_check(hashtab, ip, seeval))
 			fixit |= 1;
-		libxfs_da_brelse(NULL, bp);
 	} else if (isleaf) {
 		fixit |= longform_dir2_check_leaf(mp, ip, hashtab, freetab);
 	} else {
 		fixit |= longform_dir2_check_node(mp, ip, hashtab, freetab);
 	}
-	dir_hash_done(hashtab);
-	if (!no_modify && fixit)
-		longform_dir2_rebuild(mp, ino, ip, num_illegal, freetab,
-			isblock);
+	if (!no_modify && fixit) {
+		dir_hash_dup_names(hashtab);
+		for (i = 0; i < freetab->naents; i++) 
+			if (bplist[i])
+				libxfs_da_brelse(NULL, bplist[i]);
+		longform_dir2_rebuild(mp, ino, ip, hashtab);
+		*num_illegal = 0;
+	} else {
+		for (i = 0; i < freetab->naents; i++) 
+			if (bplist[i])
+				libxfs_da_brelse(NULL, bplist[i]);
+	}
+	
 	free(freetab);
 }
 
@@ -2906,7 +2623,7 @@ shortform_dir_entry_check(xfs_mount_t	*m
 			dir_stack_t	*stack,
 			ino_tree_node_t	*current_irec,
 			int		current_ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_ino_t		lino;
 	xfs_ino_t		parent;
@@ -3044,7 +2761,7 @@ _("entry \"%s\" in shortform dir %llu re
 		ASSERT(irec != NULL);
 
 		ino_offset = XFS_INO_TO_AGINO(mp, lino) - irec->ino_startnum;
-
+		
 		/*
 		 * if it's a free inode, blow out the entry.
 		 * by now, any inode that we think is free
@@ -3066,8 +2783,9 @@ _("entry \"%s\" in shortform dir inode %
 				do_warn(_("would junk entry \"%s\"\n"),
 					fname);
 			}
-		} else if (!name_hash_add(nametab, sf_entry->name, 
-					sf_entry->namelen)) {
+		} else if (!dir_hash_add(hashtab, 
+				(xfs_dir2_dataptr_t)(sf_entry - &sf->list[0]),
+				lino, sf_entry->namelen, sf_entry->name)) {
 			/*
 			 * check for duplicate names in directory.
 			 */ 
@@ -3311,7 +3029,7 @@ shortform_dir2_entry_check(xfs_mount_t	*
 			dir_stack_t	*stack,
 			ino_tree_node_t	*current_irec,
 			int		current_ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_ino_t		lino;
 	xfs_ino_t		parent;
@@ -3484,7 +3202,9 @@ shortform_dir2_entry_check(xfs_mount_t	*
 				do_warn(_("would junk entry \"%s\"\n"),
 					fname);
 			}
-		} else if (!name_hash_add(nametab, sfep->name, sfep->namelen)) {
+		} else if (!dir_hash_add(hashtab, (xfs_dir2_dataptr_t)
+					(sfep - XFS_DIR2_SF_FIRSTENTRY(sfp)),
+				lino, sfep->namelen, sfep->name)) {
 			/*
 			 * check for duplicate names in directory.
 			 */ 
@@ -3650,7 +3370,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
 	xfs_trans_t		*tp;
 	xfs_dahash_t		hashval;
 	ino_tree_node_t		*irec;
-	name_hash_tab_t		*nametab;
+	dir_hash_tab_t		*hashtab;
 	int			ino_offset, need_dot, committed;
 	int			dirty, num_illegal, error, nres;
 
@@ -3731,7 +3451,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
 
 		add_inode_refchecked(ino, irec, ino_offset);
 
-		nametab = name_hash_init(ip->i_d.di_size);
+		hashtab = dir_hash_init(ip->i_d.di_size);
 
 		/*
 		 * look for bogus entries
@@ -3750,13 +3470,13 @@ process_dirstack(xfs_mount_t *mp, dir_st
 							&num_illegal, &need_dot,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 			else
 				longform_dir_entry_check(mp, ino, ip,
 							&num_illegal, &need_dot,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 			break;
 		case XFS_DINODE_FMT_LOCAL:
 			tp = libxfs_trans_alloc(mp, 0);
@@ -3781,12 +3501,12 @@ process_dirstack(xfs_mount_t *mp, dir_st
 				shortform_dir2_entry_check(mp, ino, ip, &dirty,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 			else
 				shortform_dir_entry_check(mp, ino, ip, &dirty,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 
 			ASSERT(dirty == 0 || (dirty && !no_modify));
 			if (dirty)  {
@@ -3801,7 +3521,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
 		default:
 			break;
 		}
-		name_hash_done(nametab);
+		dir_hash_done(hashtab);
 
 		hashval = 0;
 




^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: Review: xfs_repair fixes for dir2 corruption
  2006-07-28 14:45             ` Madan Valluri
@ 2006-07-31  7:18               ` Barry Naujok
  0 siblings, 0 replies; 12+ messages in thread
From: Barry Naujok @ 2006-07-31  7:18 UTC (permalink / raw)
  To: 'Madan Valluri', 'Nathan Scott'; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 2203 bytes --]

 

> -----Original Message-----
> From: xfs-bounce@oss.sgi.com [mailto:xfs-bounce@oss.sgi.com] 
> On Behalf Of Madan Valluri
> Sent: Saturday, 29 July 2006 12:45 AM
> To: Nathan Scott
> Cc: Barry Naujok; xfs@oss.sgi.com
> Subject: Re: Review: xfs_repair fixes for dir2 corruption
> 
> Nathan Scott wrote:
> > On Fri, Jul 28, 2006 at 11:58:52AM +1000, Barry Naujok wrote:
> >   
> >> This patch addresses the following xfs_repair issues:
> >>     
> >
> > The libxfs cache stuff looks good to me.  Maybe Madan can cast
> > an eye over the repair changes for ya?
> >
> > cheers.
> >
> >   
> >>  
> 1) Since dir_hash_add can be called for both V1 and V2 
> directories its 
> second parameter type xfs_dir2_dataptr_t should be neutral.

Ok, made it a __uint32_t.

> 2) In dir_hash_add when dup is set, do you still need to add to the 
> nextbyhash by list?

Good pickup. I set junk to 1 as well in the dup check.

> 3) The following statement in   longform_dir2_rebuild looks odd. 
> Besides, FWIW, you can match "/."
> 
>                 if (p->name[0] == '/' || (p->name[0] == '.' && 
> (p->namelen == 1
>                                 || (p->namelen == 2 && 
> p->name[1] == '.'))))
>                         continue;
> 
> Consider:
> 
>                if (((p->name[0] == '/' || p->name[0] == '.') && 
> p->namelen == 1) ||
>                         (p->name[0] == '.' && p->name[1] == '.' && 
> p->namelen == 2))
>                         continue;
> 
> 4) Related to items 2&3, shouldn't the code be skipping 
> duplicate entries?

That's exactly what the above is doing, it's skipping all entries starting with a "/" as they have been marked as bad,
and the "." and ".." entries which already exist after the libxfs_dir2_init() call.

> 5) Can we do anything to minimize the do_error calls in 
> longform_dir2_rebuild? Seems like on a full file system, while 
> rebuilding say the root directory, matters can get wacky - 
> The directory 
> is being rebuilt and we have no further room. Sounds like 
> that this how 
> it has been....

Done, now it will do what it can, and I do a flush before the lost+found creation which still has the do_error() calls.

Updated diff attached.

Thanks,
Barry

[-- Attachment #2: diff.txt --]
[-- Type: text/plain, Size: 51023 bytes --]


===========================================================================
xfsprogs/include/cache.h
===========================================================================

--- a/xfsprogs/include/cache.h	2006-07-31 17:09:22.000000000 +1000
+++ b/xfsprogs/include/cache.h	2006-07-27 16:17:47.804986322 +1000
@@ -30,6 +30,7 @@ struct cache_node;
 typedef void *cache_key_t;
 typedef void (*cache_walk_t)(struct cache_node *);
 typedef struct cache_node * (*cache_node_alloc_t)(void);
+typedef void (*cache_node_flush_t)(struct cache_node *);
 typedef void (*cache_node_relse_t)(struct cache_node *);
 typedef unsigned int (*cache_node_hash_t)(cache_key_t, unsigned int);
 typedef int (*cache_node_compare_t)(struct cache_node *, cache_key_t);
@@ -38,6 +39,7 @@ typedef unsigned int (*cache_bulk_relse_
 struct cache_operations {
 	cache_node_hash_t	hash;
 	cache_node_alloc_t	alloc;
+	cache_node_flush_t	flush;
 	cache_node_relse_t	relse;
 	cache_node_compare_t	compare;
 	cache_bulk_relse_t	bulkrelse;	/* optional */
@@ -49,6 +51,7 @@ struct cache {
 	pthread_mutex_t		c_mutex;	/* node count mutex */
 	cache_node_hash_t	hash;		/* node hash function */
 	cache_node_alloc_t	alloc;		/* allocation function */
+	cache_node_flush_t	flush;		/* flush dirty data function */
 	cache_node_relse_t	relse;		/* memory free function */
 	cache_node_compare_t	compare;	/* comparison routine */
 	cache_bulk_relse_t	bulkrelse;	/* bulk release routine */
@@ -75,6 +78,7 @@ struct cache *cache_init(unsigned int, s
 void cache_destroy(struct cache *);
 void cache_walk(struct cache *, cache_walk_t);
 void cache_purge(struct cache *);
+void cache_flush(struct cache *);
 
 int cache_node_get(struct cache *, cache_key_t, struct cache_node **);
 void cache_node_put(struct cache_node *);

===========================================================================
xfsprogs/include/libxfs.h
===========================================================================

--- a/xfsprogs/include/libxfs.h	2006-07-31 17:09:22.000000000 +1000
+++ b/xfsprogs/include/libxfs.h	2006-07-31 16:42:58.384131150 +1000
@@ -257,6 +257,7 @@ extern int	libxfs_writebuf_int (xfs_buf_
 extern struct cache	*libxfs_bcache;
 extern struct cache_operations	libxfs_bcache_operations;
 extern void	libxfs_bcache_purge (void);
+extern void	libxfs_bcache_flush (void);
 extern xfs_buf_t	*libxfs_getbuf (dev_t, xfs_daddr_t, int);
 extern void	libxfs_putbuf (xfs_buf_t *);
 extern void	libxfs_purgebuf (xfs_buf_t *);
@@ -465,8 +466,11 @@ extern int	libxfs_bmapi_single(xfs_trans
 				xfs_fsblock_t *, xfs_fileoff_t);
 extern int	libxfs_bmap_finish (xfs_trans_t **, xfs_bmap_free_t *,
 				xfs_fsblock_t, int *);
+extern void	libxfs_bmap_cancel(xfs_bmap_free_t *);
 extern int	libxfs_bmap_next_offset (xfs_trans_t *, xfs_inode_t *,
 				xfs_fileoff_t *, int);
+extern int	libxfs_bmap_last_offset(xfs_trans_t *, xfs_inode_t *, 
+				xfs_fileoff_t *, int);
 extern int	libxfs_bunmapi (xfs_trans_t *, xfs_inode_t *, xfs_fileoff_t,
 				xfs_filblks_t, int, xfs_extnum_t,
 				xfs_fsblock_t *, xfs_bmap_free_t *, int *);

===========================================================================
xfsprogs/libxfs/cache.c
===========================================================================

--- a/xfsprogs/libxfs/cache.c	2006-07-31 17:09:22.000000000 +1000
+++ b/xfsprogs/libxfs/cache.c	2006-07-27 17:42:43.812685388 +1000
@@ -60,6 +60,7 @@ cache_init(
 	cache->c_hashsize = hashsize;
 	cache->hash = cache_operations->hash;
 	cache->alloc = cache_operations->alloc;
+	cache->flush = cache_operations->flush;
 	cache->relse = cache_operations->relse;
 	cache->compare = cache_operations->compare;
 	cache->bulkrelse = cache_operations->bulkrelse ?
@@ -422,6 +423,39 @@ cache_purge(
 		cache_abort();
 	}
 #endif
+	/* flush any remaining nodes to disk */
+	cache_flush(cache);
+}
+
+/*
+ * Flush all nodes in the cache to disk. 
+ */
+void
+cache_flush(
+	struct cache *		cache)
+{
+	struct cache_hash *	hash;
+	struct list_head *	head;
+	struct list_head *	pos;
+	struct cache_node *	node;
+	int			i;
+	
+	if (!cache->flush)
+		return;
+	
+	for (i = 0; i < cache->c_hashsize; i++) {
+		hash = &cache->c_hash[i];
+		
+		pthread_mutex_lock(&hash->ch_mutex);
+		head = &hash->ch_list;
+		for (pos = head->next; pos != head; pos = pos->next) {
+			node = (struct cache_node *)pos;
+			pthread_mutex_lock(&node->cn_mutex);
+			cache->flush(node);
+			pthread_mutex_unlock(&node->cn_mutex);
+		}
+		pthread_mutex_unlock(&hash->ch_mutex);
+	}
 }
 
 #define	HASH_REPORT	(3*HASH_CACHE_RATIO)

===========================================================================
xfsprogs/libxfs/rdwr.c
===========================================================================

--- a/xfsprogs/libxfs/rdwr.c	2006-07-31 17:09:22.000000000 +1000
+++ b/xfsprogs/libxfs/rdwr.c	2006-07-27 16:40:56.612373938 +1000
@@ -416,6 +416,15 @@ libxfs_iomove(xfs_buf_t *bp, uint boff, 
 }
 
 static void
+libxfs_bflush(struct cache_node *node)
+{
+	xfs_buf_t		*bp = (xfs_buf_t *)node;
+
+	if ((bp != NULL) && (bp->b_flags & LIBXFS_B_DIRTY))
+		libxfs_writebufr(bp);
+}
+
+static void
 libxfs_brelse(struct cache_node *node)
 {
 	xfs_buf_t		*bp = (xfs_buf_t *)node;
@@ -442,9 +451,16 @@ libxfs_bcache_purge(void)
 	cache_purge(libxfs_bcache);
 }
 
+void 
+libxfs_bcache_flush(void)
+{
+	cache_flush(libxfs_bcache);
+}
+
 struct cache_operations libxfs_bcache_operations = {
 	/* .hash */	libxfs_bhash,
 	/* .alloc */	libxfs_balloc,
+	/* .flush */	libxfs_bflush,
 	/* .relse */	libxfs_brelse,
 	/* .compare */	libxfs_bcompare,
 	/* .bulkrelse */ NULL	/* TODO: lio_listio64 interface? */
@@ -649,6 +665,7 @@ libxfs_icache_purge(void)
 struct cache_operations libxfs_icache_operations = {
 	/* .hash */	libxfs_ihash,
 	/* .alloc */	libxfs_ialloc,
+	/* .flush */	NULL,
 	/* .relse */	libxfs_irelse,
 	/* .compare */	libxfs_icompare,
 	/* .bulkrelse */ NULL

===========================================================================
xfsprogs/libxfs/xfs.h
===========================================================================

--- a/xfsprogs/libxfs/xfs.h	2006-07-31 17:09:22.000000000 +1000
+++ b/xfsprogs/libxfs/xfs.h	2006-07-31 16:26:28.040456032 +1000
@@ -97,7 +97,9 @@
 #define xfs_bmapi			libxfs_bmapi
 #define xfs_bmapi_single		libxfs_bmapi_single
 #define xfs_bmap_finish			libxfs_bmap_finish
+#define xfs_bmap_cancel			libxfs_bmap_cancel
 #define xfs_bmap_del_free		libxfs_bmap_del_free
+#define xfs_bmap_last_offset		libxfs_bmap_last_offset
 #define xfs_bunmapi			libxfs_bunmapi
 #define xfs_free_extent			libxfs_free_extent
 #define xfs_rtfree_extent		libxfs_rtfree_extent

===========================================================================
xfsprogs/repair/phase6.c
===========================================================================

--- a/xfsprogs/repair/phase6.c	2006-07-31 17:09:22.000000000 +1000
+++ b/xfsprogs/repair/phase6.c	2006-07-31 16:37:03.134895653 +1000
@@ -36,43 +36,36 @@ static int orphanage_entered;
 
 /*
  * Data structures and routines to keep track of directory entries
- * and whether their leaf entry has been seen
+ * and whether their leaf entry has been seen. Also used for name
+ * duplicate checking and rebuilding step if required.
  */
 typedef struct dir_hash_ent {
-	struct dir_hash_ent	*next;	/* pointer to next entry */
-	xfs_dir2_leaf_entry_t	ent;	/* address and hash value */
-	short			junkit;	/* name starts with / */
-	short			seen;	/* have seen leaf entry */
+	struct dir_hash_ent	*nextbyaddr;	/* next in addr bucket */
+	struct dir_hash_ent	*nextbyhash;	/* next in name bucket */
+	struct dir_hash_ent	*nextbyorder;	/* next in order added */
+	xfs_dahash_t		hashval;	/* hash value of name */
+	__uint32_t		address;	/* offset of data entry */
+	xfs_ino_t 		inum;		/* inode num of entry */
+	short			junkit;		/* name starts with / */
+	short			seen;		/* have seen leaf entry */
+	int	  	    	namelen;	/* length of name */
+	uchar_t    	    	*name;		/* pointer to name (no NULL) */
 } dir_hash_ent_t;
 
 typedef struct dir_hash_tab {
-	int			size;	/* size of hash table */
-	dir_hash_ent_t		*tab[1];/* actual hash table, variable size */
+	int			size;		/* size of hash tables */
+	int			names_duped;	/* 1 = ent names malloced */
+	dir_hash_ent_t		*first;		/* ptr to first added entry */
+	dir_hash_ent_t		*last;		/* ptr to last added entry */
+	dir_hash_ent_t		**byhash;	/* ptr to name hash buckets */
+	dir_hash_ent_t		**byaddr;	/* ptr to addr hash buckets */
 } dir_hash_tab_t;
+
 #define	DIR_HASH_TAB_SIZE(n)	\
-	(offsetof(dir_hash_tab_t, tab) + (sizeof(dir_hash_ent_t *) * (n)))
+	(sizeof(dir_hash_tab_t) + (sizeof(dir_hash_ent_t *) * (n) * 2))
 #define	DIR_HASH_FUNC(t,a)	((a) % (t)->size)
 
 /*
- * Track names to check for duplicates in a directory.
- */
-
-typedef struct name_hash_ent {
-	struct name_hash_ent	*next;	/* pointer to next entry */
-	xfs_dahash_t		hashval;/* hash value of name */
-	int	  	    	namelen;/* length of name */
-	uchar_t    	    	*name;	/* pointer to name (no NULL) */
-} name_hash_ent_t;		
-
-typedef struct name_hash_tab {
-	int			size;	/* size of hash table */
-	name_hash_ent_t		*tab[1];/* actual hash table, variable size */
-} name_hash_tab_t;
-#define	NAME_HASH_TAB_SIZE(n)	\
-	(offsetof(name_hash_tab_t, tab) + (sizeof(name_hash_ent_t *) * (n)))
-#define	NAME_HASH_FUNC(t,a)	((a) % (t)->size)
-
-/*
  * Track the contents of the freespace table in a directory.
  */
 typedef struct freetab {
@@ -94,28 +87,78 @@ typedef struct freetab {
 #define	DIR_HASH_CK_BADSTALE	5
 #define	DIR_HASH_CK_TOTAL	6
 
-static void
+/*
+ * Returns 0 if the name already exists (ie. a duplicate)
+ */
+static int
 dir_hash_add(
 	dir_hash_tab_t		*hashtab,
-	xfs_dahash_t		hash,
-	xfs_dir2_dataptr_t	addr,
-	int			junk)
-{
-	int			i;
+	__uint32_t		addr,	
+	xfs_ino_t		inum,
+	int			namelen,
+	uchar_t			*name)
+{
+	xfs_dahash_t		hash = 0;
+	int			byaddr;
+	int			byhash = 0;
 	dir_hash_ent_t		*p;
-
-	i = DIR_HASH_FUNC(hashtab, addr);
+	int			dup;
+	short			junk;
+	
+	ASSERT(!hashtab->names_duped);
+	
+	junk = name[0] == '/';
+	byaddr = DIR_HASH_FUNC(hashtab, addr);
+	dup = 0;
+
+	if (!junk) {
+		hash = libxfs_da_hashname(name, namelen);
+		byhash = DIR_HASH_FUNC(hashtab, hash);
+
+		/* 
+		 * search hash bucket for existing name.
+		 */
+		for (p = hashtab->byhash[byhash]; p; p = p->nextbyhash) {
+			if (p->hashval == hash && p->namelen == namelen) {
+				if (memcmp(p->name, name, namelen) == 0) {
+					dup = 1;
+					junk = 1;
+					break;
+				}
+			}
+		}
+	}
+	
 	if ((p = malloc(sizeof(*p))) == NULL)
 		do_error(_("malloc failed in dir_hash_add (%u bytes)\n"),
 			sizeof(*p));
-	p->next = hashtab->tab[i];
-	hashtab->tab[i] = p;
-	if (!(p->junkit = junk))
-		p->ent.hashval = hash;
-	p->ent.address = addr;
+	
+	p->nextbyaddr = hashtab->byaddr[byaddr];
+	hashtab->byaddr[byaddr] = p;
+	if (hashtab->last) 
+		hashtab->last->nextbyorder = p;
+	else
+		hashtab->first = p;
+	p->nextbyorder = NULL;
+	hashtab->last = p;
+	
+	if (!(p->junkit = junk)) {
+		p->hashval = hash;
+		p->nextbyhash = hashtab->byhash[byhash];
+		hashtab->byhash[byhash] = p;
+	}
+	p->address = addr;
+	p->inum = inum;
 	p->seen = 0;
+	p->namelen = namelen;
+	p->name = name;
+	
+	return !dup;
 }
 
+/*
+ * checks to see if any data entries are not in the leaf blocks 
+ */
 static int
 dir_hash_unseen(
 	dir_hash_tab_t	*hashtab)
@@ -124,7 +167,7 @@ dir_hash_unseen(
 	dir_hash_ent_t	*p;
 
 	for (i = 0; i < hashtab->size; i++) {
-		for (p = hashtab->tab[i]; p; p = p->next) {
+		for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
 			if (p->seen == 0)
 				return 1;
 		}
@@ -173,8 +216,10 @@ dir_hash_done(
 	dir_hash_ent_t	*p;
 
 	for (i = 0; i < hashtab->size; i++) {
-		for (p = hashtab->tab[i]; p; p = n) {
-			n = p->next;
+		for (p = hashtab->byaddr[i]; p; p = n) {
+			n = p->nextbyaddr;
+			if (hashtab->names_duped)
+				free(p->name);
 			free(p);
 		}
 	}
@@ -196,6 +241,10 @@ dir_hash_init(
 	if ((hashtab = calloc(DIR_HASH_TAB_SIZE(hsize), 1)) == NULL)
 		do_error(_("calloc failed in dir_hash_init\n"));
 	hashtab->size = hsize;
+	hashtab->byhash = (dir_hash_ent_t**)((char *)hashtab + 
+		sizeof(dir_hash_tab_t));
+	hashtab->byaddr = (dir_hash_ent_t**)((char *)hashtab + 
+		sizeof(dir_hash_tab_t) + sizeof(dir_hash_ent_t*) * hsize);
 	return hashtab;
 }
 
@@ -209,12 +258,12 @@ dir_hash_see(
 	dir_hash_ent_t		*p;
 
 	i = DIR_HASH_FUNC(hashtab, addr);
-	for (p = hashtab->tab[i]; p; p = p->next) {
-		if (p->ent.address != addr)
+	for (p = hashtab->byaddr[i]; p; p = p->nextbyaddr) {
+		if (p->address != addr)
 			continue;
 		if (p->seen)
 			return DIR_HASH_CK_DUPLEAF;
-		if (p->junkit == 0 && p->ent.hashval != hash)
+		if (p->junkit == 0 && p->hashval != hash)
 			return DIR_HASH_CK_BADHASH;
 		p->seen = 1;
 		return DIR_HASH_CK_OK;
@@ -222,6 +271,10 @@ dir_hash_see(
 	return DIR_HASH_CK_NODATA;
 }
 
+/*
+ * checks to make sure leafs match a data entry, and that the stale
+ * count is valid.
+ */
 static int
 dir_hash_see_all(
 	dir_hash_tab_t		*hashtab,
@@ -246,81 +299,26 @@ dir_hash_see_all(
 }
 
 /*
- * Returns 0 if the name already exists (ie. a duplicate)
+ * Convert name pointers into locally allocated memory.
+ * This must only be done after all the entries have been added.
  */
-static int
-name_hash_add(
-	name_hash_tab_t		*nametab,
-	uchar_t			*name,
-	int			namelen)
+static void
+dir_hash_dup_names(dir_hash_tab_t *hashtab)
 {
-	xfs_dahash_t		hash;
-	int			i;
-	name_hash_ent_t		*p;
-
-	hash = libxfs_da_hashname(name, namelen);
-			
-	i = NAME_HASH_FUNC(nametab, hash);
-	
-	/* 
-	 * search hash bucket for existing name.
-	 */
-	for (p = nametab->tab[i]; p; p = p->next) {
-		if (p->hashval == hash && p->namelen == namelen) {
-			if (memcmp(p->name, name, namelen) == 0) 
-				return 0; /* exists */
-		}
-	}
-	
-	if ((p = malloc(sizeof(*p))) == NULL)
-		do_error(_("malloc failed in name_hash_add (%u bytes)\n"),
-			sizeof(*p));
+	uchar_t			*name;
+	dir_hash_ent_t		*p;
 	
-	p->next = nametab->tab[i];
-	p->hashval = hash;
-	p->name = name;
-	p->namelen = namelen;
-	nametab->tab[i] = p;
+	if (hashtab->names_duped)
+		return;
 	
-	return 1;	/* success, no duplicate */
-}
-
-static name_hash_tab_t *
-name_hash_init(
-	xfs_fsize_t	size)
-{
-	name_hash_tab_t	*nametab;
-	int		hsize;
-
-	hsize = size / (16 * 4);
-	if (hsize > 1024)
-		hsize = 1024;
-	else if (hsize < 16)
-		hsize = 16;
-	if ((nametab = calloc(NAME_HASH_TAB_SIZE(hsize), 1)) == NULL)
-		do_error(_("calloc failed in name_hash_init\n"));
-	nametab->size = hsize;
-	return nametab;
-}
-
-static void
-name_hash_done(
-	name_hash_tab_t	*nametab)
-{
-	int		i;
-	name_hash_ent_t	*n;
-	name_hash_ent_t	*p;
-
-	for (i = 0; i < nametab->size; i++) {
-		for (p = nametab->tab[i]; p; p = n) {
-			n = p->next;
-			free(p);
-		}
+	for (p = hashtab->first; p; p = p->nextbyorder) {
+		name = malloc(p->namelen);
+		memcpy(name, p->name, p->namelen);
+		p->name = name;
 	}
-	free(nametab);
+	hashtab->names_duped = 1;
 }
 
-
 /*
  * Version 1 or 2 directory routine wrappers
 */
@@ -1385,7 +1383,8 @@ lf_block_dir_entry_check(xfs_mount_t		*m
 			dir_stack_t		*stack,
 			ino_tree_node_t		*current_irec,
 			int			current_ino_offset,
-			name_hash_tab_t		*nametab)
+			dir_hash_tab_t		*hashtab,
+			xfs_dablk_t		da_bno)
 {
 	xfs_dir_leaf_entry_t	*entry;
 	ino_tree_node_t		*irec;
@@ -1545,7 +1544,9 @@ lf_block_dir_entry_check(xfs_mount_t		*m
 		/*
 		 * check for duplicate names in directory.
 		 */ 
-		if (!name_hash_add(nametab, namest->name, entry->namelen)) {
+		if (!dir_hash_add(hashtab, (da_bno << mp->m_sb.sb_blocklog) + 
+						entry->nameidx, 
+				lino, entry->namelen, namest->name)) {
 			do_warn(
 		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
 				fname, lino, ino);
@@ -1635,7 +1636,7 @@ longform_dir_entry_check(xfs_mount_t	*mp
 			dir_stack_t	*stack,
 			ino_tree_node_t	*irec,
 			int		ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_dir_leafblock_t	*leaf;
 	xfs_buf_t		*bp;
@@ -1677,8 +1678,6 @@ longform_dir_entry_check(xfs_mount_t	*mp
 
 		leaf = (xfs_dir_leafblock_t *)XFS_BUF_PTR(bp);
 
-		da_bno = INT_GET(leaf->hdr.info.forw, ARCH_CONVERT);
-
 		if (INT_GET(leaf->hdr.info.magic, ARCH_CONVERT) !=
 		    XFS_DIR_LEAF_MAGIC)  {
 			if (!no_modify)  {
@@ -1699,9 +1698,11 @@ _("bad magic # (0x%x) for dir ino %llu l
 		}
 
 		if (!skipit)
-			lf_block_dir_entry_check(mp, ino, leaf, &dirty,
-						num_illegal, need_dot, stack,
-						irec, ino_offset, nametab);
+			lf_block_dir_entry_check(mp, ino, leaf, &dirty, 
+					num_illegal, need_dot, stack, irec, 
+					ino_offset, hashtab, da_bno);
+
+		da_bno = INT_GET(leaf->hdr.info.forw, ARCH_CONVERT);
 
 		ASSERT(dirty == 0 || (dirty && !no_modify));
 
@@ -1745,6 +1746,152 @@ _("can't map leaf block %d in dir %llu, 
 }
 
 /*
+ * Unexpected failure during the rebuild will leave the entries in
+ * lost+found on the next run
+ */
+
+static void 
+longform_dir2_rebuild(
+	xfs_mount_t	*mp,
+	xfs_ino_t	ino,
+	xfs_inode_t	*ip,
+	dir_hash_tab_t	*hashtab)
+{
+	int			error;
+	int			nres;
+	xfs_trans_t		*tp;
+	xfs_fileoff_t		lastblock;
+	xfs_fsblock_t		firstblock;
+	xfs_bmap_free_t		flist;
+	xfs_ino_t		parentino;
+	xfs_inode_t		*pip;
+	int			byhash;
+	dir_hash_ent_t		*p;
+	int			committed;
+	int			done;
+	
+	/* 
+	 * trash directory completely and rebuild from scratch using the
+	 * name/inode pairs in the hash table
+	 */
+	 
+	do_warn(_("rebuilding directory inode %llu\n"), ino);
+	
+	/* 
+	 * first attempt to locate the parent inode, if it can't be found,
+	 * we'll use the lost+found inode 
+	 */
+	byhash = DIR_HASH_FUNC(hashtab, libxfs_da_hashname((uchar_t*)"..", 2));
+	parentino = orphanage_ino;
+	for (p = hashtab->byhash[byhash]; p; p = p->nextbyhash) {
+		if (p->namelen == 2 && p->name[0] == '.' && p->name[1] == '.') {
+			parentino = p->inum;
+			break;
+		}
+	}
+
+	XFS_BMAP_INIT(&flist, &firstblock);
+		
+	tp = libxfs_trans_alloc(mp, 0);
+	nres = XFS_REMOVE_SPACE_RES(mp);
+	error = libxfs_trans_reserve(tp, nres, XFS_REMOVE_LOG_RES(mp), 0,
+			XFS_TRANS_PERM_LOG_RES, XFS_REMOVE_LOG_COUNT);
+	if (error)
+		res_failed(error);
+	libxfs_trans_ijoin(tp, ip, 0);
+	libxfs_trans_ihold(tp, ip);
+	
+	if ((error = libxfs_bmap_last_offset(tp, ip, &lastblock, 
+						XFS_DATA_FORK)))
+		do_error(_("xfs_bmap_last_offset failed -- error - %d\n"), 
+			error);
+	
+	/* re-init the directory to shortform */
+	if ((error = libxfs_trans_iget(mp, tp, parentino, 0, 0, &pip))) {
+		do_warn(
+		_("couldn't iget parent inode %llu -- error - %d\n"),
+			parentino, error);
+		/* we'll try to use the orphanage ino then */
+		parentino = orphanage_ino;
+		if ((error = libxfs_trans_iget(mp, tp, parentino, 0, 0, &pip)))
+			do_error(
+		_("couldn't iget lost+found inode %llu -- error - %d\n"),
+				parentino, error);
+	}
+
+	/* free all data, leaf, node and freespace blocks */
+	
+	if ((error = libxfs_bunmapi(tp, ip, 0, lastblock, 
+			XFS_BMAPI_METADATA, 0, &firstblock, &flist,
+			&done))) {
+		do_warn(_("xfs_bunmapi failed -- error - %d\n"), error);
+		libxfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES |
+					XFS_TRANS_ABORT);
+		return;
+	}
+		
+	ASSERT(done);
+
+	libxfs_dir2_init(tp, ip, pip);
+	
+	error = libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
+				
+	libxfs_trans_commit(tp, 
+			XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_SYNC, 0);
+		
+	/* go through the hash list and re-add the inodes */
+
+	for (p = hashtab->first; p; p = p->nextbyorder) {
+		
+		if (p->name[0] == '/' || (p->name[0] == '.' && (p->namelen == 1 
+				|| (p->namelen == 2 && p->name[1] == '.'))))
+			continue;
+		
+		tp = libxfs_trans_alloc(mp, 0);
+		nres = XFS_CREATE_SPACE_RES(mp, p->namelen);
+		if ((error = libxfs_trans_reserve(tp, nres, 
+				XFS_CREATE_LOG_RES(mp), 0,
+				XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT))) {
+			do_warn(
+	_("space reservation failed (%d), filesystem may be out of space\n"),
+				error);
+			break;
+		}
+
+		libxfs_trans_ijoin(tp, ip, 0);
+		libxfs_trans_ihold(tp, ip);
+
+		XFS_BMAP_INIT(&flist, &firstblock);
+		if ((error = libxfs_dir2_createname(tp, ip, (char*)p->name, 
+				p->namelen, p->inum, &firstblock, &flist, 
+				nres))) {
+			do_warn(
+_("name create failed in ino %llu (%d), filesystem may be out of space\n"),
+				ino, error);
+			libxfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES |
+						XFS_TRANS_ABORT);
+			break;
+		}
+
+		if ((error = libxfs_bmap_finish(&tp, &flist, firstblock, 
+				&committed))) {
+			do_warn(
+	_("bmap finish failed (%d), filesystem may be out of space\n"),
+				error);
+			libxfs_bmap_cancel(&flist);
+			libxfs_trans_cancel(tp, XFS_TRANS_RELEASE_LOG_RES |
+						XFS_TRANS_ABORT);
+			break;
+		}
+			
+
+		libxfs_trans_commit(tp, 
+				XFS_TRANS_RELEASE_LOG_RES|XFS_TRANS_SYNC, 0);
+	}
+}
+
+
+/*
  * Kill a block in a version 2 inode.
  * Makes its own transaction.
  */
@@ -1807,7 +1954,6 @@ longform_dir2_entry_check_data(
 	xfs_dabuf_t		**bpp,
 	dir_hash_tab_t		*hashtab,
 	freetab_t		**freetabp,
-	name_hash_tab_t		*nametab,
 	xfs_dablk_t		da_bno,
 	int			isblock)
 {
@@ -1828,6 +1974,7 @@ longform_dir2_entry_check_data(
 	freetab_t		*freetab;
 	int			i;
 	int			ino_offset;
+	xfs_ino_t		inum;
 	ino_tree_node_t		*irec;
 	int			junkit;
 	int			lastfree;
@@ -1956,8 +2103,7 @@ longform_dir2_entry_check_data(
 	libxfs_trans_ijoin(tp, ip, 0);
 	libxfs_trans_ihold(tp, ip);
 	libxfs_da_bjoin(tp, bp);
-	if (isblock)
-		libxfs_da_bhold(tp, bp);
+	libxfs_da_bhold(tp, bp);
 	XFS_BMAP_INIT(&flist, &firstblock);
 	if (INT_GET(d->hdr.magic, ARCH_CONVERT) != wantmagic) {
 		do_warn(_("bad directory block magic # %#x for directory inode "
@@ -1987,7 +2133,7 @@ longform_dir2_entry_check_data(
 	while (ptr < endptr) {
 		dup = (xfs_dir2_data_unused_t *)ptr;
 		if (INT_GET(dup->freetag, ARCH_CONVERT) ==
-		    XFS_DIR2_DATA_FREE_TAG) {
+		    				XFS_DIR2_DATA_FREE_TAG) {
 			if (lastfree) {
 				do_warn(_("directory inode %llu block %u has "
 					  "consecutive free entries: "),
@@ -2011,10 +2157,24 @@ longform_dir2_entry_check_data(
 		addr = XFS_DIR2_DB_OFF_TO_DATAPTR(mp, db, ptr - (char *)d);
 		dep = (xfs_dir2_data_entry_t *)ptr;
 		ptr += XFS_DIR2_DATA_ENTSIZE(dep->namelen);
+		inum = INT_GET(dep->inumber, ARCH_CONVERT);
 		lastfree = 0;
-		dir_hash_add(hashtab,
-			libxfs_da_hashname((uchar_t *)dep->name, dep->namelen),
-			addr, dep->name[0] == '/');
+		if (!dir_hash_add(hashtab, addr, inum, dep->namelen, 
+				dep->name)) {
+			do_warn(
+		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
+				fname, inum, ip->i_ino);
+			if (!no_modify) {
+				if (verbose)
+					do_warn(
+					_(", marking entry to be junked\n"));
+				else
+					do_warn("\n");
+			} else {
+				do_warn(_(", would junk entry\n"));
+			}
+			dep->name[0] = '/';
+		}
 		/*
 		 * skip bogus entries (leading '/').  they'll be deleted
 		 * later.  must still log it, else we leak references to
@@ -2029,7 +2189,7 @@ longform_dir2_entry_check_data(
 		junkit = 0;
 		bcopy(dep->name, fname, dep->namelen);
 		fname[dep->namelen] = '\0';
-		ASSERT(INT_GET(dep->inumber, ARCH_CONVERT) != NULLFSINO);
+		ASSERT(inum != NULLFSINO);
 		/*
 		 * skip the '..' entry since it's checked when the
 		 * directory is reached by something else.  if it never
@@ -2039,7 +2199,7 @@ longform_dir2_entry_check_data(
 		if (dep->namelen == 2 && dep->name[0] == '.' &&
 		    dep->name[1] == '.')
 			continue;
-		ASSERT(no_modify || !verify_inum(mp, INT_GET(dep->inumber, ARCH_CONVERT)));
+		ASSERT(no_modify || !verify_inum(mp, inum));
 		/*
 		 * special case the . entry.  we know there's only one
 		 * '.' and only '.' points to itself because bogus entries
@@ -2049,7 +2209,7 @@ longform_dir2_entry_check_data(
 		 * '..' is already accounted for or will be taken care
 		 * of when directory is moved to orphanage.
 		 */
-		if (ip->i_ino == INT_GET(dep->inumber, ARCH_CONVERT))  {
+		if (ip->i_ino == inum)  {
 			ASSERT(dep->name[0] == '.' && dep->namelen == 1);
 			add_inode_ref(current_irec, current_ino_offset);
 			*need_dot = 0;
@@ -2062,23 +2222,18 @@ longform_dir2_entry_check_data(
 		 * just skip it.  no need to process it and it's ..
 		 * link is already accounted for.
 		 */
-		if (INT_GET(dep->inumber, ARCH_CONVERT) == orphanage_ino &&
-		    strcmp(fname, ORPHANAGE) == 0)
+		if (inum == orphanage_ino && strcmp(fname, ORPHANAGE) == 0)
 			continue;
 		/*
 		 * skip entries with bogus inumbers if we're in no modify mode
 		 */
-		if (no_modify &&
-		    verify_inum(mp, INT_GET(dep->inumber, ARCH_CONVERT)))
+		if (no_modify && verify_inum(mp, inum))
 			continue;
 		/*
 		 * ok, now handle the rest of the cases besides '.' and '..'
 		 */
-		irec = find_inode_rec(
-			XFS_INO_TO_AGNO(mp,
-				INT_GET(dep->inumber, ARCH_CONVERT)),
-			XFS_INO_TO_AGINO(mp,
-				INT_GET(dep->inumber, ARCH_CONVERT)));
+		irec = find_inode_rec(XFS_INO_TO_AGNO(mp, inum),
+					XFS_INO_TO_AGINO(mp, inum));
 		if (irec == NULL)  {
 			nbad++;
 			do_warn(_("entry \"%s\" in directory inode %llu points "
@@ -2093,9 +2248,7 @@ longform_dir2_entry_check_data(
 			}
 			continue;
 		}
-		ino_offset = XFS_INO_TO_AGINO(mp,
-				INT_GET(dep->inumber, ARCH_CONVERT)) -
-					irec->ino_startnum;
+		ino_offset = XFS_INO_TO_AGINO(mp, inum) - irec->ino_startnum;
 		/*
 		 * if it's a free inode, blow out the entry.
 		 * by now, any inode that we think is free
@@ -2106,18 +2259,13 @@ longform_dir2_entry_check_data(
 			 * don't complain if this entry points to the old
 			 * and now-free lost+found inode
 			 */
-			if (verbose || no_modify ||
-			    INT_GET(dep->inumber, ARCH_CONVERT) !=
-			    old_orphanage_ino)
+			if (verbose || no_modify || inum != old_orphanage_ino)
 				do_warn(
 	_("entry \"%s\" in directory inode %llu points to free inode %llu"),
-					fname, ip->i_ino,
-					INT_GET(dep->inumber, ARCH_CONVERT));
+					fname, ip->i_ino, inum);
 			nbad++;
 			if (!no_modify)  {
-				if (verbose ||
-				    INT_GET(dep->inumber, ARCH_CONVERT) !=
-				    old_orphanage_ino)
+				if (verbose || inum != old_orphanage_ino)
 					do_warn(
 					_(", marking entry to be junked\n"));
 				else
@@ -2130,28 +2278,6 @@ longform_dir2_entry_check_data(
 			continue;
 		}
 		/*
-		 * check for duplicate names in directory.
-		 */ 
-		if (!name_hash_add(nametab, dep->name, dep->namelen)) {
-			do_warn(
-		_("entry \"%s\" (ino %llu) in dir %llu is a duplicate name"),
-				fname, INT_GET(dep->inumber, ARCH_CONVERT),
-				ip->i_ino);
-			nbad++;
-			if (!no_modify) {
-				if (verbose)
-					do_warn(
-					_(", marking entry to be junked\n"));
-				else
-					do_warn("\n");
-				dep->name[0] = '/';
-				libxfs_dir2_data_log_entry(tp, bp, dep);
-			} else {
-				do_warn(_(", would junk entry\n"));
-			}
-			continue;
-		}
-		/*
 		 * check easy case first, regular inode, just bump
 		 * the link count and continue
 		 */
@@ -2172,22 +2298,17 @@ longform_dir2_entry_check_data(
 			junkit = 1;
 			do_warn(
 _("entry \"%s\" in dir %llu points to an already connected directory inode %llu,\n"),
-				fname, ip->i_ino,
-				INT_GET(dep->inumber, ARCH_CONVERT));
+				fname, ip->i_ino, inum);
 		} else if (parent == ip->i_ino)  {
 			add_inode_reached(irec, ino_offset);
 			add_inode_ref(current_irec, current_ino_offset);
-			if (!is_inode_refchecked(
-				INT_GET(dep->inumber, ARCH_CONVERT), irec,
-					ino_offset))
-				push_dir(stack,
-					INT_GET(dep->inumber, ARCH_CONVERT));
+			if (!is_inode_refchecked(inum, irec, ino_offset))
+				push_dir(stack, inum);
 		} else  {
 			junkit = 1;
 			do_warn(
 _("entry \"%s\" in dir inode %llu inconsistent with .. value (%llu) in ino %llu,\n"),
-				fname, ip->i_ino, parent,
-				INT_GET(dep->inumber, ARCH_CONVERT));
+				fname, ip->i_ino, parent, inum);
 		}
 		if (junkit)  {
 			junkit = 0;
@@ -2195,9 +2316,7 @@ _("entry \"%s\" in dir inode %llu incons
 			if (!no_modify)  {
 				dep->name[0] = '/';
 				libxfs_dir2_data_log_entry(tp, bp, dep);
-				if (verbose ||
-				    INT_GET(dep->inumber, ARCH_CONVERT) !=
-				    old_orphanage_ino)
+				if (verbose || inum != old_orphanage_ino)
 					do_warn(
 					_("\twill clear entry \"%s\"\n"),
 						fname);
@@ -2212,8 +2331,6 @@ _("entry \"%s\" in dir inode %llu incons
 		libxfs_dir2_data_freescan(mp, d, &needlog, NULL);
 	if (needlog)
 		libxfs_dir2_data_log_header(tp, bp);
-	else if (!isblock && !nbad)
-		libxfs_da_brelse(tp, bp);
 	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
 	libxfs_trans_commit(tp, 0, 0);
 	freetab->ents[db].v = INT_GET(d->hdr.bestfree[0].length, ARCH_CONVERT);
@@ -2306,19 +2423,19 @@ longform_dir2_check_node(
 	xfs_fileoff_t		next_da_bno;
 	int			seeval = 0;
 	int			used;
-
+	
 	for (da_bno = mp->m_dirleafblk, next_da_bno = 0;
 	     next_da_bno != NULLFILEOFF && da_bno < mp->m_dirfreeblk;
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
+		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK)) 
 			break;
 		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
 				XFS_DATA_FORK)) {
-			do_error(
-			_("can't read block %u for directory inode %llu\n"),
+			do_warn(
+			_("can't read leaf block %u for directory inode %llu\n"),
 				da_bno, ip->i_ino);
-			/* NOTREACHED */
+			return 1;
 		}
 		leaf = bp->data;
 		if (INT_GET(leaf->hdr.info.magic, ARCH_CONVERT) !=
@@ -2348,23 +2465,24 @@ longform_dir2_check_node(
 		seeval = dir_hash_see_all(hashtab, leaf->ents, INT_GET(leaf->hdr.count, ARCH_CONVERT),
 			INT_GET(leaf->hdr.stale, ARCH_CONVERT));
 		libxfs_da_brelse(NULL, bp);
-		if (seeval != DIR_HASH_CK_OK)
+		if (seeval != DIR_HASH_CK_OK) 
 			return 1;
 	}
-	if (dir_hash_check(hashtab, ip, seeval))
+	if (dir_hash_check(hashtab, ip, seeval)) 
 		return 1;
+	
 	for (da_bno = mp->m_dirfreeblk, next_da_bno = 0;
 	     next_da_bno != NULLFILEOFF;
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
+		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK)) 
 			break;
 		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, &bp,
 				XFS_DATA_FORK)) {
-			do_error(_("can't read block %u for directory inode "
-				   "%llu\n"),
+			do_warn(
+		_("can't read freespace block %u for directory inode %llu\n"),
 				da_bno, ip->i_ino);
-			/* NOTREACHED */
+			return 1;
 		}
 		free = bp->data;
 		fdb = XFS_DIR2_DA_TO_DB(mp, da_bno);
@@ -2418,388 +2536,9 @@ longform_dir2_check_node(
 }
 
 /*
- * Rebuild a directory: set up.
- * Turn it into a node-format directory with no contents in the
- * upper area.  Also has correct freespace blocks.
- */
-void
-longform_dir2_rebuild_setup(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	xfs_inode_t		*ip,
-	freetab_t		*freetab)
-{
-	xfs_da_args_t		args;
-	int			committed;
-	xfs_dir2_data_t		*data = NULL;
-	xfs_dabuf_t		*dbp;
-	int			error;
-	xfs_dir2_db_t		fbno;
-	xfs_dabuf_t		*fbp;
-	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
-	xfs_dir2_free_t		*free;
-	int			i;
-	int			j;
-	xfs_dablk_t		lblkno;
-	xfs_dabuf_t		*lbp;
-	xfs_dir2_leaf_t		*leaf;
-	int			nres;
-	xfs_trans_t		*tp;
-
-	/* read first directory block */
-	tp = libxfs_trans_alloc(mp, 0);
-	nres = XFS_DAENTER_SPACE_RES(mp, XFS_DATA_FORK);
-	error = libxfs_trans_reserve(tp,
-		nres, XFS_CREATE_LOG_RES(mp), 0, XFS_TRANS_PERM_LOG_RES,
-		XFS_CREATE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	if (libxfs_da_read_buf(tp, ip, mp->m_dirdatablk, -2, &dbp,
-			XFS_DATA_FORK)) {
-		do_error(_("can't read block %u for directory inode %llu\n"),
-			mp->m_dirdatablk, ino);
-		/* NOTREACHED */
-	}
-
-	if (dbp)
-		data = dbp->data;
-
-	/* check for block format directory */
-	if (data &&
-	    INT_GET((data)->hdr.magic, ARCH_CONVERT) == XFS_DIR2_BLOCK_MAGIC) {
-		xfs_dir2_block_t	*block;
-		xfs_dir2_leaf_entry_t	*blp;
-		xfs_dir2_block_tail_t	*btp;
-		int			needlog;
-		int			needscan;
-
-		/* convert directory block from block format to data format */
-		INT_SET(data->hdr.magic, ARCH_CONVERT, XFS_DIR2_DATA_MAGIC);
-
-		/* construct freelist */
-		block = (xfs_dir2_block_t *)data;
-		btp = XFS_DIR2_BLOCK_TAIL_P(mp, block);
-		blp = XFS_DIR2_BLOCK_LEAF_P(btp);
-		needlog = needscan = 0;
-		libxfs_dir2_data_make_free(tp, dbp, (char *)blp - (char *)block,
-			(char *)block + mp->m_dirblksize - (char *)blp,
-			&needlog, &needscan);
-		if (needscan)
-			libxfs_dir2_data_freescan(mp, data, &needlog, NULL);
-		libxfs_da_log_buf(tp, dbp, 0, mp->m_dirblksize - 1);
-	} else if (dbp) {
-		libxfs_da_brelse(tp, dbp);
-	}
-
-	/* allocate blocks for btree */
-	bzero(&args, sizeof(args));
-	args.trans = tp;
-	args.dp = ip;
-	args.whichfork = XFS_DATA_FORK;
-	args.firstblock = &firstblock;
-	args.flist = &flist;
-	args.total = nres;
-	if ((error = libxfs_da_grow_inode(&args, &lblkno)) ||
-	    (error = libxfs_da_get_buf(tp, ip, lblkno, -1, &lbp, XFS_DATA_FORK))) {
-		do_error(_("can't add btree block to directory inode %llu\n"),
-			ino);
-		/* NOTREACHED */
-	}
-	leaf = lbp->data;
-	bzero(leaf, mp->m_dirblksize);
-	INT_SET(leaf->hdr.info.magic, ARCH_CONVERT, XFS_DIR2_LEAFN_MAGIC);
-	libxfs_da_log_buf(tp, lbp, 0, mp->m_dirblksize - 1);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-
-	for (i = 0; i < freetab->nents; i += XFS_DIR2_MAX_FREE_BESTS(mp)) {
-		tp = libxfs_trans_alloc(mp, 0);
-		nres = XFS_DAENTER_SPACE_RES(mp, XFS_DATA_FORK);
-		error = libxfs_trans_reserve(tp,
-			nres, XFS_CREATE_LOG_RES(mp), 0, XFS_TRANS_PERM_LOG_RES,
-			XFS_CREATE_LOG_COUNT);
-		if (error)
-			res_failed(error);
-		libxfs_trans_ijoin(tp, ip, 0);
-		libxfs_trans_ihold(tp, ip);
-		XFS_BMAP_INIT(&flist, &firstblock);
-		bzero(&args, sizeof(args));
-		args.trans = tp;
-		args.dp = ip;
-		args.whichfork = XFS_DATA_FORK;
-		args.firstblock = &firstblock;
-		args.flist = &flist;
-		args.total = nres;
-		if ((error = libxfs_dir2_grow_inode(&args, XFS_DIR2_FREE_SPACE,
-						 &fbno)) ||
-		    (error = libxfs_da_get_buf(tp, ip, XFS_DIR2_DB_TO_DA(mp, fbno),
-					    -1, &fbp, XFS_DATA_FORK))) {
-			do_error(_("can't add free block to directory inode "
-				   "%llu\n"),
-				ino);
-			/* NOTREACHED */
-		}
-		free = fbp->data;
-		bzero(free, mp->m_dirblksize);
-		INT_SET(free->hdr.magic, ARCH_CONVERT, XFS_DIR2_FREE_MAGIC);
-		INT_SET(free->hdr.firstdb, ARCH_CONVERT, i);
-		INT_SET(free->hdr.nvalid, ARCH_CONVERT, XFS_DIR2_MAX_FREE_BESTS(mp));
-		if (i + INT_GET(free->hdr.nvalid, ARCH_CONVERT) > freetab->nents)
-			INT_SET(free->hdr.nvalid, ARCH_CONVERT, freetab->nents - i);
-		for (j = 0; j < INT_GET(free->hdr.nvalid, ARCH_CONVERT); j++) {
-			INT_SET(free->bests[j], ARCH_CONVERT, freetab->ents[i + j].v);
-			if (INT_GET(free->bests[j], ARCH_CONVERT) != NULLDATAOFF)
-				INT_MOD(free->hdr.nused, ARCH_CONVERT, +1);
-		}
-		libxfs_da_log_buf(tp, fbp, 0, mp->m_dirblksize - 1);
-		libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-		libxfs_trans_commit(tp, 0, 0);
-	}
-}
-
-/*
- * Rebuild the entries from a single data block.
- */
-void
-longform_dir2_rebuild_data(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	xfs_inode_t		*ip,
-	xfs_dablk_t		da_bno)
-{
-	xfs_dabuf_t		*bp;
-	xfs_dir2_block_tail_t	*btp;
-	int			committed;
-	xfs_dir2_data_t		*data;
-	xfs_dir2_db_t		dbno;
-	xfs_dir2_data_entry_t	*dep;
-	xfs_dir2_data_unused_t	*dup;
-	char			*endptr;
-	int			error;
-	xfs_dir2_free_t		*fblock;
-	xfs_dabuf_t		*fbp;
-	xfs_dir2_db_t		fdb;
-	int			fi;
-	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
-	int			needlog;
-	int			needscan;
-	int			nres;
-	char			*ptr;
-	xfs_trans_t		*tp;
-
-	if (libxfs_da_read_buf(NULL, ip, da_bno, da_bno == 0 ? -2 : -1, &bp,
-			XFS_DATA_FORK)) {
-		do_error(_("can't read block %u for directory inode %llu\n"),
-			da_bno, ino);
-		/* NOTREACHED */
-	}
-	if (da_bno == 0 && bp == NULL)
-		/*
-		 * The block was punched out.
-		 */
-		return;
-	ASSERT(bp);
-	dbno = XFS_DIR2_DA_TO_DB(mp, da_bno);
-	fdb = XFS_DIR2_DB_TO_FDB(mp, dbno);
-	if (libxfs_da_read_buf(NULL, ip, XFS_DIR2_DB_TO_DA(mp, fdb), -1, &fbp,
-			XFS_DATA_FORK)) {
-		do_error(_("can't read block %u for directory inode %llu\n"),
-			XFS_DIR2_DB_TO_DA(mp, fdb), ino);
-		/* NOTREACHED */
-	}
-	data = malloc(mp->m_dirblksize);
-	if (!data) {
-		do_error(
-		_("malloc failed in longform_dir2_rebuild_data (%u bytes)\n"),
-			mp->m_dirblksize);
-		exit(1);
-	}
-	bcopy(bp->data, data, mp->m_dirblksize);
-	ptr = (char *)data->u;
-	if (INT_GET(data->hdr.magic, ARCH_CONVERT) == XFS_DIR2_BLOCK_MAGIC) {
-		btp = XFS_DIR2_BLOCK_TAIL_P(mp, (xfs_dir2_block_t *)data);
-		endptr = (char *)XFS_DIR2_BLOCK_LEAF_P(btp);
-	} else
-		endptr = (char *)data + mp->m_dirblksize;
-	fblock = fbp->data;
-	fi = XFS_DIR2_DB_TO_FDINDEX(mp, dbno);
-	tp = libxfs_trans_alloc(mp, 0);
-	error = libxfs_trans_reserve(tp, 0, XFS_CREATE_LOG_RES(mp), 0,
-		XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	libxfs_da_bjoin(tp, bp);
-	libxfs_da_bhold(tp, bp);
-	libxfs_da_bjoin(tp, fbp);
-	libxfs_da_bhold(tp, fbp);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	needlog = needscan = 0;
-	bzero(((xfs_dir2_data_t *)(bp->data))->hdr.bestfree,
-		sizeof(data->hdr.bestfree));
-	libxfs_dir2_data_make_free(tp, bp, (xfs_dir2_data_aoff_t)sizeof(data->hdr),
-		mp->m_dirblksize - sizeof(data->hdr), &needlog, &needscan);
-	ASSERT(needscan == 0);
-	libxfs_dir2_data_log_header(tp, bp);
-	INT_SET(fblock->bests[fi], ARCH_CONVERT,
-		INT_GET(((xfs_dir2_data_t *)(bp->data))->hdr.bestfree[0].length, ARCH_CONVERT));
-	libxfs_dir2_free_log_bests(tp, fbp, fi, fi);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-
-	while (ptr < endptr) {
-		dup = (xfs_dir2_data_unused_t *)ptr;
-		if (INT_GET(dup->freetag, ARCH_CONVERT) == XFS_DIR2_DATA_FREE_TAG) {
-			ptr += INT_GET(dup->length, ARCH_CONVERT);
-			continue;
-		}
-		dep = (xfs_dir2_data_entry_t *)ptr;
-		ptr += XFS_DIR2_DATA_ENTSIZE(dep->namelen);
-		if (dep->name[0] == '/')
-			continue;
-		tp = libxfs_trans_alloc(mp, 0);
-		nres = XFS_CREATE_SPACE_RES(mp, dep->namelen);
-		error = libxfs_trans_reserve(tp, nres, XFS_CREATE_LOG_RES(mp), 0,
-			XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
-		if (error)
-			res_failed(error);
-		libxfs_trans_ijoin(tp, ip, 0);
-		libxfs_trans_ihold(tp, ip);
-		libxfs_da_bjoin(tp, bp);
-		libxfs_da_bhold(tp, bp);
-		libxfs_da_bjoin(tp, fbp);
-		libxfs_da_bhold(tp, fbp);
-		XFS_BMAP_INIT(&flist, &firstblock);
-		error = dir_createname(mp, tp, ip, (char *)dep->name,
-			dep->namelen, INT_GET(dep->inumber, ARCH_CONVERT),
-			&firstblock, &flist, nres);
-		ASSERT(error == 0);
-		libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-		libxfs_trans_commit(tp, 0, 0);
-	}
-	libxfs_da_brelse(NULL, bp);
-	libxfs_da_brelse(NULL, fbp);
-	free(data);
-}
-
-/*
- * Finish the rebuild of a directory.
- * Stuff / in and then remove it, this forces the directory to end
- * up in the right format.
- */
-void
-longform_dir2_rebuild_finish(
-	xfs_mount_t		*mp,
-	xfs_ino_t		ino,
-	xfs_inode_t		*ip)
-{
-	int			committed;
-	int			error;
-	xfs_fsblock_t		firstblock;
-	xfs_bmap_free_t		flist;
-	int			nres;
-	xfs_trans_t		*tp;
-
-	tp = libxfs_trans_alloc(mp, 0);
-	nres = XFS_CREATE_SPACE_RES(mp, 1);
-	error = libxfs_trans_reserve(tp, nres, XFS_CREATE_LOG_RES(mp), 0,
-		XFS_TRANS_PERM_LOG_RES, XFS_CREATE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	error = dir_createname(mp, tp, ip, "/", 1, ino,
-			&firstblock, &flist, nres);
-	ASSERT(error == 0);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-
-	/* could kill trailing empty data blocks here */
-
-	tp = libxfs_trans_alloc(mp, 0);
-	nres = XFS_REMOVE_SPACE_RES(mp);
-	error = libxfs_trans_reserve(tp, nres, XFS_REMOVE_LOG_RES(mp), 0,
-		XFS_TRANS_PERM_LOG_RES, XFS_REMOVE_LOG_COUNT);
-	if (error)
-		res_failed(error);
-	libxfs_trans_ijoin(tp, ip, 0);
-	libxfs_trans_ihold(tp, ip);
-	XFS_BMAP_INIT(&flist, &firstblock);
-	error = dir_removename(mp, tp, ip, "/", 1, ino,
-			&firstblock, &flist, nres);
-	ASSERT(error == 0);
-	libxfs_bmap_finish(&tp, &flist, firstblock, &committed);
-	libxfs_trans_commit(tp, 0, 0);
-}
-
-/*
- * Rebuild a directory.
- * Remove all the non-data blocks.
- * Re-initialize to (empty) node form.
- * Loop over the data blocks reinserting each entry.
- * Force the directory into the right format.
- */
-void
-longform_dir2_rebuild(
-	xfs_mount_t	*mp,
-	xfs_ino_t	ino,
-	xfs_inode_t	*ip,
-	int		*num_illegal,
-	freetab_t	*freetab,
-	int		isblock)
-{
-	xfs_dabuf_t	*bp;
-	xfs_dablk_t	da_bno;
-	xfs_fileoff_t	next_da_bno;
-
-	do_warn(_("rebuilding directory inode %llu\n"), ino);
-
-	/* kill leaf blocks */
-	for (da_bno = mp->m_dirleafblk, next_da_bno = isblock ? NULLFILEOFF : 0;
-	     next_da_bno != NULLFILEOFF;
-	     da_bno = (xfs_dablk_t)next_da_bno) {
-		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
-			break;
-		if (libxfs_da_get_buf(NULL, ip, da_bno, -1, &bp, XFS_DATA_FORK)) {
-			do_error(_("can't get block %u for directory inode "
-				   "%llu\n"),
-				da_bno, ino);
-			/* NOTREACHED */
-		}
-		dir2_kill_block(mp, ip, da_bno, bp);
-	}
-
-	/* rebuild empty btree and freelist */
-	longform_dir2_rebuild_setup(mp, ino, ip, freetab);
-
-	/* rebuild directory */
-	for (da_bno = mp->m_dirdatablk, next_da_bno = 0;
-	     da_bno < mp->m_dirleafblk && next_da_bno != NULLFILEOFF;
-	     da_bno = (xfs_dablk_t)next_da_bno) {
-		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
-		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
-			break;
-		longform_dir2_rebuild_data(mp, ino, ip, da_bno);
-	}
-
-	/* put the directory in the appropriate on-disk format */
-	longform_dir2_rebuild_finish(mp, ino, ip);
-	*num_illegal = 0;
-}
-
-/*
- * succeeds or dies, inode never gets dirtied since all changes
- * happen in file blocks.  the inode size and other core info
- * is already correct, it's just the leaf entries that get altered.
- * XXX above comment is wrong for v2 - need to see why it matters
+ * If a directory is corrupt, we need to read in as many entries as possible,
+ * destroy the entry and create a new one with recovered name/inode pairs.
+ * (ie. get libxfs to do all the grunt work)
  */
 void
 longform_dir2_entry_check(xfs_mount_t	*mp,
@@ -2810,15 +2549,14 @@ longform_dir2_entry_check(xfs_mount_t	*m
 			dir_stack_t	*stack,
 			ino_tree_node_t	*irec,
 			int		ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_dir2_block_t	*block;
 	xfs_dir2_leaf_entry_t	*blp;
-	xfs_dabuf_t		*bp;
+	xfs_dabuf_t		**bplist;
 	xfs_dir2_block_tail_t	*btp;
 	xfs_dablk_t		da_bno;
 	freetab_t		*freetab;
-	dir_hash_tab_t		*hashtab;
 	int			i;
 	int			isblock;
 	int			isleaf;
@@ -2840,6 +2578,7 @@ longform_dir2_entry_check(xfs_mount_t	*m
 		freetab->ents[i].v = NULLDATAOFF;
 		freetab->ents[i].s = 0;
 	}
+	bplist = calloc(freetab->naents, sizeof(xfs_dabuf_t*));
 	/* is this a block, leaf, or node directory? */
 	libxfs_dir2_isblock(NULL, ip, &isblock);
 	libxfs_dir2_isleaf(NULL, ip, &isleaf);
@@ -2847,50 +2586,58 @@ longform_dir2_entry_check(xfs_mount_t	*m
 	if (do_prefetch && !isblock)
 		prefetch_p6_dir2(mp, ip);
 
-	/* check directory data */
-	hashtab = dir_hash_init(ip->i_d.di_size);
+	/* check directory "data" blocks (ie. name/inode pairs) */
 	for (da_bno = 0, next_da_bno = 0;
 	     next_da_bno != NULLFILEOFF && da_bno < mp->m_dirleafblk;
 	     da_bno = (xfs_dablk_t)next_da_bno) {
 		next_da_bno = da_bno + mp->m_dirblkfsbs - 1;
+		ASSERT(XFS_DIR2_DA_TO_DB(mp, da_bno) < freetab->naents);
 		if (libxfs_bmap_next_offset(NULL, ip, &next_da_bno, XFS_DATA_FORK))
 			break;
-		if (libxfs_da_read_bufr(NULL, ip, da_bno,
-				da_bno == 0 ? -2 : -1, &bp, XFS_DATA_FORK)) {
-			do_error(_("can't read block %u for directory inode "
-				   "%llu\n"),
+		if (libxfs_da_read_bufr(NULL, ip, da_bno, -1, 
+				&bplist[XFS_DIR2_DA_TO_DB(mp, da_bno)], 
+				XFS_DATA_FORK)) {
+			do_warn(_(
+			"can't read data block %u for directory inode %llu\n"),
 				da_bno, ino);
-			/* NOTREACHED */
+			*num_illegal++;
+			continue;	/* try and read all "data" blocks */
 		}
-		/* is there a hole at the start? */
-		if (da_bno == 0 && bp == NULL)
-			continue;
 		longform_dir2_entry_check_data(mp, ip, num_illegal, need_dot,
-			stack, irec, ino_offset, &bp, hashtab, &freetab, 
-			nametab, da_bno, isblock);
-		/* it releases the buffer unless isblock is set */
+				stack, irec, ino_offset, 
+				&bplist[XFS_DIR2_DA_TO_DB(mp, da_bno)], hashtab,  
+				&freetab, da_bno, isblock);
 	}
 	fixit = (*num_illegal != 0) || dir2_is_badino(ino);
 
 	/* check btree and freespace */
 	if (isblock) {
-		ASSERT(bp);
-		block = bp->data;
+		block = bplist[0]->data;
 		btp = XFS_DIR2_BLOCK_TAIL_P(mp, block);
 		blp = XFS_DIR2_BLOCK_LEAF_P(btp);
-		seeval = dir_hash_see_all(hashtab, blp, INT_GET(btp->count, ARCH_CONVERT), INT_GET(btp->stale, ARCH_CONVERT));
+		seeval = dir_hash_see_all(hashtab, blp, 
+				INT_GET(btp->count, ARCH_CONVERT), 
+				INT_GET(btp->stale, ARCH_CONVERT));
 		if (dir_hash_check(hashtab, ip, seeval))
 			fixit |= 1;
-		libxfs_da_brelse(NULL, bp);
 	} else if (isleaf) {
 		fixit |= longform_dir2_check_leaf(mp, ip, hashtab, freetab);
 	} else {
 		fixit |= longform_dir2_check_node(mp, ip, hashtab, freetab);
 	}
-	dir_hash_done(hashtab);
-	if (!no_modify && fixit)
-		longform_dir2_rebuild(mp, ino, ip, num_illegal, freetab,
-			isblock);
+	if (!no_modify && fixit) {
+		dir_hash_dup_names(hashtab);
+		for (i = 0; i < freetab->naents; i++) 
+			if (bplist[i])
+				libxfs_da_brelse(NULL, bplist[i]);
+		longform_dir2_rebuild(mp, ino, ip, hashtab);
+		*num_illegal = 0;
+	} else {
+		for (i = 0; i < freetab->naents; i++) 
+			if (bplist[i])
+				libxfs_da_brelse(NULL, bplist[i]);
+	}
+	
 	free(freetab);
 }
 
@@ -2906,7 +2653,7 @@ shortform_dir_entry_check(xfs_mount_t	*m
 			dir_stack_t	*stack,
 			ino_tree_node_t	*current_irec,
 			int		current_ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_ino_t		lino;
 	xfs_ino_t		parent;
@@ -3044,7 +2791,7 @@ _("entry \"%s\" in shortform dir %llu re
 		ASSERT(irec != NULL);
 
 		ino_offset = XFS_INO_TO_AGINO(mp, lino) - irec->ino_startnum;
-
+		
 		/*
 		 * if it's a free inode, blow out the entry.
 		 * by now, any inode that we think is free
@@ -3066,8 +2813,9 @@ _("entry \"%s\" in shortform dir inode %
 				do_warn(_("would junk entry \"%s\"\n"),
 					fname);
 			}
-		} else if (!name_hash_add(nametab, sf_entry->name, 
-					sf_entry->namelen)) {
+		} else if (!dir_hash_add(hashtab, 
+				(xfs_dir2_dataptr_t)(sf_entry - &sf->list[0]),
+				lino, sf_entry->namelen, sf_entry->name)) {
 			/*
 			 * check for duplicate names in directory.
 			 */ 
@@ -3311,7 +3059,7 @@ shortform_dir2_entry_check(xfs_mount_t	*
 			dir_stack_t	*stack,
 			ino_tree_node_t	*current_irec,
 			int		current_ino_offset,
-			name_hash_tab_t	*nametab)
+			dir_hash_tab_t	*hashtab)
 {
 	xfs_ino_t		lino;
 	xfs_ino_t		parent;
@@ -3484,7 +3232,9 @@ shortform_dir2_entry_check(xfs_mount_t	*
 				do_warn(_("would junk entry \"%s\"\n"),
 					fname);
 			}
-		} else if (!name_hash_add(nametab, sfep->name, sfep->namelen)) {
+		} else if (!dir_hash_add(hashtab, (xfs_dir2_dataptr_t)
+					(sfep - XFS_DIR2_SF_FIRSTENTRY(sfp)),
+				lino, sfep->namelen, sfep->name)) {
 			/*
 			 * check for duplicate names in directory.
 			 */ 
@@ -3650,7 +3400,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
 	xfs_trans_t		*tp;
 	xfs_dahash_t		hashval;
 	ino_tree_node_t		*irec;
-	name_hash_tab_t		*nametab;
+	dir_hash_tab_t		*hashtab;
 	int			ino_offset, need_dot, committed;
 	int			dirty, num_illegal, error, nres;
 
@@ -3731,7 +3481,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
 
 		add_inode_refchecked(ino, irec, ino_offset);
 
-		nametab = name_hash_init(ip->i_d.di_size);
+		hashtab = dir_hash_init(ip->i_d.di_size);
 
 		/*
 		 * look for bogus entries
@@ -3750,13 +3500,13 @@ process_dirstack(xfs_mount_t *mp, dir_st
 							&num_illegal, &need_dot,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 			else
 				longform_dir_entry_check(mp, ino, ip,
 							&num_illegal, &need_dot,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 			break;
 		case XFS_DINODE_FMT_LOCAL:
 			tp = libxfs_trans_alloc(mp, 0);
@@ -3781,12 +3531,12 @@ process_dirstack(xfs_mount_t *mp, dir_st
 				shortform_dir2_entry_check(mp, ino, ip, &dirty,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 			else
 				shortform_dir_entry_check(mp, ino, ip, &dirty,
 							stack, irec,
 							ino_offset,
-							nametab);
+							hashtab);
 
 			ASSERT(dirty == 0 || (dirty && !no_modify));
 			if (dirty)  {
@@ -3801,7 +3551,7 @@ process_dirstack(xfs_mount_t *mp, dir_st
 		default:
 			break;
 		}
-		name_hash_done(nametab);
+		dir_hash_done(hashtab);
 
 		hashval = 0;
 
@@ -4223,6 +3973,10 @@ _("        - skipping filesystem travers
 	}
 
 	do_log(_("        - traversals finished ... \n"));
+	
+	/* flush all dirty data before doing lost+found search */
+	libxfs_bcache_flush();
+	
 	do_log(_("        - moving disconnected inodes to lost+found ... \n"));
 
 	/*

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Review: xfs_repair fixes for dir2 corruption
  2006-07-28  1:58         ` Review: xfs_repair fixes for dir2 corruption Barry Naujok
  2006-07-28  8:10           ` Nathan Scott
  2006-07-30  5:19           ` christian
@ 2006-08-01 21:50           ` Adam Sjøgren
  2006-08-01 23:06             ` Christian Guggenberger
  2 siblings, 1 reply; 12+ messages in thread
From: Adam Sjøgren @ 2006-08-01 21:50 UTC (permalink / raw)
  To: linux-xfs

On Fri, 28 Jul 2006 11:58:52 +1000, Barry wrote:

> For those that are keen to fix the 16777216 problem, give this patch
> a go.

I thought that I'd just report that the patch worked flawlessly when I
applied it to xfsprogs from cvs and ran the the resulting xfs_repair
on my bad filesystem (see <87veqhfmo4.fsf@topper.koldfront.dk>,
<87fyhd1ek7.fsf@topper.koldfront.dk>).

I have another box (Intel P4) which had the problem, the patched
xfs_repair fixed its three bad filesystems as well.


  Thanks!

   Adam

-- 
 "Subdued flamboyance"                                        Adam Sjøgren
                                                         asjo@koldfront.dk

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Review: xfs_repair fixes for dir2 corruption
  2006-08-01 21:50           ` Adam Sjøgren
@ 2006-08-01 23:06             ` Christian Guggenberger
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Guggenberger @ 2006-08-01 23:06 UTC (permalink / raw)
  To: Adam Sjøgren; +Cc: linux-xfs

On Tue, Aug 01, 2006 at 11:50:14PM +0200, Adam Sjøgren wrote:
> On Fri, 28 Jul 2006 11:58:52 +1000, Barry wrote:
> 
> > For those that are keen to fix the 16777216 problem, give this patch
> > a go.
> 
> I thought that I'd just report that the patch worked flawlessly when I
> applied it to xfsprogs from cvs and ran the the resulting xfs_repair
> on my bad filesystem (see <87veqhfmo4.fsf@topper.koldfront.dk>,
> <87fyhd1ek7.fsf@topper.koldfront.dk>).
>

I'd like to report a "me too". No problems arised while repairing a
corrupted filesystem on powerpc with the method mentioned above.

thanks.
 - Christian

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2006-08-02  0:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-25  3:50 review: increase bulkstat readahead window Nathan Scott
2006-07-25  9:40 ` Christoph Hellwig
2006-07-25 22:37   ` Nathan Scott
2006-07-26 10:25     ` Christoph Hellwig
2006-07-27 23:17       ` Nathan Scott
2006-07-28  1:58         ` Review: xfs_repair fixes for dir2 corruption Barry Naujok
2006-07-28  8:10           ` Nathan Scott
2006-07-28 14:45             ` Madan Valluri
2006-07-31  7:18               ` Barry Naujok
2006-07-30  5:19           ` christian
2006-08-01 21:50           ` Adam Sjøgren
2006-08-01 23:06             ` Christian Guggenberger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox