public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues
@ 2010-11-16 15:05 wkendall
  2010-11-16 15:05 ` [PATCH v3 1/9] xfsrestore: turn off NODECHK wkendall
                   ` (9 more replies)
  0 siblings, 10 replies; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

v3 of this series reworks a couple of the patches based on feedback
from Alex Elder:

- 3rd patch: changed a buffer size check to an assert.
- 7th patch: moved static functions up in the file and
  created typedefs for segix_t and relnix_t.

The remainder is unchanged from the previous submission.

Bill

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 1/9] xfsrestore: turn off NODECHK
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
@ 2010-11-16 15:05 ` wkendall
  2010-11-17  9:20   ` Christoph Hellwig
  2010-11-16 15:05 ` [PATCH v3 2/9] xfsrestore: change nrh_t from 32 to 64 bits wkendall
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: no_node_check --]
[-- Type: text/plain, Size: 1694 bytes --]

The NODECHK macro should only be enabled as needed for
development and debugging. Leaving it enabled limits xfsrestore to
dumps with 268 million directory entries instead of 4 billion.
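As a rough illustration of the cost mentioned above (a sketch with assumed values, not code from the patch): reserving HDLGENCNT bits of a 32-bit node handle for the generation count leaves only 28 bits to index nodes.

```c
#include <assert.h>
#include <stdint.h>

#define NBBY      8   /* bits per byte */
#define HDLGENCNT 4   /* handle bits stolen for the gen count under NODECHK */

typedef uint32_t nh_t;

/* number of distinct node handles when `genbits` bits are
 * reserved for consistency checking
 */
static uint64_t max_handles(int genbits)
{
	int idxbits = NBBY * (int)sizeof(nh_t) - genbits;
	return (uint64_t)1 << idxbits;
}
```

With HDLGENCNT = 4 this yields 2^28 (~268 million) handles instead of 2^32 (~4 billion), matching the commit message.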

Signed-off-by: Bill Kendall <wkendall@sgi.com>

Reviewed-by: Alex Elder <aelder@sgi.com>

---
 restore/Makefile |    2 +-
 restore/node.c   |    4 +++-
 2 files changed, 4 insertions(+), 2 deletions(-)

Index: xfsdump-kernel.org/restore/Makefile
===================================================================
--- xfsdump-kernel.org.orig/restore/Makefile
+++ xfsdump-kernel.org/restore/Makefile
@@ -103,7 +103,7 @@ LLDLIBS = $(LIBUUID) $(LIBHANDLE) $(LIBA
 LTDEPENDENCIES = $(LIBRMT)
 
 LCFLAGS = -DRESTORE -DRMT -DBASED -DDOSOCKS -DINVCONVFIX -DPIPEINVFIX \
-	-DEOMFIX -DSESSCPLT -DWHITEPARSE -DNODECHK -DDIRENTHDR_CHECKSUM \
+	-DEOMFIX -DSESSCPLT -DWHITEPARSE -DDIRENTHDR_CHECKSUM \
 	-DF_FSSETDM
 
 default: depend $(LTCOMMAND)
Index: xfsdump-kernel.org/restore/node.c
===================================================================
--- xfsdump-kernel.org.orig/restore/node.c
+++ xfsdump-kernel.org/restore/node.c
@@ -42,7 +42,9 @@ extern size_t pgmask;
 
 /* macros for manipulating node handles when handle consistency
  * checking is enabled. the upper bits of a handle will be loaded
- * with the node gen count, described below.
+ * with the node gen count, described below. this should not be
+ * used for production code, it cuts into the number of dirents
+ * that xfsrestore can handle.
  */
 #define HDLGENCNT		4
 #define	HDLGENSHIFT		( NBBY * sizeof ( nh_t ) - HDLGENCNT )


* [PATCH v3 2/9] xfsrestore: change nrh_t from 32 to 64 bits
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
  2010-11-16 15:05 ` [PATCH v3 1/9] xfsrestore: turn off NODECHK wkendall
@ 2010-11-16 15:05 ` wkendall
  2010-11-17  9:23   ` Christoph Hellwig
  2010-11-16 15:05 ` [PATCH v3 3/9] xfsrestore: cache path lookups wkendall
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: namreg_64bit --]
[-- Type: text/plain, Size: 4376 bytes --]

An nrh_t refers to a byte offset in a file containing all the pathname
components from a dump. At an average filename length of 20
characters, an nrh_t would overflow on a dump containing ~214 million
directory entries. Removing this limitation allows xfsrestore to
handle 4 billion directory entries.

Signed-off-by: Bill Kendall <wkendall@sgi.com>

Reviewed-by: Alex Elder <aelder@sgi.com>

---
 restore/namreg.c |   20 +++++++-------------
 restore/namreg.h |    4 ++--
 restore/tree.c   |   29 ++++++++++++++---------------
 3 files changed, 23 insertions(+), 30 deletions(-)

Index: xfsdump-kernel.org/restore/namreg.h
===================================================================
--- xfsdump-kernel.org.orig/restore/namreg.h
+++ xfsdump-kernel.org/restore/namreg.h
@@ -26,8 +26,8 @@
 
 /* nrh_t - handle to a registered name
  */
-typedef size32_t nrh_t;
-#define NRH_NULL	SIZE32MAX
+typedef size64_t nrh_t;
+#define NRH_NULL	SIZE64MAX
 
 
 /* namreg_init - creates the name registry. resync is TRUE if the
Index: xfsdump-kernel.org/restore/namreg.c
===================================================================
--- xfsdump-kernel.org.orig/restore/namreg.c
+++ xfsdump-kernel.org/restore/namreg.c
@@ -72,26 +72,20 @@ typedef struct namreg_tran namreg_tran_t
  */
 #define CHKBITCNT		2
 #define	CHKBITSHIFT		( NBBY * sizeof( nrh_t ) - CHKBITCNT )
-#define	CHKBITLOMASK		( ( 1 << CHKBITCNT ) - 1 )
+#define	CHKBITLOMASK		( ( 1ULL << CHKBITCNT ) - 1 )
 #define	CHKBITMASK		( CHKBITLOMASK << CHKBITSHIFT )
 #define CHKHDLCNT		CHKBITSHIFT
-#define CHKHDLMASK		( ( 1 << CHKHDLCNT ) - 1 )
-#define CHKGETBIT( h )		( ( h >> CHKBITSHIFT ) & CHKBITLOMASK )
-#define CHKGETHDL( h )		( h & CHKHDLMASK )
-#define CHKMKHDL( c, h )	( ( ( c << CHKBITSHIFT ) & CHKBITMASK )	\
+#define CHKHDLMASK		( ( 1ULL << CHKHDLCNT ) - 1 )
+#define CHKGETBIT( h )		( ( (h) >> CHKBITSHIFT ) & CHKBITLOMASK )
+#define CHKGETHDL( h )		( (h) & CHKHDLMASK )
+#define CHKMKHDL( c, h )	( ( ( (c) << CHKBITSHIFT ) & CHKBITMASK )	\
 				  |					\
-				  ( h & CHKHDLMASK ))
+				  ( (h) & CHKHDLMASK ))
 #define HDLMAX			( ( off64_t )CHKHDLMASK )
 
 #else /* NAMREGCHK */
 
-#define HDLMAX			( ( ( off64_t )1			\
-				    <<					\
-				    ( ( off64_t )NBBY			\
-				      *					\
-				      ( off64_t )sizeof( nrh_t )))	\
-				  -					\
-				  ( off64_t )2 ) /* 2 to avoid NRH_NULL */
+#define HDLMAX			( NRH_NULL - 1 )
 
 #endif /* NAMREGCHK */
 
Index: xfsdump-kernel.org/restore/tree.c
===================================================================
--- xfsdump-kernel.org.orig/restore/tree.c
+++ xfsdump-kernel.org/restore/tree.c
@@ -166,19 +166,18 @@ typedef struct tran tran_t;
 #define NODESZ	48
 
 struct node {
-	xfs_ino_t n_ino;		/* 8  8 ino */
-	nrh_t n_nrh;		/* 4 12 handle to name in name registry */
-	dah_t n_dah;		/* 4 16 handle to directory attributes */
-	nh_t n_hashh;		/* 4 20 hash array */
-	nh_t n_parh;		/* 4 24 parent */
-	nh_t n_sibh;		/* 4 28 sibling list */
-	nh_t n_sibprevh;	/* 4 32 prev sibling list - dbl link list */
-	nh_t n_cldh;		/* 4 36 children list */
-	nh_t n_lnkh;		/* 4 40 hard link list */
-	gen_t n_gen;		/* 2 42 generation count mod 0x10000 */
-	u_char_t n_flags;	/* 1 43 action and state flags */
-	u_char_t n_nodehkbyte;	/* 1 44 given to node abstraction */
-	int32_t pad;		/* 4 48 padding to 8 byte boundary */
+	xfs_ino_t n_ino;	/* 8  8 ino */
+	nrh_t n_nrh;		/* 8 16 handle to name in name registry */
+	dah_t n_dah;		/* 4 20 handle to directory attributes */
+	nh_t n_hashh;		/* 4 24 hash array */
+	nh_t n_parh;		/* 4 28 parent */
+	nh_t n_sibh;		/* 4 32 sibling list */
+	nh_t n_sibprevh;	/* 4 36 prev sibling list - dbl link list */
+	nh_t n_cldh;		/* 4 40 children list */
+	nh_t n_lnkh;		/* 4 44 hard link list */
+	gen_t n_gen;		/* 2 46 generation count mod 0x10000 */
+	u_char_t n_flags;	/* 1 47 action and state flags */
+	u_char_t n_nodehkbyte;	/* 1 48 given to node abstraction */
 };
 
 typedef struct node node_t;
@@ -3393,9 +3392,9 @@ Node_free( nh_t *nhp )
 		namreg_del( np->n_nrh );
 		np->n_nrh = NRH_NULL;
 	}
-	if ( np->n_dah != NRH_NULL ) {
+	if ( np->n_dah != DAH_NULL ) {
 		dirattr_del( np->n_dah );
-		np->n_dah = NRH_NULL;
+		np->n_dah = DAH_NULL;
 	}
 	np->n_flags = 0;
 	np->n_parh = NH_NULL;


* [PATCH v3 3/9] xfsrestore: cache path lookups
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
  2010-11-16 15:05 ` [PATCH v3 1/9] xfsrestore: turn off NODECHK wkendall
  2010-11-16 15:05 ` [PATCH v3 2/9] xfsrestore: change nrh_t from 32 to 64 bits wkendall
@ 2010-11-16 15:05 ` wkendall
  2010-11-17  9:24   ` Christoph Hellwig
  2010-11-16 15:05 ` [PATCH v3 4/9] xfsrestore: mmap dirent names for faster lookups wkendall
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: node2path_caching --]
[-- Type: text/plain, Size: 3949 bytes --]

In order to resolve a pathname, xfsrestore must work from an inode
number (from the dump) and recurse up the directory entry tree that it
has constructed. Each level of recursion requires a seek and read to
get the name of the dirent, and possibly an mmap of a section of the
directory entry tree if it is not already mapped (and in that case,
possibly a munmap of another section). It's quite common to resolve
pathnames in the same directory consecutively, so simply caching the
parent directory pathname from the previous lookup saves quite a bit
of overhead.
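The caching idea can be sketched as follows (a simplified model with stand-in types, not the patch itself; the real cache lives inside Node2path_recurse):

```c
#include <assert.h>
#include <string.h>

#ifndef MAXPATHLEN
#define MAXPATHLEN 1024
#endif

typedef unsigned int nh_t;          /* stand-in for the real node handle */
#define NH_NULL ((nh_t)-1)

/* single-entry cache: pathname of the most recent parent directory */
struct path_cache {
	nh_t nh;
	int  len;
	char buf[MAXPATHLEN];
};

static struct path_cache cache = { NH_NULL, 0, "" };

/* returns 1 and copies the cached path on a hit, 0 on a miss */
static int path_cache_lookup(nh_t nh, char *buf, int bufsz)
{
	if (nh != cache.nh || bufsz <= cache.len)
		return 0;
	strcpy(buf, cache.buf);
	return 1;
}

static void path_cache_store(nh_t nh, const char *path)
{
	size_t len = strlen(path);
	if (len < sizeof(cache.buf)) {
		cache.nh = nh;
		cache.len = (int)len;
		strcpy(cache.buf, path);
	}
}
```

Consecutive lookups in the same directory hit the cache and skip the entire recursion (and its seeks, reads, and window remapping).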

Signed-off-by: Bill Kendall <wkendall@sgi.com>

Reviewed-by: Alex Elder <aelder@sgi.com>


---
 restore/tree.c |   42 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)

Index: xfsdump-kernel.org/restore/tree.c
===================================================================
--- xfsdump-kernel.org.orig/restore/tree.c
+++ xfsdump-kernel.org/restore/tree.c
@@ -236,6 +236,14 @@ struct link_iter_context {
 };
 typedef struct link_iter_context link_iter_context_t;
 
+/* used for caching parent pathname from previous Node2path result
+ */
+struct path_cache {
+	nh_t nh;
+	intgen_t len;
+	char buf[MAXPATHLEN];
+};
+typedef struct path_cache path_cache_t;
 
 /* declarations of externally defined global symbols *************************/
 
@@ -254,7 +262,8 @@ static nh_t Node_alloc( xfs_ino_t ino,
 static void Node_free( nh_t *nhp );
 static node_t * Node_map( nh_t nh );
 static void Node_unmap( nh_t nh, node_t **npp );
-static intgen_t Node2path_recurse( nh_t nh, char *buf, intgen_t bufsz );
+static intgen_t Node2path_recurse( nh_t nh, char *buf,
+				   intgen_t bufsz, intgen_t level );
 static void adopt( nh_t parh, nh_t cldh, nrh_t nrh );
 static nrh_t disown( nh_t cldh );
 static void selsubtree( nh_t nh, bool_t sensepr );
@@ -3435,7 +3444,7 @@ Node2path( nh_t nh, char *path, char *er
 {
 	intgen_t remainingcnt;
 	path[ 0 ] = 0; /* in case root node passed in */
-	remainingcnt = Node2path_recurse( nh, path, MAXPATHLEN );
+	remainingcnt = Node2path_recurse( nh, path, MAXPATHLEN, 0 );
 	if ( remainingcnt <= 0 ) {
 		node_t *np = Node_map( nh );
 		xfs_ino_t ino = np->n_ino;
@@ -3459,13 +3468,15 @@ Node2path( nh_t nh, char *path, char *er
  * works because the buffer size is secretly 2 * MAXPATHLEN.
  */
 static intgen_t
-Node2path_recurse( nh_t nh, char *buf, intgen_t bufsz )
+Node2path_recurse( nh_t nh, char *buf, intgen_t bufsz, intgen_t level )
 {
+	static path_cache_t cache = { NH_NULL, 0, "" };
 	node_t *np;
 	nh_t parh;
 	xfs_ino_t ino;
 	gen_t gen;
 	nrh_t nrh;
+	char *oldbuf;
 	intgen_t oldbufsz;
 	intgen_t namelen;
 
@@ -3475,6 +3486,14 @@ Node2path_recurse( nh_t nh, char *buf, i
 		return bufsz;
 	}
 
+	/* if we have a cache hit, no need to recurse any further
+	 */
+	if ( nh == cache.nh ) {
+		ASSERT( bufsz > cache.len );
+		strcpy( buf, cache.buf );
+		return bufsz - cache.len;
+	}
+
 	/* extract useful node members
 	 */
 	np = Node_map( nh );
@@ -3486,8 +3505,9 @@ Node2path_recurse( nh_t nh, char *buf, i
 
 	/* build path to parent
 	 */
+	oldbuf = buf;
 	oldbufsz = bufsz;
-	bufsz = Node2path_recurse( parh, buf, bufsz ); /* RECURSION */
+	bufsz = Node2path_recurse( parh, buf, bufsz, level+1 ); /* RECURSION */
 	if ( bufsz <= 0 ) {
 		return bufsz;
 	}
@@ -3517,10 +3537,22 @@ Node2path_recurse( nh_t nh, char *buf, i
 		ASSERT( namelen > 0 );
 	}
 
-	/* return remaining buffer size
+	/* update remaining buffer size
 	 */
 	bufsz -= namelen;
 	ASSERT( bufsz + MAXPATHLEN > 0 );
+
+	/* update the cache if we're the target's parent
+	 * (and the pathname is not too long)
+	 */
+	if ( level == 1 && bufsz > 0 ) {
+		cache.nh = nh;
+		strcpy( cache.buf, oldbuf );
+		cache.len = oldbufsz - bufsz;
+	}
+
+	/* return remaining buffer size
+	 */
 	return bufsz;
 }
 


* [PATCH v3 4/9] xfsrestore: mmap dirent names for faster lookups
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
                   ` (2 preceding siblings ...)
  2010-11-16 15:05 ` [PATCH v3 3/9] xfsrestore: cache path lookups wkendall
@ 2010-11-16 15:05 ` wkendall
  2010-11-17  9:34   ` Christoph Hellwig
  2010-11-16 15:05 ` [PATCH v3 5/9] xfsrestore: cleanup node allocation wkendall
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: namreg_map --]
[-- Type: text/plain, Size: 7172 bytes --]

Pathname resolution in xfsrestore is about 4x faster if the file
containing dirent names ("namreg") is memory mapped.  If xfsrestore is
unable to map the file (e.g., due to virtual memory constraints), it
falls back to the existing seek-and-read approach.

The file is mapped after all directory entries have been written to
the "namreg" file. If the caller tries to add additional entries after
the file has been mapped, it will be unmapped and restore will fall
back to seek-and-read lookups.
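The map-or-fallback strategy can be sketched like this (assumed interfaces; the real code uses mmap_autogrow and skips the persistent header region at the front of the file):

```c
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* try to map `size` bytes of the name file read-only;
 * NULL means "not mapped, use seek-and-read"
 */
static char *try_map_names(int fd, size_t size)
{
	void *p = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
	return (p == MAP_FAILED) ? NULL : (char *)p;
}

/* copy `len` name bytes at `off`, via the map when available,
 * falling back to pread otherwise
 */
static ssize_t name_fetch(int fd, const char *map, size_t off,
                          char *buf, size_t len)
{
	if (map != NULL) {
		memcpy(buf, map + off, len);
		return (ssize_t)len;
	}
	return pread(fd, buf, len, (off_t)off);
}
```

A failed mmap is benign here: lookups still work, only slower, which is exactly the degradation mode the commit message describes.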

Signed-off-by: Bill Kendall <wkendall@sgi.com>

Reviewed-by: Alex Elder <aelder@sgi.com>

---
 restore/content.c |    2 
 restore/namreg.c  |  124 ++++++++++++++++++++++++++++++++++++++++--------------
 restore/namreg.h  |    6 +-
 3 files changed, 98 insertions(+), 34 deletions(-)

Index: xfsdump-kernel.org/restore/content.c
===================================================================
--- xfsdump-kernel.org.orig/restore/content.c
+++ xfsdump-kernel.org/restore/content.c
@@ -3026,7 +3026,7 @@ applydirdump( drive_t *drivep,
 			return rv;
 		}
 
-		if ((rv = namreg_flush()) != RV_OK) {
+		if ((rv = namreg_flush(BOOL_TRUE)) != RV_OK) {
 			return rv;
 		}
 
Index: xfsdump-kernel.org/restore/namreg.c
===================================================================
--- xfsdump-kernel.org.orig/restore/namreg.c
+++ xfsdump-kernel.org/restore/namreg.c
@@ -57,6 +57,7 @@ typedef struct namreg_pers namreg_pers_t
 struct namreg_tran {
 	char *nt_pathname;
 	int nt_fd;
+	char *nt_map;
 	bool_t nt_at_endpr;
 	size_t nt_off;
 	char nt_buf[NAMREG_BUFSIZE];
@@ -96,6 +97,9 @@ extern size_t pgsz;
 
 /* forward declarations of locally defined static functions ******************/
 
+static inline bool_t namreg_is_mapped();
+static void namreg_map();
+static void namreg_unmap();
 
 /* definition of locally defined global variables ****************************/
 
@@ -263,6 +267,10 @@ namreg_add( char *name, size_t namelen )
 	ASSERT( ntp );
 	ASSERT( npp );
 
+	if ( namreg_is_mapped() ) {
+		namreg_unmap();
+	}
+
 	/* make sure file pointer is positioned to append
 	 */
 	if ( ! ntp->nt_at_endpr ) {
@@ -280,7 +288,7 @@ namreg_add( char *name, size_t namelen )
 	}
 
 	if (ntp->nt_off + namelen + 1 > sizeof(ntp->nt_buf)) {
-		if (namreg_flush() != RV_OK) {
+		if (namreg_flush(BOOL_FALSE) != RV_OK) {
 			return NRH_NULL;
 		}
 	}
@@ -327,7 +335,7 @@ namreg_del( nrh_t nrh )
 }
 
 rv_t
-namreg_flush( void )
+namreg_flush( bool_t done_adding )
 {
 	ssize_t nwritten;
 
@@ -335,6 +343,10 @@ namreg_flush( void )
 	*/
 	assert( ntp );
 
+	if ( namreg_is_mapped() ) {
+		namreg_unmap();
+	}
+
 	if (ntp->nt_off) {
 
 		/* write the accumulated name strings.
@@ -356,6 +368,11 @@ namreg_flush( void )
 		}
 		ntp->nt_off = 0;
 	}
+
+	if (done_adding) {
+		namreg_map();
+	}
+
 	return RV_OK;
 }
 
@@ -367,6 +384,7 @@ namreg_get( nrh_t nrh,
 	off64_t newoff;
 	intgen_t nread;
 	size_t len;
+	char *in_bufp;
 	static char read_buf[256];
 	/* long enough for the longest allowed name (255), plus 1 for length */
 #ifdef NAMREGCHK
@@ -403,41 +421,51 @@ namreg_get( nrh_t nrh,
 
 	lock( );
 
-	if ( ntp->nt_at_endpr && ntp->nt_off ) {
-		if (namreg_flush() != RV_OK) {
+	if ( namreg_is_mapped() ) {
+
+		in_bufp = ntp->nt_map + newoff - NAMREG_PERS_SZ;
+
+	} else {
+
+		if ( ntp->nt_at_endpr && ntp->nt_off ) {
+			if (namreg_flush(BOOL_FALSE) != RV_OK) {
+				unlock( );
+				return -3;
+			}
+		}
+
+		/* seek to the name
+		*/
+		newoff = lseek64( ntp->nt_fd, newoff, SEEK_SET );
+		if ( newoff == ( off64_t )-1 ) {
 			unlock( );
+			mlog( MLOG_NORMAL, _(
+				"lseek of namreg failed: %s\n"),
+				strerror( errno ));
 			return -3;
 		}
-	}
+		ntp->nt_at_endpr = BOOL_FALSE;
 
-	/* seek to the name
-	 */
-	newoff = lseek64( ntp->nt_fd, newoff, SEEK_SET );
-	if ( newoff == ( off64_t )-1 ) {
-		unlock( );
-		mlog( MLOG_NORMAL, _(
-		      "lseek of namreg failed: %s\n"),
-		      strerror( errno ));
-		return -3;
-	}
+		/* read the name length and the name itself in one call
+		 * NOTE: assumes read_buf is big enough for the longest
+		 * allowed name (255 chars) plus one byte for length.
+		 */
+		nread = read( ntp->nt_fd, ( void * )read_buf, sizeof(read_buf) );
+		if ( nread <= 0 ) {
+			unlock( );
+			mlog( MLOG_NORMAL, _(
+				"read of namreg failed: %s (nread = %d)\n"),
+				strerror( errno ),
+				nread );
+			return -3;
+		}
 
-	/* read the name length and the name itself in one call
-	 * NOTE: assumes read_buf is big enough for the longest
-	 * allowed name (255 chars) plus one byte for length.
-	 */
-	nread = read( ntp->nt_fd, ( void * )read_buf, sizeof(read_buf) );
-	if ( nread <= 0 ) {
-		unlock( );
-		mlog( MLOG_NORMAL, _(
-		      "read of namreg failed: %s (nread = %d)\n"),
-		      strerror( errno ),
-		      nread );
-		return -3;
+		in_bufp = read_buf;
 	}
 
 	/* deal with a short caller-supplied buffer
 	 */
-	len = ( size_t )read_buf[0];
+	len = ( size_t )in_bufp[0];
 	if ( bufsz < len + 1 ) {
 		unlock( );
 		return -1;
@@ -445,7 +473,7 @@ namreg_get( nrh_t nrh,
 
 	/* copy the name into the caller-supplied buffer.
 	 */
-	strncpy(bufp, read_buf+1, len);
+	strncpy(bufp, in_bufp+1, len);
 
 #ifdef NAMREGCHK
 
@@ -460,8 +488,6 @@ namreg_get( nrh_t nrh,
 	/* null-terminate the string if room
 	 */
 	bufp[ len ] = 0;
-
-	ntp->nt_at_endpr = BOOL_FALSE;
 	
 	unlock( );
 
@@ -470,3 +496,39 @@ namreg_get( nrh_t nrh,
 
 
 /* definition of locally defined static functions ****************************/
+
+static inline bool_t
+namreg_is_mapped()
+{
+	return ntp->nt_map ? BOOL_TRUE : BOOL_FALSE;
+}
+
+static void
+namreg_map()
+{
+	ntp->nt_map = ( char * ) mmap_autogrow(
+					npp->np_appendoff - NAMREG_PERS_SZ,
+					ntp->nt_fd,
+					NAMREG_PERS_SZ );
+
+	/* it's okay if this fails, just fall back to (the much slower)
+	 * seek-and-read lookups.
+	 */
+	if ( ntp->nt_map == ( char * )-1 ) {
+		mlog( MLOG_DEBUG, "failed to map namreg: %s\n",
+			strerror( errno ) );
+		ntp->nt_map = NULL;
+	}
+}
+
+static void
+namreg_unmap()
+{
+	/* we really shouldn't be here, we were told that no more names would
+	 * be added to the namreg. rather than tip over, just fall back to
+	 * seek-and-read lookups.
+	 */
+	munmap( ( void * )ntp->nt_map, npp->np_appendoff - NAMREG_PERS_SZ );
+	ntp->nt_map = NULL;
+	mlog( MLOG_DEBUG, "namreg lookups resorting to seek-and-read\n" );
+}
Index: xfsdump-kernel.org/restore/namreg.h
===================================================================
--- xfsdump-kernel.org.orig/restore/namreg.h
+++ xfsdump-kernel.org/restore/namreg.h
@@ -49,9 +49,11 @@ extern nrh_t namreg_add( char *name, siz
  */
 extern void namreg_del( nrh_t nrh );
 
-/* namreg_flush - flush namreg I/O buffer.  Returns 0 if successful.
+/* namreg_flush - flush namreg I/O buffer. pass BOOL_TRUE
+ * if done adding names to the registry.
+ * Returns 0 if successful.
  */
-extern rv_t namreg_flush( void );
+extern rv_t namreg_flush( bool_t done_adding );
 
 /* namreg_get - retrieves the name identified by the index.
  * fills the buffer with the null-terminated name from the registry.


* [PATCH v3 5/9] xfsrestore: cleanup node allocation
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
                   ` (3 preceding siblings ...)
  2010-11-16 15:05 ` [PATCH v3 4/9] xfsrestore: mmap dirent names for faster lookups wkendall
@ 2010-11-16 15:05 ` wkendall
  2010-11-16 15:05 ` [PATCH v3 6/9] xfsrestore: fix node table setup wkendall
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: node_alloc_cleanup --]
[-- Type: text/plain, Size: 6916 bytes --]

Simplify the node allocation code. The current code takes some
number of nodes from a new segment and links them into the
freelist whenever the freelist is depleted. There's no reason
to put the new nodes on the freelist; we can simply allocate the
next available new node as needed. This also saves a trip through
win_map/win_unmap when there are no nodes on the freelist (the
common case).

Newly allocated nodes are no longer zeroed in this code, but
all allocations are done through Node_alloc which does initialize
each field.

Also remove a couple of TREE_DEBUG messages that were not useful.
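The simplified policy can be modeled as follows (an illustrative sketch; real nodes live in mmap'd segments and thread their free-list links through the node bytes themselves):

```c
#include <assert.h>
#include <stddef.h>

#define NIX_NULL ((long)-1)

struct node_pool {
	long  freehead;     /* head of free list, NIX_NULL if empty */
	long *freelink;     /* freelink[i] = next free node after i */
	long  next_virgin;  /* next never-used node index */
	long  nnodes;
};

static long pool_alloc(struct node_pool *p)
{
	long nix;
	if (p->freehead != NIX_NULL) {
		/* reuse a freed node: pop the free list */
		nix = p->freehead;
		p->freehead = p->freelink[nix];
	} else {
		/* common case: hand out the next virgin node directly,
		 * without threading it onto the free list first
		 */
		if (p->next_virgin >= p->nnodes)
			return NIX_NULL;
		nix = p->next_virgin++;
	}
	return nix;
}

static void pool_free(struct node_pool *p, long nix)
{
	p->freelink[nix] = p->freehead;
	p->freehead = nix;
}
```

Freed nodes are still recycled first; only the up-front "sacrifice a batch of virgin nodes onto the free list" step goes away.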

Signed-off-by: Bill Kendall <wkendall@sgi.com>

Reviewed-by: Alex Elder <aelder@sgi.com>

---
 restore/node.c |  144 +++++++++++++++------------------------------------------
 1 file changed, 39 insertions(+), 105 deletions(-)

Index: xfsdump-kernel.org/restore/node.c
===================================================================
--- xfsdump-kernel.org.orig/restore/node.c
+++ xfsdump-kernel.org/restore/node.c
@@ -111,10 +111,6 @@ extern size_t pgmask;
  */
 #define NODESPERSEGMIN	1000000
 
-/* how many nodes to place on free list at a time
- */
-#define VIRGSACRMAX	8192 /* fudged: 8192 48 byte nodes (24 or 96 pages) */
-
 /* a node is identified internally by its index into the backing store.
  * this index is the offset of the node into the segmented portion of
  * backing store (follows the abstraction header page) divided by the
@@ -380,72 +376,41 @@ node_alloc( void )
 	nix_t nix;
 	u_char_t *p;
 	nh_t nh;
-	register nix_t *linkagep;
 #ifdef NODECHK
-	register u_char_t *hkpp;
-	register u_char_t gen;
-	register u_char_t unq;
+	u_char_t *hkpp;
+	u_char_t gen = 0;
+	u_char_t unq;
 #endif /* NODECHK */
 
-	/* if free list is depleted, map in a new window at the
-	 * end of backing store. put all nodes on free list.
-	 * initialize the gen count to the node index, and the unique
-	 * pattern to the free pattern.
-	 */
-	if ( node_hdrp->nh_freenix == NIX_NULL ) {
-		nix_t virgbegnix; /* abs. nix of first node in virg seg */
-		nix_t virgendnix; /* abs. nix of next node after last */
-		nix_t sacrcnt; /* how many virgins to put on free list */
-		nix_t sacrnix; 
-
-		ASSERT( node_hdrp->nh_virgrelnix
-			<
-			( nix_t )node_hdrp->nh_nodesperseg );
-		virgbegnix = OFF2NIX( node_hdrp->nh_virgsegreloff )
-			     +
-			     node_hdrp->nh_virgrelnix;
-		virgendnix =
-		      OFF2NIX( ( node_hdrp->nh_virgsegreloff
-			       +
-			       ( off64_t )node_hdrp->nh_segsz ) );
-#ifdef TREE_DEBUG
-		mlog(MLOG_DEBUG | MLOG_TREE,
-		   "node_alloc(): create freelist - "
-		   "virg_begin=%lld virg_end=%lld\n",
-		   virgbegnix, virgendnix); 
-#endif
-		ASSERT( virgendnix > virgbegnix );
-		sacrcnt = min( VIRGSACRMAX, virgendnix - virgbegnix );
-		ASSERT( sacrcnt >= 1 );
-		p = 0; /* keep lint happy */
-		win_map( NIX2OFF( virgbegnix ), ( void ** )&p );
+	/* if there's a node available on the free list, use it.
+	 * otherwise get the next one from the current virgin segment,
+	 * or allocate a new virgin segment if the current one is depleted.
+	 */
+	if ( node_hdrp->nh_freenix != NIX_NULL ) {
+		nix_t *linkagep;
+		off64_t off;
+
+		nix = node_hdrp->nh_freenix;
+
+		off = NIX2OFF( nix );
+		win_map( off, ( void ** )&p );
 		if (p == NULL)
-		    return NH_NULL;
-		node_hdrp->nh_freenix = virgbegnix;
-		for ( sacrnix = virgbegnix
-		      ;
-		      sacrnix < virgbegnix + sacrcnt - 1
-		      ;
-		      p += node_hdrp->nh_nodesz, sacrnix++ ) {
-			linkagep = ( nix_t * )p;
-			*linkagep = sacrnix + 1;
-#ifdef NODECHK
-			hkpp = p + node_hdrp->nh_nodehkix;
-			gen = ( u_char_t )sacrnix;
-			*hkpp = ( u_char_t )HKPMKHKP( ( size_t )gen,
-						      NODEUNQFREE );
-#endif /* NODECHK */
-		}
-		linkagep = ( nix_t * )p;
-		*linkagep = NIX_NULL;
+			return NH_NULL;
 #ifdef NODECHK
 		hkpp = p + node_hdrp->nh_nodehkix;
-		gen = ( u_char_t )sacrnix;
-		*hkpp = HKPMKHKP( gen, NODEUNQFREE );
+		gen = ( u_char_t )( HKPGETGEN( *p ) + ( u_char_t )1 );
+		unq = HKPGETUNQ( *hkpp );
+		ASSERT( unq != NODEUNQALCD );
+		ASSERT( unq == NODEUNQFREE );
 #endif /* NODECHK */
-		node_hdrp->nh_virgrelnix += sacrcnt;
-		win_unmap( node_hdrp->nh_virgsegreloff, ( void ** )&p );
 
+		/* adjust the free list */
+		linkagep = ( nix_t * )p;
+		node_hdrp->nh_freenix = *linkagep;
+
+		win_unmap( off, ( void ** )&p );
+
+	} else {
 		if ( node_hdrp->nh_virgrelnix
 		     >=
 		     ( nix_t )node_hdrp->nh_nodesperseg ) {
@@ -468,14 +433,11 @@ node_alloc( void )
 			mlog( MLOG_DEBUG,
 			      "pre-growing new node array segment at %lld "
 			      "size %lld\n",
-			      node_hdrp->nh_firstsegoff 
-			      +
-			      node_hdrp->nh_virgsegreloff
-			      +
-			      ( off64_t )node_hdrp->nh_segsz,
+			      node_hdrp->nh_firstsegoff +
+			      node_hdrp->nh_virgsegreloff,
 			      ( off64_t )node_hdrp->nh_segsz );
 			rval = ftruncate64( node_fd,
-					    node_hdrp->nh_firstsegoff 
+					    node_hdrp->nh_firstsegoff
 					    +
 					    node_hdrp->nh_virgsegreloff
 					    +
@@ -484,60 +446,32 @@ node_alloc( void )
 				mlog( MLOG_NORMAL | MLOG_WARNING | MLOG_TREE, _(
 				      "unable to autogrow node segment %llu: "
 				      "%s (%d)\n"),
-				      node_hdrp->nh_virgsegreloff
-				      /
-				      ( off64_t )node_hdrp->nh_segsz,
+				      OFF2NIX(node_hdrp->nh_virgsegreloff),
 				      strerror( errno ),
 				      errno );
 			}
 		}
-	}
 
-	/* map in window containing node at top of free list,
-	 * and adjust free list.
-	 */
-	nix = node_hdrp->nh_freenix;
-#ifdef TREE_DEBUG
-	mlog(MLOG_DEBUG | MLOG_TREE,
-	   "node_alloc(): win_map(%llu) and get head from node freelist\n",
-           NIX2OFF(nix));
-#endif
-	win_map( NIX2OFF( nix ), ( void ** )&p );
-	if (p == NULL)
-	    return NH_NULL;
-#ifdef NODECHK
-	hkpp = p + node_hdrp->nh_nodehkix;
-	unq = HKPGETUNQ( *hkpp );
-	ASSERT( unq != NODEUNQALCD );
-	ASSERT( unq == NODEUNQFREE );
-#endif /* NODECHK */
-	linkagep = ( nix_t * )p;
-	node_hdrp->nh_freenix = *linkagep;
-
-	/* clean the node
-	 */
-	memset( ( void * )p, 0, node_hdrp->nh_nodesz );
+		nix = OFF2NIX( node_hdrp->nh_virgsegreloff )
+			+
+			node_hdrp->nh_virgrelnix++;
+	}
 
 	/* build a handle for node
 	 */
 	ASSERT( nix <= NIX_MAX );
 #ifdef NODECHK
+	win_map( NIX2OFF( nix ), ( void ** )&p );
+	if (p == NULL)
+		abort();
 	hkpp = p + ( int )node_hdrp->nh_nodehkix;
-	gen = ( u_char_t )( HKPGETGEN( *p ) + ( u_char_t )1 );
 	nh = HDLMKHDL( gen, nix );
 	*hkpp = HKPMKHKP( gen, NODEUNQALCD );
+	win_unmap( NIX2OFF( nix ), ( void ** )&p );
 #else /* NODECHK */
 	nh = ( nh_t )nix;
 #endif /* NODECHK */
 
-	/* unmap window
-	 */
-#ifdef TREE_DEBUG
-	mlog(MLOG_DEBUG | MLOG_TREE,
-	   "node_alloc(): win_unmap(%llu)\n", NIX2OFF(nix));
-#endif
-	win_unmap( NIX2OFF( nix ), ( void ** )&p );
-
 	return nh;
 }
 


* [PATCH v3 6/9] xfsrestore: fix node table setup
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
                   ` (4 preceding siblings ...)
  2010-11-16 15:05 ` [PATCH v3 5/9] xfsrestore: cleanup node allocation wkendall
@ 2010-11-16 15:05 ` wkendall
  2010-11-16 15:05 ` [PATCH v3 7/9] xfsrestore: make node lookup more efficient wkendall
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: winmap_max_fix --]
[-- Type: text/plain, Size: 10356 bytes --]

The node table setup code unintentionally limits the amount of virtual
memory that will be used for the node table. Rather than setting a
limit based on the remaining virtual memory, the node table is limited
to the amount of memory it thinks it will need based on the dump's
inode count. But the node table stores directory entries, not inodes,
so dumps with many hard links end up thrashing, repeatedly mapping and
unmapping segments of the node table to stay within the self-imposed
virtual memory limit.

This patch also changes the node table to ensure that each segment
holds a power-of-two number of nodes. Profiling revealed that while
building the node table, 50% of the restore cpu time was spent doing
division and modulo to convert a node index into its segment index
and offset within the segment. This change prepares the node table
for a later patch which changes the lookup mechanism to use shift
and bitwise-and.
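The shift-and-mask split that a power-of-two nodesperseg enables can be sketched as (hypothetical constants; the real code derives the shift at init time):

```c
#include <assert.h>
#include <stdint.h>

#define SEGIXSHIFT 20                         /* nodesperseg == 1 << 20 */
#define RELNIXMASK ((1u << SEGIXSHIFT) - 1)

typedef uint32_t nh_t;

/* segment index: which segment the node lives in */
static uint32_t nix2segix(nh_t nix) { return nix >> SEGIXSHIFT; }

/* relative index: node offset within that segment */
static uint32_t nix2relnix(nh_t nix) { return nix & RELNIXMASK; }
```

Both helpers compute the same values as nix / nodesperseg and nix % nodesperseg, but compile to a single shift or AND, which is what eliminates the division hot spot seen in profiling.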

Also don't bother passing the estimated segment table size to the
window abstraction. It has to deal with resizing the table anyway and
can choose a reasonable starting size.


Signed-off-by: Bill Kendall <wkendall@sgi.com>

Reviewed-by: Alex Elder <aelder@sgi.com>

---
 restore/node.c |  125 ++++++++++++++++++++++++++++++++++-----------------------
 restore/win.c  |    7 +--
 restore/win.h  |    3 -
 3 files changed, 80 insertions(+), 55 deletions(-)

Index: xfsdump-kernel.org/restore/node.c
===================================================================
--- xfsdump-kernel.org.orig/restore/node.c
+++ xfsdump-kernel.org/restore/node.c
@@ -97,19 +97,14 @@ extern size_t pgmask;
 
 #else /* NODECHK */
 
-#define NIX_MAX			( ( ( off64_t )1			\
-				    <<					\
-				    ( ( off64_t )NBBY			\
-				      *					\
-				      ( off64_t )sizeof( nh_t )))	\
-				  -					\
-				  ( off64_t )2 ) /* 2 to avoid NH_NULL */
+#define NIX_MAX			( NH_NULL - 1 )
 
 #endif /* NODECHK */
 
 /* window constraints
  */
-#define NODESPERSEGMIN	1000000
+#define NODESPERSEG_MIN	1048576
+#define WINMAP_MIN	4
 
 /* a node is identified internally by its index into the backing store.
  * this index is the offset of the node into the segmented portion of
@@ -135,17 +130,14 @@ struct node_hdr {
 		/* index of byte in each node the user has reserved
 		 * for use by me
 		 */
-	size_t nh_nodesperseg;
+	nh_t nh_nodesperseg;
 		/* an integral number of internal nodes must fit into a
 		 * segment
 		 */
-	size_t nh_segsz;
+	size64_t nh_segsz;
 		/* the backing store is partitoned into segment, which
 		 * can be mapped into VM windows  by the win abstraction
 		 */
-	size64_t nh_segtblsz;
-		/* the estimated size of the entire segment table.
-		 */
 	size_t nh_winmapmax;
 		/* maximum number of windows which can be mapped
 		 */
@@ -191,13 +183,13 @@ node_init( intgen_t fd,
 	   size64_t vmsz,
 	   size64_t dirs_nondirs_cnt )
 {
-	size64_t nodesz;
-	size64_t winmap_mem;
+	size_t nodesz;
 	size64_t segsz;
-	size64_t segtablesz;
-	size64_t nodesperseg;
-	size64_t minsegsz;
-	size64_t winmapmax;
+	nh_t nodesperseg;
+	size_t max_segments;
+	size_t winmapmax;
+	size_t segcount;
+	intgen_t segixshift;
 	intgen_t rval;
 
 	/* sanity checks
@@ -218,36 +210,76 @@ node_init( intgen_t fd,
 	*/
 	nodesz = ( usrnodesz + nodealignsz - 1 ) & ~( nodealignsz - 1 );
 
-#define	WINMAP_MAX	20	/* maximum number of windows to use */
-#define	WINMAP_MIN	4	/* minimum number of windows to use */
-#define	HARDLINK_FUDGE	1.2	/* approx 1.2 hard links per file */
-
-	/* Calculate the expected size of the segment table using the number
-	 * of dirs and non-dirs.  Since we don't know how many hard-links
-	 * there will be, scale the size upward using HARDLINK_FUDGE.
+	/* Calculate the node table params based on the number of inodes in the
+	 * dump, since that's all we know. Ideally we'd base this on the number
+	 * of dirents in the dump instead as there's a node per dirent.
+	 *
+	 * Due to virtual memory constraints and the fact that we don't know
+	 * the final node table size, we can't guarantee that the entire node
+	 * table can be mapped at once. Instead segments will be mapped using a
+	 * window abstraction. Some operations require WINMAP_MIN nodes to be
+	 * referenced at a time, therefore we must ensure that this many
+	 * segments fit in virtual memory.
+	 *
+	 * nodesperseg must be a power of two. Earlier versions did not enforce
+	 * this, and profiling showed that nearly 50% of cpu time during the
+	 * node table construction was consumed doing division and modulo to
+	 * convert an nh_t to a segment index and node offset.  By making
+	 * nodesperseg a power of two we can use shift and bitwise-and instead.
+	 *
+	 * Each segment must hold an integral number of nodes and be an integral
+	 * number of pages. #1 ensures this except when nodesperseg is small, so
+	 * the smallest allowed segsz is pgsz * nodesz (i.e., nodesperseg ==
+	 * pgsz). However this is of no consequence as we enforce a larger
+	 * minimum nodesperseg (NODESPERSEG_MIN) anyway in order to place a
+	 * reasonable cap on the max number of segments.
 	 */
 
-	segtablesz = ( (size64_t)(HARDLINK_FUDGE * (double)dirs_nondirs_cnt) * nodesz);
+	ASSERT( NODESPERSEG_MIN >= pgsz );
+
+	if ( vmsz < WINMAP_MIN * NODESPERSEG_MIN * nodesz ) {
+		mlog( MLOG_NORMAL | MLOG_ERROR, _(
+		  "not enough virtual memory for node abstraction: "
+		  "remaining-vmsz=%llu need=%llu\n"),
+		  vmsz, WINMAP_MIN * NODESPERSEG_MIN * nodesz );
+		return BOOL_FALSE;
+	}
 
-	/* Figure out how much memory is available for use by winmaps, and
-	 * use that to pick an appropriate winmapmax, segsz, and nodesperseg,
-	 * the goal being that if at all possible we want the entire segment
-	 * table to be mapped so that we aren't constantly mapping and
-	 * unmapping winmaps.  There must be at least WINMAP_MIN winmaps
-	 * because references can be held on more than one winmap at the
-	 * same time.  More winmaps are generally better to reduce the
-	 * number of nodes that are unmapped if unmapping does occur.
+	/* This is checked as nodes are allocated as well (remember that
+	 * dirs_nondirs_cnt may be less than the number of nodes/dirents).
+	 * Checking this here prevents potential overflow in the logic below.
 	 */
+	if ( dirs_nondirs_cnt > NIX_MAX ) {
+		mlog( MLOG_NORMAL | MLOG_ERROR, _(
+		  "dump contains %llu inodes, restore can only handle %llu\n"),
+		  dirs_nondirs_cnt, NIX_MAX );
+		return BOOL_FALSE;
+	}
+
+	for ( winmapmax = 0, segcount = 1; winmapmax < WINMAP_MIN; segcount <<= 1 ) {
+
+		nodesperseg = max( dirs_nondirs_cnt / segcount, NODESPERSEG_MIN );
+
+		/* nodesperseg must be a power of 2 */
+		for ( segixshift = 0;
+		      ( 1ULL << segixshift ) < nodesperseg;
+		      segixshift++ );
 
-	minsegsz = pgsz * nodesz;	/* must be pgsz and nodesz multiple */
-	winmap_mem = min(vmsz, segtablesz);
-	segsz = (((winmap_mem / WINMAP_MAX) + minsegsz - 1) / minsegsz) * minsegsz;
-	segsz = max(segsz, minsegsz);
+		/* rounding up to a power of 2 may have caused overflow */
+		if ( ( 1ULL << segixshift ) > NIX_MAX )
+			segixshift--;
 
-	nodesperseg = segsz / nodesz;
+		nodesperseg = 1UL << segixshift;
 
-	winmapmax = min(WINMAP_MAX, vmsz / segsz);
-	winmapmax = max(winmapmax, WINMAP_MIN);
+		max_segments = 1UL << ( NBBY * sizeof(nh_t) - segixshift );
+
+		segsz = nodesperseg * nodesz;
+
+		/* max number of segments that will fit in virtual memory,
+		 * capped at the max possible number of segments
+		 */
+		winmapmax = min( vmsz / segsz, max_segments );
+	}
 
 	/* map the abstraction header
 	 */
@@ -272,7 +304,6 @@ node_init( intgen_t fd,
 	node_hdrp->nh_nodesz = nodesz;
 	node_hdrp->nh_nodehkix = nodehkix;
 	node_hdrp->nh_segsz = segsz;
-	node_hdrp->nh_segtblsz = segtablesz;
 	node_hdrp->nh_winmapmax = winmapmax;
 	node_hdrp->nh_nodesperseg = nodesperseg;
 	node_hdrp->nh_nodealignsz = nodealignsz;
@@ -309,7 +340,6 @@ node_init( intgen_t fd,
 	win_init( fd,
 		  node_hdrp->nh_firstsegoff,
 		  segsz,
-		  segtablesz,
 		  winmapmax );
 	
 	/* announce the results
@@ -317,14 +347,12 @@ node_init( intgen_t fd,
 	mlog( MLOG_DEBUG | MLOG_TREE,
 	      "node_init:"
 	      " vmsz = %llu (0x%llx)"
-	      " segsz = %u (0x%x)"
-	      " segtblsz = %llu (0x%llx)"
+	      " segsz = %llu (0x%llx)"
 	      " nodesperseg = %u (0x%x)"
-	      " winmapmax = %llu (0x%llx)"
+	      " winmapmax = %lu (0x%lx)"
 	      "\n",
 	      vmsz, vmsz,
 	      segsz, segsz,
-	      segtablesz, segtablesz,
 	      nodesperseg, nodesperseg,
 	      winmapmax, winmapmax );
 
@@ -364,7 +392,6 @@ node_sync( intgen_t fd, off64_t off )
 	win_init( fd,
 		  node_hdrp->nh_firstsegoff,
 		  node_hdrp->nh_segsz,
-		  node_hdrp->nh_segtblsz,
 		  node_hdrp->nh_winmapmax );
 
 	return BOOL_TRUE;
Index: xfsdump-kernel.org/restore/win.c
===================================================================
--- xfsdump-kernel.org.orig/restore/win.c
+++ xfsdump-kernel.org/restore/win.c
@@ -80,7 +80,7 @@ struct tran {
 	off64_t t_firstoff;
 		/* offset of first seg within backing store (for mmap( ))
 		 */
-	size_t t_segsz;
+	size64_t t_segsz;
 		/* backing store segment / window size
 		 */
 	size_t t_winmax;
@@ -147,8 +147,7 @@ win_getnum_mmaps(void)
 void
 win_init( intgen_t fd,
 	  off64_t firstoff,
-	  size_t segsz,
-	  size64_t segtablesz,
+	  size64_t segsz,
 	  size_t winmax )
 {
 	/* validate parameters
@@ -167,7 +166,7 @@ win_init( intgen_t fd,
 	tranp->t_segsz = segsz;
 	tranp->t_winmax = winmax;
 
-	tranp->t_segmaplen = (size_t)(segtablesz / segsz) + 1;
+	tranp->t_segmaplen = SEGMAP_INCR;
 	tranp->t_segmap = (win_t **)
 	calloc( tranp->t_segmaplen, sizeof(win_t *) );
 	ASSERT( tranp->t_segmap );
Index: xfsdump-kernel.org/restore/win.h
===================================================================
--- xfsdump-kernel.org.orig/restore/win.h
+++ xfsdump-kernel.org/restore/win.h
@@ -25,8 +25,7 @@
  */
 void win_init( intgen_t fd,
 	       off64_t rngoff,		/* offset into file of windowing */
-	       size_t winsz,		/* window size */
-	       size64_t segtablesz,     /* estimate of segment table size */
+	       size64_t winsz,		/* window size */
 	       size_t wincntmax );	/* max number of windows to manage */
 
 /* supply a pointer to the portion of the file identified by off.


* [PATCH v3 7/9] xfsrestore: make node lookup more efficient
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
                   ` (5 preceding siblings ...)
  2010-11-16 15:05 ` [PATCH v3 6/9] xfsrestore: fix node table setup wkendall
@ 2010-11-16 15:05 ` wkendall
  2010-11-16 19:20   ` Alex Elder
  2010-11-16 15:05 ` [PATCH v3 8/9] xfsrestore: remove nix_t wkendall
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: window_lookup_performance --]
[-- Type: text/plain, Size: 13894 bytes --]

When converting an nh_t to a segment index and relative node index,
use shift and bitwise-and instead of division and modulo. Results show
this is about 50% faster than the current method.
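The conversion can be sketched as below. The constants are illustrative
stand-ins for the nh_segixshift and nh_relnixmask fields the patch adds to
struct node_hdr; a shift of 20 corresponds to nodesperseg == 1048576:

```c
#include <assert.h>
#include <stdint.h>

#define SEGIXSHIFT 20				/* nodesperseg == 1 << 20 */
#define RELNIXMASK ((1u << SEGIXSHIFT) - 1)

/* segment index: which segment of the backing store holds the node */
static uint32_t nh2segix(uint32_t nh)  { return nh >> SEGIXSHIFT; }

/* relative node index: the node's position within that segment */
static uint32_t nh2relnix(uint32_t nh) { return nh & RELNIXMASK; }
```

Because nodesperseg is a power of two, these produce the same results as
nh / nodesperseg and nh % nodesperseg, without the division.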


Signed-off-by: Bill Kendall <wkendall@sgi.com>

---
 restore/node.c |  229 +++++++++++++++++++++++++++------------------------------
 restore/win.c  |   43 +++-------
 restore/win.h  |    8 +
 3 files changed, 130 insertions(+), 150 deletions(-)

Index: xfsdump-kernel.org/restore/node.c
===================================================================
--- xfsdump-kernel.org.orig/restore/node.c
+++ xfsdump-kernel.org/restore/node.c
@@ -115,13 +115,13 @@ extern size_t pgmask;
  */
 typedef off64_t nix_t;
 #define NIX_NULL OFF64MAX
-#define NIX2OFF( nix )	( nix * ( nix_t )node_hdrp->nh_nodesz )
-#define OFF2NIX( noff )	( noff / ( nix_t )node_hdrp->nh_nodesz )
 
 /* reserve the firstpage for a header to save persistent context
  */
 #define NODE_HDRSZ	pgsz
 
+typedef intgen_t relnix_t;
+
 struct node_hdr {
 	size_t nh_nodesz;
 		/* internal node size - user may see something smaller
@@ -151,20 +151,15 @@ struct node_hdr {
 	off64_t nh_firstsegoff;
 		/* offset into backing store of the first segment
 		 */
-	off64_t nh_virgsegreloff;
-		/* offset (relative to beginning of first segment) into
-		 * backing store of segment containing one or
-		 * more virgin nodes. relative to beginning of segmented
-		 * portion of backing store. bumped only when all of the
-		 * nodes in the segment have been placed on the free list.
-		 * when bumped, nh_virginrelnix is simultaneously set back
-		 * to zero.
-		 */
-	nix_t nh_virgrelnix;
-		/* relative node index within the segment identified by
-		 * nh_virgsegreloff of the next node not yet placed on the
-		 * free list. never reaches nh_nodesperseg: instead set
-		 * to zero and bump nh_virgsegreloff by one segment.
+	nh_t nh_virgnix;
+		/* node handle of next virgin node
+		 */
+	intgen_t nh_segixshift;
+		/* bitshift used to extract the segment index from an nh_t
+		 */
+	relnix_t nh_relnixmask;
+		/* bitmask used to extract the node index from an nh_t
+		 * (relative to the start of a segment)
 		 */
 };
 
@@ -173,6 +168,76 @@ typedef struct node_hdr node_hdr_t;
 static node_hdr_t *node_hdrp;
 static intgen_t node_fd;
 
+static inline segix_t
+nh2segix( nh_t nh )
+{
+	return nh >> node_hdrp->nh_segixshift;
+}
+
+static inline relnix_t
+nh2relnix( nh_t nh )
+{
+	return nh & node_hdrp->nh_relnixmask;
+}
+
+static inline void
+node_map_internal( nh_t nh, void **pp )
+{
+	win_map( nh2segix( nh ), pp );
+	if ( *pp != NULL ) {
+		relnix_t relnix = nh2relnix( nh );
+		*pp = ( void * )( ( char * )( *pp ) +
+				( ( off64_t )relnix *
+				  node_hdrp->nh_nodesz ) );
+	}
+}
+
+/* ARGSUSED */
+static inline void
+node_unmap_internal( nh_t nh, void **pp, bool_t freepr )
+{
+	nix_t nix;
+#ifdef NODECHK
+	register u_char_t hkp;
+	register u_char_t hdlgen;
+	register u_char_t nodegen;
+	register u_char_t nodeunq;
+#endif /* NODECHK */
+
+	ASSERT( pp );
+	ASSERT( *pp );
+	ASSERT( nh != NH_NULL );
+
+	/* convert the handle into an index
+	 */
+#ifdef NODECHK
+	hdlgen = HDLGETGEN( nh );
+	nix = HDLGETNIX( nh );
+#else /* NODECHK */
+	nix = ( nix_t )nh;
+#endif /* NODECHK */
+
+	ASSERT( nix <= NIX_MAX );
+
+#ifdef NODECHK
+	hkp = *( *( u_char_t ** )pp + node_hdrp->nh_nodehkix );
+	nodegen = HKPGETGEN( hkp );
+	ASSERT( nodegen == hdlgen );
+	nodeunq = HKPGETUNQ( hkp );
+	if ( ! freepr ) {
+		ASSERT( nodeunq != NODEUNQFREE );
+		ASSERT( nodeunq == NODEUNQALCD );
+	} else {
+		ASSERT( nodeunq != NODEUNQALCD );
+		ASSERT( nodeunq == NODEUNQFREE );
+	}
+#endif /* NODECHK */
+
+	/* unmap the window containing the node
+	 */
+	win_unmap( nh2segix( nix ), pp ); /* zeros *pp */
+}
+
 /* ARGSUSED */
 bool_t
 node_init( intgen_t fd,
@@ -190,12 +255,13 @@ node_init( intgen_t fd,
 	size_t winmapmax;
 	size_t segcount;
 	intgen_t segixshift;
-	intgen_t rval;
 
 	/* sanity checks
 	 */
 	ASSERT( sizeof( node_hdr_t ) <= NODE_HDRSZ );
 	ASSERT( sizeof( nh_t ) < sizeof( off64_t ));
+	ASSERT( sizeof( nh_t ) <= sizeof( segix_t ));
+	ASSERT( sizeof( nh_t ) <= sizeof( relnix_t ));
 	ASSERT( nodehkix < usrnodesz );
 	ASSERT( usrnodesz >= sizeof( char * ) + 1 );
 		/* so node is at least big enough to hold
@@ -309,32 +375,14 @@ node_init( intgen_t fd,
 	node_hdrp->nh_nodealignsz = nodealignsz;
 	node_hdrp->nh_freenix = NIX_NULL;
 	node_hdrp->nh_firstsegoff = off + ( off64_t )NODE_HDRSZ;
-	node_hdrp->nh_virgsegreloff = 0;
-	node_hdrp->nh_virgrelnix = 0;
+	node_hdrp->nh_virgnix = 0;
+	node_hdrp->nh_segixshift = segixshift;
+	node_hdrp->nh_relnixmask = nodesperseg - 1;
 
 	/* save transient context
 	 */
 	node_fd = fd;
 
-	/* autogrow the first segment
-	 */
-	mlog( MLOG_DEBUG,
-	      "pre-growing new node array segment at %lld "
-	      "size %lld\n",
-	      node_hdrp->nh_firstsegoff,
-	      ( off64_t )node_hdrp->nh_segsz );
-	rval = ftruncate64( node_fd,
-			    node_hdrp->nh_firstsegoff
-			    +
-			    ( off64_t )node_hdrp->nh_segsz );
-	if ( rval ) {
-		mlog( MLOG_NORMAL | MLOG_ERROR | MLOG_TREE, _(
-		      "unable to autogrow first node segment: %s (%d)\n"),
-		      strerror( errno ),
-		      errno );
-		return BOOL_FALSE;
-	}
-
 	/* initialize the window abstraction
 	 */
 	win_init( fd,
@@ -415,12 +463,10 @@ node_alloc( void )
 	 */
 	if ( node_hdrp->nh_freenix != NIX_NULL ) {
 		nix_t *linkagep;
-		off64_t off;
 
 		nix = node_hdrp->nh_freenix;
 
-		off = NIX2OFF( nix );
-		win_map( off, ( void ** )&p );
+		node_map_internal( nix, ( void ** )&p );
 		if (p == NULL)
 			return NH_NULL;
 #ifdef NODECHK
@@ -435,66 +481,59 @@ node_alloc( void )
 		linkagep = ( nix_t * )p;
 		node_hdrp->nh_freenix = *linkagep;
 
-		win_unmap( off, ( void ** )&p );
+		node_unmap_internal( nix, ( void ** )&p, BOOL_TRUE );
 
 	} else {
-		if ( node_hdrp->nh_virgrelnix
-		     >=
-		     ( nix_t )node_hdrp->nh_nodesperseg ) {
+		if ( nh2relnix( node_hdrp->nh_virgnix ) == 0 ) {
+			/* need to start a new virgin segment */
 			intgen_t rval;
-			ASSERT( node_hdrp->nh_virgrelnix
-				==
-				( nix_t )node_hdrp->nh_nodesperseg );
-			ASSERT( node_hdrp->nh_virgsegreloff
+			off64_t new_seg_off =
+				node_hdrp->nh_firstsegoff +
+				( off64_t )nh2segix( node_hdrp->nh_virgnix ) *
+				( off64_t )node_hdrp->nh_segsz;
+
+			ASSERT( new_seg_off
 				<=
 				OFF64MAX - ( off64_t )node_hdrp->nh_segsz );
-#ifdef TREE_DEBUG
-			mlog(MLOG_DEBUG | MLOG_TREE,
-			    "node_alloc(): runout of nodes for freelist in "
-                            "this segment - nodes used = %lld\n", 
-                            node_hdrp->nh_virgrelnix);
-#endif
-			node_hdrp->nh_virgsegreloff +=
-					( off64_t )node_hdrp->nh_segsz;
-			node_hdrp->nh_virgrelnix = 0;
 			mlog( MLOG_DEBUG,
 			      "pre-growing new node array segment at %lld "
 			      "size %lld\n",
-			      node_hdrp->nh_firstsegoff +
-			      node_hdrp->nh_virgsegreloff,
+			      new_seg_off,
 			      ( off64_t )node_hdrp->nh_segsz );
 			rval = ftruncate64( node_fd,
-					    node_hdrp->nh_firstsegoff
-					    +
-					    node_hdrp->nh_virgsegreloff
+					    new_seg_off
 					    +
 					    ( off64_t )node_hdrp->nh_segsz );
 			if ( rval ) {
 				mlog( MLOG_NORMAL | MLOG_WARNING | MLOG_TREE, _(
-				      "unable to autogrow node segment %llu: "
+				      "unable to autogrow node segment %u: "
 				      "%s (%d)\n"),
-				      OFF2NIX(node_hdrp->nh_virgsegreloff),
+				      nh2segix( node_hdrp->nh_virgnix ),
 				      strerror( errno ),
 				      errno );
 			}
 		}
 
-		nix = OFF2NIX( node_hdrp->nh_virgsegreloff )
-			+
-			node_hdrp->nh_virgrelnix++;
+		nix = node_hdrp->nh_virgnix++;
 	}
 
 	/* build a handle for node
 	 */
-	ASSERT( nix <= NIX_MAX );
+	if ( nix > NIX_MAX ) {
+		mlog( MLOG_NORMAL | MLOG_ERROR, _(
+		  "dump contains too many dirents, "
+		  "restore can only handle %llu\n"),
+		  NIX_MAX );
+		return NH_NULL;
+	}
 #ifdef NODECHK
-	win_map( NIX2OFF( nix ), ( void ** )&p );
+	node_map_internal( nix , ( void ** )&p );
 	if (p == NULL)
 		abort();
 	hkpp = p + ( int )node_hdrp->nh_nodehkix;
 	nh = HDLMKHDL( gen, nix );
 	*hkpp = HKPMKHKP( gen, NODEUNQALCD );
-	win_unmap( NIX2OFF( nix ), ( void ** )&p );
+	node_unmap_internal( nix, ( void ** )&p, BOOL_FALSE );
 #else /* NODECHK */
 	nh = ( nh_t )nix;
 #endif /* NODECHK */
@@ -530,7 +569,7 @@ node_map( nh_t nh )
 	/* map in
 	 */
 	p = 0; /* keep lint happy */
-	win_map( NIX2OFF( nix ), ( void ** )&p );
+	node_map_internal( nix, ( void ** )&p );
 	if (p == NULL)
 	    return NULL;
 
@@ -546,52 +585,6 @@ node_map( nh_t nh )
 	return ( void * )p;
 }
 
-/* ARGSUSED */
-static void
-node_unmap_internal( nh_t nh, void **pp, bool_t internalpr )
-{
-	nix_t nix;
-#ifdef NODECHK
-	register u_char_t hkp;
-	register u_char_t hdlgen;
-	register u_char_t nodegen;
-	register u_char_t nodeunq;
-#endif /* NODECHK */
-
-	ASSERT( pp );
-	ASSERT( *pp );
-	ASSERT( nh != NH_NULL );
-
-	/* convert the handle into an index
-	 */
-#ifdef NODECHK
-	hdlgen = HDLGETGEN( nh );
-	nix = HDLGETNIX( nh );
-#else /* NODECHK */
-	nix = ( nix_t )nh;
-#endif /* NODECHK */
-
-	ASSERT( nix <= NIX_MAX );
-
-#ifdef NODECHK
-	hkp = *( *( u_char_t ** )pp + node_hdrp->nh_nodehkix );
-	nodegen = HKPGETGEN( hkp );
-	ASSERT( nodegen == hdlgen );
-	nodeunq = HKPGETUNQ( hkp );
-	if ( ! internalpr ) {
-		ASSERT( nodeunq != NODEUNQFREE );
-		ASSERT( nodeunq == NODEUNQALCD );
-	} else {
-		ASSERT( nodeunq != NODEUNQALCD );
-		ASSERT( nodeunq == NODEUNQFREE );
-	}
-#endif /* NODECHK */
-
-	/* unmap the window containing the node
-	 */
-	win_unmap( NIX2OFF( nix ), pp ); /* zeros *pp */
-}
-
 void
 node_unmap( nh_t nh, void **pp )
 {
Index: xfsdump-kernel.org/restore/win.c
===================================================================
--- xfsdump-kernel.org.orig/restore/win.c
+++ xfsdump-kernel.org/restore/win.c
@@ -30,6 +30,7 @@
 #include "mlog.h"
 #include "qlock.h"
 #include "mmap.h"
+#include "win.h"
 
 extern size_t pgsz;
 extern size_t pgmask;
@@ -48,7 +49,7 @@ extern size_t pgmask;
 /* window descriptor
  */
 struct win {
-	size_t w_segix;
+	segix_t w_segix;
 		/* index of segment mapped by this window
 		 */
 	void *w_p;
@@ -69,7 +70,7 @@ typedef struct win win_t;
 
 /* forward declarations
  */
-static void win_segmap_resize( size_t segix );
+static void win_segmap_resize( segix_t segix );
 
 /* transient state
  */
@@ -177,31 +178,16 @@ win_init( intgen_t fd,
 }
 
 void
-win_map( off64_t off, void **pp )
+win_map( segix_t segix, void **pp )
 {
-	size_t offwithinseg;
-	size_t segix;
 	off64_t segoff;
 	win_t *winp;
 
 	CRITICAL_BEGIN();
 
-	/* calculate offset within segment
-	 */
-	offwithinseg = ( size_t )( off % ( off64_t )tranp->t_segsz );
-
-	/* calculate segment index
-	 */
-	segix = (size_t)( off / ( off64_t )tranp->t_segsz );
-
-	/* calculate offset of segment
-	 */
-	segoff = off - ( off64_t )offwithinseg;
-
 #ifdef TREE_DEBUG
 	mlog(MLOG_DEBUG | MLOG_TREE | MLOG_NOLOCK,
-	     "win_map(off=%lld,addr=%x): off within = %llu, segoff = %lld\n",
-	      off, pp, offwithinseg, segoff);
+	     "win_map(segix=%u,addr=%p)\n", segix, pp);
 #endif
 	/* resize the array if necessary */
 	if ( segix >= tranp->t_segmaplen )
@@ -243,7 +229,7 @@ win_map( off64_t off, void **pp )
 			ASSERT( ! winp->w_nextp );
 		}
 		winp->w_refcnt++;
-		*pp = ( void * )( ( char * )( winp->w_p ) + offwithinseg );
+		*pp = winp->w_p;
 		CRITICAL_END();
 		return;
 	}
@@ -287,6 +273,10 @@ win_map( off64_t off, void **pp )
 		return;
 	}
 
+	/* calculate offset of segment
+	 */
+	segoff = segix * ( off64_t )tranp->t_segsz;
+
 	/* map the window
 	 */
 	ASSERT( tranp->t_segsz >= 1 );
@@ -320,7 +310,7 @@ win_map( off64_t off, void **pp )
 		if (error == ENOMEM && tranp->t_lruheadp) {
 			mlog( MLOG_NORMAL | MLOG_ERROR,
 		      		_("win_map(): try to select a different win_t\n"));
-			win_map(off, pp);
+			win_map(segix, pp);
 			return;
 		}
 		*pp = NULL;
@@ -331,23 +321,18 @@ win_map( off64_t off, void **pp )
 	winp->w_refcnt++;
 	tranp->t_segmap[winp->w_segix] = winp;
 
-	*pp = ( void * )( ( char * )( winp->w_p ) + offwithinseg );
+	*pp = winp->w_p;
 
 	CRITICAL_END();
 }
 
 void
-win_unmap( off64_t off, void **pp )
+win_unmap( segix_t segix, void **pp )
 {
-	size_t segix;
 	win_t *winp;
 
 	CRITICAL_BEGIN();
 
-	/* calculate segment index
-	 */
-	segix = (size_t)( off / ( off64_t )tranp->t_segsz );
-
 	/* verify window mapped
 	 */
 	ASSERT( segix < tranp->t_segmaplen );
@@ -390,7 +375,7 @@ win_unmap( off64_t off, void **pp )
 }
 
 static void
-win_segmap_resize(size_t segix)
+win_segmap_resize(segix_t segix)
 {
 	size_t oldlen;
 	win_t **new_part;
Index: xfsdump-kernel.org/restore/win.h
===================================================================
--- xfsdump-kernel.org.orig/restore/win.h
+++ xfsdump-kernel.org/restore/win.h
@@ -21,6 +21,8 @@
 /* win.[ch] - windows into a very large file
  */
 
+typedef intgen_t segix_t;
+
 /* initialize the window abstraction
  */
 void win_init( intgen_t fd,
@@ -28,15 +30,15 @@ void win_init( intgen_t fd,
 	       size64_t winsz,		/* window size */
 	       size_t wincntmax );	/* max number of windows to manage */
 
-/* supply a pointer to the portion of the file identified by off.
+/* supply a pointer to the portion of the file identified by segix.
  */
-void win_map( off64_t off,		/* file offset relative to rngoff */
+void win_map( segix_t segix,		/* segment index to be mapped */
 	      void **pp );		/* returns pointer by reference */
 
 /* invalidate the pointer previously supplied. SIDE-EFFECT: zeros
  * by reference the caller's pointer.
  */
-void win_unmap( off64_t off,		/* must match win_map param */
+void win_unmap( segix_t segix,		/* must match win_map param */
 	        void **pp );		/* ptr generated by win_map: ZEROED */
 
 /*


* [PATCH v3 8/9] xfsrestore: remove nix_t
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
                   ` (6 preceding siblings ...)
  2010-11-16 15:05 ` [PATCH v3 7/9] xfsrestore: make node lookup more efficient wkendall
@ 2010-11-16 15:05 ` wkendall
  2010-11-17  9:37   ` Christoph Hellwig
  2010-11-16 15:05 ` [PATCH v3 9/9] xfsrestore: check for compatible xfsrestore wkendall
  2010-11-16 19:21 ` [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues Alex Elder
  9 siblings, 1 reply; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: nuke_nix --]
[-- Type: text/plain, Size: 9466 bytes --]

The nix_t type is a 64-bit version of a (32-bit) nh_t, created "to
facilitate conversion to a 64-bit offset." At this point there's only
one case where an nh_t is converted to an off64_t, so just do an
explicit cast in that case and remove the nix_t.

Signed-off-by: Bill Kendall <wkendall@sgi.com>

Reviewed-by: Alex Elder <aelder@sgi.com>

---
 restore/node.c |  112 ++++++++++++++++++++++-----------------------------------
 1 file changed, 44 insertions(+), 68 deletions(-)

Index: xfsdump-kernel.org/restore/node.c
===================================================================
--- xfsdump-kernel.org.orig/restore/node.c
+++ xfsdump-kernel.org/restore/node.c
@@ -50,22 +50,21 @@ extern size_t pgmask;
 #define	HDLGENSHIFT		( NBBY * sizeof ( nh_t ) - HDLGENCNT )
 #define	HDLGENLOMASK		( ( 1 << HDLGENCNT ) - 1 )
 #define	HDLGENMASK		( HDLGENLOMASK << HDLGENSHIFT )
-#define HDLNIXCNT		HDLGENSHIFT
-#define HDLNIXMASK		( ( 1 << HDLNIXCNT ) - 1 )
+#define HDLMASK			( ( 1 << HDLGENSHIFT ) - 1 )
 #define HDLGETGEN( h )		( ( u_char_t )				\
 				  ( ( ( int )h >> HDLGENSHIFT )		\
 				    &					\
 				    HDLGENLOMASK ))
-#define HDLGETNIX( h )		( ( nix_t )( ( int )h & HDLNIXMASK ))
+#define HDLGETNHDL( h )		( ( nh_t )( ( int )h & HDLMASK ))
 #define HDLMKHDL( g, n )	( ( nh_t )( ( ( ( int )g << HDLGENSHIFT )\
 					      &				\
 					      HDLGENMASK )		\
 					  |				\
-					  ( ( int )n & HDLNIXMASK )))
-#define NIX_MAX			( ( off64_t )HDLNIXMASK )
+					  ( ( int )n & HDLMASK )))
+#define NH_MAX			( HDLMASK )
 
 /* the housekeeping byte of each node will hold two check fields:
- * a gen count, initialized to the node ix and incremented each time a node
+ * a gen count, initialized to 0 and incremented each time a node
  * is allocated, to catch re-use of stale handles; and unique pattern, to
  * differentiate a valid node from random memory. two unique patterns will
  * be used; one when the node is on the free list, another when it is
@@ -97,7 +96,7 @@ extern size_t pgmask;
 
 #else /* NODECHK */
 
-#define NIX_MAX			( NH_NULL - 1 )
+#define NH_MAX			( NH_NULL - 1 )
 
 #endif /* NODECHK */
 
@@ -106,16 +105,6 @@ extern size_t pgmask;
 #define NODESPERSEG_MIN	1048576
 #define WINMAP_MIN	4
 
-/* a node is identified internally by its index into the backing store.
- * this index is the offset of the node into the segmented portion of
- * backing store (follows the abstraction header page) divided by the
- * size of a node. a special index is reserved to represent the null
- * index. a type is defined for node index (nix_t). it is a 64 bit
- * unsigned to facilitate conversion from index to 64 bit offset.
- */
-typedef off64_t nix_t;
-#define NIX_NULL OFF64MAX
-
 /* reserve the firstpage for a header to save persistent context
  */
 #define NODE_HDRSZ	pgsz
@@ -144,15 +133,14 @@ struct node_hdr {
 	size_t nh_nodealignsz;
 		/* user's constraint on node alignment
 		 */
-	nix_t nh_freenix;
-		/* index into backing store of first node of singly-linked
-		 * list of free nodes
+	nh_t nh_freenh;
+		/* handle of first node of singly-linked list of free nodes
 		 */
 	off64_t nh_firstsegoff;
 		/* offset into backing store of the first segment
 		 */
-	nh_t nh_virgnix;
-		/* node handle of next virgin node
+	nh_t nh_virgnh;
+		/* handle of next virgin node
 		 */
 	intgen_t nh_segixshift;
 		/* bitshift used to extract the segment index from an nh_t
@@ -196,7 +184,6 @@ node_map_internal( nh_t nh, void **pp )
 static inline void
 node_unmap_internal( nh_t nh, void **pp, bool_t freepr )
 {
-	nix_t nix;
 #ifdef NODECHK
 	register u_char_t hkp;
 	register u_char_t hdlgen;
@@ -212,12 +199,10 @@ node_unmap_internal( nh_t nh, void **pp,
 	 */
 #ifdef NODECHK
 	hdlgen = HDLGETGEN( nh );
-	nix = HDLGETNIX( nh );
-#else /* NODECHK */
-	nix = ( nix_t )nh;
+	nh = HDLGETNHDL( nh );
 #endif /* NODECHK */
 
-	ASSERT( nix <= NIX_MAX );
+	ASSERT( nh <= NH_MAX );
 
 #ifdef NODECHK
 	hkp = *( *( u_char_t ** )pp + node_hdrp->nh_nodehkix );
@@ -235,7 +220,7 @@ node_unmap_internal( nh_t nh, void **pp,
 
 	/* unmap the window containing the node
 	 */
-	win_unmap( nh2segix( nix ), pp ); /* zeros *pp */
+	win_unmap( nh2segix( nh ), pp ); /* zeros *pp */
 }
 
 /* ARGSUSED */
@@ -315,10 +300,10 @@ node_init( intgen_t fd,
 	 * dirs_nondirs_cnt may be less than the number of nodes/dirents).
 	 * Checking this here prevents potential overflow in the logic below.
 	 */
-	if ( dirs_nondirs_cnt > NIX_MAX ) {
+	if ( dirs_nondirs_cnt > NH_MAX ) {
 		mlog( MLOG_NORMAL | MLOG_ERROR, _(
-		  "dump contains %llu inodes, restore can only handle %llu\n"),
-		  dirs_nondirs_cnt, NIX_MAX );
+		  "dump contains %llu inodes, restore can only handle %u\n"),
+		  dirs_nondirs_cnt, NH_MAX );
 		return BOOL_FALSE;
 	}
 
@@ -332,7 +317,7 @@ node_init( intgen_t fd,
 		      segixshift++ );
 
 		/* rounding up to a power of 2 may have caused overflow */
-		if ( ( 1ULL << segixshift ) > NIX_MAX )
+		if ( ( 1ULL << segixshift ) > NH_MAX )
 			segixshift--;
 
 		nodesperseg = 1UL << segixshift;
@@ -373,9 +358,9 @@ node_init( intgen_t fd,
 	node_hdrp->nh_winmapmax = winmapmax;
 	node_hdrp->nh_nodesperseg = nodesperseg;
 	node_hdrp->nh_nodealignsz = nodealignsz;
-	node_hdrp->nh_freenix = NIX_NULL;
+	node_hdrp->nh_freenh = NH_NULL;
 	node_hdrp->nh_firstsegoff = off + ( off64_t )NODE_HDRSZ;
-	node_hdrp->nh_virgnix = 0;
+	node_hdrp->nh_virgnh = 0;
 	node_hdrp->nh_segixshift = segixshift;
 	node_hdrp->nh_relnixmask = nodesperseg - 1;
 
@@ -448,7 +433,6 @@ node_sync( intgen_t fd, off64_t off )
 nh_t
 node_alloc( void )
 {
-	nix_t nix;
 	u_char_t *p;
 	nh_t nh;
 #ifdef NODECHK
@@ -461,12 +445,12 @@ node_alloc( void )
 	 * otherwise get the next one from the current virgin segment,
 	 * or allocate a new virgin segment if the current one is depleted.
 	 */
-	if ( node_hdrp->nh_freenix != NIX_NULL ) {
-		nix_t *linkagep;
+	if ( node_hdrp->nh_freenh != NH_NULL ) {
+		nh_t *linkagep;
 
-		nix = node_hdrp->nh_freenix;
+		nh = node_hdrp->nh_freenh;
 
-		node_map_internal( nix, ( void ** )&p );
+		node_map_internal( nh, ( void ** )&p );
 		if (p == NULL)
 			return NH_NULL;
 #ifdef NODECHK
@@ -478,18 +462,18 @@ node_alloc( void )
 #endif /* NODECHK */
 
 		/* adjust the free list */
-		linkagep = ( nix_t * )p;
-		node_hdrp->nh_freenix = *linkagep;
+		linkagep = ( nh_t * )p;
+		node_hdrp->nh_freenh = *linkagep;
 
-		node_unmap_internal( nix, ( void ** )&p, BOOL_TRUE );
+		node_unmap_internal( nh, ( void ** )&p, BOOL_TRUE );
 
 	} else {
-		if ( nh2relnix( node_hdrp->nh_virgnix ) == 0 ) {
+		if ( nh2relnix( node_hdrp->nh_virgnh ) == 0 ) {
 			/* need to start a new virgin segment */
 			intgen_t rval;
 			off64_t new_seg_off =
 				node_hdrp->nh_firstsegoff +
-				( off64_t )nh2segix( node_hdrp->nh_virgnix ) *
+				( off64_t )nh2segix( node_hdrp->nh_virgnh ) *
 				( off64_t )node_hdrp->nh_segsz;
 
 			ASSERT( new_seg_off
@@ -508,34 +492,32 @@ node_alloc( void )
 				mlog( MLOG_NORMAL | MLOG_WARNING | MLOG_TREE, _(
 				      "unable to autogrow node segment %u: "
 				      "%s (%d)\n"),
-				      nh2segix( node_hdrp->nh_virgnix ),
+				      nh2segix( node_hdrp->nh_virgnh ),
 				      strerror( errno ),
 				      errno );
 			}
 		}
 
-		nix = node_hdrp->nh_virgnix++;
+		nh = node_hdrp->nh_virgnh++;
 	}
 
 	/* build a handle for node
 	 */
-	if ( nix > NIX_MAX ) {
+	if ( nh > NH_MAX ) {
 		mlog( MLOG_NORMAL | MLOG_ERROR, _(
 		  "dump contains too many dirents, "
 		  "restore can only handle %llu\n"),
-		  NIX_MAX );
+		  NH_MAX );
 		return NH_NULL;
 	}
 #ifdef NODECHK
-	node_map_internal( nix , ( void ** )&p );
+	node_map_internal( nh , ( void ** )&p );
 	if (p == NULL)
 		abort();
 	hkpp = p + ( int )node_hdrp->nh_nodehkix;
-	nh = HDLMKHDL( gen, nix );
+	nh = HDLMKHDL( gen, nh );
 	*hkpp = HKPMKHKP( gen, NODEUNQALCD );
-	node_unmap_internal( nix, ( void ** )&p, BOOL_FALSE );
-#else /* NODECHK */
-	nh = ( nh_t )nix;
+	node_unmap_internal( nh, ( void ** )&p, BOOL_FALSE );
 #endif /* NODECHK */
 
 	return nh;
@@ -544,7 +526,6 @@ node_alloc( void )
 void *
 node_map( nh_t nh )
 {
-	nix_t nix;
 	u_char_t *p;
 #ifdef NODECHK
 	register u_char_t hkp;
@@ -559,17 +540,15 @@ node_map( nh_t nh )
 	 */
 #ifdef NODECHK
 	hdlgen = HDLGETGEN( nh );
-	nix = HDLGETNIX( nh );
-#else /* NODECHK */
-	nix = ( nix_t )nh;
+	nh = HDLGETNHDL( nh );
 #endif /* NODECHK */
 
-	ASSERT( nix <= NIX_MAX );
+	ASSERT( nh <= NH_MAX );
 
 	/* map in
 	 */
 	p = 0; /* keep lint happy */
-	node_map_internal( nix, ( void ** )&p );
+	node_map_internal( nh, ( void ** )&p );
 	if (p == NULL)
 	    return NULL;
 
@@ -595,9 +574,8 @@ void
 node_free( nh_t *nhp )
 {
 	nh_t nh;
-	nix_t nix;
 	u_char_t *p;
-	register nix_t *linkagep;
+	register nh_t *linkagep;
 #ifdef NODECHK
 	register u_char_t *hkpp;
 	register u_char_t hdlgen;
@@ -613,12 +591,10 @@ node_free( nh_t *nhp )
 	 */
 #ifdef NODECHK
 	hdlgen = HDLGETGEN( nh );
-	nix = HDLGETNIX( nh );
-#else /* NODECHK */
-	nix = ( nix_t )nh;
+	nh = HDLGETNHDL( nh );
 #endif /* NODECHK */
 
-	ASSERT( nix <= NIX_MAX );
+	ASSERT( nh <= NH_MAX );
 
 	/* map in
 	 */
@@ -642,9 +618,9 @@ node_free( nh_t *nhp )
 
 	/* put node on free list
 	 */
-	linkagep = ( nix_t * )p;
-	*linkagep = node_hdrp->nh_freenix;
-	node_hdrp->nh_freenix = nix;
+	linkagep = ( nh_t * )p;
+	*linkagep = node_hdrp->nh_freenh;
+	node_hdrp->nh_freenh = nh;
 
 	/* map out
 	 */

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 9/9] xfsrestore: check for compatible xfsrestore
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
                   ` (7 preceding siblings ...)
  2010-11-16 15:05 ` [PATCH v3 8/9] xfsrestore: remove nix_t wkendall
@ 2010-11-16 15:05 ` wkendall
  2010-11-17  9:38   ` Christoph Hellwig
  2010-11-16 19:21 ` [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues Alex Elder
  9 siblings, 1 reply; 21+ messages in thread
From: wkendall @ 2010-11-16 15:05 UTC (permalink / raw)
  To: xfs

[-- Attachment #1: check_compatible_restore --]
[-- Type: text/plain, Size: 6981 bytes --]

When resuming a restore or doing a cumulative restore, xfsrestore
reads state information left around by the previous invocation.
This patch adds logic to determine whether or not restore is
able to make sense of the state information.

The xfsrestore man page has also been updated to make the user
aware of the requirement to use a compatible restore and
system when resuming restores.

Signed-off-by: Bill Kendall <wkendall@sgi.com>

Reviewed-by: Alex Elder <aelder@sgi.com>

---
 man/man8/xfsrestore.8 |   26 +++++++++++---
 restore/content.c     |   92 +++++++++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 108 insertions(+), 10 deletions(-)

Index: xfsdump-kernel.org/man/man8/xfsrestore.8
===================================================================
--- xfsdump-kernel.org.orig/man/man8/xfsrestore.8
+++ xfsdump-kernel.org/man/man8/xfsrestore.8
@@ -61,6 +61,18 @@ The deltas must be applied in the order 
 Each delta applied must have been produced with the previously applied
 delta as its base.
 .P
+.I xfsrestore
+keeps state information in the
+.IR xfsrestorehousekeepingdir ,
+to inform subsequent invocations when used in
+cumulative mode, or in the event a restore is interrupted.
+To ensure that the state information can be processed,
+a compatible version of
+.I xfsrestore
+must be used for each subsequent invocation. Additionally,
+each invocation must run on a system of the same endianness
+and page size.
+.P
 The options to
 .I xfsrestore
 are:
@@ -78,11 +90,11 @@ option allows the operator to specify an
 in which
 .I xfsrestore
 creates the
-.I xfsrestorehousekeeping
-directory.
-When performing a cumulative (\f3\-r\f1 option) restore,
-each successive invocation of \f2xfsrestore\f1 must specify the same alternate
+.I xfsrestorehousekeepingdir
 directory.
+When performing a cumulative (\f3\-r\f1 option) restore or
+resuming (\f3\-R\f1 option) a restore, each successive invocation
+must specify the same alternate directory.
 .TP 5
 \f3\-b\f1 \f2blocksize\f1
 Specifies the blocksize, in bytes, to be used for the restore. 
@@ -205,7 +217,11 @@ Source tape drive is a QIC tape.  QIC ta
 blocksize, for which \f2xfsrestore\f1 must make special allowances.
 .TP 5
 \f3\-r\f1
-Selects the cumulative mode of operation.
+Selects the cumulative mode of operation. The
+.B \-a
+and
+.I destination
+options must be the same for each invocation.
 .TP 5
 \f3\-s\f1 \f2subtree\f1
 Specifies a subtree to restore.
Index: xfsdump-kernel.org/restore/content.c
===================================================================
--- xfsdump-kernel.org.orig/restore/content.c
+++ xfsdump-kernel.org/restore/content.c
@@ -71,6 +71,13 @@
 
 /* structure definitions used locally ****************************************/
 
+#define HOUSEKEEPING_MAGIC	0x686b6d61
+	/* "hkma" - see the housekeeping_magic field of pers_t below.
+	 */
+#define HOUSEKEEPING_VERSION	1
+	/* see the housekeeping_version field of pers_t below.
+	 */
+
 #define WRITE_TRIES_MAX	3
 	/* retry loop tuning for write(2) workaround
 	 */
@@ -358,12 +365,43 @@ struct stream_context {
 
 typedef struct stream_context stream_context_t;
 
-/* persistent state file header - two parts: accumulation state
- * which spans several sessions, and session state. each has a valid
- * bit, and no fields are valid until the valid bit is set.
- * all elements defined such that a bzero results in a valid initial state.
+/* persistent state file header - on-disk format information plus
+ * accumulation state (which spans several sessions) and session state.
+ * the latter two have a valid bit, and their fields are not valid until
+ * the valid bit is set. all elements defined such that a bzero results
+ * in a valid initial state.
  */
 struct pers {
+	/* on-disk format information used to verify that xfsrestore
+	 * can make sense of the data in xfsrestorehousekeepingdir
+	 * when running in cumulative mode or when resuming a restore.
+	 *
+	 * for backwards/forwards compatibility, this struct must be
+	 * the first field! also any changes to the struct must address
+	 * compatibility with other xfsrestore versions.
+	 */
+	struct {
+		size32_t housekeeping_magic;
+			/* used to determine if this struct has been
+			 * initialized, and whether the machine's
+			 * endianness is the same as the previous
+			 * invocation. (data written to xfsrestore's
+			 * state directory is not converted to an
+			 * endian-neutral format since it only persists
+			 * for the life of one or more restore sessions.)
+			 */
+		size32_t housekeeping_version;
+			/* version of the data structures used in the
+			 * state files in housekeepingdir. this must be
+			 * bumped whenever the on-disk format changes.
+			 */
+		size64_t pagesize;
+			/* headers in the persistent state files
+			 * are aligned on page size boundaries, so
+			 * this cannot change between invocations.
+			 */
+	} v;
+
 	/* command line arguments from first session, and session
 	 * history.
 	 */
@@ -1301,6 +1339,49 @@ content_init( intgen_t argc, char *argv[
 		      strerror( errno ));
 		return BOOL_FALSE;
 	}
+
+	/* but first setup or verify the on-disk format information
+	 */
+	if ( ! persp->a.valpr ) {
+		/* this is the first restore session
+		 */
+		persp->v.housekeeping_magic = HOUSEKEEPING_MAGIC;
+		persp->v.housekeeping_version = HOUSEKEEPING_VERSION;
+		persp->v.pagesize = pgsz;
+
+	} else {
+		/* cumulative or resuming a restore, verify the header
+		 */
+		if ( persp->v.housekeeping_magic != HOUSEKEEPING_MAGIC ) {
+			mlog( MLOG_NORMAL | MLOG_ERROR, _(
+			      "%s format corrupt or wrong endianness "
+			      "(0x%x, expected 0x%x)\n"),
+			      hkdirname,
+			      persp->v.housekeeping_magic,
+			      HOUSEKEEPING_MAGIC );
+			return BOOL_FALSE;
+		}
+		if ( persp->v.housekeeping_version != HOUSEKEEPING_VERSION ) {
+			mlog( MLOG_NORMAL | MLOG_ERROR, _(
+			      "%s format version differs from previous "
+			      "restore (%u, expected %u)\n"),
+			      hkdirname,
+			      persp->v.housekeeping_version,
+			      HOUSEKEEPING_VERSION );
+			return BOOL_FALSE;
+		}
+		if ( persp->v.pagesize != pgsz ) {
+			mlog( MLOG_NORMAL | MLOG_ERROR, _(
+			      "%s format differs from previous "
+			      "restore due to page size change "
+			      "(was %lu, now %lu)\n"),
+			      hkdirname,
+			      persp->v.pagesize,
+			      pgsz );
+			return BOOL_FALSE;
+		}
+	}
+
 	if ( ! persp->a.valpr ) {
 		if ( ! dstdir ) {
 			mlog( MLOG_NORMAL | MLOG_ERROR, _(
@@ -1565,7 +1646,8 @@ content_init( intgen_t argc, char *argv[
 		stpgcnt = 0;
 		newstpgcnt = ( stsz + pgmask ) / pgsz;
 		descpgcnt = 0;
-		memset( ( void * )persp, 0, sizeof( pers_t ));
+		memset( ( void * )&persp->a, 0,
+			sizeof( pers_t ) - offsetofmember( pers_t, a ));
 	} else if ( ! persp->s.valpr ) {
 		stpgcnt = persp->a.stpgcnt;
 		newstpgcnt = stpgcnt;


* Re: [PATCH v3 7/9] xfsrestore: make node lookup more efficient
  2010-11-16 15:05 ` [PATCH v3 7/9] xfsrestore: make node lookup more efficient wkendall
@ 2010-11-16 19:20   ` Alex Elder
  0 siblings, 0 replies; 21+ messages in thread
From: Alex Elder @ 2010-11-16 19:20 UTC (permalink / raw)
  To: wkendall; +Cc: xfs

On Tue, 2010-11-16 at 09:05 -0600, wkendall@sgi.com wrote:
> plain text document attachment (window_lookup_performance)
> When converting an nh_t to a segment index and relative node index,
> use shift and bitwise-and instead of division and modulo. Results show
> this is about 50% faster than the current method.

So the mapping interface now takes a segment index
rather than a node index, and you have node_map_*()
functions that operate on node handles.  I think
that's good.  That change seems bigger, but
it basically enabled you to make the switch to
masking affect only those new functions.

Anyway, looks good to me.

Reviewed-by: Alex Elder <aelder@sgi.com>

> 
> Signed-off-by: Bill Kendall <wkendall@sgi.com>



* Re: [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues
  2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
                   ` (8 preceding siblings ...)
  2010-11-16 15:05 ` [PATCH v3 9/9] xfsrestore: check for compatible xfsrestore wkendall
@ 2010-11-16 19:21 ` Alex Elder
  9 siblings, 0 replies; 21+ messages in thread
From: Alex Elder @ 2010-11-16 19:21 UTC (permalink / raw)
  To: wkendall; +Cc: xfs

On Tue, 2010-11-16 at 09:05 -0600, wkendall@sgi.com wrote:
> v3 of this series reworks a couple of the patches based on feedback
> from Alex Elder:
> 
> - 3rd patch: changed a buffer size check to an assert.
> - 7th patch: move static functions up in the file and
>   create typedefs for segix_t and relnix_t.
> 
> The remainder is unchanged from the previous submission.

I'm shepherding this series into the xfsdump-dev tree
for Bill.  I'll wait until tomorrow though, to give
others one more chance to review.  If anyone has any
objection please say so.

					-Alex


* Re: [PATCH v3 1/9] xfsrestore: turn off NODECHK
  2010-11-16 15:05 ` [PATCH v3 1/9] xfsrestore: turn off NODECHK wkendall
@ 2010-11-17  9:20   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-11-17  9:20 UTC (permalink / raw)
  To: wkendall; +Cc: xfs

On Tue, Nov 16, 2010 at 09:05:03AM -0600, wkendall@sgi.com wrote:
> The NODECHK macro should only be enabled as needed for
> development/debugging. Having it on limits xfsrestore to dumps
> with 268 million directory entries instead of 4 billion. 
> 
> Signed-off-by: Bill Kendall <wkendall@sgi.com>
> 
> Reviewed-by: Alex Elder <aelder@sgi.com>

Looks good,


Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v3 2/9] xfsrestore: change nrh_t from 32 to 64 bits
  2010-11-16 15:05 ` [PATCH v3 2/9] xfsrestore: change nrh_t from 32 to 64 bits wkendall
@ 2010-11-17  9:23   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-11-17  9:23 UTC (permalink / raw)
  To: wkendall; +Cc: xfs

On Tue, Nov 16, 2010 at 09:05:04AM -0600, wkendall@sgi.com wrote:
> An nrh_t refers to a byte offset in a file containing all the pathname
> components from a dump. At an average filename length of 20
> characters, an nrh_t would overflow on a dump containing ~214 million
> directory entries. Removing this limitation allows xfsrestore to
> handle 4 billion directory entries.

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v3 3/9] xfsrestore: cache path lookups
  2010-11-16 15:05 ` [PATCH v3 3/9] xfsrestore: cache path lookups wkendall
@ 2010-11-17  9:24   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-11-17  9:24 UTC (permalink / raw)
  To: wkendall; +Cc: xfs

On Tue, Nov 16, 2010 at 09:05:05AM -0600, wkendall@sgi.com wrote:
> In order to resolve a pathname, xfsrestore must work from an inode
> number (from the dump) and recurse up the directory entry tree that it
> has constructed. Each level of recursion requires a seek and read to
> get the name of the dirent, and possibly a mmap of a section of the
> directory entry tree if it is not already mapped (and in that case,
> possibly a munmap of another section). It's quite common to resolve
> pathnames in the same directory consecutively, so simply caching the
> parent directory pathname from the previous lookup saves quite a bit
> of overhead.
> 
> Signed-off-by: Bill Kendall <wkendall@sgi.com>
> 
> Reviewed-by: Alex Elder <aelder@sgi.com>

Looks good,


Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v3 4/9] xfsrestore: mmap dirent names for faster lookups
  2010-11-16 15:05 ` [PATCH v3 4/9] xfsrestore: mmap dirent names for faster lookups wkendall
@ 2010-11-17  9:34   ` Christoph Hellwig
  2010-11-17 17:57     ` Bill Kendall
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2010-11-17  9:34 UTC (permalink / raw)
  To: wkendall; +Cc: xfs

On Tue, Nov 16, 2010 at 09:05:06AM -0600, wkendall@sgi.com wrote:
> Pathname resolution in xfsrestore is about 4x faster if the file
> containing dirent names ("namreg") is memory mapped.  If xfsrestore is
> unable to map the file (e.g., due to virtual memory constraints)
> fallback to the existing seek-and-read approach.
> 
> The file is mapped after all directory entries have been written to
> the "namreg" file. If the caller tries to add additional entries after
> the file has been mapped, it will be unmapped and restore will resort
> back to seek-and-read lookups.
> 
> Signed-off-by: Bill Kendall <wkendall@sgi.com>
> 
> Reviewed-by: Alex Elder <aelder@sgi.com>

Generally looks good to me, but I really hate how namreg_map/unmap
are hidden under namreg_add/flush.  As a start instead of adding the
done_adding argument we can easily move the explicit map to the one
> caller wanting it; similarly, namreg_unmap could be moved to namreg_add,
> that is, manage the mapping/unmapping explicitly.

In fact I'm not sure what the point of the unmap/map cycles is.  At
least in Linux concurrent buffer writes and mmap reads are coherent.


* Re: [PATCH v3 8/9] xfsrestore: remove nix_t
  2010-11-16 15:05 ` [PATCH v3 8/9] xfsrestore: remove nix_t wkendall
@ 2010-11-17  9:37   ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-11-17  9:37 UTC (permalink / raw)
  To: wkendall; +Cc: xfs

On Tue, Nov 16, 2010 at 09:05:10AM -0600, wkendall@sgi.com wrote:
> The nix_t type is a 64-bit version of a (32-bit) nh_t, created "to
> facilitate conversion to a 64-bit offset." At this point there's only
> one case where an nh_t is converted to an off64_t, so just do an
> explicit cast in that case and remove the nix_t.
> 
> Signed-off-by: Bill Kendall <wkendall@sgi.com>
> 
> Reviewed-by: Alex Elder <aelder@sgi.com>

Looks good,


Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH v3 9/9] xfsrestore: check for compatible xfsrestore
  2010-11-16 15:05 ` [PATCH v3 9/9] xfsrestore: check for compatible xfsrestore wkendall
@ 2010-11-17  9:38   ` Christoph Hellwig
  2010-11-17 15:31     ` Bill Kendall
  0 siblings, 1 reply; 21+ messages in thread
From: Christoph Hellwig @ 2010-11-17  9:38 UTC (permalink / raw)
  To: wkendall; +Cc: xfs

On Tue, Nov 16, 2010 at 09:05:11AM -0600, wkendall@sgi.com wrote:
> When resuming a restore or doing a cumulative restore, xfsrestore
> reads state information left around by the previous invocation.
> This patch adds logic to determine whether or not restore is
> able to make sense of the state information.
> 
> The xfsrestore man page has also been updated to make the user
> aware of the requirement to use a compatible restore and
> system when resuming restores.

Shouldn't you use the opportunity to switch everything in the on-disk
layout to explicitly sized types?  Seeing things like the bool types
in the persistant structures really scares me.


* Re: [PATCH v3 9/9] xfsrestore: check for compatible xfsrestore
  2010-11-17  9:38   ` Christoph Hellwig
@ 2010-11-17 15:31     ` Bill Kendall
  2010-11-23 13:44       ` Christoph Hellwig
  0 siblings, 1 reply; 21+ messages in thread
From: Bill Kendall @ 2010-11-17 15:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 11/17/2010 03:38 AM, Christoph Hellwig wrote:
> On Tue, Nov 16, 2010 at 09:05:11AM -0600, wkendall@sgi.com wrote:
>> When resuming a restore or doing a cumulative restore, xfsrestore
>> reads state information left around by the previous invocation.
>> This patch adds logic to determine whether or not restore is
>> able to make sense of the state information.
>>
>> The xfsrestore man page has also been updated to make the user
>> aware of the requirement to use a compatible restore and
>> system when resuming restores.
>
> Shouldn't you use the opportunity to switch everything in the on-disk
> layout to explicitly sized types?  Seeing things like the bool types
> in the persistant structures really scares me.

Just to be clear, the state information is used only for the life of
a series of restores. You restore your level 0 dump, then run restore
again on your level 1, and so on. After that the state information is not
used and would be deleted.

Given how unlikely it is for someone to start a restore on one system
and continue it on another (incompatible) system, and since your suggested
change would ripple out into all the code that touches any of the on-disk
structures, I'd prefer to simply detect a change in the size of types. I
would think that recording/checking the size of a pointer would be
sufficient, assuming your main concern is type size differences between
32-bit and 64-bit systems.

Bill


* Re: [PATCH v3 4/9] xfsrestore: mmap dirent names for faster lookups
  2010-11-17  9:34   ` Christoph Hellwig
@ 2010-11-17 17:57     ` Bill Kendall
  0 siblings, 0 replies; 21+ messages in thread
From: Bill Kendall @ 2010-11-17 17:57 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On 11/17/2010 03:34 AM, Christoph Hellwig wrote:
> On Tue, Nov 16, 2010 at 09:05:06AM -0600, wkendall@sgi.com wrote:
>> Pathname resolution in xfsrestore is about 4x faster if the file
>> containing dirent names ("namreg") is memory mapped.  If xfsrestore is
>> unable to map the file (e.g., due to virtual memory constraints)
>> fallback to the existing seek-and-read approach.
>>
>> The file is mapped after all directory entries have been written to
>> the "namreg" file. If the caller tries to add additional entries after
>> the file has been mapped, it will be unmapped and restore will resort
>> back to seek-and-read lookups.
>>
>> Signed-off-by: Bill Kendall<wkendall@sgi.com>
>>
>> Reviewed-by: Alex Elder<aelder@sgi.com>
>
> Generally looks good to me, but I really hate how namreg_map/unmap
> are hidden under namreg_add/flush.  As a start instead of adding the
> done_adding argument we can easily move the explicit map to the one
> caller wanting it; similarly, namreg_unmap could be moved to namreg_add,
> that is, manage the mapping/unmapping explicitly.

I'll rework this.

>
> In fact I'm not sure what the point of the unmap/map cycles is.  At
> least in Linux concurrent buffer writes and mmap reads are coherent.

Adding an entry extends the file beyond what is mapped. The unmap
code was merely defensive, in case namreg_add is called somewhere
after the file is mapped. I spent some time looking at the callers,
and all namreg_adds occur before the file is mapped. namreg_del is
called after the file is mapped, but as it doesn't do anything this
is not a problem. I'll remove the unmap code.

Bill


* Re: [PATCH v3 9/9] xfsrestore: check for compatible xfsrestore
  2010-11-17 15:31     ` Bill Kendall
@ 2010-11-23 13:44       ` Christoph Hellwig
  0 siblings, 0 replies; 21+ messages in thread
From: Christoph Hellwig @ 2010-11-23 13:44 UTC (permalink / raw)
  To: Bill Kendall; +Cc: Christoph Hellwig, xfs

On Wed, Nov 17, 2010 at 09:31:40AM -0600, Bill Kendall wrote:
> Just to be clear, the state information is used only for the life of
> a series of restores. You restore your level 0 dump, then run restore
> again on your level 1, and so on. After that the state information is not
> used and would be deleted.
> 
> Given how unlikely it is for someone to start a restore on one system
> and continue it on another (incompatible) system, and since your suggested
> change would ripple out into all the code that touches any of the on-disk
> structures, I'd prefer to simply detect a change in the size of types. I
> would think that recording/checking the size of a pointer would be
> sufficient, assuming your main concern is type size differences between
> 32-bit and 64-bit systems.

Ok, still not nice, but no reason to revamp all of xfsdump due to this.



Thread overview: 21+ messages
2010-11-16 15:05 [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues wkendall
2010-11-16 15:05 ` [PATCH v3 1/9] xfsrestore: turn off NODECHK wkendall
2010-11-17  9:20   ` Christoph Hellwig
2010-11-16 15:05 ` [PATCH v3 2/9] xfsrestore: change nrh_t from 32 to 64 bits wkendall
2010-11-17  9:23   ` Christoph Hellwig
2010-11-16 15:05 ` [PATCH v3 3/9] xfsrestore: cache path lookups wkendall
2010-11-17  9:24   ` Christoph Hellwig
2010-11-16 15:05 ` [PATCH v3 4/9] xfsrestore: mmap dirent names for faster lookups wkendall
2010-11-17  9:34   ` Christoph Hellwig
2010-11-17 17:57     ` Bill Kendall
2010-11-16 15:05 ` [PATCH v3 5/9] xfsrestore: cleanup node allocation wkendall
2010-11-16 15:05 ` [PATCH v3 6/9] xfsrestore: fix node table setup wkendall
2010-11-16 15:05 ` [PATCH v3 7/9] xfsrestore: make node lookup more efficient wkendall
2010-11-16 19:20   ` Alex Elder
2010-11-16 15:05 ` [PATCH v3 8/9] xfsrestore: remove nix_t wkendall
2010-11-17  9:37   ` Christoph Hellwig
2010-11-16 15:05 ` [PATCH v3 9/9] xfsrestore: check for compatible xfsrestore wkendall
2010-11-17  9:38   ` Christoph Hellwig
2010-11-17 15:31     ` Bill Kendall
2010-11-23 13:44       ` Christoph Hellwig
2010-11-16 19:21 ` [PATCH v3 0/9] xfsrestore dirent limitations and scaling issues Alex Elder
