public inbox for linux-xfs@vger.kernel.org
* [PATCH v3, 09/16] xfsprogs: metadump: don't loop on too many dups
@ 2011-02-18 21:21 Alex Elder
  2011-02-24  1:55 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Alex Elder @ 2011-02-18 21:21 UTC
  To: xfs

Don't just loop indefinitely when an obfuscated name comes up as a
duplicate.  Count the number of times we've found a duplicate and if
it gets excessive despite choosing names at random, just give up
and use the original name without obfuscation.

Technically, a typical 5-character name has 255 other names that can
have the same hash value.  But the algorithm doesn't hit all
possible names (far from it) so duplicates are still possible.
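
To illustrate why same-hash names exist at all, here is a minimal
sketch in C.  It assumes the per-byte update visible in the diff
below (newhash = *newp ^ rol32(newhash, 7)); sketch_hash() and
flip_colliding_bits() are hypothetical names used for illustration
only, not functions in xfsprogs.  Under that update the
second-to-last byte ends up rotated left by 7, so its bit 0 lands on
the same hash bit as bit 7 of the last byte, and flipping both
leaves the hash unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned int n)
{
	return (x << n) | (x >> (32 - n));
}

/*
 * Assumption: the hash is accumulated one byte at a time as
 * h = c ^ rol32(h, 7), as in the obfuscation loop in the diff.
 */
static uint32_t sketch_hash(const unsigned char *name, size_t len)
{
	uint32_t	h = 0;
	size_t		i;

	for (i = 0; i < len; i++)
		h = name[i] ^ rol32(h, 7);
	return h;
}

/*
 * Turn a name into a different name with the same sketch_hash().
 * Bit 0 of the second-to-last byte and bit 7 of the last byte both
 * contribute to bit 7 of the hash, so flipping both cancels out.
 */
static void flip_colliding_bits(unsigned char *name, size_t len)
{
	assert(len >= 2);
	name[len - 2] ^= 0x01;
	name[len - 1] ^= 0x80;
}
```

Applying flip_colliding_bits() to the 5 bytes of "hello" changes the
last two bytes without changing sketch_hash().  The resulting bytes
are not necessarily usable filename characters, which may be part of
why the obfuscation cannot draw on every same-hash name.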

Signed-off-by: Alex Elder <aelder@sgi.com>

The only change worth mentioning from the last version posted is
that the duplicate count is now updated inside the loop that
searches the name table.
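
The give-up-after-too-many-duplicates idea can be sketched on its
own as follows.  This is a simplified illustration, not the code in
db/metadump.c: obfuscate_bounded(), the gen_fn/in_use_fn callbacks,
SKETCH_NAME_MAX and the demo helpers are all hypothetical.  Note
that the sketch tracks whether the latest attempt collided
separately from the running duplicate count, so a unique candidate
ends the loop immediately.

```c
#include <stddef.h>
#include <string.h>

#define DUP_MAX		5	/* same limit as the patch */
#define SKETCH_NAME_MAX	255	/* hypothetical candidate buffer size */

/* Hypothetical callbacks: produce a candidate, and test if it is taken. */
typedef void (*gen_fn)(unsigned char *out, size_t len);
typedef int (*in_use_fn)(const unsigned char *candidate, size_t len);

/*
 * Try up to DUP_MAX candidates.  Returns 1 and overwrites name if a
 * unique candidate was found; returns 0 and leaves name untouched
 * if every attempt was a duplicate (or the name is too long).
 */
static int
obfuscate_bounded(unsigned char *name, size_t len, gen_fn gen,
		  in_use_fn in_use)
{
	unsigned char	candidate[SKETCH_NAME_MAX];
	int		dup = 0;
	int		collided;

	if (len > sizeof(candidate))
		return 0;

	do {
		gen(candidate, len);
		collided = in_use(candidate, len);
		if (collided)
			dup++;
	} while (collided && dup < DUP_MAX);

	if (collided)
		return 0;	/* too many duplicates: keep the original */

	memcpy(name, candidate, len);
	return 1;
}

/* Demo helpers, for exercising the sketch only. */
static int sketch_attempts;

static void gen_all_x(unsigned char *out, size_t len)
{
	sketch_attempts++;
	memset(out, 'x', len);
}

static int always_in_use(const unsigned char *c, size_t len)
{
	(void)c;
	(void)len;
	return 1;
}

static int never_in_use(const unsigned char *c, size_t len)
{
	(void)c;
	(void)len;
	return 0;
}
```

With always_in_use() the sketch stops after DUP_MAX attempts and
leaves the name alone; with never_in_use() it accepts the first
candidate.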

---
 db/metadump.c |   32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

Index: b/db/metadump.c
===================================================================
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -29,6 +29,14 @@
 
 #define DEFAULT_MAX_EXT_SIZE	1000
 
+/*
+ * It's possible that multiple files in a directory (or attributes
+ * in a file) produce the same obfuscated name.  If that happens, we
+ * try to create another one.  After several rounds of this though,
+ * we just give up and leave the original name as-is.
+ */
+#define	DUP_MAX		5	/* Max duplicates before we give up */
+
 /* copy all metadata structures to/from a file */
 
 static int	metadump_f(int argc, char **argv);
@@ -437,8 +445,9 @@ generate_obfuscated_name(
 {
 	xfs_dahash_t		hash;
 	name_ent_t		*p;
-	int			dup;
+	int			dup = 0;
 	uchar_t			newname[NAME_MAX];
+	uchar_t			*newp;
 
 	/*
 	 * Our obfuscation algorithm requires at least 5-character
@@ -471,19 +480,17 @@ generate_obfuscated_name(
 	do {
 		int		i;
 		xfs_dahash_t	newhash = 0;
-		uchar_t		*newp = &newname[0];
 		uchar_t		*first;
 		uchar_t		high_bit;
 		int		shift;
 
-		dup = 0;
-
 		/*
 		 * The beginning of the obfuscated name can be
 		 * pretty much anything, so fill it in with random
 		 * characters.  Accumulate its new hash value as we
 		 * go.
 		 */
+		newp = &newname[0];
 		for (i = 0; i < namelen - 5; i++) {
 			*newp = random_filename_char();
 			newhash = *newp ^ rol32(newhash, 7);
@@ -531,14 +538,22 @@ generate_obfuscated_name(
 
 		ASSERT(libxfs_da_hashname(newname, namelen) == hash);
 
+		/*
+		 * Search the name table to be sure we don't produce
+		 * a name that's already been used.
+		 */
 		for (p = nametable[hash % NAME_TABLE_SIZE]; p; p = p->next) {
 			if (p->hash == hash && p->namelen == namelen &&
 					!memcmp(p->name, newname, namelen)) {
-				dup = 1;
+				dup++;
 				break;
 			}
 		}
-	} while (dup);
+	} while (dup && dup < DUP_MAX);
+
+	/* Use the original name if we got too many dups. */
+
+	newp = dup < DUP_MAX ? newname : name;
 
 	/* Create an entry for the name in the name table */
 
@@ -547,7 +562,7 @@ generate_obfuscated_name(
 		return;
 
 	p->namelen = namelen;
-	memcpy(p->name, newname, namelen);
+	memcpy(p->name, newp, namelen);
 	p->hash = hash;
 	p->next = nametable[hash % NAME_TABLE_SIZE];
 
@@ -555,7 +570,8 @@ generate_obfuscated_name(
 
 	/* Update the caller's copy with the obfuscated name */
 
-	memcpy(name, newname, namelen);
+	if (newp != name)
+		memcpy(name, newp, namelen);
 }
 
 static void

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH v3, 09/16] xfsprogs: metadump: don't loop on too many dups
  2011-02-18 21:21 [PATCH v3, 09/16] xfsprogs: metadump: don't loop on too many dups Alex Elder
@ 2011-02-24  1:55 ` Dave Chinner
  2011-02-25 18:13   ` [PATCH v4, " Alex Elder
  0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2011-02-24  1:55 UTC
  To: Alex Elder; +Cc: xfs

On Fri, Feb 18, 2011 at 03:21:01PM -0600, Alex Elder wrote:
> Don't just loop indefinitely when an obfuscated name comes up as a
> duplicate.  Count the number of times we've found a duplicate and if
> it gets excessive despite choosing names at random, just give up
> and use the original name without obfuscation.
> 
> Technically, a typical 5-character name has 255 other names that can
> have the same hash value.  But the algorithm doesn't hit all
> possible names (far from it) so duplicates are still possible.
> 
> Signed-off-by: Alex Elder <aelder@sgi.com>
> 
> The only change worth mentioning from the last version posted is
> that the duplicate count is now updated inside the loop that
> searches the name table.

The only thing that I'd suggest here is that we emit a warning to
indicate that we haven't obfuscated a name due to excessive
duplicates being created. If the user has asked for obfuscation, we
should at least inform them of failures to do so for filenames that
should be obfuscated....

Otherwise,

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com


* [PATCH v4, 09/16] xfsprogs: metadump: don't loop on too many dups
  2011-02-24  1:55 ` Dave Chinner
@ 2011-02-25 18:13   ` Alex Elder
  2011-03-03  5:07     ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread
From: Alex Elder @ 2011-02-25 18:13 UTC
  To: Dave Chinner; +Cc: xfs

Don't just loop indefinitely when an obfuscated name comes up as a
duplicate.  Count the number of times we've found a duplicate and if
it gets excessive despite choosing names at random, just give up
and use the original name without obfuscation.

Technically, a typical 5-character name has 255 other names that can
have the same hash value.  But the algorithm doesn't hit all
possible names (far from it) so duplicates are still possible.

Signed-off-by: Alex Elder <aelder@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

Updates (v4):
- Rearranged things a bit so that if too many duplicates are
  encountered, a warning gets emitted.
    
Dave already signed off on it but the update was different enough I
thought I should post it once more.

---
 db/metadump.c |   42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

Index: b/db/metadump.c
===================================================================
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -29,6 +29,14 @@
 
 #define DEFAULT_MAX_EXT_SIZE	1000
 
+/*
+ * It's possible that multiple files in a directory (or attributes
+ * in a file) produce the same obfuscated name.  If that happens, we
+ * try to create another one.  After several rounds of this though,
+ * we just give up and leave the original name as-is.
+ */
+#define	DUP_MAX		5	/* Max duplicates before we give up */
+
 /* copy all metadata structures to/from a file */
 
 static int	metadump_f(int argc, char **argv);
@@ -444,8 +452,9 @@ generate_obfuscated_name(
 {
 	xfs_dahash_t		hash;
 	name_ent_t		*p;
-	int			dup;
+	int			dup = 0;
 	uchar_t			newname[NAME_MAX];
+	uchar_t			*newp;
 
 	/*
 	 * Our obfuscation algorithm requires at least 5-character
@@ -481,19 +490,17 @@ generate_obfuscated_name(
 	do {
 		int		i;
 		xfs_dahash_t	newhash = 0;
-		uchar_t		*newp = &newname[0];
 		uchar_t		*first;
 		uchar_t		high_bit;
 		int		shift;
 
-		dup = 0;
-
 		/*
 		 * The beginning of the obfuscated name can be
 		 * pretty much anything, so fill it in with random
 		 * characters.  Accumulate its new hash value as we
 		 * go.
 		 */
+		newp = &newname[0];
 		for (i = 0; i < namelen - 5; i++) {
 			*newp = random_filename_char();
 			newhash = *newp ^ rol32(newhash, 7);
@@ -541,14 +548,31 @@ generate_obfuscated_name(
 
 		ASSERT(libxfs_da_hashname(newname, namelen) == hash);
 
+		/*
+		 * Search the name table to be sure we don't produce
+		 * a name that's already been used.
+		 */
 		for (p = nametable[hash % NAME_TABLE_SIZE]; p; p = p->next) {
 			if (p->hash == hash && p->namelen == namelen &&
 					!memcmp(p->name, newname, namelen)) {
-				dup = 1;
+				dup++;
 				break;
 			}
 		}
-	} while (dup);
+	} while (dup && dup < DUP_MAX);
+
+	/*
+	 * Update the caller's copy with the obfuscated name.  Use
+	 * the original name if we got too many duplicates--and if
+	 * so, issue a warning.
+	 */
+	if (dup < DUP_MAX)
+		memcpy(name, newname, namelen);
+	else
+		print_warning("duplicate name for inode %llu "
+				"in dir inode %llu\n",
+			(unsigned long long) ino,
+			(unsigned long long) cur_ino);
 
 	/* Create an entry for the name in the name table */
 
@@ -557,15 +581,11 @@ generate_obfuscated_name(
 		return;
 
 	p->namelen = namelen;
-	memcpy(p->name, newname, namelen);
+	memcpy(p->name, name, namelen);
 	p->hash = hash;
 	p->next = nametable[hash % NAME_TABLE_SIZE];
 
 	nametable[hash % NAME_TABLE_SIZE] = p;
-
-	/* Update the caller's copy with the obfuscated name */
-
-	memcpy(name, newname, namelen);
 }
 
 static void



* Re: [PATCH v4, 09/16] xfsprogs: metadump: don't loop on too many dups
  2011-02-25 18:13   ` [PATCH v4, " Alex Elder
@ 2011-03-03  5:07     ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread
From: Dave Chinner @ 2011-03-03  5:07 UTC
  To: Alex Elder; +Cc: xfs

On Fri, Feb 25, 2011 at 12:13:44PM -0600, Alex Elder wrote:
> Don't just loop indefinitely when an obfuscated name comes up as a
> duplicate.  Count the number of times we've found a duplicate and if
> it gets excessive despite choosing names at random, just give up
> and use the original name without obfuscation.
> 
> Technically, a typical 5-character name has 255 other names that can
> have the same hash value.  But the algorithm doesn't hit all
> possible names (far from it) so duplicates are still possible.
> 
> Signed-off-by: Alex Elder <aelder@sgi.com>
> Reviewed-by: Dave Chinner <dchinner@redhat.com>
> 
> Updates (v4):
> - Rearranged things a bit so that if too many duplicates are
>   encountered, a warning gets emitted.
>     
> Dave already signed off on it but the update was different enough I
> thought I should post it once more.

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com
