[PATCH v3, 04/16] xfsprogs: metadump: adjust rather than start over when invalid byte found

* [PATCH v3, 04/16] xfsprogs: metadump: adjust rather than start over when invalid byte found
@ 2011-02-18 21:21 Alex Elder
  2011-02-24  1:37 ` Dave Chinner
  0 siblings, 1 reply; 2+ messages in thread
From: Alex Elder @ 2011-02-18 21:21 UTC (permalink / raw)
  To: xfs

The last 5 bytes of a name generated by generate_obfuscated_name()
can be selected such that they (along with all of the preceding
characters in the name) produce any desired value when hashed.

They are selected based on how their value affects the outcome of
the hash calculation for the obfuscated name.  Each byte is XOR'd
into the hash at a certain position.  The portion of the hash
affected by each of these last five bytes can be seen visually below
(where "last-3" means the byte 3 positions before the last byte in
the name):

	+-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
hash:	| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
	+-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
       last-4 ->|           |<-- last-2 --->|           |<--- last ---->|
              |<-- last-3 --->|           |<-- last-1 --->|     |<- last-4

(Note that byte (last-4) wraps around.  The previous patch in this
series eliminated the effect of the upper 4 bits of that byte by
forcing them to be all be 0 bits.)

Using the (XOR) difference between the hash computed for the
beginning of the obfuscated name and the hash from the original
name, we directly determine the required final five bytes to make
the hashes for the two complete names match.  The lower byte (bits
0-7) of that difference is used for the last character in the
obfuscated name, bits 7-14 for the second-to-last, and so on.

So we start with the difference between the hash from the complete
original name and the hash (so far) for a random string constituting
the first part of the obfuscated name.  We extract five sets of 8
bits from the result at the positions indicated above, and those
8-bit values will become the final five bytes of the obfuscated
name.  By assuming (or forcing) the top bit of each of these
extracted values to be 0 (by masking off the top bit), we can ignore
the overlapping portions when determining the bytes to use.


It's possible for this process to produce characters ('\0' and '/')
that are not allowed in valid names.  If this occurs, the existing
code abandons the current obfuscated name and starts again from the
beginning.  But there exist cases where this can lead to a
never-ending loop.

Arkadiusz Miśkiewicz encountered just such a name, "R\323\257NE".
That name produces hash value 0x3a4be740, which requires that the
obfuscated name uses '/' at position last-2.  The current algorithm
starts over, but since there are no random characters in this
length-5 obfuscated name, no other possibility will be found, and
the process repeats forever.


This change modifies the algorithm used so that if a unallowed
character arises, we flip a bit in that character, along with
another "matching" bit in another (overlapping) character such that
the resulting hash is unchanged.  The two unallowed characters in a
name are '\0' (0x00) and '/' (0x2f), and flipping any one bit in
either of those characters results in an allowed character.


So, starting with the first of these last 5 bytes (last-4), if its
"normal" value is one of the unallowed characters, we flip its low
bit and arrange to flip the high bit of its successor byte.  The
remaining bytes are treated similarly.

The very last byte has a little different treatment.  We can flip
its low bit, but it has no successor byte per se.  Its effect on
the hash does, however overlap the upper four bits from byte
(last-4).  We can therefore flip the corresponding bit in that (at
position 0x10).


There is one more case to consider.  It's possible in that last
case that by flipping a bit in byte (last-4), we have converted
that byte to one that's not allowed.  It turns out this won't ever
happen, because we know that byte was initially assigned a value
with its upper four bits clear.  Flipping the bit at 0x10 cannot
therefore produce either 0x00 or 0x2f, so we don't need to treat
this case.


With these changes to the name generation algorithm, we avoid
any of the cases in which no alternate name can be found without
using an illegal character.  We also avoid all looping due to bad
characters.

Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Alex Elder <aelder@sgi.com>

The only notable update since the last version posted is that the
second bit-flip in the last byte is no longer done (because I
realized it was not necessary).

---
 db/metadump.c |   75 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 59 insertions(+), 16 deletions(-)

Index: b/db/metadump.c
===================================================================

--- a/db/metadump.c
+++ b/db/metadump.c
@@ -474,6 +474,8 @@ generate_obfuscated_name(
 	do {
 		dup = 0;
 		for (;;) {
+			uchar_t	high_bit;
+
 		    	/*
 			 * The beginning of the obfuscated name can
 			 * be pretty much anything, so fill it in
@@ -490,25 +492,66 @@ generate_obfuscated_name(
 			 * Compute which five bytes need to be used
 			 * at the end of the name so the hash of the
 			 * obfuscated name is the same as the hash
-			 * of the original.
+			 * of the original.  If any result in an
+			 * invalid character, flip a bit and arrange
+			 * for a corresponding bit in a neighboring
+			 * byte to be flipped as well.  For the last
+			 * byte, the "neighbor" to change is the
+			 * first byte we're computing here.
 			 */
 			newhash = rol32(newhash, 3) ^ hash;
 
-			newp[namelen - 5] = (newhash >> 28) & 0x7f;
-			if (is_invalid_char(newp[namelen - 5]))
-				continue;
-			newp[namelen - 4] = (newhash >> 21) & 0x7f;
-			if (is_invalid_char(newp[namelen - 4]))
-				continue;
-			newp[namelen - 3] = (newhash >> 14) & 0x7f;
-			if (is_invalid_char(newp[namelen - 3]))
-				continue;
-			newp[namelen - 2] = (newhash >> 7) & 0x7f;
-			if (is_invalid_char(newp[namelen - 2]))
-				continue;
-			newp[namelen - 1] = (newhash >> 0) & 0x7f;
-			if (is_invalid_char(newp[namelen - 1]))
-				continue;
+			high_bit = 0;
+
+			newp[namelen - 5] = ((newhash >> 28) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 5])) {
+				newp[namelen - 5] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 4] = ((newhash >> 21) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 4])) {
+				newp[namelen - 4] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 3] = ((newhash >> 14) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 3])) {
+				newp[namelen - 3] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 2] = ((newhash >> 7) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 2])) {
+				newp[namelen - 2] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 1] = ((newhash >> 0) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 1])) {
+				newp[namelen - 1] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			/*
+			 * If we flipped a bit on the last byte, we
+			 * need to fix up the first one we computed.
+			 *
+			 * That first byte had 0's in its upper four
+			 * bits (it's the result of shifting a
+			 * 32-bit unsigned value Right by 28 bits),
+			 * so we don't need to worry about it
+			 * becoming invalid as a result.
+			 */
+			if (high_bit) {
+			    	newp[namelen - 5] ^= 0x10;
+				ASSERT(!is_invalid_char(newp[namelen - 5]));
+			}
 			break;
 		}
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 2+ messages in thread