public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3, 04/16] xfsprogs: metadump: adjust rather than start over when invalid byte found
@ 2011-02-18 21:21 Alex Elder
  2011-02-24  1:37 ` Dave Chinner
  0 siblings, 1 reply; 2+ messages in thread
From: Alex Elder @ 2011-02-18 21:21 UTC (permalink / raw)
  To: xfs

The last 5 bytes of a name generated by generate_obfuscated_name()
can be selected such that they (along with all of the preceding
characters in the name) produce any desired value when hashed.

They are selected based on how their value affects the outcome of
the hash calculation for the obfuscated name.  Each byte is XOR'd
into the hash at a certain position.  The portion of the hash
affected by each of these last five bytes can be seen visually below
(where "last-3" means the byte 3 positions before the last byte in
the name):

	+-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
hash:	| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
	+-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
       last-4 ->|           |<-- last-2 --->|           |<--- last ---->|
              |<-- last-3 --->|           |<-- last-1 --->|     |<- last-4

(Note that byte (last-4) wraps around.  The previous patch in this
series eliminated the effect of the upper 4 bits of that byte by
forcing them to be all be 0 bits.)

Using the (XOR) difference between the hash computed for the
beginning of the obfuscated name and the hash from the original
name, we directly determine the required final five bytes to make
the hashes for the two complete names match.  The lower byte (bits
0-7) of that difference is used for the last character in the
obfuscated name, bits 7-14 for the second-to-last, and so on.

So we start with the difference between the hash from the complete
original name and the hash (so far) for a random string constituting
the first part of the obfuscated name.  We extract five sets of 8
bits from the result at the positions indicated above, and those
8-bit values will become the final five bytes of the obfuscated
name.  By assuming (or forcing) the top bit of each of these
extracted values to be 0 (by masking off the top bit), we can ignore
the overlapping portions when determining the bytes to use.


It's possible for this process to produce characters ('\0' and '/')
that are not allowed in valid names.  If this occurs, the existing
code abandons the current obfuscated name and starts again from the
beginning.  But there exist cases where this can lead to a
never-ending loop.

Arkadiusz Miśkiewicz encountered just such a name, "R\323\257NE".
That name produces hash value 0x3a4be740, which requires that the
obfuscated name uses '/' at position last-2.  The current algorithm
starts over, but since there are no random characters in this
length-5 obfuscated name, no other possibility will be found, and
the process repeats forever.


This change modifies the algorithm used so that if a unallowed
character arises, we flip a bit in that character, along with
another "matching" bit in another (overlapping) character such that
the resulting hash is unchanged.  The two unallowed characters in a
name are '\0' (0x00) and '/' (0x2f), and flipping any one bit in
either of those characters results in an allowed character.


So, starting with the first of these last 5 bytes (last-4), if its
"normal" value is one of the unallowed characters, we flip its low
bit and arrange to flip the high bit of its successor byte.  The
remaining bytes are treated similarly.

The very last byte has a little different treatment.  We can flip
its low bit, but it has no successor byte per se.  Its effect on
the hash does, however overlap the upper four bits from byte
(last-4).  We can therefore flip the corresponding bit in that (at
position 0x10).


There is one more case to consider.  It's possible in that last
case that by flipping a bit in byte (last-4), we have converted
that byte to one that's not allowed.  It turns out this won't ever
happen, because we know that byte was initially assigned a value
with its upper four bits clear.  Flipping the bit at 0x10 cannot
therefore produce either 0x00 or 0x2f, so we don't need to treat
this case.


With these changes to the name generation algorithm, we avoid
any of the cases in which no alternate name can be found without
using an illegal character.  We also avoid all looping due to bad
characters.

Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
Signed-off-by: Alex Elder <aelder@sgi.com>

The only notable update since the last version posted is that the
second bit-flip in the last byte is no longer done (because I
realized it was not necessary).

---
 db/metadump.c |   75 +++++++++++++++++++++++++++++++++++++++++++++-------------
 1 file changed, 59 insertions(+), 16 deletions(-)

Index: b/db/metadump.c
===================================================================
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -474,6 +474,8 @@ generate_obfuscated_name(
 	do {
 		dup = 0;
 		for (;;) {
+			uchar_t	high_bit;
+
 		    	/*
 			 * The beginning of the obfuscated name can
 			 * be pretty much anything, so fill it in
@@ -490,25 +492,66 @@ generate_obfuscated_name(
 			 * Compute which five bytes need to be used
 			 * at the end of the name so the hash of the
 			 * obfuscated name is the same as the hash
-			 * of the original.
+			 * of the original.  If any result in an
+			 * invalid character, flip a bit and arrange
+			 * for a corresponding bit in a neighboring
+			 * byte to be flipped as well.  For the last
+			 * byte, the "neighbor" to change is the
+			 * first byte we're computing here.
 			 */
 			newhash = rol32(newhash, 3) ^ hash;
 
-			newp[namelen - 5] = (newhash >> 28) & 0x7f;
-			if (is_invalid_char(newp[namelen - 5]))
-				continue;
-			newp[namelen - 4] = (newhash >> 21) & 0x7f;
-			if (is_invalid_char(newp[namelen - 4]))
-				continue;
-			newp[namelen - 3] = (newhash >> 14) & 0x7f;
-			if (is_invalid_char(newp[namelen - 3]))
-				continue;
-			newp[namelen - 2] = (newhash >> 7) & 0x7f;
-			if (is_invalid_char(newp[namelen - 2]))
-				continue;
-			newp[namelen - 1] = (newhash >> 0) & 0x7f;
-			if (is_invalid_char(newp[namelen - 1]))
-				continue;
+			high_bit = 0;
+
+			newp[namelen - 5] = ((newhash >> 28) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 5])) {
+				newp[namelen - 5] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 4] = ((newhash >> 21) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 4])) {
+				newp[namelen - 4] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 3] = ((newhash >> 14) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 3])) {
+				newp[namelen - 3] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 2] = ((newhash >> 7) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 2])) {
+				newp[namelen - 2] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			newp[namelen - 1] = ((newhash >> 0) & 0x7f) ^ high_bit;
+			if (is_invalid_char(newp[namelen - 1])) {
+				newp[namelen - 1] ^= 1;
+				high_bit = 0x80;
+			} else
+				high_bit = 0;
+
+			/*
+			 * If we flipped a bit on the last byte, we
+			 * need to fix up the first one we computed.
+			 *
+			 * That first byte had 0's in its upper four
+			 * bits (it's the result of shifting a
+			 * 32-bit unsigned value Right by 28 bits),
+			 * so we don't need to worry about it
+			 * becoming invalid as a result.
+			 */
+			if (high_bit) {
+			    	newp[namelen - 5] ^= 0x10;
+				ASSERT(!is_invalid_char(newp[namelen - 5]));
+			}
 			break;
 		}
 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: [PATCH v3, 04/16] xfsprogs: metadump: adjust rather than start over when invalid byte found
  2011-02-18 21:21 [PATCH v3, 04/16] xfsprogs: metadump: adjust rather than start over when invalid byte found Alex Elder
@ 2011-02-24  1:37 ` Dave Chinner
  0 siblings, 0 replies; 2+ messages in thread
From: Dave Chinner @ 2011-02-24  1:37 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Fri, Feb 18, 2011 at 03:21:01PM -0600, Alex Elder wrote:
> The last 5 bytes of a name generated by generate_obfuscated_name()
> can be selected such that they (along with all of the preceding
> characters in the name) produce any desired value when hashed.
> 
> They are selected based on how their value affects the outcome of
> the hash calculation for the obfuscated name.  Each byte is XOR'd
> into the hash at a certain position.  The portion of the hash
> affected by each of these last five bytes can be seen visually below
> (where "last-3" means the byte 3 positions before the last byte in
> the name):
> 
> 	+-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
> hash:	| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
> 	+-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-|-+-+-+-+-+-+-+-+
>        last-4 ->|           |<-- last-2 --->|           |<--- last ---->|
>               |<-- last-3 --->|           |<-- last-1 --->|     |<- last-4
> 
> (Note that byte (last-4) wraps around.  The previous patch in this
> series eliminated the effect of the upper 4 bits of that byte by
> forcing them to be all be 0 bits.)
> 
> Using the (XOR) difference between the hash computed for the
> beginning of the obfuscated name and the hash from the original
> name, we directly determine the required final five bytes to make
> the hashes for the two complete names match.  The lower byte (bits
> 0-7) of that difference is used for the last character in the
> obfuscated name, bits 7-14 for the second-to-last, and so on.
> 
> So we start with the difference between the hash from the complete
> original name and the hash (so far) for a random string constituting
> the first part of the obfuscated name.  We extract five sets of 8
> bits from the result at the positions indicated above, and those
> 8-bit values will become the final five bytes of the obfuscated
> name.  By assuming (or forcing) the top bit of each of these
> extracted values to be 0 (by masking off the top bit), we can ignore
> the overlapping portions when determining the bytes to use.
> 
> 
> It's possible for this process to produce characters ('\0' and '/')
> that are not allowed in valid names.  If this occurs, the existing
> code abandons the current obfuscated name and starts again from the
> beginning.  But there exist cases where this can lead to a
> never-ending loop.
> 
> Arkadiusz Miśkiewicz encountered just such a name, "R\323\257NE".
> That name produces hash value 0x3a4be740, which requires that the
> obfuscated name uses '/' at position last-2.  The current algorithm
> starts over, but since there are no random characters in this
> length-5 obfuscated name, no other possibility will be found, and
> the process repeats forever.
> 
> 
> This change modifies the algorithm used so that if a unallowed
> character arises, we flip a bit in that character, along with
> another "matching" bit in another (overlapping) character such that
> the resulting hash is unchanged.  The two unallowed characters in a
> name are '\0' (0x00) and '/' (0x2f), and flipping any one bit in
> either of those characters results in an allowed character.
> 
> 
> So, starting with the first of these last 5 bytes (last-4), if its
> "normal" value is one of the unallowed characters, we flip its low
> bit and arrange to flip the high bit of its successor byte.  The
> remaining bytes are treated similarly.
> 
> The very last byte has a little different treatment.  We can flip
> its low bit, but it has no successor byte per se.  Its effect on
> the hash does, however overlap the upper four bits from byte
> (last-4).  We can therefore flip the corresponding bit in that (at
> position 0x10).
> 
> 
> There is one more case to consider.  It's possible in that last
> case that by flipping a bit in byte (last-4), we have converted
> that byte to one that's not allowed.  It turns out this won't ever
> happen, because we know that byte was initially assigned a value
> with its upper four bits clear.  Flipping the bit at 0x10 cannot
> therefore produce either 0x00 or 0x2f, so we don't need to treat
> this case.
> 
> 
> With these changes to the name generation algorithm, we avoid
> any of the cases in which no alternate name can be found without
> using an illegal character.  We also avoid all looping due to bad
> characters.
> 
> Reported-by: Arkadiusz Miśkiewicz <arekm@maven.pl>
> Signed-off-by: Alex Elder <aelder@sgi.com>

Reviewed-by: Dave Chinner <dchinner@redhat.com>

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2011-02-24  1:34 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-18 21:21 [PATCH v3, 04/16] xfsprogs: metadump: adjust rather than start over when invalid byte found Alex Elder
2011-02-24  1:37 ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox