From: roberto.tyley@gmail.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, Roberto Tyley <roberto.tyley@guardian.co.uk>
Subject: [PATCH] Tolerate zlib deflation with window size < 32Kb
Date: Wed, 3 Aug 2011 23:32:10 +0100 [thread overview]
Message-ID: <1312410730-12261-1-git-send-email-roberto.tyley@gmail.com> (raw)
From: Roberto Tyley <roberto.tyley@guardian.co.uk>
Git currently reports loose objects as 'corrupt' if they've been
deflated using a window size less than 32Kb, because the
experimental_loose_object() function doesn't recognise the header
byte as a zlib header. This patch makes the function tolerant of
all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice
it's accuracy in distingushing the standard loose-object format
from the experimental (now abandoned) format.
On memory constrained systems zlib may use a much smaller window
size - working on Agit, I found that Android uses a 4KB window;
giving a header byte of 0x48, not 0x78. Consequently all loose
objects generated appear 'corrupt', which is why Agit is a read-only
Git client at this time - I don't want my client to generate Git
repos that other clients treat as broken :(
This patch makes Git tolerant of different deflate settings - it
might appear that it changes experimental_loose_object() to the point
where it could incorrectly identify the experimental format as the
standard one, but the two criteria (bitmask & checksum) can only
give a false result for an experimental object where both of the
following are true:
1) object size is exactly 8 bytes when uncompressed (bitmask)
2) [single-byte in-pack git type&size header] * 256
+ [1st byte of the following zlib header] % 31 = 0 (checksum)
As it happens, for all possible combinations of valid object type
(1-4) and window bits (0-7), the only time when the checksum will be
divisible by 31 is for 0x1838 - ie object type *1*, a Commit - which,
due the fields all Commit objects must contain, could never be as
small as 8 bytes in size.
Given this, the combination of the two criteria (bitmask & checksum)
always correctly determines the buffer format, and is more tolerant
than the previous version.
The alternative to this patch is simply removing support for the
experimental format, which I am also totally cool with.
References:
Android uses a 4KB window for deflation:
http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb2583409c28;hb=refs/heads/gingerbread#l53
Code snippet searching for false positives with the zlib checksum:
https://gist.github.com/1118177
Signed-off-by: Roberto Tyley <roberto.tyley@guardian.co.uk>
---
sha1_file.c | 32 ++++++++++++++++++++++++++------
1 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/sha1_file.c b/sha1_file.c
index 89d7e5e..2083e8b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1217,14 +1217,34 @@ static int experimental_loose_object(unsigned char *map)
unsigned int word;
/*
- * Is it a zlib-compressed buffer? If so, the first byte
- * must be 0x78 (15-bit window size, deflated), and the
- * first 16-bit word is evenly divisible by 31. If so,
- * we are looking at the official format, not the experimental
- * one.
+ * We must determine if the buffer contains the standard
+ * zlib-deflated stream or the experimental format based
+ * on the in-pack object format. Compare the header byte
+ * for each format:
+ *
+ * RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
+ * Experimental pack-based : Stttssss : ttt = 1,2,3,4
+ *
+ * If bit 7 is clear and bits 0-3 equal 8, the buffer MUST be
+ * in standard loose-object format, UNLESS it is a Git-pack
+ * format object *exactly* 8 bytes in size when inflated.
+ *
+ * However, RFC1950 also specifies that the 1st 16-bit word
+ * must be divisible by 31 - this checksum tells us our buffer
+ * is in the standard format, giving a false positive only if
+ * the 1st word of the Git-pack format object happens to be
+ * divisible by 31, ie:
+ * ((byte0 * 256) + byte1) % 31 = 0
+ * => 0ttt10000www1000 % 31 = 0
+ *
+ * As it happens, this case can only arise for www=3 & ttt=1
+ * - ie, a Commit object, which would have to be 8 bytes in
+ * size. As no Commit can be that small, we find that the
+ * combination of these two criteria (bitmask & checksum)
+ * can always correctly determine the buffer format.
*/
word = (map[0] << 8) + map[1];
- if (map[0] == 0x78 && !(word % 31))
+ if ((map[0] & 0x88) == 0x08 && !(word % 31))
return 0;
else
return 1;
--
1.7.4.1
next reply other threads:[~2011-08-03 22:32 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-08-03 22:32 roberto.tyley [this message]
2011-08-03 23:56 ` [PATCH] Tolerate zlib deflation with window size < 32Kb Junio C Hamano
[not found] ` <CAFY1edZyO7oYDi+tV2mxbhBHY_cf2F0bD7+KF9rxmKYygSFAjA@mail.gmail.com>
2011-08-04 7:40 ` Roberto Tyley
2011-08-04 7:45 ` Sverre Rabbelier
2011-08-04 17:11 ` Junio C Hamano
2011-08-07 18:46 ` [PATCH v2] " roberto.tyley
2011-08-08 19:23 ` Junio C Hamano
2011-08-08 20:07 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1312410730-12261-1-git-send-email-roberto.tyley@gmail.com \
--to=roberto.tyley@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=roberto.tyley@guardian.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).