* corrupt object memory allocation error
@ 2013-11-20 20:33 Joey Hess
2013-11-20 21:33 ` Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Joey Hess @ 2013-11-20 20:33 UTC (permalink / raw)
To: git
[-- Attachment #1.1: Type: text/plain, Size: 618 bytes --]
I've got a git repository of < 2 mb, where git wants to
allocate a rather insane amount of memory:
>git fsck
Checking object directories: 100% (256/256), done.
fatal: Out of memory, malloc failed (tried to allocate 124865231165 bytes)
> git show 11644b5a075dc1425e01fbba51c045cea2d0c408
fatal: Out of memory, malloc failed (tried to allocate 124865231165 bytes)
The problem seems to be the attached object file, which has gotten
corrupted, presumably in the header that git reads to see how large it
is. Thought I'd report this in case there is some easy way to
add a sanity check.
--
see shy jo
[-- Attachment #1.2: 644b5a075dc1425e01fbba51c045cea2d0c408 --]
[-- Type: application/octet-stream, Size: 7017 bytes --]
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: corrupt object memory allocation error
2013-11-20 20:33 corrupt object memory allocation error Joey Hess
@ 2013-11-20 21:33 ` Jeff King
2013-11-20 22:28 ` Joey Hess
0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2013-11-20 21:33 UTC (permalink / raw)
To: Joey Hess; +Cc: git
On Wed, Nov 20, 2013 at 04:33:50PM -0400, Joey Hess wrote:
> I've got a git repository of < 2 mb, where git wants to
> allocate a rather insane amount of memory:
>
> >git fsck
> Checking object directories: 100% (256/256), done.
> fatal: Out of memory, malloc failed (tried to allocate 124865231165 bytes)
>
> > git show 11644b5a075dc1425e01fbba51c045cea2d0c408
> fatal: Out of memory, malloc failed (tried to allocate 124865231165 bytes)
>
> The problem seems to be the attached object file, which has gotten
> corrupted, presumably in the header that git reads to see how large it
> is. Thought I'd report this in case there is some easy way to
> add a sanity check.
Definitely a corrupt object. The start is not a valid zlib header, so we
guess that it is an "experimental loose object". This is a format that
git wrote for very short period as a performance experiment; it didn't
pan out and we no longer write it.
The loose object format contains the (purported) object size outside of
the checksum'd zlib data (whereas the normal format has a human-readable
header that gets zlib'd). Your corrupted bytes end up specifying a
ridiculously large size.
I wonder if it is time to drop reading support for the experimental
objects. It was never widely used, and was deprecated in v1.5.2 by
726f852 (deprecate the new loose object header format, 2007-05-09). That
would improve the case when the initial bytes of a loose object are
corrupted, because we would complain about the bogus zlib data before
trying to allocate the buffer.
The problem would still remain for packfiles, which use a similar
encoding, but I suspect it is less common there. For a single-byte
corruption, it is unlikely to be right in the length header. But for
absolute junk that is not git data at all, the first bytes are very
likely to be corrupted. In the pack case, we would notice early that it
does not look like a packfile; for the loose object, we have no such
header and proceed with the allocation.
As for your specific corruption, I can't make heads or tails of it. It
is not a single-bit error. The first two bytes of a loose object should
always be <0x78, 0x01>, which is the standard zlib deflate header. Your
bytes aren't even close, and decoding the rest with a corrupted zlib
header seems fruitless.
You don't happen to have another copy of the object (or of the data
contained in the object, such as the working tree file), do you? It
might be interesting to see a comparison of the bytes of the correct
data and your corruption.
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: corrupt object memory allocation error
2013-11-20 21:33 ` Jeff King
@ 2013-11-20 22:28 ` Joey Hess
2013-11-21 11:41 ` [PATCH] drop support for "experimental" loose objects Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Joey Hess @ 2013-11-20 22:28 UTC (permalink / raw)
To: Jeff King; +Cc: git
[-- Attachment #1: Type: text/plain, Size: 307 bytes --]
Jeff King wrote:
> As for your specific corruption, I can't make heads or tails of it. It
> is not a single-bit error.
Oh, I should have mentioned that I am generating corrupt git
repositories mechanically for testing. I think in this case it prepended
some garbage to an object.
--
see shy jo
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] drop support for "experimental" loose objects
2013-11-20 22:28 ` Joey Hess
@ 2013-11-21 11:41 ` Jeff King
2013-11-21 11:48 ` Jeff King
2013-11-21 16:04 ` Joey Hess
0 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2013-11-21 11:41 UTC (permalink / raw)
To: Joey Hess; +Cc: git
On Wed, Nov 20, 2013 at 06:28:06PM -0400, Joey Hess wrote:
> Jeff King wrote:
> > As for your specific corruption, I can't make heads or tails of it. It
> > is not a single-bit error.
>
> Oh, I should have mentioned that I am generating corrupt git
> repositories mechanically for testing. I think in this case it prepended
> some garbage to an object.
Ah. That explains a lot (yes, dropping the first 19 bytes of your
corrupted file recovers the object).
I still think we should probably do this, though:
-- >8 --
Subject: drop support for "experimental" loose objects
In git v1.4.3, we introduced a new loose object format that
encoded some object information outside of the zlib stream.
Ultimately the format was dropped in v1.5.3, but we kept the
reading side around to help people migrate objects. Each
time we open a loose object, we use a heuristic to check
whether it is in the normal loose format or the
experimental one.
This heuristic is robust in the face of valid data, but it
tends to treat corrupted or garbage data as an experimental
object. With the regular format, we would notice quickly
that zlib's crc does not check out and complain. With the
experimental object, we are likely to extract a nonsensical
object size and try to allocate a huge buffer, resulting in
xmalloc calling "die".
This latter behavior is much worse for two reasons. One,
git reports an allocation error when the real error is
corruption. And two, the program dies unconditionally, so
you cannot even run fsck (which would otherwise ignore the
broken object and keep going).
We could try to improve the heuristic to err on the side of
normal objects in the face of corruption, but there is
really little point. The experimental format is long-dead,
and was never enabled by default to begin with. We can
instead simply remove it. The only affected repository would
be one that explicitly set core.legacyheaders in 2007, and
then never repacked in the intervening 6 years.
Signed-off-by: Jeff King <peff@peff.net>
---
The test objects removed are all binary. Git seems to guess a few as
non-binary, though, because they don't contain any NULs, and includes
gross binary bytes in the patch below. In theory the mail's transfer
encoding will take care of this. We'll see, I guess. :)
sha1_file.c | 74 ---------------------
t/t1013-loose-object-format.sh | 66 ------------------
.../14/9cedb5c46929d18e0f118e9fa31927487af3b6 | Bin 117 -> 0 bytes
.../16/56f9233d999f61ef23ef390b9c71d75399f435 | Bin 17 -> 0 bytes
.../1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd | Bin 18 -> 0 bytes
.../25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99 | Bin 19 -> 0 bytes
.../2e/65efe2a145dda7ee51d1741299f848e5bf752e | Bin 10 -> 0 bytes
.../6b/aee0540ea990d9761a3eb9ab183003a71c3696 | Bin 181 -> 0 bytes
.../70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd | Bin 26 -> 0 bytes
.../76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb | 2 -
.../78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 | Bin 139 -> 0 bytes
.../7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8 | Bin 54 -> 0 bytes
.../85/df50785d62d3b05ab03d9cbf7e4a0b49449730 | Bin 13 -> 0 bytes
.../8d/4e360d6c70fbd72411991c02a09c442cf7a9fa | Bin 156 -> 0 bytes
.../95/b1625de3ba8b2214d1e0d0591138aea733f64f | Bin 252 -> 0 bytes
.../9a/e9e86b7bd6cb1472d9373702d8249973da0832 | Bin 11 -> 0 bytes
.../bd/15045f6ce8ff75747562173640456a394412c8 | Bin 34 -> 0 bytes
.../e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 | Bin 9 -> 0 bytes
.../f8/16d5255855ac160652ee5253b06cd8ee14165a | 1 -
19 files changed, 143 deletions(-)
delete mode 100755 t/t1013-loose-object-format.sh
delete mode 100644 t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6
delete mode 100644 t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435
delete mode 100644 t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd
delete mode 100644 t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
delete mode 100644 t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e
delete mode 100644 t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696
delete mode 100644 t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd
delete mode 100644 t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb
delete mode 100644 t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09
delete mode 100644 t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8
delete mode 100644 t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730
delete mode 100644 t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa
delete mode 100644 t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f
delete mode 100644 t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832
delete mode 100644 t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8
delete mode 100644 t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
delete mode 100644 t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a
diff --git a/sha1_file.c b/sha1_file.c
index 7dadd04..a72fcb6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1442,51 +1442,6 @@ void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
return map;
}
-/*
- * There used to be a second loose object header format which
- * was meant to mimic the in-pack format, allowing for direct
- * copy of the object data. This format turned up not to be
- * really worth it and we no longer write loose objects in that
- * format.
- */
-static int experimental_loose_object(unsigned char *map)
-{
- unsigned int word;
-
- /*
- * We must determine if the buffer contains the standard
- * zlib-deflated stream or the experimental format based
- * on the in-pack object format. Compare the header byte
- * for each format:
- *
- * RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
- * Experimental pack-based : Stttssss : ttt = 1,2,3,4
- *
- * If bit 7 is clear and bits 0-3 equal 8, the buffer MUST be
- * in standard loose-object format, UNLESS it is a Git-pack
- * format object *exactly* 8 bytes in size when inflated.
- *
- * However, RFC1950 also specifies that the 1st 16-bit word
- * must be divisible by 31 - this checksum tells us our buffer
- * is in the standard format, giving a false positive only if
- * the 1st word of the Git-pack format object happens to be
- * divisible by 31, ie:
- * ((byte0 * 256) + byte1) % 31 = 0
- * => 0ttt10000www1000 % 31 = 0
- *
- * As it happens, this case can only arise for www=3 & ttt=1
- * - ie, a Commit object, which would have to be 8 bytes in
- * size. As no Commit can be that small, we find that the
- * combination of these two criteria (bitmask & checksum)
- * can always correctly determine the buffer format.
- */
- word = (map[0] << 8) + map[1];
- if ((map[0] & 0x8F) == 0x08 && !(word % 31))
- return 0;
- else
- return 1;
-}
-
unsigned long unpack_object_header_buffer(const unsigned char *buf,
unsigned long len, enum object_type *type, unsigned long *sizep)
{
@@ -1514,14 +1469,6 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
{
- unsigned long size, used;
- static const char valid_loose_object_type[8] = {
- 0, /* OBJ_EXT */
- 1, 1, 1, 1, /* "commit", "tree", "blob", "tag" */
- 0, /* "delta" and others are invalid in a loose object */
- };
- enum object_type type;
-
/* Get the data stream */
memset(stream, 0, sizeof(*stream));
stream->next_in = map;
@@ -1529,27 +1476,6 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
stream->next_out = buffer;
stream->avail_out = bufsiz;
- if (experimental_loose_object(map)) {
- /*
- * The old experimental format we no longer produce;
- * we can still read it.
- */
- used = unpack_object_header_buffer(map, mapsize, &type, &size);
- if (!used || !valid_loose_object_type[type])
- return -1;
- map += used;
- mapsize -= used;
-
- /* Set up the stream for the rest.. */
- stream->next_in = map;
- stream->avail_in = mapsize;
- git_inflate_init(stream);
-
- /* And generate the fake traditional header */
- stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu",
- typename(type), size);
- return 0;
- }
git_inflate_init(stream);
return git_inflate(stream, 0);
}
diff --git a/t/t1013-loose-object-format.sh b/t/t1013-loose-object-format.sh
deleted file mode 100755
index fbf5f2f..0000000
--- a/t/t1013-loose-object-format.sh
+++ /dev/null
@@ -1,66 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 2011 Roberto Tyley
-#
-
-test_description='Correctly identify and parse loose object headers
-
-There are two file formats for loose objects - the original standard
-format, and the experimental format introduced with Git v1.4.3, later
-deprecated with v1.5.3. Although Git no longer writes the
-experimental format, objects in both formats must be read, with the
-format for a given file being determined by the header.
-
-Detecting file format based on header is not entirely trivial, not
-least because the first byte of a zlib-deflated stream will vary
-depending on how much memory was allocated for the deflation window
-buffer when the object was written out (for example 4KB on Android,
-rather that 32KB on a normal PC).
-
-The loose objects used as test vectors have been generated with the
-following Git versions:
-
-standard format: Git v1.7.4.1
-experimental format: Git v1.4.3 (legacyheaders=false)
-standard format, deflated with 4KB window size: Agit/JGit on Android
-'
-
-. ./test-lib.sh
-
-assert_blob_equals() {
- printf "%s" "$2" >expected &&
- git cat-file -p "$1" >actual &&
- test_cmp expected actual
-}
-
-test_expect_success setup '
- cp -R "$TEST_DIRECTORY/t1013/objects" .git/ &&
- git --version
-'
-
-test_expect_success 'read standard-format loose objects' '
- git cat-file tag 8d4e360d6c70fbd72411991c02a09c442cf7a9fa &&
- git cat-file commit 6baee0540ea990d9761a3eb9ab183003a71c3696 &&
- git ls-tree 7a37b887a73791d12d26c0d3e39568a8fb0fa6e8 &&
- assert_blob_equals "257cc5642cb1a054f08cc83f2d943e56fd3ebe99" "foo$LF"
-'
-
-test_expect_success 'read experimental-format loose objects' '
- git cat-file tag 76e7fa9941f4d5f97f64fea65a2cba436bc79cbb &&
- git cat-file commit 7875c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 &&
- git ls-tree 95b1625de3ba8b2214d1e0d0591138aea733f64f &&
- assert_blob_equals "2e65efe2a145dda7ee51d1741299f848e5bf752e" "a" &&
- assert_blob_equals "9ae9e86b7bd6cb1472d9373702d8249973da0832" "ab" &&
- assert_blob_equals "85df50785d62d3b05ab03d9cbf7e4a0b49449730" "abcd" &&
- assert_blob_equals "1656f9233d999f61ef23ef390b9c71d75399f435" "abcdefgh" &&
- assert_blob_equals "1e72a6b2c4a577ab0338860fa9fe87f761fc9bbd" "abcdefghi" &&
- assert_blob_equals "70e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd" "abcdefghijklmnop" &&
- assert_blob_equals "bd15045f6ce8ff75747562173640456a394412c8" "abcdefghijklmnopqrstuvwx"
-'
-
-test_expect_success 'read standard-format objects deflated with smaller window buffer' '
- git cat-file tag f816d5255855ac160652ee5253b06cd8ee14165a &&
- git cat-file tag 149cedb5c46929d18e0f118e9fa31927487af3b6
-'
-
-test_done
diff --git a/t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6 b/t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6
deleted file mode 100644
index 472fd1458e03e47136416bce60d6e7c893a468ab..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 117
zcmV-*0E+)ei51K-4#F@DKvCwL!aJ&DH^jjbLR`g3tWwmHs(9h{l<n&c-*p0_eCp+8
z)q#teVY;BH2sX(~8m)*Hx<<sPnQCO=;NQ)l_H~^-`0>zp+xy&xqlfV?lkEVv$I`1V
X&;Ic{P^6Jl5+pbyA%^e+FOeitJb^w>
diff --git a/t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435 b/t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435
deleted file mode 100644
index c379d74ae2b40faae9c31cb7478bc7f42a6fb13c..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 17
YcmcDhnB(o^<>%?^eV&1VkAYbg05K>8VgLXD
diff --git a/t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd b/t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd
deleted file mode 100644
index 93706305bcff060547181ee4b7d3a8583b691181..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 18
ZcmcDlnB(o^<>%?^ef|UsgJ2(X9{@cS1}Ojl
diff --git a/t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99 b/t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
deleted file mode 100644
index bdcf704c9e663f3a11b3146b1b455bc2581b4761..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 19
acmb<m^geacKgb}_<MjFGObnu-%uWDJxd#LQ
diff --git a/t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e b/t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e
deleted file mode 100644
index ad62c43e418c11254cede4dc94982ac6a30dbe9c..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 10
RcmXr4nB&dDz>vg{1ON`X0$Bh6
diff --git a/t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696 b/t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696
deleted file mode 100644
index 3d2f0337dbb64c092b4a7e9bd324a66986cfe01a..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 181
zcmV;m080OO0i}>jY6CG41+&&EdLIz_c+?s&#*jt!#uw=6?r{d&BO_}9zP*3BL6-Fv
zMe(?t&r^etx{p>>0V(2;GZIGZz4#y@v6HB=?^32b4sN8R*<7gV+yFCnoI*s2Ba1lV
zFgj7@=Rk=%H>8K4H?*{$QejsHt*yZRcG4TH>l<x*;`Xpmm5FA{#V*GU_~=6l7@~(y
j=bbbBs%`pTkNNr&2`txXKEU_mgI{mauB<nAL=six5bRpF
diff --git a/t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd b/t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd
deleted file mode 100644
index b3f71a6ee5802b36b4576c5ebf111f30ef2017a4..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 26
icmdnMSTV=j$IH*t*Zcg5GpEj-JbPN7fx*mytrGyBA`4~!
diff --git a/t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb b/t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb
deleted file mode 100644
index af4e9a7..0000000
--- a/t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb
+++ /dev/null
@@ -1,2 +0,0 @@
-Â\vx%ÌA\x0e0\x10@Ñ}O1{cSZ(\x18ãνá\x02ÃthªZÜÞ Ëÿ\x16?\r\x0f¦\x02m×6dµi\x19É9
¤Gåh\a´Ø¨ÁZR'Q¶
R¡\x1eø³p\x0eçÓqL9âÏ=g¸§sIÐo\x13opÎÿeÏ«_1»³¤$×ç\x05*Si«ëNwpPRBôûÅÁú
-³[(ð®d-ø\x02ÁL9á
\ No newline at end of file
diff --git a/t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 b/t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09
deleted file mode 100644
index 3dd28be5c61840c23486bc654213c493c5387d75..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 139
zcmV;60CfM64S1ZT%u5QwFc1LHeNHiZA!PEi1rc|y+=v&LG*cUF4TP!E+lzPvmv8f=
zF+(2`MjJA_L|w8LeMUCfgdWj##I$#AjDA$K%2XR%YvLvqZrjWo9NLdszC7JmYPrx;
t4^^*^BcMYYt&hRN&Y&@BsLN7B_}@oeC^Ni^OmHp&FVtQ;^#LMbK`4x{Jre)`
diff --git a/t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8 b/t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8
deleted file mode 100644
index 2b97b264c33f78fe6c8230b5bbeacd6409d9f963..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 54
zcmV-60LlM&0V^p=O;s>9XD~D{Ff%bx2yqP#(RK6mab-}gIhvxgaY4w3o)h-EQ|!Y2
M+U=VO05jJR92g)Lz5oCK
diff --git a/t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730 b/t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730
deleted file mode 100644
index 6dff746876aab1acb9b42c557cdda69164d68398..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 13
UcmXr1nB(o^<;Tdte1owY030|2-~a#s
diff --git a/t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa b/t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa
deleted file mode 100644
index cb41e92d076c191245389bedfc0184ff78511c03..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 156
zcmV;N0Av4n0VRw<3c@fD06pgwdly5tsfhs*Z{DRJ*tBb?+D6i?(BE72I0G|63DCPu
zj(2VaTqI_*uMJZOrVHL7S&o4s9;`8zJhs*ar(}6Cw0RhMQL;WJp|PXV?QXdY^mB;|
zTyx|i8JgwE3mnTIwS4iM<~8VP)NR)D;{<52a+SALfUQAelxip??qHt!F~Ox5c%$~Z
K)~G%MG&zg-a!8y2
diff --git a/t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f b/t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f
deleted file mode 100644
index 7ac46b4f703aa00cd96c42971a237e3da5bbb5ca..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 252
zcmV<Y00aN25_p_5G%zqTF;Pg(OwTCME2$`95DWXMY&&y);(O)ymfUj+uLsZkVrmFc
zl$Kvw1Xj~}KcFHu>GFoC4YqUk*LiV!x=c5Ks>#dDO9iWuD_XYc$kOuF%oc6@EC02B
zPy91`FH}uFREb{d`$r31?=F8Ac(Fui<`0jj`%CqpN{TZpN>Wqvz{(1qt+4Gqt@fw;
z!1)3_-Fw^Co|5<rRaR1-npaY(3wPLFQI`0e7ynC3N|VIR99*+3U4%}+mF9z$%zF7E
zyZYK`k)oUC=1ezKW)|P#FoG(nN-ct@c{caa>`fQ1IeT|&t}Bnaap*};@I(N)b$dQ6
C6@X{}
diff --git a/t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832 b/t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832
deleted file mode 100644
index 9d8316d4e598e32ef17b3918cf00276c8c1bffba..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 11
ScmXr2nB(ok#K5S=a0CDn4+6^o
diff --git a/t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8 b/t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8
deleted file mode 100644
index eebf23956e3b8ac736fa04e682b4214ac75c716a..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 34
qcmdnNSTV=j$IH*t*Zcg5GpEj-JbPMSLq|(bQ&)RE14GpTE?oczGYz%?
diff --git a/t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 b/t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
deleted file mode 100644
index 134cf1937963c733d7affa6919f9835db864188a..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 9
OcmXr0n8VBf1dIR)&;dyR
diff --git a/t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a b/t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a
deleted file mode 100644
index 26b75ae..0000000
--- a/t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a
+++ /dev/null
@@ -1 +0,0 @@
-H\x15ÌÁ\x0e0\faÏ{Þ\rI»e\x1d&Æø*¥\x1d\x01G°\x17ß^¸ýù\x0e¿Ë\x04DåÒwUÒ¬\x1cS±4ª\x19Æ\x11ª ,\x19\afÅ[ðßVAÛºÎ\x1eüxÈÇö6[wtG§Lu\a¸?¦²¼Ú×\x1f@"gì{+\x12b\b\x7fy¾%M
\ No newline at end of file
--
1.8.5.rc2.442.gbff39ff
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 11:41 ` [PATCH] drop support for "experimental" loose objects Jeff King
@ 2013-11-21 11:48 ` Jeff King
2013-11-21 12:43 ` Duy Nguyen
` (2 more replies)
2013-11-21 16:04 ` Joey Hess
1 sibling, 3 replies; 28+ messages in thread
From: Jeff King @ 2013-11-21 11:48 UTC (permalink / raw)
To: Joey Hess; +Cc: git
On Thu, Nov 21, 2013 at 06:41:58AM -0500, Jeff King wrote:
> The test objects removed are all binary. Git seems to guess a few as
> non-binary, though, because they don't contain any NULs, and includes
> gross binary bytes in the patch below. In theory the mail's transfer
> encoding will take care of this. We'll see, I guess. :)
Nope; something munged it along the way (I sent it as C-T-E: 8bit, but
vger rewrites to QP before re-mailing, so that is the likely spot).
Here's the same patch, but with this stuck into .git/info/attributes:
/t/t1013/objects/*/* binary
which should work.
-- >8 --
Subject: drop support for "experimental" loose objects
In git v1.4.3, we introduced a new loose object format that
encoded some object information outside of the zlib stream.
Ultimately the format was dropped in v1.5.3, but we kept the
reading side around to help people migrate objects. Each
time we open a loose object, we use a heuristic to check
whether it is in the normal loose format, or the
experimental one.
This heuristic is robust in the face of valid data, but it
tends to treat corrupted or garbage data as an experimental
object. With the regular format, we would notice quickly
that zlib's crc does not check out and complain. With the
experimental object, we are likely to extract a nonsensical
object size and try to allocate a huge buffer, resulting in
xmalloc calling "die".
This latter behavior is much worse, for two reasons. One,
git reports an allocation error when the real error is
corruption. And two, the program dies unconditionally, so
you cannot even run fsck (which would otherwise ignore the
broken object and keep going).
We could try to improve the heuristic to err on the side of
normal objects in the face of corruption, but there is
really little point. The experimental format is long-dead,
and was never enabled by default to begin with. We can
instead simply remove it. The only affected repository would
be one that explicitly set core.legacyheaders in 2007, and
then never repacked in the intervening 6 years.
Signed-off-by: Jeff King <peff@peff.net>
---
sha1_file.c | 74 ---------------------
t/t1013-loose-object-format.sh | 66 ------------------
.../14/9cedb5c46929d18e0f118e9fa31927487af3b6 | Bin 117 -> 0 bytes
.../16/56f9233d999f61ef23ef390b9c71d75399f435 | Bin 17 -> 0 bytes
.../1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd | Bin 18 -> 0 bytes
.../25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99 | Bin 19 -> 0 bytes
.../2e/65efe2a145dda7ee51d1741299f848e5bf752e | Bin 10 -> 0 bytes
.../6b/aee0540ea990d9761a3eb9ab183003a71c3696 | Bin 181 -> 0 bytes
.../70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd | Bin 26 -> 0 bytes
.../76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb | Bin 155 -> 0 bytes
.../78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 | Bin 139 -> 0 bytes
.../7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8 | Bin 54 -> 0 bytes
.../85/df50785d62d3b05ab03d9cbf7e4a0b49449730 | Bin 13 -> 0 bytes
.../8d/4e360d6c70fbd72411991c02a09c442cf7a9fa | Bin 156 -> 0 bytes
.../95/b1625de3ba8b2214d1e0d0591138aea733f64f | Bin 252 -> 0 bytes
.../9a/e9e86b7bd6cb1472d9373702d8249973da0832 | Bin 11 -> 0 bytes
.../bd/15045f6ce8ff75747562173640456a394412c8 | Bin 34 -> 0 bytes
.../e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 | Bin 9 -> 0 bytes
.../f8/16d5255855ac160652ee5253b06cd8ee14165a | Bin 116 -> 0 bytes
19 files changed, 140 deletions(-)
delete mode 100755 t/t1013-loose-object-format.sh
delete mode 100644 t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6
delete mode 100644 t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435
delete mode 100644 t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd
delete mode 100644 t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
delete mode 100644 t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e
delete mode 100644 t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696
delete mode 100644 t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd
delete mode 100644 t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb
delete mode 100644 t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09
delete mode 100644 t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8
delete mode 100644 t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730
delete mode 100644 t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa
delete mode 100644 t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f
delete mode 100644 t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832
delete mode 100644 t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8
delete mode 100644 t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
delete mode 100644 t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a
diff --git a/sha1_file.c b/sha1_file.c
index 7dadd04..a72fcb6 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1442,51 +1442,6 @@ void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
return map;
}
-/*
- * There used to be a second loose object header format which
- * was meant to mimic the in-pack format, allowing for direct
- * copy of the object data. This format turned up not to be
- * really worth it and we no longer write loose objects in that
- * format.
- */
-static int experimental_loose_object(unsigned char *map)
-{
- unsigned int word;
-
- /*
- * We must determine if the buffer contains the standard
- * zlib-deflated stream or the experimental format based
- * on the in-pack object format. Compare the header byte
- * for each format:
- *
- * RFC1950 zlib w/ deflate : 0www1000 : 0 <= www <= 7
- * Experimental pack-based : Stttssss : ttt = 1,2,3,4
- *
- * If bit 7 is clear and bits 0-3 equal 8, the buffer MUST be
- * in standard loose-object format, UNLESS it is a Git-pack
- * format object *exactly* 8 bytes in size when inflated.
- *
- * However, RFC1950 also specifies that the 1st 16-bit word
- * must be divisible by 31 - this checksum tells us our buffer
- * is in the standard format, giving a false positive only if
- * the 1st word of the Git-pack format object happens to be
- * divisible by 31, ie:
- * ((byte0 * 256) + byte1) % 31 = 0
- * => 0ttt10000www1000 % 31 = 0
- *
- * As it happens, this case can only arise for www=3 & ttt=1
- * - ie, a Commit object, which would have to be 8 bytes in
- * size. As no Commit can be that small, we find that the
- * combination of these two criteria (bitmask & checksum)
- * can always correctly determine the buffer format.
- */
- word = (map[0] << 8) + map[1];
- if ((map[0] & 0x8F) == 0x08 && !(word % 31))
- return 0;
- else
- return 1;
-}
-
unsigned long unpack_object_header_buffer(const unsigned char *buf,
unsigned long len, enum object_type *type, unsigned long *sizep)
{
@@ -1514,14 +1469,6 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
{
- unsigned long size, used;
- static const char valid_loose_object_type[8] = {
- 0, /* OBJ_EXT */
- 1, 1, 1, 1, /* "commit", "tree", "blob", "tag" */
- 0, /* "delta" and others are invalid in a loose object */
- };
- enum object_type type;
-
/* Get the data stream */
memset(stream, 0, sizeof(*stream));
stream->next_in = map;
@@ -1529,27 +1476,6 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
stream->next_out = buffer;
stream->avail_out = bufsiz;
- if (experimental_loose_object(map)) {
- /*
- * The old experimental format we no longer produce;
- * we can still read it.
- */
- used = unpack_object_header_buffer(map, mapsize, &type, &size);
- if (!used || !valid_loose_object_type[type])
- return -1;
- map += used;
- mapsize -= used;
-
- /* Set up the stream for the rest.. */
- stream->next_in = map;
- stream->avail_in = mapsize;
- git_inflate_init(stream);
-
- /* And generate the fake traditional header */
- stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu",
- typename(type), size);
- return 0;
- }
git_inflate_init(stream);
return git_inflate(stream, 0);
}
diff --git a/t/t1013-loose-object-format.sh b/t/t1013-loose-object-format.sh
deleted file mode 100755
index fbf5f2f..0000000
--- a/t/t1013-loose-object-format.sh
+++ /dev/null
@@ -1,66 +0,0 @@
-#!/bin/sh
-#
-# Copyright (c) 2011 Roberto Tyley
-#
-
-test_description='Correctly identify and parse loose object headers
-
-There are two file formats for loose objects - the original standard
-format, and the experimental format introduced with Git v1.4.3, later
-deprecated with v1.5.3. Although Git no longer writes the
-experimental format, objects in both formats must be read, with the
-format for a given file being determined by the header.
-
-Detecting file format based on header is not entirely trivial, not
-least because the first byte of a zlib-deflated stream will vary
-depending on how much memory was allocated for the deflation window
-buffer when the object was written out (for example 4KB on Android,
-rather that 32KB on a normal PC).
-
-The loose objects used as test vectors have been generated with the
-following Git versions:
-
-standard format: Git v1.7.4.1
-experimental format: Git v1.4.3 (legacyheaders=false)
-standard format, deflated with 4KB window size: Agit/JGit on Android
-'
-
-. ./test-lib.sh
-
-assert_blob_equals() {
- printf "%s" "$2" >expected &&
- git cat-file -p "$1" >actual &&
- test_cmp expected actual
-}
-
-test_expect_success setup '
- cp -R "$TEST_DIRECTORY/t1013/objects" .git/ &&
- git --version
-'
-
-test_expect_success 'read standard-format loose objects' '
- git cat-file tag 8d4e360d6c70fbd72411991c02a09c442cf7a9fa &&
- git cat-file commit 6baee0540ea990d9761a3eb9ab183003a71c3696 &&
- git ls-tree 7a37b887a73791d12d26c0d3e39568a8fb0fa6e8 &&
- assert_blob_equals "257cc5642cb1a054f08cc83f2d943e56fd3ebe99" "foo$LF"
-'
-
-test_expect_success 'read experimental-format loose objects' '
- git cat-file tag 76e7fa9941f4d5f97f64fea65a2cba436bc79cbb &&
- git cat-file commit 7875c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 &&
- git ls-tree 95b1625de3ba8b2214d1e0d0591138aea733f64f &&
- assert_blob_equals "2e65efe2a145dda7ee51d1741299f848e5bf752e" "a" &&
- assert_blob_equals "9ae9e86b7bd6cb1472d9373702d8249973da0832" "ab" &&
- assert_blob_equals "85df50785d62d3b05ab03d9cbf7e4a0b49449730" "abcd" &&
- assert_blob_equals "1656f9233d999f61ef23ef390b9c71d75399f435" "abcdefgh" &&
- assert_blob_equals "1e72a6b2c4a577ab0338860fa9fe87f761fc9bbd" "abcdefghi" &&
- assert_blob_equals "70e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd" "abcdefghijklmnop" &&
- assert_blob_equals "bd15045f6ce8ff75747562173640456a394412c8" "abcdefghijklmnopqrstuvwx"
-'
-
-test_expect_success 'read standard-format objects deflated with smaller window buffer' '
- git cat-file tag f816d5255855ac160652ee5253b06cd8ee14165a &&
- git cat-file tag 149cedb5c46929d18e0f118e9fa31927487af3b6
-'
-
-test_done
diff --git a/t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6 b/t/t1013/objects/14/9cedb5c46929d18e0f118e9fa31927487af3b6
deleted file mode 100644
index 472fd1458e03e47136416bce60d6e7c893a468ab..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 117
zcmV-*0E+)ei51K-4#F@DKvCwL!aJ&DH^jjbLR`g3tWwmHs(9h{l<n&c-*p0_eCp+8
z)q#teVY;BH2sX(~8m)*Hx<<sPnQCO=;NQ)l_H~^-`0>zp+xy&xqlfV?lkEVv$I`1V
X&;Ic{P^6Jl5+pbyA%^e+FOeitJb^w>
diff --git a/t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435 b/t/t1013/objects/16/56f9233d999f61ef23ef390b9c71d75399f435
deleted file mode 100644
index c379d74ae2b40faae9c31cb7478bc7f42a6fb13c..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 17
YcmcDhnB(o^<>%?^eV&1VkAYbg05K>8VgLXD
diff --git a/t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd b/t/t1013/objects/1e/72a6b2c4a577ab0338860fa9fe87f761fc9bbd
deleted file mode 100644
index 93706305bcff060547181ee4b7d3a8583b691181..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 18
ZcmcDlnB(o^<>%?^ef|UsgJ2(X9{@cS1}Ojl
diff --git a/t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99 b/t/t1013/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
deleted file mode 100644
index bdcf704c9e663f3a11b3146b1b455bc2581b4761..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 19
acmb<m^geacKgb}_<MjFGObnu-%uWDJxd#LQ
diff --git a/t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e b/t/t1013/objects/2e/65efe2a145dda7ee51d1741299f848e5bf752e
deleted file mode 100644
index ad62c43e418c11254cede4dc94982ac6a30dbe9c..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 10
RcmXr4nB&dDz>vg{1ON`X0$Bh6
diff --git a/t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696 b/t/t1013/objects/6b/aee0540ea990d9761a3eb9ab183003a71c3696
deleted file mode 100644
index 3d2f0337dbb64c092b4a7e9bd324a66986cfe01a..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 181
zcmV;m080OO0i}>jY6CG41+&&EdLIz_c+?s&#*jt!#uw=6?r{d&BO_}9zP*3BL6-Fv
zMe(?t&r^etx{p>>0V(2;GZIGZz4#y@v6HB=?^32b4sN8R*<7gV+yFCnoI*s2Ba1lV
zFgj7@=Rk=%H>8K4H?*{$QejsHt*yZRcG4TH>l<x*;`Xpmm5FA{#V*GU_~=6l7@~(y
j=bbbBs%`pTkNNr&2`txXKEU_mgI{mauB<nAL=six5bRpF
diff --git a/t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd b/t/t1013/objects/70/e6a83d8dcb26fc8bc0cf702e2ddeb6adca18fd
deleted file mode 100644
index b3f71a6ee5802b36b4576c5ebf111f30ef2017a4..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 26
icmdnMSTV=j$IH*t*Zcg5GpEj-JbPN7fx*mytrGyBA`4~!
diff --git a/t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb b/t/t1013/objects/76/e7fa9941f4d5f97f64fea65a2cba436bc79cbb
deleted file mode 100644
index af4e9a7b0c035fc7c26355b85f250f3ff7d3afa1..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 155
zcmV;M0A&Bd3wWF*%s~!<Fc3h|eNQoaV^dlvm>A>Ez2O4GbZDxSl3I-1-k{6>7C#LS
zrUGr(He|JFof*kFg``L2m}m#I*r>r;QYTTig@ICxp@@PW__J^hk>`TbaZEYl&pl_j
zr-5@x&~FoOaL)gfWzVZ$F}r}Xq$Jnp1u9c%tLsj8a8Q*}LiGE^!TJibhg&G{u4FBZ
J_yWO9IpM14O;`W`
diff --git a/t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09 b/t/t1013/objects/78/75c6237d3fcdd0ac2f0decc7d3fa6a50b66c09
deleted file mode 100644
index 3dd28be5c61840c23486bc654213c493c5387d75..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 139
zcmV;60CfM64S1ZT%u5QwFc1LHeNHiZA!PEi1rc|y+=v&LG*cUF4TP!E+lzPvmv8f=
zF+(2`MjJA_L|w8LeMUCfgdWj##I$#AjDA$K%2XR%YvLvqZrjWo9NLdszC7JmYPrx;
t4^^*^BcMYYt&hRN&Y&@BsLN7B_}@oeC^Ni^OmHp&FVtQ;^#LMbK`4x{Jre)`
diff --git a/t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8 b/t/t1013/objects/7a/37b887a73791d12d26c0d3e39568a8fb0fa6e8
deleted file mode 100644
index 2b97b264c33f78fe6c8230b5bbeacd6409d9f963..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 54
zcmV-60LlM&0V^p=O;s>9XD~D{Ff%bx2yqP#(RK6mab-}gIhvxgaY4w3o)h-EQ|!Y2
M+U=VO05jJR92g)Lz5oCK
diff --git a/t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730 b/t/t1013/objects/85/df50785d62d3b05ab03d9cbf7e4a0b49449730
deleted file mode 100644
index 6dff746876aab1acb9b42c557cdda69164d68398..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 13
UcmXr1nB(o^<;Tdte1owY030|2-~a#s
diff --git a/t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa b/t/t1013/objects/8d/4e360d6c70fbd72411991c02a09c442cf7a9fa
deleted file mode 100644
index cb41e92d076c191245389bedfc0184ff78511c03..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 156
zcmV;N0Av4n0VRw<3c@fD06pgwdly5tsfhs*Z{DRJ*tBb?+D6i?(BE72I0G|63DCPu
zj(2VaTqI_*uMJZOrVHL7S&o4s9;`8zJhs*ar(}6Cw0RhMQL;WJp|PXV?QXdY^mB;|
zTyx|i8JgwE3mnTIwS4iM<~8VP)NR)D;{<52a+SALfUQAelxip??qHt!F~Ox5c%$~Z
K)~G%MG&zg-a!8y2
diff --git a/t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f b/t/t1013/objects/95/b1625de3ba8b2214d1e0d0591138aea733f64f
deleted file mode 100644
index 7ac46b4f703aa00cd96c42971a237e3da5bbb5ca..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 252
zcmV<Y00aN25_p_5G%zqTF;Pg(OwTCME2$`95DWXMY&&y);(O)ymfUj+uLsZkVrmFc
zl$Kvw1Xj~}KcFHu>GFoC4YqUk*LiV!x=c5Ks>#dDO9iWuD_XYc$kOuF%oc6@EC02B
zPy91`FH}uFREb{d`$r31?=F8Ac(Fui<`0jj`%CqpN{TZpN>Wqvz{(1qt+4Gqt@fw;
z!1)3_-Fw^Co|5<rRaR1-npaY(3wPLFQI`0e7ynC3N|VIR99*+3U4%}+mF9z$%zF7E
zyZYK`k)oUC=1ezKW)|P#FoG(nN-ct@c{caa>`fQ1IeT|&t}Bnaap*};@I(N)b$dQ6
C6@X{}
diff --git a/t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832 b/t/t1013/objects/9a/e9e86b7bd6cb1472d9373702d8249973da0832
deleted file mode 100644
index 9d8316d4e598e32ef17b3918cf00276c8c1bffba..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 11
ScmXr2nB(ok#K5S=a0CDn4+6^o
diff --git a/t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8 b/t/t1013/objects/bd/15045f6ce8ff75747562173640456a394412c8
deleted file mode 100644
index eebf23956e3b8ac736fa04e682b4214ac75c716a..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 34
qcmdnNSTV=j$IH*t*Zcg5GpEj-JbPMSLq|(bQ&)RE14GpTE?oczGYz%?
diff --git a/t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 b/t/t1013/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
deleted file mode 100644
index 134cf1937963c733d7affa6919f9835db864188a..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 9
OcmXr0n8VBf1dIR)&;dyR
diff --git a/t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a b/t/t1013/objects/f8/16d5255855ac160652ee5253b06cd8ee14165a
deleted file mode 100644
index 26b75aec56f8f178a9f001be3b630a4e48ceb6b9..0000000000000000000000000000000000000000
GIT binary patch
literal 0
HcmV?d00001
literal 116
zcmV-)0E_=fi51Mj4uUWYfML&jirx)LyJa0F#`r3w9f$!(uovH6xc&JKzsm$f<<f?C
zRfp1-tQ=FZG^!bj#u2Tmo**n42WG`v@ZVNJ+q%vk{CLR6_BLC0bVsL5bqBaVm!`73
W+SeaIi6Uq0dxk3#VhDeEz9mhO%Qr3n
--
1.8.5.rc2.442.gbff39ff
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 11:48 ` Jeff King
@ 2013-11-21 12:43 ` Duy Nguyen
2013-11-21 14:42 ` Keshav Kini
2013-11-21 22:41 ` Jeff King
2013-11-21 19:44 ` Junio C Hamano
2013-11-23 0:24 ` Jonathan Nieder
2 siblings, 2 replies; 28+ messages in thread
From: Duy Nguyen @ 2013-11-21 12:43 UTC (permalink / raw)
To: Jeff King; +Cc: Joey Hess, Git Mailing List
On Thu, Nov 21, 2013 at 6:48 PM, Jeff King <peff@peff.net> wrote:
> @@ -1514,14 +1469,6 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
>
> int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
> {
> - unsigned long size, used;
> - static const char valid_loose_object_type[8] = {
> - 0, /* OBJ_EXT */
> - 1, 1, 1, 1, /* "commit", "tree", "blob", "tag" */
> - 0, /* "delta" and others are invalid in a loose object */
> - };
> - enum object_type type;
> -
> /* Get the data stream */
> memset(stream, 0, sizeof(*stream));
> stream->next_in = map;
> @@ -1529,27 +1476,6 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
> stream->next_out = buffer;
> stream->avail_out = bufsiz;
>
> - if (experimental_loose_object(map)) {
Perhaps keep this..
> - /*
> - * The old experimental format we no longer produce;
> - * we can still read it.
> - */
> - used = unpack_object_header_buffer(map, mapsize, &type, &size);
> - if (!used || !valid_loose_object_type[type])
> - return -1;
> - map += used;
> - mapsize -= used;
> -
> - /* Set up the stream for the rest.. */
> - stream->next_in = map;
> - stream->avail_in = mapsize;
> - git_inflate_init(stream);
> -
> - /* And generate the fake traditional header */
> - stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu",
> - typename(type), size);
> - return 0;
and replace all this with
die("detected an object in obsolete format, please repack the
repository using a version before XXX");
?
> - }
> git_inflate_init(stream);
> return git_inflate(stream, 0);
> }
--
Duy
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 12:43 ` Duy Nguyen
@ 2013-11-21 14:42 ` Keshav Kini
2013-11-21 22:41 ` Jeff King
1 sibling, 0 replies; 28+ messages in thread
From: Keshav Kini @ 2013-11-21 14:42 UTC (permalink / raw)
To: git
Duy Nguyen <pclouds@gmail.com> writes:
> On Thu, Nov 21, 2013 at 6:48 PM, Jeff King <peff@peff.net> wrote:
>> @@ -1514,14 +1469,6 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf,
>>
>> int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz)
>> {
>> - unsigned long size, used;
>> - static const char valid_loose_object_type[8] = {
>> - 0, /* OBJ_EXT */
>> - 1, 1, 1, 1, /* "commit", "tree", "blob", "tag" */
>> - 0, /* "delta" and others are invalid in a loose object */
>> - };
>> - enum object_type type;
>> -
>> /* Get the data stream */
>> memset(stream, 0, sizeof(*stream));
>> stream->next_in = map;
>> @@ -1529,27 +1476,6 @@ int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long ma
>> stream->next_out = buffer;
>> stream->avail_out = bufsiz;
>>
>> - if (experimental_loose_object(map)) {
>
> Perhaps keep this..
>
>> - /*
>> - * The old experimental format we no longer produce;
>> - * we can still read it.
>> - */
>> - used = unpack_object_header_buffer(map, mapsize, &type, &size);
>> - if (!used || !valid_loose_object_type[type])
>> - return -1;
>> - map += used;
>> - mapsize -= used;
>> -
>> - /* Set up the stream for the rest.. */
>> - stream->next_in = map;
>> - stream->avail_in = mapsize;
>> - git_inflate_init(stream);
>> -
>> - /* And generate the fake traditional header */
>> - stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu",
>> - typename(type), size);
>> - return 0;
>
> and replace all this with
>
> die("detected an object in obsolete format, please repack the
> repository using a version before XXX");
>
> ?
Wouldn't that fail to solve the issue of `git fsck` dying on corrupt
data? experimental_loose_object() would need to be rewritten to be more
conservative in deciding that an object was in the experimental loose
object format.
-Keshav
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 11:41 ` [PATCH] drop support for "experimental" loose objects Jeff King
2013-11-21 11:48 ` Jeff King
@ 2013-11-21 16:04 ` Joey Hess
2013-11-21 20:19 ` Christian Couder
2013-11-22 2:09 ` Jeff King
1 sibling, 2 replies; 28+ messages in thread
From: Joey Hess @ 2013-11-21 16:04 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 576 bytes --]
Jeff King wrote:
> This latter behavior is much worse for two reasons. One,
> git reports an allocation error when the real error is
> corruption. And two, the program dies unconditionally, so
> you cannot even run fsck (which would otherwise ignore the
> broken object and keep going).
BTW, I've also seen git cat-file --batch report wrong sizes for objects,
sometimes without crashing. This is particularly problimatic because if
the object size is wrong, it's very hard to detect the actual end of the
object output in the batch mode stream.
--
see shy jo
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 11:48 ` Jeff King
2013-11-21 12:43 ` Duy Nguyen
@ 2013-11-21 19:44 ` Junio C Hamano
2013-11-23 0:24 ` Jonathan Nieder
2 siblings, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2013-11-21 19:44 UTC (permalink / raw)
To: Jeff King; +Cc: Joey Hess, git
Jeff King <peff@peff.net> writes:
> We could try to improve the heuristic to err on the side of
> normal objects in the face of corruption, but there is
> really little point. The experimental format is long-dead,
> and was never enabled by default to begin with. We can
> instead simply remove it. The only affected repository would
> be one that explicitly set core.legacyheaders in 2007, and
> then never repacked in the intervening 6 years.
Sounds sensible. Thanks.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 16:04 ` Joey Hess
@ 2013-11-21 20:19 ` Christian Couder
2013-11-22 9:58 ` Jeff King
2013-11-22 2:09 ` Jeff King
1 sibling, 1 reply; 28+ messages in thread
From: Christian Couder @ 2013-11-21 20:19 UTC (permalink / raw)
To: Joey Hess; +Cc: git, Junio C Hamano
On Thu, Nov 21, 2013 at 5:04 PM, Joey Hess <joey@kitenet.net> wrote:
>
> BTW, I've also seen git cat-file --batch report wrong sizes for objects,
> sometimes without crashing. This is particularly problimatic because if
> the object size is wrong, it's very hard to detect the actual end of the
> object output in the batch mode stream.
Yeah, I think it might report wrong size in case of replaced objects
for example.
I looked at that following Junio's comment about the
sha1_object_info() API, which,
unlike read_sha1_file() API, does not interact with the "replace" mechanism:
http://thread.gmane.org/gmane.comp.version-control.git/234023/
I started to work on a patch about this but didn't take the time to
finish and post it.
Thanks,
Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 12:43 ` Duy Nguyen
2013-11-21 14:42 ` Keshav Kini
@ 2013-11-21 22:41 ` Jeff King
1 sibling, 0 replies; 28+ messages in thread
From: Jeff King @ 2013-11-21 22:41 UTC (permalink / raw)
To: Duy Nguyen; +Cc: Joey Hess, Git Mailing List
On Thu, Nov 21, 2013 at 07:43:03PM +0700, Duy Nguyen wrote:
> > - if (experimental_loose_object(map)) {
>
> Perhaps keep this..
>
> > - /*
> > - * The old experimental format we no longer produce;
> > - * we can still read it.
> > - */
> > - used = unpack_object_header_buffer(map, mapsize, &type, &size);
> > - if (!used || !valid_loose_object_type[type])
> > - return -1;
> > - map += used;
> > - mapsize -= used;
> > -
> > - /* Set up the stream for the rest.. */
> > - stream->next_in = map;
> > - stream->avail_in = mapsize;
> > - git_inflate_init(stream);
> > -
> > - /* And generate the fake traditional header */
> > - stream->total_out = 1 + snprintf(buffer, bufsiz, "%s %lu",
> > - typename(type), size);
> > - return 0;
>
> and replace all this with
>
> die("detected an object in obsolete format, please repack the
> repository using a version before XXX");
That would eliminate the second part of my purpose, which is to not
die() on a corrupted object because we incorrectly guess that it is
experimental.
If we think these objects are in the wild, the right thing to do would
be to warn() and continue. But I really find it hard to believe any such
objects exist at this point.
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 16:04 ` Joey Hess
2013-11-21 20:19 ` Christian Couder
@ 2013-11-22 2:09 ` Jeff King
2013-11-22 17:28 ` Joey Hess
1 sibling, 1 reply; 28+ messages in thread
From: Jeff King @ 2013-11-22 2:09 UTC (permalink / raw)
To: Joey Hess; +Cc: git
On Thu, Nov 21, 2013 at 12:04:26PM -0400, Joey Hess wrote:
> Jeff King wrote:
> > This latter behavior is much worse for two reasons. One,
> > git reports an allocation error when the real error is
> > corruption. And two, the program dies unconditionally, so
> > you cannot even run fsck (which would otherwise ignore the
> > broken object and keep going).
>
> BTW, I've also seen git cat-file --batch report wrong sizes for objects,
> sometimes without crashing. This is particularly problimatic because if
> the object size is wrong, it's very hard to detect the actual end of the
> object output in the batch mode stream.
Hrm. For --batch, I'd think we would open the whole object and notice
the corruption, even with the current code. But for --batch-check, we
use sha1_object_info, and for an "experimental" object, we do not need
to de-zlib the object at all. So we end up reporting whatever crap we
decipher from the garbage bytes. My patch would fix that, as we would
not incorrectly guess an object is experimental anymore.
If you have specific cases that trigger even after my patch, I'd be
interested to see them.
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 20:19 ` Christian Couder
@ 2013-11-22 9:58 ` Jeff King
2013-11-22 11:04 ` Christian Couder
2013-11-22 17:23 ` Junio C Hamano
0 siblings, 2 replies; 28+ messages in thread
From: Jeff King @ 2013-11-22 9:58 UTC (permalink / raw)
To: Christian Couder; +Cc: Joey Hess, git, Junio C Hamano
On Thu, Nov 21, 2013 at 09:19:25PM +0100, Christian Couder wrote:
> Yeah, I think it might report wrong size in case of replaced objects
> for example.
> I looked at that following Junio's comment about the
> sha1_object_info() API, which,
> unlike read_sha1_file() API, does not interact with the "replace" mechanism:
>
> http://thread.gmane.org/gmane.comp.version-control.git/234023/
>
> I started to work on a patch about this but didn't take the time to
> finish and post it.
That seems kind of crazy. Would the fix be as simple as this:
diff --git a/sha1_file.c b/sha1_file.c
index 10676ba..a051d6c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2529,6 +2529,8 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
struct pack_entry e;
int rtype;
+ sha1 = lookup_replace_object(sha1);
+
co = find_cached_object(sha1);
if (co) {
if (oi->typep)
or do we need some way for callers to turn off replacement? I notice
that read_sha1_file has such a feature, but it is only used in one
place. I guess we would need to audit all the sha1_object_info callers.
-Peff
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-22 9:58 ` Jeff King
@ 2013-11-22 11:04 ` Christian Couder
2013-11-22 11:24 ` Jeff King
2013-11-22 17:23 ` Junio C Hamano
1 sibling, 1 reply; 28+ messages in thread
From: Christian Couder @ 2013-11-22 11:04 UTC (permalink / raw)
To: Jeff King; +Cc: Joey Hess, git, Junio C Hamano
On Fri, Nov 22, 2013 at 10:58 AM, Jeff King <peff@peff.net> wrote:
> On Thu, Nov 21, 2013 at 09:19:25PM +0100, Christian Couder wrote:
>
>> Yeah, I think it might report wrong size in case of replaced objects
>> for example.
>> I looked at that following Junio's comment about the
>> sha1_object_info() API, which,
>> unlike read_sha1_file() API, does not interact with the "replace" mechanism:
>>
>> http://thread.gmane.org/gmane.comp.version-control.git/234023/
>>
>> I started to work on a patch about this but didn't take the time to
>> finish and post it.
>
> That seems kind of crazy. Would the fix be as simple as this:
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 10676ba..a051d6c 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -2529,6 +2529,8 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi)
> struct pack_entry e;
> int rtype;
>
> + sha1 = lookup_replace_object(sha1);
> +
> co = find_cached_object(sha1);
> if (co) {
> if (oi->typep)
>
> or do we need some way for callers to turn off replacement? I notice
> that read_sha1_file has such a feature, but it is only used in one
> place.
Yeah, indeed, I asked myself such a question and that's why it is not
so simple unfortunately.
In "sha1_file.c", there is:
void *read_sha1_file_extended(const unsigned char *sha1,
enum object_type *type,
unsigned long *size,
unsigned flag)
{
void *data;
char *path;
const struct packed_git *p;
const unsigned char *repl = (flag & READ_SHA1_FILE_REPLACE)
? lookup_replace_object(sha1) : sha1;
errno = 0;
data = read_object(repl, type, size);
...
And in cache.h, there is:
#define READ_SHA1_FILE_REPLACE 1
static inline void *read_sha1_file(const unsigned char *sha1, enum
object_type *type, unsigned long *size)
{
return read_sha1_file_extended(sha1, type, size,
READ_SHA1_FILE_REPLACE);
}
So the READ_SHA1_FILE_REPLACE is a way to disable replacement at compile time.
But in my opinion if we want such a knob, we should use it when we set
the "read_replace_refs" global variable.
For example with something like this:
diff --git a/environment.c b/environment.c
index 0a15349..7c99af8 100644
--- a/environment.c
+++ b/environment.c
@@ -44,7 +44,7 @@ const char *editor_program;
const char *askpass_program;
const char *excludes_file;
enum auto_crlf auto_crlf = AUTO_CRLF_FALSE;
-int read_replace_refs = 1; /* NEEDSWORK: rename to use_replace_refs */
+int read_replace_refs = READ_SHA1_FILE_REPLACE; /* NEEDSWORK: rename
to use_replace_refs */
enum eol core_eol = EOL_UNSET;
enum safe_crlf safe_crlf = SAFE_CRLF_WARN;
unsigned whitespace_rule_cfg = WS_DEFAULT_RULE;
@Junio what would you think about such a change?
> I guess we would need to audit all the sha1_object_info callers.
Yeah but when I looked at them, there were not many that looked dangerous.
Thanks,
Christian.
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-22 11:04 ` Christian Couder
@ 2013-11-22 11:24 ` Jeff King
2013-11-22 14:23 ` Christian Couder
0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2013-11-22 11:24 UTC (permalink / raw)
To: Christian Couder; +Cc: Joey Hess, git, Junio C Hamano
On Fri, Nov 22, 2013 at 12:04:01PM +0100, Christian Couder wrote:
> In "sha1_file.c", there is:
>
> void *read_sha1_file_extended(const unsigned char *sha1,
> enum object_type *type,
> unsigned long *size,
> unsigned flag)
> {
> void *data;
> char *path;
> const struct packed_git *p;
> const unsigned char *repl = (flag & READ_SHA1_FILE_REPLACE)
> ? lookup_replace_object(sha1) : sha1;
>
> errno = 0;
> data = read_object(repl, type, size);
> ...
>
> And in cache.h, there is:
>
> #define READ_SHA1_FILE_REPLACE 1
> static inline void *read_sha1_file(const unsigned char *sha1, enum
> object_type *type, unsigned long *size)
> {
> return read_sha1_file_extended(sha1, type, size,
> READ_SHA1_FILE_REPLACE);
> }
>
> So the READ_SHA1_FILE_REPLACE is a way to disable replacement at compile time.
Is it? I did not have the impression anyone would ever redefine
READ_SHA1_FILE_REPLACE at compile time, but that it was a flag that
individual callsites would pass to read_sha1_file_extended to tell them
whether they were interested in replacements. And the obvious reasons to
not be are:
1. You are doing some operation which needs real objects, like fsck or
generating a packfile.
2. You have already resolved any replacements, and want to make sure
you are getting the same object used elsewhere (e.g., because you
already printed its size :) ).
The only site which calls read_sha1_file_extended directly and does not
pass the REPLACE flag is in streaming.c. And that looks to be a case of
(2), since we resolve the replacement at the start in open_istream().
I am kind of surprised we do not need to do so for (1) in places like
pack-objects.c. Most of that code does not use read_sha1_file,
preferring instead to find the individual pack entries (for reuse). But
there are some calls to read_sha1_file, and I wonder if there is a bug
lurking there.
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-22 11:24 ` Jeff King
@ 2013-11-22 14:23 ` Christian Couder
2013-11-22 16:15 ` Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Christian Couder @ 2013-11-22 14:23 UTC (permalink / raw)
To: Jeff King; +Cc: Joey Hess, git, Junio C Hamano
On Fri, Nov 22, 2013 at 12:24 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Nov 22, 2013 at 12:04:01PM +0100, Christian Couder wrote:
>
>> In "sha1_file.c", there is:
>>
>> void *read_sha1_file_extended(const unsigned char *sha1,
>> enum object_type *type,
>> unsigned long *size,
>> unsigned flag)
>> {
>> void *data;
>> char *path;
>> const struct packed_git *p;
>> const unsigned char *repl = (flag & READ_SHA1_FILE_REPLACE)
>> ? lookup_replace_object(sha1) : sha1;
>>
>> errno = 0;
>> data = read_object(repl, type, size);
>> ...
>>
>> And in cache.h, there is:
>>
>> #define READ_SHA1_FILE_REPLACE 1
>> static inline void *read_sha1_file(const unsigned char *sha1, enum
>> object_type *type, unsigned long *size)
>> {
>> return read_sha1_file_extended(sha1, type, size,
>> READ_SHA1_FILE_REPLACE);
>> }
>>
>> So the READ_SHA1_FILE_REPLACE is a way to disable replacement at compile time.
>
> Is it? I did not have the impression anyone would ever redefine
> READ_SHA1_FILE_REPLACE at compile time, but that it was a flag that
> individual callsites would pass to read_sha1_file_extended to tell them
> whether they were interested in replacements. And the obvious reasons to
> not be are:
>
> 1. You are doing some operation which needs real objects, like fsck or
> generating a packfile.
>
> 2. You have already resolved any replacements, and want to make sure
> you are getting the same object used elsewhere (e.g., because you
> already printed its size :) ).
>
> The only site which calls read_sha1_file_extended directly and does not
> pass the REPLACE flag is in streaming.c. And that looks to be a case of
> (2), since we resolve the replacement at the start in open_istream().
Yeah, you are right. Sorry for overlooking this.
But anyway it looks redundant to me to have both this REPLACE flag and
the read_replace_refs global variable, so I think a proper solution
would involve some significant refactoring.
And if we decide to keep a REPLACE flag we might need to add one to
sha1_object_info_extended() too.
Thanks,
Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-22 14:23 ` Christian Couder
@ 2013-11-22 16:15 ` Jeff King
0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2013-11-22 16:15 UTC (permalink / raw)
To: Christian Couder; +Cc: Joey Hess, git, Junio C Hamano
On Fri, Nov 22, 2013 at 03:23:31PM +0100, Christian Couder wrote:
> > The only site which calls read_sha1_file_extended directly and does not
> > pass the REPLACE flag is in streaming.c. And that looks to be a case of
> > (2), since we resolve the replacement at the start in open_istream().
>
> Yeah, you are right. Sorry for overlooking this.
>
> But anyway it looks redundant to me to have both this REPLACE flag and
> the read_replace_refs global variable, so I think a proper solution
> would involve some significant refactoring.
I don't think it is redundant. The global variable is about "does the
whole operation want the replace feature turned on" and the flag is
about "does this particular callsite want the replace featured turned
on". We use the feature iff both are true.
We could implement the callsite flag by tweaking the global right before
the call to read_sha1_file, but then we would have to remember to turn
it back on afterwards. If this were a language with dynamic scopes like
Perl, that would be easy. But in C you have to remember to reset it in
all code paths. :)
In some cases it does make sense to turn the feature off for a whole
command (like pack-objects); using the global makes sense there. And
indeed, we seem to do it already in things like fsck, index-pack, etc.
So that answers my question of why I did not see more of case (1) in my
previous email: they do not need per-callsite disabling, because they do
it for the whole command.
> And if we decide to keep a REPLACE flag we might need to add one to
> sha1_object_info_extended() too.
Yes, but somebody needs to look at all of the callsites and decide which
form they want. :)
I did a brief skim, and the ones I noticed were:
- several spots in index-pack, pack-objects, etc. But these are
already covered by unsetting read_replace_refs.
- replace_object looks at both the original and new object to compare
their types (due to your recent patches); it would obviously want to
get the true type of the original object
- When creating tags and trees, we care about the type of the object
(the former for the "type" line of the tag, the latter to set the
mode). What should they do with replace objects? As above, it is
probably insane to switch types, so it may not matter for practical
purposes.
- istream_source in streaming.c would probably want to turn it off for
the same reason it uses read_sha1_file_extended
So I think most sites would be unaffected, but due to the second and
fourth item in my list above, we would need a flag for
sha1_object_info_extended.
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-22 9:58 ` Jeff King
2013-11-22 11:04 ` Christian Couder
@ 2013-11-22 17:23 ` Junio C Hamano
1 sibling, 0 replies; 28+ messages in thread
From: Junio C Hamano @ 2013-11-22 17:23 UTC (permalink / raw)
To: Jeff King; +Cc: Christian Couder, Joey Hess, git
Jeff King <peff@peff.net> writes:
> I guess we would need to audit all the sha1_object_info callers.
Yup; I agree that was the conclusion of Christian's thread.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-22 2:09 ` Jeff King
@ 2013-11-22 17:28 ` Joey Hess
2013-11-24 8:44 ` Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Joey Hess @ 2013-11-22 17:28 UTC (permalink / raw)
To: git
[-- Attachment #1: Type: text/plain, Size: 1406 bytes --]
Jeff King wrote:
> > BTW, I've also seen git cat-file --batch report wrong sizes for objects,
>
> Hrm. For --batch, I'd think we would open the whole object and notice
> the corruption, even with the current code. But for --batch-check, we
> use sha1_object_info, and for an "experimental" object, we do not need
> to de-zlib the object at all. So we end up reporting whatever crap we
> decipher from the garbage bytes. My patch would fix that, as we would
> not incorrectly guess an object is experimental anymore.
>
> If you have specific cases that trigger even after my patch, I'd be
> interested to see them.
I was seeing it with --batch, not --batch-check. Probably only with the
old experimental loose object format. In one case, --batch reported a
size of 20k, and only output 1k of data. With the object file I sent
earlier, --batch reports a huge size, and fails trying to allocate the
memory for it before it can output anything.
I also have seen at least once a corrupt pack file that caused git to try
and allocate a absurd quantity of memory.
Unfortunately I do not currently have exemplars for these, although I
should be able to run a less robust version of my code and find them
again. ;) Will try to find time to do that.
BTW, the fuzzing code is here:
http://source.git-repair.branchable.com/?p=source.git;a=blob;f=Git/Destroyer.hs
--
see shy jo
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-21 11:48 ` Jeff King
2013-11-21 12:43 ` Duy Nguyen
2013-11-21 19:44 ` Junio C Hamano
@ 2013-11-23 0:24 ` Jonathan Nieder
2013-11-23 0:30 ` Jeff King
2 siblings, 1 reply; 28+ messages in thread
From: Jonathan Nieder @ 2013-11-23 0:24 UTC (permalink / raw)
To: Jeff King; +Cc: Joey Hess, git
Jeff King wrote:
> sha1_file.c | 74 ---------------------
Yay!
> t/t1013-loose-object-format.sh | 66 ------------------
Hmm, not all of these tests are about the "experimental" format. Do
we really want to remove them all?
Thanks,
Jonathan
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-23 0:24 ` Jonathan Nieder
@ 2013-11-23 0:30 ` Jeff King
2013-11-23 0:47 ` Jonathan Nieder
0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2013-11-23 0:30 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: Joey Hess, git
On Fri, Nov 22, 2013 at 04:24:05PM -0800, Jonathan Nieder wrote:
> > t/t1013-loose-object-format.sh | 66 ------------------
>
> Hmm, not all of these tests are about the "experimental" format. Do
> we really want to remove them all?
I think so. They were not all testing the experimental format, but they
were about making sure the is-it-experimental heuristic triggered
properly with various zlib settings.
Now that we do not apply that heuristic, there is nothing (in git) to
test. We feed the contents straight to zlib. We could keep the objects
with small window size as a test, but we are not really testing git; we
are testing zlib at that point.
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-23 0:30 ` Jeff King
@ 2013-11-23 0:47 ` Jonathan Nieder
0 siblings, 0 replies; 28+ messages in thread
From: Jonathan Nieder @ 2013-11-23 0:47 UTC (permalink / raw)
To: Jeff King; +Cc: Joey Hess, git
Jeff King wrote:
> On Fri, Nov 22, 2013 at 04:24:05PM -0800, Jonathan Nieder wrote:
>>> t/t1013-loose-object-format.sh | 66 ------------------
>>
>> Hmm, not all of these tests are about the "experimental" format. Do
>> we really want to remove them all?
>
> I think so. They were not all testing the experimental format, but they
> were about making sure the is-it-experimental heuristic triggered
> properly with various zlib settings.
>
> Now that we do not apply that heuristic, there is nothing (in git) to
> test. We feed the contents straight to zlib.
Ok, makes sense.
In principle the tests are still useful as futureproofing in case git
starts to sanity-check the objects as a way to notice corruption
earlier or something. But in practice, that kind of futureproofing is
probably not worth the extra tests to maintain.
For what it's worth,
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-22 17:28 ` Joey Hess
@ 2013-11-24 8:44 ` Jeff King
2013-11-24 9:07 ` Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2013-11-24 8:44 UTC (permalink / raw)
To: Joey Hess; +Cc: git
On Fri, Nov 22, 2013 at 01:28:59PM -0400, Joey Hess wrote:
> > Hrm. For --batch, I'd think we would open the whole object and notice
> > the corruption, even with the current code. But for --batch-check, we
> > use sha1_object_info, and for an "experimental" object, we do not need
> > to de-zlib the object at all. So we end up reporting whatever crap we
> > decipher from the garbage bytes. My patch would fix that, as we would
> > not incorrectly guess an object is experimental anymore.
> >
> > If you have specific cases that trigger even after my patch, I'd be
> > interested to see them.
>
> I was seeing it with --batch, not --batch-check. Probably only with the
> old experimental loose object format. In one case, --batch reported a
> size of 20k, and only output 1k of data. With the object file I sent
> earlier, --batch reports a huge size, and fails trying to allocate the
> memory for it before it can output anything.
Ah, yeah, that makes sense. We report the size via sha1_object_info,
whether we are going to output the object itself or not. So we might
report the bogus size, not noticing the corruption, and then hit an
error and bail when sending the object itself.
My patch makes that better in some cases, because we'll notice more
corruption when looking at the header of the object for
sha1_object_info. But fundamentally, we may still hit an error while
outputting the bytes. Reading the cat-file code, it looks like we should
always die if we hit an error, so at least a reader will get premature
EOF (and not the beginning of another object).
I can believe there is some specific corruption that yields a valid zlib
stream that is a different size than the object advertises. Since
v1.8.4, we double-check that the size we advertised matches what we are
about to write. But the streaming-blob code path does not include that
check, so it might still be affected. It would be pretty easy and cheap
to detect that case.
In any code path where we call parse_object, we double-check that the
result matches the sha1 we asked for. But low-level commands like
cat-file just call read_sha1_file directly, and do not have such a
check. We could add it, but I suspect the processing cost would be
noticeable.
> I also have seen at least once a corrupt pack file that caused git to try
> and allocate a absurd quantity of memory.
I'm not surprised by that. The packfiles contain size information
outside of the checksummed zlib data, and we pre-allocate the buffer
before reading the zlib data. We could try to detect it, but then we are
hard-coding the definition of "absurd". The current definition is "we
asked the OS for memory, and it did not give it to us". :)
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-24 8:44 ` Jeff King
@ 2013-11-24 9:07 ` Jeff King
2013-11-25 18:35 ` Junio C Hamano
0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2013-11-24 9:07 UTC (permalink / raw)
To: Joey Hess; +Cc: git
On Sun, Nov 24, 2013 at 03:44:44AM -0500, Jeff King wrote:
> In any code path where we call parse_object, we double-check that the
> result matches the sha1 we asked for. But low-level commands like
> cat-file just call read_sha1_file directly, and do not have such a
> check. We could add it, but I suspect the processing cost would be
> noticeable.
Curious, I tested this. It is noticeable. Here's the best-of-five
timings for the patch below when running a --batch cat-file on every
object in my git.git repo, using the patch below:
[before]
real 0m12.941s
user 0m12.700s
sys 0m0.244s
[after]
real 0m15.800s
user 0m15.472s
sys 0m0.344s
So it's about 20% slower. I don't know what the right tradeoff is. It's
cool to check the data each time we look at it, but it does carry a
performance penalty. I notice elsewhere in git we are inconsistent. If
you call parse_object() on an object, you get the sha1 check. But if you
just call parse_commit() on something you know to be a commit (e.g.,
because you are traversing the history and looked it up as a parent
pointer), you do not. I don't know if that is oversight, or an
intentional performance decision.
-Peff
---
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b2ca775..2b09773 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -199,6 +199,8 @@ static void print_object_or_die(int fd, const unsigned char *sha1,
if (type == OBJ_BLOB) {
if (stream_blob_to_fd(fd, sha1, NULL, 0) < 0)
die("unable to stream %s to stdout", sha1_to_hex(sha1));
+ if (check_sha1_signature(sha1, NULL, 0, NULL) < 0)
+ die("object %s sha1 mismatch", sha1_to_hex(sha1));
}
else {
enum object_type rtype;
@@ -208,6 +210,8 @@ static void print_object_or_die(int fd, const unsigned char *sha1,
contents = read_sha1_file(sha1, &rtype, &rsize);
if (!contents)
die("object %s disappeared", sha1_to_hex(sha1));
+ if (check_sha1_signature(sha1, contents, rsize, typename(rtype)) < 0)
+ die("object %s sha1 mismatch", sha1_to_hex(sha1));
if (rtype != type)
die("object %s changed type!?", sha1_to_hex(sha1));
if (rsize != size)
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-24 9:07 ` Jeff King
@ 2013-11-25 18:35 ` Junio C Hamano
2013-11-27 9:30 ` Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2013-11-25 18:35 UTC (permalink / raw)
To: Jeff King; +Cc: Joey Hess, git
Jeff King <peff@peff.net> writes:
> On Sun, Nov 24, 2013 at 03:44:44AM -0500, Jeff King wrote:
>
>> In any code path where we call parse_object, we double-check that the
>> result matches the sha1 we asked for. But low-level commands like
>> cat-file just call read_sha1_file directly, and do not have such a
>> check. We could add it, but I suspect the processing cost would be
>> noticeable.
>
> Curious, I tested this. It is noticeable. Here's the best-of-five
> timings for the patch below when running a --batch cat-file on every
> object in my git.git repo, using the patch below:
>
> [before]
> real 0m12.941s
> user 0m12.700s
> sys 0m0.244s
>
> [after]
> real 0m15.800s
> user 0m15.472s
> sys 0m0.344s
>
> So it's about 20% slower. I don't know what the right tradeoff is. It's
> cool to check the data each time we look at it, but it does carry a
> performance penalty.
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index b2ca775..2b09773 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -199,6 +199,8 @@ static void print_object_or_die(int fd, const unsigned char *sha1,
> if (type == OBJ_BLOB) {
> if (stream_blob_to_fd(fd, sha1, NULL, 0) < 0)
> die("unable to stream %s to stdout", sha1_to_hex(sha1));
> + if (check_sha1_signature(sha1, NULL, 0, NULL) < 0)
> + die("object %s sha1 mismatch", sha1_to_hex(sha1));
check_sha1_signature() opens the object again and streams the data.
Essentially the read side is doing twice the work with that patch,
isn't it?
I wonder if we want to extend the stream_blob_to_fd() API to
optionally allow the caller to ask to validate that the returned
data is consistent with the object name the caller asked the data
for. Something along the lines of the attached weatherbaloon patch?
builtin/fsck.c | 3 ++-
entry.c | 2 +-
streaming.c | 29 ++++++++++++++++++++++++++++-
streaming.h | 4 +++-
4 files changed, 34 insertions(+), 4 deletions(-)
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 97ce678..f42ed9c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -237,7 +237,8 @@ static void check_unreachable_object(struct object *obj)
if (!(f = fopen(filename, "w")))
die_errno("Could not open '%s'", filename);
if (obj->type == OBJ_BLOB) {
- if (stream_blob_to_fd(fileno(f), obj->sha1, NULL, 1))
+ if (stream_blob_to_fd(fileno(f), obj->sha1, NULL,
+ STREAMING_OUTPUT_CAN_SEEK))
die_errno("Could not write '%s'", filename);
} else
fprintf(f, "%s\n", sha1_to_hex(obj->sha1));
diff --git a/entry.c b/entry.c
index 7b7aa81..b3bc827 100644
--- a/entry.c
+++ b/entry.c
@@ -127,7 +127,7 @@ static int streaming_write_entry(const struct cache_entry *ce, char *path,
if (fd < 0)
return -1;
- result |= stream_blob_to_fd(fd, ce->sha1, filter, 1);
+ result |= stream_blob_to_fd(fd, ce->sha1, filter, STREAMING_OUTPUT_CAN_SEEK);
*fstat_done = fstat_output(fd, state, statbuf);
result |= close(fd);
diff --git a/streaming.c b/streaming.c
index debe904..50599df 100644
--- a/streaming.c
+++ b/streaming.c
@@ -2,6 +2,7 @@
* Copyright (c) 2011, Google Inc.
*/
#include "cache.h"
+#include "object.h"
#include "streaming.h"
enum input_source {
@@ -496,19 +497,33 @@ static open_method_decl(incore)
****************************************************************/
int stream_blob_to_fd(int fd, unsigned const char *sha1, struct stream_filter *filter,
- int can_seek)
+ unsigned flags)
{
struct git_istream *st;
enum object_type type;
unsigned long sz;
ssize_t kept = 0;
int result = -1;
+ int can_seek = flags & STREAMING_OUTPUT_CAN_SEEK;
+
+ int want_verify = flags & STREAMING_VERIFY_OBJECT_NAME;
+ git_SHA_CTX ctx;
st = open_istream(sha1, &type, &sz, filter);
if (!st)
return result;
if (type != OBJ_BLOB)
goto close_and_exit;
+
+ if (want_verify) {
+ char hdr[32];
+ int hdrlen;
+
+ git_SHA1_Init(&ctx);
+ hdrlen = sprintf(hdr, "%s %lu", typename(type), sz) + 1;
+ git_SHA1_Update(&ctx, hdr, hdrlen);
+ }
+
for (;;) {
char buf[1024 * 16];
ssize_t wrote, holeto;
@@ -518,6 +533,10 @@ int stream_blob_to_fd(int fd, unsigned const char *sha1, struct stream_filter *f
goto close_and_exit;
if (!readlen)
break;
+
+ if (want_verify)
+ git_SHA1_Update(&ctx, buf, readlen);
+
if (can_seek && sizeof(buf) == readlen) {
for (holeto = 0; holeto < readlen; holeto++)
if (buf[holeto])
@@ -542,6 +561,14 @@ int stream_blob_to_fd(int fd, unsigned const char *sha1, struct stream_filter *f
goto close_and_exit;
result = 0;
+ if (want_verify) {
+ unsigned char verify[20];
+
+ git_SHA1_Final(verify, &ctx);
+ if (hashcmp(verify, lookup_replace_object(sha1)))
+ result = -1;
+ }
+
close_and_exit:
close_istream(st);
return result;
diff --git a/streaming.h b/streaming.h
index 1d05c2a..68fe3a4 100644
--- a/streaming.h
+++ b/streaming.h
@@ -12,6 +12,8 @@ extern struct git_istream *open_istream(const unsigned char *, enum object_type
extern int close_istream(struct git_istream *);
extern ssize_t read_istream(struct git_istream *, void *, size_t);
-extern int stream_blob_to_fd(int fd, const unsigned char *, struct stream_filter *, int can_seek);
+#define STREAMING_OUTPUT_CAN_SEEK 01
+#define STREAMING_VERIFY_OBJECT_NAME 02
+extern int stream_blob_to_fd(int fd, const unsigned char *, struct stream_filter *, unsigned flags);
#endif /* STREAMING_H */
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-25 18:35 ` Junio C Hamano
@ 2013-11-27 9:30 ` Jeff King
2013-11-27 18:57 ` Junio C Hamano
0 siblings, 1 reply; 28+ messages in thread
From: Jeff King @ 2013-11-27 9:30 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Joey Hess, git
On Mon, Nov 25, 2013 at 10:35:19AM -0800, Junio C Hamano wrote:
> > if (type == OBJ_BLOB) {
> > if (stream_blob_to_fd(fd, sha1, NULL, 0) < 0)
> > die("unable to stream %s to stdout", sha1_to_hex(sha1));
> > + if (check_sha1_signature(sha1, NULL, 0, NULL) < 0)
> > + die("object %s sha1 mismatch", sha1_to_hex(sha1));
>
> check_sha1_signature() opens the object again and streams the data.
> Essentially the read side is doing twice the work with that patch,
> isn't it?
Yes. I considered that, but I also got ~20% slow-down when just doing
commits/trees, which are in-core and can re-hash the same buffer. So
since my with-blobs numbers backed that up, I didn't think too much
further on it.
But there is something curious about the numbers I posted. It takes 12s
without the check, and 15s with the check. So the extra hashing adds 3s.
But if we are reading each blob twice, and we would expect blob reading
to be a significant chunk of that 12s, then shouldn't we expect much
more than 3s increase?
The answer must be that either we are not streaming as much as I think,
or re-reading the data is much cheaper than I expect. And I think it is
the latter.
The vast majority of blobs in git.git will be stored as packed deltas.
That means the streaming code will fall back to doing the regular
in-core access. We _could_ therefore use that in-core copy to do our
sha1 check rather than streaming; but of course we never get access to
it outside of stream_blob_to_fd, and it is discarded. However, we do
keep a copy in the delta base cache. When we immediately ask to unpack
the exact same entry for check_sha1_signature, we can pull the copy
straight out of the cache without having to re-inflate the object.
After applying the patch below on top of yours, my numbers remain the
same:
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index b2ca775..e3ff677 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -197,7 +197,7 @@ static void print_object_or_die(int fd, const unsigned char *sha1,
enum object_type type, unsigned long size)
{
if (type == OBJ_BLOB) {
- if (stream_blob_to_fd(fd, sha1, NULL, 0) < 0)
+ if (stream_blob_to_fd(fd, sha1, NULL, STREAMING_VERIFY_OBJECT_NAME) < 0)
die("unable to stream %s to stdout", sha1_to_hex(sha1));
}
else {
@@ -208,6 +208,8 @@ static void print_object_or_die(int fd, const unsigned char *sha1,
contents = read_sha1_file(sha1, &rtype, &rsize);
if (!contents)
die("object %s disappeared", sha1_to_hex(sha1));
+ if (check_sha1_signature(sha1, contents, rsize, typename(rtype)) < 0)
+ die("object %s sha1 mismatch", sha1_to_hex(sha1));
if (rtype != type)
die("object %s changed type!?", sha1_to_hex(sha1));
if (rsize != size)
> I wonder if we want to extend the stream_blob_to_fd() API to
> optionally allow the caller to ask to validate that the returned
> data is consistent with the object name the caller asked the data
> for. Something along the lines of the attached weatherbaloon patch?
Yes, I think it is a reasonable addition to the streaming API. However,
I do not think there are any callsites which would currently want it.
All of the current users of stream_blob_to_fd use read_sha1_file as
their alternative, and not parse_object. So we are not verifying the
sha1 in either case (we may want to change that, of course, but that is
a bigger decision than just trying to bring streaming and non-streaming
code-paths into parity).
I also wondered if parse_object itself had problems with double-reading
or failing to verify. But its use goes the opposite direction; it wants
to verify the sha1 of the blob object, but it knows that it does not
actually need the data. So it streams (as of 090ea12) to check the
signature, but then discards each buffer-full after hashing it.
-Peff
^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-27 9:30 ` Jeff King
@ 2013-11-27 18:57 ` Junio C Hamano
2013-11-27 19:03 ` Jeff King
0 siblings, 1 reply; 28+ messages in thread
From: Junio C Hamano @ 2013-11-27 18:57 UTC (permalink / raw)
To: Jeff King; +Cc: Joey Hess, git
Jeff King <peff@peff.net> writes:
> The vast majority of blobs in git.git will be stored as packed deltas.
> That means the streaming code will fall back to doing the regular
> in-core access. We _could_ therefore use that in-core copy to do our
> sha1 check rather than streaming; but of course we never get access to
> it outside of stream_blob_to_fd, and it is discarded. However, we do
> keep a copy in the delta base cache. When we immediately ask to unpack
> the exact same entry for check_sha1_signature, we can pull the copy
> straight out of the cache without having to re-inflate the object.
OK, that explains the overhead of 20% that is lower than one would
naïvely expect. Thanks.
> Yes, I think it is a reasonable addition to the streaming API. However,
> I do not think there are any callsites which would currently want it.
> All of the current users of stream_blob_to_fd use read_sha1_file as
> their alternative, and not parse_object. So we are not verifying the
> sha1 in either case (we may want to change that, of course, but that is
> a bigger decision than just trying to bring streaming and non-streaming
> code-paths into parity).
True. I am not offhand sure if we want to make read_sha1_file() to
rehash, but I agree that it is a question different from what we are
asking in this discussion.
> I also wondered if parse_object itself had problems with double-reading
> or failing to verify. But its use goes the opposite direction; it wants
> to verify the sha1 of the blob object, but it knows that it does not
> actually need the data. So it streams (as of 090ea12) to check the
> signature, but then discards each buffer-full after hashing it.
>
> -Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] drop support for "experimental" loose objects
2013-11-27 18:57 ` Junio C Hamano
@ 2013-11-27 19:03 ` Jeff King
0 siblings, 0 replies; 28+ messages in thread
From: Jeff King @ 2013-11-27 19:03 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Joey Hess, git
On Wed, Nov 27, 2013 at 10:57:14AM -0800, Junio C Hamano wrote:
> > Yes, I think it is a reasonable addition to the streaming API. However,
> > I do not think there are any callsites which would currently want it.
> > All of the current users of stream_blob_to_fd use read_sha1_file as
> > their alternative, and not parse_object. So we are not verifying the
> > sha1 in either case (we may want to change that, of course, but that is
> > a bigger decision than just trying to bring streaming and non-streaming
> > code-paths into parity).
>
> True. I am not offhand sure if we want to make read_sha1_file() to
> rehash, but I agree that it is a question different from what we are
> asking in this discussion.
I'm torn on that. Having git verify everything all the time is kind of
cool. But it _does_ have a performance impact, and the vast majority of
the time nothing got corrupted since the last time we looked at the
object. It seems like periodically running "git fsck" is a smarter way
of doing periodic checks.
We already are careful when objects are coming into the repository, and
I think that is a sensible boundary (and I am increasingly of the
opinion that running with transfer.fsckobjects off is not a good idea).
The checks in parse_object seem hack-ish to me, because they catch some
random subset of the times we access objects (e.g., calling parse_object
on a commit sha1 will check, but calling parse_commit on an unparsed
commit struct will not). If anything, I'd suggest moving the checking
down to read_sha1_file, which would add it fairly consistently
everywhere, and then tying it to a config option (off for high
performance, on for slower-but-meticulous).
-Peff
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2013-11-27 19:03 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-20 20:33 corrupt object memory allocation error Joey Hess
2013-11-20 21:33 ` Jeff King
2013-11-20 22:28 ` Joey Hess
2013-11-21 11:41 ` [PATCH] drop support for "experimental" loose objects Jeff King
2013-11-21 11:48 ` Jeff King
2013-11-21 12:43 ` Duy Nguyen
2013-11-21 14:42 ` Keshav Kini
2013-11-21 22:41 ` Jeff King
2013-11-21 19:44 ` Junio C Hamano
2013-11-23 0:24 ` Jonathan Nieder
2013-11-23 0:30 ` Jeff King
2013-11-23 0:47 ` Jonathan Nieder
2013-11-21 16:04 ` Joey Hess
2013-11-21 20:19 ` Christian Couder
2013-11-22 9:58 ` Jeff King
2013-11-22 11:04 ` Christian Couder
2013-11-22 11:24 ` Jeff King
2013-11-22 14:23 ` Christian Couder
2013-11-22 16:15 ` Jeff King
2013-11-22 17:23 ` Junio C Hamano
2013-11-22 2:09 ` Jeff King
2013-11-22 17:28 ` Joey Hess
2013-11-24 8:44 ` Jeff King
2013-11-24 9:07 ` Jeff King
2013-11-25 18:35 ` Junio C Hamano
2013-11-27 9:30 ` Jeff King
2013-11-27 18:57 ` Junio C Hamano
2013-11-27 19:03 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).