* diff-core segfault
@ 2005-12-12 16:29 Darrin Thompson
2005-12-12 16:56 ` Johannes Schindelin
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Darrin Thompson @ 2005-12-12 16:29 UTC (permalink / raw)
To: git
$ mkdir a
$ cd a/
$ git-init-db
defaulting to local storage area
$ touch a
$ git-update-index --add a
$ git-commit -m 'message'
Committing initial tree 496d6428b9cf92981dc9495211e6e1120fb6f2ba
$ echo hello >a
$ git-diff-files
:100644 100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
0000000000000000000000000000000000000000 M a
$ git-diff-files -B
Segmentation fault
Disclaimer: I'm running 1.0rc2. I did some searching of recent mailing
list posts and commits. I see no evidence that this has been addressed.
Could someone confirm that this exists on more recent git heads and fix
if needed?
Thanks.
--
Darrin
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: diff-core segfault 2005-12-12 16:29 diff-core segfault Darrin Thompson @ 2005-12-12 16:56 ` Johannes Schindelin 2005-12-12 18:50 ` Junio C Hamano 2005-12-12 20:40 ` [PATCH 2/2] diff-delta.c: allow delta with empty blob Junio C Hamano 2 siblings, 0 replies; 15+ messages in thread From: Johannes Schindelin @ 2005-12-12 16:56 UTC (permalink / raw) To: Darrin Thompson; +Cc: git Hi, On Mon, 12 Dec 2005, Darrin Thompson wrote: > $ git-diff-files -B > Segmentation fault This looks exactly like the problem on cygwin which is fixed by using NO_MMAP=YesPlease. How about enabling NO_MMAP=YesPlease on cygwin per default? I think there are enough cases where it helps. If it is too slow *and* the user knows what she's doing, she can recompile NO_MMAP=NoNoNo. Opinions, please? Ciao, Dscho ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: diff-core segfault 2005-12-12 16:29 diff-core segfault Darrin Thompson 2005-12-12 16:56 ` Johannes Schindelin @ 2005-12-12 18:50 ` Junio C Hamano 2005-12-12 18:59 ` Delitifier broken (Re: diff-core segfault) Junio C Hamano 2005-12-12 20:40 ` [PATCH 2/2] diff-delta.c: allow delta with empty blob Junio C Hamano 2 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2005-12-12 18:50 UTC (permalink / raw) To: Darrin Thompson; +Cc: git Darrin Thompson <darrint@progeny.com> writes: > Could someone confirm that this exists on more recent git heads and fix > if needed? (1) Yup. I can reproduce it. (2) Will look into it when able. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Delitifier broken (Re: diff-core segfault) 2005-12-12 18:50 ` Junio C Hamano @ 2005-12-12 18:59 ` Junio C Hamano 2005-12-12 21:28 ` Nicolas Pitre 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2005-12-12 18:59 UTC (permalink / raw) To: Darrin Thompson; +Cc: git, Linus Torvalds Junio C Hamano <junkio@cox.net> writes: > Darrin Thompson <darrint@progeny.com> writes: > >> Could someone confirm that this exists on more recent git heads and fix >> if needed? > > (1) Yup. I can reproduce it. > (2) Will look into it when able. This is not just "diff". Our deltify code is half-broken, and in the worst case this can corrupt our packs if an empty blob is involved. The problem is if from_size or to_size is empty, it does not produce any. if (!from_size || !to_size || delta_prepare(from_buf, from_size, &bdf)) return NULL; I think either we need to make the users more careful or fix deltifier to produce trivial delta. I'd vote for the latter; let me rig up something. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Delitifier broken (Re: diff-core segfault) 2005-12-12 18:59 ` Delitifier broken (Re: diff-core segfault) Junio C Hamano @ 2005-12-12 21:28 ` Nicolas Pitre 2005-12-12 21:54 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Nicolas Pitre @ 2005-12-12 21:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Darrin Thompson, git, Linus Torvalds On Mon, 12 Dec 2005, Junio C Hamano wrote: > Junio C Hamano <junkio@cox.net> writes: > > > Darrin Thompson <darrint@progeny.com> writes: > > > >> Could someone confirm that this exists on more recent git heads and fix > >> if needed? > > > > (1) Yup. I can reproduce it. > > (2) Will look into it when able. > > This is not just "diff". Our deltify code is half-broken, and > in the worst case this can corrupt our packs if an empty blob is > involved. I would say involving an empty blob with deltas _is_ the bug in the first place. Please don't let that happen. Especially with pack files, an empty blob can be represented with a _single_ byte. A delta must always be against something else and simply storing the reference for the object the delta is against will always use at least 20 bytes even for empty ones. > The problem is if from_size or to_size is empty, it does not > produce any. > > if (!from_size || !to_size || delta_prepare(from_buf, from_size, &bdf)) > return NULL; > > > I think either we need to make the users more careful or fix > deltifier to produce trivial delta. I'd vote for the latter; > let me rig up something. If my opinion is still of any weight I'd strongly vote for the former. A delta against an empty object, or a delta that produces an empty object simply makes no sense since it is always suboptimal compared to storing the non deltified object (or finding another object to deltify against). Allowing empty deltas only paper over another more fundamental bug IMHO. Nicolas ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Delitifier broken (Re: diff-core segfault) 2005-12-12 21:28 ` Nicolas Pitre @ 2005-12-12 21:54 ` Junio C Hamano 2005-12-12 23:31 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2005-12-12 21:54 UTC (permalink / raw) To: Nicolas Pitre; +Cc: git Nicolas Pitre <nico@cam.org> writes: >> This is not just "diff". Our deltify code is half-broken, and >> in the worst case this can corrupt our packs if an empty blob is >> involved. > > I would say involving an empty blob with deltas _is_ the bug in the > first place. Please don't let that happen. Not all use of delta is to produce a pack. An empty->empty delta is a valid two byte \0\0 sequence, and I do not see any reason to forbid it. Although using such delta to represent anything in a pack does *not* make any sense as you say, it makes other callers simpler if they do not have to check if from_len and to_len are empty before calling the delta code. They care about from_len=0 (or to_len=0) case to produce similar results as from_len=1 (or to_len=1) case and do not care at all about the produced delta being a useful one for compressed storage purposes. > Especially with pack files, an empty blob can be represented with a > _single_ byte. A delta must always be against something else and simply > storing the reference for the object the delta is against will always > use at least 20 bytes even for empty ones. True, and the pack code is actually safe. It punts on NULL return, so my initial worry about packs turns out to be unneeded. > If my opinion is still of any weight I'd strongly vote for the former. I ended up doing both ;-). The call site of diffcore-break was certainly careless and broken (fixed); I've run git-grep to check all callers to diff_delta() and the only one that did not check the return value with NULL was the one that started with thread. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Delitifier broken (Re: diff-core segfault) 2005-12-12 21:54 ` Junio C Hamano @ 2005-12-12 23:31 ` Linus Torvalds 2005-12-13 1:08 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2005-12-12 23:31 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nicolas Pitre, git On Mon, 12 Dec 2005, Junio C Hamano wrote: > Nicolas Pitre <nico@cam.org> writes: > > > > I would say involving an empty blob with deltas _is_ the bug in the > > first place. Please don't let that happen. I agree with Nicolas. > Not all use of delta is to produce a pack. An empty->empty > delta is a valid two byte \0\0 sequence, and I do not see any > reason to forbid it. Although using such delta to represent > anything in a pack does *not* make any sense as you say, it > makes other callers simpler if they do not have to check if > from_len and to_len are empty before calling the delta code. And you don't need to. Do what pack-objects.c does: just call "diff_delta()" and check the result for NULL. If the result is NULL, then you have to do some special code, because that means that it's a full create or a full delete (or it's an unchanged empty file). Regardless, it really _is_ a special case, and it would be silly to generate a delta for it. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Delitifier broken (Re: diff-core segfault) 2005-12-12 23:31 ` Linus Torvalds @ 2005-12-13 1:08 ` Junio C Hamano 2005-12-13 1:34 ` Linus Torvalds 0 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2005-12-13 1:08 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > Do what pack-objects.c does: just call "diff_delta()" and check the result > for NULL. If the result is NULL, then you have to do some special code, > because that means that it's a full create or a full delete (or it's an > unchanged empty file). Regardless, it really _is_ a special case, and it > would be silly to generate a delta for it. When the result is NULL, it could be delta against empty, or other failure in diff_delta() (could be it exceeded max_size, could be it could not allocate memory, could be we introduced some other failure modes later...). I'll revert the changes anyway, but not because I necessarily agree with you two. I am not 100% confident that the core of the diff_delta code would work fine with empty input (it seems to from my limited test), and I do not want to break things unnecessarily at this point. More importantly, for the updated delta code that allows empty input to work, the codepaths the various existing callers that check with NULL must not be assuming non-NULL return means non empty input -- otherwise my change would subtly break things -- and I do not have enough energy to verify that right now. Since we do not break files smaller than MINIMUM_BREAK_SIZE, this becomes a non-issue with the attached patch. I do not know why I did not check both sides when I did it the first time; I do not know why I was too stupid to notice that the earlier test in the if() was far more expensive than the later one, either ;-). -- >8 -- diff --git a/diffcore-break.c b/diffcore-break.c index e6a468e..9b27456 100644 --- a/diffcore-break.c +++ b/diffcore-break.c @@ -55,12 +55,6 @@ static int should_break(struct diff_file * is the default. */ - if (!S_ISREG(src->mode) || !S_ISREG(dst->mode)) - return 0; /* leave symlink rename alone */ - - if (diff_populate_filespec(src, 0) || diff_populate_filespec(dst, 0)) - return 0; /* error but caught downstream */ - base_size = ((src->size < dst->size) ? src->size : dst->size); delta = diff_delta(src->data, src->size, @@ -169,9 +163,15 @@ void diffcore_break(int break_score) if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two) && !S_ISDIR(p->one->mode) && !S_ISDIR(p->two->mode) && !strcmp(p->one->path, p->two->path)) { - if (should_break(p->one, p->two, - break_score, &score) && - MINIMUM_BREAK_SIZE <= p->one->size) { + + if (S_ISREG(p->one->mode) && + S_ISREG(p->two->mode) && + !diff_populate_filespec(p->one, 0) && + MINIMUM_BREAK_SIZE <= p->one->size && + !diff_populate_filespec(p->two, 0) && + MINIMUM_BREAK_SIZE <= p->two->size && + should_break(p->one, p->two, + break_score, &score)) { /* Split this into delete and create */ struct diff_filespec *null_one, *null_two; struct diff_filepair *dp; ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: Delitifier broken (Re: diff-core segfault) 2005-12-13 1:08 ` Junio C Hamano @ 2005-12-13 1:34 ` Linus Torvalds 2005-12-13 1:41 ` Junio C Hamano 2005-12-13 2:05 ` Linus Torvalds 0 siblings, 2 replies; 15+ messages in thread From: Linus Torvalds @ 2005-12-13 1:34 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Mon, 12 Dec 2005, Junio C Hamano wrote: > > I'll revert the changes anyway, but not because I necessarily > agree with you two. I am not 100% confident that the core of > the diff_delta code would work fine with empty input (it seems > to from my limited test), and I do not want to break things > unnecessarily at this point. Well, I checked the pack-objects.c side, and your patch to diff_delta() should not hurt at least there. We already check the size and would have broken out long before if either side was zero-sized. But that's kind of part of the point - any user of diff_delta() is likely to have checked the size anyway for other reasons. There's just very seldom any valid reason to generate a delta against an empty file, there's no interesting information that diff_delta() can really give us. Basically, the binary diffs that diff-delta returns are interesting for just two things: - efficient packing, in the pack-objects.c style. As mentioned, pack-objects.c needs to check the size heuristics before doing diff_delta() _anyway_, for performance reasons as well as simply because the secondary use of diff_delta() is to estimate how big the delta is, and it's always pointless to generate a delta that is guaranteed to be bigger than the file (which is always the case with either side being an empty file - the size difference will inevitably be bigger than the size of the resulting file). - difference size estimation (ie for rename/copy detection) This boils down to the same case as the secondary use of pack-objects, ie delta size estimation. Again, if either side is empty, we _know_ that the delta generation is pointless, because the delta is always going to be bigger than the end result, and thus it can't be sensible for rename/copy detection. So in one sense I actually agree with your patch: it makes the deltifier code more generic and actually simplifies the diff_delta() code a bit by avoiding one special case, and in that sense it's a good change. So the reason I disagree with it is that doing the delta is always going to be unnecessary work. And regardless of how we're ever going to use the delta, we _know_ that it's unnecessary work. So I think your diffcore-break.c patch is much more appropriate: it also fixes the bug, but it fixes it by virtue of realizing that the delta cannot matter and thus should never even be computed. Now, your diff_setup() change may actually be worth it because of the simplification, but on the other hand, you can also consider the NULL return as being nice because it's effectively a way of saying "the delta is meaningless, why did you even ask me?" Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Delitifier broken (Re: diff-core segfault) 2005-12-13 1:34 ` Linus Torvalds @ 2005-12-13 1:41 ` Junio C Hamano 2005-12-13 2:05 ` Linus Torvalds 1 sibling, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2005-12-13 1:41 UTC (permalink / raw) To: Linus Torvalds; +Cc: git Linus Torvalds <torvalds@osdl.org> writes: > So I think your diffcore-break.c patch is much more appropriate: it also > fixes the bug, but it fixes it by virtue of realizing that the delta > cannot matter and thus should never even be computed. Agreed, redone and pushed out. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Delitifier broken (Re: diff-core segfault) 2005-12-13 1:34 ` Linus Torvalds 2005-12-13 1:41 ` Junio C Hamano @ 2005-12-13 2:05 ` Linus Torvalds 2005-12-13 2:45 ` Nicolas Pitre 1 sibling, 1 reply; 15+ messages in thread From: Linus Torvalds @ 2005-12-13 2:05 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Mon, 12 Dec 2005, Linus Torvalds wrote: > > As mentioned, pack-objects.c needs to check the size heuristics before > doing diff_delta() _anyway_, for performance reasons as well as simply > because the secondary use of diff_delta() is to estimate how big the > delta is, and it's always pointless to generate a delta that is > guaranteed to be bigger than the file (which is always the case with > either side being an empty file - the size difference will inevitably > be bigger than the size of the resulting file). Side note: this isn't technically entirely true. A binary diff that has a source file that is empty could in theory be smaller than the destination file simply because it may involve a certain amount of automatic compression in the form of "insert 100 spaces" kind of diff encoding. I'm not sure whether xdelta actually does something like that, but it's certainly possible at least in theory. Of course, even if the delta in such a case may be smaller than the resulting file, such a delta is still not interesting: even from a packing angle, if the resulting file has patterns that makes it easy to generate a small delta against an empty file, the fact is, such a regular end result will _compress_ better than the delta will, assuming any decent compression mechanism. So from a packing standpoint, generating the delta is still the wrong thing to do - you're better off with just compressing the undeltified result. And from a similarity-estimation standpoint, going from an empty file to anything else is also obviously not interesting either. An empty file cannot be "similar" to anything else (except perhaps another empty file, and even that is a matter of taste). I just wanted to correct the technicality that delta's can certainly be smaller than the result at least if the delta format allows for that kind of encoding. Linus ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Delitifier broken (Re: diff-core segfault) 2005-12-13 2:05 ` Linus Torvalds @ 2005-12-13 2:45 ` Nicolas Pitre 2005-12-13 3:23 ` Junio C Hamano 0 siblings, 1 reply; 15+ messages in thread From: Nicolas Pitre @ 2005-12-13 2:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: Junio C Hamano, git On Mon, 12 Dec 2005, Linus Torvalds wrote: > > > On Mon, 12 Dec 2005, Linus Torvalds wrote: > > > > As mentioned, pack-objects.c needs to check the size heuristics before > > doing diff_delta() _anyway_, for performance reasons as well as simply > > because the secondary use of diff_delta() is to estimate how big the > > delta is, and it's always pointless to generate a delta that is > > guaranteed to be bigger than the file (which is always the case with > > either side being an empty file - the size difference will inevitably > > be bigger than the size of the resulting file). > > Side note: this isn't technically entirely true. A binary diff that has a > source file that is empty could in theory be smaller than the destination > file simply because it may involve a certain amount of automatic > compression in the form of "insert 100 spaces" kind of diff encoding. I'm > not sure whether xdelta actually does something like that, but it's > certainly possible at least in theory. xdelta doesn't. It only has two functions currently: 1) copy x bytes from offset y in source file to current position in destination file; 2) paste the x following bytes straight from the delta stream to current position into the destination file. Of course in the GIT context files are buffers. However I added the possibility for (1) to use the destination file as well as the "source" file for block copy in patch_delta(). However diff_delta() currently doesn't use that capability. But if it did then the "insert 100 spaces" would be: - paste \x20\x20\x20\x20 to dest (delta = 5 bytes, dest = 4 bytes) - copy 4 bytes from offset 0 of dest to dest (delta = 7 bytes, dest = 8 bytes) - copy 8 bytes from offset 0 of dest to dest (delta = 9 bytes, dest = 16 bytes) - copy 16 bytes from offset 0 of dest to dest (delta = 11 bytes, dest = 32 bytes) - copy 32 bytes from offset 0 of dest to dest (delta = 13 bytes, dest = 64 bytes) - copy 36 bytes from offset 0 of dest to dest (delta = 15 bytes, dest = 100 bytes) And yet that could be optimized further with a better size for the initial paste. However adding that capability to diff_delta() might make it significantly slower for still unknown gain for real life data. But I should write the code some day. Nicolas ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Delitifier broken (Re: diff-core segfault) 2005-12-13 2:45 ` Nicolas Pitre @ 2005-12-13 3:23 ` Junio C Hamano 0 siblings, 0 replies; 15+ messages in thread From: Junio C Hamano @ 2005-12-13 3:23 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Linus Torvalds, git Nicolas Pitre <nico@cam.org> writes: > However I added the possibility for (1) to use the destination file as > well as the "source" file for block copy in patch_delta(). Ah, I've been wondering where that one came from. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 2/2] diff-delta.c: allow delta with empty blob. 2005-12-12 16:29 diff-core segfault Darrin Thompson 2005-12-12 16:56 ` Johannes Schindelin 2005-12-12 18:50 ` Junio C Hamano @ 2005-12-12 20:40 ` Junio C Hamano 2005-12-12 21:12 ` Darrin Thompson 2 siblings, 1 reply; 15+ messages in thread From: Junio C Hamano @ 2005-12-12 20:40 UTC (permalink / raw) To: Darrin Thompson; +Cc: git Delta computation with an empty blob used to punt and returned NULL. This commit allows creation with empty blob; all combination of empty->empty, empty->something, and something->empty are allowed. Signed-off-by: Junio C Hamano <junkio@cox.net> --- Darrin Thompson <darrint@progeny.com> writes: > $ git-diff-files -B > Segmentation fault > Could someone confirm that this exists on more recent git heads and fix > if needed? Could you try this patch? It is marked as 2/2 but 1/2 is a test script to reproduce the problem with the current code, which this patch is supposed to fix, and this should be the only fix you need. delta.h | 4 ++-- diff-delta.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) 1e4d6f6618abb72e3948419d113a6f2f14c83ebc diff --git a/delta.h b/delta.h index 31d1820..c6a4763 100644 --- a/delta.h +++ b/delta.h @@ -9,8 +9,8 @@ extern void *patch_delta(void *src_buf, void *delta_buf, unsigned long delta_size, unsigned long *dst_size); -/* the smallest possible delta size is 4 bytes */ -#define DELTA_SIZE_MIN 4 +/* the smallest possible delta size is 2 bytes (empty to empty) */ +#define DELTA_SIZE_MIN 2 /* * This must be called twice on the delta data buffer, first to get the diff --git a/diff-delta.c b/diff-delta.c index b2ae7b5..cf50138 100644 --- a/diff-delta.c +++ b/diff-delta.c @@ -213,7 +213,7 @@ void *diff_delta(void *from_buf, unsigne bdrecord_t *brec; bdfile_t bdf; - if (!from_size || !to_size || delta_prepare(from_buf, from_size, &bdf)) + if (delta_prepare(from_buf, from_size, &bdf)) return NULL; outpos = 0; -- 0.99.9.GIT ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 2/2] diff-delta.c: allow delta with empty blob. 2005-12-12 20:40 ` [PATCH 2/2] diff-delta.c: allow delta with empty blob Junio C Hamano @ 2005-12-12 21:12 ` Darrin Thompson 0 siblings, 0 replies; 15+ messages in thread From: Darrin Thompson @ 2005-12-12 21:12 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On Mon, 2005-12-12 at 12:40 -0800, Junio C Hamano wrote: > Could you try this patch? It is marked as 2/2 but 1/2 is a > test script to reproduce the problem with the current code, > which this patch is supposed to fix, and this should be the > only fix you need. I'm almost out of time for today. Possibly tomorrow I can get to it. -- Darrin ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2005-12-13 3:23 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2005-12-12 16:29 diff-core segfault Darrin Thompson 2005-12-12 16:56 ` Johannes Schindelin 2005-12-12 18:50 ` Junio C Hamano 2005-12-12 18:59 ` Delitifier broken (Re: diff-core segfault) Junio C Hamano 2005-12-12 21:28 ` Nicolas Pitre 2005-12-12 21:54 ` Junio C Hamano 2005-12-12 23:31 ` Linus Torvalds 2005-12-13 1:08 ` Junio C Hamano 2005-12-13 1:34 ` Linus Torvalds 2005-12-13 1:41 ` Junio C Hamano 2005-12-13 2:05 ` Linus Torvalds 2005-12-13 2:45 ` Nicolas Pitre 2005-12-13 3:23 ` Junio C Hamano 2005-12-12 20:40 ` [PATCH 2/2] diff-delta.c: allow delta with empty blob Junio C Hamano 2005-12-12 21:12 ` Darrin Thompson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).