git.vger.kernel.org archive mirror
From: Dmitry Ivankov <divanorama@gmail.com>
To: git@vger.kernel.org
Cc: "Shawn O. Pearce" <spearce@spearce.org>,
	Jonathan Nieder <jrnieder@gmail.com>,
	David Barr <davidbarr@google.com>
Subject: Re: long fast-import errors out "failed to apply delta"
Date: Sun, 24 Jul 2011 02:34:20 +0600
Message-ID: <CA+gfSn8C-nB2hSSRTqSu1N1Z-b8ctRsVmUGAjLXTW0du_W3EQw@mail.gmail.com>
In-Reply-To: <CA+gfSn8jjptyv10iVimmfXpf6QHrR_3UpkRdd+Dv1M=KgORtGQ@mail.gmail.com>

+people who contributed to the related parts of fast-import.c

Hi!

I did more tests and investigation (one can safely skip the next 2
paragraphs).
There is a problem in fast-import, triggered on long svn-fe produced
imports (10k..80k commits depending on the svn repo history, in my
tests at least). fast-import sometimes writes out a tree object with
the wrong sha1. This goes unnoticed unless fast-import tries to read
such an object for a later revision (via the ls command) and "crashes",
or unless fsck is run to find sha1 mismatches.
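
For reference, a minimal Python sketch of what "sha1 mismatch" means here: git names an object by the SHA-1 of a "<type> <size>\0" header followed by the raw content, so a mismatch is content that no longer hashes to the name it is stored under. The helper names (git_object_id, tree_body, is_consistent) are made up for illustration and are not taken from fast-import.c or fsck.

```python
import hashlib

def git_object_id(obj_type, body):
    """SHA-1 of '<type> <size>\\0' + body, which is how git names an object."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_body(entries):
    """Serialize (mode, name, hex_sha1) entries into raw tree content.

    Assumes the entries are already in git's tree order. Directory
    entries use mode "40000" (no leading zero) inside a tree object.
    """
    out = b""
    for mode, name, hex_sha1 in entries:
        out += mode.encode("ascii") + b" " + name.encode("utf-8")
        out += b"\0" + bytes.fromhex(hex_sha1)
    return out

def is_consistent(claimed_sha1, obj_type, body):
    """True when the stored name matches the hash of the stored content."""
    return git_object_id(obj_type, body) == claimed_sha1

# Sanity check against two well-known ids: the empty tree and the empty blob.
empty_tree_id = git_object_id("tree", tree_body([]))
empty_blob_id = git_object_id("blob", b"")
```

In these terms, the broken object is a tree whose name was computed from one list of entries while a different list of entries was written as its content.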

This seems to be an old bug. Due to the content of my fast-import
streams, I need the cat-blob and ls commands (once the import stream is
sniffed these can be ignored) and, more importantly, 334fba656b50c9..
"Teach fast-import to import subtrees named by tree id" (30 Jun 2010)
for "M 040000 <tree id> pathname" commands. The earliest git version
I've tested so far is v1.6.0.6-7-g3d1d81e (Jan 2009) + dummy cat-blob
and ls + 334fba656b50c9..

So, here is how I currently reproduce it.
Import the gcc svn repository with svn-fe up to r15507: fine (1.9G
fast-import stream, 3min to import, 1min to fsck).
Import up to r15508: the import succeeds, but fsck finds a sha1 mismatch in a tree object.
Strip r15508 down to three commands (plus ls queries):

..commit header..
ls :14842 branches/gcc3/gcc/config
M 040000 fbc83f80e9516c831918dff149058cba38a2e5f1 tags/egcs_ss_970917/gcc/config
ls :15459 trunk/gcc/config/alpha
M 040000 9ffe84c346eec93b523d95ce642b54d54d23109c tags/egcs_ss_970917/gcc/config/alpha
ls "tags/egcs_ss_970917/gcc/config/alpha"
D tags/egcs_ss_970917/gcc/config/alpha/vms-tramp.asm

So here we set a directory to one old tree, then set its child to
another old tree, and then delete a file in the child. The broken tree
is the resulting tree for that child. The sha1 written to the pack matches
the intent (the second M + D) while the content is wrong (it matches the
first M/subdir + D): the second M command is partially lost.
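
To make the intended result concrete, here is a toy Python model of the three commands, with nested dicts standing in for trees. It only illustrates the expected semantics and does not reflect how fast-import.c stores trees. The helpers set_tree and delete are hypothetical, and the file names and contents in first_tree and second_tree are made-up placeholders (only vms-tramp.asm comes from the stream above).

```python
import copy

def set_tree(root, path, subtree):
    """Model of 'M 040000 <tree> <path>': put a copy of subtree at path."""
    parts = path.split("/")
    node = root
    for name in parts[:-1]:
        node = node.setdefault(name, {})
    node[parts[-1]] = copy.deepcopy(subtree)

def delete(root, path):
    """Model of 'D <path>': remove the entry at path."""
    parts = path.split("/")
    node = root
    for name in parts[:-1]:
        node = node[name]
    del node[parts[-1]]

# Placeholder stand-ins for the two old trees named by the M commands.
first_tree = {"alpha": {"alpha.c": "old-1", "vms-tramp.asm": "old-2"}}
second_tree = {"alpha.c": "new-1", "vms-tramp.asm": "new-2"}

root = {}
base = "tags/egcs_ss_970917/gcc/config"
set_tree(root, base, first_tree)                 # first M
set_tree(root, base + "/alpha", second_tree)     # second M replaces the child
delete(root, base + "/alpha/vms-tramp.asm")      # D inside the child

result = root["tags"]["egcs_ss_970917"]["gcc"]["config"]["alpha"]
expected = {"alpha.c": "new-1"}         # second M + D: what the sha1 describes
observed_in_bug = {"alpha.c": "old-1"}  # first M/subdir + D: what was written
```

In this model result equals expected. The broken pack instead holds the equivalent of observed_in_bug stored under the sha1 of expected.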

If a checkpoint command is added before the last commit, there is no
sha1 mismatch. I haven't yet found a small fast-import stream with
this bug, but it is definitely a logic bug somewhere in fast-import.
I randomly changed pool sizes (didn't touch packfile settings yet),
ran valgrind, slightly modified the stream many times, and tested a bunch
of different git versions on two machines: the bug is stable.

Any ideas? How is it possible at all that the tree's sha1 and content
diverge, and moreover that this goes unnoticed by fast-import and a broken
packfile is produced?

On Thu, Jul 7, 2011 at 4:47 PM, Dmitry Ivankov <divanorama@gmail.com> wrote:
> Hi,
>
> I'm getting a strange error from git-fast-import.
> Tested on v1.7.5 and v1.7.6 on two machines (gentoo amd64 8gb ram,
> 3-core amd cpu; gentoo x86 2gb ram, 1-core intel mobile cpu).
> The crash is stable - same message, instruction and deepest function
> (patch_delta) parameters contents.
>
> $ git fast-import --quiet < big_dump
> fatal: failed to apply delta
> fast-import: dumping crash report to fast_import_crash_7700
[skipped the details]
