From: Jeff King <peff@peff.net>
To: "Shawn O. Pearce" <spearce@spearce.org>
Cc: jidanni@jidanni.org, johannes.schindelin@gmx.de, nico@cam.org,
	gitster@pobox.com, mdl123@verizon.net, git@vger.kernel.org
Subject: Re: [PATCH] Documentation/git-bundle.txt: Dumping contents of any bundle
Date: Fri, 2 Jan 2009 03:27:09 -0500
Message-ID: <20090102082709.GA3498@coredump.intra.peff.net>
In-Reply-To: <20090102071519.GA14472@spearce.org>

On Thu, Jan 01, 2009 at 11:15:19PM -0800, Shawn O. Pearce wrote:

> > OK, I wish you luck in the fruition of the new --dump-delta option, and
> > can proofread the man pages involved, otherwise this is no area for
> > junior programmer me.
> 
> This is rather insane.  There's very little data inside of a delta.
> That's sort of the point of that level of compression, it takes
> up very little disk space and yet describes the change made.
> Almost nobody is going to want the delta without the base object
> it applies onto.  No user of git is going to need that.  I'd rather
> not carry dead code around in the tree for something nobody will
> ever use.

I somewhat agree. Obviously we can come up with contrived cases where
the delta is a pure "add" and this option magically lets you recover
some text via "strings" on the resulting delta dump. But in practice,
it's hard to say exactly how useful it would be, especially since the
"motivation" here seems to be more academic than driven by any actual
real-world problem. We can approximate one with something like:

  git clone git://git.kernel.org/pub/scm/git/git.git
  cd git
  git bundle create ../bundle.git v1.6.0..v1.6.1
  mkdir ../broken && cd ../broken
  sed '/^PACK/,$!d' ../bundle.git >pack
  git init
  git unpack-objects --dump-deltas <pack
  strings .git/lost-found/delta/* | less

where maybe you lost your actual repository, but you still have a backup
of a bundle you sneaker-netted between major versions. In this instance
we have 6000 objects in the bundle, 2681 of which are blobs (and
therefore presumably the most interesting things to recover). Of those,
1070 were non-delta and can be recovered completely. For the remainder,
our strings command shows us snippets of what was there. There are
definitely recognizable pieces of code. But likewise there are pieces of
code that are missing subtle parts. E.g.:

                  if (textconv_one) {
                        size_t size;
                        mf1.ptr = run_textconv(textconv_one, one, &size);
                        if (!mf1.
ptr)
                        mf1.size = size;
                if (textconv_two) {
                        size_t size;
                        mf2.ptr = run_textconv(textconv_two, two, &size);
                        if (!mf2.
ptr)
                        mf2.size = size;

So while there is _something_ to be recovered there, it is basically as
easy to rewrite the code as it is to piece together whatever fragments
are available into something comprehensible.
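
To make it concrete why "strings" only ever turns up fragments, here is a
rough sketch (not part of the patch, and not git's own code) of walking a
single delta and keeping only the literal bytes. It assumes the usual pack
delta layout: two variable-length size headers, then a stream of "copy
from base" and "insert literal" instructions. The names read_varint and
literal_runs are made up for illustration, and the sample delta at the
bottom is hand-built rather than taken from a real pack.

```python
def read_varint(buf, pos):
    """Read a little-endian base-128 size header; return (value, new_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def literal_runs(delta):
    """Yield the byte strings carried by insert instructions in one delta."""
    _base_size, pos = read_varint(delta, 0)
    _result_size, pos = read_varint(delta, pos)
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy from the base object: each set bit in the low 7 bits
            # means one offset/size byte follows. Without the base, the
            # copied text is simply gone, so skip over the operands.
            pos += bin(cmd & 0x7F).count("1")
        elif cmd:
            # Insert: the next `cmd` bytes are literal data in the result.
            yield delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("reserved delta opcode 0")


# Hand-built sample: base size 20, result size 11,
# copy 5 bytes from offset 0 of the base, then insert b" world".
sample = bytes([20, 11, 0x90, 5, 6]) + b" world"
recovered = list(literal_runs(sample))
```

In this toy case only the 6 inserted bytes out of an 11-byte result
survive; everything that came from a copy instruction needs the base
object, which is exactly what we do not have.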

So in practice, the delta dump would only be useful if:

  1. You have an incomplete thin pack, which generally means you are
     using bundles (or you interrupted a fetch and kept the tmp_pack).

  2. There is _no_ other copy of the basis. The results you get from
     this method are so awful that it should really only be last-ditch.
     I think you would be insane to say "Oh, I don't have net access
     right now. Let me just spend hours picking through these deltas to
     find a scrap of something useful instead of just waiting until I
     get access again."

  3. The changes in the pack tend to produce deltas rather than full
     blobs, but the deltas tend to be very add-heavy.

I don't know how popular bundles are, but I would expect (1) puts us
very much in the minority. On top of that, given the nature of git, I
find (2) to be pretty unlikely. If you're sneaker-netting data with a
bundle, then it seems rare that both ends of the net will be lost at
once. As for (3), it seems source code is not a good candidate here.
Perhaps if you were writing a novel in a single file, you might salvage
whole paragraphs or even chapters.

So I am inclined to leave it as-is: a patch in the list archive. If the
day comes when somebody loses some super-important data and somehow
matches all of these criteria, then they can consult whatever aged and
senile git gurus still exist to pull the patch out and see if anything
can be recovered.

-Peff

