git.vger.kernel.org archive mirror
From: Charles Bailey <charles@hashpling.org>
To: "J.C. Pizarro" <jcpiza@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>, git@vger.kernel.org
Subject: Re: Question about your git habits
Date: Sat, 23 Feb 2008 14:01:53 +0000
Message-ID: <20080223140153.GB5811@hashpling.org>
In-Reply-To: <998d0e4a0802230536w74e93ec3s40c77d52b183a419@mail.gmail.com>

On Sat, Feb 23, 2008 at 02:36:59PM +0100, J.C. Pizarro wrote:
> On 2008/2/23, Charles Bailey <charles@hashpling.org> wrote:
> >
> > It shouldn't matter how aggressively the repositories are packed or what
> >  the binary differences are between the pack files are. git clone
> >  should (with the --reference option) generate a new pack for you with
> >  only the missing objects. If these objects are ~52 MiB then a lot has
> >  been committed to the repository, but you're not going to be able to
> >  get around a big download any other way.
> 
> You're wrong, nothing has to be commited ~52 MiB to the repository.
> 
> I'm not saying "commit", i'm saying
> 
> "Assume A & B binary git repos and delta_B-A another binary file, i
> request built B' = A + delta_B-A where is verified SHA1(B') = SHA1(B)
> for avoiding corrupting".
> 
> Assume B is the higher repacked version of "A + minor commits of the day"
> as if B was optimizing 24 hours more the minimum spanning tree. Wow!!!
> 
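To make the quoted proposal concrete, here is a minimal sketch of "B' = A + delta_B-A, accept only if SHA1(B') = SHA1(B)". The delta format (a list of "copy from A" and "insert literal bytes" operations) and the helper names apply_delta and rebuild_and_verify are hypothetical, chosen for illustration; they are not the format of any real diff tool.

```python
import hashlib

# Hypothetical delta format (an assumption, not any real tool's format):
#   ("copy", offset, length) -> copy bytes from the base file A
#   ("insert", data)         -> append literal bytes
def apply_delta(base: bytes, delta: list) -> bytes:
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            if offset < 0 or offset + length > len(base):
                raise ValueError("copy op out of range of base")
            out += base[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown op {op[0]!r}")
    return bytes(out)

def rebuild_and_verify(a: bytes, delta: list, expected_sha1: str) -> bytes:
    """Build B' = A + delta and reject it unless SHA1(B') == SHA1(B)."""
    b_prime = apply_delta(a, delta)
    if hashlib.sha1(b_prime).hexdigest() != expected_sha1:
        raise ValueError("SHA-1 mismatch: rebuilt file is corrupt")
    return b_prime

a = b"packfile day one"
b = b"packfile day two"
delta = [("copy", 0, 13), ("insert", b"two")]
assert rebuild_and_verify(a, delta, hashlib.sha1(b).hexdigest()) == b
```

The hash check only guards against a corrupt rebuild; it says nothing about how large the delta between two aggressively repacked pack files would be, which is the point at issue in the reply below.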

I'm not sure that I understand where you are going with this.
Originally, you stated that if you clone a 775 MiB repository on day
one, and then you clone it again on day two when it was 777 MiB, then
you currently have to download 775 + 777 MiB of data, whereas you
could download a 52 MiB binary diff. I have no idea where that value
of 52 MiB comes from, and I've no idea how many objects were committed
between day one and day two. If we're going to talk about details,
then you need to provide more details about your scenario.

Having said that, here is my original point in some more detail. git
repositories are not binary blobs, they are object databases. Better
than this, they are databases of immutable objects. This means that to
get the difference between one database and another, you only need the
objects in the newer database that are missing from the older one. If
the two databases are actually the same database at two points a short
time apart, then almost all the objects are going to be common and the
difference will be a small set of objects. Using git:// this set of
objects can be efficiently transferred as a pack file. There may be
corner cases where this isn't true, but in my experience an incremental
pack file is a more compact representation of this difference than a
binary diff of two aggressively repacked git repositories generated by
a generic binary difference engine.
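A toy model of the "database of immutable objects" argument, as a sketch: objects are keyed by the SHA-1 of git's blob encoding ("blob <size>\0" followed by the content), so the difference between two databases is just the set of ids one lacks. The ObjectDB class and the missing_objects/apply_objects helpers are hypothetical; the model assumes blobs only and ignores trees, commits, compression and the pack negotiation that git:// actually performs.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    # git names a blob by SHA-1 over "blob <size>\0" plus the content.
    header = b"blob " + str(len(data)).encode() + b"\0"
    return hashlib.sha1(header + data).hexdigest()

class ObjectDB:
    """Toy content-addressed store of immutable blobs (hypothetical)."""
    def __init__(self):
        self.objects = {}  # object id -> content

    def add(self, data: bytes) -> str:
        oid = git_blob_id(data)
        self.objects.setdefault(oid, data)  # same id always means same bytes
        return oid

def missing_objects(have: ObjectDB, want: ObjectDB) -> dict:
    # Everything in `want` that `have` lacks: this is the whole difference.
    return {oid: data for oid, data in want.objects.items()
            if oid not in have.objects}

def apply_objects(db: ObjectDB, incoming: dict) -> None:
    for oid, data in incoming.items():
        # Content addressing lets the receiver verify every object it gets.
        if git_blob_id(data) != oid:
            raise ValueError(f"object {oid} does not match its content")
        db.objects[oid] = data

day_one = ObjectDB()
for i in range(1000):
    day_one.add(f"file {i}\n".encode())

day_two = ObjectDB()
day_two.objects = dict(day_one.objects)  # same history, a day later...
for i in range(3):
    day_two.add(f"new commit content {i}\n".encode())  # ...plus new objects

incremental = missing_objects(day_one, day_two)
print(len(incremental))  # 3: only the new objects need to be sent
apply_objects(day_one, incremental)
assert day_one.objects == day_two.objects
```

However the two databases happen to be packed on disk, the transfer is sized by the handful of new objects rather than by the byte layout of either repository.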

I'm sorry if I've misunderstood your last point. If I have, perhaps
you could expand on the exact issue that you are having, as I'm not
sure that I've really answered your last message.


Thread overview: 29+ messages
2008-02-23  0:37 Question about your git habits Chase Venters
2008-02-23  1:26 ` Tommy Thorn
2008-02-23  1:28 ` Steven Walter
2008-02-23  1:37 ` Jan Engelhardt
2008-02-23  1:44   ` Al Viro
2008-02-23  1:51     ` Junio C Hamano
2008-02-23  2:09       ` Al Viro
     [not found]         ` <998d0e4a0802221823h3ba53097gf64fcc2ea826302b@mail.gmail.com>
2008-02-23  2:47           ` J.C. Pizarro
2008-02-23 11:39             ` Charles Bailey
2008-02-23 13:08               ` J.C. Pizarro
2008-02-23 13:17                 ` Charles Bailey
2008-02-23 13:36                   ` J.C. Pizarro
2008-02-23 14:01                     ` Charles Bailey [this message]
2008-02-23 17:10                       ` J.C. Pizarro
2008-02-23 18:16                         ` Charles Bailey
2008-02-23 18:47                           ` J.C. Pizarro
2008-02-23 19:28                             ` Charles Bailey
2008-02-23 18:19                         ` J.C. Pizarro
2008-02-23 14:08             ` Mike Hommey
2008-02-23  1:42 ` Junio C Hamano
2008-02-23 10:39   ` Samuel Tardieu
     [not found] ` <998d0e4a0802221736q4e4c3a28l101522912f7d3caf@mail.gmail.com>
2008-02-23  2:46   ` J.C. Pizarro
2008-02-23  4:10 ` Daniel Barkalow
2008-02-23  5:03   ` Jeff Garzik
2008-02-23  9:18   ` Mike Hommey
2008-02-23  4:39 ` Rene Herman
2008-02-23  8:56 ` Willy Tarreau
2008-02-23  9:10 ` Sam Ravnborg
2008-02-23 13:07 ` Jakub Narebski
