git.vger.kernel.org archive mirror
From: Jakub Narebski <jnareb@gmail.com>
To: "Feanil Patel" <feanil@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: How Blobs Work ( Blobs Vs. Deltas)
Date: Tue, 30 Sep 2008 11:54:21 -0700 (PDT)
Message-ID: <m3y719qxc9.fsf@localhost.localdomain>
In-Reply-To: <16946e800809300814v134a42dft37becdbd8aa7669a@mail.gmail.com>

"Feanil Patel" <feanil@gmail.com> writes:

> Hello,
> 
> I was reading about git objects in "The Git Community Book"
> (http://book.git-scm.com/1_the_git_object_model.html), which was
> posted on the mailing list a while back, and I was wondering something
> about blobs and how files are stored in any particular version.  If
> file A is changed from version one to version two there are two
> different blobs that exist for the two versions of the file, is that
> correct?  The Book was saying Git does not use delta storage so does
> this mean that there are two almost identical copies of the file with
> the difference being the change that was put in from version one to
> version two?

In Git there are two kinds of storage: loose objects and packs. Each
object generally starts as a loose object; for those it is as you
wrote: if you have two versions of some file, the contents of both
versions are stored as separate objects (blobs).  Note that those
'blob' objects are compressed, so they usually don't take more
space than the current version of a file plus its backup.
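
To make the loose-object case concrete, here is a minimal sketch in
Python (not git's own code) of how a blob's name and its on-disk
bytes are derived.  It assumes the classic SHA-1 object format: a
"blob <size>\0" header followed by the contents, hashed as a whole,
then zlib-compressed for storage.  The helper names blob_id and
loose_object_bytes are made up for this illustration.

```python
import hashlib
import zlib


def blob_id(content: bytes) -> str:
    """Object name of a blob: SHA-1 over a 'blob <size>\\0' header plus the content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """Bytes a loose object file would hold: the zlib-compressed header plus content."""
    header = b"blob %d\x00" % len(content)
    return zlib.compress(header + content)


# Two versions of the same file give two independent blobs,
# each compressed on its own (no delta between them).
v1 = b"hello\n" * 200
v2 = v1 + b"one more line\n"
for name, data in (("v1", v1), ("v2", v2)):
    print(name, blob_id(data), len(data), "->", len(loose_object_bytes(data)))
```

The ids should match what "git hash-object" prints for the same
contents in a SHA-1 repository.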

But there is also another type of storage, namely packs.  In the
past you had to pack (repack) objects by invoking "git repack" and
"git prune", and in more modern times by calling "git gc"; nowadays
this should be taken care of by git running "git gc --auto" behind
the scenes.  When packing, git tries to find objects with similar
contents, and stores them as a base object plus binary deltas (the
delta code is based on LibXDiff).  So you get the benefits of delta
storage, while at the API and script level you always see single
objects.
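
The base-plus-delta idea can be sketched with a toy example in
Python.  This is only an illustration: it uses difflib from the
standard library to find matching ranges, and a made-up list of
"copy"/"insert" instructions; it is neither git's real pack format
nor its delta algorithm.  The names make_delta and apply_delta are
hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copies of ranges from base plus literal inserts."""
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already present in base: record only offset and length.
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:
            # 'replace' or 'insert': new bytes have to be stored literally.
            ops.append(("insert", target[j1:j2]))
        # 'delete' needs no instruction: those base bytes are simply not copied.
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the full target object from the base and the delta instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


base = b"line one\nline two\nline three\n" * 10
target = base.replace(b"line two\n", b"line 2 (edited)\n", 1)
delta = make_delta(base, target)
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(len(target), "bytes rebuilt from", literal, "literal bytes plus copies")
assert apply_delta(base, delta) == target
```

Whoever asks for the target always gets the whole object back, which
is why scripts never need to care whether an object is stored in
full or as a delta.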

Note that explicit repacking allows git to consider not only versions
of the same file as candidates to diff against, and to build trees of
deltas rather than only linear chains (think branches); while recency
order is preferred, it is not enforced.  Objects and deltas are then
compressed individually.

HTH
-- 
Jakub Narebski
Poland
ShadeHawk on #git