From: Jeff King <peff@peff.net>
To: Bo Chen <chen@chenirvine.org>
Cc: Nguyen Thai Ngoc Duy <pclouds@gmail.com>, git@vger.kernel.org
Subject: Re: GSoC - Some questions on the idea of "Better big-file support".
Date: Fri, 30 Mar 2012 15:54:04 -0400
Message-ID: <20120330195404.GA20189@sigill.intra.peff.net>
In-Reply-To: <CA+M5ThS1XiaGJWmSvfwXoqebnH6fK3h6cC7OnQQi=LXzcA0GRw@mail.gmail.com>
On Fri, Mar 30, 2012 at 03:11:40PM -0400, Bo Chen wrote:
> Just make clear one of my confusions. Delta operation is to find out
> the differences between different versions of the same file, right?
> As I know, delta encoding is to re-encode a file based on the
> differences between neighboring blocks, thus can help compress a file
> since after delta encoding, we will have more similar data within the
> file. Can anyone elaborate a little bit what is the relation between
> delta operation in git and delta encoding listed above? Thanks.
Sort of. Git is snapshot based. So each version of a file is its own
"object", and from a high-level view, we store all objects. But we store
the logical objects themselves in packfiles, in which the actual
representation of the object may be stored as a difference to another
object (which is likely to be a different version of the same file, but
does not have to be).
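
As a rough illustration of what "stored as a difference to another
object" means, here is a minimal sketch in Python. It is not git's actual
packfile delta format; the function names make_delta and apply_delta are
made up for this example, and difflib stands in for the real matching
code. The idea is only that the target is described as "copy this range
from the base" and "insert these literal bytes" instructions:

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes):
    """Describe `target` as copy-from-base and insert-literal instructions.

    Simplified illustration only; not git's on-disk delta encoding.
    """
    ops = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes shared with the base: record only (offset, length).
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in the target: store them literally.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are never copied.
    return ops


def apply_delta(base: bytes, ops) -> bytes:
    """Rebuild the full object from the base plus the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


v1 = b"line one\nline two\nline three\n"
v2 = b"line one\nline 2\nline three\nline four\n"
delta = make_delta(v1, v2)
assert apply_delta(v1, delta) == v2
```

The logical object (v2) is still a complete snapshot; only its stored
representation refers to v1.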
Here's some background reading:

  http://progit.org/book/ch1-3.html
  http://progit.org/book/ch9-4.html
> I am wondering why we cannot divide the 2 2GB files into chunks and
> delta chunks by chunks. Is that any difference, except a little more
> IOs?
It's more complicated than that. What if the file is re-ordered? You
would want to compare early chunks in one version against later chunks
in the other. So yes, you can reduce memory pressure by doing more I/O,
but doing too much I/O will be very slow. Coming up with a solution is
part of what this project is about. And chunking is part of that
solution.
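
To make the re-ordering problem concrete, here is a small sketch in
Python, under assumptions of my own: fixed-size chunks, a tiny CHUNK_SIZE
so the output is readable, and hypothetical helper names. Instead of
comparing chunk N of the old version with chunk N of the new one, it
indexes the old chunks by hash, so a chunk that moved can still be found
while only the hashes are held in memory:

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for demonstration only


def chunks(data: bytes, size: int = CHUNK_SIZE):
    """Split data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def index_chunks(data: bytes):
    """Map each chunk's hash to the first position where it occurs."""
    index = {}
    for pos, chunk in enumerate(chunks(data)):
        index.setdefault(hashlib.sha1(chunk).digest(), pos)
    return index


def match_chunks(old: bytes, new: bytes):
    """For each chunk of `new`, give the matching chunk position in `old`,
    or None if that content does not appear in `old` at all."""
    index = index_chunks(old)
    return [index.get(hashlib.sha1(c).digest()) for c in chunks(new)]


old = b"AAAABBBBCCCC"
new = b"CCCCAAAAXXXX"   # same content re-ordered, plus one new chunk
print(match_chunks(old, new))   # prints [2, 0, None]
```

Note that fixed-size chunks stop lining up as soon as a single byte is
inserted near the start of the file, so this alone is not enough.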
> > Read about rsync algorithm [2]. Bup [1] implements the same (I think)
> > algorithm, but on top of git. For preliminary patches, have a look at
> > jc/split-blob series at commit 4a1242d in git.git.
>
> Make clear my another confusion. The file which has been updated
> (added, deleted, and modified) is first delta-compressed, and then
> synchronize to the remote repo by some mechanism (rsync?). I am
> wondering what is the relationship between delta operation and
> rsync.
No, the updated file is delta compressed into a packfile, and the
packfile is transmitted. Rsync comes into play because it uses a novel
chunking algorithm, which was copied by bup (and is referred to as the
"bupsplit" algorithm). Read up on how bup works and why it was invented.
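
The general idea behind that kind of chunking can be sketched as follows.
This is a simplified illustration, not bup's or rsync's actual code: a
plain sum of the last WINDOW bytes stands in for the real rolling
checksum, and WINDOW, MASK, and the function name split are assumptions
made for the example. A chunk boundary is declared wherever the low bits
of the rolling value match a fixed pattern, so boundaries depend on the
nearby content rather than on the byte offset:

```python
import random

WINDOW = 16   # number of trailing bytes covered by the rolling sum
MASK = 0x3F   # boundary when the low 6 bits are all set (about 1 in 64)


def split(data: bytes, window: int = WINDOW, mask: int = MASK):
    """Split data at content-defined boundaries.

    Simplified illustration: a real implementation would use a stronger
    rolling checksum and enforce minimum and maximum chunk sizes.
    """
    pieces = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte                  # newest byte enters the window
        if i >= window:
            rolling -= data[i - window]  # oldest byte leaves the window
        if (rolling & mask) == mask:
            pieces.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        pieces.append(data[start:])      # trailing bytes after last boundary
    return pieces


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"a few inserted bytes" + original

a, b = split(original), split(edited)
assert b"".join(a) == original and b"".join(b) == edited
# Most chunks survive the insertion unchanged, even though every byte
# after the insertion point has moved to a different offset.
print(len(set(a) & set(b)), "of", len(set(a)), "distinct chunks are shared")
```

Because boundaries re-synchronize after an edit, unchanged regions of a
large file produce identical chunks that can be stored or sent once.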
-Peff