From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Subject: [PATCH v3 0/6] Bulk check-in
Date: Thu, 1 Dec 2011 16:40:43 -0800
Message-ID: <1322786449-25753-1-git-send-email-gitster@pobox.com>
In-Reply-To: <1322699263-14475-6-git-send-email-gitster@pobox.com>

I would declare that the earlier parts of the v2 that are about factoring
out various API pieces from existing code are basically completed, so they
are not part of this iteration.

The bulk-checkin patch from v2 has been tweaked a bit (deflate_to_pack()
initializes "already_hashed_to" pointer to 0, instead of the current file
position "seekback"), and then the rest of the series builds on top of it
to add a new in-pack encoding that I am tentatively calling "chunked".

The basic idea is to represent a large/huge blob as a concatenation of
smaller blobs. An entry in a pack in "chunked" representation records a
list of object names of the component blob objects. The object name given
to such a blob is computed exactly the same way as before. In other words,
the name of an object does not depend on its representation; we hash "blob
<size> NUL" and the whole large blob contents to come up with its name. It
is *not* the hash of the component blob object names.
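To illustrate the naming rule above, here is a minimal sketch in Python.
It is not code from this series; the function names are hypothetical, and
the only assumption is the usual SHA-1 blob naming (header "blob <size>
NUL" followed by the contents).

```python
import hashlib


def blob_name(data: bytes) -> str:
    """Object name of a blob: SHA-1 over "blob <size>\\0" plus the contents."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def chunked_blob_name(chunks) -> str:
    """Name of a blob held as a list of component blobs (hypothetical helper).

    The hash covers the concatenated contents, so the result is the same
    as blob_name() of the whole blob, however the blob is split.
    """
    total = sum(len(c) for c in chunks)
    h = hashlib.sha1(b"blob %d\0" % total)
    for c in chunks:
        h.update(c)
    return h.hexdigest()


def hash_of_component_names(chunks) -> str:
    """What the name is NOT: a hash over the component object names."""
    names = "".join(blob_name(c) for c in chunks).encode()
    return hashlib.sha1(names).hexdigest()
```

Splitting the same contents at different offsets changes the list of
component names but leaves chunked_blob_name() unchanged, which is the
property the cover letter relies on.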
As can be seen in the log message of the "support chunked-object encoding"
patch, many pieces are still missing from this series and filling them
will be a long and tortuous journey. But we need to start somewhere.

I specifically excluded any heuristics to split large objects into chunks
in a self-synchronising way so that a small edit near the beginning of a
large blob results in a handful of new component blobs followed by the
same component blobs as used to represent the same blob before such an
edit, and I do not plan to work on that part myself. My impression from
listening to Avery's plug for "bup" is that it is a solved problem; it
should be reasonably straightforward to lift the logic and plug it into
the framework presented here (once the codebase gets solid enough, that
is).
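The self-synchronising property described above can be sketched as
follows. This is only an illustration in Python, not the heuristic used by
bup and not part of this series: a plain rolling byte sum stands in for a
real rolling checksum, and WINDOW, MASK and split_chunks() are hypothetical
names chosen for the example.

```python
import random

WINDOW = 16          # bytes of trailing context that decide a boundary (assumed)
MASK = (1 << 6) - 1  # cut when the low 6 bits are all ones (assumed)


def split_chunks(data: bytes) -> list:
    """Split data at content-defined boundaries.

    A rolling sum over the last WINDOW bytes is a weak stand-in for a
    real rolling checksum, but it keeps the sketch short.
    """
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        # The decision depends only on the last WINDOW bytes, never on the
        # absolute offset, so boundaries line up again shortly after an edit.
        if i + 1 >= WINDOW and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing remainder
    return chunks


if __name__ == "__main__":
    random.seed(0)
    original = bytes(random.randrange(256) for _ in range(20000))
    edited = original[:100] + b"small edit" + original[100:]
    before, after = split_chunks(original), split_chunks(edited)
    known = set(before)
    new = [c for c in after if c not in known]
    print(len(after), "chunks after the edit,", len(new), "of them new")
```

Because each boundary is decided by content alone, only the chunks that
overlap the edited region differ; every later chunk is byte-identical to
one from before the edit and would therefore keep its object name.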
After this series, the next step for me is likely to teach the streaming
interface about "chunked" objects, and then pack-objects to take notice
and reuse "chunked" representation when sending things out (which means
that sending a "chunked" encoded blob would involve sending the component
blobs it uses, among other things), but I expect that it will extend well
into next year.

Junio C Hamano (6):
  bulk-checkin: replace fast-import based implementation
  varint-in-pack: refactor varint encoding/decoding
  new representation types in the packstream
  bulk-checkin: allow the same data to be multiply hashed
  bulk-checkin: support chunked-object encoding
  chunked-object: fallback checkout codepaths

 Makefile               |    3 +
 builtin/add.c          |    5 +
 builtin/pack-objects.c |   34 ++---
 bulk-checkin.c         |  415 ++++++++++++++++++++++++++++++++++++++++++++++++
 bulk-checkin.h         |   17 ++
 cache.h                |   13 ++-
 config.c               |    9 +
 environment.c          |    2 +
 pack-write.c           |   50 +++++-
 pack.h                 |    2 +
 sha1_file.c            |  150 +++++++++---------
 split-chunk.c          |   28 ++++
 t/t1050-large.sh       |  135 +++++++++++++++-
 zlib.c                 |    9 +-
 14 files changed, 760 insertions(+), 112 deletions(-)
 create mode 100644 bulk-checkin.c
 create mode 100644 bulk-checkin.h
 create mode 100644 split-chunk.c
--
1.7.8.rc4.177.g4d64
Thread overview: 15+ messages
2011-12-01 0:27 [PATCH v2 0/5] Bulk Check-in Junio C Hamano
2011-12-01 0:27 ` [PATCH v2 1/5] write_pack_header(): a helper function Junio C Hamano
2011-12-01 0:27 ` [PATCH v2 2/5] create_tmp_packfile(): " Junio C Hamano
2011-12-01 0:27 ` [PATCH v2 3/5] finish_tmp_packfile(): " Junio C Hamano
2011-12-01 0:27 ` [PATCH v2 4/5] csum-file: introduce sha1file_checkpoint Junio C Hamano
2011-12-01 0:27 ` [PATCH v2 5/5] bulk-checkin: replace fast-import based implementation Junio C Hamano
2011-12-01 8:05 ` Nguyen Thai Ngoc Duy
2011-12-01 15:46 ` Junio C Hamano
2011-12-02 0:40 ` Junio C Hamano [this message]
2011-12-02 0:40 ` [PATCH v3 1/6] " Junio C Hamano
2011-12-02 0:40 ` [PATCH v3 2/6] varint-in-pack: refactor varint encoding/decoding Junio C Hamano
2011-12-02 0:40 ` [PATCH v3 3/6] new representation types in the packstream Junio C Hamano
2011-12-02 0:40 ` [PATCH v3 4/6] bulk-checkin: allow the same data to be multiply hashed Junio C Hamano
2011-12-02 0:40 ` [PATCH v3 5/6] bulk-checkin: support chunked-object encoding Junio C Hamano
2011-12-02 0:40 ` [PATCH v3 6/6] chunked-object: fallback checkout codepaths Junio C Hamano