From: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: git <git@vger.kernel.org>
Subject: Re: [PATCH 05/16] Hook up replace-object to allow bulk commit replacement
Date: Tue, 3 Aug 2010 08:42:06 +1000
Message-ID: <AANLkTimMxtVGYnVhMOdnZ4oWQa1BsmUC2gUpt0ZcDspE@mail.gmail.com>
In-Reply-To: <7v8w4olrc0.fsf@alter.siamese.dyndns.org>
2010/8/3 Junio C Hamano <gitster@pobox.com>:
> I really do not like the use of "replace" for the purpose of narrow
> clones. While "replace" is about fixing a mistake by tweaking trees, a
> desire to have a narrow clone at this moment is _not_ a mistake. You may
> want to have wider or full clone of the project tomorrow. You may want to
> push the result of committing on top of such a narrowed clone back to a
> full repository. My gut feeling is that that use of "replace" to stub out
> the objects that you do not currently have would make it a nightmare when
> you would want to widen (especially to widen over the wire while pushing
> into a full repository on the other end), although I haven't looked at all
> the patches in the series.

Indeed. My intention was: "hey, this repo is too big and I only need
some pieces of it. Let me grab something and do my work (then throw
away the cloned repo)". It's best used together with shallow clone to
keep download size and disk usage low, giving a minimal tree to fix
something quickly. I'm not really sure such repos are sustainable in
the long run. And no, I did not intend to widen or narrow the tree, as
it was meant to be a throwaway tree. Now, thinking about widening: the
way I do narrow clone is quite similar to shallow clone. I hope the
way a shallow clone is deepened can be applied to widening a narrow
clone.

> Can you back up a bit and give us a high-level overview of how various
> operations in a narrowed clone should work, and how you achieve that
> design goal?

Operations work as normal (the incomplete clone is augmented so that
it appears "normal"). To make it look normal, every time a new commit
comes in (either fetched from another repository or created by the
user), the commit needs to be processed/replaced so that the repo
looks normal from git's perspective.

> Let's take an example of starting from git.git and narrow-clone only its
> Documentation/ (as you seem to have used as a guinea-pig) subdirectory.
> For the sake of simplicity, let's say the upstream project has only one
> commit.
>
> One plausible approach would be to have the commit, its top level tree
> object, its Documentation/ tree object and all the blobs below that level,
> while other blobs and trees that are reachable from the top level tree
> object are left missing, but somehow are marked so that fsck would think
> they are OK to be missing. Your worktree would obviously be narrowed to
> the same Documentation/ area, and unlike the narrow checkout codepath, you
> do not widen on demand (unless you automatically fetch missing parts of
> the tree, which I do not think you should do by default to help people who
> work while at 30,000ft). Instead, any operation that tries to modify
> outside the "subtree" area should fail.

Changes outside the subtree area are currently dropped on the floor
rather than rejected. But yes, they should fail.
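
To make the "should fail" behaviour concrete, here is a minimal sketch
in Python. It is a toy model, not code from the series: the names
is_inside_subtree and check_changes are hypothetical, and paths are
assumed to be repository-relative with "/" separators.

```python
import posixpath


def is_inside_subtree(path, subtree):
    """Return True if `path` is the subtree itself or lies below it."""
    path = posixpath.normpath(path)
    subtree = posixpath.normpath(subtree)
    return path == subtree or path.startswith(subtree + "/")


def check_changes(paths, subtree):
    """Reject the whole operation instead of silently dropping paths."""
    outside = [p for p in paths if not is_inside_subtree(p, subtree)]
    if outside:
        raise ValueError("changes outside the subtree: " + ", ".join(outside))
    return list(paths)


# Hypothetical usage: only Documentation/ was cloned.
check_changes(["Documentation/git.txt"], "Documentation/")
```

The prefix test appends "/" before comparing so that a sibling such as
Documentation2/ is not mistaken for part of the cloned area.
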
> When you build a commit that represents a Documentation patch on top of
> such a narrowed clone, because you have a full tree of Documentation/
> area, you can come up with the updated tree object for that part of the
> project. If "subtree" mode (aka narrowed clone) rejects operation outside
> the cloned area, your commit is guaranteed to touch only Documentation/
> area and nothing outside. You therefore should be able to compute the
> tree object for the whole repository (i.e. all the other entries in the
> top level tree object should be the same as those from HEAD).
Correct. Except..
> Because the index is a flat structure, you would need to fudge the entries
> that are missing-but-OK in there somehow, _and_ you would need to be able
> to recompute the tree after updating Documentation/ area. E.g. you may
> know ppc/ is tree db31c066 but may not know that it has three blobs
> underneath it nor what their object names are, so your index operating in
> this mode would need to record (ppc -> db31c066) mapping in order to be
> able to recreate the tree object out of it.

This is where git-replace comes in. I do not want to deal with a full
flat index. Giving pointers to missing objects may make git commands
nervous. I rewrite the commit so that it only has Documentation/ and
nothing else (for which I have all the needed objects). The index is
narrowed too. Because the index (even narrowed) is complete (i.e. all
entries are reachable), most operations should work.

Then, to hide the helper commit from the user, I replace the original
(full) commit with this new commit. So from the outside git sees the
SHA-1 of the original commit, but its content comes from the helper
one. These helper commits guarantee git won't reach out for missing
objects.
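
The idea can be sketched with a toy object store in Python. This is
only an illustration under simplifying assumptions, not how the
patches implement it: ToyStore, make_helper and missing_objects are
hypothetical names, objects are plain dicts hashed via JSON, and a
tree is flattened to a single level.

```python
import hashlib
import json


class ToyStore:
    """Toy object store with a replace map (original id -> helper id)."""

    def __init__(self):
        self.objects = {}
        self.replace = {}

    def write(self, obj):
        data = json.dumps(obj, sort_keys=True).encode()
        oid = hashlib.sha1(data).hexdigest()
        self.objects[oid] = obj
        return oid

    def read(self, oid):
        # Callers ask for the original id but get the helper's content.
        return self.objects[self.replace.get(oid, oid)]


def make_helper(store, commit_id, subtree):
    """Write a commit that keeps only `subtree` and register the replacement."""
    commit = store.objects[commit_id]
    narrowed = {k: v for k, v in commit["tree"].items() if k == subtree}
    helper_id = store.write({"tree": narrowed, "message": commit["message"]})
    store.replace[commit_id] = helper_id
    return helper_id


def missing_objects(store, commit_id):
    """Names of tree entries whose objects are absent from the store."""
    tree = store.read(commit_id)["tree"]
    return sorted(n for n, oid in tree.items() if oid not in store.objects)


store = ToyStore()
doc = store.write({"blob": "git.txt contents"})
full = store.write({"tree": {"Documentation": doc, "ppc": "db31c066"},
                    "message": "initial"})
before = missing_objects(store, full)   # ppc is not in the store
make_helper(store, full, "Documentation")
after = missing_objects(store, full)    # nothing missing is reachable now
```

The original id stays the one everybody refers to, while every read
through the store lands on the narrowed helper.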

It's a trade-off. Doing a full index requires much more work in git.
Using "git-subtree split", while it frees git developers to do other
things, might be inconvenient for users (without server support the
full repo must be downloaded, and the replaced SHA-1s from git-subtree
cannot be used to communicate with coworkers..)

> Using cache-tree data structure might help in doing this. It so far has
> been an optimization (i.e. when it says it has an up-to-date information,
> it does, but if it doesn't you can always recompute what is needed from
> the flat index entries), but I would imagine that you can add an "out of
> cloned area" bit to cache-tree entries, and mark a subtree that represents
> missing parts (e.g. 'ppc/') as such---anything that tries to invalidate
> such a cache-tree entry would be an error anyway, and when you need to
> write the index out as a tree, such cache-tree entries that record the
> trees outside your cloned area can be reused, no?

That's just part of the story. Repository integrity has been a
prerequisite in git from the beginning. git-merge operates directly on
trees, so cache-tree won't help much. git-commit does a sha1 existence
check on every sha1 before committing, so it needs to be
narrow-clone-aware too. That made me wonder whether has_sha1_file was
used elsewhere. Then "git grep has_sha1_file" scared me off and I
backed away.
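
For illustration, this is roughly what "narrow-clone-aware" would mean
for such an existence check, again as a toy Python model rather than
git code. plain_missing and narrow_missing are hypothetical names, and
`have` stands in for whatever has_sha1_file would answer.

```python
def plain_missing(entries, have):
    """Existence check that knows nothing about narrow clones."""
    return [path for path, oid in entries if oid not in have]


def narrow_missing(entries, have, subtree):
    """Only objects inside the cloned area are required to exist."""
    missing = []
    for path, oid in entries:
        inside = path == subtree or path.startswith(subtree + "/")
        if inside and oid not in have:
            missing.append(path)
    return missing


# Hypothetical entries: one blob we have, one tree we deliberately lack.
entries = [("Documentation/git.txt", "a1"), ("ppc", "db31c066")]
have = {"a1"}
strict = plain_missing(entries, have)
relaxed = narrow_missing(entries, have, "Documentation")
```

Every caller that performs the plain check would need the relaxed
variant, which is what makes this approach expensive.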
--
Duy