* [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
@ 2007-02-26 12:15 Jeff King
From: Jeff King @ 2007-02-26 12:15 UTC
To: git
This limit doesn't seem to come into effect anywhere else; it's simply
an arbitrary limit to make memory allocation easier. It's used to
declare a single static array of 20-byte hashes, so this increase wastes
about 2K.
---
This limit is arbitrary. Should it be ridiculously high (I think 128 is
already ridiculous, but we could go to 1024 and waste 20K), or should it
simply allocate dynamically?
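The memory figures in this patch description can be checked with a few lines of C. This sketch assumes only what the patch shows, namely one 20-byte SHA-1 per slot in `parent_sha1[MAXPARENT][20]`; the helper names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Each slot of parent_sha1[MAXPARENT][20] holds one 20-byte SHA-1. */
enum { SHA1_BYTES = 20 };

/* Bytes taken by a static table with room for max_parents hashes. */
static size_t parent_table_bytes(size_t max_parents)
{
	return max_parents * SHA1_BYTES;
}

/* Extra static memory spent by raising the limit from old_max to new_max. */
static size_t extra_table_bytes(size_t old_max, size_t new_max)
{
	assert(new_max >= old_max);
	return parent_table_bytes(new_max) - parent_table_bytes(old_max);
}
```

With these helpers, going from 16 to 128 slots costs 112 x 20 = 2240 bytes (the "about 2K"), and a 1024-slot table takes 1024 x 20 = 20480 bytes (the "20K").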
I ran into this while trying to make an octopus with 25 heads. I have a
set of 25 small repositories imported from CVS. They no longer see
active development, but I want to keep them around for historical
purposes. Checking out 25 repos is a pain, so I wanted to put them all
in one repo. However, I didn't just want the histories on separate
branches; I wanted everything checked out at once. So I did:

  rm -f .git/MERGE_HEAD
  for i in $repos; do
    git fetch ../$i $i
    git read-tree --prefix=$i/ $i
    git checkout -- $i
    git rev-parse $i >>.git/MERGE_HEAD
  done
  git commit

Which of course barfed on the giant octopus. Bumping up the limit
allowed it to happen with no visible problems (the history browsing code
works fine). Yes, I obviously could have done a series of 25 pair-wise
merges (or even 2 16-way octopus merges), but I think this more closely
represents what I'm trying to accomplish.
builtin-commit-tree.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/builtin-commit-tree.c b/builtin-commit-tree.c
index 2a818a0..48dbf1d 100644
--- a/builtin-commit-tree.c
+++ b/builtin-commit-tree.c
@@ -60,7 +60,7 @@ static void check_valid(unsigned char *sha1, const char *expect)
  * Having more than two parents is not strange at all, and this is
  * how multi-way merges are represented.
  */
-#define MAXPARENT (16)
+#define MAXPARENT (128)
 static unsigned char parent_sha1[MAXPARENT][20];
 static const char commit_tree_usage[] = "git-commit-tree <sha1> [-p <sha1>]* < changelog";
--
1.5.0.1.793.gedfd5-dirty

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Shawn O. Pearce @ 2007-02-26 14:31 UTC
To: Jeff King; +Cc: git

Jeff King <peff@peff.net> wrote:
> This limit doesn't seem to come into effect anywhere else; it's simply
> an arbitrary limit to make memory allocation easier. It's used to
> declare a single static array of 20-byte hashes, so this increase wastes
> about 2K.

I don't really see a problem with this, however:

The pack v4 code that Nico and I are working on was planning on
taking a very useful optimization for any commit with fewer than
64 parents (or maybe 128, I'd have to go back to look at my notes).
We would fall back to a less optimal storage for these large
octopus commits. Of course the fallback strategy (which is really
just the current OBJ_COMMIT packing) is still more space efficient
than making multiple commits to express the octopus, so pushing
this limit up higher would save space better. Oh, and these types
of octopus merges aren't very frequent either. ;-)

git-bisect can bisect these large octopuses, but it needs to search
every parent commit in the merge. It cannot perform a binary search
through them. Getting massive octopuses makes it harder for the user
to bisect.

I'm thinking maybe this should just change to a dynamic allocation
and let the caller feed however many parents they want. Most people
don't make an octopus very often, and when they do they really mean
to do it, such as the case you just described.

Unless Dscho/Nico/Junio/Linus/etc. know of some other limitation
lurking within Git. My recollection is that only git-commit-tree and
git-gui knew about this 16 parent limit. And the latter only knows
about the limit so that it can prevent the user from doing an
octopus merge that overflowed git-commit-tree's limit. It would be
nice if git-gui had no limit. ;-)

-- 
Shawn.
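The dynamic allocation suggested here (and offered as an option in the original RFC) could look roughly like the following sketch. It is not git's code: `struct parent_list`, `parent_list_add()` and `parent_list_clear()` are hypothetical names, and the sketch only shows how a buffer that doubles on demand can replace the fixed `parent_sha1[MAXPARENT][20]` table, so that the number of parents is bounded by memory rather than by a constant.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable replacement for the static parent_sha1[MAXPARENT][20]. */
struct parent_list {
	unsigned char (*sha1)[20];	/* buffer of `alloc` 20-byte hashes */
	size_t nr;			/* entries in use */
	size_t alloc;			/* entries allocated */
};

/*
 * Append one 20-byte parent hash, doubling the buffer when it is full.
 * Returns 0 on success, or -1 if realloc() fails (the list is unchanged).
 */
static int parent_list_add(struct parent_list *list, const unsigned char *sha1)
{
	assert(list->nr <= list->alloc);
	if (list->nr == list->alloc) {
		size_t new_alloc = list->alloc ? 2 * list->alloc : 16;
		unsigned char (*grown)[20] =
			realloc(list->sha1, new_alloc * sizeof(*grown));
		if (!grown)
			return -1;
		list->sha1 = grown;
		list->alloc = new_alloc;
	}
	memcpy(list->sha1[list->nr++], sha1, 20);
	return 0;
}

/* Release the buffer and reset the list to empty. */
static void parent_list_clear(struct parent_list *list)
{
	free(list->sha1);
	list->sha1 = NULL;
	list->nr = 0;
	list->alloc = 0;
}
```

Starting at 16 slots keeps the common case at the old size; a 25-head octopus like the one in the RFC would grow the buffer once, to 32 slots.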

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Johannes Schindelin @ 2007-02-26 16:38 UTC
To: Shawn O. Pearce; +Cc: Jeff King, git

Hi,

On Mon, 26 Feb 2007, Shawn O. Pearce wrote:

> My recollection is that only git-commit-tree and git-gui knew about this
> 16 parent limit.

AFAIRC git-show-branch has a limit of 28 parents or so... But that is
purely viewing porcelain...

Ciao,
Dscho

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Junio C Hamano @ 2007-02-27 8:16 UTC
To: Johannes Schindelin; +Cc: Shawn O. Pearce, Jeff King, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Mon, 26 Feb 2007, Shawn O. Pearce wrote:
>
>> My recollection is that only git-commit-tree and git-gui knew about this
>> 16 parent limit.
>
> AFAIRC git-show-branch has a limit of 28 parents or so...

That's a limit on the number of _tips_ to traverse from, and I do not
think it has anything to do with the maximum size of an Octopus.

But as Shawn pointed out, Octopus makes bisect less (much less)
efficient for the end users, I tend to think the current 16 is
already insanely large.
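To put rough numbers on the bisect concern, here is a toy model in C. It is an illustration under stated assumptions, not git-bisect's actual algorithm: on a straight line of n candidate commits whose newest is known bad, each test halves the range; for an octopus whose k parents are single commits sharing no history, a good parent clears only itself, so the parents are tried one at a time. All function names are made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Linear history: candidates 0..n-1 (n must be at least 1), where n-1 is
 * known bad and every commit from `culprit` onward tests bad.
 * Binary search for the first bad commit, counting tests.
 */
static unsigned linear_tests(size_t n, size_t culprit)
{
	size_t lo = 0, hi = n - 1;
	unsigned tests = 0;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		tests++;
		if (mid >= culprit)
			hi = mid;	/* mid tests bad: first bad is at or before mid */
		else
			lo = mid + 1;	/* mid tests good: first bad is after mid */
	}
	return tests;
}

/*
 * Octopus: candidates 0..k-1 are independent single-commit parents and
 * candidate k is the merge itself (known bad). Only the culprit parent,
 * if any, tests bad, so each parent has to be tried in turn.
 */
static unsigned octopus_tests(size_t k, size_t culprit)
{
	unsigned tests = 0;

	for (size_t i = 0; i < k; i++) {
		tests++;
		if (i == culprit)
			return tests;	/* this parent introduced the bug */
	}
	return tests;	/* every parent was good: the merge is the culprit */
}

/* Worst case over every possible culprit position. */
static unsigned worst_linear(size_t n)
{
	unsigned worst = 0;
	for (size_t c = 0; c < n; c++)
		if (linear_tests(n, c) > worst)
			worst = linear_tests(n, c);
	return worst;
}

static unsigned worst_octopus(size_t k)
{
	unsigned worst = 0;
	for (size_t c = 0; c <= k; c++)
		if (octopus_tests(k, c) > worst)
			worst = octopus_tests(k, c);
	return worst;
}
```

In this model, 26 candidates on a straight line need at most 5 tests, while a 25-parent octopus (26 candidates counting the merge) needs up to 25.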

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Jeff King @ 2007-02-27 8:19 UTC
To: Junio C Hamano; +Cc: Johannes Schindelin, Shawn O. Pearce, git

On Tue, Feb 27, 2007 at 12:16:42AM -0800, Junio C Hamano wrote:

> But as Shawn pointed out, Octopus makes bisect less (much less)
> efficient for the end users, I tend to think the current 16 is
> already insanely large.

Did you look at my "why I need a huge octopus" description? Is there a
better way to do it? Should I simply do a bunch of pair-wise merges?
I'll almost certainly never bisect it, but the octopus "looks right" in
gitk (though I have to admit, it's really not _that_ big a deal -- it's
almost more readable to look at pairwise merges anyway).

-Peff

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Junio C Hamano @ 2007-02-27 10:23 UTC
To: Jeff King; +Cc: Johannes Schindelin, Shawn O. Pearce, git

Jeff King <peff@peff.net> writes:

> On Tue, Feb 27, 2007 at 12:16:42AM -0800, Junio C Hamano wrote:
>
>> But as Shawn pointed out, Octopus makes bisect less (much less)
>> efficient for the end users, I tend to think the current 16 is
>> already insanely large.
>
> Did you look at my "why I need a huge octopus" description? Is there a
> better way to do it? Should I simply do a bunch of pair-wise merges?
> I'll almost certainly never bisect it,...

I hate having to compose this message because I know I will end
up saying negative things without offering anything constructive.

I do not think bundling commits from unrelated multiple projects
in one commit (some people seem to have called this Hydra in the
past) is a good practice, regardless of size.

For the sake of simplicity, suppose you are bundling two
projects A and B. The first such commit would have two parent
commits (the current tips of A and B). Next time you create
another Hydra, what will be its parents?

 * You do not care about the ancestry of Hydra itself, so it has
   two parents, then-current tips of A and B?

 * You do care about the ancestry of Hydra, so the first parent
   is the previous Hydra commit, the second parent is the
   then-current tip of A and the third parent is B?

If you do the former, then I do not think people can follow your
progress unless they have access to your reflog, so I am guessing
that you are doing the latter.

Now, do you have some files that are maintained by Hydra itself?
Duct tape to hold these projects together, perhaps a Makefile to
build the whole thing that does not belong to either A or B? I am
also guessing the answer is yes, but you said you won't bisect it,
so maybe this is not an issue.

But let's pretend you have something in the Hydra itself whose
evolution history you care about. Then, perhaps you would need
to merge the ancestry of Hydras from time to time, if you have
multiple concurrent development tracks of the bundled project.
That means we cannot say the first parent is from Hydra itself
and the rest are component projects anymore (well, we cannot say
that for the initial Hydra commit itself already, but we could
always special case the "root" commit). Perhaps we could say
"the last N are components", but then it is not clear what
happens when you add a new component.

What bothers me is that in the usual commit all parents are
equal, but in this case, you have different kinds of "parent"
commits and from the structure of the ancestry chain, you cannot
tell which is what kind. The ancestry chain of some "parent"
commits represents how the bundling of components has evolved,
while other "parent" commits are just pointers into a different
history.

Although pointers to component project commits are represented
as "parent" field in commit objects, I suspect that you wish
they were treated as if they were tree objects contained in the
toplevel commits more often than not for the purposes of many
git operations.

If we think about how bisect and merge _should_ work on such
ancestry chain of Hydras, my gut feeling is that the only way
that makes sense is to take only the first kind of ancestry (the
evolution of the bundling of components) into account. Use them
to determine the merge base to perform 3-way merge, count them to
find the bisection point, etc.

I am not saying that the problem you are trying to solve is a
wrong problem. Rather, it is showing a gap between the
structure you are trying to express and the semantics of
ancestry chain git offers.

Currently nothing but commit objects can have more than one
pointer to other commit objects, so if you wanted to, making an
Octopus to fake it may be the only way to do so. But the
ancestry chain semantics git currently offers are not set up to
distinguish the two different meanings of "parent" you are trying
to assign to commits, so it is very likely that many things git
naturally does will not match what you expect. I think git-log
(without any diff options nor paths limiter) to view the
linearized sequence of commit messages is about the only thing
that makes some sense, and the size limit of Octopus would
probably end up being the least of your problems.

So in that sense, I would much prefer a solution based on the
approach somebody (sorry I forgot who did this) proposed in the
past, where "the (single) tree object contained in the top-level
commit has pointers that point at commits of subprojects" (well,
the very original idea was Linus's "gitlink", which is probably
more than a year old).

Before concluding...

Yes, I am aware that you do not even intend to build on top of
the history of your imported-from-CVS, so in that sense, you do
not care about the ancestry of Hydra itself (it does not even
have a history -- just a single state). It's such a one-shot
thing that we probably should not even care about (and your
commit-tree patch is fine -- I think the only thing in the core
git that cares about the maximum number of parents a commit can
have is git-blame), but I thought I should mention that it would
be an ideal application for proper subproject support.
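For contrast, the tree-based approach preferred here keeps the component pointers out of the parent list. A minimal sketch of that data model, assuming a tree entry holds a mode, a name, and a 20-byte object name: an entry that points at a subproject's commit is marked by its mode, so it cannot be confused with the commit's own parents. The mode value 0160000 is the one git later adopted for gitlinks; at the time of this thread it was still a proposal, and the struct and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Entry modes as they appear in git tree objects; 0160000 marks a gitlink. */
#define MODE_TREE    0040000u	/* subdirectory */
#define MODE_BLOB    0100644u	/* regular file */
#define MODE_GITLINK 0160000u	/* commit in another project's history */

/* Illustrative tree entry: mode, path component, and 20-byte object name. */
struct tree_entry {
	unsigned mode;
	const char *name;
	unsigned char sha1[20];
};

/* A subproject pointer is recognizable from the tree entry alone. */
static int is_subproject(const struct tree_entry *e)
{
	return e->mode == MODE_GITLINK;
}

/* Count the component projects bundled in one top-level tree. */
static size_t count_subprojects(const struct tree_entry *entries, size_t n)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++)
		if (is_subproject(&entries[i]))
			count++;
	return count;
}
```

In a layout like this, each bundled project is one tree entry, and the top-level commit's parent list records only the bundle's own history, which is the distinction between the two kinds of "parent" that the octopus cannot express.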

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Jeff King @ 2007-02-27 10:52 UTC
To: Junio C Hamano; +Cc: Johannes Schindelin, Shawn O. Pearce, git

On Tue, Feb 27, 2007 at 02:23:02AM -0800, Junio C Hamano wrote:

> I hate having to compose this message because I know I will end
> up saying negative things without offering anything constructive.

First off, thanks for a thoughtful and well-written response; it was, in
fact, constructive in getting me to think about my setup. Subproject
support is actually what I want here.

> I do not think bundling commits from unrelated multiple projects
> in one commit (some people seem to have called this Hydra in the
> past) is a good practice, regardless of size.

I'll assume by "unrelated" here you mean in the git sense; that is, not
sharing any commit history. My projects are, in fact, semantically
related. Let me describe a little further.

I had a CVS repository consisting of school work over the past several
years, with one directory per class:

  school/cs101
  school/cs201
  etc.

A year or so ago, I started using git, and imported all of my CVS repos
to git. I did each class directory separately, reasoning that each
represented a separate history. This can end up being unwieldy, because
there are dozens of repositories; I would now like to group them in the
same repo for ease of clone/fetch.

Similarly, each new class I take gets its own repo, which is quite
convenient when actively committing. However, I would like to "archive"
it in the main repo.

Thus, I believe a central "archive" repo to which I could add
"subproject" pointers to each class's commit history would be ideal.
Something like the gitlink or subproject support which has been talked
about would work fine.

OTOH, I am fortunate that this is not a "real" distributed project. I
think the most convenient thing might be to simply rewrite the history
of each class, pushing all of its files into a subdirectory from the
main history.

> * You do not care about the ancestry of Hydra itself, so it has
>   two parents, then-current tips of A and B?

This is my case; the hydra had no history at all.

> Now, do you have some files that are maintained by Hydra itself?

Nope. It's purely a set of subprojects.

> Although pointers to component project commits are represented
> as "parent" field in commit objects, I suspect that you wish
> they were treated as if they were tree objects contained in the
> toplevel commits more often than not for the purposes of many
> git operations.

Yes, that is exactly correct. One problem I realized after doing this is
that you get unexpected results from "git-whatchanged -- subproject/".
My first expectation was to see _just_ the history of the subproject.
But of course, you see only the merge commit, since the previous commits
for that subproject didn't have that path at all (they were in the
root!). Subproject support would fix that, as would simply rewriting
the history.

> I am not saying that the problem you are trying to solve is a
> wrong problem. Rather, it is showing a gap between the
> structure you are trying to express and the semantics of
> ancestry chain git offers.

Agreed. I have come around to the conclusion that this is an abuse of
the parent pointers.

> Yes, I am aware that you do not even intend to build on top of
> the history of your imported-from-CVS, so in that sense, you do
> not care about the ancestry of Hydra itself (it does not even
> have a history -- just a single state). It's such a one-shot

I thought about that, too, but I actually _might_ want to make a commit
(e.g., I'm keeping most of this around for reference code. If the
reference code has a minor bug, it would be nice to fix it).

Anyway, thanks for your comments. I think I will look at simply
rewriting the history as if it were one big repository. I think it is
simplest in this case since I have the luxury of a private repo.

-Peff

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Junio C Hamano @ 2007-02-27 11:06 UTC
To: Jeff King; +Cc: Johannes Schindelin, Shawn O. Pearce, git

Jeff King <peff@peff.net> writes:

> On Tue, Feb 27, 2007 at 02:23:02AM -0800, Junio C Hamano wrote:
>
>> Although pointers to component project commits are represented
>> as "parent" field in commit objects, I suspect that you wish
>> they were treated as if they were tree objects contained in the
>> toplevel commits more often than not for the purposes of many
>> git operations.
>
> Yes, that is exactly correct. One problem I realized after doing this is
> that you get unexpected results from "git-whatchanged -- subproject/".
> My first expectation was to see _just_ the history of the subproject.
> But of course, you see only the merge commit, since the previous commits
> for that subproject didn't have that path at all (they were in the
> root!). Subproject support would fix that, as would simply rewriting
> the history.

For the record, I am aware of the fact that the recent git.git
itself exhibits this exact problem, due to the subtree merge of
git-gui repository. I haven't got to the point of being annoyed
enough to regret it, but running "git show" on merges from Shawn
always needs -M option to make heads or tails of.

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Andy Parkins @ 2007-02-27 11:31 UTC
To: git; +Cc: Jeff King, Junio C Hamano, Johannes Schindelin, Shawn O. Pearce

On Tuesday 2007 February 27 10:52, Jeff King wrote:

> A year or so ago, I started using git, and imported all of my CVS repos
> to git. I did each class directory separately, reasoning that each
> represented a separate history. This can end up being unwieldy, because
> there are dozens of repositories; I would now like to group them in the
> same repo for ease of clone/fetch.

It doesn't have fetch or clone support, but perhaps my poorman's submodule
code will help you a bit, until real submodule support appears in git.

  http://marc2.theaimsgroup.com/?l=git&m=116662031219222&w=2

I've found it useful for myself, and with a little bit of massaging I don't
think it would be hard to put fetch support in. I think it just needs a way
of telling the remote end to switch GIT_DIR to a different directory.

"Merges" are handled by simply sorting out any conflicts in the module file
(basically you pick one of the submodule commits) just as you would with
any other file.

As I say - not fabulous, but gets me by.

Andy
-- 
Dr Andy Parkins, M Eng (hons), MIET
andyparkins@gmail.com

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Jeff King @ 2007-02-27 11:39 UTC
To: Andy Parkins; +Cc: git, Junio C Hamano, Johannes Schindelin, Shawn O. Pearce

On Tue, Feb 27, 2007 at 11:31:52AM +0000, Andy Parkins wrote:

> > there are dozens of repositories; I would now like to group them in the
> > same repo for ease of clone/fetch.
>
> It doesn't have fetch or clone support, but perhaps my poorman's submodule
> code will help you a bit, until real submodule support appears in git.

Thanks for the pointer, but it doesn't handle one of my pet peeves with
many repositories: fetching 25 repositories takes a long time. I have a
"look at every repository and see if anything needs fetched or pushed"
script; it takes about 0.5-1.0 seconds per repository. Turning 25
fetches into 1 makes it a lot nicer to use.

So of all the problems hoped to be solved by submodule support, I think
your poor man's submodule support solves the ones I don't care about
(tracking external repositories with merge resolution) but not the one I
do (fetch/clone effort). :)

-Peff

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Jakub Narebski @ 2007-02-27 14:04 UTC
To: git

Jeff King wrote:

> On Tue, Feb 27, 2007 at 11:31:52AM +0000, Andy Parkins wrote:
>
>>> there are dozens of repositories; I would now like to group them in the
>>> same repo for ease of clone/fetch.
>>
>> It doesn't have fetch or clone support, but perhaps my poorman's submodule
>> code will help you a bit, until real submodule support appears in git.
>
> Thanks for the pointer, but it doesn't handle one of my pet peeves with
> many repositories: fetching 25 repositories takes a long time. I have a
> "look at every repository and see if anything needs fetched or pushed"
> script; it takes about 0.5-1.0 seconds per repository. Turning 25
> fetches into 1 makes it a lot nicer to use.
>
> So of all the problems hoped to be solved by submodule support, I think
> your poor man's submodule support solves the ones I don't care about
> (tracking external repositories with merge resolution) but not the one I
> do (fetch/clone effort). :)

See http://git.or.cz/gitwiki/SubprojectSupport which mentions the
prototype submodules/subprojects implementation by Martin Waitz, which
stores a link to the submodule commit in the tree (so trees have links
to trees, to blobs, and to submodules/commits).

BTW. Andy, could you add a note about your lightweight submodule support
to this page? TIA.

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

* Re: [RFC/PATCH] commit-tree: bump MAX_PARENTS to 128
From: Johannes Schindelin @ 2007-02-27 15:46 UTC
To: Junio C Hamano; +Cc: Jeff King, Shawn O. Pearce, git

Hi,

On Tue, 27 Feb 2007, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > On Tue, Feb 27, 2007 at 12:16:42AM -0800, Junio C Hamano wrote:
> >
> >> But as Shawn pointed out, Octopus makes bisect less (much less)
> >> efficient for the end users, I tend to think the current 16 is
> >> already insanely large.
> >
> > Did you look at my "why I need a huge octopus" description? Is there a
> > better way to do it? Should I simply do a bunch of pair-wise merges?
> > I'll almost certainly never bisect it,...
>
> I hate having to compose this message because I know I will end up
> saying negative things without offering anything constructive.

IMHO this discussion is anything but unconstructive.

> I do not think bundling commits from unrelated multiple projects in one
> commit (some people seem to have called this Hydra in the past) is a
> good practice, regardless of size.

Yesterday, I kicked an idea around on IRC with Sam: Darcs does not have
branches as we do. Sam was nice enough to show me a picture:

  http://utsl.gen.nz/git/hydra-vs-regular.png

For those poor souls stuck with a text terminal, it looks like this:

Right image:

  A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z-a-b-c-d-e

Left image:

  [ASCII drawing lost in this copy: the same commits A through e,
  rearranged Darcs-style into many short parallel strands (for example
  F-G-H-I, N-O and T-U-V-W-X), with many strands converging on e]

(Puh! Can you believe how much time such a picture takes?)

So, the right image is what it would look like if you just committed
everything with Git, and the left image what it would look like with
Darcs.

Now, I never have worked with Darcs, but I _could_ imagine that it would
be useful for some workflows to generate the Darcs layout automatically
from the Git layout. In that case, a _lot_ of parents should be allowed.

Ciao,
Dscho
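The idea of generating a Darcs-like layout from a linear Git history can be sketched under a strong simplifying assumption. This is not how Darcs decides dependencies, and nothing like it exists in git. Two commits count as dependent when they touch a common file; each commit gets as parents only the most recent earlier commit per file it touches; and everything left without a child is merged in one final commit. Unrelated changes then end up as parallel heads, which is why that final merge can need many parents. The function name and the bitmask encoding of files are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COMMITS 64

/*
 * files[i] is a bitmask of the (up to 32) files that linear commit i touches.
 * Each commit depends on the most recent earlier commit touching each of its
 * files. Writes the commits nothing later depends on into heads[] (oldest
 * first) and returns how many there are: the parents of the final merge.
 */
static size_t darcs_style_heads(const unsigned *files, size_t n, size_t *heads)
{
	int has_child[MAX_COMMITS] = { 0 };
	size_t nr = 0;

	assert(n <= MAX_COMMITS);
	for (size_t i = 0; i < n; i++) {
		unsigned pending = files[i];	/* files still needing a parent */
		for (size_t j = i; j-- > 0 && pending; ) {
			if (files[j] & pending) {
				has_child[j] = 1;	/* j becomes a parent of i */
				pending &= ~files[j];
			}
		}
	}
	for (size_t i = 0; i < n; i++)
		if (!has_child[i])
			heads[nr++] = i;
	return nr;
}
```

If every commit touches the same file, the result is a single head (a straight-line history); if each commit touches a different file, every commit becomes a head and the final merge takes all of them as parents.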