* q: faster way to integrate/merge lots of topic branches? @ 2008-07-23 13:05 Ingo Molnar 2008-07-23 13:17 ` Ingo Molnar ` (6 more replies) 0 siblings, 7 replies; 32+ messages in thread From: Ingo Molnar @ 2008-07-23 13:05 UTC (permalink / raw) To: git I've got the following, possibly stupid question: is there a way to merge a healthy number of topic branches into the master branch in a quicker way, when most of the branches are already merged up? Right now i've got something like this scripted up: for B in $(git-branch | cut -c3- ); do git-merge $B; done It takes a lot of time to run on even a 3.45GHz box: real 0m53.228s user 0m41.134s sys 0m11.405s I just had a workflow incident where i forgot that this script was running in one window (53 seconds are a _long_ time to start doing some other stuff :-), i switched branches and the script merrily chugged away merging branches into a topic branch i did not intend. It iterates over 140 branches - but all of them are already merged up. Anyone can simulate it by switching to the linus/master branch of the current Linux kernel tree, and doing: time for ((i=0; i<140; i++)); do git-merge v2.6.26; done real 1m26.397s user 1m10.048s sys 0m13.944s One could argue that determining whether it's all merged up already is a complex task, but but even this seemingly trivial merge of HEAD into HEAD is quite slow: time for ((i=0; i<140; i++)); do git-merge HEAD; done real 0m17.871s user 0m8.977s sys 0m8.396s I'm wondering whether there are tricks to speed this up. The real script i'm using is much longer and obscured with boring details like errors, conflicts, etc. - but the above is the gist of it. (and that is what makes it slow primarily) Using a speculative Octopus might be one approach, but that runs into the octopus merge limitation at 24 branches, and it also is quite slow as well. (and is not equivalent to the serial merge of 140 branches) I have thought of using the last CommitDate of the topic branch and compare it with the last CommitDate of the master branch [and i can trust those values] - that would be a lot faster - but maybe i'm missing something trivial that makes that approach unworkable. It would also be nice to have a builtin shortcut for that instead of having to go via "git-log --pretty=fuller" to dump the CommitDate field. builtin-integrate.c perhaps? ;-) Ingo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:05 q: faster way to integrate/merge lots of topic branches? Ingo Molnar @ 2008-07-23 13:17 ` Ingo Molnar 2008-07-23 13:49 ` Ingo Molnar 2008-07-23 13:40 ` Andreas Ericsson ` (5 subsequent siblings) 6 siblings, 1 reply; 32+ messages in thread From: Ingo Molnar @ 2008-07-23 13:17 UTC (permalink / raw) To: git * Ingo Molnar <mingo@elte.hu> wrote: > I have thought of using the last CommitDate of the topic branch and > compare it with the last CommitDate of the master branch [and i can > trust those values] - that would be a lot faster - but maybe i'm > missing something trivial that makes that approach unworkable. It > would also be nice to have a builtin shortcut for that instead of > having to go via "git-log --pretty=fuller" to dump the CommitDate > field. hm, this method would be fragile if done purely within my integration script, as the timestamp of the head would have to be updated atomically, while always merging all the topic branches in one such transaction. (so that the timestamps do not get out of sync and a topic branch is not skipped by accident) So i guess it's better to just create a separate .git/refs/merge-cache/ hierarchy with timestamps of last merged branches and their head sha1 ... but maybe i'm banging on open doors? Ingo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:17 ` Ingo Molnar @ 2008-07-23 13:49 ` Ingo Molnar 2008-07-23 14:47 ` Jay Soffian 0 siblings, 1 reply; 32+ messages in thread From: Ingo Molnar @ 2008-07-23 13:49 UTC (permalink / raw) To: git * Ingo Molnar <mingo@elte.hu> wrote: > So i guess it's better to just create a separate > .git/refs/merge-cache/ hierarchy with timestamps of last merged > branches and their head sha1 ... but maybe i'm banging on open doors? here's the git-fastmerge script i've whipped up in 10 minutes. It does the trick nicely for me: first run: real 0m53.228s user 0m41.134s sys 0m11.405s second run: real 0m2.751s user 0m1.280s sys 0m1.491s or a 20x speedup. Yummie! :-) It properly notices when i commit to a topic branch, and it maintains a proper matrix of <A> <- <B> merge timestamps. It even embedds the sha1's in the timestamp path so it should be quite complete. It should work fine across resets, re-merges, etc. too i think. It should work well with renamed branches as well i think. (although i dont do that all that often) In fact even if i delete the whole .git/mergecache/ hierarchy and run a 'cold' merge, it's much faster: real 0m32.129s user 0m24.456s sys 0m7.603s Because many of the branches have the same sha1 so it's already half-optimized even on the first run. Much of the remaining 2.7 seconds overhead comes from the git-log runs to retrieve the sha1s, so i guess it could all be made even faster. Now this scheme assumes that there's a sane underlying filesystem that can take these long pathnames and which has good timestamps (which i have, so it's not a worry for me). Hm? Ingo -----------------{ git-fastmerge }---------------------> #!/bin/bash usage () { echo 'usage: git-fastmerge <refspec>..' exit -1 } [ $# = 0 ] && usage BRANCH=$1 MERGECACHE=.git/mergecache [ ! -d $MERGECACHE ] && { mkdir $MERGECACHE || usage; } HEAD_SHA1=$(git-log -1 --pretty=format:"%H") BRANCH_SHA1=$(git-log -1 --pretty=format:"%H" $BRANCH) CACHE=$MERGECACHE/$HEAD_SHA1/$BRANCH_SHA1 [ -f "$CACHE" -a "$CACHE" -nt .git/refs/heads/$BRANCH_SHA1 ] && { echo "merge-cache hit on HEAD <= $1" exit 0 } git-merge $1 && { mkdir -p $(dirname $CACHE) touch $CACHE } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:49 ` Ingo Molnar @ 2008-07-23 14:47 ` Jay Soffian 2008-07-23 14:56 ` Ingo Molnar 0 siblings, 1 reply; 32+ messages in thread From: Jay Soffian @ 2008-07-23 14:47 UTC (permalink / raw) To: Ingo Molnar; +Cc: git On Wed, Jul 23, 2008 at 9:49 AM, Ingo Molnar <mingo@elte.hu> wrote: > #!/bin/bash > > usage () { > echo 'usage: git-fastmerge <refspec>..' > exit -1 > } > > [ $# = 0 ] && usage > > BRANCH=$1 > > MERGECACHE=.git/mergecache > > [ ! -d $MERGECACHE ] && { mkdir $MERGECACHE || usage; } > > HEAD_SHA1=$(git-log -1 --pretty=format:"%H") > BRANCH_SHA1=$(git-log -1 --pretty=format:"%H" $BRANCH) > > CACHE=$MERGECACHE/$HEAD_SHA1/$BRANCH_SHA1 > > [ -f "$CACHE" -a "$CACHE" -nt .git/refs/heads/$BRANCH_SHA1 ] && { Shouldn't this be: [ -f "$CACHE" -a "$CACHE" -nt .git/refs/heads/$BRANCH ] && { ? j. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 14:47 ` Jay Soffian @ 2008-07-23 14:56 ` Ingo Molnar 2008-07-23 15:06 ` Ingo Molnar 0 siblings, 1 reply; 32+ messages in thread From: Ingo Molnar @ 2008-07-23 14:56 UTC (permalink / raw) To: Jay Soffian; +Cc: git * Jay Soffian <jaysoffian@gmail.com> wrote: > > CACHE=$MERGECACHE/$HEAD_SHA1/$BRANCH_SHA1 > > > > [ -f "$CACHE" -a "$CACHE" -nt .git/refs/heads/$BRANCH_SHA1 ] && { > > Shouldn't this be: > > [ -f "$CACHE" -a "$CACHE" -nt .git/refs/heads/$BRANCH ] && { > > ? yeah, i just figured it out too ... the hard way :) Updated script below. This works fine across resets in the master branch. While it's fast in the empty-merge case, it's not as fast as i'd like it to be in the almost-empty-merge case. Ingo --------------{ git-fastmerge }--------------------> #!/bin/bash usage () { echo 'usage: tip-fastmerge <refspec>..' exit -1 } [ $# = 0 ] && usage BRANCH=$1 MERGECACHE=.git/mergecache [ ! -d $MERGECACHE ] && { mkdir $MERGECACHE || usage; } HEADREF=.git/$(cut -d' ' -f2 .git/HEAD) HEAD_SHA1=$(git-log -1 --pretty=format:"%H") BRANCH_SHA1=$(git-log -1 --pretty=format:"%H" $BRANCH) CACHE=$MERGECACHE/$HEAD_SHA1/$BRANCH_SHA1 [ -f "$CACHE" -a "$CACHE" -nt "$HEADREF" ] && { # echo "merge-cache hit on HEAD <= $1" exit 0 } git-merge $1 && { mkdir -p $(dirname $CACHE) touch $CACHE } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 14:56 ` Ingo Molnar @ 2008-07-23 15:06 ` Ingo Molnar 0 siblings, 0 replies; 32+ messages in thread From: Ingo Molnar @ 2008-07-23 15:06 UTC (permalink / raw) To: Jay Soffian; +Cc: git * Ingo Molnar <mingo@elte.hu> wrote: > > Shouldn't this be: > > > > [ -f "$CACHE" -a "$CACHE" -nt .git/refs/heads/$BRANCH ] && { > > > > ? > > yeah, i just figured it out too ... the hard way :) > > Updated script below. This works fine across resets in the master > branch. > > While it's fast in the empty-merge case, it's not as fast as i'd like > it to be in the almost-empty-merge case. When i update a topic branch, i first get a relatively fast run: earth4:~/tip> time todo-merge-all merging all branches ... Auto-merged arch/x86/kernel/genx2apic_uv_x.c Merge made by recursive. arch/x86/kernel/genx2apic_uv_x.c | 1 - 1 files changed, 0 insertions(+), 1 deletions(-) ... merge done. real 0m6.625s user 0m3.740s sys 0m2.563s Then on the next run it's slower: earth4:~/tip> time todo-merge-all merging all branches ... ... merge done. real 0m30.823s user 0m23.403s sys 0m7.545s that's unfortunate. The freshly updated topic branch was at the end of the run, now all other topic branches will have to run slow at least once until they become cached again. Perhaps the cache should update all other current topics to the new sha1, to establish the fact that they were not merged this time. (and that they are still not to be merged) (It's still much faster than completely uncached though, because of the overlap in sha1's.) Third (empty) run is fast again, because it's fully cached: earth4:~/tip> time todo-merge-all merging all branches ... ... merge done. real 0m3.036s user 0m1.360s sys 0m1.782s But it would be nice if the cache worked more intelligently in the one-topic-updated-only case as well. Ingo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:05 q: faster way to integrate/merge lots of topic branches? Ingo Molnar 2008-07-23 13:17 ` Ingo Molnar @ 2008-07-23 13:40 ` Andreas Ericsson 2008-07-23 14:02 ` Ingo Molnar 2008-07-23 14:57 ` Miklos Vajna 2008-07-23 13:41 ` Sergey Vlasov ` (4 subsequent siblings) 6 siblings, 2 replies; 32+ messages in thread From: Andreas Ericsson @ 2008-07-23 13:40 UTC (permalink / raw) To: Ingo Molnar; +Cc: git Ingo Molnar wrote: > I've got the following, possibly stupid question: is there a way to > merge a healthy number of topic branches into the master branch in a > quicker way, when most of the branches are already merged up? > > Right now i've got something like this scripted up: > > for B in $(git-branch | cut -c3- ); do git-merge $B; done > > It takes a lot of time to run on even a 3.45GHz box: > > real 0m53.228s > user 0m41.134s > sys 0m11.405s > > I just had a workflow incident where i forgot that this script was > running in one window (53 seconds are a _long_ time to start doing some > other stuff :-), i switched branches and the script merrily chugged away > merging branches into a topic branch i did not intend. > > It iterates over 140 branches - but all of them are already merged up. > With the builtin merge (which is in next), this should be doable with an octopus merge, which will eliminate the branches that are already fully merged, resulting in a less-than-140-way merge (thank gods...). It also doesn't have the 24-way cap that the scripted version suffers from. If it does a good job at your rather extreme use-case, I'd say it's good enough for 'master' pretty soon :-) -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:40 ` Andreas Ericsson @ 2008-07-23 14:02 ` Ingo Molnar 2008-07-23 14:57 ` Miklos Vajna 1 sibling, 0 replies; 32+ messages in thread From: Ingo Molnar @ 2008-07-23 14:02 UTC (permalink / raw) To: Andreas Ericsson; +Cc: git * Andreas Ericsson <ae@op5.se> wrote: > Ingo Molnar wrote: >> I've got the following, possibly stupid question: is there a way to >> merge a healthy number of topic branches into the master branch in a >> quicker way, when most of the branches are already merged up? >> >> Right now i've got something like this scripted up: >> >> for B in $(git-branch | cut -c3- ); do git-merge $B; done >> >> It takes a lot of time to run on even a 3.45GHz box: >> >> real 0m53.228s >> user 0m41.134s >> sys 0m11.405s >> >> I just had a workflow incident where i forgot that this script was >> running in one window (53 seconds are a _long_ time to start doing some >> other stuff :-), i switched branches and the script merrily chugged >> away merging branches into a topic branch i did not intend. >> >> It iterates over 140 branches - but all of them are already merged up. >> > > With the builtin merge (which is in next), this should be doable with > an octopus merge, which will eliminate the branches that are already > fully merged, resulting in a less-than-140-way merge (thank gods...). > It also doesn't have the 24-way cap that the scripted version suffers > from. > > If it does a good job at your rather extreme use-case, I'd say it's > good enough for 'master' pretty soon :-) hm, while i do love octopus merges [*] for release and bisection-quality purposes, for throw-away (delta-)integration runs it's more manageable to do a predictable series of one-on-one merges. It results in better git-rerere behavior, has easier (to the human) conflict resolutions and the octopus merge also falls apart quite easily when it runs into conflicts. Furthermore, i've often seen octopus merges fail while a series of 1:1 merges succeeded. What i could try is to do a speculative octopus merge, in the hope of it just going fine - and then fall back to the serial merge if it fails? The git-fastmerge approach is probably still faster though - and certainly simpler from a workflow POV. Ingo [*] take a look at these in the Linux kernel -git repo: gitk 3c1ca43fafea41e38cb2d0c1684119af4c1de547 gitk 6924d1ab8b7bbe5ab416713f5701b3316b2df85b ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:40 ` Andreas Ericsson 2008-07-23 14:02 ` Ingo Molnar @ 2008-07-23 14:57 ` Miklos Vajna 1 sibling, 0 replies; 32+ messages in thread From: Miklos Vajna @ 2008-07-23 14:57 UTC (permalink / raw) To: Andreas Ericsson; +Cc: Ingo Molnar, git [-- Attachment #1: Type: text/plain, Size: 172 bytes --] On Wed, Jul 23, 2008 at 03:40:41PM +0200, Andreas Ericsson <ae@op5.se> wrote: > With the builtin merge (which is in next) Just a small correction: it's already in master. [-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:05 q: faster way to integrate/merge lots of topic branches? Ingo Molnar 2008-07-23 13:17 ` Ingo Molnar 2008-07-23 13:40 ` Andreas Ericsson @ 2008-07-23 13:41 ` Sergey Vlasov 2008-07-23 14:09 ` Ingo Molnar 2008-07-23 13:56 ` SZEDER Gábor ` (3 subsequent siblings) 6 siblings, 1 reply; 32+ messages in thread From: Sergey Vlasov @ 2008-07-23 13:41 UTC (permalink / raw) To: Ingo Molnar; +Cc: git [-- Attachment #1: Type: text/plain, Size: 889 bytes --] On Wed, 23 Jul 2008 15:05:18 +0200 Ingo Molnar wrote: > Anyone can simulate it by switching to the linus/master branch of the > current Linux kernel tree, and doing: > > time for ((i=0; i<140; i++)); do git-merge v2.6.26; done > > real 1m26.397s > user 1m10.048s > sys 0m13.944s Timing results here (E6750 @ 2.66GHz): 41.61s user 3.71s system 99% cpu 45.530 total However, testing whether there is something new to merge could be performed significantly faster: $ time sh -c 'for ((i=0; i<140; i++)); do [ -n "$(git rev-list --max-count=1 v2.6.26 ^HEAD)" ]; done' sh -c 5.49s user 0.26s system 99% cpu 5.786 total The same loop with "git merge-base v2.6.26 HEAD" takes about 40 seconds here - apparently finding the merge base is the expensive part, and it makes sense to avoid it if you expect that most of your branches do not contain anything new to merge. [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:41 ` Sergey Vlasov @ 2008-07-23 14:09 ` Ingo Molnar 2008-07-23 14:14 ` Ingo Molnar 0 siblings, 1 reply; 32+ messages in thread From: Ingo Molnar @ 2008-07-23 14:09 UTC (permalink / raw) To: Sergey Vlasov; +Cc: git * Sergey Vlasov <vsu@altlinux.ru> wrote: > On Wed, 23 Jul 2008 15:05:18 +0200 Ingo Molnar wrote: > > > Anyone can simulate it by switching to the linus/master branch of the > > current Linux kernel tree, and doing: > > > > time for ((i=0; i<140; i++)); do git-merge v2.6.26; done > > > > real 1m26.397s > > user 1m10.048s > > sys 0m13.944s > > Timing results here (E6750 @ 2.66GHz): > 41.61s user 3.71s system 99% cpu 45.530 total > > However, testing whether there is something new to merge could be > performed significantly faster: > > $ time sh -c 'for ((i=0; i<140; i++)); do [ -n "$(git rev-list --max-count=1 v2.6.26 ^HEAD)" ]; done' > sh -c 5.49s user 0.26s system 99% cpu 5.786 total > > The same loop with "git merge-base v2.6.26 HEAD" takes about 40 > seconds here - apparently finding the merge base is the expensive > part, and it makes sense to avoid it if you expect that most of your > branches do not contain anything new to merge. using git-fastmerge i get 2.4 seconds: $ time for ((i=0; i<140; i++)); do git-fastmerge v2.6.26; done [...] real 0m2.388s user 0m1.211s sys 0m1.131s for something that 'progresses' in a forward manner (which merges do fundamentally) nothing beats the performance of a timestamped cache i think. at least for my usecase. Even assuming that the filesystem is sane, is my merge-cache implementation semantically equivalent to a git-merge? One detail is that i suspect it is not equivalent in the git-merge --no-ff case. (but that is a not too interesting non-default case anyway) Ingo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 14:09 ` Ingo Molnar @ 2008-07-23 14:14 ` Ingo Molnar 0 siblings, 0 replies; 32+ messages in thread From: Ingo Molnar @ 2008-07-23 14:14 UTC (permalink / raw) To: Sergey Vlasov; +Cc: git * Ingo Molnar <mingo@elte.hu> wrote: > Even assuming that the filesystem is sane, is my merge-cache > implementation semantically equivalent to a git-merge? One detail is > that i suspect it is not equivalent in the git-merge --no-ff case. > (but that is a not too interesting non-default case anyway) actually, since --no-ff creates a merge commit and thus propagates the head sha1, this should work fine as well. (besides the small detail that my script has $1 hardcoded so parameters are not properly passed onto.) Ingo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:05 q: faster way to integrate/merge lots of topic branches? Ingo Molnar ` (2 preceding siblings ...) 2008-07-23 13:41 ` Sergey Vlasov @ 2008-07-23 13:56 ` SZEDER Gábor 2008-07-23 14:04 ` Ingo Molnar 2008-07-23 14:06 ` Björn Steinbrink ` (2 subsequent siblings) 6 siblings, 1 reply; 32+ messages in thread From: SZEDER Gábor @ 2008-07-23 13:56 UTC (permalink / raw) To: Ingo Molnar; +Cc: git Hi, On Wed, Jul 23, 2008 at 03:05:18PM +0200, Ingo Molnar wrote: > I've got the following, possibly stupid question: is there a way to > merge a healthy number of topic branches into the master branch in a > quicker way, when most of the branches are already merged up? > > Right now i've got something like this scripted up: > > for B in $(git-branch | cut -c3- ); do git-merge $B; done you cound use 'git branch --no-merged' to list only those branches that have not been merged into your current HEAD. Üdv, Gábor ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:56 ` SZEDER Gábor @ 2008-07-23 14:04 ` Ingo Molnar 2008-07-23 17:59 ` Junio C Hamano ` (2 more replies) 0 siblings, 3 replies; 32+ messages in thread From: Ingo Molnar @ 2008-07-23 14:04 UTC (permalink / raw) To: SZEDER Gábor; +Cc: git * SZEDER Gábor <szeder@ira.uka.de> wrote: > Hi, > > On Wed, Jul 23, 2008 at 03:05:18PM +0200, Ingo Molnar wrote: > > I've got the following, possibly stupid question: is there a way to > > merge a healthy number of topic branches into the master branch in a > > quicker way, when most of the branches are already merged up? > > > > Right now i've got something like this scripted up: > > > > for B in $(git-branch | cut -c3- ); do git-merge $B; done > you cound use 'git branch --no-merged' to list only those branches > that have not been merged into your current HEAD. hm, it's very slow: $ time git branch --no-merged [...] real 0m9.177s user 0m9.027s sys 0m0.129s when running it on tip/master: http://people.redhat.com/mingo/tip.git/README Ingo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 14:04 ` Ingo Molnar @ 2008-07-23 17:59 ` Junio C Hamano 2008-07-23 22:09 ` [PATCH 1/2] builtin-branch.c: remove unused code in append_ref() callback function Junio C Hamano ` (2 more replies) 2008-07-23 18:04 ` Linus Torvalds 2008-07-23 20:01 ` Junio C Hamano 2 siblings, 3 replies; 32+ messages in thread From: Junio C Hamano @ 2008-07-23 17:59 UTC (permalink / raw) To: Ingo Molnar; +Cc: SZEDER Gábor, git Ingo Molnar <mingo@elte.hu> writes: > * SZEDER Gábor <szeder@ira.uka.de> wrote: > >> you cound use 'git branch --no-merged' to list only those branches >> that have not been merged into your current HEAD. > > hm, it's very slow: Yeah, --no-merged and --merged were done in a quite naïve way. The patch needs to be cleaned up by splitting it into multiple steps: (1) discard everything outside refs/heads and refs/remotes in append_ref(); why do we even have code to deal with refs/tags to begin with??? (2) change ref_item->sha1 to ref_item->commit (and make has_commit() take struct commit); (3) teach merge_filter code not to do has_commit() for each ref, but use revision traversal machinery to compute everything in parallel and in one traversal. but other than that, this seems to pass the tests, and is obviously correct ;-) With an artificial repository that has "master" and 1000 test-$i branches where they were created by "git branch test-$i master~$i": (with patch) $ /usr/bin/time git-branch --no-merged master >/dev/null 0.12user 0.02system 0:00.15elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+1588minor)pagefaults 0swaps $ /usr/bin/time git-branch --no-merged test-200 >/dev/null 0.15user 0.03system 0:00.18elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+1711minor)pagefaults 0swaps (without patch) $ /usr/bin/time git-branch --no-merged master >/dev/null 0.69user 0.03system 0:00.72elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+2229minor)pagefaults 0swaps $ /usr/bin/time git-branch --no-merged test-200 >/dev/null 0.58user 0.03system 0:00.61elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+2248minor)pagefaults 0swaps --- builtin-branch.c | 68 ++++++++++++++++++++++++++++++++--------------------- 1 files changed, 41 insertions(+), 27 deletions(-) diff --git a/builtin-branch.c b/builtin-branch.c index b885bd1..788e70a 100644 --- a/builtin-branch.c +++ b/builtin-branch.c @@ -13,6 +13,8 @@ #include "remote.h" #include "parse-options.h" #include "branch.h" +#include "diff.h" +#include "revision.h" static const char * const builtin_branch_usage[] = { "git branch [options] [-r | -a] [--merged | --no-merged]", @@ -181,25 +183,21 @@ static int delete_branches(int argc, const char **argv, int force, int kinds) struct ref_item { char *name; unsigned int kind; - unsigned char sha1[20]; + struct commit *commit; }; struct ref_list { + struct rev_info revs; int index, alloc, maxwidth; struct ref_item *list; struct commit_list *with_commit; int kinds; }; -static int has_commit(const unsigned char *sha1, struct commit_list *with_commit) +static int has_commit(struct commit *commit, struct commit_list *with_commit) { - struct commit *commit; - if (!with_commit) return 1; - commit = lookup_commit_reference_gently(sha1, 1); - if (!commit) - return 0; while (with_commit) { struct commit *other; @@ -215,6 +213,7 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, { struct ref_list *ref_list = (struct ref_list*)(cb_data); struct ref_item *newitem; + struct commit *commit; int kind = REF_UNKNOWN_TYPE; int len; static struct commit_list branch; @@ -226,13 +225,15 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, } else if (!prefixcmp(refname, "refs/remotes/")) { kind = REF_REMOTE_BRANCH; refname += 13; - } else if (!prefixcmp(refname, "refs/tags/")) { - kind = REF_TAG; - refname += 10; - } + } else + return 0; + + commit = lookup_commit_reference_gently(sha1, 1); + if (!commit) + return error("branch '%s' does not point at a commit", refname); /* Filter with with_commit if specified */ - if (!has_commit(sha1, ref_list->with_commit)) + if (!has_commit(commit, ref_list->with_commit)) return 0; /* Don't add types the caller doesn't want */ @@ -243,30 +244,25 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, branch.item = lookup_commit_reference_gently(sha1, 1); if (!branch.item) die("Unable to lookup tip of branch %s", refname); - if (merge_filter == SHOW_NOT_MERGED && - has_commit(merge_filter_ref, &branch)) - return 0; - if (merge_filter == SHOW_MERGED && - !has_commit(merge_filter_ref, &branch)) - return 0; + add_pending_object(&ref_list->revs, + (struct object *)branch.item, refname); } /* Resize buffer */ if (ref_list->index >= ref_list->alloc) { ref_list->alloc = alloc_nr(ref_list->alloc); ref_list->list = xrealloc(ref_list->list, - ref_list->alloc * sizeof(struct ref_item)); + ref_list->alloc * sizeof(struct ref_item)); } /* Record the new item */ newitem = &(ref_list->list[ref_list->index++]); newitem->name = xstrdup(refname); newitem->kind = kind; - hashcpy(newitem->sha1, sha1); + newitem->commit = commit; len = strlen(newitem->name); if (len > ref_list->maxwidth) ref_list->maxwidth = len; - return 0; } @@ -309,7 +305,13 @@ static void print_ref_item(struct ref_item *item, int maxwidth, int verbose, { char c; int color; - struct commit *commit; + struct commit *commit = item->commit; + + if (merge_filter != NO_FILTER) { + int is_merged = !!(item->commit->object.flags & UNINTERESTING); + if (is_merged != (merge_filter == SHOW_MERGED)) + return; + } switch (item->kind) { case REF_LOCAL_BRANCH: @@ -337,7 +339,7 @@ static void print_ref_item(struct ref_item *item, int maxwidth, int verbose, strbuf_init(&subject, 0); stat[0] = '\0'; - commit = lookup_commit(item->sha1); + commit = item->commit; if (commit && !parse_commit(commit)) { pretty_print_commit(CMIT_FMT_ONELINE, commit, &subject, 0, NULL, NULL, 0, 0); @@ -350,7 +352,7 @@ static void print_ref_item(struct ref_item *item, int maxwidth, int verbose, printf("%c %s%-*s%s %s %s%s\n", c, branch_get_color(color), maxwidth, item->name, branch_get_color(COLOR_BRANCH_RESET), - find_unique_abbrev(item->sha1, abbrev), + find_unique_abbrev(item->commit->object.sha1, abbrev), stat, sub); strbuf_release(&subject); } else { @@ -363,22 +365,34 @@ static void print_ref_list(int kinds, int detached, int verbose, int abbrev, str { int i; struct ref_list ref_list; + struct commit *head_commit = lookup_commit_reference_gently(head_sha1, 1); memset(&ref_list, 0, sizeof(ref_list)); ref_list.kinds = kinds; ref_list.with_commit = with_commit; + if (merge_filter != NO_FILTER) + init_revisions(&ref_list.revs, NULL); for_each_ref(append_ref, &ref_list); + if (merge_filter != NO_FILTER) { + struct commit *filter; + filter = lookup_commit_reference_gently(merge_filter_ref, 0); + filter->object.flags |= UNINTERESTING; + add_pending_object(&ref_list.revs, + (struct object *) filter, ""); + ref_list.revs.limited = 1; + prepare_revision_walk(&ref_list.revs); + } qsort(ref_list.list, ref_list.index, sizeof(struct ref_item), ref_cmp); detached = (detached && (kinds & REF_LOCAL_BRANCH)); - if (detached && has_commit(head_sha1, with_commit)) { + if (detached && head_commit && has_commit(head_commit, with_commit)) { struct ref_item item; item.name = xstrdup("(no branch)"); item.kind = REF_LOCAL_BRANCH; - hashcpy(item.sha1, head_sha1); + item.commit = head_commit; if (strlen(item.name) > ref_list.maxwidth) - ref_list.maxwidth = strlen(item.name); + ref_list.maxwidth = strlen(item.name); print_ref_item(&item, ref_list.maxwidth, verbose, abbrev, 1); free(item.name); } ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 1/2] builtin-branch.c: remove unused code in append_ref() callback function 2008-07-23 17:59 ` Junio C Hamano @ 2008-07-23 22:09 ` Junio C Hamano 2008-07-23 22:15 ` [PATCH] builtin-branch.c: optimize --merged and --no-merged Junio C Hamano 2008-07-24 15:29 ` q: faster way to integrate/merge lots of topic branches? Ingo Molnar 2 siblings, 0 replies; 32+ messages in thread From: Junio C Hamano @ 2008-07-23 22:09 UTC (permalink / raw) To: Ingo Molnar; +Cc: SZEDER Gábor, git We let for_each_ref() to feed all refs to append_ref() but we are only ever interested in local or remote tracking branches. Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * I ended up splitting the patch into two, not three as I originally thought I would. builtin-branch.c | 10 +++------- 1 files changed, 3 insertions(+), 7 deletions(-) diff --git a/builtin-branch.c b/builtin-branch.c index b885bd1..3708a50 100644 --- a/builtin-branch.c +++ b/builtin-branch.c @@ -22,10 +22,8 @@ static const char * const builtin_branch_usage[] = { NULL }; -#define REF_UNKNOWN_TYPE 0x00 #define REF_LOCAL_BRANCH 0x01 #define REF_REMOTE_BRANCH 0x02 -#define REF_TAG 0x04 static const char *head; static unsigned char head_sha1[20]; @@ -215,7 +213,7 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, { struct ref_list *ref_list = (struct ref_list*)(cb_data); struct ref_item *newitem; - int kind = REF_UNKNOWN_TYPE; + int kind; int len; static struct commit_list branch; @@ -226,10 +224,8 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, } else if (!prefixcmp(refname, "refs/remotes/")) { kind = REF_REMOTE_BRANCH; refname += 13; - } else if (!prefixcmp(refname, "refs/tags/")) { - kind = REF_TAG; - refname += 10; - } + } else + return 0; /* Filter with with_commit if specified */ if (!has_commit(sha1, ref_list->with_commit)) -- 1.6.0.rc0.31.g128c7 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH] builtin-branch.c: optimize --merged and --no-merged 2008-07-23 17:59 ` Junio C Hamano 2008-07-23 22:09 ` [PATCH 1/2] builtin-branch.c: remove unused code in append_ref() callback function Junio C Hamano @ 2008-07-23 22:15 ` Junio C Hamano 2008-07-24 7:16 ` Lars Hjemli 2008-07-24 8:29 ` Nanako Shiraishi 2008-07-24 15:29 ` q: faster way to integrate/merge lots of topic branches? Ingo Molnar 2 siblings, 2 replies; 32+ messages in thread From: Junio C Hamano @ 2008-07-23 22:15 UTC (permalink / raw) To: Ingo Molnar; +Cc: SZEDER Gábor, git "git branch --no-merged $commit" used to compute the merge base between the tip of each and every branch with the named $commit, but this was wasteful when you have many branches. Inside append_ref() we literally ran has_commit() between the tip of the branch and the merge_filter_ref. Instead, we can let the revision machinery traverse the history as if we are running: $ git rev-list --branches --not $commit by queueing the tips of branches we encounter as positive refs (this mimicks the "--branches" option in the above command line) and then appending the merge_filter_ref commit as a negative one, and finally calling prepare_revision_walk() to limit the list.. After the traversal is done, branch tips that are reachable from $commit are painted UNINTERESTING; they are already fully contained in $commit (i.e. --merged). Tips that are not painted UNINTERESTING still have commits that are not reachable from $commit, thus "--no-merged" will show them. With an artificial repository that has "master" and 1000 test-$i branches where they were created by "git branch test-$i master~$i": (with patch) $ /usr/bin/time git-branch --no-merged master >/dev/null 0.12user 0.02system 0:00.15elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+1588minor)pagefaults 0swaps $ /usr/bin/time git-branch --no-merged test-200 >/dev/null 0.15user 0.03system 0:00.18elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+1711minor)pagefaults 0swaps (without patch) $ /usr/bin/time git-branch --no-merged master >/dev/null 0.69user 0.03system 0:00.72elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+2229minor)pagefaults 0swaps $ /usr/bin/time git-branch --no-merged test-200 >/dev/null 0.58user 0.03system 0:00.61elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+2248minor)pagefaults 0swaps Signed-off-by: Junio C Hamano <gitster@pobox.com> --- builtin-branch.c | 59 ++++++++++++++++++++++++++++++++++------------------- 1 files changed, 38 insertions(+), 21 deletions(-) diff --git a/builtin-branch.c b/builtin-branch.c index 3708a50..5db8ad8 100644 --- a/builtin-branch.c +++ b/builtin-branch.c @@ -13,6 +13,8 @@ #include "remote.h" #include "parse-options.h" #include "branch.h" +#include "diff.h" +#include "revision.h" static const char * const builtin_branch_usage[] = { "git branch [options] [-r | -a] [--merged | --no-merged]", @@ -179,25 +181,21 @@ static int delete_branches(int argc, const char **argv, int force, int kinds) struct ref_item { char *name; unsigned int kind; - unsigned char sha1[20]; + struct commit *commit; }; struct ref_list { + struct rev_info revs; int index, alloc, maxwidth; struct ref_item *list; struct commit_list *with_commit; int kinds; }; -static int has_commit(const unsigned char *sha1, struct commit_list *with_commit) +static int has_commit(struct commit *commit, struct commit_list *with_commit) { - struct commit *commit; - if (!with_commit) return 1; - commit = lookup_commit_reference_gently(sha1, 1); - if (!commit) - return 0; while (with_commit) { struct commit *other; @@ -213,6 +211,7 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, { struct ref_list *ref_list = (struct ref_list*)(cb_data); struct ref_item *newitem; + struct commit *commit; int kind; int len; static struct commit_list branch; @@ -227,8 +226,12 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, } else return 0; + commit = lookup_commit_reference_gently(sha1, 1); + if (!commit) + return error("branch '%s' does not point at a commit", refname); + /* Filter with with_commit if specified */ - if (!has_commit(sha1, ref_list->with_commit)) + if (!has_commit(commit, ref_list->with_commit)) return 0; /* Don't add types the caller doesn't want */ @@ -239,12 +242,8 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, branch.item = lookup_commit_reference_gently(sha1, 1); if (!branch.item) die("Unable to lookup tip of branch %s", refname); - if (merge_filter == SHOW_NOT_MERGED && - has_commit(merge_filter_ref, &branch)) - return 0; - if (merge_filter == SHOW_MERGED && - !has_commit(merge_filter_ref, &branch)) - return 0; + add_pending_object(&ref_list->revs, + (struct object *)branch.item, refname); } /* Resize buffer */ @@ -258,7 +257,7 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, newitem = &(ref_list->list[ref_list->index++]); newitem->name = xstrdup(refname); newitem->kind = kind; - hashcpy(newitem->sha1, sha1); + newitem->commit = commit; len = strlen(newitem->name); if (len > ref_list->maxwidth) ref_list->maxwidth = len; @@ -305,7 +304,13 @@ static void print_ref_item(struct ref_item *item, int maxwidth, int verbose, { char c; int color; - struct commit *commit; + struct commit *commit = item->commit; + + if (merge_filter != NO_FILTER) { + int is_merged = !!(item->commit->object.flags & UNINTERESTING); + if (is_merged != (merge_filter == SHOW_MERGED)) + return; + } switch (item->kind) { case REF_LOCAL_BRANCH: @@ -333,7 +338,7 @@ static void print_ref_item(struct ref_item *item, int maxwidth, int verbose, strbuf_init(&subject, 0); stat[0] = '\0'; - commit = lookup_commit(item->sha1); + commit = item->commit; if (commit && !parse_commit(commit)) { pretty_print_commit(CMIT_FMT_ONELINE, commit, &subject, 0, NULL, NULL, 0, 0); @@ -346,7 +351,7 @@ static void print_ref_item(struct ref_item *item, int maxwidth, int verbose, printf("%c %s%-*s%s %s %s%s\n", c, branch_get_color(color), maxwidth, item->name, branch_get_color(COLOR_BRANCH_RESET), - find_unique_abbrev(item->sha1, abbrev), + find_unique_abbrev(item->commit->object.sha1, abbrev), stat, sub); strbuf_release(&subject); } else { @@ -359,22 +364,34 @@ static void print_ref_list(int kinds, int detached, int verbose, int abbrev, str { int i; struct ref_list ref_list; + struct commit *head_commit = lookup_commit_reference_gently(head_sha1, 1); memset(&ref_list, 0, sizeof(ref_list)); ref_list.kinds = kinds; ref_list.with_commit = with_commit; + if (merge_filter != NO_FILTER) + init_revisions(&ref_list.revs, NULL); for_each_ref(append_ref, &ref_list); + if (merge_filter != NO_FILTER) { + struct commit *filter; + filter = lookup_commit_reference_gently(merge_filter_ref, 0); + filter->object.flags |= UNINTERESTING; + add_pending_object(&ref_list.revs, + (struct object *) filter, ""); + ref_list.revs.limited = 1; + prepare_revision_walk(&ref_list.revs); + } qsort(ref_list.list, ref_list.index, sizeof(struct ref_item), ref_cmp); detached = (detached && (kinds & REF_LOCAL_BRANCH)); - if (detached && has_commit(head_sha1, with_commit)) { + if (detached && head_commit && has_commit(head_commit, with_commit)) { struct ref_item item; item.name = xstrdup("(no branch)"); item.kind = REF_LOCAL_BRANCH; - hashcpy(item.sha1, head_sha1); + item.commit = head_commit; if (strlen(item.name) > ref_list.maxwidth) - ref_list.maxwidth = strlen(item.name); + ref_list.maxwidth = strlen(item.name); print_ref_item(&item, ref_list.maxwidth, verbose, abbrev, 1); free(item.name); } -- 1.6.0.rc0.31.g128c7 ^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH] builtin-branch.c: optimize --merged and --no-merged 2008-07-23 22:15 ` [PATCH] builtin-branch.c: optimize --merged and --no-merged Junio C Hamano @ 2008-07-24 7:16 ` Lars Hjemli 2008-07-24 8:29 ` Nanako Shiraishi 1 sibling, 0 replies; 32+ messages in thread From: Lars Hjemli @ 2008-07-24 7:16 UTC (permalink / raw) To: Junio C Hamano; +Cc: Ingo Molnar, SZEDER Gábor, git On Thu, Jul 24, 2008 at 12:15 AM, Junio C Hamano <gitster@pobox.com> wrote: > Instead, we can let the revision machinery traverse the history as if we > are running: > > $ git rev-list --branches --not $commit > > by queueing the tips of branches we encounter as positive refs (this > mimicks the "--branches" option in the above command line) and then > appending the merge_filter_ref commit as a negative one, and finally > calling prepare_revision_walk() to limit the list.. Nice. > @@ -213,6 +211,7 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, > { > struct ref_list *ref_list = (struct ref_list*)(cb_data); > struct ref_item *newitem; > + struct commit *commit; > int kind; > int len; > static struct commit_list branch; I think you can drop the 'branch' here. > @@ -239,12 +242,8 @@ static int append_ref(const char *refname, const unsigned char *sha1, int flags, > branch.item = lookup_commit_reference_gently(sha1, 1); > if (!branch.item) > die("Unable to lookup tip of branch %s", refname); ..and here. - if (merge_filter == SHOW_NOT_MERGED && - has_commit(merge_filter_ref, &branch)) - return 0; - if (merge_filter == SHOW_MERGED && - !has_commit(merge_filter_ref, &branch)) - return 0; + add_pending_object(&ref_list->revs, + (struct object *)branch.item, refname); ..and use 'commit' instead of 'branch.item' here. > @@ -305,7 +304,13 @@ static void print_ref_item(struct ref_item *item, int maxwidth, int verbose, > { > char c; > int color; > - struct commit *commit; > + struct commit *commit = item->commit; > + > + if (merge_filter != NO_FILTER) { > + int is_merged = !!(item->commit->object.flags & UNINTERESTING); > + if (is_merged != (merge_filter == SHOW_MERGED)) > + return; > + } A possible issue here is that `git branch -v --[no]-merged` might use a wrong maxwidth, but I'm not sure if it's even worth fixing. Thanks for cleaning up my mess. -- larsh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] builtin-branch.c: optimize --merged and --no-merged 2008-07-23 22:15 ` [PATCH] builtin-branch.c: optimize --merged and --no-merged Junio C Hamano 2008-07-24 7:16 ` Lars Hjemli @ 2008-07-24 8:29 ` Nanako Shiraishi 2008-07-24 10:03 ` Lars Hjemli 1 sibling, 1 reply; 32+ messages in thread From: Nanako Shiraishi @ 2008-07-24 8:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Quoting Junio C Hamano <gitster@pobox.com> writes: > "git branch --no-merged $commit" used to compute the merge base between > the tip of each and every branch with the named $commit, but this was > wasteful when you have many branches. I am not sure if I followed the technical description of the patch, but I have a stupid question. How is --merged different from --contains? -- Nanako Shiraishi http://ivory.ap.teacup.com/nanako3/ ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH] builtin-branch.c: optimize --merged and --no-merged 2008-07-24 8:29 ` Nanako Shiraishi @ 2008-07-24 10:03 ` Lars Hjemli 0 siblings, 0 replies; 32+ messages in thread From: Lars Hjemli @ 2008-07-24 10:03 UTC (permalink / raw) To: Nanako Shiraishi; +Cc: Junio C Hamano, git On Thu, Jul 24, 2008 at 10:29 AM, Nanako Shiraishi <nanako3@lavabit.com> wrote: > How is --merged different from --contains? --merged only shows the branches which are contained by (reachable from) a specific commit --contains only shows the branches which contains (descends from) a specific commit And finally, --no-merged only shows the branches which are not contained by a specific commit -- larsh ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 17:59 ` Junio C Hamano 2008-07-23 22:09 ` [PATCH 1/2] builtin-branch.c: remove unused code in append_ref() callback function Junio C Hamano 2008-07-23 22:15 ` [PATCH] builtin-branch.c: optimize --merged and --no-merged Junio C Hamano @ 2008-07-24 15:29 ` Ingo Molnar 2 siblings, 0 replies; 32+ messages in thread From: Ingo Molnar @ 2008-07-24 15:29 UTC (permalink / raw) To: Junio C Hamano; +Cc: SZEDER Gábor, git * Junio C Hamano <gitster@pobox.com> wrote: > (with patch) > 0.15user 0.03system 0:00.18elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k > (without patch) > 0.58user 0.03system 0:00.61elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k cool - a 3.3x speedup :-) Will check that out. Ingo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 14:04 ` Ingo Molnar 2008-07-23 17:59 ` Junio C Hamano @ 2008-07-23 18:04 ` Linus Torvalds 2008-07-23 18:12 ` Linus Torvalds 2008-07-23 20:01 ` Junio C Hamano 2 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2008-07-23 18:04 UTC (permalink / raw) To: Ingo Molnar; +Cc: SZEDER Gábor, git Ahh, missed that somebody already suggested it. On Wed, 23 Jul 2008, Ingo Molnar wrote: > > hm, it's very slow: > > $ time git branch --no-merged > [...] > > real 0m9.177s > user 0m9.027s > sys 0m0.129s > > when running it on tip/master: > > http://people.redhat.com/mingo/tip.git/README We can probably speed it up, but more importantly, even if we don't, it's slow _once_. It's worth taking a 9s hit, if that means that you can then skip half of the merges entirely, and thus win half of the 53s cost. But I'll look if there's a way to cut it down from 9s. I suspect it has to traverse the whole history to make 100% sure that something isn't merged, but even that should be faster than 9s. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 18:04 ` Linus Torvalds @ 2008-07-23 18:12 ` Linus Torvalds 0 siblings, 0 replies; 32+ messages in thread From: Linus Torvalds @ 2008-07-23 18:12 UTC (permalink / raw) To: Ingo Molnar; +Cc: SZEDER Gábor, Git Mailing List, Lars Hjemli On Wed, 23 Jul 2008, Linus Torvalds wrote: > > But I'll look if there's a way to cut it down from 9s. I suspect it has to > traverse the whole history to make 100% sure that something isn't merged, > but even that should be faster than 9s. Heh. It should be trivially doable _much_ faster, but the has_commit() logic really relies on re-doing the "in_merge_base()" thing over and over again (clearing the bits), instead of just populating the object list with a "already seen" bit and lettign that expand over time. So using "git branch --no-merged" does avoid re-parsing the commits over and over again (which is a pretty big win), but the way the code is written it does end up traversing the commit list fully for every single branch. That's quite horrible. Lars added to Cc list in the hope that he'll be embarrassed enough about the performance to try to fix it ;) Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 14:04 ` Ingo Molnar 2008-07-23 17:59 ` Junio C Hamano 2008-07-23 18:04 ` Linus Torvalds @ 2008-07-23 20:01 ` Junio C Hamano 2008-07-24 15:27 ` Ingo Molnar 2 siblings, 1 reply; 32+ messages in thread From: Junio C Hamano @ 2008-07-23 20:01 UTC (permalink / raw) To: Ingo Molnar; +Cc: SZEDER Gábor, git Ingo Molnar <mingo@elte.hu> writes: > hm, it's very slow: > > $ time git branch --no-merged > [...] > > real 0m9.177s > user 0m9.027s > sys 0m0.129s > > when running it on tip/master: > > http://people.redhat.com/mingo/tip.git/README Hmmm, does not reproduce for me with a copy of that repository. $ time git branch -a --no-merged mingo/master linus/master mingo/acpi-for-len mingo/auto-cpus4096-next mingo/auto-kmemcheck-next mingo/auto-test mingo/auto-test-fixes mingo/core/futex-64bit mingo/core/kill-the-BKL mingo/core/percpu-zerobased mingo/cpus4096-for-linus mingo/kmemcheck-for-linus mingo/stackprotector-for-linus mingo/timers/for-linus mingo/tip mingo/tracing/ftrace mingo/tracing/immediates mingo/tracing/markers mingo/tracing/stopmachine-allcpus mingo/tracing/textedit mingo/x86/acpi-rename-acpi_nmi mingo/x86/audit-speedup mingo/x86/crashdump mingo/x86/header-guards mingo/x86/prototypes mingo/x86/sparse-fixes mingo/x86/unify-mce mingo/x86/x2apic real 0m1.442s user 0m1.360s sys 0m0.084s With the patch I posted earlier, the time becomes: real 0m0.600s user 0m0.560s sys 0m0.040s ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 20:01 ` Junio C Hamano @ 2008-07-24 15:27 ` Ingo Molnar 2008-07-25 8:46 ` Junio C Hamano 0 siblings, 1 reply; 32+ messages in thread From: Ingo Molnar @ 2008-07-24 15:27 UTC (permalink / raw) To: Junio C Hamano; +Cc: SZEDER Gábor, git * Junio C Hamano <gitster@pobox.com> wrote: > Ingo Molnar <mingo@elte.hu> writes: > > > hm, it's very slow: > > > > $ time git branch --no-merged > > [...] > > > > real 0m9.177s > > user 0m9.027s > > sys 0m0.129s > > > > when running it on tip/master: > > > > http://people.redhat.com/mingo/tip.git/README > > Hmmm, does not reproduce for me with a copy of that repository. perhaps you need to run tip-create-local-branches.sh to create all the local branches? You can find it in: tip/tip .tip/bin/tip-create-local-branches.sh (does/should the presence of local branches matter?) Ingo ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-24 15:27 ` Ingo Molnar @ 2008-07-25 8:46 ` Junio C Hamano 0 siblings, 0 replies; 32+ messages in thread From: Junio C Hamano @ 2008-07-25 8:46 UTC (permalink / raw) To: Ingo Molnar; +Cc: SZEDER Gábor, git Ingo Molnar <mingo@elte.hu> writes: > perhaps you need to run tip-create-local-branches.sh to create all the > local branches? You can find it in: > > tip/tip .tip/bin/tip-create-local-branches.sh > > (does/should the presence of local branches matter?) My refs/remotes/mingo/ hierarchy has as many branches as you do in your repository as local branches, and I presume you do not have these tracking branches as I do because you are the upstream of this repository, so overall we have about the same number of refs. In the experiment, I did "branch -a --no-merged" (notice -a), so I do not think the difference should have mattered. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:05 q: faster way to integrate/merge lots of topic branches? Ingo Molnar ` (3 preceding siblings ...) 2008-07-23 13:56 ` SZEDER Gábor @ 2008-07-23 14:06 ` Björn Steinbrink 2008-07-23 14:06 ` Santi Béjar 2008-07-23 17:59 ` Linus Torvalds 6 siblings, 0 replies; 32+ messages in thread From: Björn Steinbrink @ 2008-07-23 14:06 UTC (permalink / raw) To: Ingo Molnar; +Cc: git On 2008.07.23 15:05:18 +0200, Ingo Molnar wrote: > > I've got the following, possibly stupid question: is there a way to > merge a healthy number of topic branches into the master branch in a > quicker way, when most of the branches are already merged up? > > Right now i've got something like this scripted up: > > for B in $(git-branch | cut -c3- ); do git-merge $B; done Not yet in any release (AFAICT), but with git.git master, you could use: for B in $(git branch --no-merged); do git-merge $B; done Or with earlier versions, this should work, but it's a lot slower: for B in $(git branch | cut -c3- ); do [[ -n "$(git rev-list -1 HEAD..$B)" ]] && git merge $B; done Björn ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:05 q: faster way to integrate/merge lots of topic branches? Ingo Molnar ` (4 preceding siblings ...) 2008-07-23 14:06 ` Björn Steinbrink @ 2008-07-23 14:06 ` Santi Béjar 2008-07-23 17:59 ` Linus Torvalds 6 siblings, 0 replies; 32+ messages in thread From: Santi Béjar @ 2008-07-23 14:06 UTC (permalink / raw) To: Ingo Molnar; +Cc: git On Wed, Jul 23, 2008 at 15:05, Ingo Molnar <mingo@elte.hu> wrote: > > I've got the following, possibly stupid question: is there a way to > merge a healthy number of topic branches into the master branch in a > quicker way, when most of the branches are already merged up? You could filter upfront the branches that are already merged up with: git show-branch --independent <commits> but it has a limit of 25 refs. Santi ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 13:05 q: faster way to integrate/merge lots of topic branches? Ingo Molnar ` (5 preceding siblings ...) 2008-07-23 14:06 ` Santi Béjar @ 2008-07-23 17:59 ` Linus Torvalds 2008-07-23 19:09 ` Pierre Habouzit 6 siblings, 1 reply; 32+ messages in thread From: Linus Torvalds @ 2008-07-23 17:59 UTC (permalink / raw) To: Ingo Molnar; +Cc: git On Wed, 23 Jul 2008, Ingo Molnar wrote: > > I've got the following, possibly stupid question: is there a way to > merge a healthy number of topic branches into the master branch in a > quicker way, when most of the branches are already merged up? > > Right now i've got something like this scripted up: > > for B in $(git-branch | cut -c3- ); do git-merge $B; done > > It takes a lot of time to run on even a 3.45GHz box: > > real 0m53.228s > user 0m41.134s > sys 0m11.405s This is almost certainly because a lot of your branches are a long way back in the history, and just parsing the commit history is old. For example, doing a no-op merge of something old like v2.6.24 (which is obviously already merged) takes half a second for me: [torvalds@woody linux]$ time git merge v2.6.24 Already up-to-date. real 0m0.546s user 0m0.488s sys 0m0.008s and it gets worse the further back in history you go (going back to 2.6.14 takes a second and a half - plus any IO needed, of course). And just about _all_ of it is literally just unpacking the commits as you start going backwards from the current point, eg: [torvalds@woody linux]$ time ~/git/git merge v2.6.14 Already up-to-date. real 0m1.540s vs [torvalds@woody linux]$ time git rev-list ..v2.6.14 real 0m1.407s (The merge loop isn't quite as optimized as the regular revision traversal, so you see it being slower, but you can still see that it's roughly in the same class). The merge gets a bit more expensive still if you have enabled merge summaries (because now it traverses the lists twice - once for merge bases, once for logs), but that's still a secondary effect (ie it adds another 10% or so to the cost, but the base cost is still very much about the parsing of the commits). In fact, the two top entries in a profile look roughly like: 102161 70.2727 libz.so.1.2.3 libz.so.1.2.3 (no symbols) 7685 5.2862 git git find_pack_entry_one ... ie 70% of the time is just purely unpacking the data, and another 5% is just finding it. We could perhaps improve on it, but not a whole lot. Now, quite frankly, I don't think that times on the order of one second are worth worrying about for _regular_ merges, and the whole (and only) reason you see this as a performance problem is that you're basically automating it over a ton of branches, with most of them being old and already merged. But that also points to a solution: instead of trying to merge them one at a time, and doing the costly revision traversal over and over and over again, do the costly thing _once_, and then you can just filter out the branches that aren't interesting. So instead of doing for B in $(git-branch | cut -c3- ); do git-merge $B; done the obvious optimization is to add "--no-merged" to the "git branch" call. That itself is expensive (ie doing "git branch --no-merged" will have to traverse at least as far back as the oldest branch), so that phase will be AT LEAST as expensive as one of the merges (and probably quite a bit more: I suspect "--no-merged" isn't very heavily optimized), but if a lot of your branches are already fully merged, it will do all that work _once_, and then avoid it for the merges themselves. So the _trivial_ solution is to just change it to for B in $(git branch --no-merged | cut -c3- ); do git-merge $B; done and that may already fix it in practice for you, bringing the cost down by a factor of two or more, depending on the exact pattern (of course, it could also make the cost go _up_ - if it turns out that none of the branches are merged). Other solutions exist, but they get much uglier. Octopus merges are more efficient, for example, for all the same reasons - it keeps the commit traversal in a single process, and thus avoids having to re-parse the whole history down to the common base. But they have other problems, of course. Linus ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 17:59 ` Linus Torvalds @ 2008-07-23 19:09 ` Pierre Habouzit 2008-07-23 20:27 ` Pierre Habouzit 0 siblings, 1 reply; 32+ messages in thread From: Pierre Habouzit @ 2008-07-23 19:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: Ingo Molnar, git [-- Attachment #1: Type: text/plain, Size: 1457 bytes --] On Wed, Jul 23, 2008 at 05:59:01PM +0000, Linus Torvalds wrote: > In fact, the two top entries in a profile look roughly like: > > 102161 70.2727 libz.so.1.2.3 libz.so.1.2.3 (no symbols) > 7685 5.2862 git git find_pack_entry_one > ... > > ie 70% of the time is just purely unpacking the data, and another 5% is > just finding it. We could perhaps improve on it, but not a whole lot. Well there is an easy way though, that could reduce that: using adaptative compression. I proposed a patch once upon a time, that set the compression strengh to 0 for "small" objects with a configurable cut-off. If you do that, most trees, commits messages and so on aren't compressed, and it will reduce (with IIRC a 5-liner) this time quite dramatically. I could maybe resurect it to see if for people that do the kind of things Ingo does it helps. By setting the cut-off at 1k, I had packs being less than 1% bigger IIRC. I'll try to find it again and run your tests with it to see how much it helps. [ Of course, it doesn't invalidate the rest of your mail about being more clever with git-merge, but still, we could reduce this 70% of zlib time quite a lot with that ] -- ·O· Pierre Habouzit ··O madcoder@debian.org OOO http://www.madism.org [-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 19:09 ` Pierre Habouzit @ 2008-07-23 20:27 ` Pierre Habouzit 2008-07-23 20:40 ` Pierre Habouzit 0 siblings, 1 reply; 32+ messages in thread From: Pierre Habouzit @ 2008-07-23 20:27 UTC (permalink / raw) To: Linus Torvalds, Ingo Molnar, git [-- Attachment #1: Type: text/plain, Size: 2398 bytes --] On Wed, Jul 23, 2008 at 07:09:20PM +0000, Pierre Habouzit wrote: > On Wed, Jul 23, 2008 at 05:59:01PM +0000, Linus Torvalds wrote: > > In fact, the two top entries in a profile look roughly like: > > > > 102161 70.2727 libz.so.1.2.3 libz.so.1.2.3 (no symbols) > > 7685 5.2862 git git find_pack_entry_one > > ... > > > > ie 70% of the time is just purely unpacking the data, and another 5% is > > just finding it. We could perhaps improve on it, but not a whole lot. > > Well there is an easy way though, that could reduce that: using > adaptative compression. I proposed a patch once upon a time, that set > the compression strengh to 0 for "small" objects with a configurable > cut-off. If you do that, most trees, commits messages and so on aren't > compressed, and it will reduce (with IIRC a 5-liner) this time quite > dramatically. > > I could maybe resurect it to see if for people that do the kind of > things Ingo does it helps. By setting the cut-off at 1k, I had packs > being less than 1% bigger IIRC. I'll try to find it again and run your > tests with it to see how much it helps. Unsurprisingly with a 1024o cutoff, the numbers are (first run is forced cold-cache with /proc/.../drop_caches, second is the best run of 5): default git: 3.10user 0.16system 0:08.10elapsed 40%CPU (0avgtext+0avgdata 0maxresident)k 116152inputs+0outputs (671major+35286minor)pagefaults 0swaps 2.01user 0.11system 0:02.12elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+35958minor)pagefaults 0swaps With a 1024k cutoff: 1.16user 0.13system 0:08.29elapsed 15%CPU (0avgtext+0avgdata 0maxresident)k 154208inputs+0outputs (947major+39777minor)pagefaults 0swaps 0.76user 0.06system 0:00.82elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+40724minor)pagefaults 0swaps According to [0], a 1k cutoff meant something like a 10% larger pack. 512o meant an almost identical pack in size, but with reduced performance improvements. [0] http://thread.gmane.org/gmane.comp.version-control.git/70019/focus=70250 -- ·O· Pierre Habouzit ··O madcoder@debian.org OOO http://www.madism.org [-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: q: faster way to integrate/merge lots of topic branches? 2008-07-23 20:27 ` Pierre Habouzit @ 2008-07-23 20:40 ` Pierre Habouzit 0 siblings, 0 replies; 32+ messages in thread From: Pierre Habouzit @ 2008-07-23 20:40 UTC (permalink / raw) To: Linus Torvalds, Ingo Molnar, git [-- Attachment #1: Type: text/plain, Size: 3120 bytes --] On mer, jui 23, 2008 at 08:27:22 +0000, Pierre Habouzit wrote: > On Wed, Jul 23, 2008 at 07:09:20PM +0000, Pierre Habouzit wrote: > > On Wed, Jul 23, 2008 at 05:59:01PM +0000, Linus Torvalds wrote: > > > In fact, the two top entries in a profile look roughly like: > > > > > > 102161 70.2727 libz.so.1.2.3 libz.so.1.2.3 (no symbols) > > > 7685 5.2862 git git find_pack_entry_one > > > ... > > > > > > ie 70% of the time is just purely unpacking the data, and another 5% is > > > just finding it. We could perhaps improve on it, but not a whole lot. > > > > Well there is an easy way though, that could reduce that: using > > adaptative compression. I proposed a patch once upon a time, that set > > the compression strengh to 0 for "small" objects with a configurable > > cut-off. If you do that, most trees, commits messages and so on aren't > > compressed, and it will reduce (with IIRC a 5-liner) this time quite > > dramatically. > > > > I could maybe resurect it to see if for people that do the kind of > > things Ingo does it helps. By setting the cut-off at 1k, I had packs > > being less than 1% bigger IIRC. I'll try to find it again and run your > > tests with it to see how much it helps. > > Unsurprisingly with a 1024o cutoff, the numbers are (first run is > forced cold-cache with /proc/.../drop_caches, second is the best run of 5): > > default git: > > 3.10user 0.16system 0:08.10elapsed 40%CPU (0avgtext+0avgdata 0maxresident)k > 116152inputs+0outputs (671major+35286minor)pagefaults 0swaps > > 2.01user 0.11system 0:02.12elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (0major+35958minor)pagefaults 0swaps > > With a 1024k cutoff: > > 1.16user 0.13system 0:08.29elapsed 15%CPU (0avgtext+0avgdata 0maxresident)k > 154208inputs+0outputs (947major+39777minor)pagefaults 0swaps > > 0.76user 0.06system 0:00.82elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k > 0inputs+0outputs (0major+40724minor)pagefaults 0swaps With a 512o cutoff: 1.49user 0.17system 0:07.50elapsed 22%CPU (0avgtext+0avgdata 0maxresident)k 127648inputs+0outputs (780major+36687minor)pagefaults 0swaps 1.54user 0.07system 0:01.61elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+37467minor)pagefaults 0swaps What I bench, I see I forgot to mention, is: git merge v2.6.14. And the respective pack sizes: 214M .git-0/objects/pack/pack-bfeec11abed1ec6d046bc954b94d70ba81716356.pack 225M .git-512/objects/pack/pack-bfeec11abed1ec6d046bc954b94d70ba81716356.pack 243M .git-1024/objects/pack/pack-bfeec11abed1ec6d046bc954b94d70ba81716356.pack .git-0 is cheating because it was generated with a way deeper window and memory window that the other ones, but it allow to give rough impressions. -- ·O· Pierre Habouzit ··O madcoder@debian.org OOO http://www.madism.org [-- Attachment #2: Type: application/pgp-signature, Size: 197 bytes --] ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2008-07-25 8:48 UTC | newest] Thread overview: 32+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2008-07-23 13:05 q: faster way to integrate/merge lots of topic branches? Ingo Molnar 2008-07-23 13:17 ` Ingo Molnar 2008-07-23 13:49 ` Ingo Molnar 2008-07-23 14:47 ` Jay Soffian 2008-07-23 14:56 ` Ingo Molnar 2008-07-23 15:06 ` Ingo Molnar 2008-07-23 13:40 ` Andreas Ericsson 2008-07-23 14:02 ` Ingo Molnar 2008-07-23 14:57 ` Miklos Vajna 2008-07-23 13:41 ` Sergey Vlasov 2008-07-23 14:09 ` Ingo Molnar 2008-07-23 14:14 ` Ingo Molnar 2008-07-23 13:56 ` SZEDER Gábor 2008-07-23 14:04 ` Ingo Molnar 2008-07-23 17:59 ` Junio C Hamano 2008-07-23 22:09 ` [PATCH 1/2] builtin-branch.c: remove unused code in append_ref() callback function Junio C Hamano 2008-07-23 22:15 ` [PATCH] builtin-branch.c: optimize --merged and --no-merged Junio C Hamano 2008-07-24 7:16 ` Lars Hjemli 2008-07-24 8:29 ` Nanako Shiraishi 2008-07-24 10:03 ` Lars Hjemli 2008-07-24 15:29 ` q: faster way to integrate/merge lots of topic branches? Ingo Molnar 2008-07-23 18:04 ` Linus Torvalds 2008-07-23 18:12 ` Linus Torvalds 2008-07-23 20:01 ` Junio C Hamano 2008-07-24 15:27 ` Ingo Molnar 2008-07-25 8:46 ` Junio C Hamano 2008-07-23 14:06 ` Björn Steinbrink 2008-07-23 14:06 ` Santi Béjar 2008-07-23 17:59 ` Linus Torvalds 2008-07-23 19:09 ` Pierre Habouzit 2008-07-23 20:27 ` Pierre Habouzit 2008-07-23 20:40 ` Pierre Habouzit
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).