* cherry-pick is slow
@ 2012-05-12 22:39 Dmitry Risenberg
  2012-05-13 1:11 ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Risenberg @ 2012-05-12 22:39 UTC
  To: git

Hello.

I have a very big git repository (the .git directory is about 5.3 Gb),
which is a copy of an svn repository fetched via git-svn. In fact
there are a few repositories ("working copies") that share the same
.git directory (via symlinks), in which I have different svn branches
checked out. Now I want to merge a commit from one svn branch to
another via git cherry-pick. The commit contains diff in only one
file. So I do

    git cherry-pick <commit>

And the operation takes tens of seconds to finish. In "top" output I
see that git process uses almost no CPU, but has hundreds of page
faults, so I assume that it is reading a lot of files from disk. I
also tried running git in gdb and interrupting it in random places,
the stacktrace I get is usually like this:

#0  0x00000000004fb70c in experimental_loose_object (map=0x3268e000 "x\001+)JMU043g040031QpöMÌNõÉ,.)Ö+©(aøt*äãú½\vÍ4\236Yñ\230¤&z¯ß<{º\211\001\020(¤eV0\024\177ò\177á$\\$\034\224\036ö*÷vÂ\216#3ºâ!²\231)@Ù*íü»Ü7\233BmÜ\005^·\034ñÏOä¨\205Èæf\227\024e¦2¬fÐ\tY¾}ѳ\212,û\027î<ê\236[³w\234\204*(Í)Éd0ø\024\030n\233øìP\210\214üDSÆ?ó\216½¹>\a") at sha1_file.c:1259
#1  0x00000000004fb8df in unpack_sha1_header (stream=0x7fffffffc9f0, map=0x3268e000 "x\001+)JMU043g040031QpöMÌNõÉ,.)Ö+©(aøt*äãú½\vÍ4\236Yñ\230¤&z¯ß<{º\211\001\020(¤eV0\024\177ò\177á$\\$\034\224\036ö*÷vÂ\216#3ºâ!²\231)@Ù*íü»Ü7\233BmÜ\005^·\034ñÏOä¨\205Èæf\227\024e¦2¬fÐ\tY¾}ѳ\212,û\027î<ê\236[³w\234\204*(Í)Éd0ø\024\030n\233øìP\210\214üDSÆ?ó\216½¹>\a", mapsize=173, buffer=0x7fffffffa9f0, bufsiz=8192) at sha1_file.c:1308
#2  0x00000000004fbc85 in unpack_sha1_file (map=0x3268e000, mapsize=173, type=0x7fffffffcbf0, size=0x7fffffffcbe8, sha1=0x7fffffffcbd0 "\001/Ç4&箺\036© wK`\214\"Ë\035H\203") at sha1_file.c:1435
#3  0x00000000004fd96c in read_object (sha1=0x7fffffffcbd0 "\001/Ç4&箺\036© wK`\214\"Ë\035H\203", type=0x7fffffffcbf0, size=0x7fffffffcbe8) at sha1_file.c:2233
#4  0x00000000004fda0d in read_sha1_file_extended (sha1=0x7fffffffcbd0 "\001/Ç4&箺\036© wK`\214\"Ë\035H\203", type=0x7fffffffcbf0, size=0x7fffffffcbe8, flag=1) at sha1_file.c:2258
#5  0x00000000004fdcda in read_sha1_file (sha1=0x7fffffffcbd0 "\001/Ç4&箺\036© wK`\214\"Ë\035H\203", type=0x7fffffffcbf0, size=0x7fffffffcbe8) at cache.h:761
#6  0x00000000004fdbb1 in read_object_with_reference (sha1=0x334a8130 "\001/Ç4&箺\036© wK`\214\"Ë\035H\203", required_type_name=0x55a1a0 "tree", size=0x7fffffffcc30, actual_sha1_return=0x0) at sha1_file.c:2299
#7  0x0000000000510f50 in fill_tree_descriptor (desc=0x7fffffffcd10, sha1=0x334a8130 "\001/Ç4&箺\036© wK`\214\"Ë\035H\203") at tree-walk.c:57
#8  0x00000000005133dd in traverse_trees_recursive (n=1, dirmask=1, df_conflicts=0, names=0x349d68a0, info=0x7fffffffd020) at unpack-trees.c:456
#9  0x0000000000514239 in unpack_callback (n=1, mask=1, dirmask=1, names=0x349d68a0, info=0x7fffffffd020) at unpack-trees.c:809
#10 0x00000000005119c9 in traverse_trees (n=1, t=0x7fffffffd0b0, info=0x7fffffffd020) at tree-walk.c:407
#11 0x000000000051342d in traverse_trees_recursive (n=1, dirmask=0, df_conflicts=0, names=0x349d6860, info=0x7fffffffd3c0) at unpack-trees.c:460
#12 0x0000000000514239 in unpack_callback (n=1, mask=1, dirmask=1, names=0x349d6860, info=0x7fffffffd3c0) at unpack-trees.c:809
#13 0x00000000005119c9 in traverse_trees (n=1, t=0x7fffffffd450, info=0x7fffffffd3c0) at tree-walk.c:407
#14 0x000000000051342d in traverse_trees_recursive (n=1, dirmask=0, df_conflicts=0, names=0x349d6840, info=0x7fffffffd760) at unpack-trees.c:460
#15 0x0000000000514239 in unpack_callback (n=1, mask=1, dirmask=1, names=0x349d6840, info=0x7fffffffd760) at unpack-trees.c:809
#16 0x00000000005119c9 in traverse_trees (n=1, t=0x7fffffffda40, info=0x7fffffffd760) at tree-walk.c:407
#17 0x0000000000514af5 in unpack_trees (len=1, t=0x7fffffffda40, o=0x7fffffffd830) at unpack-trees.c:1063
#18 0x000000000049f140 in diff_cache (revs=0x7fffffffdaf0, tree_sha1=0x32f3b094 "ò\032'\023\220U", tree_name=0x546766 "HEAD", cached=1) at diff-lib.c:476
#19 0x000000000049f18a in run_diff_index (revs=0x7fffffffdaf0, cached=1) at diff-lib.c:484
#20 0x000000000049f34d in index_differs_from (def=0x546766 "HEAD", diff_flags=0) at diff-lib.c:519
#21 0x0000000000470288 in do_pick_commit (commit=0x32f3b000, opts=0x7fffffffe270) at builtin/revert.c:502
#22 0x0000000000471d38 in single_pick (cmit=0x32f3b000, opts=0x7fffffffe270) at builtin/revert.c:1069
#23 0x0000000000471ea9 in pick_revisions (opts=0x7fffffffe270) at builtin/revert.c:1113
#24 0x0000000000472045 in cmd_cherry_pick (argc=2, argv=0x7fffffffe4c0, prefix=0x6a81c1 ) at builtin/revert.c:1161
#25 0x0000000000405093 in run_builtin (p=0x65d190, argc=2, argv=0x7fffffffe4c0) at git.c:308
#26 0x0000000000405288 in handle_internal_command (argc=2, argv=0x7fffffffe4c0) at git.c:467

It always interrupts inside experimental_loose_object, when reading
memory-mapped data from disk(?).

git diff <commit>^ <commit> works blazingly fast, so I assume that
cherry-picking should also be, but it is not. What can I do to make
the cherry-picking go quicker?

I am using git 1.7.10 on FreeBSD 7.2.

--
Dmitry Risenberg

* Re: cherry-pick is slow
  2012-05-12 22:39 cherry-pick is slow Dmitry Risenberg
@ 2012-05-13 1:11 ` Junio C Hamano
  2012-05-13 15:39   ` Dmitry Risenberg
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2012-05-13 1:11 UTC
  To: Dmitry Risenberg; +Cc: git

On Sat, May 12, 2012 at 3:39 PM, Dmitry Risenberg
<dmitry.risenberg@gmail.com> wrote:
>
> Hello.
>
> I have a very big git repository (the .git directory is about 5.3 Gb),
> which is a copy of an svn repository fetched via git-svn. In fact
> there are a few repositories ("working copies") that share the same
> .git directory (via symlinks), in which I have different svn branches
> checked out. Now I want to merge a commit from one svn branch to
> another via git cherry-pick. The commit contains diff in only one
> file. So I do
>
> git cherry-pick <commit>
>
> And the operation takes tens of seconds to finish. In "top" output I
> see that git process uses almost no CPU, but has hundreds of page
> faults, so I assume that it is reading a lot of files from disk.

Wild guess: poorly (or worse yet, never) packed repository?

* Re: cherry-pick is slow
  2012-05-13 1:11 ` Junio C Hamano
@ 2012-05-13 15:39   ` Dmitry Risenberg
  2012-05-14 14:54     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Risenberg @ 2012-05-13 15:39 UTC
  To: Junio C Hamano; +Cc: git

2012/5/13 Junio C Hamano <gitster-vger@pobox.com>:
> On Sat, May 12, 2012 at 3:39 PM, Dmitry Risenberg
> <dmitry.risenberg@gmail.com> wrote:
>>
>> Hello.
>>
>> I have a very big git repository (the .git directory is about 5.3 Gb),
>> which is a copy of an svn repository fetched via git-svn. In fact
>> there are a few repositories ("working copies") that share the same
>> .git directory (via symlinks), in which I have different svn branches
>> checked out. Now I want to merge a commit from one svn branch to
>> another via git cherry-pick. The commit contains diff in only one
>> file. So I do
>>
>> git cherry-pick <commit>
>>
>> And the operation takes tens of seconds to finish. In "top" output I
>> see that git process uses almost no CPU, but has hundreds of page
>> faults, so I assume that it is reading a lot of files from disk.
>
> Wild guess: poorly (or worse yet, never) packed repository?

You were absolutely right. I set "gc.auto = 0" during the initial
checkout of svn and forgot to turn it on afterwards. After running
"git gc", my repo became two times smaller, and git operations are now
running much faster.

However, cherry-picking is still not as fast as I expected it to be -
cherry-picking a single-file commit takes about 14-15 seconds, fully
using one CPU core. Anything else I can improve?

--
Dmitry Risenberg
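
The packing diagnosis in this exchange can be checked before and after a repack. What follows is only a minimal sketch, not something posted in the thread: it assumes a throwaway repository under a temporary directory, and the file name, identity, and commit count are made up for illustration. `git count-objects -v` reports loose objects as `count` and packed objects as `in-pack`.

```shell
#!/bin/sh
# Sketch: reproduce a never-packed repository on a tiny scale, then repack it.
# The repository and its contents are hypothetical; only the git commands are real.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
git config gc.auto 0          # mimic the setting left over from the import

i=1
while [ "$i" -le 20 ]; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# Every blob, tree, and commit is still a separate loose file at this point.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git config --unset gc.auto    # go back to the default automatic gc threshold
git gc --quiet                # pack loose objects and prune the packed copies

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose before: $loose_before, loose after: $loose_after, in pack: $packed_after"
```

On a repository the size of the one described here, the same two `count-objects` calls around `git gc` would show whether loose objects were the bottleneck.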

* Re: cherry-pick is slow
  2012-05-13 15:39 ` Dmitry Risenberg
@ 2012-05-14 14:54   ` Jeff King
  [not found]         ` <CAPZ_ugbD=mOPBs6GyapWtv6NWuJ-=r2+bqBN9n+gdTPwGj3F0Q@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2012-05-14 14:54 UTC
  To: Dmitry Risenberg; +Cc: Junio C Hamano, git

On Sun, May 13, 2012 at 07:39:49PM +0400, Dmitry Risenberg wrote:

> However, cherry-picking is still not as fast as I expected it to be -
> cherry-picking a single-file commit takes about 14-15 seconds, fully
> using one CPU core. Anything else I can improve?

It's probably detecting renames as part of the merge, which can be
expensive if the thing you are cherry-picking is far away from HEAD. You
can try setting the merge.renamelimit config variable to something small
(like 1; setting it to 0 means "no limit").

-Peff
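
A minimal sketch of the two ways to apply that suggestion, assuming a throwaway repository with made-up branch and file names: `git config` stores the limit persistently, while `git -c` applies it to a single invocation. Whether it helps depends on rename detection actually being the cost, which is only a guess at this point in the thread.

```shell
#!/bin/sh
# Sketch: set merge.renameLimit persistently and per invocation.
# Branch names (trunk, feature) and a.txt are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
git symbolic-ref HEAD refs/heads/trunk   # fixed branch name, independent of defaults

echo base > a.txt
git add a.txt
git commit -q -m "base"

git checkout -q -b feature
echo change >> a.txt
git commit -q -a -m "change a.txt"
pick=$(git rev-parse HEAD)

git checkout -q trunk

# Persistent: stored in .git/config and used by every later merge here.
git config merge.renameLimit 1
limit=$(git config --get merge.renameLimit)

# One-off: applies to this cherry-pick only, whatever the config file says.
git -c merge.renameLimit=1 cherry-pick "$pick" >/dev/null

echo "merge.renameLimit is $limit; last line of a.txt: $(tail -n 1 a.txt)"
```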
[parent not found: <CAPZ_ugbD=mOPBs6GyapWtv6NWuJ-=r2+bqBN9n+gdTPwGj3F0Q@mail.gmail.com>]

* Re: cherry-pick is slow
  [not found]         ` <CAPZ_ugbD=mOPBs6GyapWtv6NWuJ-=r2+bqBN9n+gdTPwGj3F0Q@mail.gmail.com>
@ 2012-05-15 13:24           ` Jeff King
  2012-05-15 18:57             ` Paweł Sikora
  2012-05-15 20:32             ` Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff King @ 2012-05-15 13:24 UTC
  To: Dmitry Risenberg; +Cc: git

[let's keep this on-list so others can benefit from the discussion]

On Tue, May 15, 2012 at 12:38:59PM +0400, Dmitry Risenberg wrote:

> > It's probably detecting renames as part of the merge, which can be
> > expensive if the thing you are cherry-picking is far away from HEAD. You
> > can try setting the merge.renamelimit config variable to something small
> > (like 1; setting it to 0 means "no limit").
>
> I set it to 1, but it didn't help at all - cherry-pick time is still
> about the same.

OK, then my guess was probably wrong. You'll have to try profiling (if
you are on Linux, "perf record git cherry-pick ...; perf report" is the
simplest way). Or if the repository is publicly available, I can do a
quick profile run.

-Peff

* Re: cherry-pick is slow
  2012-05-15 13:24 ` Jeff King
@ 2012-05-15 18:57   ` Paweł Sikora
  2012-05-15 20:32   ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Paweł Sikora @ 2012-05-15 18:57 UTC
  To: git; +Cc: Jeff King, Dmitry Risenberg

On Tuesday 15 of May 2012 09:24:51 Jeff King wrote:
> [let's keep this on-list so others can benefit from the discussion]
>
> On Tue, May 15, 2012 at 12:38:59PM +0400, Dmitry Risenberg wrote:
>
> > > It's probably detecting renames as part of the merge, which can be
> > > expensive if the thing you are cherry-picking is far away from HEAD. You
> > > can try setting the merge.renamelimit config variable to something small
> > > (like 1; setting it to 0 means "no limit").
> >
> > I set it to 1, but it didn't help at all - cherry-pick time is still
> > about the same.
>
> OK, then my guess was probably wrong. You'll have to try profiling (if
> you are on Linux, "perf record git cherry-pick ...; perf report" is the
> simplest way). Or if the repository is publicly available, I can do a
> quick profile run.

I have two big repos (a few GB) and cherry-pick is heavy on both I/O and
CPU. Timing varies from a few seconds on RAID-0 (2x500GB) to ~30 seconds
on linear LVM (a few TB). Here's the perf report:

    36,24%  git  libc-2.15.so       [.] __memmove_ssse3_back
     7,04%  git  libz.so.1.2.7      [.] inflate_fast
     6,17%  git  libz.so.1.2.7      [.] inflate
     5,53%  git  git                [.] xdl_recs_cmp
     3,04%  git  libc-2.15.so       [.] __memcmp_sse4_1
     2,54%  git  libz.so.1.2.7      [.] inflate_table
     1,83%  git  libc-2.15.so       [.] __strcmp_sse42
     1,52%  git  libc-2.15.so       [.] __memcpy_ssse3_back
     1,49%  git  git                [.] match_trees
     1,39%  git  libc-2.15.so       [.] _int_malloc
     1,18%  git  libz.so.1.2.7      [.] adler32
     1,08%  git  git                [.] do_head_ref
     1,02%  git  git                [.] splice_tree
     0,83%  git  libc-2.15.so       [.] __strlen_sse2_pminub
     0,71%  git  [kernel.kallsyms]  [k] _raw_spin_lock
     0,68%  git  git                [.] shift_tree_by
     0,67%  git  libc-2.15.so       [.] _int_free
     0,63%  git  [kernel.kallsyms]  [k] __d_lookup_rcu
     0,60%  git  [kernel.kallsyms]  [k] link_path_walk
     0,57%  git  git                [.] get_shallow_commits
    (...)

* Re: cherry-pick is slow
  2012-05-15 13:24 ` Jeff King
  2012-05-15 18:57   ` Paweł Sikora
@ 2012-05-15 20:32   ` Junio C Hamano
  2012-05-15 21:03     ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2012-05-15 20:32 UTC
  To: Jeff King; +Cc: Dmitry Risenberg, git

Jeff King <peff@peff.net> writes:

> [let's keep this on-list so others can benefit from the discussion]
>
> On Tue, May 15, 2012 at 12:38:59PM +0400, Dmitry Risenberg wrote:
>
>> > It's probably detecting renames as part of the merge, which can be
>> > expensive if the thing you are cherry-picking is far away from HEAD. You
>> > can try setting the merge.renamelimit config variable to something small
>> > (like 1; setting it to 0 means "no limit").
>>
>> I set it to 1, but it didn't help at all - cherry-pick time is still
>> about the same.
>
> OK, then my guess was probably wrong. You'll have to try profiling (if
> you are on Linux, "perf record git cherry-pick ...; perf report" is the
> simplest way). Or if the repository is publicly available, I can do a
> quick profile run.

Perhaps the word "cherry-pick" invites an expectation that it must be
faster than a full-tree merge, i.e. something like "format-patch | am -3",
especially when the change introduced by the commit being cherry-picked
touches only a handful of paths.

Unfortunately, I do not think that the actual implementation of
"cherry-pick" matches that expectation, as it is a full three-way merge.

I am somewhat curious to see what the performance characteristics would be
if the same commit is replayed using

    git format-patch -1 --stdout $commit | git apply --index --3way

pipeline. Depending on the number of paths in the whole tree vs the
number of paths the $commit touches, I wouldn't be surprised if it is
faster.

* Re: cherry-pick is slow
  2012-05-15 20:32 ` Junio C Hamano
@ 2012-05-15 21:03   ` Junio C Hamano
  2012-05-19 0:54     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2012-05-15 21:03 UTC
  To: Jeff King; +Cc: Dmitry Risenberg, git

Junio C Hamano <gitster@pobox.com> writes:

> Unfortunately, I do not think that the actual implementation of
> "cherry-pick" matches that expectation, as it is a full three-way merge.
>
> I am somewhat curious to see what the performance characteristics would be
> if the same commit is replayed using
>
>     git format-patch -1 --stdout $commit | git apply --index --3way
>
> pipeline. Depending on the number of paths in the whole tree vs the
> number of paths the $commit touches, I wouldn't be surprised if it is
> faster.

An unscientific datapoint shows that with a project as small as the kernel,
the difference is noticeable.

For example, v3.4-rc7-22-g3911ff3 (random tip of the day) touches two
paths, and cherry-picking it on top of v3.3 goes like this:

    $ git checkout v3.3 && EDITOR=: /usr/bin/time git cherry-pick 3911ff3
    Author: Jiri Kosina <jkosina@suse.cz>
     2 files changed, 2 insertions(+)
    1.08user 0.20system 0:01.28elapsed 99%CPU (0avgtext+0avgdata 469728maxresident)k
    0inputs+7536outputs (0major+52604minor)pagefaults 0swaps

as opposed to an alternative that touches only these two paths:

    $ git checkout v3.3 && EDITOR=: /usr/bin/time sh -c '
        git format-patch --stdout -1 3911ff3 | git am -3'
    Applying: genirq: export handle_edge_irq() and irq_to_desc()
    0.36user 0.16system 0:00.46elapsed 112%CPU (0avgtext+0avgdata 254720maxresident)k
    0inputs+14872outputs (0major+55145minor)pagefaults 0swaps

Of course, there are vast differences between v3.3 and 3911ff3^1; 11k+
paths touched, countless paths created and deleted.

I _think_ most of the overhead comes from having to match the large trees
in unpack_trees() even though none of the changes between the base
versions matters for this "cherry-pick".

Both read the flat index into the core in its entirety and futzing with
the index file format would not affect this comparison, even though it
could improve the performance of "am", if done right, as it could limit
its updates to only two paths. In the merge case, we pretty much rebuild
the resulting index from scratch by walking the entire tree in
unpack_trees(), so there won't be much benefit.

Perhaps we might want to rethink the way we run merges?
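
The comparison in this message can be reproduced on a toy scale. The sketch below is an illustration under assumptions, not the kernel measurement: it builds a throwaway repository with made-up branch and file names, replays one single-file commit both ways, and checks that `git cherry-pick` and `git format-patch | git am -3` arrive at the same tree. A repository this small cannot show the timing gap.

```shell
#!/bin/sh
# Sketch: replay one commit with cherry-pick and with format-patch | am -3,
# then compare the resulting trees. All names below are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
git symbolic-ref HEAD refs/heads/trunk

mkdir -p lib doc
echo "one" > lib/a.txt
echo "two" > doc/b.txt
git add .
git commit -q -m "base"

# The commit to replay: touches a single path.
git checkout -q -b side
echo "fix" >> lib/a.txt
git commit -q -a -m "fix lib/a.txt"
commit=$(git rev-parse HEAD)

# trunk moves on in an unrelated path.
git checkout -q trunk
echo "more" >> doc/b.txt
git commit -q -a -m "unrelated change"

# Method 1: cherry-pick, a three-way merge over whole trees.
git checkout -q -b via-pick trunk
git cherry-pick "$commit" >/dev/null
tree_pick=$(git rev-parse 'HEAD^{tree}')

# Method 2: turn the commit into a patch and apply it with 3-way fallback.
git checkout -q -b via-am trunk
git format-patch -1 --stdout "$commit" | git am -3 -q
tree_am=$(git rev-parse 'HEAD^{tree}')

echo "cherry-pick tree: $tree_pick"
echo "am -3 tree:       $tree_am"
```

To look at timing on a real repository, each method can be prefixed with `/usr/bin/time` as in the transcript in this message.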

* Re: cherry-pick is slow
  2012-05-15 21:03 ` Junio C Hamano
@ 2012-05-19 0:54   ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2012-05-19 0:54 UTC
  To: Junio C Hamano; +Cc: Dmitry Risenberg, git

On Tue, May 15, 2012 at 02:03:40PM -0700, Junio C Hamano wrote:

> > git format-patch -1 --stdout $commit | git apply --index --3way
> [...]
> An unscientific datapoint shows that with a project as small as the kernel,
> the difference is noticeable.
>
> For example, v3.4-rc7-22-g3911ff3 (random tip of the day) touches two
> paths, and cherry-picking it on top of v3.3 goes like this:

Yeah, that's what I would expect. And that's not even that far away.
Cherry-picking the same commit onto v3.0 should be even more noticeable.

> I _think_ most of the overhead comes from having to match the large trees
> in unpack_trees() even though none of the changes between the base
> versions matters for this "cherry-pick".
>
> Both read the flat index into the core in its entirety and futzing with
> the index file format would not affect this comparison, even though it
> could improve the performance of "am", if done right, as it could limit
> its updates to only two paths. In the merge case, we pretty much rebuild
> the resulting index from scratch by walking the entire tree in
> unpack_trees(), so there won't be much benefit.
>
> Perhaps we might want to rethink the way we run merges?

For merge-recursive, we would always want to compute the pair-wise
renames between each side and the ancestor. So that diff to the
cherry-pick destination is always going to be an expensive O(# of
changes between source and dest) operation.

Without renames, you could do better on the actual merge with a
three-way tree walk. E.g., you see that some sub-tree is at tree A in
the "ours" and "ancestor" trees, but at tree B in "theirs". So you don't
have to descend further, and can just say "take theirs" (well, you have
to descend "theirs" to get the values).

But I expect it gets more complicated with the interactions with the
index (and is probably not worth spending much effort on because of the
rename issue, anyway).

-Peff
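
The sub-tree shortcut described in this message can be sketched with plumbing commands. This is a hypothetical illustration, not how git's merge code is written: `resolve_subtree` is a made-up helper that compares the tree object ids of one directory in the ancestor, "ours", and "theirs", and it ignores renames and the index, which are exactly the complications the message points out.

```shell
#!/bin/sh
# Sketch: decide a merge at the sub-tree level by comparing tree object ids.
# resolve_subtree and all branch/file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.invalid"
git symbolic-ref HEAD refs/heads/trunk

mkdir -p sub other
echo "x" > sub/f.txt
echo "y" > other/g.txt
git add .
git commit -q -m "ancestor"
ancestor=$(git rev-parse HEAD)

git checkout -q -b theirs
echo "changed" > sub/f.txt
git commit -q -a -m "theirs changes sub/"
theirs=$(git rev-parse HEAD)

git checkout -q trunk
echo "z" > other/g.txt
git commit -q -a -m "ours changes other/"
ours=$(git rev-parse HEAD)

# Print what a tree-level three-way walk could conclude for one directory.
resolve_subtree() {
    dir=$1
    a=$(git rev-parse "$ancestor:$dir")   # tree id in the common ancestor
    o=$(git rev-parse "$ours:$dir")       # tree id on our side
    t=$(git rev-parse "$theirs:$dir")     # tree id on their side
    if [ "$o" = "$t" ]; then
        echo "same"            # both sides agree, nothing to merge
    elif [ "$a" = "$o" ]; then
        echo "take-theirs"     # only their side changed this sub-tree
    elif [ "$a" = "$t" ]; then
        echo "take-ours"       # only our side changed this sub-tree
    else
        echo "descend"         # both changed, must look inside
    fi
}

sub_decision=$(resolve_subtree sub)
other_decision=$(resolve_subtree other)
echo "sub: $sub_decision, other: $other_decision"
```

Only the "descend" case would need to open the sub-tree and compare its entries; the other three cases are settled by comparing three object ids.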