git.vger.kernel.org archive mirror
* cherry-pick is slow
@ 2012-05-12 22:39 Dmitry Risenberg
  2012-05-13  1:11 ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Risenberg @ 2012-05-12 22:39 UTC (permalink / raw)
  To: git

Hello.

I have a very big git repository (the .git directory is about 5.3 GB),
which is a copy of an svn repository fetched via git-svn. In fact
there are a few repositories ("working copies") that share the same
.git directory (via symlinks), in which I have different svn branches
checked out. Now I want to merge a commit from one svn branch to
another via git cherry-pick. The commit changes only one file. So I
run

git cherry-pick <commit>

And the operation takes tens of seconds to finish. In the "top" output
I see that the git process uses almost no CPU but has hundreds of page
faults, so I assume that it is reading a lot of files from disk. I
also tried running git under gdb and interrupting it at random points;
the stack trace I get usually looks like this:

#0  0x00000000004fb70c in experimental_loose_object (
    map=0x3268e000
"x\001+)JMU043g040031QpöMÌNõÉ,.)Ö+©(aøt*äãú½\vÍ4\236Yñ\230¤&z¯ß<{º\211\001\020(¤eV0\024\177ò\177á$\\$\034\224\036ö*÷vÂ\216#3ºâ!²\231)@Ù*íü»Ü7\233BmÜ\005^·\034ñÏOä¨\205Èæf\227\024e¦2¬fÐ\tY¾}ѳ\212,û\027î<ê\236[³w\234\204*(Í)Éd0ø\024\030n\233øìP\210\214üDSÆ?ó\216½¹>\a")
at sha1_file.c:1259
#1  0x00000000004fb8df in unpack_sha1_header (stream=0x7fffffffc9f0,
    map=0x3268e000
"x\001+)JMU043g040031QpöMÌNõÉ,.)Ö+©(aøt*äãú½\vÍ4\236Yñ\230¤&z¯ß<{º\211\001\020(¤eV0\024\177ò\177á$\\$\034\224\036ö*÷vÂ\216#3ºâ!²\231)@Ù*íü»Ü7\233BmÜ\005^·\034ñÏOä¨\205Èæf\227\024e¦2¬fÐ\tY¾}ѳ\212,û\027î<ê\236[³w\234\204*(Í)Éd0ø\024\030n\233øìP\210\214üDSÆ?ó\216½¹>\a",
mapsize=173,
    buffer=0x7fffffffa9f0, bufsiz=8192) at sha1_file.c:1308
#2  0x00000000004fbc85 in unpack_sha1_file (map=0x3268e000,
mapsize=173, type=0x7fffffffcbf0, size=0x7fffffffcbe8,
    sha1=0x7fffffffcbd0 "\001/Ç4&箺\036© wK`\214\"Ë\035H\203") at
sha1_file.c:1435
#3  0x00000000004fd96c in read_object (sha1=0x7fffffffcbd0
"\001/Ç4&箺\036© wK`\214\"Ë\035H\203", type=0x7fffffffcbf0,
size=0x7fffffffcbe8)
    at sha1_file.c:2233
#4  0x00000000004fda0d in read_sha1_file_extended (sha1=0x7fffffffcbd0
"\001/Ç4&箺\036© wK`\214\"Ë\035H\203", type=0x7fffffffcbf0,
size=0x7fffffffcbe8,
    flag=1) at sha1_file.c:2258
#5  0x00000000004fdcda in read_sha1_file (sha1=0x7fffffffcbd0
"\001/Ç4&箺\036© wK`\214\"Ë\035H\203", type=0x7fffffffcbf0,
size=0x7fffffffcbe8)
    at cache.h:761
#6  0x00000000004fdbb1 in read_object_with_reference (sha1=0x334a8130
"\001/Ç4&箺\036© wK`\214\"Ë\035H\203", required_type_name=0x55a1a0
"tree",
    size=0x7fffffffcc30, actual_sha1_return=0x0) at sha1_file.c:2299
#7  0x0000000000510f50 in fill_tree_descriptor (desc=0x7fffffffcd10,
sha1=0x334a8130 "\001/Ç4&箺\036© wK`\214\"Ë\035H\203") at
tree-walk.c:57
#8  0x00000000005133dd in traverse_trees_recursive (n=1, dirmask=1,
df_conflicts=0, names=0x349d68a0, info=0x7fffffffd020) at
unpack-trees.c:456
#9  0x0000000000514239 in unpack_callback (n=1, mask=1, dirmask=1,
names=0x349d68a0, info=0x7fffffffd020) at unpack-trees.c:809
#10 0x00000000005119c9 in traverse_trees (n=1, t=0x7fffffffd0b0,
info=0x7fffffffd020) at tree-walk.c:407
#11 0x000000000051342d in traverse_trees_recursive (n=1, dirmask=0,
df_conflicts=0, names=0x349d6860, info=0x7fffffffd3c0) at
unpack-trees.c:460
#12 0x0000000000514239 in unpack_callback (n=1, mask=1, dirmask=1,
names=0x349d6860, info=0x7fffffffd3c0) at unpack-trees.c:809
#13 0x00000000005119c9 in traverse_trees (n=1, t=0x7fffffffd450,
info=0x7fffffffd3c0) at tree-walk.c:407
#14 0x000000000051342d in traverse_trees_recursive (n=1, dirmask=0,
df_conflicts=0, names=0x349d6840, info=0x7fffffffd760) at
unpack-trees.c:460
#15 0x0000000000514239 in unpack_callback (n=1, mask=1, dirmask=1,
names=0x349d6840, info=0x7fffffffd760) at unpack-trees.c:809
#16 0x00000000005119c9 in traverse_trees (n=1, t=0x7fffffffda40,
info=0x7fffffffd760) at tree-walk.c:407
#17 0x0000000000514af5 in unpack_trees (len=1, t=0x7fffffffda40,
o=0x7fffffffd830) at unpack-trees.c:1063
#18 0x000000000049f140 in diff_cache (revs=0x7fffffffdaf0,
tree_sha1=0x32f3b094 "ò\032'\023\220U", tree_name=0x546766 "HEAD",
cached=1) at diff-lib.c:476
#19 0x000000000049f18a in run_diff_index (revs=0x7fffffffdaf0,
cached=1) at diff-lib.c:484
#20 0x000000000049f34d in index_differs_from (def=0x546766 "HEAD",
diff_flags=0) at diff-lib.c:519
#21 0x0000000000470288 in do_pick_commit (commit=0x32f3b000,
opts=0x7fffffffe270) at builtin/revert.c:502
#22 0x0000000000471d38 in single_pick (cmit=0x32f3b000,
opts=0x7fffffffe270) at builtin/revert.c:1069
#23 0x0000000000471ea9 in pick_revisions (opts=0x7fffffffe270) at
builtin/revert.c:1113
#24 0x0000000000472045 in cmd_cherry_pick (argc=2,
argv=0x7fffffffe4c0, prefix=0x6a81c1 ) at builtin/revert.c:1161
#25 0x0000000000405093 in run_builtin (p=0x65d190, argc=2,
argv=0x7fffffffe4c0) at git.c:308
#26 0x0000000000405288 in handle_internal_command (argc=2,
argv=0x7fffffffe4c0) at git.c:467

The interrupt always lands inside experimental_loose_object, apparently
while reading memory-mapped data from disk.

git diff <commit>^ <commit>

is blazingly fast, so I assumed that cherry-picking would be too, but
it is not. What can I do to make cherry-picking faster?

I am using git 1.7.10 on FreeBSD 7.2.

-- 
Dmitry Risenberg


* Re: cherry-pick is slow
  2012-05-12 22:39 cherry-pick is slow Dmitry Risenberg
@ 2012-05-13  1:11 ` Junio C Hamano
  2012-05-13 15:39   ` Dmitry Risenberg
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2012-05-13  1:11 UTC (permalink / raw)
  To: Dmitry Risenberg; +Cc: git

On Sat, May 12, 2012 at 3:39 PM, Dmitry Risenberg
<dmitry.risenberg@gmail.com> wrote:
>
> Hello.
>
> I have a very big git repository (the .git directory is about 5.3 Gb),
> which is a copy of an svn repository fetched via git-svn. In fact
> there are a few repositories ("working copies") that share the same
> .git directory (via symlinks), in which I have different svn branches
> checked out. Now I want to merge a commit from one svn branch to
> another via git cherry-pick. The commit contains diff in only one
> file. So I do
>
> git cherry-pick <commit>
>
> And the operation takes tens of seconds to finish. In "top" output I
> see that git process uses almost no CPU, but has hundreds of page
> faults, so I assume that it is reading a lot of files from disk.

Wild guess: poorly (or worse yet, never) packed repository?


* Re: cherry-pick is slow
  2012-05-13  1:11 ` Junio C Hamano
@ 2012-05-13 15:39   ` Dmitry Risenberg
  2012-05-14 14:54     ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Dmitry Risenberg @ 2012-05-13 15:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

2012/5/13 Junio C Hamano <gitster-vger@pobox.com>:
> On Sat, May 12, 2012 at 3:39 PM, Dmitry Risenberg
> <dmitry.risenberg@gmail.com> wrote:
>>
>> Hello.
>>
>> I have a very big git repository (the .git directory is about 5.3 Gb),
>> which is a copy of an svn repository fetched via git-svn. In fact
>> there are a few repositories ("working copies") that share the same
>> .git directory (via symlinks), in which I have different svn branches
>> checked out. Now I want to merge a commit from one svn branch to
>> another via git cherry-pick. The commit contains diff in only one
>> file. So I do
>>
>> git cherry-pick <commit>
>>
>> And the operation takes tens of seconds to finish. In "top" output I
>> see that git process uses almost no CPU, but has hundreds of page
>> faults, so I assume that it is reading a lot of files from disk.
>
> Wild guess: poorly (or worse yet, never) packed repository?

You were absolutely right.
I set "gc.auto = 0" during the initial checkout from svn and forgot to
re-enable it afterwards. After running "git gc", my repo shrank to half
its size, and git operations are now running much faster.
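
A minimal sketch of how one might check for this condition and undo it,
assuming a POSIX shell and that the "gc.auto = 0" override lives in the
repository's local config. The helper names pack_status and repack_now
are hypothetical, not git commands; they only wrap "git count-objects -v",
"git config --unset" and "git gc".

```shell
# Summarise how many objects are loose and how many live in packfiles.
pack_status() {
    git count-objects -v | awk '
        $1 == "count:"   { loose = $2 }
        $1 == "in-pack:" { packed = $2 }
        $1 == "packs:"   { packs = $2 }
        END { printf "loose=%d packed=%d packs=%d\n", loose, packed, packs }'
}

# Drop a local gc.auto override (assumed to be the cause) and repack once.
repack_now() {
    git config --unset gc.auto || true   # non-zero exit if it was never set
    git gc --quiet
}
```

Running pack_status before and after repack_now shows whether the loose
objects were folded into packs. A large "loose" count would be consistent
with the one-file-per-object reads seen in the backtrace.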

However, cherry-picking is still not as fast as I expected it to be -
cherry-picking a single-file commit takes about 14-15 seconds, fully
using one CPU core. Anything else I can improve?

-- 
Dmitry Risenberg


* Re: cherry-pick is slow
  2012-05-13 15:39   ` Dmitry Risenberg
@ 2012-05-14 14:54     ` Jeff King
       [not found]       ` <CAPZ_ugbD=mOPBs6GyapWtv6NWuJ-=r2+bqBN9n+gdTPwGj3F0Q@mail.gmail.com>
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2012-05-14 14:54 UTC (permalink / raw)
  To: Dmitry Risenberg; +Cc: Junio C Hamano, git

On Sun, May 13, 2012 at 07:39:49PM +0400, Dmitry Risenberg wrote:

> However, cherry-picking is still not as fast as I expected it to be -
> cherry-picking a single-file commit takes about 14-15 seconds, fully
> using one CPU core. Anything else I can improve?

It's probably detecting renames as part of the merge, which can be
expensive if the thing you are cherry-picking is far away from HEAD. You
can try setting the merge.renamelimit config variable to something small
(like 1; setting it to 0 means "no limit").
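
As a sketch, the limit can be tried for a single invocation without
editing any config file, using git's "-c" option. The wrapper name
pick_with_rename_limit is an assumption made for illustration.

```shell
# Run one cherry-pick with a temporary merge.renameLimit override.
# Usage: pick_with_rename_limit <limit> <commit>...
pick_with_rename_limit() {
    limit=$1
    shift
    git -c merge.renameLimit="$limit" cherry-pick "$@"
}
```

For example, "pick_with_rename_limit 1 <commit>". Timing the call with
and without the override gives a rough indication of whether rename
detection is the dominant cost.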

-Peff


* Re: cherry-pick is slow
       [not found]       ` <CAPZ_ugbD=mOPBs6GyapWtv6NWuJ-=r2+bqBN9n+gdTPwGj3F0Q@mail.gmail.com>
@ 2012-05-15 13:24         ` Jeff King
  2012-05-15 18:57           ` Paweł Sikora
  2012-05-15 20:32           ` Junio C Hamano
  0 siblings, 2 replies; 9+ messages in thread
From: Jeff King @ 2012-05-15 13:24 UTC (permalink / raw)
  To: Dmitry Risenberg; +Cc: git

[let's keep this on-list so others can benefit from the discussion]

On Tue, May 15, 2012 at 12:38:59PM +0400, Dmitry Risenberg wrote:

> > It's probably detecting renames as part of the merge, which can be
> > expensive if the thing you are cherry-picking is far away from HEAD. You
> > can try setting the merge.renamelimit config variable to something small
> > (like 1; setting it to 0 means "no limit").
> 
> I set it to 1, but it didn't help at all - cherry-pick time is still
> about the same.

OK, then my guess was probably wrong. You'll have to try profiling (if
you are on Linux, "perf record git cherry-pick ...; perf report" is the
simplest way). Or if the repository is publicly available, I can do a
quick profile run.

-Peff


* Re: cherry-pick is slow
  2012-05-15 13:24         ` Jeff King
@ 2012-05-15 18:57           ` Paweł Sikora
  2012-05-15 20:32           ` Junio C Hamano
  1 sibling, 0 replies; 9+ messages in thread
From: Paweł Sikora @ 2012-05-15 18:57 UTC (permalink / raw)
  To: git; +Cc: Jeff King, Dmitry Risenberg

On Tuesday 15 of May 2012 09:24:51 Jeff King wrote:
> [let's keep this on-list so others can benefit from the discussion]
> 
> On Tue, May 15, 2012 at 12:38:59PM +0400, Dmitry Risenberg wrote:
> 
> > > It's probably detecting renames as part of the merge, which can be
> > > expensive if the thing you are cherry-picking is far away from HEAD. You
> > > can try setting the merge.renamelimit config variable to something small
> > > (like 1; setting it to 0 means "no limit").
> > 
> > I set it to 1, but it didn't help at all - cherry-pick time is still
> > about the same.
> 
> OK, then my guess was probably wrong. You'll have to try profiling (if
> you are on Linux, "perf record git cherry-pick ...; perf report" is the
> simplest way). Or if the repository is publicly available, I can do a
> quick profile run.

I have two big repos (a few GB), and cherry-pick is heavy on both I/O
and CPU. Timing varies from a few seconds on RAID-0 (2x500GB) to ~30
seconds on linear LVM (a few TB). Here's the perf report:

 36,24%  git  libc-2.15.so        [.] __memmove_ssse3_back
  7,04%  git  libz.so.1.2.7       [.] inflate_fast
  6,17%  git  libz.so.1.2.7       [.] inflate
  5,53%  git  git                 [.] xdl_recs_cmp
  3,04%  git  libc-2.15.so        [.] __memcmp_sse4_1
  2,54%  git  libz.so.1.2.7       [.] inflate_table
  1,83%  git  libc-2.15.so        [.] __strcmp_sse42
  1,52%  git  libc-2.15.so        [.] __memcpy_ssse3_back
  1,49%  git  git                 [.] match_trees
  1,39%  git  libc-2.15.so        [.] _int_malloc
  1,18%  git  libz.so.1.2.7       [.] adler32
  1,08%  git  git                 [.] do_head_ref
  1,02%  git  git                 [.] splice_tree
  0,83%  git  libc-2.15.so        [.] __strlen_sse2_pminub
  0,71%  git  [kernel.kallsyms]   [k] _raw_spin_lock
  0,68%  git  git                 [.] shift_tree_by
  0,67%  git  libc-2.15.so        [.] _int_free
  0,63%  git  [kernel.kallsyms]   [k] __d_lookup_rcu
  0,60%  git  [kernel.kallsyms]   [k] link_path_walk
  0,57%  git  git                 [.] get_shallow_commits
(...)


* Re: cherry-pick is slow
  2012-05-15 13:24         ` Jeff King
  2012-05-15 18:57           ` Paweł Sikora
@ 2012-05-15 20:32           ` Junio C Hamano
  2012-05-15 21:03             ` Junio C Hamano
  1 sibling, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2012-05-15 20:32 UTC (permalink / raw)
  To: Jeff King; +Cc: Dmitry Risenberg, git

Jeff King <peff@peff.net> writes:

> [let's keep this on-list so others can benefit from the discussion]
>
> On Tue, May 15, 2012 at 12:38:59PM +0400, Dmitry Risenberg wrote:
>
>> > It's probably detecting renames as part of the merge, which can be
>> > expensive if the thing you are cherry-picking is far away from HEAD. You
>> > can try setting the merge.renamelimit config variable to something small
>> > (like 1; setting it to 0 means "no limit").
>> 
>> I set it to 1, but it didn't help at all - cherry-pick time is still
>> about the same.
>
> OK, then my guess was probably wrong. You'll have to try profiling (if
> you are on Linux, "perf record git cherry-pick ...; perf report" is the
> simplest way). Or if the repository is publicly available, I can do a
> quick profile run.

Perhaps the word "cherry-pick" invites an expectation that it must be
faster than a full-tree merge, i.e. something like "format-patch | am -3",
especially when the change introduced by the commit being cherry-picked
touches only a handful of paths.

Unfortunately, I do not think that the actual implementation of
"cherry-pick" matches that expectation, as it is a full three-way merge.

I am somewhat curious to see what the performance characteristics would be
if the same commit is replayed using

	git format-patch -1 --stdout $commit | git apply --index --3way

pipeline.  Depending on the number of paths in the whole tree vs the
number of paths the $commit touches, I wouldn't be surprised if it is
faster.
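
The pipeline above could be wrapped roughly as follows; replay_via_apply
is a hypothetical helper name. Unlike "git cherry-pick", "git apply
--index" only updates the index and the working tree, so this sketch
leaves the result staged but uncommitted.

```shell
# Replay a single commit onto HEAD as a patch instead of a full-tree merge.
# The change ends up staged in the index; no commit is created.
# Usage: replay_via_apply <commit>
replay_via_apply() {
    commit=$1
    git format-patch -1 --stdout "$commit" | git apply --index --3way
}
```

Wrapping both this call and a plain "git cherry-pick" in /usr/bin/time,
each starting from the same checkout, is one way to compare the two.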


* Re: cherry-pick is slow
  2012-05-15 20:32           ` Junio C Hamano
@ 2012-05-15 21:03             ` Junio C Hamano
  2012-05-19  0:54               ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2012-05-15 21:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Dmitry Risenberg, git

Junio C Hamano <gitster@pobox.com> writes:

> Unfortunately, I do not think that the actual implementation of
> "cherry-pick" matches that expectation, as it is a full three-way merge.
>
> I am somewhat curious to see what the performance characteristics would be
> if the same commit is replayed using
>
> 	git format-patch -1 --stdout $commit | git apply --index --3way
>
> pipeline.  Depending on the number of paths in the whole tree vs the
> number of paths the $commit touches, I wouldn't be surprised if it is
> faster.

An unscientific datapoint shows that with a project as small as the kernel,
the difference is noticeable.

For example, v3.4-rc7-22-g3911ff3 (random tip of the day) touches two
paths, and cherry-picking it on top of v3.3 goes like this:

    $ git checkout v3.3 && EDITOR=: /usr/bin/time git cherry-pick 3911ff3
     Author: Jiri Kosina <jkosina@suse.cz>
     2 files changed, 2 insertions(+)
    1.08user 0.20system 0:01.28elapsed 99%CPU (0avgtext+0avgdata 469728maxresident)k
    0inputs+7536outputs (0major+52604minor)pagefaults 0swaps

as opposed to an alternative that touches only these two paths:

    $ git checkout v3.3 && EDITOR=: /usr/bin/time sh -c '
	git format-patch --stdout -1 3911ff3 | git am -3'
    Applying: genirq: export handle_edge_irq() and irq_to_desc()
    0.36user 0.16system 0:00.46elapsed 112%CPU (0avgtext+0avgdata 254720maxresident)k
    0inputs+14872outputs (0major+55145minor)pagefaults 0swaps

Of course, there are vast differences between v3.3 and 3911ff3^1; 11k+
paths touched, countless paths created and deleted.

I _think_ most of the overhead comes from having to match the large trees
in unpack_trees() even though none of the changes between the base
versions matters for this "cherry-pick".

Both read the flat index into the core in its entirety, and futzing with
the index file format would not affect this comparison, even though it
could improve the performance of "am", if done right, as it could limit
its updates to only two paths.  In the merge case, we pretty much rebuild
the resulting index from scratch by walking the entire tree in
unpack_trees(), so there won't be much benefit.

Perhaps we might want to rethink the way we run merges?


* Re: cherry-pick is slow
  2012-05-15 21:03             ` Junio C Hamano
@ 2012-05-19  0:54               ` Jeff King
  0 siblings, 0 replies; 9+ messages in thread
From: Jeff King @ 2012-05-19  0:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Dmitry Risenberg, git

On Tue, May 15, 2012 at 02:03:40PM -0700, Junio C Hamano wrote:

> > 	git format-patch -1 --stdout $commit | git apply --index --3way
> [...]
> An unscientific datapoint shows that with a project as small as the kernel,
> the difference is noticeable.
>
> For example, v3.4-rc7-22-g3911ff3 (random tip of the day) touches two
> paths, and cherry-picking it on top of v3.3 goes like this:

Yeah, that's what I would expect. And that's not even that far away.
Cherry-picking the same commit onto v3.0 should be even more noticeable.

> I _think_ most of the overhead comes from having to match the large trees
> in unpack_trees() even though none of the changes between the base
> versions matters for this "cherry-pick".
> 
> Both read the flat index into the core in its entirety, and futzing with
> the index file format would not affect this comparison, even though it
> could improve the performance of "am", if done right, as it could limit
> its updates to only two paths.  In the merge case, we pretty much rebuild
> the resulting index from scratch by walking the entire tree in
> unpack_trees(), so there won't be much benefit.
> 
> Perhaps we might want to rethink the way we run merges?

For merge-recursive, we would always want to compute the pair-wise
renames between each side and the ancestor. So that diff to the
cherry-pick destination is always going to be an expensive O(# of
changes between source and dest) operation.

Without renames, you could do better on the actual merge with a
three-way tree walk. E.g., you see that some sub-tree is at tree A in
the "ours" and "ancestor" trees, but at tree B in "theirs". So you don't
have to descend further, and can just say "take theirs" (well, you have
to descend "theirs" to get the values). But I expect it gets more
complicated with the interactions with the index (and is probably not
worth spending much effort on because of the rename issue, anyway).
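
The shortcut described above can be illustrated for a single directory
by comparing tree object IDs. This is only a sketch of the decision
rule, not how merge-recursive is implemented; the function name
subtree_decision and its output labels are made up for the example, and
renames and index interactions are ignored.

```shell
# Decide how a three-way merge could treat one path without descending into it.
# Usage: subtree_decision <path> <ancestor> <ours> <theirs>
subtree_decision() {
    path=$1
    # Object ID of <path> in each commit; empty if the path is absent there.
    base=$(git rev-parse --verify -q "$2:$path" || true)
    ours=$(git rev-parse --verify -q "$3:$path" || true)
    theirs=$(git rev-parse --verify -q "$4:$path" || true)

    if [ "$ours" = "$theirs" ]; then
        echo "same"          # both sides agree, nothing to merge
    elif [ "$base" = "$ours" ]; then
        echo "take-theirs"   # only their side changed this subtree
    elif [ "$base" = "$theirs" ]; then
        echo "keep-ours"     # only our side changed this subtree
    else
        echo "descend"       # both sides changed it; the entries must be walked
    fi
}
```

Only the "descend" case requires reading the subtree's entries from all
three sides; the other cases are settled by comparing three object IDs.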

-Peff
