git.vger.kernel.org archive mirror
* VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i
@ 2010-07-13  6:56 Marat Radchenko
  2010-07-13  8:12 ` Michael J Gruber
  2010-10-13  7:56 ` [FEATURE REQUEST] allow enabling patience diff algorithm by default Marat Radchenko
  0 siblings, 2 replies; 7+ messages in thread
From: Marat Radchenko @ 2010-07-13  6:56 UTC
  To: git

Hi.

My setup:
0. Quad-core machine with 8GB of RAM, 10K RPM HDD.
1. SVN repo that I periodically fetch into the origin/trunk branch. It gets ~200
commits/day.
2. My local branch with 1-5 commits, which I often rebase onto trunk.
3. I haven't rebased for 2 days, so I'm rebasing 3 (three) commits in my branch
over 453 commits in trunk using "git rebase trunk".
4. trunk does contain files that are "bad" from a diff point of view (big and binary).
5. Sadly, the data in the repo is confidential.

Expected: rebase takes some reasonable amount of time (< 1 min?).

Actual: rebase takes 20 mins.

Almost all of that time was spent running
`git format-patch -k --stdout --full-index --ignore-if-in-upstream 80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52`
(that's the three commits from my branch) at 100% of one CPU core.

Additional info:

Another similar rebase, but over 4.5k commits, took 2 hours.

Running without --ignore-if-in-upstream:
$ time git format-patch -k --stdout --full-index 80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf5 | wc -l
25823

real	0m0.163s
user	0m0.140s
sys	0m0.020s

Proof there are only three commits:

$ git rev-list 80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52
d3fde4ae7497981a6fe61b0366b105477896cf52
e18069258806bda6a6165822003f5e9fd958f906
c8c2f2e157e615b73d0baab1d793a22991c9ba71

Questions:
1. Is this expected behavior (branch you rebase onto has binary files -> no
performance for you)?
2. If [1] is yes, is it possible to prevent rebase from running --ignore-if-in-upstream?
3. If [1] is no, should I run some kind of profiler (how?) to determine what
exactly causes such a performance drop?


* Re: VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i
  2010-07-13  6:56 VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i Marat Radchenko
@ 2010-07-13  8:12 ` Michael J Gruber
  2010-07-13  8:13   ` [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream Michael J Gruber
  2010-10-13  7:56 ` [FEATURE REQUEST] allow enabling patience diff algorithm by default Marat Radchenko
  1 sibling, 1 reply; 7+ messages in thread
From: Michael J Gruber @ 2010-07-13  8:12 UTC
  To: Marat Radchenko; +Cc: git

Marat Radchenko venit, vidit, dixit 13.07.2010 08:56:
> Questions:
> 1. Is it expected behavior (branch you rebase onto has binary files -> no 
> performance for you)?

Well, with "ignore-if-in-upstream" git has to compute a patch-id for
every upstream patch (merge-base..upstream) and compare them to the ids
of the commits in merge-base..HEAD. So the cost is driven by the 453
upstream commits, not by the three on your branch.
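A minimal Python sketch of the check described above, assuming each commit is represented as a (name, unified diff text) pair. `approx_patch_id` and `commits_to_apply` are hypothetical helpers. The patch-id is only approximated here, following the git-patch-id documentation's description: a SHA-1 of the diff with whitespace and line numbers ignored. This sketch drops `@@` hunk headers and `index` lines to achieve that. The point it illustrates is that one diff has to be produced and hashed for every upstream commit, so the upstream side dominates the cost.

```python
import hashlib
import re


def approx_patch_id(diff_text: str) -> str:
    """Approximate patch-id (hypothetical helper, not git's exact algorithm):
    SHA-1 of the diff with whitespace, hunk line numbers and index lines removed."""
    h = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith("index ") or line.startswith("@@"):
            # Blob abbreviations and line numbers differ between otherwise
            # identical changes, so leave them out of the hash.
            continue
        h.update(re.sub(r"\s+", "", line).encode())
    return h.hexdigest()


def commits_to_apply(local, upstream):
    """Mimic --ignore-if-in-upstream: keep the local commits whose patch-id
    does not occur among the upstream commits.

    local, upstream: lists of (name, diff_text) pairs."""
    # One diff and one hash per upstream commit: this is the expensive part.
    upstream_ids = {approx_patch_id(diff) for _, diff in upstream}
    return [name for name, diff in local if approx_patch_id(diff) not in upstream_ids]


if __name__ == "__main__":
    change = "--- a/f\n+++ b/f\n@@ -1,2 +1,2 @@\n a\n-b\n+c\n"
    moved = "--- a/f\n+++ b/f\n@@ -10,2 +10,2 @@\n a\n-b\n+c\n"  # same change, other lines
    other = "--- a/g\n+++ b/g\n@@ -1 +1 @@\n-x\n+y\n"
    print(commits_to_apply([("mine1", change), ("mine2", other)], [("up1", moved)]))
    # prints ['mine2']
```

With real git, `git cherry` and `git log --cherry-pick` perform this kind of comparison.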

> 2. If [1] is yes, is it possible to prevent rebase from running --ignore-if-in-
> upstream?

Not currently, but with my upcoming patch ;)

This has the side effect, of course, of no longer skipping patches
which have already been applied upstream (with a different sha1).

> 3. If [1] is no, should i run some kind of profiler (how?) to determine what 
> exactly causes such performance drop?

It is the calculation of the patch-ids. Git first creates a "binary
diff" and then computes the patch-id (sha1) of that diff. I am sure we
could optimize the calculation of patch-ids for binary diffs, which may
be useful in addition to shutting off "cherry" with rebase.
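The following Python sketch shows one possible shape of the optimization suggested above. It is an assumption about the approach, not necessarily what git ended up doing. When either side of a file change is binary, the two blob object names go into the patch-id hash instead of a generated binary diff. Identical binary changes have identical blob names, so matching still works and the expensive diff is skipped.

A few parts are borrowed from git:
- `blob_oid` uses git's blob naming scheme: the SHA-1 of `"blob <size>\0"` followed by the content.
- `is_binary` uses the NUL-byte-in-the-first-8000-bytes heuristic.

The per-file `file_patch_id` helper and its `difflib`-based text path are hypothetical.

```python
import difflib
import hashlib


def blob_oid(data: bytes) -> str:
    """Object name git gives a blob with this content (SHA-1 repositories)."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).hexdigest()


def is_binary(data: bytes) -> bool:
    """Treat content as binary if a NUL byte appears in its first 8000 bytes."""
    return b"\0" in data[:8000]


def file_patch_id(path: str, old: bytes, new: bytes) -> str:
    """Hypothetical per-file patch-id with a shortcut for binary files."""
    h = hashlib.sha1(path.encode())
    if is_binary(old) or is_binary(new):
        # Shortcut: hash the two blob names instead of diffing the content.
        # (In git the blob names are already recorded in the trees; they are
        # recomputed from the bytes here only to keep the sketch self-contained.)
        h.update(blob_oid(old).encode())
        h.update(blob_oid(new).encode())
    else:
        # Text path: hash the diff body, ignoring hunk headers and whitespace.
        diff = difflib.unified_diff(old.decode().splitlines(),
                                    new.decode().splitlines(), lineterm="")
        for line in diff:
            if not line.startswith("@@"):
                h.update("".join(line.split()).encode())
    return h.hexdigest()
```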

Michael


* [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
  2010-07-13  8:12 ` Michael J Gruber
@ 2010-07-13  8:13   ` Michael J Gruber
  2010-07-13 19:33     ` Erik Faye-Lund
  0 siblings, 1 reply; 7+ messages in thread
From: Michael J Gruber @ 2010-07-13  8:13 UTC
  To: git; +Cc: Marat Radchenko

git-rebase uses "format-patch --ignore-if-in-upstream" to determine
which commits to apply. This may or may not be desired: a user may want
to transplant all commits, or may opt to avoid the possibly
time-consuming calculation of patch-ids.

Therefore, introduce rebase.cherry (defaulting to true) and --cherry and
--no-cherry options (to override the config), where --cherry means the
current behavior and --no-cherry avoids "--ignore-if-in-upstream".

Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
---
RFC for obvious reasons (doc, tests).

 git-rebase.sh |   16 +++++++++++++++-
 1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/git-rebase.sh b/git-rebase.sh
index ab4afa7..1eb6ad1 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -53,6 +53,7 @@ git_am_opt=
 rebase_root=
 force_rebase=
 allow_rerere_autoupdate=
+cherry=$(git config --bool rebase.cherry)
 
 continue_merge () {
 	test -n "$prev_head" || die "prev_head must be defined"
@@ -307,6 +308,12 @@ do
 		esac
 		do_merge=t
 		;;
+	--cherry)
+		cherry=true
+		;;
+	--no-cherry)
+		cherry=false
+		;;
 	-n|--no-stat)
 		diffstat=
 		;;
@@ -540,9 +547,16 @@ else
 	revisions="$upstream..$orig_head"
 fi
 
+if test "x$cherry" = "xfalse"
+then
+	cherry_opt=""
+else
+	cherry_opt="--ignore-if-in-upstream"
+fi
+
 if test -z "$do_merge"
 then
-	git format-patch -k --stdout --full-index --ignore-if-in-upstream \
+	git format-patch -k --stdout --full-index $cherry_opt \
 		$root_flag "$revisions" |
 	git am $git_am_opt --rebasing --resolvemsg="$RESOLVEMSG" &&
 	move_to_original_branch
-- 
1.7.2.rc1.212.g850a


* Re: [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
  2010-07-13  8:13   ` [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream Michael J Gruber
@ 2010-07-13 19:33     ` Erik Faye-Lund
  2010-09-04 15:03       ` Michael J Gruber
  0 siblings, 1 reply; 7+ messages in thread
From: Erik Faye-Lund @ 2010-07-13 19:33 UTC
  To: Michael J Gruber; +Cc: git, Marat Radchenko

s/of/off/ in the subject ;)

-- 
Erik "kusma" Faye-Lund


* Re: [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
  2010-07-13 19:33     ` Erik Faye-Lund
@ 2010-09-04 15:03       ` Michael J Gruber
  2010-09-09  8:05         ` Marat Radchenko
  0 siblings, 1 reply; 7+ messages in thread
From: Michael J Gruber @ 2010-09-04 15:03 UTC
  To: kusmabite; +Cc: Erik Faye-Lund, git, Marat Radchenko, Junio C Hamano

Erik Faye-Lund venit, vidit, dixit 13.07.2010 21:33:
> s/of/off/ in the subject ;)

Pinging this one. Is there any interest? Erik is right, off course ;)

Michael


* Re: [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
  2010-09-04 15:03       ` Michael J Gruber
@ 2010-09-09  8:05         ` Marat Radchenko
  0 siblings, 0 replies; 7+ messages in thread
From: Marat Radchenko @ 2010-09-09  8:05 UTC
  To: Michael J Gruber, kusmabite; +Cc: Erik Faye-Lund, git, Junio C Hamano

> Pinging this one. Is there any interest? Erik is right, off course ;)

There definitely is. Since [1], rebasing has become much faster (minutes instead of tens of minutes), though it still takes longer than I'd like.

[1]: http://repo.or.cz/w/git.git/commit/34597c1f5a77c710dae33092cb8a7cb01c6b21c1


* [FEATURE REQUEST] allow enabling patience diff algorithm by default
  2010-07-13  6:56 VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i Marat Radchenko
  2010-07-13  8:12 ` Michael J Gruber
@ 2010-10-13  7:56 ` Marat Radchenko
  1 sibling, 0 replies; 7+ messages in thread
From: Marat Radchenko @ 2010-10-13  7:56 UTC
  To: git


I observe the patience algorithm being several times faster than the
standard diff on some big text files (1 MB < size < 10 MB), and it
actually produces smaller diffs. So using patience diff is likely to
improve git-rev-list performance.

Suggested approach: add an option to ~/.gitconfig to enable patience diff
by default. Additionally, something like --no-patience could be added to
the commands that currently accept --patience, so the setting can be
overridden when needed.
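The following Python sketch shows the characteristic first step of patience diff. It illustrates the general technique only; it is not git's xdiff implementation, and the function names are hypothetical.

The step has two parts:
1. Pick the lines that occur exactly once in each file.
2. Among them, keep the longest run that appears in the same order in both files, found with patience sorting.

Those lines become anchors, and the full algorithm then recurses into the gaps between them.

```python
from bisect import bisect_left


def unique_common_lines(a, b):
    """(i, j) pairs for lines occurring exactly once in a and exactly once in b,
    ordered by position in a."""
    count_a, count_b, pos_b = {}, {}, {}
    for line in a:
        count_a[line] = count_a.get(line, 0) + 1
    for j, line in enumerate(b):
        count_b[line] = count_b.get(line, 0) + 1
        pos_b[line] = j
    return [(i, pos_b[line]) for i, line in enumerate(a)
            if count_a[line] == 1 and count_b.get(line) == 1]


def patience_anchors(a, b):
    """Longest subsequence of unique common lines that is also increasing in b."""
    pairs = unique_common_lines(a, b)
    tops = []                    # tops[k]: smallest j ending an increasing run of length k+1
    top_idx = []                 # index into pairs of the element behind tops[k]
    back = [None] * len(pairs)   # predecessor links for reconstruction
    for idx, (_, j) in enumerate(pairs):
        k = bisect_left(tops, j)  # leftmost pile whose top is >= j
        if k > 0:
            back[idx] = top_idx[k - 1]
        if k == len(tops):
            tops.append(j)
            top_idx.append(idx)
        else:
            tops[k] = j
            top_idx[k] = idx
    anchors = []
    idx = top_idx[-1] if top_idx else None
    while idx is not None:
        anchors.append(pairs[idx])
        idx = back[idx]
    return anchors[::-1]
```

Lines such as a lone "}" that repeat in both files never become anchors. This is why patience diff tends to align on distinctive lines.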

-- 
View this message in context: http://git.661346.n2.nabble.com/VERY-slow-git-format-patch-tens-on-minutes-during-rebase-and-rev-list-during-rebase-i-tp5286226p5629926.html
Sent from the git mailing list archive at Nabble.com.


Thread overview: 7+ messages
2010-07-13  6:56 VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i Marat Radchenko
2010-07-13  8:12 ` Michael J Gruber
2010-07-13  8:13   ` [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream Michael J Gruber
2010-07-13 19:33     ` Erik Faye-Lund
2010-09-04 15:03       ` Michael J Gruber
2010-09-09  8:05         ` Marat Radchenko
2010-10-13  7:56 ` [FEATURE REQUEST] allow enabling patience diff algorithm by default Marat Radchenko
