* VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i
@ 2010-07-13 6:56 Marat Radchenko
2010-07-13 8:12 ` Michael J Gruber
2010-10-13 7:56 ` [FEATURE REQUEST] allow enabling patience diff algorithm by default Marat Radchenko
0 siblings, 2 replies; 7+ messages in thread
From: Marat Radchenko @ 2010-07-13 6:56 UTC (permalink / raw)
To: git
Hi.
My setup:
0. Quad-core machine with 8GB of RAM, 10K RPM HDD.
1. SVN repo that i periodically fetch into origin/trunk branch. Has ~200
commits/day.
2. My local branch with 1-5 commits which i often rebase against trunk.
3. I haven't rebased for 2 days, so i'm rebasing 3 (three) commits in my branch
over 453 commits in trunk using "git rebase trunk".
4. trunk does contain files that are "bad" from a diff point of view (big & binary).
5. Sadly, data in repo is confidential.
Expected: rebase takes some reasonable amount of time (< 1 min?).
Actual: rebase takes 20 mins.
Almost all of that time was spent running

  git format-patch -k --stdout --full-index --ignore-if-in-upstream \
    80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52

(that's three commits from my branch) at 100% of one CPU core.
Additional info:
Another similar rebase, but over 4.5k commits, took 2 hours.
Running without --ignore-if-in-upstream:
$ time git format-patch -k --stdout --full-index \
    80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52 | wc -l
25823

real 0m0.163s
user 0m0.140s
sys 0m0.020s
Proof there are only three commits:
$ git rev-list 80bb0dfe3d86f3cc9095ea616d9d1b1530fbe7b8..d3fde4ae7497981a6fe61b0366b105477896cf52
d3fde4ae7497981a6fe61b0366b105477896cf52
e18069258806bda6a6165822003f5e9fd958f906
c8c2f2e157e615b73d0baab1d793a22991c9ba71
Questions:
1. Is it expected behavior (branch you rebase onto has binary files -> no
performance for you)?
2. If [1] is yes, is it possible to prevent rebase from running
--ignore-if-in-upstream?
3. If [1] is no, should i run some kind of profiler (how?) to determine what
exactly causes such performance drop?
* Re: VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i
2010-07-13 6:56 VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i Marat Radchenko
@ 2010-07-13 8:12 ` Michael J Gruber
2010-07-13 8:13 ` [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream Michael J Gruber
2010-10-13 7:56 ` [FEATURE REQUEST] allow enabling patience diff algorithm by default Marat Radchenko
1 sibling, 1 reply; 7+ messages in thread
From: Michael J Gruber @ 2010-07-13 8:12 UTC (permalink / raw)
To: Marat Radchenko; +Cc: git
Marat Radchenko venit, vidit, dixit 13.07.2010 08:56:
> Questions:
> 1. Is it expected behavior (branch you rebase onto has binary files -> no
> performance for you)?
Well, with "ignore-if-in-upstream" git has to compute a patch-id for
every upstream patch (merge-base..upstream) and compare it to the
patch-ids of the commits in merge-base..HEAD.
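The comparison described above can be sketched with plumbing commands. This is only an illustration, not how format-patch does it internally: the function name in_upstream and the temporary file are made up, and it assumes that piping git log -p into git patch-id is close enough to what --ignore-if-in-upstream compares.

```shell
#!/bin/sh
# Sketch: print the commits in merge-base..topic whose patch-id also
# occurs in merge-base..upstream (hypothetical helper, for illustration).
# Usage: in_upstream <upstream> <topic>
in_upstream () {
	upstream=$1
	topic=$2
	base=$(git merge-base "$upstream" "$topic") || return 1
	ids=$(mktemp) || return 1
	# One patch-id per upstream commit. This is the expensive part when
	# the upstream range is long or touches big files.
	git log -p --no-merges "$base..$upstream" |
	git patch-id | cut -d' ' -f1 | sort -u >"$ids"
	# git patch-id prints "<patch-id> <commit>" for each commit it reads.
	git log -p --no-merges "$base..$topic" | git patch-id |
	while read -r id commit
	do
		if grep -q -x -F "$id" "$ids"
		then
			echo "$commit"
		fi
	done
	rm -f "$ids"
}
```

For everyday use, git cherry <upstream> <head> reports the same relationship: lines starting with "-" are commits that already have an equivalent upstream.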
> 2. If [1] is yes, is it possible to prevent rebase from running
> --ignore-if-in-upstream?
Not currently, but it will be with my upcoming patch ;)
This has the (side-) effect of not ignoring patches which have been
applied (with different sha1) upstream, of course.
> 3. If [1] is no, should i run some kind of profiler (how?) to determine what
> exactly causes such performance drop?
It is the calculation of the patch-ids. Git first creates a "binary
diff" and then computes the patch-id (sha1) of that diff. I am sure we
could optimize the calculation of patch-ids for binary diffs, which may
be useful in addition to shutting off "cherry" with rebase.
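Short of a real profiler, one way to see where the time goes is to time the patch-id of each upstream commit separately. A minimal sketch, assuming a date that supports +%s (so it only resolves whole seconds); time_patch_ids is a made-up helper name:

```shell
#!/bin/sh
# Sketch: print "<seconds> <commit>" for every non-merge commit in a range,
# timing how long its patch-id takes to compute (hypothetical helper).
# Usage: time_patch_ids <range>
time_patch_ids () {
	git rev-list --no-merges "$1" |
	while read -r commit
	do
		start=$(date +%s)
		# --binary produces a full binary diff, similar to format-patch;
		# --root makes a parentless commit diff against the empty tree.
		git diff-tree --binary --root "$commit" | git patch-id >/dev/null
		end=$(date +%s)
		echo "$((end - start)) $commit"
	done
}
```

Something like time_patch_ids "$(git merge-base trunk HEAD)..trunk" | sort -rn | head would then list the slowest upstream commits first.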
Michael
* [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
2010-07-13 8:12 ` Michael J Gruber
@ 2010-07-13 8:13 ` Michael J Gruber
2010-07-13 19:33 ` Erik Faye-Lund
0 siblings, 1 reply; 7+ messages in thread
From: Michael J Gruber @ 2010-07-13 8:13 UTC (permalink / raw)
To: git; +Cc: Marat Radchenko
git-rebase uses "format-patch --ignore-if-in-upstream" to determine
which commits to apply. This may or may not be desired: a user may want
to transplant all commits, or may opt to avoid the possibly
time-consuming calculation of patch-ids.
Therefore, introduce rebase.cherry (defaulting to true) and --cherry and
--no-cherry options (to override the config), where --cherry means the
current behavior and --no-cherry avoids "--ignore-if-in-upstream".
Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
---
RFC for obvious reasons (doc, tests).
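The intended precedence (config first, command line overrides) can be sketched as a standalone function. This is an illustration only: resolve_cherry_opt is a made-up name, and the config value is passed in as an argument instead of being read with git config, so it runs outside a repository. Note that rebase.cherry, --cherry and --no-cherry exist only with this patch applied.

```shell
#!/bin/sh
# Sketch of the option handling proposed in this patch (hypothetical helper).
# $1: value of rebase.cherry ("true", "false", or empty when unset)
# remaining arguments: rebase command-line options
resolve_cherry_opt () {
	cherry=$1
	shift
	for arg in "$@"
	do
		case "$arg" in
		--cherry) cherry=true ;;
		--no-cherry) cherry=false ;;
		esac
	done
	# Anything other than an explicit "false" keeps the current behavior.
	if test "x$cherry" = "xfalse"
	then
		echo ""
	else
		echo "--ignore-if-in-upstream"
	fi
}

resolve_cherry_opt ""                # unset config: --ignore-if-in-upstream
resolve_cherry_opt false             # empty line
resolve_cherry_opt false --cherry    # --ignore-if-in-upstream
resolve_cherry_opt true --no-cherry  # empty line
```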
git-rebase.sh | 16 +++++++++++++++-
1 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/git-rebase.sh b/git-rebase.sh
index ab4afa7..1eb6ad1 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -53,6 +53,7 @@ git_am_opt=
 rebase_root=
 force_rebase=
 allow_rerere_autoupdate=
+cherry=$(git config --bool rebase.cherry)
 
 continue_merge () {
 	test -n "$prev_head" || die "prev_head must be defined"
@@ -307,6 +308,12 @@ do
 		esac
 		do_merge=t
 		;;
+	--cherry)
+		cherry=true
+		;;
+	--no-cherry)
+		cherry=false
+		;;
 	-n|--no-stat)
 		diffstat=
 		;;
@@ -540,9 +547,16 @@ else
 	revisions="$upstream..$orig_head"
 fi
 
+if test "x$cherry" = "xfalse"
+then
+	cherry_opt=""
+else
+	cherry_opt="--ignore-if-in-upstream"
+fi
+
 if test -z "$do_merge"
 then
-	git format-patch -k --stdout --full-index --ignore-if-in-upstream \
+	git format-patch -k --stdout --full-index $cherry_opt \
 		$root_flag "$revisions" |
 	git am $git_am_opt --rebasing --resolvemsg="$RESOLVEMSG" &&
 	move_to_original_branch
--
1.7.2.rc1.212.g850a
* Re: [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
2010-07-13 8:13 ` [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream Michael J Gruber
@ 2010-07-13 19:33 ` Erik Faye-Lund
2010-09-04 15:03 ` Michael J Gruber
0 siblings, 1 reply; 7+ messages in thread
From: Erik Faye-Lund @ 2010-07-13 19:33 UTC (permalink / raw)
To: Michael J Gruber; +Cc: git, Marat Radchenko
s/of/off/ in the subject ;)
On Tue, Jul 13, 2010 at 10:13 AM, Michael J Gruber
<git@drmicha.warpmail.net> wrote:
> git-rebase uses "format-patch --ignore-if-in-upstream" to determine
> which commits to apply. This may or may not be desired: a user may want
> to transplant all commits, or may opt to avoid the possibly time
> consuming calculation of patch-ids.
>
> Therefore, introduce rebase.cherry (defaulting to true) and --cherry and
> --no-cherry options (to override the config), where --cherry means the
> current behavior and --no-cherry avoids "--ignore-if-in-upstream".
>
> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
> ---
> RFC for obvious reasons (doc, tests).
--
Erik "kusma" Faye-Lund
* Re: [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
2010-07-13 19:33 ` Erik Faye-Lund
@ 2010-09-04 15:03 ` Michael J Gruber
2010-09-09 8:05 ` Marat Radchenko
0 siblings, 1 reply; 7+ messages in thread
From: Michael J Gruber @ 2010-09-04 15:03 UTC (permalink / raw)
To: kusmabite; +Cc: Erik Faye-Lund, git, Marat Radchenko, Junio C Hamano
Erik Faye-Lund venit, vidit, dixit 13.07.2010 21:33:
> s/of/off/ in the subject ;)
>
> On Tue, Jul 13, 2010 at 10:13 AM, Michael J Gruber
> <git@drmicha.warpmail.net> wrote:
>> git-rebase uses "format-patch --ignore-if-in-upstream" to determine
>> which commits to apply. This may or may not be desired: a user may want
>> to transplant all commits, or may opt to avoid the possibly time
>> consuming calculation of patch-ids.
>>
>> Therefore, introduce rebase.cherry (defaulting to true) and --cherry and
>> --no-cherry options (to override the config), where --cherry means the
>> current behavior and --no-cherry avoids "--ignore-if-in-upstream".
>>
>> Signed-off-by: Michael J Gruber <git@drmicha.warpmail.net>
>> ---
>> RFC for obvious reasons (doc, tests).
>
Pinging this one. Is there any interest? Erik is right, off course ;)
Michael
* Re: [RFC/PATCH] rebase: Allow to turn of ignore-if-in-upstream
2010-09-04 15:03 ` Michael J Gruber
@ 2010-09-09 8:05 ` Marat Radchenko
0 siblings, 0 replies; 7+ messages in thread
From: Marat Radchenko @ 2010-09-09 8:05 UTC (permalink / raw)
To: Michael J Gruber, kusmabite; +Cc: Erik Faye-Lund, git, Junio C Hamano
> Pinging this one. Is there any interest? Erik is right, off course ;)
There definitely is. Since [1], rebasing has become much faster (minutes instead of tens of minutes), though it still takes longer than I'd like.
[1]: http://repo.or.cz/w/git.git/commit/34597c1f5a77c710dae33092cb8a7cb01c6b21c1
* [FEATURE REQUEST] allow enabling patience diff algorithm by default
2010-07-13 6:56 VERY slow git format-patch (tens on minutes) during rebase and rev-list during rebase -i Marat Radchenko
2010-07-13 8:12 ` Michael J Gruber
@ 2010-10-13 7:56 ` Marat Radchenko
1 sibling, 0 replies; 7+ messages in thread
From: Marat Radchenko @ 2010-10-13 7:56 UTC (permalink / raw)
To: git
I observe the patience algorithm being several times faster than the standard
diff on some big (1MB<size<10MB) text files (and it actually produces smaller
diffs). So using patience diff is likely to improve git-rev-list
performance.
Suggested way: add an option to ~/.gitconfig to enable patience diff by
default. Additionally, something like --no-patience may be added to commands
that accept --patience now, so that the setting can be overridden if needed.
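Newer Git releases provide a diff.algorithm setting and a --diff-algorithm option that cover this request; neither existed when this message was written. The sketch below assumes such a release. The helper names enable_patience and compare_algorithms are made up for illustration, and the timing relies on date +%s, so it only resolves whole seconds.

```shell
#!/bin/sh
# Sketch, assuming a Git release that knows diff.algorithm and
# --diff-algorithm (both helper names below are hypothetical).

# Make patience the default for the current repository; pass --global to
# git config instead to store the setting in ~/.gitconfig.
enable_patience () {
	git config diff.algorithm patience
}

# Time the standard (myers) and patience algorithms on the same diff.
# Arguments are passed straight to git diff, e.g. two revisions.
compare_algorithms () {
	for algo in myers patience
	do
		start=$(date +%s)
		git diff --diff-algorithm="$algo" "$@" >/dev/null
		end=$(date +%s)
		echo "$algo: $((end - start))s"
	done
}
```

With the default set to patience, a single command can still be switched back with git diff --diff-algorithm=myers.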