Debugging git-commit slowness on a large repo

* Debugging git-commit slowness on a large repo
@ 2011-12-02 23:17 Joshua Redstone
  2011-12-03  0:23 ` Carlos Martín Nieto
  2011-12-04 13:54 ` Tomas Carnecky
  0 siblings, 2 replies; 16+ messages in thread
From: Joshua Redstone @ 2011-12-02 23:17 UTC (permalink / raw)
  To: git@vger.kernel.org

Hi,
I have a git repo with about 300k commits,  150k files totaling maybe 7GB.
 Locally committing a small change - say touching fewer than 300 bytes
across 4 files - consistently takes over one second, which seems kinda
slow.  This is using git 1.7.7.4 on a linux 2.6 box.  The time does not
improve after doing a git-gc (my .git dir has maybe 250 files after a git
gc).  The same size commit on a brand new repo takes < 10ms.  Any thoughts
on why committing a small change seems to take a long time on larger repos?

Fwiw, I also tried doing the same test using libgit2 (via the pygit2
wrapper), and it was even slower (about 6 seconds to commit the same small
change).

Thanks for any thoughts or places to look.

Cheers,
Josh

* Re: Debugging git-commit slowness on a large repo
  2011-12-02 23:17 Debugging git-commit slowness on a large repo Joshua Redstone
@ 2011-12-03  0:23 ` Carlos Martín Nieto
  2011-12-05 17:38   ` Junio C Hamano
  2011-12-07  1:48   ` Joshua Redstone
  2011-12-04 13:54 ` Tomas Carnecky
  1 sibling, 2 replies; 16+ messages in thread
From: Carlos Martín Nieto @ 2011-12-03  0:23 UTC (permalink / raw)
  To: Joshua Redstone; +Cc: git@vger.kernel.org

On Fri, Dec 02, 2011 at 11:17:10PM +0000, Joshua Redstone wrote:
> Hi,
> I have a git repo with about 300k commits,  150k files totaling maybe 7GB.
>  Locally committing a small change - say touching fewer than 300 bytes
> across 4 files - consistently takes over one second, which seems kinda
> slow.  This is using git 1.7.7.4 on a linux 2.6 box.  The time does not
> improve after doing a git-gc (my .git dir has maybe 250 files after a git
> gc).  The same size commit on a brand new repo takes < 10ms.  Any thoughts
> on why committing a small change seems to take a long time on larger repos?

By "same size commit" do you mean the same amount of changes, or the
same number of files? Committing doesn't depend on the size of the
repo (by itself), but on the size of the index, which depends on the
number of files to be committed (as git is snapshot-based). At one
point, commit forgot how to write the tree cache to the index (a
performance optimisation). Do the times improve if you run 'git
read-tree HEAD' between one commit and another? Note that this will
reset the index to the last commit, though for the tests I imagine you
use some variation of 'git commit -a'.

Thomas Rast wrote a patch to re-teach commit to store the tree cache,
but there were some issues and it never got applied.

> 
> Fwiw, I also tried doing the same test using libgit2 (via the pygit2
> wrapper), and it was even slower (about 6 seconds to commit the same small
> change).

I don't know about the Python bindings, but in the (somewhat
unscientific) tests for libgit2's write-tree (the slow part of
creating a commit), it performs slightly faster than git's (though I
think git's write-tree does update the tree cache, which libgit2
doesn't currently). The speed could just be a side-effect of the small
test repo. From your domain, I assume the data is not for public
consumption, but it'd be great if you could post your code to pygit2's
issue tracker so we can see how much of the slowdown comes from the
bindings or the library.

   cmn

* Re: Debugging git-commit slowness on a large repo
  2011-12-03  0:23 ` Carlos Martín Nieto
@ 2011-12-05 17:38   ` Junio C Hamano
  2011-12-07  1:48   ` Joshua Redstone
  1 sibling, 0 replies; 16+ messages in thread
From: Junio C Hamano @ 2011-12-05 17:38 UTC (permalink / raw)
  To: Carlos Martín Nieto, Thomas Rast
  Cc: Joshua Redstone, git@vger.kernel.org

Carlos Martín Nieto <cmn@elego.de> writes:

> ... At one
> point, commit forgot how to write the tree cache to the index (a
> performance optimisation). Do the times improve if you run 'git
> read-tree HEAD' between one commit and another? Note that this will
> reset the index to the last commit, though for the tests I imagine you
> use some variation of 'git commit -a'.
>
> Thomas Rast wrote a patch to re-teach commit to store the tree cache,
> but there were some issues and it never got applied.

Ahh, I forgot all about that exchange.

  http://thread.gmane.org/gmane.comp.version-control.git/178480/focus=178515

The cache-tree mechanism has traditionally been one of the more important
optimizations and it would be very nice if we can resurrect the behaviour
for "git commit" too.

* Re: Debugging git-commit slowness on a large repo
  2011-12-03  0:23 ` Carlos Martín Nieto
  2011-12-05 17:38   ` Junio C Hamano
@ 2011-12-07  1:48   ` Joshua Redstone
  2011-12-07  2:08     ` Nguyen Thai Ngoc Duy
  1 sibling, 1 reply; 16+ messages in thread
From: Joshua Redstone @ 2011-12-07  1:48 UTC (permalink / raw)
  To: Carlos Martín Nieto, Tomas Carnecky, Junio C Hamano
  Cc: git@vger.kernel.org

Hi Carlos and Tomas and Junio,

@Tomas, I tried adding the '--no-status' flag to 'git commit' and it sped
things up by maybe 15%, but commits still take a second.

@Carlos, by "same size", I mean roughly the same number of files and
number of bytes modified in each file.  In all experiments, it's less than
5 files modified per commit with changes totaling fewer than 10 KB, often
more like 1 KB.  I actually wrote a test script to generate commits,
customized for the stats on the repo I'm using.  It repeatedly generates
some changes, does 'git add [ list of files changed ]' followed by 'git
commit --no-status -m [ msg ]'.  It generates changes by picking fewer
than 5 files at random, modifying two 100-byte regions in each file, and
occasionally creating a new file of about 1 KB.  If it helps, I can
probably post the test script I've been using.
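
As a rough illustration of the change generator described above, here is a
minimal Python sketch. It is not the actual test script; the function names
(list_files, make_small_change), the 10% chance of creating a new file, and
the new-file naming scheme are assumptions made for the example. It also
assumes the directory already contains at least one file.

```python
import os
import random


def list_files(base_dir):
    """Return all regular files under base_dir, skipping any .git directory."""
    found = []
    for root, dirs, names in os.walk(base_dir):
        dirs[:] = [d for d in dirs if d != ".git"]
        found.extend(os.path.join(root, name) for name in names)
    return sorted(found)


def make_small_change(base_dir, rng, max_files=4, region_len=100,
                      new_file_chance=0.1, new_file_size=1024):
    """Overwrite two regions in a few random files; return the paths touched."""
    files = list_files(base_dir)
    count = rng.randint(1, min(max_files, len(files)))
    touched = []
    for path in rng.sample(files, count):
        size = os.path.getsize(path)
        with open(path, "r+b") as f:
            for _ in range(2):  # two regions of up to region_len bytes each
                offset = rng.randint(0, max(0, size - region_len))
                length = min(region_len, size - offset)
                f.seek(offset)
                f.write(bytes(rng.randrange(256) for _ in range(length)))
        touched.append(path)
    # Occasionally add a brand-new file of about 1 KB (hypothetical naming).
    if rng.random() < new_file_chance:
        new_path = os.path.join(base_dir, "new_%08x.dat" % rng.getrandbits(32))
        with open(new_path, "wb") as f:
            f.write(bytes(rng.randrange(256) for _ in range(new_file_size)))
        touched.append(new_path)
    return touched
```

The returned paths are what would be handed to 'git add', followed by
'git commit --no-status -m [ msg ]' as described above.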

I tried doing a 'git read-tree HEAD' before each 'git add ; git commit'
iteration, and the time for git-commit jumped from about 1 second to about
8 seconds.  That is a pretty dramatic slowdown.  Any idea why?  I wonder
if that's related to the overall commit slowness.

@Carlos and/or @Junio, can you point me at any docs/code to understand
what a tree-cache is and how it differs from the index?  I did a google
search for [git tree-cache index], but nothing popped out.

Cheers,
Josh

* Re: Debugging git-commit slowness on a large repo
  2011-12-07  1:48   ` Joshua Redstone
@ 2011-12-07  2:08     ` Nguyen Thai Ngoc Duy
  2011-12-07 22:48       ` Joshua Redstone
  0 siblings, 1 reply; 16+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-12-07  2:08 UTC (permalink / raw)
  To: Joshua Redstone
  Cc: Carlos Martín Nieto, Tomas Carnecky, Junio C Hamano,
	git@vger.kernel.org

On Wed, Dec 7, 2011 at 8:48 AM, Joshua Redstone <joshua.redstone@fb.com> wrote:
> I tried doing a 'git read-tree HEAD' before each 'git add ; git commit'
> iteration, and the time for git-commit jumped from about 1 second to about
> 8 seconds.  That is a pretty dramatic slowdown.  Any idea why?  I wonder
> if that's related to the overall commit slowness.

How big is your working directory? "git ls-files | wc -l" should show
it. Try "git read-tree HEAD; git add; git write-tree" and see if the
write-tree part takes as much time as commit. write-tree is mainly
about cache-tree generation.

> @Carlos and/or @Junio, can you point me at any docs/code to understand
> what a tree-cache is and how it differs from the index?  I did a google
> search for [git tree-cache index], but nothing popped out.

Have a look at Documentation/technical/index-format.txt. Cache tree
extension is near the end.
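
To give a feel for what the cache-tree extension buys, here is a toy Python
model. It is only a sketch: the class name ToyCacheTree, the made-up tree
serialization, and the "computed" counter are assumptions for illustration
and do not match git's on-disk format. The idea it models is the one in the
format document: tree ids are cached per directory, and updating a path
invalidates the directories above it.

```python
import hashlib


class ToyCacheTree:
    """Toy model: entries map path -> blob id; cache maps directory -> tree id."""

    def __init__(self, entries):
        self.entries = dict(entries)  # "dir/file" -> blob id (any string)
        self.cache = {}               # "dir" ("" is the root) -> tree id
        self.computed = 0             # how many tree ids were computed so far

    def update(self, path, blob_id):
        """Stage a path and invalidate every directory above it."""
        self.entries[path] = blob_id
        parts = path.split("/")[:-1]
        for depth in range(len(parts) + 1):
            self.cache.pop("/".join(parts[:depth]), None)

    def write_tree(self, directory=""):
        """Return the tree id for a directory, reusing cached ids if valid."""
        if directory in self.cache:
            return self.cache[directory]
        prefix = directory + "/" if directory else ""
        children = {}
        for path, blob_id in self.entries.items():
            if not path.startswith(prefix):
                continue
            name, sep, _rest = path[len(prefix):].partition("/")
            children[name] = None if sep else blob_id  # None marks a subtree
        h = hashlib.sha1()
        for name in sorted(children):
            child_id = children[name]
            if child_id is None:
                child_id = self.write_tree(prefix + name)
            h.update(("%s %s\n" % (name, child_id)).encode())
        self.computed += 1
        self.cache[directory] = h.hexdigest()
        return self.cache[directory]
```

With a valid cache, a change to one file only forces recomputation of the
trees on the path from that file up to the root. Without it, every
directory is recomputed on each commit.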
-- 
Duy

* Re: Debugging git-commit slowness on a large repo
  2011-12-07  2:08     ` Nguyen Thai Ngoc Duy
@ 2011-12-07 22:48       ` Joshua Redstone
  2011-12-08  1:39         ` Nguyen Thai Ngoc Duy
  0 siblings, 1 reply; 16+ messages in thread
From: Joshua Redstone @ 2011-12-07 22:48 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Carlos Martín Nieto, Tomas Carnecky, Junio C Hamano,
	git@vger.kernel.org

Hi Duy,
Thanks for the documentation link.

git ls-files shows 100k files, which matches # of files in the working
tree ('find . -type f -print | wc -l').

I added a 'git read-tree HEAD' before the git-add, and a 'git write-tree'
after the add.  With that, the commit time slowed down to 8 seconds per
commit, plus 4 more seconds for the read-tree/add/write-tree ops.  The
read-tree/add/write-tree each took about a second.

As an experiment, I also tried removing the 'git read-tree' and just
having the git-write-tree.  That sped up commits to 0.6 seconds, but the
overall time for add/write-tree/commit was still 3 to 6 seconds.

For comparison, without the read-tree and write-tree, commits take about 1
second and add/commit in total takes about 2 seconds.

It surprises me that the presence of git read-tree or write-tree would
slow things down so much.

Josh

* Re: Debugging git-commit slowness on a large repo
  2011-12-07 22:48       ` Joshua Redstone
@ 2011-12-08  1:39         ` Nguyen Thai Ngoc Duy
  2011-12-09  0:09           ` Joshua Redstone
  0 siblings, 1 reply; 16+ messages in thread
From: Nguyen Thai Ngoc Duy @ 2011-12-08  1:39 UTC (permalink / raw)
  To: Joshua Redstone
  Cc: Carlos Martín Nieto, Tomas Carnecky, Junio C Hamano,
	git@vger.kernel.org

On Thu, Dec 8, 2011 at 5:48 AM, Joshua Redstone <joshua.redstone@fb.com> wrote:
> Hi Duy,
> Thanks for the documentation link.
>
> git ls-files shows 100k files, which matches # of files in the working
> tree ('find . -type f -print | wc -l').

Any chance you can split it into smaller repositories, or remove files
from the working directory? (E.g. if you store logs, you don't have to
keep logs from all time in the working directory; they can be retrieved
from history.)

> I added a 'git read-tree HEAD' before the git-add, and a 'git write-tree'
> after the add.  With that, the commit time slowed down to 8 seconds per
> commit, plus 4 more seconds for the read-tree/add/write-tree ops.  The
> read-tree/add/write-tree each took about a second.

read-tree destroys the stat info in the index, so refreshing 100k index
entries may take some time in this case. Try this sequence to see whether
the commit time drops and how much time update-index takes:

git read-tree HEAD
git update-index --refresh
git add ....
git write-tree
git commit -q

> As an experiment, I also tried removing the 'git read-tree' and just
> having the git-write-tree.  That sped up commits to 0.6 seconds, but the
> overall time for add/write-tree/commit was still 3 to 6 seconds.

Overall time is not really important because we duplicate work here
(write-tree is done again as part of commit). What I'm trying to do is
to determine how much time each operation in commit may take.
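
One way to collect per-step timings is a small wrapper like the Python
sketch below. The helper names (time_steps, commit_steps) are made up for
this example. Exit codes are recorded rather than checked, on the
assumption that 'git update-index --refresh' may exit non-zero when it
finds entries that need updating.

```python
import subprocess
import time


def time_steps(steps, cwd=None):
    """Run each (label, argv) step in order.

    Returns a list of (label, seconds, returncode) tuples.
    """
    timings = []
    for label, argv in steps:
        start = time.perf_counter()
        result = subprocess.run(argv, cwd=cwd, stdout=subprocess.DEVNULL)
        timings.append((label, time.perf_counter() - start, result.returncode))
    return timings


def commit_steps(changed_paths, message):
    """The command sequence suggested above, as argv lists."""
    return [
        ("read-tree", ["git", "read-tree", "HEAD"]),
        ("update-index", ["git", "update-index", "--refresh"]),
        ("add", ["git", "add", "--"] + list(changed_paths)),
        ("write-tree", ["git", "write-tree"]),
        ("commit", ["git", "commit", "-q", "-m", message]),
    ]
```

Calling time_steps(commit_steps(paths, "msg"), cwd=repo_dir) would then
give one timing per operation instead of a single end-to-end number.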
-- 
Duy

* Re: Debugging git-commit slowness on a large repo
  2011-12-08  1:39         ` Nguyen Thai Ngoc Duy
@ 2011-12-09  0:09           ` Joshua Redstone
  2011-12-09  0:17             ` Joshua Redstone
  0 siblings, 1 reply; 16+ messages in thread
From: Joshua Redstone @ 2011-12-09  0:09 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Carlos Martín Nieto, Tomas Carnecky, Junio C Hamano,
	git@vger.kernel.org

On 12/7/11 5:39 PM, "Nguyen Thai Ngoc Duy" <pclouds@gmail.com> wrote:

>On Thu, Dec 8, 2011 at 5:48 AM, Joshua Redstone <joshua.redstone@fb.com>
>wrote:
>> Hi Duy,
>> Thanks for the documentation link.
>>
>> git ls-files shows 100k files, which matches # of files in the working
>> tree ('find . -type f -print | wc -l').
>
>Any chance you can split it into smaller repositories, or remove files
>from working directory (e.g. if you store logs, you don't have to keep
>logs from all time in working directory, they can be retrieved from
>history).

It's not really feasible to split it into smaller repositories.  In fact,
we're expecting it to grow between 3x and 5x in number of files and number
of commits.

>
>> I added a 'git read-tree HEAD' before the git-add, and a 'git
>>write-tree'
>> after the add.  With that, the commit time slowed down to 8 seconds per
>> commit, plus 4 more seconds for the read-tree/add/write-tree ops.  The
>> read-tree/add/write-tree each took about a second.
>
>read-tree destroys stat info in index, refreshing 100k entries in
>index in this case may take some time. Try this to see if commit time
>reduces and how much time update-index takes
>
>read-tree HEAD
>update-index --refresh
>add ....
>write-tree
>commit -q

I added the "update-index --refresh" and the time for commit became more
like 0.6 seconds.
In this setup: read-tree takes ~2 seconds, update-index takes ~8 seconds,
git-add takes 1 to 4 seconds, and write-tree takes less than 1 second.
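
The cost of the refresh step is consistent with the explanation quoted
above: once the cached stat info is gone, a cheap stat comparison is no
longer enough and file contents have to be re-read. The Python sketch below
is a toy model of that idea, not git's implementation; the index layout, the
helper names, and the use of only (mtime, size) as the stat key are
simplifying assumptions.

```python
import hashlib
import os


def content_id(path):
    """Hash a file's contents (plain SHA-1 here, not git's blob id)."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def stat_key(path):
    """A simplified stand-in for the stat data an index entry records."""
    st = os.lstat(path)
    return (st.st_mtime_ns, st.st_size)


def refresh(index):
    """index maps path -> {"id": content hash, "stat": stat key or None}.

    Returns the number of files whose contents had to be re-read.
    """
    reread = 0
    for path, entry in index.items():
        current = stat_key(path)
        if entry["stat"] == current:
            continue                 # cheap path: one lstat, no file read
        reread += 1                  # stat info missing or stale
        if content_id(path) == entry["id"]:
            entry["stat"] = current  # unchanged content: restore stat info
    return reread
```

In this model, clearing every entry's "stat" field (as read-tree is said to
do above) makes the next refresh read every file once, after which later
refreshes are cheap again.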

>
>> As an experiment, I also tried removing the 'git read-tree' and just
>> having the git-write-tree.  That sped up commits to 0.6 seconds, but the
>> overall time for add/write-tree/commit was still 3 to 6 seconds.
>
>overall time is not really important because we duplicate work here
>(write-tree is done as part of commit again). What I'm trying to do is
>to determine how much time each operation in commit may take.
>-- 
>Duy

* Re: Debugging git-commit slowness on a large repo
  2011-12-09  0:09           ` Joshua Redstone
@ 2011-12-09  0:17             ` Joshua Redstone
  2011-12-13  0:15               ` Joshua Redstone
  0 siblings, 1 reply; 16+ messages in thread
From: Joshua Redstone @ 2011-12-09  0:17 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy
  Cc: Carlos Martín Nieto, Tomas Carnecky, Junio C Hamano,
	git@vger.kernel.org

Btw, I also tried doing some very poor-man's profiling on "git commit"
without any of the readtree/writetree/updateindex commands.

Around 50% of the time was in (bottom few frames may have varied)

#1  0x00000000004c467e in find_pack_entry (sha1=0x1475a44, e=0x7fff2621f070)
    at sha1_file.c:2027
#2  0x00000000004c57b0 in has_sha1_file (sha1=0x7fe2cd9c7900 "00")
    at sha1_file.c:2567
#3  0x000000000046e4af in update_one (it=<value optimized out>,
    cache=<value optimized out>, entries=<value optimized out>,
    base=<value optimized out>, baselen=<value optimized out>,
    missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:333
#4  0x000000000046e278 in update_one (it=<value optimized out>,
    cache=<value optimized out>, entries=<value optimized out>,
    base=<value optimized out>, baselen=<value optimized out>,
    missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
#5  0x000000000046e278 in update_one (it=<value optimized out>,
    cache=<value optimized out>, entries=<value optimized out>,
    base=<value optimized out>, baselen=<value optimized out>,
    missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
#6  0x000000000046e278 in update_one (it=<value optimized out>,
    cache=<value optimized out>, entries=<value optimized out>,
    base=<value optimized out>, baselen=<value optimized out>,
    missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
#7  0x000000000046e278 in update_one (it=<value optimized out>,
    cache=<value optimized out>, entries=<value optimized out>,
    base=<value optimized out>, baselen=<value optimized out>,
    missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
#8  0x000000000046e278 in update_one (it=<value optimized out>,
    cache=<value optimized out>, entries=<value optimized out>,
    base=<value optimized out>, baselen=<value optimized out>,
    missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
#9  0x000000000046e869 in cache_tree_update (it=<value optimized out>,
    cache=<value optimized out>,
    entries=dwarf2_read_address: Corrupted DWARF expression.)
    at cache-tree.c:379
#10 0x000000000041cade in prepare_to_commit (index_file=0x781740 ".git/index",
    prefix=<value optimized out>, current_head=<value optimized out>,
    s=0x7fff26220d00, author_ident=<value optimized out>)
    at builtin/commit.c:866
#11 0x000000000041d891 in cmd_commit (argc=0, argv=0x7fff262213a0, prefix=0x0)
    at builtin/commit.c:1407
#12 0x0000000000404bf7 in handle_internal_command (argc=4,
    argv=0x7fff262213a0) at git.c:308
#13 0x0000000000404e2f in main (argc=4, argv=0x7fff262213a0) at git.c:512

And 30% of the time was in:

#0  0x00000034af2c34a5 in _lxstat () from /lib64/libc.so.6
#1  0x00000000004abe0f in refresh_cache_ent (istate=0x780940,
    ce=0x7f8462a34e40, options=0, err=0x7fff6dd9f588)
    at /usr/include/sys/stat.h:443
#2  0x00000000004ac1a0 in refresh_index (istate=0x780940,
    flags=<value optimized out>, pathspec=<value optimized out>,
    seen=<value optimized out>, header_msg=0x0) at read-cache.c:1133
#3  0x000000000041b60a in refresh_cache_or_die
    (refresh_flags=<value optimized out>) at builtin/commit.c:331
#4  0x000000000041bc39 in prepare_index (argc=0, argv=0x7fff6dda0310,
    prefix=0x0, current_head=<value optimized out>,
    is_status=<value optimized out>) at builtin/commit.c:414
#5  0x000000000041d878 in cmd_commit (argc=0, argv=0x7fff6dda0310, prefix=0x0)
    at builtin/commit.c:1403

Josh
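
Percentages like the ones above can be derived from a set of sampled
backtraces by counting how many samples contain a given function. The
Python sketch below shows that bookkeeping only; how the samples were
actually collected is not stated in this message, and the function name
share_of_samples is an assumption for the example.

```python
from collections import Counter


def share_of_samples(samples, functions):
    """Fraction of sampled stacks that contain each function of interest.

    samples:   list of stacks, each a list of function names
    functions: function names to look for
    """
    counts = Counter()
    for stack in samples:
        present = set(stack)
        for fn in functions:
            if fn in present:
                counts[fn] += 1
    total = len(samples)
    return {fn: (counts[fn] / total if total else 0.0) for fn in functions}
```

For instance, if 5 of 10 samples pass through cache_tree_update and 3 pass
through refresh_index, the function reports 0.5 and 0.3 for those two names.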


* Re: Debugging git-commit slowness on a large repo
  2011-12-09  0:17             ` Joshua Redstone
@ 2011-12-13  0:15               ` Joshua Redstone
  2011-12-20  0:51                 ` Joshua Redstone
  0 siblings, 1 reply; 16+ messages in thread
From: Joshua Redstone @ 2011-12-13  0:15 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy, Carlos Martín Nieto, Tomas Carnecky,
	Junio C Hamano
  Cc: git@vger.kernel.org

Sorry for the poor formatting of the stack trace.

I've written two scripts to reproduce the slow commit behavior that I see.
 I've posted both to:
   https://gist.github.com/1469760

To repro, first create a dir with lots of files (it defaults to creating 1
million files in 1000 dirs):

$ loadGen.py --baseDir=./bigdir

then run the simulator script to generate and commit a series of small
changes to the repo:

$ git reset --hard HEAD && simulate.py ./bigdir git

The git reset is to clean up any cruft left over from a previous partial
invocation of simulate.py.

Note that loadGen.py defaults to creating 1 million files and committing
them in one commit.  With a flash drive this took < 30 min, and subsequent
small commits in simulate.py took about 6 seconds.  With a hard-drive,
it's taking > 1hr (still waiting for it to finish).

Cheers,
Josh


* Re: Debugging git-commit slowness on a large repo
  2011-12-13  0:15               ` Joshua Redstone
@ 2011-12-20  0:51                 ` Joshua Redstone
  2011-12-20  1:21                   ` Junio C Hamano
  0 siblings, 1 reply; 16+ messages in thread
From: Joshua Redstone @ 2011-12-20  0:51 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy, Carlos Martín Nieto, Tomas Carnecky,
	Junio C Hamano
  Cc: git@vger.kernel.org

I've managed to speed up git-commit on large repos by 4x by removing some
safeguards that caused git to stat every file in the repo on commits that
touch a small number of files.  The diff, for illustrative purposes only,
is at:

    https://gist.github.com/1499621


With a repo with 1 million files (but few commits), the diff drops the
commit time down from 7.3 seconds to 1.8 seconds, a 75% decrease. The
optimizations are:

1. Remove call to refresh_cache_or_die that stats every file in the repo.
I think the purpose is to detect any changes between git-add and
git-commit.

2. Pass missing_ok=true to cache_tree_update. This causes the tree
generation code to not look up every file in the repo (via has_sha1_file)
to verify it still exists as a git object.

3. Remove pair discard_cache/read_cache_from, which rereads the index
file. I think this was in case a pre-commit hook changed the set of things
being committed.

It may be worth making some of these flag-enabled.
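
A rough way to see what a full-tree stat pass costs on a given machine is
to time one outside of git. The sketch below is not git's code: lstat_all
is a hypothetical helper that walks a working tree and calls lstat() on
every non-directory entry. That only approximates the refresh step, since
git iterates over the index rather than walking directories.

```python
import os
import time


def lstat_all(root):
    """lstat() every non-directory entry under root; return (count, seconds)."""
    start = time.perf_counter()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git's own metadata directory; it is not part of the work tree.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    return count, time.perf_counter() - start
```

Calling lstat_all("./bigdir") twice in a row gives a feel for the
warm-cache cost; the first call after dropping caches would show the
cold-cache cost.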



Josh


On 12/12/11 4:15 PM, "Joshua Redstone" <joshua.redstone@fb.com> wrote:

>Sorry for the poor formatting of the stack trace.
>
>I've written two scripts to reproduce the slow commit behavior that I see.
> I've posted both to:
>   https://gist.github.com/1469760
>
>To repro, first create a dir with lots of files (it defaults to creating 1
>million files in 1000 dirs):
>
>$ loadGen.py --baseDir=./bigdir
>
>then, run the simulator scripts to generate and commit a series of small
>changes to the repo:
>
>$ git reset --hard HEAD && simulate.py ./bigdir git
>
>The git reset is to clean up any cruft left over from a previous partial
>invocation of simulate.py
>
>Note that loadGen.py defaults to creating 1 million files and committing
>them in one commit.  With a flash drive this took < 30 min, and subsequent
>small commits in simulate.py took about 6 seconds.  With a hard-drive,
>it's taking > 1hr (still waiting for it to finish).
>
>Cheers,
>Josh
>
>
>On 12/8/11 4:17 PM, "Joshua Redstone" <joshua.redstone@fb.com> wrote:
>
>>Btw, I also tried doing some very poor-man's profiling on "git commit"
>>without any of the readtree/writetree/updateindex commands.
>>
>>Around 50% of the time was in (bottom few frames may have varied)
>>
>>#1  0x00000000004c467e in find_pack_entry (sha1=0x1475a44 , e=0x7fff2621f070) at sha1_file.c:2027
>>#2  0x00000000004c57b0 in has_sha1_file (sha1=0x7fe2cd9c7900 "00") at sha1_file.c:2567
>>#3  0x000000000046e4af in update_one (it=<value optimized out>, cache=<value optimized out>, entries=<value optimized out>, base=<value optimized out>, baselen=<value optimized out>, missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:333
>>#4  0x000000000046e278 in update_one (it=<value optimized out>, cache=<value optimized out>, entries=<value optimized out>, base=<value optimized out>, baselen=<value optimized out>, missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
>>#5  0x000000000046e278 in update_one (it=<value optimized out>, cache=<value optimized out>, entries=<value optimized out>, base=<value optimized out>, baselen=<value optimized out>, missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
>>#6  0x000000000046e278 in update_one (it=<value optimized out>, cache=<value optimized out>, entries=<value optimized out>, base=<value optimized out>, baselen=<value optimized out>, missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
>>#7  0x000000000046e278 in update_one (it=<value optimized out>, cache=<value optimized out>, entries=<value optimized out>, base=<value optimized out>, baselen=<value optimized out>, missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
>>#8  0x000000000046e278 in update_one (it=<value optimized out>, cache=<value optimized out>, entries=<value optimized out>, base=<value optimized out>, baselen=<value optimized out>, missing_ok=<value optimized out>, dryrun=0) at cache-tree.c:285
>>#9  0x000000000046e869 in cache_tree_update (it=<value optimized out>, cache=<value optimized out>, entries=dwarf2_read_address: Corrupted DWARF expression.
>>) at cache-tree.c:379
>>#10 0x000000000041cade in prepare_to_commit (index_file=0x781740 ".git/index", prefix=<value optimized out>, current_head=<value optimized out>, s=0x7fff26220d00, author_ident=<value optimized out>) at builtin/commit.c:866
>>#11 0x000000000041d891 in cmd_commit (argc=0, argv=0x7fff262213a0, prefix=0x0) at builtin/commit.c:1407
>>#12 0x0000000000404bf7 in handle_internal_command (argc=4, argv=0x7fff262213a0) at git.c:308
>>#13 0x0000000000404e2f in main (argc=4, argv=0x7fff262213a0) at git.c:512
>>
>>
>>And 30% of the time was in:
>>
>>#0  0x00000034af2c34a5 in _lxstat () from /lib64/libc.so.6
>>#1  0x00000000004abe0f in refresh_cache_ent (istate=0x780940, ce=0x7f8462a34e40, options=0, err=0x7fff6dd9f588) at /usr/include/sys/stat.h:443
>>#2  0x00000000004ac1a0 in refresh_index (istate=0x780940, flags=<value optimized out>, pathspec=<value optimized out>, seen=<value optimized out>, header_msg=0x0) at read-cache.c:1133
>>#3  0x000000000041b60a in refresh_cache_or_die (refresh_flags=<value optimized out>) at builtin/commit.c:331
>>#4  0x000000000041bc39 in prepare_index (argc=0, argv=0x7fff6dda0310, prefix=0x0, current_head=<value optimized out>, is_status=<value optimized out>) at builtin/commit.c:414
>>#5  0x000000000041d878 in cmd_commit (argc=0, argv=0x7fff6dda0310, prefix=0x0) at builtin/commit.c:1403
>>
>>
>>Josh
>>
>>
>>On 12/8/11 4:09 PM, "Joshua Redstone" <joshua.redstone@fb.com> wrote:
>>
>>>On 12/7/11 5:39 PM, "Nguyen Thai Ngoc Duy" <pclouds@gmail.com> wrote:
>>>
>>>>On Thu, Dec 8, 2011 at 5:48 AM, Joshua Redstone
>>>><joshua.redstone@fb.com>
>>>>wrote:
>>>>> Hi Duy,
>>>>> Thanks for the documentation link.
>>>>>
>>>>> git ls-files shows 100k files, which matches # of files in the
>>>>>working
>>>>> tree ('find . -type f -print | wc -l').
>>>>
>>>>Any chance you can split it into smaller repositories, or remove files
>>>>from working directory (e.g. if you store logs, you don't have to keep
>>>>logs from all time in working directory, they can be retrieved from
>>>>history).
>>>
>>>It's not really feasible to split it into smaller repositories.  In
>>>fact,
>>>we're expecting it to grow between 3x and 5x in number of files and
>>>number
>>>of commits.
>>>
>>>>
>>>>> I added a 'git read-tree HEAD' before the git-add, and a 'git
>>>>>write-tree'
>>>>> after the add.  With that, the commit time slowed down to 8 seconds
>>>>>per
>>>>> commit, plus 4 more seconds for the read-tree/add/write-tree ops.
>>>>>The
>>>>> read-tree/add/write-tree each took about a second.
>>>>
>>>>read-tree destroys stat info in index, refreshing 100k entries in
>>>>index in this case may take some time. Try this to see if commit time
>>>>reduces and how much time update-index takes
>>>>
>>>>read-tree HEAD
>>>>update-index --refresh
>>>>add ....
>>>>write-tree
>>>>commit -q
>>>
>>>I added the "update-index --refresh" and the time for commit became more
>>>like 0.6 seconds.
>>>In this setup: read-tree takes ~2 seconds, update-index takes ~8
>>>seconds,
>>>git-add takes 1 to 4 seconds, and write-tree takes less than 1 second.
>>>
>>>>
>>>>> As an experiment, I also tried removing the 'git read-tree' and just
>>>>> having the git-write-tree.  That sped up commits to 0.6 seconds, but
>>>>>the
>>>>> overall time for add/write-tree/commit was still 3 to 6 seconds.
>>>>
>>>>overall time is not really important because we duplicate work here
>>>>(write-tree is done as part of commit again). What I'm trying to do is
>>>>to determine how much time each operation in commit may take.
>>>>-- 
>>>>Duy
>>>
>>
>


* Re: Debugging git-commit slowness on a large repo
  2011-12-20  0:51                 ` Joshua Redstone
@ 2011-12-20  1:21                   ` Junio C Hamano
  2011-12-20  1:40                     ` Joshua Redstone
  0 siblings, 1 reply; 16+ messages in thread
From: Junio C Hamano @ 2011-12-20  1:21 UTC (permalink / raw)
  To: Joshua Redstone
  Cc: Nguyen Thai Ngoc Duy, Carlos Martín Nieto, Tomas Carnecky,
	git@vger.kernel.org

Joshua Redstone <joshua.redstone@fb.com> writes:

> I've managed to speed up git-commit on large repos by 4x by removing some
> safeguards that caused git to stat every file in the repo on commits that
> touch a small number of files.  The diff, for illustrative purposes only,
> is at:
>
>     https://gist.github.com/1499621
>
>
> With a repo with 1 million files (but few commits), the diff drops the
> commit time down from 7.3 seconds to 1.8 seconds, a 75% decrease. The
> optimizations are:

I do not know if these kinds of changes should be called "optimizations"
or are merely making the command record a random tree object that may have
some resemblance to what you wanted to commit but is subtly incorrect. I
didn't fetch your safety removal, though.

Wouldn't you get a similar speed-up without being unsafe if you simply ran
"git commit" without any parameter (i.e. write out the current index as a
tree and make a commit), combined with "--no-status" and perhaps "-q" to
avoid running the comparison between the resulting commit and the working
tree state after the commit?
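
For comparing variants, a small timing harness can help. The following is
only a sketch: build_commit_cmd and timed_run are hypothetical helper
names, and the -m option is an assumption added so that the command runs
without opening an editor. The --no-status, -q and --no-verify flags are
existing "git commit" options.

```python
import subprocess
import time


def build_commit_cmd(message, no_status=True, quiet=True, no_verify=False):
    """Return the argv for a 'git commit' of the current index."""
    cmd = ["git", "commit", "-m", message]
    if no_status:
        cmd.append("--no-status")
    if quiet:
        cmd.append("-q")
    if no_verify:
        cmd.append("--no-verify")
    return cmd


def timed_run(cmd, cwd):
    """Run cmd in directory cwd; return (exit code, wall-clock seconds)."""
    start = time.perf_counter()
    proc = subprocess.run(
        cmd, cwd=cwd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return proc.returncode, time.perf_counter() - start
```

For example, timed_run(build_commit_cmd("small change"), "./bigdir") after
a "git add" of the touched files would time one such commit.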


* Re: Debugging git-commit slowness on a large repo
  2011-12-20  1:21                   ` Junio C Hamano
@ 2011-12-20  1:40                     ` Joshua Redstone
  2011-12-20  9:23                       ` Thomas Rast
  0 siblings, 1 reply; 16+ messages in thread
From: Joshua Redstone @ 2011-12-20  1:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyen Thai Ngoc Duy, Carlos Martín Nieto, Tomas Carnecky,
	git@vger.kernel.org

You're right, more than optimizations, they are modifications that reduce
safety checks and make assumptions about the way one is using git (e.g.,
you always remember to add each file you want to commit).  I focused on
them because:

  1. In our installation, we don't use commit hooks that change what's
being committed, so it's good to know that in principle, there's a big
perf benefit to be had by leveraging that fact.

  2. At an abstract level, it seems like the cost of doing a commit should
be proportional to the amount of the repository touched by the commit, not
by the size of the repository.  These experiments are demonstrations of
one direction that a set of optimizations would need to go to get commit
performance more along those lines.

  3. We're also exploring storage systems that support more efficient ways
to query what's changed than stat'ing every file.

I forgot to mention, the times I quoted were with --no-verify and
--no-status.  Adding '-q' didn't speed up performance at all.

As a bonus, I've also profiled git-add on the 1-million file repo, and it
looks like, as you might expect, the time is dominated by reading and
writing the index.  The time for git-add is a couple of seconds.

Josh

On 12/19/11 5:21 PM, "Junio C Hamano" <gitster@pobox.com> wrote:

>Joshua Redstone <joshua.redstone@fb.com> writes:
>
>> I've managed to speed up git-commit on large repos by 4x by removing
>>some
>> safeguards that caused git to stat every file in the repo on commits
>>that
>> touch a small number of files.  The diff, for illustrative purposes
>>only,
>> is at:
>>
>>     https://gist.github.com/1499621
>>
>>
>> With a repo with 1 million files (but few commits), the diff drops the
>> commit time down from 7.3 seconds to 1.8 seconds, a 75% decrease. The
>> optimizations are:
>
>I do not know if these kinds of changes should be called "optimizations"
>or are merely making the command record a random tree object that may have
>some resemblance to what you wanted to commit but is subtly incorrect. I
>didn't fetch your safety removal, though.
>
>Wouldn't you get a similar speed-up without being unsafe if you simply ran
>"git commit" without any parameter (i.e. write out the current index as a
>tree and make a commit), combined with "--no-status" and perhaps "-q" to
>avoid running the comparison between the resulting commit and the working
>tree state after the commit?


* Re: Debugging git-commit slowness on a large repo
  2011-12-20  1:40                     ` Joshua Redstone
@ 2011-12-20  9:23                       ` Thomas Rast
  2011-12-20 19:26                         ` Joshua Redstone
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Rast @ 2011-12-20  9:23 UTC (permalink / raw)
  To: Joshua Redstone
  Cc: Junio C Hamano, Nguyen Thai Ngoc Duy, Carlos Martín Nieto,
	Tomas Carnecky, git@vger.kernel.org

Joshua Redstone <joshua.redstone@fb.com> writes:
> As a bonus, I've also profiled git-add on the 1-million file repo, and it
> looks like, as you might expect, the time is dominated by reading and
> writing the index.  The time for git-add is a couple of seconds.

Note that the time to write the index itself is also rather small, but
the time needed to sha1 the index when loading and then again when
saving it really hurts.

(I noticed this while working on the commit-tree topic.)
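
To get a feel for the hashing cost in isolation, one can time SHA-1 over a
buffer of roughly index size. This is a sketch, not a measurement of git
itself: time_sha1 is a hypothetical helper, and the figure of about 100
bytes per index entry (so about 100 MB for 1 million files) is an
assumption, not something taken from the repo in question.

```python
import hashlib
import time


def time_sha1(n_bytes, chunk_size=1 << 20):
    """Hash n_bytes of zero-filled data in chunks; return (hexdigest, seconds)."""
    h = hashlib.sha1()
    chunk = bytes(chunk_size)  # zero-filled buffer reused for every update
    start = time.perf_counter()
    remaining = n_bytes
    while remaining > 0:
        step = min(remaining, chunk_size)
        h.update(chunk[:step])
        remaining -= step
    return h.hexdigest(), time.perf_counter() - start
```

Under the assumption above, time_sha1(100 * 1000 * 1000) approximates one
pass over the index; loading and then saving would pay that twice.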

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: Debugging git-commit slowness on a large repo
  2011-12-20  9:23                       ` Thomas Rast
@ 2011-12-20 19:26                         ` Joshua Redstone
  0 siblings, 0 replies; 16+ messages in thread
From: Joshua Redstone @ 2011-12-20 19:26 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Junio C Hamano, Nguyen Thai Ngoc Duy, Carlos Martín Nieto,
	Tomas Carnecky, git@vger.kernel.org

I looked again at my poor-man's profiling output of git-add.  The SHA-1
work under ce_write_entry->ce_write_flush takes a bunch of time.
commit_lock_file->rename takes about the same as well.

Btw, the perf numbers for commit and add are with a warm file cache.  I
expect the benefit of skipping all the stat() calls will increase for cold
cache.

Josh

On 12/20/11 1:23 AM, "Thomas Rast" <trast@student.ethz.ch> wrote:

>Joshua Redstone <joshua.redstone@fb.com> writes:
>> As a bonus, I've also profiled git-add on the 1-million file repo, and
>>it
>> looks like, as you might expect, the time is dominated by reading and
>> writing the index.  The time for git-add is a couple of seconds.
>
>Note that the time to write the index itself is also rather small, but
>the time needed to sha1 the index when loading and then again when
>saving it really hurts.
>
>(I noticed this while working on the commit-tree topic.)
>
>-- 
>Thomas Rast
>trast@{inf,student}.ethz.ch


* Re: Debugging git-commit slowness on a large repo
  2011-12-02 23:17 Debugging git-commit slowness on a large repo Joshua Redstone
  2011-12-03  0:23 ` Carlos Martín Nieto
@ 2011-12-04 13:54 ` Tomas Carnecky
  1 sibling, 0 replies; 16+ messages in thread
From: Tomas Carnecky @ 2011-12-04 13:54 UTC (permalink / raw)
  To: Joshua Redstone; +Cc: git@vger.kernel.org

On 12/3/11 12:17 AM, Joshua Redstone wrote:
> Hi,
> I have a git repo with about 300k commits,  150k files totaling maybe 7GB.
>   Locally committing a small change - say touching fewer than 300 bytes
> across 4 files - consistently takes over one second, which seems kinda
> slow.  This is using git 1.7.7.4 on a linux 2.6 box.  The time does not
> improve after doing a git-gc (my .git dir has maybe 250 files after a git
> gc).  The same size commit on a brand new repo takes < 10ms.  Any thoughts
> on why committing a small change seems to take a long time on larger repos?
>
> Fwiw, I also tried doing the same test using libgit2 (via the pygit2
> wrapper), and it was even slower (about 6 seconds to commit the same small
> change).

try git commit --no-status


end of thread, other threads:[~2011-12-20 19:28 UTC | newest]

Thread overview: 16+ messages
2011-12-02 23:17 Debugging git-commit slowness on a large repo Joshua Redstone
2011-12-03  0:23 ` Carlos Martín Nieto
2011-12-05 17:38   ` Junio C Hamano
2011-12-07  1:48   ` Joshua Redstone
2011-12-07  2:08     ` Nguyen Thai Ngoc Duy
2011-12-07 22:48       ` Joshua Redstone
2011-12-08  1:39         ` Nguyen Thai Ngoc Duy
2011-12-09  0:09           ` Joshua Redstone
2011-12-09  0:17             ` Joshua Redstone
2011-12-13  0:15               ` Joshua Redstone
2011-12-20  0:51                 ` Joshua Redstone
2011-12-20  1:21                   ` Junio C Hamano
2011-12-20  1:40                     ` Joshua Redstone
2011-12-20  9:23                       ` Thomas Rast
2011-12-20 19:26                         ` Joshua Redstone
2011-12-04 13:54 ` Tomas Carnecky
