From: Thomas Rast <trast@student.ethz.ch>
To: Junio C Hamano <gitster@pobox.com>
Cc: git@vger.kernel.org, "Carlos Martín Nieto" <cmn@elego.de>
Subject: Re: [PATCH] commit: write out cache-tree information
Date: Wed, 3 Aug 2011 00:01:53 +0200
Message-ID: <201108030001.53476.trast@student.ethz.ch>
In-Reply-To: <7vmxfrel63.fsf@alter.siamese.dyndns.org>

Junio C Hamano wrote:
> <trast@student.ethz.ch> writes:
>
> > From: Thomas Rast <trast@student.ethz.ch>
> >
> > While write-tree has code to write out the cache-tree information
> > (since we have to compute it anyway if the cache is stale), commit
> > lost this capability when it became a builtin and moved away from
> > using write-tree.
>
> Earlier the code read from the index, made sure that it is not unmerged by
> running cache_tree_update(), before running prepare-commit-msg hook. The
> hook used to see the index that was read in this codepath which is the
> same as what pre-commit left us.
>
> Why run an extra I/O here? The index file could be quite large, and I do
> not want people writing it out without good reason.

Ok, so let's run some numbers.  With the first test script below I'm
seeing:

  before patch:
    $ time ./commit-in-large-tree.sh
    Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
    6.9M    .git/index

    real    1m31.607s
    user    0m57.604s
    sys     0m29.976s

  after patch: 14% speedup
    $ time ./commit-in-large-tree.sh
    Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
    7.0M    .git/index

    real    1m18.521s
    user    0m53.430s
    sys     0m22.138s

On the other hand, if you touch a file in every subdirectory, as in
the second script:

  before patch:
    $ time ./commit-in-large-tree-2.sh 
    Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
    6.9M    .git/index

    real    1m40.910s
    user    0m58.731s
    sys     0m38.011s

  after patch: 5% slowdown
    $ time ./commit-in-large-tree-2.sh 
    Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
    7.0M    .git/index

    real    1m45.465s
    user    1m2.329s
    sys     0m38.849s
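
The percentages quoted above can be rechecked from the "real" times.
The following is only a sketch: pct_change is a made-up helper name,
and the times have been converted to seconds by hand (1m31.607s =
91.607 and so on).  A positive result means the run after the patch
was faster.

```shell
#!/bin/sh

# Hypothetical helper (not part of the benchmark scripts): print the
# relative change from "before" to "after" as a percentage of "before".
pct_change () {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# First script: one file changed per commit.
pct_change 91.607 78.521

# Second script: one file changed in each of the 1000 subdirs per commit.
pct_change 100.910 105.465
```

These are single runs, so the figures should be read as rough
indications rather than precise measurements.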

I also ran the latter test where it only touches one file in 100
(instead of all 1000) subdirs, and there the patch is still a speedup.
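
The exact script for that variant is not included in this mail.  As a
sketch only, assuming the sole difference is how many subdirectories
the inner loop visits, the modification step could be factored out
like this (touch_subdirs and its two parameters are invented for
illustration):

```shell
#!/bin/sh

# Hypothetical helper: overwrite the file named "$1" in each of the
# subdirectories 1..$2 of the current directory.
touch_subdirs () {
    round=$1
    ndirs=$2
    j=1
    while [ "$j" -le "$ndirs" ]; do
        echo "$round changed" > "$j/$round"
        j=$((j + 1))
    done
}

# Assumed usage inside the benchmark loop, in place of the inner "for":
#     touch_subdirs "$i" 100     # the variant: 100 subdirs per commit
#     touch_subdirs "$i" 1000    # the second script: all 1000 subdirs
```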

So I guess it depends on whether we expect users to mostly modify a
small part of the tree or the whole tree.

Regarding your other email

> When we are running a partial commit, the index file you are writing back
> is a temporary index only to build a tree object to record in the commit,
> which we already have done, and the temporary will be discarded.

that's a valid point that I need to address.



-- 8< --   commit-in-large-tree.sh
#!/bin/sh

set -e

git init /dev/shm/commit-in-large-tree.tmp
cd /dev/shm/commit-in-large-tree.tmp
for i in $(seq 1 1000); do
    mkdir $i
    (
	cd $i
	for j in $(seq 1 100); do
	    echo $j > $j
	done
    )
    git add $i
done
git commit -q -m initial
du -h .git/index

for i in $(seq 1 100); do
    echo "$i changed" > $i/$i
    git add $i/$i
    git commit -q -m $i
done

rm -rf /dev/shm/commit-in-large-tree.tmp
-- >8 --

-- 8< --  commit-in-large-tree-2.sh
#!/bin/sh

set -e

git init /dev/shm/commit-in-large-tree.tmp
cd /dev/shm/commit-in-large-tree.tmp
for i in $(seq 1 1000); do
    mkdir $i
    (
	cd $i
	for j in $(seq 1 100); do
	    echo $j > $j
	done
    )
    git add $i
done
git commit -q -m initial
du -h .git/index

for i in $(seq 1 100); do
    for j in $(seq 1 1000); do
	echo "$i changed" > $j/$i
    done
    git add -u
    git commit -q -m $i
done

rm -rf /dev/shm/commit-in-large-tree.tmp
-- >8 --

--
Thomas Rast
trast@{inf,student}.ethz.ch
