* [PATCH] commit: write out cache-tree information
@ 2011-08-02 16:36 trast
2011-08-02 18:13 ` Junio C Hamano
0 siblings, 1 reply; 4+ messages in thread
From: trast @ 2011-08-02 16:36 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano, Carlos Martín Nieto, Thomas Rast
From: Thomas Rast <trast@student.ethz.ch>
While write-tree has code to write out the cache-tree information
(since we have to compute it anyway if the cache is stale), commit
lost this capability when it became a builtin and moved away from
using write-tree.
Add the necessary code to write out the cache. This is extremely
similar to what write_cache_as_tree() does.
Reported-by: Carlos Martín Nieto <cmn@elego.de>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
Reported on IRC.
There's a similar disparity between read-tree and checkout, but I
first have to understand the exact conditions in or around
unpack_trees() [checkout.c:417] where the read-tree reasoning applies.
builtin/commit.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/builtin/commit.c b/builtin/commit.c
index e1af9b1..47b0eea 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -637,6 +637,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
const char *hook_arg2 = NULL;
int ident_shown = 0;
int clean_message_contents = (cleanup_mode != CLEANUP_NONE);
+ int fd;
+ struct lock_file *lock_file;
if (!no_verify && run_hook(index_file, "pre-commit", NULL))
return 0;
@@ -861,6 +863,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
* the editor and after we invoke run_status above.
*/
discard_cache();
+ lock_file = xcalloc(1, sizeof(struct lock_file));
+ fd = hold_lock_file_for_update(lock_file, index_file, LOCK_DIE_ON_ERROR);
read_cache_from(index_file);
if (!active_cache_tree)
active_cache_tree = cache_tree();
@@ -869,6 +873,11 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
error(_("Error building trees"));
return 0;
}
+ if (0 <= fd) {
+ if (write_cache(fd, active_cache, active_nr) ||
+ commit_lock_file(lock_file))
+ rollback_lock_file(lock_file);
+ }
if (run_hook(index_file, "prepare-commit-msg",
git_path(commit_editmsg), hook_arg1, hook_arg2, NULL))
--
1.7.6.668.g17b0a
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] commit: write out cache-tree information
2011-08-02 16:36 [PATCH] commit: write out cache-tree information trast
@ 2011-08-02 18:13 ` Junio C Hamano
2011-08-02 21:15 ` Junio C Hamano
2011-08-02 22:01 ` Thomas Rast
0 siblings, 2 replies; 4+ messages in thread
From: Junio C Hamano @ 2011-08-02 18:13 UTC (permalink / raw)
To: trast; +Cc: git, Carlos Martín Nieto
<trast@student.ethz.ch> writes:
> From: Thomas Rast <trast@student.ethz.ch>
>
> While write-tree has code to write out the cache-tree information
> (since we have to compute it anyway if the cache is stale), commit
> lost this capability when it became a builtin and moved away from
> using write-tree.
Earlier the code read from the index, made sure that it is not unmerged by
running cache_tere_update(), before running prepare-commit-msg hook. The
hook used to see the index that was read in this codepath which is the
same as what pre-commit left us.
Why run an extra I/O here? The index file could be quite large, and I do
not want people to writing it out without good reason.
The tree object that is committed is taken from active_cache_tree->sha1 in
cmd_commit() and not from the cache tree you are writing out to the index
file.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] commit: write out cache-tree information
2011-08-02 18:13 ` Junio C Hamano
@ 2011-08-02 21:15 ` Junio C Hamano
2011-08-02 22:01 ` Thomas Rast
1 sibling, 0 replies; 4+ messages in thread
From: Junio C Hamano @ 2011-08-02 21:15 UTC (permalink / raw)
To: Thomas Rast; +Cc: Git Mailing List, Carlos Martín Nieto
Junio C Hamano <gitster@pobox.com> writes:
>> While write-tree has code to write out the cache-tree information
>> (since we have to compute it anyway if the cache is stale), commit
>> lost this capability when it became a builtin and moved away from
>> using write-tree.
>
> Earlier the code read from the index, made sure that it is not unmerged by
> running cache_tere_update(), before running prepare-commit-msg hook. The
> hook used to see the index that was read in this codepath which is the
> same as what pre-commit left us.
>
> Why run an extra I/O here? The index file could be quite large, and I do
> not want people to writing it out without good reason.
One possible good reason is if you did this only when we are committing
the whole index. Then theoretically the next operation after commit could
start from a fully populated cache-tree in the index.
When we are running a partial commit, the index file you are writing back
is a temporary index only to build a tree object to record in the commit,
which we already have done, and the temporary will be discarded.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] commit: write out cache-tree information
2011-08-02 18:13 ` Junio C Hamano
2011-08-02 21:15 ` Junio C Hamano
@ 2011-08-02 22:01 ` Thomas Rast
1 sibling, 0 replies; 4+ messages in thread
From: Thomas Rast @ 2011-08-02 22:01 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git, Carlos Martín Nieto
Junio C Hamano wrote:
> <trast@student.ethz.ch> writes:
>
> > From: Thomas Rast <trast@student.ethz.ch>
> >
> > While write-tree has code to write out the cache-tree information
> > (since we have to compute it anyway if the cache is stale), commit
> > lost this capability when it became a builtin and moved away from
> > using write-tree.
>
> Earlier the code read from the index, made sure that it is not unmerged by
> running cache_tere_update(), before running prepare-commit-msg hook. The
> hook used to see the index that was read in this codepath which is the
> same as what pre-commit left us.
>
> Why run an extra I/O here? The index file could be quite large, and I do
> not want people to writing it out without good reason.
Ok, so let's run some numbers. With the first test script below I'm
seeing:
before patch:
$ time ./commit-in-large-tree.sh
Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
6.9M .git/index
real 1m31.607s
user 0m57.604s
sys 0m29.976s
after patch: 14% speedup
$ time ./commit-in-large-tree.sh
Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
7.0M .git/index
real 1m18.521s
user 0m53.430s
sys 0m22.138s
On the other hand if you touch every file as in the second script:
before patch:
$ time ./commit-in-large-tree-2.sh
Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
6.9M .git/index
real 1m40.910s
user 0m58.731s
sys 0m38.011s
after patch: 5% slowdown
$ time ./commit-in-large-tree-2.sh
Initialized empty Git repository in /dev/shm/commit-in-large-tree.tmp/.git/
7.0M .git/index
real 1m45.465s
user 1m2.329s
sys 0m38.849s
I also ran the latter test where it only touches one file in 100
(instead of all 1000) subdirs, and there the patch is still a speedup.
So I guess it depends whether we expect users to mostly modify a small
part or the whole tree.
Regarding your other email
> When we are running a partial commit, the index file you are writing back
> is a temporary index only to build a tree object to record in the commit,
> which we already have done, and the temporary will be discarded.
that's a valid point that I need to address.
-- 8< -- commit-in-large-tree.sh
#!/bin/sh
set -e
git init /dev/shm/commit-in-large-tree.tmp
cd /dev/shm/commit-in-large-tree.tmp
for i in $(seq 1 1000); do
mkdir $i
(
cd $i
for j in $(seq 1 100); do
echo $j > $j
done
)
git add $i
done
git commit -q -m initial
du -h .git/index
for i in $(seq 1 100); do
echo "$i changed" > $i/$i
git add $i/$i
git commit -q -m $i
done
rm -rf /dev/shm/commit-in-large-tree.tmp
-- >8 --
-- 8< -- commit-in-large-tree-2.sh
#!/bin/sh
set -e
git init /dev/shm/commit-in-large-tree.tmp
cd /dev/shm/commit-in-large-tree.tmp
for i in $(seq 1 1000); do
mkdir $i
(
cd $i
for j in $(seq 1 100); do
echo $j > $j
done
)
git add $i
done
git commit -q -m initial
du -h .git/index
for i in $(seq 1 100); do
for j in $(seq 1 1000); do
echo "$i changed" > $j/$i
done
git add -u
git commit -q -m $i
done
rm -rf /dev/shm/commit-in-large-tree.tmp
-- >3 --
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2011-08-02 22:02 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-08-02 16:36 [PATCH] commit: write out cache-tree information trast
2011-08-02 18:13 ` Junio C Hamano
2011-08-02 21:15 ` Junio C Hamano
2011-08-02 22:01 ` Thomas Rast
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).