* [PATCH] cogito: Avoid slowness when timewarping large trees.
@ 2006-03-24 8:44 Jeff King
2006-03-24 10:21 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2006-03-24 8:44 UTC (permalink / raw)
To: git
tree_timewarp was calling read, egrep, and rm in an O(N) loop where N is
the number of changed files between two trees. This caused a bottleneck
when seeking/switching/merging between trees with many changed files.
On the historical linux tree, the time to cg-seek from the head to the
initial commit (a change of 19099 files) dropped from 2m35s to 21s.
---
cg-Xlib | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)
a9a160c0bd63973c53ba3aa74650728135d23ac7
diff --git a/cg-Xlib b/cg-Xlib
index a2f28cf..ceddeeb 100644
--- a/cg-Xlib
+++ b/cg-Xlib
@@ -345,12 +345,9 @@ tree_timewarp()
# Kill gone files
git-diff-tree -r "$base" "$branch" |
- while IFS=$'\t' read header file; do
- # match ":100755 000000 14d43b1abf... 000000000... D"
- if echo "$header" | egrep "^:([^ ][^ ]* ){4}D" >/dev/null; then
- rm -- "$file"
- fi
- done
+ # match ":100755 000000 14d43b1abf... 000000000... D"
+ sed -ne 's/^:\([^ ][^ ]* \)\{4\}D\t//p' |
+ xargs rm --
git-checkout-index -u -f -a
# FIXME: Can produce bogus "contains only garbage" messages.
--
1.2.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] cogito: Avoid slowness when timewarping large trees.
2006-03-24 8:44 [PATCH] cogito: Avoid slowness when timewarping large trees Jeff King
@ 2006-03-24 10:21 ` Junio C Hamano
2006-03-24 10:55 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2006-03-24 10:21 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> tree_timewarp was calling read, egrep, and rm in an O(N) loop where N is
> the number of changed files between two trees. This caused a bottleneck
> when seeking/switching/merging between trees with many changed files.
>...
> ---
>
> cg-Xlib | 9 +++------
> 1 files changed, 3 insertions(+), 6 deletions(-)
>
> a9a160c0bd63973c53ba3aa74650728135d23ac7
> diff --git a/cg-Xlib b/cg-Xlib
> index a2f28cf..ceddeeb 100644
> --- a/cg-Xlib
> +++ b/cg-Xlib
> @@ -345,12 +345,9 @@ tree_timewarp()
>
> # Kill gone files
> git-diff-tree -r "$base" "$branch" |
> ...
> + # match ":100755 000000 14d43b1abf... 000000000... D"
> + sed -ne 's/^:\([^ ][^ ]* \)\{4\}D\t//p' |
> + xargs rm --
> git-checkout-index -u -f -a
Metainformation fields are internally separated with SP and a
TAB comes before pathname; you can just say:
sed -ne 's/^:[^ ]* D //p'
there (whitespace inside [] and after D are TAB; one before D is
a SP). You might also want to consider "xargs rm -f --", BTW.
However, I wonder why it does not do this instead:
... stash away the local changes
git-read-tree -m "$base" ;# reset the index to $base
# switch to $branch -- removing gone files as well
git-read-tree -m -u "$base" "$branch"
Then you can also lose diff-tree and checkout-index there.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cogito: Avoid slowness when timewarping large trees.
2006-03-24 10:21 ` Junio C Hamano
@ 2006-03-24 10:55 ` Jeff King
2006-03-24 11:01 ` Junio C Hamano
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2006-03-24 10:55 UTC (permalink / raw)
To: git
On Fri, Mar 24, 2006 at 02:21:25AM -0800, Junio C Hamano wrote:
> Metainformation fields are internally separated with SP and a
> TAB comes before pathname; you can just say:
>
> sed -ne 's/^:[^ ]* D //p'
That is much cleaner (I stupidly just converted the original regex
verbatim).
> a SP). You might also want to consider "xargs rm -f --", BTW.
Oops, you're right. In particular, rm complains when there are no
deletions.
> However, I wonder why it does not do this instead:
>
> ... stash away the local changes
> git-read-tree -m "$base" ;# reset the index to $base
>
> # switch to $branch -- removing gone files as well
> git-read-tree -m -u "$base" "$branch"
>
> Then you can also lose diff-tree and checkout-index there.
This doesn't deal very well with local changes. The second read-tree
complains about a not uptodate entry during the merge. Since we've
already stashed the local changes as a diff, we should be able to simply
ignore them during the read-tree. Should the first read-tree actually
be:
git-read-tree --reset "$base"
?
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cogito: Avoid slowness when timewarping large trees.
2006-03-24 10:55 ` Jeff King
@ 2006-03-24 11:01 ` Junio C Hamano
2006-03-24 11:22 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2006-03-24 11:01 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> ... Should the first read-tree actually
> be:
> git-read-tree --reset "$base"
Exactly. That's what I meant. Thanks.
Originally I wrote "git reset" there, but this being Cogito I
thought Pasky preferred to use the true Plumbing and I botched
it X-<.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cogito: Avoid slowness when timewarping large trees.
2006-03-24 11:01 ` Junio C Hamano
@ 2006-03-24 11:22 ` Jeff King
2006-03-24 16:43 ` Shawn Pearce
0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2006-03-24 11:22 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Fri, Mar 24, 2006 at 03:01:35AM -0800, Junio C Hamano wrote:
> > git-read-tree --reset "$base"
> Exactly. That's what I meant. Thanks.
Hmm. That doesn't actually work, though. If I have a history like this:
$ cg-init -m "initial"
$ cg-tag initial
$ echo contents >file
$ cg-add file
$ cg-commit -m "added file"
and I try this:
$ echo changes >file
$ git-read-tree --reset master
$ git-read-tree -m -u master initial
I get this:
fatal: Entry 'file' not uptodate. Cannot merge.
If I do an update-index before the second read-tree, then I simply get:
fatal: Entry 'file' would be overwritten by merge. Cannot merge.
Is there something I'm missing, or is a 'git reset --hard' really what
we want here (in that case, the fact that git reset changes the HEAD
might be a problem)?
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cogito: Avoid slowness when timewarping large trees.
2006-03-24 11:22 ` Jeff King
@ 2006-03-24 16:43 ` Shawn Pearce
2006-03-25 9:36 ` Jeff King
0 siblings, 1 reply; 10+ messages in thread
From: Shawn Pearce @ 2006-03-24 16:43 UTC (permalink / raw)
To: Jeff King; +Cc: Junio C Hamano, git
Jeff King <peff@peff.net> wrote:
> On Fri, Mar 24, 2006 at 03:01:35AM -0800, Junio C Hamano wrote:
>
> > > git-read-tree --reset "$base"
> > Exactly. That's what I meant. Thanks.
>
> Hmm. That doesn't actually work, though. If I have a history like this:
>
> $ cg-init -m "initial"
> $ cg-tag initial
> $ echo contents >file
> $ cg-add file
> $ cg-commit -m "added file"
>
> and I try this:
> $ echo changes >file
> $ git-read-tree --reset master
> $ git-read-tree -m -u master initial
>
> I get this:
> fatal: Entry 'file' not uptodate. Cannot merge.
>
> If I do an update-index before the second read-tree, then I simply get:
> fatal: Entry 'file' would be overwritten by merge. Cannot merge.
>
> Is there something I'm missing, or is a 'git reset --hard' really what
> we want here (in that case, the fact that git reset changes the HEAD
> might be a problem)?
This is sort of what I'm doing in pg-reset-tree, which is kind
of like 'git-reset --hard' but I think it is faster when $force
is unset:
# Remove files left over from merge conflicts and files which are
# somehow modified. If this makes a directory empty it may have
# been a new directory so delete that too.
#
(git-ls-files -z \
--others \
--ignored \
--exclude='*#1' \
--exclude='*#2' \
--exclude='*#3' \
--exclude='*.rej'
git-diff-index --name-only -z HEAD
) | perl -n0e 'chomp; unlink; 1 while (s,/[^/]*$,, && rmdir)'
# Rebuild the index and working directory. We'll only checkout the
# files which don't exist. This resets the modified files we deleted
# just above; remaining files will have their stat information updated
# in the index.
#
git-read-tree --reset HEAD &&
git-checkout-index --index --all $force \
|| die "Can't reset index and working directory."
# Now that the working directory is clean we can safely merge it to
# to our target tree, $new_base.
#
git-read-tree -m -u HEAD $new_base
The $force in git-checkout-index may or may not be set to
'--force'; its usually not set as its not usually necessary.
Unfortunately I've got a case where I'm mounting a directory
exported by SAMBA onto a Windows 2000 system and if I don't include
--force during git-checkout-index it doesn't work right about 1/3
of the time. (It appears to be bad stat information coming from
Cygwin/Windows/SAMBA/Solaris.)
You can't skip the git-checkout-index step (I've tried) as the
ls-files/diff-index above causes the modified files (in your test
above 'changes') to disappear from the working directory and the
read-tree may not bring it back.
Now that I think about it isn't this sort of where you were before
in cg-seek?
--
Shawn.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] cogito: Avoid slowness when timewarping large trees.
2006-03-24 16:43 ` Shawn Pearce
@ 2006-03-25 9:36 ` Jeff King
2006-03-25 9:39 ` [PATCH] " Jeff King
2006-03-25 19:55 ` [PATCH] cogito: " Junio C Hamano
0 siblings, 2 replies; 10+ messages in thread
From: Jeff King @ 2006-03-25 9:36 UTC (permalink / raw)
To: git
On Fri, Mar 24, 2006 at 11:43:52AM -0500, Shawn Pearce wrote:
> Now that I think about it isn't this sort of where you were before
> in cg-seek?
Yes, that's basically it. Short of Junio explaining how the manual file
removal can be avoided, I think my original patch should be applied, as
it causes an order of magnitude speed up. I will repost the cleaned-up
version.
-Peff
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH] Avoid slowness when timewarping large trees.
2006-03-25 9:36 ` Jeff King
@ 2006-03-25 9:39 ` Jeff King
2006-03-26 18:05 ` Petr Baudis
2006-03-25 19:55 ` [PATCH] cogito: " Junio C Hamano
1 sibling, 1 reply; 10+ messages in thread
From: Jeff King @ 2006-03-25 9:39 UTC (permalink / raw)
To: git; +Cc: pasky
tree_timewarp was calling read, egrep, and rm in an O(N) loop where N is
the number of changed files between two trees. This caused a bottleneck
when seeking/switching/merging between trees with many changed files.
Signed-off-by: Jeff King <peff@peff.net>
---
This is a repost of the initial patch featuring a few cleanups suggested
by Junio.
cg-Xlib | 9 +++------
1 files changed, 3 insertions(+), 6 deletions(-)
5f79b37a0eb85ff4f643e70a7f2823e68e9d9ca4
diff --git a/cg-Xlib b/cg-Xlib
index 5896df7..1a9bd4f 100644
--- a/cg-Xlib
+++ b/cg-Xlib
@@ -363,12 +363,9 @@ tree_timewarp()
# Kill gone files
git-diff-tree -r "$base" "$branch" |
- while IFS=$'\t' read header file; do
- # match ":100755 000000 14d43b1abf... 000000000... D"
- if echo "$header" | egrep "^:([^ ][^ ]* ){4}D" >/dev/null; then
- rm -- "$file"
- fi
- done
+ # match ":100755 000000 14d43b1abf... 000000000... D"
+ sed -ne 's/^:[^\t]* D\t//p' |
+ xargs rm -f --
git-checkout-index -u -f -a
# FIXME: Can produce bogus "contains only garbage" messages.
--
1.2.4
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] cogito: Avoid slowness when timewarping large trees.
2006-03-25 9:36 ` Jeff King
2006-03-25 9:39 ` [PATCH] " Jeff King
@ 2006-03-25 19:55 ` Junio C Hamano
1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2006-03-25 19:55 UTC (permalink / raw)
To: Jeff King; +Cc: git
Jeff King <peff@peff.net> writes:
> On Fri, Mar 24, 2006 at 11:43:52AM -0500, Shawn Pearce wrote:
>
>> Now that I think about it isn't this sort of where you were before
>> in cg-seek?
>
> Yes, that's basically it. Short of Junio explaining how the manual file
> removal can be avoided,...
Please don't let me slow you down -- that "git-read-tree --reset"
one was just confused. I was not thinking clearly about the
modification you already had in the working tree which you
wanted to temporarily reset while switching.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] Avoid slowness when timewarping large trees.
2006-03-25 9:39 ` [PATCH] " Jeff King
@ 2006-03-26 18:05 ` Petr Baudis
0 siblings, 0 replies; 10+ messages in thread
From: Petr Baudis @ 2006-03-26 18:05 UTC (permalink / raw)
To: git
Dear diary, on Sat, Mar 25, 2006 at 10:39:57AM CET, I got a letter
where Jeff King <peff@peff.net> said that...
> tree_timewarp was calling read, egrep, and rm in an O(N) loop where N is
> the number of changed files between two trees. This caused a bottleneck
> when seeking/switching/merging between trees with many changed files.
>
> Signed-off-by: Jeff King <peff@peff.net>
Thanks, applied.
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time. I think
I have forgotten this before.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2006-03-26 18:05 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-03-24 8:44 [PATCH] cogito: Avoid slowness when timewarping large trees Jeff King
2006-03-24 10:21 ` Junio C Hamano
2006-03-24 10:55 ` Jeff King
2006-03-24 11:01 ` Junio C Hamano
2006-03-24 11:22 ` Jeff King
2006-03-24 16:43 ` Shawn Pearce
2006-03-25 9:36 ` Jeff King
2006-03-25 9:39 ` [PATCH] " Jeff King
2006-03-26 18:05 ` Petr Baudis
2006-03-25 19:55 ` [PATCH] cogito: " Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).