* Is cogito really this inefficient
@ 2005-07-13 12:50 Russell King
2005-07-13 16:51 ` Matthias Urlichs
2005-07-13 20:28 ` Linus Torvalds
0 siblings, 2 replies; 13+ messages in thread
From: Russell King @ 2005-07-13 12:50 UTC (permalink / raw)
To: git
This says it all. 1min 22secs to generate a patch from a locally
modified but uncommitted file.
cp, edit, diff would be several orders of magnitude faster. What's
going on?
$ /usr/bin/time cg-diff drivers/serial/8250.c > o
Command exited with non-zero status 1
14.40user 17.47system 1:22.96elapsed 38%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (180major+14692minor)pagefaults 0swaps
diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c
--- a/drivers/serial/8250.c
+++ b/drivers/serial/8250.c
@@ -2333,6 +2333,7 @@ static int __devinit serial8250_probe(st
dev_err(dev, "unable to register port at index %d "
"(IO%lx MEM%lx IRQ%d): %d\n", i,
p->iobase, p->mapbase, p->irq, ret);
+ printk(KERN_ERR "uartclk was %d\n", port.uartclk);
}
}
return 0;
--
Russell King
* Re: Is cogito really this inefficient
2005-07-13 12:50 Is cogito really this inefficient Russell King
@ 2005-07-13 16:51 ` Matthias Urlichs
2005-07-14 7:38 ` Russell King
2005-07-13 20:28 ` Linus Torvalds
1 sibling, 1 reply; 13+ messages in thread
From: Matthias Urlichs @ 2005-07-13 16:51 UTC (permalink / raw)
To: git
Hi, Russell King wrote:
> This says it all. 1min 22secs to generate a patch from a locally
> modified but uncommitted file.
I only get that when the index is out-of-date WRT the file modification
dates, so cg-diff has to examine every file.
The good news is that the index is being updated as it finds that the
files are in sync, so expect this to be significantly faster the next time
around.
--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
Praise the sea; on shore remain.
-- John Florio
* Re: Is cogito really this inefficient
2005-07-13 12:50 Is cogito really this inefficient Russell King
2005-07-13 16:51 ` Matthias Urlichs
@ 2005-07-13 20:28 ` Linus Torvalds
2005-07-14 7:37 ` Russell King
1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2005-07-13 20:28 UTC (permalink / raw)
To: Russell King; +Cc: git
On Wed, 13 Jul 2005, Russell King wrote:
>
> This says it all. 1min 22secs to generate a patch from a locally
> modified but uncommitted file.
No, there's something else going on.
Most likely that something forced a total index file re-validation, and
the time you see is every single checked out file having its SHA1
re-computed.
Was this a recently cloned tree, or what was the last operation you did on
that tree before that command? Something must have invalidated the index.
Linus
* Re: Is cogito really this inefficient
2005-07-13 20:28 ` Linus Torvalds
@ 2005-07-14 7:37 ` Russell King
2005-07-14 9:08 ` Catalin Marinas
2005-07-14 15:26 ` Linus Torvalds
0 siblings, 2 replies; 13+ messages in thread
From: Russell King @ 2005-07-14 7:37 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
On Wed, Jul 13, 2005 at 01:28:18PM -0700, Linus Torvalds wrote:
> On Wed, 13 Jul 2005, Russell King wrote:
> > This says it all. 1min 22secs to generate a patch from a locally
> > modified but uncommitted file.
>
> No, there's something else going on.
>
> Most likely that something forced a total index file re-validation, and
> the time you see is every single checked out file having its SHA1
> re-computed.
>
> Was this a recently cloned tree, or what was the last operation you did on
> that tree before that command? Something must have invalidated the index.
cg-update origin
and then I edited drivers/serial/8250.c
As discovered using:
sh -x /usr/bin/cg-diff drivers/serial/8250.c
it appears that cg-diff does a
git-update-cache --refresh >/dev/null
each time it's run, which is taking the bulk of the time. Also note
that curiously, it exits with status 1.
--
Russell King
* Re: Is cogito really this inefficient
2005-07-13 16:51 ` Matthias Urlichs
@ 2005-07-14 7:38 ` Russell King
0 siblings, 0 replies; 13+ messages in thread
From: Russell King @ 2005-07-14 7:38 UTC (permalink / raw)
To: Matthias Urlichs; +Cc: git
On Wed, Jul 13, 2005 at 06:51:30PM +0200, Matthias Urlichs wrote:
> Hi, Russell King wrote:
>
> > This says it all. 1min 22secs to generate a patch from a locally
> > modified but uncommitted file.
>
> I only get that when the index is out-of-date WRT the file modification
> dates, so cg-diff has to examine every file.
>
> The good news is that the index is being updated as it finds that the
> files are in sync, so expect this to be significantly faster the next time
> around.
It isn't. First time it was 1min11, second time _immediately_ after
it was 1min22. See my reply to Linus.
Oddly, show-diff seemed to be a lot more efficient in previous git
revisions.
--
Russell King
* Re: Is cogito really this inefficient
2005-07-14 7:37 ` Russell King
@ 2005-07-14 9:08 ` Catalin Marinas
2005-07-14 9:59 ` Russell King
2005-07-14 15:26 ` Linus Torvalds
1 sibling, 1 reply; 13+ messages in thread
From: Catalin Marinas @ 2005-07-14 9:08 UTC (permalink / raw)
To: Russell King; +Cc: Linus Torvalds, git
Russell King <rmk@arm.linux.org.uk> wrote:
> it appears that cg-diff does a
>
> git-update-cache --refresh >/dev/null
>
> each time it's run, which is taking the bulk of the time. Also note
> that curiously, it exits with status 1.
Does git-ls-files --unmerged show any files?
--
Catalin
* Re: Is cogito really this inefficient
2005-07-14 9:08 ` Catalin Marinas
@ 2005-07-14 9:59 ` Russell King
2005-07-14 15:51 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: Russell King @ 2005-07-14 9:59 UTC (permalink / raw)
To: Catalin Marinas; +Cc: Linus Torvalds, git
On Thu, Jul 14, 2005 at 10:08:31AM +0100, Catalin Marinas wrote:
> Russell King <rmk@arm.linux.org.uk> wrote:
> > it appears that cg-diff does a
> >
> > git-update-cache --refresh >/dev/null
> >
> > each time it's run, which is taking the bulk of the time. Also note
> > that curiously, it exits with status 1.
>
> Does git-ls-files --unmerged show any files?
No, and it returns fairly quickly:
$ /usr/bin/time git-ls-files --unmerged
0.29user 0.03system 0:00.43elapsed 73%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+655minor)pagefaults 0swaps
Actually, I should've left the sh -x /usr/bin/cg-diff drivers/serial/8250.c
running a little longer. It's not the git-update-cache command which
is taking the time, it's git-diff-cache.
Running the diff several times, both with and without changes to
drivers/serial/8250.c, it seems that sometimes it's faster. I guess
it has to do with dentry invalidation...
However, the point is - I've only asked for _one_ file. Why do we need
to look at _every_ file in the tree?
I could understand this behaviour if I'd asked for a diff across the
whole tree, but I didn't.
Internally, the sha1 of the unmodified drivers/serial/8250.c should be
known, so should be trivial to unpack that and generate a diff. Given
the cache, this should be something which should be lightning fast
when the requested fileset to diff is already known.
--
Russell King
* Re: Is cogito really this inefficient
2005-07-14 7:37 ` Russell King
2005-07-14 9:08 ` Catalin Marinas
@ 2005-07-14 15:26 ` Linus Torvalds
2005-07-19 23:54 ` Petr Baudis
1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2005-07-14 15:26 UTC (permalink / raw)
To: Russell King; +Cc: git
On Thu, 14 Jul 2005, Russell King wrote:
>
> cg-update origin
> and then I edited drivers/serial/8250.c
Hmm..
> it appears that cg-diff does a
>
> git-update-cache --refresh >/dev/null
>
> each time it's run, which is taking the bulk of the time. Also note
> that curiously, it exits with status 1.
That part is normal - an update-cache is fast (it takes me 0.08 sec for the
kernel) if the cache is already mostly up-to-date, and the non-zero exit
status just means that some file was different (ie it's telling the caller
that there are edits in your tree - drivers/serial/8250.c).
The update-cache is slow only if the index isn't up-to-date, which can
happen either if somebody plays games with the index, or if somebody
touches all the files in the tree.
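The fast and slow paths described above can be illustrated with a minimal sketch. It is not git's actual code: `refresh`, `blob_sha1` and the dictionary-based index are hypothetical names chosen for illustration, and the real index records more stat fields than the modification time and size used here. The only part taken from Git itself is the blob naming scheme (SHA-1 over a `blob <size>\0` header followed by the content), as used by current Git.

```python
import hashlib
import os
import tempfile


def blob_sha1(data: bytes) -> str:
    """Name content the way Git names a blob: SHA-1 of 'blob <size>\\0' + data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def refresh(index: dict, paths: list) -> int:
    """Re-validate cached entries and return how many files were re-hashed."""
    rehashed = 0
    for path in paths:
        st = os.stat(path)
        stamp = (st.st_mtime_ns, st.st_size)
        entry = index.get(path)
        if entry is not None and entry["stamp"] == stamp:
            continue  # stat data matches the cache: trust the stored hash
        # Stat data is missing or stale: fall back to reading and hashing.
        with open(path, "rb") as fh:
            index[path] = {"stamp": stamp, "sha1": blob_sha1(fh.read())}
        rehashed += 1
    return rehashed


def demo() -> tuple:
    """Run a cold refresh, a warm refresh, and a refresh after one edit."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            p = os.path.join(tmp, f"file{i}.c")
            with open(p, "w") as fh:
                fh.write(f"int x{i};\n")
            paths.append(p)
        index = {}
        cold = refresh(index, paths)    # empty cache: every file is hashed
        warm = refresh(index, paths)    # nothing changed: no file is hashed
        with open(paths[0], "a") as fh:
            fh.write("/* edited */\n")  # the size changes, so the stamp differs
        edited = refresh(index, paths)  # only the edited file is hashed
        return cold, warm, edited
```

With a warm cache the loop does one `stat` per file and no reads, which is why a refresh over a whole kernel tree can be quick; a cold or invalidated cache turns every entry into a read plus a hash.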
It's quite possible that some path in cg-update ends up not updating the
index properly. For example, I notice that the "fast-forward" uses
"git-checkout-cache -f -a", which can do so (lack of "-u" flag), but then
it does do a "git-update-cache --refresh" later, so that doesn't seem to
be it either.
If you do a "git-diff-files" every once in a while, it will _scream_ at
you whenever you have files that aren't up-to-date in the cache. That's
normal in small doses, of course (eg your edit of drivers/serial/8250.c
would make that one not up-to-date), but if you get a _lot_ of files
listed, that's usually a sign that something screwed up your index.
Linus
* Re: Is cogito really this inefficient
2005-07-14 9:59 ` Russell King
@ 2005-07-14 15:51 ` Linus Torvalds
2005-07-15 0:29 ` Linus Torvalds
0 siblings, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2005-07-14 15:51 UTC (permalink / raw)
To: Russell King; +Cc: Catalin Marinas, git
On Thu, 14 Jul 2005, Russell King wrote:
>
> Actually, I should've left the sh -x /usr/bin/cg-diff drivers/serial/8250.c
> running a little longer. It's not the git-update-cache command which
> is taking the time, it's git-diff-cache.
Ok. git-diff-cache actually ends up reading your HEAD tree, and that, in
turn, is 1000+ tree objects. So it can take a while for the whole tree,
especially in the nonpacked and uncached case.
git-diff-tree (comparing two trees) is smart enough to limit itself to
just the sub-trees that have been named, and would have compared the two
trees by looking up just eight objects (three subdirectories from each
tree, and then the file itself from both trees).
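The count of eight objects can be reproduced with a toy model. This is only a sketch under stated assumptions: the nested dictionaries stand in for tree objects, `objects_read` and `count_trees` are hypothetical helpers, and the top-level tree is counted as one of the three tree objects read on each side, which is one reading of the description above.

```python
# Toy tree: a dict is a tree object, a string is a blob.
TOY_TREE = {
    "drivers": {
        "serial": {"8250.c": "blob"},
        "net": {"tun.c": "blob"},
    },
    "fs": {"inode.c": "blob"},
}


def objects_read(tree: dict, path: str) -> list:
    """List the objects a path-limited lookup touches on the way to `path`."""
    read = ["tree:<root>"]
    node = tree
    parts = path.split("/")
    for depth, name in enumerate(parts):
        node = node[name]
        kind = "blob" if depth == len(parts) - 1 else "tree"
        read.append(f"{kind}:{'/'.join(parts[:depth + 1])}")
    return read


def count_trees(tree: dict) -> int:
    """Count the tree objects a full, unrestricted walk would have to open."""
    return 1 + sum(count_trees(v) for v in tree.values() if isinstance(v, dict))


per_side = objects_read(TOY_TREE, "drivers/serial/8250.c")
total_for_two_trees = 2 * len(per_side)
```

Here `per_side` holds three trees and one blob, so comparing two trees touches eight objects, while `count_trees` grows with the number of directories in the whole project.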
But git-diff-cache isn't - because it's comparing the tree against the
index file, and the index is inevitably the whole tree.
And I now think I know what makes it slow. Not only are you basically
opening 1100 files (the tree objects - there's really that many
subdirectories in the kernel. Scary), but because you have alternate
object directories, and almost all of the objects are in the alternate
(not your primary), you'll basically always end up _first_ looking in the
primary, failing, and then looking in the alternate.
Together with the hashing, you'll be looking all over the place, in other
words ;)
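A rough sketch of that lookup order follows. The two-character fan-out directory matches the loose-object layout Git uses; everything else (`object_path`, `find_object`, the temporary directories) is a hypothetical stand-in, not git's implementation.

```python
import os
import tempfile


def object_path(object_dir: str, sha1: str) -> str:
    """Loose objects live under <objects>/<first 2 hex chars>/<remaining 38>."""
    return os.path.join(object_dir, sha1[:2], sha1[2:])


def find_object(sha1: str, object_dirs: list) -> tuple:
    """Probe each object directory in order; return (path, failed_probes)."""
    failed = 0
    for object_dir in object_dirs:
        candidate = object_path(object_dir, sha1)
        if os.path.exists(candidate):
            return candidate, failed
        failed += 1  # a miss the kernel may remember as a negative dentry
    return None, failed


def demo() -> tuple:
    """Store one object in the alternate only, then look it up primary-first."""
    with tempfile.TemporaryDirectory() as root:
        primary = os.path.join(root, "primary", "objects")
        alternate = os.path.join(root, "alternate", "objects")
        os.makedirs(primary)
        sha1 = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
        target = object_path(alternate, sha1)
        os.makedirs(os.path.dirname(target))
        with open(target, "wb"):
            pass  # an empty placeholder file is enough for the lookup
        path, failed = find_object(sha1, [primary, alternate])
        return path == target, failed
```

When nearly every object lives in the alternate, each lookup pays for one failed probe in the primary first, and because object names are hashes, consecutive lookups land in unrelated fan-out directories.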
Which means that you'll be needing a fair amount of memory to keep all of
those negative dentries etc cached (and the directory tree too).
This is something the pack-files will just help enormously with, but it
was only recently that we turned git around to check the pack-files
_first_, and the object directories second, so you probably won't see it
(not to mention that you probably don't have big pack-files at all ;)
I'll look into making diff-cache be more efficient. I normally don't use
it myself, so I didn't bother (I use git-diff-files, which is way more
efficient, but doesn't show the difference against the _tree_, it shows
the difference against the index. Since cogito tries to hide the index
from you, cogito can't very well use that).
Linus
* Re: Is cogito really this inefficient
2005-07-14 15:51 ` Linus Torvalds
@ 2005-07-15 0:29 ` Linus Torvalds
2005-07-15 2:10 ` Junio C Hamano
2005-07-15 9:48 ` Russell King
0 siblings, 2 replies; 13+ messages in thread
From: Linus Torvalds @ 2005-07-15 0:29 UTC (permalink / raw)
To: Russell King; +Cc: Catalin Marinas, git
On Thu, 14 Jul 2005, Linus Torvalds wrote:
>
> I'll look into making diff-cache be more efficient. I normally don't use
> it myself, so I didn't bother (I use git-diff-files, which is way more
> efficient, but doesn't show the difference against the _tree_, it shows
> the difference against the index. Since cogito tries to hide the index
> from you, cogito can't very well use that).
Ok, done.
I made git-diff-cache _and_ git-diff-files limit the pathnames early, so
that they don't even bother expanding the tree objects that are
irrelevant, and don't bother even validating index objects that don't
match the pathnames given.
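The idea of limiting by pathname early can be sketched as two small predicates. This is an illustration only, assuming plain prefix matching on slash-separated paths; `should_expand` and `entries_to_check` are hypothetical names and do not correspond to functions in git.

```python
def should_expand(tree_path: str, pathspecs: list) -> bool:
    """Decide whether a subtree can contain anything the pathspecs name."""
    if not pathspecs or tree_path == "":
        return True  # no limit given, or the root tree: always expand
    for spec in pathspecs:
        spec = spec.rstrip("/")
        if spec == tree_path:
            return True
        if spec.startswith(tree_path + "/"):
            return True  # the requested path lies below this tree
        if tree_path.startswith(spec + "/"):
            return True  # this tree lies below a requested directory
    return False


def entries_to_check(index_paths: list, pathspecs: list) -> list:
    """Keep only the index entries that fall under one of the pathspecs."""
    if not pathspecs:
        return list(index_paths)
    kept = []
    for path in index_paths:
        for spec in pathspecs:
            spec = spec.rstrip("/")
            if path == spec or path.startswith(spec + "/"):
                kept.append(path)
                break
    return kept


index_paths = ["drivers/serial/8250.c", "drivers/serial/8250.h", "fs/inode.c"]
wanted = ["drivers/serial/8250.c"]
checked = entries_to_check(index_paths, wanted)
```

For a single-file request, `should_expand` rejects `fs` without opening it and `checked` contains one entry, so the work scales with the depth of the requested path rather than with the size of the tree.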
Junio - I think this makes gitcore-pathspec pretty pointless, but I didn't
actually remove it. I guess "git-diff-helper" still uses it.
Linus
* Re: Is cogito really this inefficient
2005-07-15 0:29 ` Linus Torvalds
@ 2005-07-15 2:10 ` Junio C Hamano
2005-07-15 9:48 ` Russell King
1 sibling, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2005-07-15 2:10 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> On Thu, 14 Jul 2005, Linus Torvalds wrote:
>>
>> I'll look into making diff-cache be more efficient. I normally don't use
>> it myself, so I didn't bother (I use git-diff-files, which is way more
>> efficient, but doesn't show the difference against the _tree_, it shows
>> the difference against the index. Since cogito tries to hide the index
>> from you, cogito can't very well use that).
>
> Ok, done.
Wonderful.
> Junio - I think this makes gitcore-pathspec pretty pointless, but I didn't
> actually remove it. I guess "git-diff-helper" still uses it.
And probably it shouldn't; diff-helper should be raw-to-patch
converter, nothing more.
Usually I'd volunteer to clean up the remaining mess (which was
originally my mess anyway) myself, but since I'd already asked
smurf to help cleaning up the diff option parsing, and recently
I've suddenly got quite busy in the day job, so ...
* Re: Is cogito really this inefficient
2005-07-15 0:29 ` Linus Torvalds
2005-07-15 2:10 ` Junio C Hamano
@ 2005-07-15 9:48 ` Russell King
1 sibling, 0 replies; 13+ messages in thread
From: Russell King @ 2005-07-15 9:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Catalin Marinas, git
On Thu, Jul 14, 2005 at 05:29:09PM -0700, Linus Torvalds wrote:
> On Thu, 14 Jul 2005, Linus Torvalds wrote:
> > I'll look into making diff-cache be more efficient. I normally don't use
> > it myself, so I didn't bother (I use git-diff-files, which is way more
> > efficient, but doesn't show the difference against the _tree_, it shows
> > the difference against the index. Since cogito tries to hide the index
> > from you, cogito can't very well use that).
>
> Ok, done.
Thanks Linus. I'll look forward to trying this out.
--
Russell King
* Re: Is cogito really this inefficient
2005-07-14 15:26 ` Linus Torvalds
@ 2005-07-19 23:54 ` Petr Baudis
0 siblings, 0 replies; 13+ messages in thread
From: Petr Baudis @ 2005-07-19 23:54 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Russell King, git
Dear diary, on Thu, Jul 14, 2005 at 05:26:07PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> It's quite possible that some path in cg-update ends up not updating the
> index properly. For example, I notice that the "fast-forward" uses
> "git-checkout-cache -f -a", which can do so (lack of "-u" flag), but then
> it does do a "git-update-cache --refresh" later, so that doesn't seem to
> be it either.
Just a side note for casual readers: Cogito could use a cleanup here.
For the most part it ignores things like git-checkout-cache -u simply
because there was no such option at the time that part of Cogito was
written. I myself am not even too familiar with those gazillions of
funny new options, and as long as it works, I prefer not to touch that
code, but if someone is bored and wants to get familiar with core git
usage as well as Cogito internals...
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
If you want the holes in your knowledge showing up try teaching
someone. -- Alan Cox