git.vger.kernel.org archive mirror
* [RFH] git-log vs git-rev-list performance
From: Marco Costalba @ 2007-12-29 12:18 UTC
  To: Git Mailing List

Hi all,

   perhaps this will turn out to be a bit academic, anyway...

I'm almost ready to release qgit-2.1. A lot of improvements went in,
but I still cannot beat the speed of the stable qgit-1.5 series.

I really profiled this puppy and I think qgit-2.1 is _internally_ much
faster than qgit-1.5, but at the end of the day qgit-1.5 is about 13%
faster than the new 2.1 at loading repositories.


The loading speed is the sum of two factors:

- speed of underlying git-rev-list / git-log command

- qgit overhead in parsing and storing the git plumbing output


I think qgit-2.0 code is much more efficient than the old one because
the overhead went down from 17% in qgit-1.5 to the current 6% in
qgit-2.1. This means that opening and loading a repository with
qgit-2.1 is only 6% slower than running the underlying git command
from the command line.


So the problem seems to be in the underlying command, which for
qgit-1.5 is git-rev-list while for the new qgit-2.1 it is git-log.


To have some numbers I have tested on the Linux repository with the
actual git commands used by the two versions of qgit:

[marco@localhost linux-2.6]$ git --version
git version 1.5.4-rc1.GIT


[marco@localhost linux-2.6]$ time git log --topo-order --no-color
--parents --boundary -z --log-size
--pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" HEAD > /dev/null
3.60user 0.09system 0:03.70elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27156minor)pagefaults 0swaps


[marco@localhost linux-2.6]$ time git rev-list --topo-order --no-color
--parents --boundary -z --header HEAD > /dev/null
2.89user 0.08system 0:02.98elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+27156minor)pagefaults 0swaps


[ BTW, yes I have a new laptop! ;-) ]


Note that the output is smaller for git-log because --pretty=format
asks for less info than the --header option used by git-rev-list. To
be precise, the output is 57,113,153 bytes for git-rev-list against
41,755,328 bytes for git-log.



So the bottom line is that git-log is 24% slower than git-rev-list
although its output is about 27% smaller!


Could someone be so kind as to explain these differences to me? I'm
not very familiar with git-log / git-rev-list internals.


Thanks
Marco
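
The percentages in the message above can be rechecked from the raw
figures it quotes. The sketch below is only an arithmetic check, not
a measurement. It assumes the quoted qgit overhead (17% and 6%) applies
multiplicatively to the elapsed time of the underlying git command,
which the message does not state explicitly.

```python
# Figures copied from the message; nothing here is measured.
log_elapsed = 3.70          # seconds, "git log" run
rev_list_elapsed = 2.98     # seconds, "git rev-list" run
log_bytes = 41_755_328      # output size of "git log"
rev_list_bytes = 57_113_153 # output size of "git rev-list"


def percent_more(larger, smaller):
    """How much bigger `larger` is than `smaller`, in percent."""
    return (larger / smaller - 1.0) * 100.0


def percent_less(smaller, larger):
    """How much smaller `smaller` is than `larger`, in percent."""
    return (1.0 - smaller / larger) * 100.0


# git-log versus git-rev-list, elapsed time.
slowdown = percent_more(log_elapsed, rev_list_elapsed)

# The size gap can be expressed in two directions.
log_smaller = percent_less(log_bytes, rev_list_bytes)
rev_list_larger = percent_more(rev_list_bytes, log_bytes)

# Assumption: load time = command time * (1 + overhead).
qgit15_load = rev_list_elapsed * 1.17
qgit21_load = log_elapsed * 1.06
qgit15_advantage = percent_more(qgit21_load, qgit15_load)

print(f"git-log slower by        {slowdown:.1f}%")
print(f"git-log output smaller   {log_smaller:.1f}%")
print(f"rev-list output larger   {rev_list_larger:.1f}%")
print(f"qgit-2.1 load time above qgit-1.5 by {qgit15_advantage:.1f}%")
```

Under these assumptions the slowdown comes out at about 24%, and the
estimated gap between the two qgit versions lands between 12% and 13%,
in line with the "about 13%" observed. The size difference depends on
the direction of the comparison: the git-log output is about 27%
smaller, which is the same as saying the git-rev-list output is about
37% larger.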


* Re: [RFH] git-log vs git-rev-list performance
From: Linus Torvalds @ 2007-12-29 18:51 UTC
  To: Marco Costalba; +Cc: Git Mailing List



On Sat, 29 Dec 2007, Marco Costalba wrote:
>
> [marco@localhost linux-2.6]$ time git log --topo-order --no-color
> --parents --boundary -z --log-size
> --pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" HEAD > /dev/null

Don't compare "--pretty=format" to the pre-formatted versions.

Use "--pretty=raw" for "git log" if you want to approximate "git 
rev-list --header". 

Or alternatively use the same "--pretty=format:" for git-rev-list.

If you start using anything else, you only have yourself to blame. OF 
COURSE it's more expensive to pretty-format the messages.

I get

	[torvalds@woody linux]$ time git rev-list \
		--pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" \
		--topo-order --parents --boundary --header \
		--log-size HEAD	> /dev/null

	real    0m1.596s
	user    0m1.556s
	sys     0m0.040s

	[torvalds@woody linux]$ time git log \
		--pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" \
		--topo-order --parents --boundary \
		--log-size HEAD > /dev/null

	real    0m1.597s
	user    0m1.548s
	sys     0m0.048s

so I'd say that with the same output, the timings are pretty much the 
same (except "git log" is more capable - "--log-size" does nothing for 
"git rev-list", for example).

But if you ask for different formats, they'll have different performance, 
even if you then use the same command (ie "git log" will be slower than 
"git log" depending on the command line arguments!)

		Linus
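
The like-for-like comparison described in the message above can be
scripted so that both commands always receive the same format string.
The sketch below is a hypothetical harness, not something from the
thread: the helper names build_commands and time_command are invented
here, the flags are copied from the two command lines quoted above, and
it measures wall-clock time rather than the user/sys split that the
shell's time reports.

```python
import subprocess
import time

# Format string and shared flags, copied from the commands in the thread.
FORMAT = "%m%HX%PX%n%an<%ae>%n%at%n%s%n%b"
COMMON = ["--topo-order", "--parents", "--boundary"]


def build_commands(fmt=FORMAT, rev="HEAD"):
    """Return (rev_list_argv, log_argv) that request identical output."""
    pretty = "--pretty=format:" + fmt
    rev_list = ["git", "rev-list", pretty, *COMMON,
                "--header", "--log-size", rev]
    log = ["git", "log", pretty, *COMMON, "--log-size", rev]
    return rev_list, log


def time_command(argv, cwd=None):
    """Run argv with stdout discarded; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, cwd=cwd, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start
```

To use it, call time_command(argv, cwd=path_to_repo) for each argv
returned by build_commands(), where path_to_repo is a local clone.
Running each command a few times and taking the best value reduces the
effect of a cold page cache.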


* Re: [RFH] git-log vs git-rev-list performance
From: Marco Costalba @ 2007-12-29 20:05 UTC
  To: Linus Torvalds; +Cc: Git Mailing List

On Dec 29, 2007 7:51 PM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Sat, 29 Dec 2007, Marco Costalba wrote:
> >
> > [marco@localhost linux-2.6]$ time git log --topo-order --no-color
> > --parents --boundary -z --log-size
> > --pretty=format:"%m%HX%PX%n%an<%ae>%n%at%n%s%n%b" HEAD > /dev/null
>
> Don't compare "--pretty=format" to the pre-formatted versions.
>
> Use "--pretty=raw" for "git log" if you want to approximate "git
> rev-list --header".
>

I switched to --pretty=format instead of a preformatted one to save
RAM, because the memory needed is about 35% less with a custom format.
The preformatted ones give me additional info that is not shown in
qgit, so it's just a waste.

As an example, a full Linux tree loaded with qgit takes less than
80MB; with gitk, as a comparison, we are above 400MB, although of
course the optimized format is not the whole reason for this
difference.

What I have seen, looking especially at the pretty.c sources with a
profiler, is that the custom format is continuously reparsed _for each
revision_ even though it never changes during the whole git-log run.
This could explain why the custom format, although cheaper in terms of
the quantity of data output, is slower than a preformatted one.

Caching the parsed custom --pretty=format at the beginning of the
git-log run could help...

Marco
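
The caching idea in the message above can be illustrated with a small
parse-once, render-many sketch. This is not git's pretty.c and says
nothing about how git implements or should implement it; the function
names compile_format and render and the commit dictionary layout are
hypothetical. Only the placeholders that appear in the thread's format
string are handled.

```python
# Placeholders used in the thread's format string. Two-letter names come
# first so that "%an" is matched before the one-letter "%n".
PLACEHOLDERS = ("an", "ae", "at", "m", "H", "P", "s", "b", "n")

FORMAT = "%m%HX%PX%n%an<%ae>%n%at%n%s%n%b"


def compile_format(fmt):
    """Parse fmt once into ("lit", text) and ("field", name) segments."""
    segments = []
    literal = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%":
            name = next(
                (p for p in PLACEHOLDERS if fmt.startswith(p, i + 1)), None
            )
            if name is not None:
                if literal:
                    segments.append(("lit", "".join(literal)))
                    literal = []
                segments.append(("field", name))
                i += 1 + len(name)
                continue
        # Unknown placeholders and ordinary characters are kept literally.
        literal.append(fmt[i])
        i += 1
    if literal:
        segments.append(("lit", "".join(literal)))
    return segments


def render(segments, commit):
    """Format one commit from pre-parsed segments, without reparsing."""
    out = []
    for kind, value in segments:
        if kind == "lit":
            out.append(value)
        elif value == "n":
            out.append("\n")
        else:
            out.append(commit[value])
    return "".join(out)


# Parse once, before the revision walk starts...
SEGMENTS = compile_format(FORMAT)

# ...then reuse the parsed form for every revision (made-up sample data).
sample = {"m": ">", "H": "abc", "P": "def", "an": "A", "ae": "a@x",
          "at": "123", "s": "subj", "b": "body"}
print(render(SEGMENTS, sample))
```

The per-revision work in render is a walk over a short list, so the
cost of scanning the format string is paid once per run instead of
once per commit. Whether that accounts for the timing gap reported in
this thread would need profiling against the real code.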
