From: Junio C Hamano <gitster@pobox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolas Pitre <nico@cam.org>, Git Mailing List <git@vger.kernel.org>
Subject: Re: Some git performance measurements..
Date: Thu, 29 Nov 2007 21:00:21 -0800
Message-ID: <7v3auos4yi.fsf@gitster.siamese.dyndns.org>
In-Reply-To: <alpine.LFD.0.9999.0711290945060.8458@woody.linux-foundation.org> (Linus Torvalds's message of "Thu, 29 Nov 2007 09:48:19 -0800 (PST)")

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Umm. See my earlier numbers. For "git checkout" with cold cache, the 
> *bulk* of the time is actually the ".gitignore" file lookups, so if you 
> see a three-second improvement out of 17s, it may not look spectacular, 
> but considering that probably 10s of those 17s were something *else* going 
> on, I suspect that if you really did just a plain "git checkout", you 
> actually *do* have a spectacular improvement of roughly 7s -> 4s!

I am hoping that "probably 10s of those 17s" can actually be measured
with the patch I sent out last night.  Has anybody taken a look at it?

Partitioning the pack data by object type shifts the tradeoffs from the
current "the data in the same tree are mostly together, except commits
are treated differently because rev walk is done quite often" layout.
Because we do not ever look at blob objects while pruning the history
(unless the -Spickaxe option is used, I think), partitioned layout would
optimize ancestry walking even more than the current packfile layout.

On the other hand, any operation that wants to look at the contents is
penalized.  A two-tree diff that inspects the contents (e.g. fuzzy
renames and pickaxe) needs to read from the tree section to find which
blob to compare with which other blob, and then needs to seek to the
blob section to actually read the contents, while the current layout
tends to group both trees and blobs that belong to the same tree
together.  It is natural that blame is penalized by the new layout,
mostly because it needs to grab two blobs to compare from each
parent-child pair, but also because it needs to compute two-tree diffs
for each parent-child pair it traverses whenever it needs to follow
across renames (that is, when it sees there is no corresponding path in
the parent).  I would expect to see a similar slowdown from grep, which
wants to inspect blobs that are in the same tree.
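
The tradeoff described above can be illustrated with a toy model.  This
is only a sketch under strong assumptions, not how git reads packs: every
commit owns one tree and a fixed number of blobs (no sharing, no deltas),
object sizes are made up, and "cost" is simply the distance between the
end of one read and the start of the next.  All names below are
hypothetical.

```python
# Toy model: compare a "tree and its blobs together" layout with a
# layout partitioned by object type.  Sizes and cost metric are assumptions.

def build_offsets(order, sizes):
    """Map each object name to its offset, given the on-disk order."""
    offsets, pos = {}, 0
    for name in order:
        offsets[name] = pos
        pos += sizes[name]
    return offsets

def seek_distance(offsets, sizes, accesses):
    """Sum of jumps between the end of one read and the start of the next."""
    total, pos = 0, None
    for name in accesses:
        if pos is not None:
            total += abs(offsets[name] - pos)
        pos = offsets[name] + sizes[name]
    return total

def compare(n_commits=4, blobs_per_tree=3):
    commits = [f"c{i}" for i in range(n_commits)]
    trees = [f"t{i}" for i in range(n_commits)]
    blobs = [[f"b{i}_{j}" for j in range(blobs_per_tree)]
             for i in range(n_commits)]

    # Made-up sizes: commits and trees are small, blobs are larger.
    sizes = {name: 1 for name in commits + trees}
    sizes.update({b: 10 for group in blobs for b in group})

    # Current-style layout: commits first, then each tree with its blobs.
    grouped = list(commits)
    for i in range(n_commits):
        grouped.append(trees[i])
        grouped.extend(blobs[i])

    # Partitioned layout: all commits, then all trees, then all blobs.
    partitioned = commits + trees + [b for group in blobs for b in group]

    # History walk with path pruning: commits and trees, never blobs.
    walk = [name for i in range(n_commits) for name in (commits[i], trees[i])]
    # grep in the last tree: the tree, then every blob it refers to.
    grep = [trees[-1]] + blobs[-1]

    result = {}
    for label, order in (("grouped", grouped), ("partitioned", partitioned)):
        offsets = build_offsets(order, sizes)
        result[label] = {
            "walk": seek_distance(offsets, sizes, walk),
            "grep": seek_distance(offsets, sizes, grep),
        }
    return result

if __name__ == "__main__":
    for label, costs in compare().items():
        print(label, costs)
```

In this model the partitioned layout makes the history walk cheaper and
the grep more expensive, which matches the direction of the argument
above; the magnitudes mean nothing for a real repository.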

When I do archaeology, I think I often run blame first to see which
change made the block of text into its current shape, and then run
a path limited "git log -p" either starting or ending at that revision.
In that workflow, the initial blame may get slower with the new layout,
but I suspect it would help by speeding up the latter "git log -p" step.
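
As a rough sketch of that two-step workflow, the helpers below only
build the command lines; they do not run git.  The function names, the
example path and the line range are hypothetical.  The options used
(blame's -L and log's -p with a path limit) are standard.

```python
# Build argv lists for the blame-then-log workflow; nothing is executed.

def blame_cmd(path, start, end, rev="HEAD"):
    """Step 1: find which commits last touched lines start..end of path."""
    return ["git", "blame", "-L", f"{start},{end}", rev, "--", path]

def log_starting_at(rev, path):
    """Step 2a: path-limited patches from rev back into its history."""
    return ["git", "log", "-p", rev, "--", path]

def log_ending_at(rev, path, tip="HEAD"):
    """Step 2b: path-limited patches from tip back down to rev.

    Note that the range rev..tip excludes rev itself.
    """
    return ["git", "log", "-p", f"{rev}..{tip}", "--", path]
```

The resulting lists can be passed to subprocess.run() inside a
repository, with the revision taken from the blame output.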


Thread overview: 28+ messages
2007-11-29  2:49 Some git performance measurements Linus Torvalds
2007-11-29  3:14 ` Linus Torvalds
2007-11-29  3:59   ` Nicolas Pitre
2007-11-29  4:32     ` Linus Torvalds
2007-11-29 17:25       ` Nicolas Pitre
2007-11-29 17:48         ` Linus Torvalds
2007-11-29 18:52           ` Nicolas Pitre
2007-11-30  5:00           ` Junio C Hamano [this message]
2007-11-30  6:03             ` Linus Torvalds
2007-11-30  0:54         ` Jakub Narebski
2007-11-30  2:21           ` Linus Torvalds
2007-11-30  2:39             ` Jakub Narebski
2007-11-30  2:40             ` Nicolas Pitre
2007-11-30  6:11               ` Steffen Prohaska
2007-12-07 13:35                 ` Mike Ralphson
2007-12-07 13:49                   ` Johannes Schindelin
2007-12-07 16:07                     ` Linus Torvalds
2007-12-07 16:09                     ` Mike Ralphson
2007-12-07 18:37                       ` Johannes Schindelin
2007-12-07 19:15                         ` Mike Ralphson
2007-12-08 11:05                           ` Johannes Schindelin
2007-12-08 23:04                             ` Brian Downing
2007-11-30  2:54             ` Linus Torvalds
2007-12-05  1:04               ` Federico Mena Quintero
2007-12-01 11:36   ` Joachim B Haga
2007-12-01 17:19     ` Linus Torvalds
2007-11-29  5:17 ` Junio C Hamano
2007-11-29 10:17   ` [PATCH] per-directory-exclude: lazily read .gitignore files Junio C Hamano
