From: Junio C Hamano <gitster@pobox.com>
To: Jeff King <peff@peff.net>
Cc: Nicolas Pitre <nico@cam.org>,
	Joshua Jensen <jjensen@workspacewhiz.com>,
	git@vger.kernel.org
Subject: Re: Reverting an uncommitted revert
Date: Wed, 20 May 2009 10:50:39 -0700
Message-ID: <7vfxezzms0.fsf@alter.siamese.dyndns.org>
In-Reply-To: <20090520032139.GB10212@coredump.intra.peff.net> (Jeff King's message of "Tue\, 19 May 2009 23\:21\:39 -0400")

Jeff King <peff@peff.net> writes:

> Related to this, I have wondered if it might be useful to have an "index
> reflog". If I do something like this:
>
>   $ git add foo
>   $ hack hack hack
>   $ git add foo
>
> Then the first added state of "foo" is available in the object database,
> but it is not connected to the name "foo" in any way, which makes it
> much harder to find. If we had a reflog pointing to trees representing
> the index state after each change, then it would be simple (you could
> look at "INDEX@{1}:foo" or similar).
>
> I don't know if the performance is an issue. We are writing an extra
> tree every time we touch the index, but in many cases you are already
> writing a blob.

It is not just "an extra tree every time".  For example, in the kernel
repository, one of the deepest paths [*1*] (i.e. one whose modification
affects the largest number of trees) is:

    arch/cris/include/arch-v32/mach-a3/mach/hwregs/iop/asm/iop_reg_space_asm.h

If you modify this file and then "git add", and if you write-tree the
index at that point, you need to write a tree object for ".", arch/,
arch/cris, ..., arch/cris/include/arch-v32/mach-a3/mach/hwregs/iop/asm, 10
trees in total (if I am counting them right ;-).
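
The count above is just "one tree per directory component, plus one for
the top-level tree".  A minimal sketch to check it; count_trees is a
made-up helper name for illustration, not a git command:

```shell
# Hypothetical helper: how many tree objects must be rewritten when a
# single path changes.  One per directory component (i.e. per slash in
# the path), plus one for the top-level tree (".").
count_trees () {
    slashes=$(printf '%s' "$1" | tr -cd '/' | wc -c)
    echo $((slashes + 1))
}

# prints 10
count_trees arch/cris/include/arch-v32/mach-a3/mach/hwregs/iop/asm/iop_reg_space_asm.h
```

A file at the top level of the project gives 1, since only the top-level
tree is rewritten.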

If your cache-tree is fresh (and if you "git write-tree" every time you
"git add", that will make it stay fresh), you do not have to recompute
object names of the other 1728 tree objects (they are unchanged) [*2*],
which should help somewhat, but the majority of the time is spent in the
I/O (and perhaps slow fsync on ext3 ;-) of writing these 10 tree objects
[*3*].
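
To make the "write-tree every time you git add" scheme concrete, here is
a rough sketch of how it could be emulated by hand.  The function name
add_and_snapshot and the ref name refs/index-log are both made up for
illustration, and it assumes a git whose update-ref understands
--create-reflog; it is not a proposal for how an index reflog would
actually be implemented:

```shell
# Hypothetical wrapper, not an existing git command: stage the given
# paths, write the resulting index out as a tree, and record that tree
# in the reflog of a made-up ref so earlier index states stay reachable
# by name.
add_and_snapshot () {
    git add -- "$@" &&
    tree=$(git write-tree) &&
    git update-ref --create-reflog -m "add: $*" refs/index-log "$tree"
}
```

After running it twice on "foo" with edits in between, the earlier staged
contents should be visible with "git show refs/index-log@{1}:foo".  Every
call pays the cost of writing the changed trees discussed above.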

People like Shawn who work with Java projects, where the tree hierarchy
tends to be (unnecessarily) deep with prefixes like org/spearce/jgit due
to namespace issues, will have bigger overhead than a relatively shallow
project like git.git itself.

[Footnotes]

*1* You can find it out yourself with...

git ls-files "$(
    git ls-files |
    sed -e 's|[^/]||g' |
    sort -u |
    tail -n 1 |
    sed -e 's|/|*/|g' -e 's/$/*/'
)" | head -n 1

*2* The total number of tree objects in a commit is...

echo $(git ls-tree -r -d HEAD | wc -l) 1 + p | dc

*3* write-tree with or without help from cache-tree in the kernel
repository with a hot cache (we are talking about running "git write-tree"
every time you do "git add" so the cold cache case does not matter) looks
like this:

$ l=arch/cris/include/arch-v32/mach-a3/mach/hwregs/iop/asm/iop_reg_space_asm.h
$ echo >>$l && git add $l
$ /usr/bin/time git write-tree
04bc92c40a5d0f0d44e162e140cb00964a52046b
0.02user 0.01system 0:00.03elapsed 102%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+6387minor)pagefaults 0swaps

$ git reset --hard

$ echo >>$l && git add $l
$ /usr/bin/time git write-tree --ignore-cache-tree
04bc92c40a5d0f0d44e162e140cb00964a52046b
0.13user 0.04system 0:00.17elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+5336outputs (0major+17141minor)pagefaults 0swaps

(The numbers are from my Athlon(tm) 64 X2 3800+ with slow IDE disks).
