From: Thomas Gummerer <t.gummerer@gmail.com>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	mhagger@alum.mit.edu, trast@student.ethz.ch,
	Robin Rosenberg <robin.rosenberg@dewire.com>,
	Tomas Carnecky <tom@dbservice.com>
Subject: Re: [GSoC] Designing a faster index format - Progress report week 14
Date: Tue, 24 Jul 2012 09:52:52 +0200
Message-ID: <20120724075252.GB40532@tgummerer.surfnet.iacbox>
In-Reply-To: <CACsJy8A8J-FXtJezOZrmqfUPX5unbGG15A6BuZnDW+164n6-kw@mail.gmail.com>

On 07/24, Nguyen Thai Ngoc Duy wrote:
> On Tue, Jul 24, 2012 at 2:08 AM, Thomas Gummerer <t.gummerer@gmail.com> wrote:
>
> Is the ls-files improvement not drastic because you do not limit it
> to a subdir like grep? I see Thomas Rast put a similar partial loading
> hack into ls-files.c, so I assume it can partial load too. Is partial
> loading still fast when a lot of unmerged entries are present?
> -- 
> Duy

Yes, exactly, the ls-files run was meant to show the overall
performance of the index for a full read. The improvement when
limiting it to a subdir is about the same as with git grep. Here are
some more timings. I'm using update-index instead of ls-files in
these tests, with a --force-rewrite option I added, which writes the
index even if there are no changes, to test the performance of both
the reader and the writer.

Test                                        this tree      
-----------------------------------------------------------
0002.2: v[23]: update-index                 0.29(0.21+0.06)
0002.3: v[23]: grep nonexistent -- subdir   0.13(0.12+0.01)
0002.5: v4: update-index                    0.26(0.20+0.05)
0002.6: v4: grep nonexistent -- subdir      0.11(0.08+0.02)
0002.7: v4: ls-files -- subdir              0.10(0.07+0.02)
0002.9: v5: update-index                    0.19(0.11+0.07)
0002.10: v5: grep nonexistent -- subdir     0.01(0.00+0.00)
0002.11: v5: ls-files -- subdir             0.01(0.00+0.00)
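
For reading the table above, here is a small Python sketch that parses
the timings and computes the speedup of v5 over v[23]. It assumes the
usual t/perf convention that each value is elapsed(user+sys) in
seconds; the names parse, speedup and TABLE are made up for this
illustration and are not part of git.

```python
import re

# Timings copied verbatim from the table above.
# Assumed format: elapsed(user+sys), all in seconds.
TABLE = """\
0002.2: v[23]: update-index                 0.29(0.21+0.06)
0002.3: v[23]: grep nonexistent -- subdir   0.13(0.12+0.01)
0002.5: v4: update-index                    0.26(0.20+0.05)
0002.6: v4: grep nonexistent -- subdir      0.11(0.08+0.02)
0002.7: v4: ls-files -- subdir              0.10(0.07+0.02)
0002.9: v5: update-index                    0.19(0.11+0.07)
0002.10: v5: grep nonexistent -- subdir     0.01(0.00+0.00)
0002.11: v5: ls-files -- subdir             0.01(0.00+0.00)
"""

# test id, index version, command, then elapsed(user+sys)
LINE = re.compile(r"^\S+: (v\S+): (.+?)\s+([\d.]+)\(([\d.]+)\+([\d.]+)\)$")


def parse(table):
    """Map (version, command) to elapsed seconds."""
    times = {}
    for line in table.splitlines():
        m = LINE.match(line.strip())
        if m:
            times[(m.group(1), m.group(2))] = float(m.group(3))
    return times


def speedup(times, cmd, old="v[23]", new="v5"):
    """Ratio of elapsed times; larger means `new` is faster."""
    return times[(old, cmd)] / times[(new, cmd)]


times = parse(TABLE)
for cmd in ("update-index", "grep nonexistent -- subdir"):
    print(f"{cmd}: about {speedup(times, cmd):.1f}x faster with v5")
```

The grep ratio should be taken as a rough figure only, since 0.01s is
at the resolution limit of the reported timings.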

Partial loading is still fast with unmerged entries, since we only
need to load files that belong to a specific directory there too.
I've created about 15,000 conflicts on the webkit repository, and
got the following times:

Test                                        this tree      
-----------------------------------------------------------
0002.2: v[23]: update-index                 0.30(0.18+0.10)
0002.3: v[23]: grep nonexistent -- subdir   0.13(0.09+0.04)
0002.5: v4: update-index                    0.26(0.22+0.04)
0002.6: v4: grep nonexistent -- subdir      0.11(0.07+0.03)
0002.7: v4: ls-files -- subdir              0.10(0.09+0.01)
0002.9: v5: update-index                    0.21(0.16+0.05)
0002.10: v5: grep nonexistent -- subdir     0.01(0.00+0.00)
0002.11: v5: ls-files -- subdir             0.01(0.00+0.00)

I could create more conflicts (~180,000 files are in the index),
but I think 15,000 is already a number that is very unlikely to
be reached.
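
To illustrate why the cost of a subdir-limited read does not depend on
how many entries (conflicted or not) live elsewhere, here is a minimal
Python sketch. It is not the v5 on-disk layout or reader; it only
assumes a sorted list of paths and uses binary search to find the slice
belonging to one directory. The function name entries_under and the
sample paths are hypothetical.

```python
from bisect import bisect_left


def entries_under(sorted_paths, subdir):
    """Return the paths below `subdir` without scanning the whole list.

    `sorted_paths` must be sorted with Python's default string ordering.
    """
    prefix = subdir.rstrip("/") + "/"
    # First path that is >= "subdir/".
    lo = bisect_left(sorted_paths, prefix)
    # "0" is the character right after "/", so "subdir0" sorts just
    # past every path that starts with "subdir/".
    hi = bisect_left(sorted_paths, prefix[:-1] + "0")
    return sorted_paths[lo:hi]


paths = sorted([
    "Makefile",
    "doc-old/x",
    "doc/a.txt",
    "doc/b.txt",
    "docs/z",
    "src/main.c",
])
print(entries_under(paths, "doc"))
```

Only the two binary searches and the returned slice are touched, so
adding thousands of entries outside the requested directory changes the
work only logarithmically.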
