From: Jonathan Nieder <jrnieder@gmail.com>
To: Shawn Pearce <spearce@spearce.org>
Cc: git@vger.kernel.org, David Barr <david.barr@cordelta.com>,
	Nicolas Pitre <nico@fluxnic.net>,
	Raja R Harinath <harinath@hurrynot.org>,
	Sverre Rabbelier <srabbelier@gmail.com>
Subject: Re: [PATCH/RFC] fast-import: insert new object entries at start of hash bucket
Date: Tue, 23 Nov 2010 17:17:18 -0600
Message-ID: <20101123231718.GA4317@burratino>
In-Reply-To: <AANLkTikqUjjjMRzWTcEOs+2PGu=-9VVbdn0YgpabFaDu@mail.gmail.com>

Shawn Pearce wrote:
> On Mon, Nov 22, 2010 at 11:53 PM, Jonathan Nieder <jrnieder@gmail.com> wrote:

>> Other aspects to investigate: choice of hash function;
>
> Why?  SHA-1 is pretty uniform in its distribution.

I got distracted for a moment by the atom table, but since that does
not have a big effect on performance it's probably not worth spending
time on.  Sorry about that; please ignore.

[...]
>                                                           The way I
> read this store_tree() code, every subdirectory is recursed into even
> if no modifications were made inside of that subdirectory during the
> current commit.

Doesn't the is_null_sha1 check avoid that?
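
To make the question concrete, here is a toy sketch of the early return
I have in mind.  All names (toy_tree, toy_store_tree, toy_is_null_sha1)
are made up for illustration and this is not the actual fast-import.c
code; it assumes a modified directory is marked by zeroing its object
name and that an untouched directory keeps the name it was last stored
under.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_LEN 20

/* Hypothetical, simplified in-core tree node (not the real struct). */
struct toy_tree {
	unsigned char sha1[HASH_LEN];	/* all zero: modified, needs rewriting */
	struct toy_tree **subtrees;	/* child directories */
	size_t nr;			/* number of child directories */
};

static int toy_is_null_sha1(const unsigned char *sha1)
{
	static const unsigned char null_sha1[HASH_LEN];	/* zero-filled */
	return !memcmp(sha1, null_sha1, HASH_LEN);
}

/* Returns how many tree objects had to be rewritten. */
static size_t toy_store_tree(struct toy_tree *t)
{
	size_t written = 0, i;

	assert(t != NULL);
	if (!toy_is_null_sha1(t->sha1))
		return 0;	/* unchanged: do not descend any further */
	for (i = 0; i < t->nr; i++)
		written += toy_store_tree(t->subtrees[i]);
	/* Stand-in for hashing and writing out the new tree object. */
	memset(t->sha1, 0x01, HASH_LEN);
	return written + 1;
}
```

In this sketch the function is still called once per subdirectory of a
modified tree, but for an unmodified subdirectory the call returns
before looking at any of its children.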

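To further explain the workload: svn-fe receives its blobs from svn
in the form of deltas, so to reconstruct a file it first has to ask
fast-import for the old version that the delta applies to.  The
conversation (S: svn-fe, F: fast-import) might go like this: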

	S	commit refs/heads/master
	S	mark :10000
	S	committer felicity <felicity@local>
	S	data 74
	S	bug 3097: switch spamd from doing 'fork per message' to a 'prefork' model
	S	cat incubator/spamassassin/trunk/spamd/spamd.raw
	F	89d56462577b8b7b4f4115f2a47f0b3da22b791a blob 63633
	F	#!/usr/bin/perl -w -T
	...
	S	M 100644 inline incubator/spamassassin/trunk/spamd/spamd.raw
	S	data 62114
	...

Current svn-fe in vcs-svn-pu requests the preimage blobs using marks,
but the idea is the same.
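
On the svn-fe side, reading the first line of such a reply could look
roughly like the following.  This is only a sketch: parse_cat_reply is
a made-up helper, and it assumes the reply header always has the shape
shown in the example above (40 lowercase hex digits, the word "blob",
and a decimal size).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse "<40 hex digits> blob <size>".
 * Returns 0 on success and -1 if the line does not have that shape.
 */
static int parse_cat_reply(const char *line, char sha1_hex[41],
			   unsigned long *size)
{
	char type[16];
	int end = 0;

	assert(line && sha1_hex && size);
	if (sscanf(line, "%40[0123456789abcdef] %15s %lu%n",
		   sha1_hex, type, size, &end) != 3)
		return -1;
	if (strlen(sha1_hex) != 40 || strcmp(type, "blob"))
		return -1;
	/* Nothing but the line terminator may follow the size. */
	if (line[end] != '\0' && line[end] != '\n')
		return -1;
	return 0;
}
```

The caller would then read exactly that many bytes of blob content
before applying the delta.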

If this proves a bottleneck I suppose we could cache the content of
frequently-requested old blobs and keep pointers to that in the
in-core tree.
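
A very rough sketch of such a cache, to give an idea of the size of
the change.  Everything here is hypothetical (blob_cache_get,
blob_cache_put, the slot count), and for simplicity it looks blobs up
by object name in a small least-recently-used table rather than
hanging pointers off the in-core tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20
#define CACHE_SLOTS 4	/* arbitrary; a real cache would need tuning */

struct blob_slot {
	unsigned char sha1[HASH_LEN];
	char *data;		/* owned copy of the blob, NULL if slot is empty */
	size_t len;
	unsigned long last_used;	/* larger means more recently used */
};

static struct blob_slot blob_cache[CACHE_SLOTS];
static unsigned long cache_clock;

/* Returns the cached contents, or NULL on a miss. */
static const char *blob_cache_get(const unsigned char *sha1, size_t *len)
{
	int i;

	assert(sha1 && len);
	for (i = 0; i < CACHE_SLOTS; i++) {
		struct blob_slot *s = &blob_cache[i];
		if (s->data && !memcmp(s->sha1, sha1, HASH_LEN)) {
			s->last_used = ++cache_clock;
			*len = s->len;
			return s->data;
		}
	}
	return NULL;
}

/* Meant to be called after a miss; evicts the least recently used slot. */
static void blob_cache_put(const unsigned char *sha1,
			   const char *data, size_t len)
{
	struct blob_slot *victim = &blob_cache[0];
	char *copy;
	int i;

	for (i = 0; i < CACHE_SLOTS; i++) {
		struct blob_slot *s = &blob_cache[i];
		if (!s->data) {
			victim = s;	/* free slot: nothing to evict */
			break;
		}
		if (s->last_used < victim->last_used)
			victim = s;
	}
	copy = malloc(len ? len : 1);
	if (!copy)
		return;		/* out of memory: just skip caching */
	memcpy(copy, data, len);
	free(victim->data);
	memcpy(victim->sha1, sha1, HASH_LEN);
	victim->data = copy;
	victim->len = len;
	victim->last_used = ++cache_clock;
}
```

Whether this would pay off depends on how often the same old blob is
requested again, which I have not measured.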
