git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Michael Haggerty <mhagger@alum.mit.edu>,
	git discussion list <git@vger.kernel.org>
Subject: Re: [PATCH 2/2] log: do not shorten decoration names too early
Date: Thu, 14 May 2015 13:49:45 -0400	[thread overview]
Message-ID: <20150514174945.GB7966@peff.net> (raw)
In-Reply-To: <xmqq8ucr83h8.fsf@gitster.dls.corp.google.com>

On Thu, May 14, 2015 at 10:37:39AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > While we are on the subject of the name_decoration code, I had
> > considered at one point replacing the use of the decorate.[ch] hash
> > table with a commit_slab (you can't do it in the general case, because
> > decorate.[ch] handles arbitrary objects, but the name_decoration code
> > only does commits). It would in theory be faster, though I don't know if
> > the time we spend on the hash table is actually measurable (we make a
> > lot of queries on it, but it doesn't actually get that big in the first
> > place).
> 
> Hmmm, but I do not know if commit_slab is a good fit for the usage
> pattern.  I expect commit objects to be fairly sparsely decorated
> (e.g. a tag or ref pointing at say 1-2% of commits or fewer).
> Wouldn't the hashtable used by decorate.[ch] with the max load
> factor capped to 66% a better economy?

Good point. A slab would be less memory efficient, but cheaper to look
up (it is a direct memory reference, with no probing and no hashcmp()).
But cache effects matter, so it could even be slower.

On the other hand, the slab makes it easy to actually store the real
type (struct name_decoration), whereas the decorate hash stores only
pointers. So we'd save an extra malloc/pointer in each case.

So with your slab_peek() below, I'd guess that the slab would actually
be faster. But I'd also be unsurprised if it makes no appreciable
difference to the overall runtime of "git log --decorate". I think we'd
have to build it and profile (and please feel free to say "eh, not worth
the time to think about further").

> I notice that there is no API into commit_slab to ask "Does this
> commit have data in the slab?"  *slabname##_at() may be the closest
> thing, but that would allocate the space and then says "This is the
> slot for that commit; go check if there is data there already."

Yes. I think it's not a big deal if your slab is reasonably full (you'll
extend it to the full size of your commit space eventually either way).
But for a sparse slab, it does make that query much more expensive than
it needs to be.

> Perhaps we want a new function *slabname##_peek() with the same
> signature as *slabname##_at() that returns NULL when commit->index
> is larger than the last existing element in the slab?  Then the
> above would become more like:
> 
>     const void *get_cached_commit_buffer(const struct commit *commit, unsigned long *sizep)
>     {
>             struct commit_buffer *v = buffer_slab_peek(&buffer_slab, commit);
>             if (!v)
>                     return NULL;
>             if (sizep)
>                     *sizep = v->size;
>             return v->buffer;
>     }

Yeah, I agree that is a good interface. It is basically a read-only
version of slab_at(). And it should be similarly agnostic as to the type
we are storing, because the pointer is into the slab array.

I'm not sure that the memory overhead is that big a deal (even in the
kernel, we are only talking about a few megabytes). But it is wasteful,
and the interface above should be trivial to write, so it might be worth
doing anyway.

-Peff

  reply	other threads:[~2015-05-14 17:49 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-13 13:11 "HEAD -> branch" decoration doesn't work with "--decorate=full" Michael Haggerty
2015-05-13 14:51 ` Junio C Hamano
2015-05-13 15:26   ` Michael J Gruber
2015-05-13 17:11 ` Junio C Hamano
2015-05-13 17:13   ` Junio C Hamano
2015-05-13 19:40     ` [PATCH 2/2] log: do not shorten decoration names too early Junio C Hamano
2015-05-14  6:33       ` Jeff King
2015-05-14 17:37         ` Junio C Hamano
2015-05-14 17:49           ` Jeff King [this message]
2015-05-14 18:01             ` Junio C Hamano
2015-05-14 18:10               ` Jeff King
2015-05-14 21:49           ` Junio C Hamano
2015-05-14 21:54             ` Jeff King
2015-05-14 22:25               ` Junio C Hamano
2015-05-14 22:33                 ` Jeff King
2015-05-22 21:21                   ` Junio C Hamano
2015-05-22 21:38                     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150514174945.GB7966@peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).