git.vger.kernel.org archive mirror
From: Jeff King <peff@peff.net>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: [PATCH 0/8] caching rename results
Date: Sat, 4 Aug 2012 13:09:05 -0400
Message-ID: <20120804170905.GA19267@sigill.intra.peff.net>
In-Reply-To: <20120802224155.GB28217@sigill.intra.peff.net>

On Thu, Aug 02, 2012 at 06:41:55PM -0400, Jeff King wrote:

> > (1a) is good regardless of rename overrides. Why don't you polish and
> > submit it? We can set some criteria to limit the cache size while
> > keeping computation reasonably low. Caching rename scores for file
> > pairs that have a file size larger than a limit is one. Rename matrix
> > size could also be a candidate. We could even cache just rename scores
> > for recent commits (i.e. close to heads) only with the assumption that
> > people diff/apply recent commits more often.
> 
> I'll polish and share it. I'm still not 100% sure it's a good idea,
> because introducing an on-disk cache means we need to _manage_ that
> cache. How big will it be? Who will prune it when it gets too big? By
> what criteria? And so on.
> 
> But if it's all hidden behind a config option, then it won't hurt people
> who don't use it. And people who do use it can gather data on how the
> caches grow.

Here it is, all polished up. I'm still a little lukewarm on it for two
reasons:

  1. The whole idea. For the reasons above, I'm a little iffy on doing
     this cache at all. It does yield speedups, but only in some
     specific cases. So it's hidden behind a diff.renamecaches option
     and off by default.

  2. The implementation is a little...gross. Long ago, I had written a
     type-generic map class for git using void pointers. It ended up
     complex and had problems with unaligned accesses. So I rewrote it
     using preprocessor macro expansion (e.g., you'd call
     IMPLEMENT_MAP(foo, const char *, int) or similar). But that wasn't
     quite powerful enough: I really wanted conditional compilation
     inside the macro expansion, and you can't use #ifdef inside a
     macro body.

     So I really wanted some kind of code generation that could do
     conditionals. Which you can do with the C preprocessor, but rather
     than expanding macros, you have to #include templates that expand
     based on parameters you've set. Which is kind of ugly and
     non-intuitive, but it does work. Look at patch 1 to see what I
     mean.

     Also, this sort of preprocessor hackery to create type-generic
     data structures is the first step on the road that eventually led
     to C++ being developed. And that scares me a little.
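
For illustration only, a minimal sketch of the macro-expansion approach
described in point 2 might look like the following. It is not the code
from this series: the fourth comparator argument, the map_<name>_get and
map_<name>_set names, the demo function, and the linear-scan storage are
all assumptions made to keep the example short. It also shows the
limitation mentioned above: the whole body is one macro replacement
list, so there is no place to put an #ifdef that depends on the
parameters.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: one invocation stamps out a struct and two
 * functions specialized for a key/value type pair. A preprocessor
 * directive such as #ifdef cannot appear inside this replacement
 * list, so every instantiation gets exactly the same code shape.
 */
#define IMPLEMENT_MAP(name, ktype, vtype, eq_fn) \
struct map_##name##_entry { \
	ktype key; \
	vtype value; \
}; \
struct map_##name { \
	struct map_##name##_entry *entries; \
	size_t nr, alloc; \
}; \
/* Return 1 and fill *out on a hit, 0 on a miss (linear scan). */ \
static int map_##name##_get(const struct map_##name *m, \
			    ktype key, vtype *out) \
{ \
	size_t i; \
	for (i = 0; i < m->nr; i++) { \
		if (eq_fn(m->entries[i].key, key)) { \
			*out = m->entries[i].value; \
			return 1; \
		} \
	} \
	return 0; \
} \
/* Insert or overwrite; return 0 on success, -1 if realloc fails. */ \
static int map_##name##_set(struct map_##name *m, \
			    ktype key, vtype value) \
{ \
	size_t i; \
	for (i = 0; i < m->nr; i++) { \
		if (eq_fn(m->entries[i].key, key)) { \
			m->entries[i].value = value; \
			return 0; \
		} \
	} \
	if (m->nr == m->alloc) { \
		size_t alloc = m->alloc ? 2 * m->alloc : 8; \
		void *p = realloc(m->entries, alloc * sizeof(*m->entries)); \
		if (!p) \
			return -1; \
		m->entries = p; \
		m->alloc = alloc; \
	} \
	m->entries[m->nr].key = key; \
	m->entries[m->nr].value = value; \
	m->nr++; \
	return 0; \
}

/* Assumed comparator for string keys; not part of the original call. */
static int str_eq(const char *a, const char *b)
{
	return !strcmp(a, b);
}

/* Instantiate a map from strings to ints, named "foo". */
IMPLEMENT_MAP(foo, const char *, int, str_eq)

/* Usage sketch: returns the value stored for "b" after an overwrite. */
int map_foo_demo(void)
{
	struct map_foo m = { NULL, 0, 0 };
	int value = -1;

	map_foo_set(&m, "a", 1);
	map_foo_set(&m, "b", 7);
	map_foo_set(&m, "b", 2);	/* overwrites the earlier entry */
	assert(map_foo_get(&m, "b", &value) && value == 2);
	assert(!map_foo_get(&m, "missing", &value));
	free(m.entries);
	return value;
}
```

The #include-template alternative would move the body above into its
own file and have each user #define the parameters before including it,
which is what makes #ifdef on those parameters possible.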

So yeah. Here it is. I'm not sure yet if it's a good idea or not.

  [1/8]: implement generic key/value map

Infrastructure.

  [2/8]: map: add helper functions for objects as keys
  [3/8]: fast-export: use object to uint32 map instead of "decorate"
  [4/8]: decorate: use "map" for the underlying implementation

These are optional for this series, but since we are introducing
the infrastructure anyway (which is really just a generalized form of
what "decorate" does), converting the existing callers offsets the
code bloat.

  [5/8]: map: implement persistent maps
  [6/8]: implement metadata cache subsystem

More infrastructure.

  [7/8]: implement rename cache
  [8/8]: diff: optionally use rename cache

And these implement the actual rename cache.

-Peff
