git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Ramkumar Ramachandra <artagnon@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Jeff King <peff@peff.net>,
	Drew Northup <drew.northup@maine.edu>,
	Jakub Narebski <jnareb@gmail.com>,
	Heiko Voigt <hvoigt@hvoigt.net>,
	Johan Herland <johan@herland.net>,
	Julian Phillips <julian@quantumfyre.co.uk>
Subject: Re: [PATCH 00/28] Store references hierarchically in cache
Date: Fri, 28 Oct 2011 20:45:21 +0200	[thread overview]
Message-ID: <4EAAF841.3090900@alum.mit.edu> (raw)
In-Reply-To: <CALkWK0=Hsq_yg1Vyr-3_jf9n=WcB_XNYRQa0SEhSWD5VxKBXQg@mail.gmail.com>

On 10/28/2011 03:07 PM, Ramkumar Ramachandra wrote:
> Michael Haggerty writes:
>> Therefore, this patch series changes the data structure used to store
>> the reference cache from a single array of pointers-to-struct into a
>> tree-like structure in which each subdirectory of the reference
>> namespace is stored as an array of pointers-to-entry and entries can
>> be either references or subdirectories containing more references.
> 
> Very nice! I like the idea. Can't wait to start reading the series.
> 
>>  * refs/replace is almost *always* needed even though it often
>>    doesn't even exist.  Thus the presence of many loose references
>>    slows down *many* git commands for no reason whatsoever.
> 
> Was this one of your primary inspirations for writing this series?

My primary inspiration was that "git filter-branch" was so slow, which
is partly because of the refs/replace thing and partly just the built-in
inefficiency of the old reference cache.

>> * the time to create a new branch goes from 180 ms to less than 10 ms
>>  (my test resolution only includes two decimal places) and the time
>>  to checkout a new branch does the same.
> 
> I'm interested in seeing how the callgraph changed.  Assuming you used
> Valgrind to profile it, could you publish the outputs?

I didn't use valgrind; I just timed commands using time(1).

>> The efficiency gains are such that some operations are now faster with
>> loose references than with packed references; however, some operations
>> with packed references slow down a bit.
> 
> Curiously, why do operations with packed references slow down?  (I'll
> probably find out in a few minutes after reading the series, but I'm
> asking anyway because it it's very non-obvious to me now)

I think it's just because the new data structure is slightly slower than
the old one for the task of appending thousands of refs without doing
any searching or sorting.  Since packed refs are read all-or-nothing
(even after my changes), there is no way to read the packed refs only
for the directory that you are interested in.

>> Patches 11-24 change most of the internal functions to work with
>> "struct ref_entry *" (namely the kind of ref_entry that holds a
>> directory of references) instead of "struct ref_dir *".  The reason
>> for this change it to allow these functions access to the "flag" and
>> "name" fields that are stored in ref_entry and thereby avoid having to
>> store redundant information in "struct ref_dir" (which would increase
>> the size of *every* ref_entry because of its presence in the union).
> 
> Hm, I was wondering why the series was looking so intimidating.  Is it
> not possible to squash all (or atleast some) of these together?

Why would I possibly want to squash them together?  Each one is
self-contained and logically separate from the rest.  After each commit
the code compiles and works.  Squashing them together wouldn't increase
the cognitive burden of reading them; on the contrary, it would make it
harder to read them because several logically-separate changes would be
confounded into a single patch.  Most of these patches have only a few
nontrivial lines which you can read at a glance.  And if the series ever
has to be bisected, the error can be narrowed down to a very small diff.

>> From: Michael Haggerty <mhagger@alum.mit.edu>
> 
> Nit: Can't you configure your email client to put this in the "From: "
> header of your emails?

I'm not sure where those extra lines come from.  My email client is
git-send-email.  I just checked, and they have appeared in patch series
sent through two completely different SMTP servers, so it seems unlikely
that the SMTP server is guilty.  I'll see if I can figure it out.

> Thanks for the interesting read.

Thanks for reading :-)

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2011-10-28 18:45 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-28 12:28 [PATCH 00/28] Store references hierarchically in cache mhagger
2011-10-28 12:28 ` [PATCH 01/28] refs.c: reorder definitions more logically mhagger
2011-10-28 12:28 ` [PATCH 02/28] free_ref_entry(): new function mhagger
2011-10-28 12:28 ` [PATCH 03/28] check_refname_component(): return 0 for zero-length components mhagger
2011-10-28 12:28 ` [PATCH 04/28] struct ref_entry: nest the value part in a union mhagger
2011-10-28 12:28 ` [PATCH 05/28] refs.c: rename ref_array -> ref_dir mhagger
2011-10-28 12:28 ` [PATCH 06/28] refs: store references hierarchically mhagger
2011-10-28 12:28 ` [PATCH 07/28] sort_ref_dir(): do not sort if already sorted mhagger
2011-10-28 12:28 ` [PATCH 08/28] refs: sort ref_dirs lazily mhagger
2011-10-28 12:28 ` [PATCH 09/28] do_for_each_ref(): only iterate over the subtree that was requested mhagger
2011-10-28 12:28 ` [PATCH 10/28] get_ref_dir(): keep track of the current ref_dir mhagger
2011-10-28 12:28 ` [PATCH 11/28] refs: wrap top-level ref_dirs in ref_entries mhagger
2011-10-28 12:28 ` [PATCH 12/28] get_packed_refs(): return (ref_entry *) instead of (ref_dir *) mhagger
2011-10-28 12:28 ` [PATCH 13/28] get_loose_refs(): " mhagger
2011-10-28 12:28 ` [PATCH 14/28] is_refname_available(): take " mhagger
2011-10-28 12:28 ` [PATCH 15/28] find_ref(): " mhagger
2011-10-28 12:28 ` [PATCH 16/28] read_packed_refs(): " mhagger
2011-10-28 12:28 ` [PATCH 17/28] add_ref(): " mhagger
2011-10-28 12:28 ` [PATCH 18/28] find_containing_direntry(): use " mhagger
2011-10-28 12:28 ` [PATCH 19/28] search_ref_dir(): take " mhagger
2011-10-28 12:28 ` [PATCH 20/28] add_entry(): " mhagger
2011-10-28 12:28 ` [PATCH 21/28] do_for_each_ref_in_dir*(): " mhagger
2011-10-28 12:28 ` [PATCH 22/28] sort_ref_dir(): " mhagger
2011-10-28 12:28 ` [PATCH 23/28] struct ref_dir: store a reference to the enclosing ref_cache mhagger
2011-10-28 12:28 ` [PATCH 24/28] read_loose_refs(): take a (ref_entry *) as argument mhagger
2011-10-28 12:28 ` [PATCH 25/28] refs: read loose references lazily mhagger
2011-10-28 12:28 ` [PATCH 26/28] is_refname_available(): query only possibly-conflicting references mhagger
2011-11-15  5:55   ` [PATCH] Fix "is_refname_available(): query only possibly-conflicting references" mhagger
2011-11-15  7:24     ` Junio C Hamano
2011-11-15 16:19       ` Michael Haggerty
2011-11-15 19:19         ` Junio C Hamano
2011-10-28 12:28 ` [PATCH 27/28] read_packed_refs(): keep track of the directory being worked in mhagger
2011-10-28 12:28 ` [PATCH 28/28] repack_without_ref(): call clear_packed_ref_cache() mhagger
2011-10-28 13:07 ` [PATCH 00/28] Store references hierarchically in cache Ramkumar Ramachandra
2011-10-28 18:45   ` Michael Haggerty [this message]
2011-11-16 12:51 ` [PATCH 00/28] Store references hierarchically in cache -- benchmark results Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4EAAF841.3090900@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=artagnon@gmail.com \
    --cc=drew.northup@maine.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hvoigt@hvoigt.net \
    --cc=jnareb@gmail.com \
    --cc=johan@herland.net \
    --cc=julian@quantumfyre.co.uk \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).