git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Heiko Voigt <hvoigt@hvoigt.net>
Cc: git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>,
	Jeff King <peff@peff.net>, Drew Northup <drew.northup@maine.edu>,
	Jakub Narebski <jnareb@gmail.com>,
	Josh Triplett <josh@joshtriplett.org>
Subject: Re: [PATCH 6/6] Retain caches of submodule refs
Date: Wed, 24 Aug 2011 10:17:24 +0200	[thread overview]
Message-ID: <4E54B394.2070006@alum.mit.edu> (raw)
In-Reply-To: <20110813125438.GC4783@book.hvoigt.net>

On 08/13/2011 02:54 PM, Heiko Voigt wrote:
> On Sat, Aug 13, 2011 at 12:36:29AM +0200, Michael Haggerty wrote:
>> diff --git a/refs.c b/refs.c
>> index 8d1055d..f02cf94 100644
>> --- a/refs.c
>> +++ b/refs.c
> 
> ...
> 
>> @@ -205,23 +208,28 @@ struct cached_refs *create_cached_refs(const char *submodule)
>>   */
>>  static struct cached_refs *get_cached_refs(const char *submodule)
>>  {
>> -	if (! submodule) {
>> -		if (!cached_refs)
>> -			cached_refs = create_cached_refs(submodule);
>> -		return cached_refs;
>> -	} else {
>> -		if (!submodule_refs)
>> -			submodule_refs = create_cached_refs(submodule);
>> -		else
>> -			/* For now, don't reuse the refs cache for submodules. */
>> -			clear_cached_refs(submodule_refs);
>> -		return submodule_refs;
>> +	struct cached_refs *refs = cached_refs;
>> +	if (! submodule)
>> +		submodule = "";
> 
> Maybe instead of searching for the main refs store a pointer to them
> locally so you can immediately return here. That will keep the
> performance when requesting the main refs the same.

I am assuming that the number of submodules will be small compared to
the number of refs in a typical submodule.  Therefore, given that the
refs are stored in a big linked list and that the only thing that can
realistically be done with the list is to iterate through it, the cost
of finding the reference cache for a specific (sub-)module should be
negligible compared to the cost of the iteration over refs in the cache.
 Treating the main module like the other submodules makes the code
simpler, so I would prefer to leave it this way.

If iteration over refs ever becomes a bottleneck, then optimization of
the storage of refs within a (sub-)module would be a bigger win than
special-casing the main module.  And that is what I would like to work
towards.

> If I see it correctly you are always prepending to the linked list 

This is true.

>                                                                    and
> in case many submodules get cached this could slow down the iteration
> over the refs of the main repository.

Is this a realistic concern?  Remember that the extra search over
submodules only occurs once for each scan through the list of references.

I wrote an additional patch that moves the least-recently accessed
module to the front of the list.  But I doubt that the savings justify
the 10-odd extra lines of code, so I kept it to myself.  Please note
that this approach gives pessimal performance (2x slower) if the
submodules are iterated over repeatedly.  If you would like me to add
this patch to the patch series, please let me know.

Long-term, it would be better to implement a "struct submodule" and use
it to hold submodule-specific data like the ref cache.  Then users could
hold on to the "struct submodule *" and pass it, rather than the
submodule name, to the functions that need it.  But this goes beyond the
scope of what I want to change now, especially since I have no
experience even working with submodules.

Summary: I hope I have convinced you that the extra overhead is
negligible and does not justify additional code.  But if you insist on
more emphasis on performance (or obviously if you have numbers to back
up your concerns) then I would be willing to put more work into it.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/

  reply	other threads:[~2011-08-24  8:18 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-12 22:36 [PATCH 0/6] Retain caches of submodule refs Michael Haggerty
2011-08-12 22:36 ` [PATCH 1/6] Extract a function clear_cached_refs() Michael Haggerty
2011-08-12 22:36 ` [PATCH 2/6] Access reference caches only through new function get_cached_refs() Michael Haggerty
2011-08-14 22:12   ` Junio C Hamano
2011-08-23  4:21     ` Michael Haggerty
2011-08-12 22:36 ` [PATCH 3/6] Change the signature of read_packed_refs() Michael Haggerty
2011-08-12 22:36 ` [PATCH 4/6] Allocate cached_refs objects dynamically Michael Haggerty
2011-08-14 22:21   ` Junio C Hamano
2011-08-12 22:36 ` [PATCH 5/6] Store the submodule name in struct cached_refs Michael Haggerty
2011-08-12 22:36 ` [PATCH 6/6] Retain caches of submodule refs Michael Haggerty
2011-08-13 12:54   ` Heiko Voigt
2011-08-24  8:17     ` Michael Haggerty [this message]
2011-08-24 20:05       ` Heiko Voigt
2011-08-16 22:45   ` Junio C Hamano
2011-08-24 11:33     ` Michael Haggerty
2011-10-09 11:12     ` Michael Haggerty
2011-10-09 20:10       ` Junio C Hamano
2011-10-10  5:46         ` [PATCH 0/2] Provide API to invalidate refs cache Michael Haggerty
2011-10-10  5:46           ` [PATCH 1/2] invalidate_cached_refs(): take the submodule as parameter Michael Haggerty
2011-10-10  5:46           ` [PATCH 2/2] invalidate_cached_refs(): expose this function in refs API Michael Haggerty
2011-10-10  8:24           ` [PATCH v2 0/7] Provide API to invalidate refs cache Michael Haggerty
2011-10-10  8:24             ` [PATCH v2 1/7] invalidate_ref_cache(): rename function from invalidate_cached_refs() Michael Haggerty
2011-10-11  0:00               ` Junio C Hamano
2011-10-11  5:53                 ` Michael Haggerty
2011-10-10  8:24             ` [PATCH v2 2/7] invalidate_ref_cache(): take the submodule as parameter Michael Haggerty
2011-10-10  8:24             ` [PATCH v2 3/7] invalidate_ref_cache(): expose this function in refs API Michael Haggerty
2011-10-10  8:24             ` [PATCH v2 4/7] clear_cached_refs(): rename parameter Michael Haggerty
2011-10-10  8:24             ` [PATCH v2 5/7] clear_cached_refs(): extract two new functions Michael Haggerty
2011-10-10  8:24             ` [PATCH v2 6/7] write_ref_sha1(): only invalidate the loose ref cache Michael Haggerty
2011-10-10  8:24             ` [PATCH v2 7/7] clear_cached_refs(): inline function Michael Haggerty
2011-10-11  0:02             ` [PATCH v2 0/7] Provide API to invalidate refs cache Junio C Hamano
2011-10-11  5:50               ` Michael Haggerty
2011-10-11  8:09                 ` Julian Phillips
2011-10-11 17:26                 ` Junio C Hamano
2011-10-12 18:44               ` [PATCH v3 " Michael Haggerty
2011-10-12 18:44                 ` [PATCH v3 1/7] invalidate_ref_cache(): rename function from invalidate_cached_refs() Michael Haggerty
2011-10-12 19:14                   ` Junio C Hamano
2011-10-12 22:12                     ` Michael Haggerty
2011-10-12 18:44                 ` [PATCH v3 2/7] invalidate_ref_cache(): take the submodule as parameter Michael Haggerty
2011-10-12 19:19                   ` Junio C Hamano
2011-10-12 22:07                     ` Michael Haggerty
2011-10-17 18:00                       ` Junio C Hamano
2011-11-03 10:23                         ` Michael Haggerty
2011-11-03 18:57                           ` Junio C Hamano
2011-10-12 18:44                 ` [PATCH v3 3/7] invalidate_ref_cache(): expose this function in refs API Michael Haggerty
2011-10-12 18:44                 ` [PATCH v3 4/7] clear_cached_refs(): rename parameter Michael Haggerty
2011-10-12 18:44                 ` [PATCH v3 5/7] clear_cached_refs(): extract two new functions Michael Haggerty
2011-10-12 18:44                 ` [PATCH v3 6/7] write_ref_sha1(): only invalidate the loose ref cache Michael Haggerty
2011-10-12 18:44                 ` [PATCH v3 7/7] clear_cached_refs(): inline function Michael Haggerty
2011-10-12 19:28                 ` [PATCH v3 0/7] Provide API to invalidate refs cache Junio C Hamano
2011-10-10 19:53       ` Re: [PATCH 6/6] Retain caches of submodule refs Heiko Voigt
2011-10-11  4:12         ` Michael Haggerty
2011-10-11 17:41           ` Heiko Voigt
2011-08-13 12:34 ` [PATCH 0/6] " Heiko Voigt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4E54B394.2070006@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=drew.northup@maine.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=hvoigt@hvoigt.net \
    --cc=jnareb@gmail.com \
    --cc=josh@joshtriplett.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).