git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nick Edelen <sirnot@gmail.com>
To: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Cc: Michael J Gruber <git@drmicha.warpmail.net>,
	Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>,
	Sam Vilain <sam@vilain.net>,
	"Shawn O. Pearce" <spearce@spearce.org>,
	Andreas Ericsson <exon@op5.se>,
	Christian Couder <christian@couder.net>,
	"git@vger.kernel.org" <git@vger.kernel.org>
Subject: Re: [PATCH 0/5] Suggested for PU: revision caching system to  significantly speed up packing/walking
Date: Thu, 6 Aug 2009 22:01:39 +0200	[thread overview]
Message-ID: <c77435a80908061301n5e855aeci16af392ed3128651@mail.gmail.com> (raw)
In-Reply-To: <alpine.DEB.1.00.0908062030340.8306@pacific.mpi-cbg.de>

Hi,

>My idea with that was that you already have a SHA-1 map in the pack index,
>and if all you want to be able to accelerate the revision walker, you'd
>probably need something that adds yet another mapping, from commit to
>parents and tree, and from tree to sub-tree and blob (so you can avoid
>unpacking commit and tree objects).

As I mention in one of the patch descriptions, along with each commit
a list of objects introduced per commit is cached, so no extra I/O is
necessary for tree recursion, etc. during traversal.

>I just thought that it could be more efficient to do it at the time the
>pack index is written _anyway_, as nothing will change in the pack after
>that anyway.

Nothing might change in the pack, but the slices were made to allow
for continual addition and refinement of the cache.  In a typical
usage slices will be added and fused on a regular basis, which would
require tinkering in pack indexes if they were combined.

>But I guess I can answer my question easily myself: the boundary commits
>will not be handled that way.
>
>Still, there is some redundancy between the pack index and your cache, as
>you have to write out the whole list of SHA-1s all over again.  I guess it
>is time to look at the code instead of asking stupid questions.

The whole revision cache is redundant, technically speaking: nothing
in it can't be found by rummaging through packs or objects.  The point
of it was to distill out important information for fast, easy access
of the commit tree.

On another note, I've eliminated the python dependancy.  Shall I
resend the patchset now or should I wait until it has been further
reviewed? (don't want to flood the list with resubmits)

 - Nick

On Thu, Aug 6, 2009 at 9:06 PM, Johannes
Schindelin<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Thu, 6 Aug 2009, Nick Edelen wrote:
>
>> > Sorry, I forgot the details, could you quickly remind me why these
>> > caches are not in the pack index files?
>>
>> Er, I'm not sure what you mean.  Are you asking why these revision
>> caches are required if we have a pack index, or why they aren't in the
>> pack index, or something different?  I'm thinking probably the second:
>
> Yep.
>
>> the short answer is that cache slices are totally independant of pack
>> files.
>
> My idea with that was that you already have a SHA-1 map in the pack index,
> and if all you want to be able to accelerate the revision walker, you'd
> probably need something that adds yet another mapping, from commit to
> parents and tree, and from tree to sub-tree and blob (so you can avoid
> unpacking commit and tree objects).
>
> I just thought that it could be more efficient to do it at the time the
> pack index is written _anyway_, as nothing will change in the pack after
> that anyway.
>
> But I guess I can answer my question easily myself: the boundary commits
> will not be handled that way.
>
> Still, there is some redundancy between the pack index and your cache, as
> you have to write out the whole list of SHA-1s all over again.  I guess it
> is time to look at the code instead of asking stupid questions.
>
>> It might be possible to somehow merge revision cache slices with pack
>> indexes, but I don't think it'd be a very suitable modification.  The
>> rev-cache slices are meant to act almost like topo-relation pack files:
>> new slices are simply new files, seperate slice files can be fused
>> ("repacked") into a larger one, the index is a (recreatable) single file
>> associating file (positions) with objects.  The format was geared to
>> reducing potential cache/data loss and preventing overly large cache
>> slices.
>>
>> >> Hmpf.
>> >>
>> >> We got rid of the last Python script in Git a long time ago, but now
>> >> two different patch series try to sneak that dependency (at least for
>> >> testing) back in.
>> >>
>> >> That's all the worse because we cannot use Python in msysGit, and
>> >> Windows should be a platform benefitting dramatically from your work.
>> >
>> > In fact, the test the script performs could be easily rephrased with
>> > "sort", "uniq" and "comm". OTOH: If the walker is supposed to return
>> > the exact same orderd list of commits you can just use test_cmp.
>>
>> The language that script is written in isn't important.  I originally
>> wrote it in python because I wanted something quick and wasn't much of
>> a sh guru (sorry :-/ ).  As Micheal said I've no doubt it can easily
>> be converted to shell script
>
> That is not what I wanted to hear.
>
>> -- in fact, I'll try to get a shell version working today.
>
> That is.
>
> Thanks,
> Dscho
>

  reply	other threads:[~2009-08-06 20:01 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-08-06  9:55 [PATCH 0/5] Suggested for PU: revision caching system to significantly speed up packing/walking Nick Edelen
2009-08-06 14:48 ` Johannes Schindelin
2009-08-06 14:58   ` Michael J Gruber
2009-08-06 17:39     ` Nick Edelen
2009-08-06 19:06       ` Johannes Schindelin
2009-08-06 20:01         ` Nick Edelen [this message]
2009-08-06 20:30           ` Nick Edelen
2009-08-06 20:32             ` Shawn O. Pearce
2009-08-06 23:35               ` A Large Angry SCM
2009-08-06 23:37                 ` Shawn O. Pearce
2009-08-06 23:43                   ` A Large Angry SCM
2009-08-07  0:15                     ` Nick Edelen
2009-08-07  6:05                   ` Johannes Schindelin
2009-08-07  4:42             ` Nicolas Pitre
2009-08-07  2:47         ` Sam Vilain
2009-08-07  4:35           ` Nicolas Pitre
2009-08-07  6:08             ` Johannes Schindelin
2009-08-07 14:18               ` Nicolas Pitre
2009-08-08 15:18                 ` Johannes Schindelin
2009-08-08 16:07                   ` Junio C Hamano
2009-08-08 23:54                   ` Sam Vilain
2009-08-09  2:37                   ` Nicolas Pitre
2009-08-09 13:42                     ` Nick Edelen
2009-08-07  6:12           ` Johannes Schindelin
2009-08-07 15:00             ` Nicolas Pitre
2009-08-07 22:02               ` Nick Edelen
2009-08-07 22:48                 ` Junio C Hamano
2009-08-07 22:53                   ` Nick Edelen
2009-08-08  3:11                     ` Junio C Hamano
2009-08-08  7:27                       ` Nick Edelen
2009-08-08  7:30                         ` Jeff King
2009-08-08  7:40                           ` Nick Edelen
2009-08-08  2:50                   ` Jeff King
2009-08-08 18:57         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c77435a80908061301n5e855aeci16af392ed3128651@mail.gmail.com \
    --to=sirnot@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=christian@couder.net \
    --cc=exon@op5.se \
    --cc=git@drmicha.warpmail.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=sam@vilain.net \
    --cc=spearce@spearce.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).