From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Nicolas Pitre" <nico@cam.org>
Cc: "Git Mailing List" <git@vger.kernel.org>
Subject: Re: Performance problem, long run of identical hashes
Date: Mon, 10 Dec 2007 11:20:40 -0500
Message-ID: <9e4733910712100820k1bd0959fsdfae92727826c6db@mail.gmail.com>
In-Reply-To: <alpine.LFD.0.99999.0712101037270.555@xanadu.home>

On 12/10/07, Nicolas Pitre <nico@cam.org> wrote:
> On Mon, 10 Dec 2007, Jon Smirl wrote:
>
> > Running oprofile during my gcc repack shows this loop as the hottest
> > place in the code by far.
>
> Well, that is kind of expected.
>
> > I added some debug printfs which show that I
> > have a 100,000+ run of identical hash entries. Processing the 100,000
> > entries also causes RAM consumption to explode.
>
> That is impossible.  If you look at the code where those hash entries
> are created in create_delta_index(), you'll notice a hard limit of
> HASH_LIMIT (currently 64) is imposed on the number of identical hash
> entries.

On 12/10/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 12/9/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> > > +               if (victim) {
> > > +                       sub_size = victim->remaining / 2;
> > > +                       list = victim->list + victim->list_size - sub_size;
> > > +                       while (sub_size && list[0]->hash &&
> > > +                              list[0]->hash == list[-1]->hash) {
> > > +                               list++;
> >
> > I think you needed to copy sub_size to another variable for this loop
>
> Copying sub_size was wrong. I believe you are checking for deltas on
> the same file. It's probably that chain of 103,817 deltas that can't
> be broken up.

At the end of a multi-threaded repack, one thread ends up with 45 minutes
of work after all the other threads have exited. That's because it
hits this loop and can't split the list any more.

If the lists can't hold more than 64 identical entries, why do I get
caught in this loop for 50,000+ iterations? If I remove this loop, the
threads stay balanced right to the end.
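
To make the behavior concrete, here is a minimal, self-contained sketch
of the split step as I read the quoted hunk. It is not the actual patch:
struct entry, split_tail() and the steps counter are made-up names, and
the sub_size-- in the loop body is an assumption, since the quoted hunk
is cut off before the loop ends.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an object entry; only the hash matters here. */
struct entry {
	uint32_t hash;
};

/*
 * Try to hand the second half of a victim's remaining work list to an
 * idle thread. The split point is pushed forward while it would fall
 * inside a run of equal, nonzero hashes, so that such a run stays with
 * one thread. Returns how many entries can be handed over (possibly 0)
 * and stores the number of loop iterations in *steps.
 */
static size_t split_tail(const struct entry *list, size_t remaining,
			 size_t *steps)
{
	size_t sub_size = remaining / 2;
	const struct entry *p;

	assert(list != NULL || remaining == 0);
	assert(steps != NULL);

	p = list + remaining - sub_size;
	*steps = 0;

	/* Assumed loop body: each step shrinks the part given away. */
	while (sub_size && p[0].hash && p[0].hash == p[-1].hash) {
		p++;
		sub_size--;
		(*steps)++;
	}
	return sub_size;
}
```

Under these assumptions, if every remaining entry carries the same
nonzero hash, the loop walks the whole second half and returns 0, so
nothing is handed over and the victim keeps all of the work. That would
match one thread being left alone with a single very long run.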

-- 
Jon Smirl
jonsmirl@gmail.com


