git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Nicolas Pitre" <nico@cam.org>
Cc: "Junio C Hamano" <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 2/2] pack-objects: fix threaded load balancing
Date: Mon, 10 Dec 2007 12:06:39 -0500	[thread overview]
Message-ID: <9e4733910712100906g6794e326qf18da4be146f3667@mail.gmail.com> (raw)
In-Reply-To: <alpine.LFD.0.99999.0712101104320.555@xanadu.home>

On 12/10/07, Nicolas Pitre <nico@cam.org> wrote:
> On Mon, 10 Dec 2007, Jon Smirl wrote:
>
> > On 12/10/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> > > I just deleted the section looking for identical hashes.
> > >
> > > +                       while (sub_size && list[0]->hash &&
> > > +                              list[0]->hash == list[-1]->hash) {
> > > +                               list++;
> > > +                               sub_size--;
> > > +                       }
> > >
> > > Doing that allows the long chains to be split over the cores.
> > >
> > > My last 5% of objects is taking over 50% of the total CPU time in the
> > > repack. I think these objects are the ones from that 103,817 entry
> > > chain. It is also causing the explosion in RAM consumption.
> > >
> > > At the end I can only do 20 objects per clock second on four cores. It
> > > takes 30 clock minutes (120 CPU minutes) to do the last 3% of objects.
> >
> > It's all in create_delta...
>
> Here you're mixing two different hashes with no relation what so ever
> with each other.
>
> The hash in create_delta corresponds to chunk of data in a reference
> buffer that we try to match in a source buffer.
>
> The hash in the code above has to do with the file names the
> corresponding objects are coming from.

So can we change this loop to exit after a max of window_size * 10 or
something like that iterations? Without capping it the threads become
way unbalanced in the end. In the gcc case one thread is continuing
30+ minutes past the others exiting.

> And again, both hash uses are deterministic i.e. they will be the same
> when repacking with -f regardless if the source pack is the 2.1GB or the
> 300MB one, so they may not explain the huge performance and memory usage
> discrepency you see between those two packs.

There is a correlation here but I don't know what it is. The memory
blow up occurs near the end of repacking. At the same time I move from
processing hundreds of objects per second to just a few per second.
And the threads are getting unbalanced.

I did notice that when I removed the above loop and evened things out
memory consumption did not spike as bad as it previously did. I maxed
out at 3GB instead of 4.5GB.

Linus suggested memory fragmentation could be a culprit. Evening the
threads out changed the allocation pattern. It is possible that it
avoided a fragmentation problem. It is also possible that evening
things out split the work so that less memory needed to be allocated.

Don't hold any of these numbers to be gospel. I am using the machine
for other things while I run these tests and there may be
interactions.

>
> The code that do get influenced by the source pack, though, is all
> concentrated in sha1_file.c.
>
>
> Nicolas
>


-- 
Jon Smirl
jonsmirl@gmail.com

  reply	other threads:[~2007-12-10 17:07 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-08  5:03 [PATCH 2/2] pack-objects: fix threaded load balancing Nicolas Pitre
2007-12-08  9:18 ` Jeff King
2007-12-10  4:10 ` Jon Smirl
2007-12-10  4:30 ` Jon Smirl
2007-12-10  5:23   ` Jon Smirl
2007-12-10  5:59     ` Jon Smirl
2007-12-10  6:06       ` Jon Smirl
2007-12-10  6:19         ` Jon Smirl
2007-12-10 16:03           ` Nicolas Pitre
2007-12-10 16:14         ` Nicolas Pitre
2007-12-10 17:06           ` Jon Smirl [this message]
2007-12-10 18:21             ` Nicolas Pitre
2007-12-10 19:19               ` [PATCH] pack-objects: more threaded load balancing fix with often changed paths Nicolas Pitre
2007-12-11 17:02 ` [PATCH 2/2] pack-objects: fix threaded load balancing Johannes Sixt
2007-12-11 17:28   ` Nicolas Pitre
2007-12-13  7:15     ` Johannes Sixt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e4733910712100906g6794e326qf18da4be146f3667@mail.gmail.com \
    --to=jonsmirl@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).