From: "Jon Smirl" <jonsmirl@gmail.com>
To: "Nicolas Pitre" <nico@cam.org>
Cc: "Junio C Hamano" <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [PATCH 2/2] pack-objects: fix threaded load balancing
Date: Mon, 10 Dec 2007 12:06:39 -0500
Message-ID: <9e4733910712100906g6794e326qf18da4be146f3667@mail.gmail.com>
In-Reply-To: <alpine.LFD.0.99999.0712101104320.555@xanadu.home>

On 12/10/07, Nicolas Pitre <nico@cam.org> wrote:
> On Mon, 10 Dec 2007, Jon Smirl wrote:
>
> > On 12/10/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> > > I just deleted the section looking for identical hashes.
> > >
> > > +    while (sub_size && list[0]->hash &&
> > > +           list[0]->hash == list[-1]->hash) {
> > > +        list++;
> > > +        sub_size--;
> > > +    }
> > >
> > > Doing that allows the long chains to be split over the cores.
> > >
> > > My last 5% of objects is taking over 50% of the total CPU time in the
> > > repack. I think these objects are the ones from that 103,817 entry
> > > chain. It is also causing the explosion in RAM consumption.
> > >
> > > At the end I can only do 20 objects per clock second on four cores. It
> > > takes 30 clock minutes (120 CPU minutes) to do the last 3% of objects.
> >
> > It's all in create_delta...
>
> Here you're mixing two different hashes with no relation whatsoever
> with each other.
>
> The hash in create_delta corresponds to a chunk of data in a reference
> buffer that we try to match in a source buffer.
>
> The hash in the code above has to do with the file names the
> corresponding objects are coming from.

So can we change this loop to exit after at most window_size * 10
iterations, or something like that? Without a cap the threads become
badly unbalanced at the end. In the gcc case one thread keeps running
for 30+ minutes after the others have exited.

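A rough sketch of the capped loop suggested above, under stated
assumptions. This is not the pack-objects code: struct entry,
skip_same_hash_capped() and the max_skip parameter are hypothetical
names used only for illustration, and a caller would pass something
like window * 10 as max_skip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a packed object; only the name hash matters here. */
struct entry {
    uint32_t hash;  /* hash of the path name, 0 if the object has none */
};

/*
 * Walk forward from the start of a chunk while the current entry has the
 * same non-zero name hash as the entry just before it, but stop after
 * max_skip steps so one very long run cannot keep the boundary moving.
 *
 * Precondition: list[-1] must be a valid entry, i.e. list does not point
 * at the very first element of the array.
 *
 * Returns the number of entries skipped.
 */
static size_t skip_same_hash_capped(struct entry **list, size_t sub_size,
                                    size_t max_skip)
{
    size_t skipped = 0;

    assert(list != NULL);
    while (sub_size && skipped < max_skip &&
           list[0]->hash && list[0]->hash == list[-1]->hash) {
        list++;
        sub_size--;
        skipped++;
    }
    return skipped;
}
```

The caller would then advance list and shrink sub_size by the returned
count, which is what the original loop did in place. Whether a cap like
this actually helps the balance is something that would have to be
measured.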

> And again, both hash uses are deterministic i.e. they will be the same
> when repacking with -f regardless if the source pack is the 2.1GB or the
> 300MB one, so they may not explain the huge performance and memory usage
> discrepancy you see between those two packs.

There is a correlation here, but I don't know what it is. The memory
blowup occurs near the end of repacking. At the same time I go from
processing hundreds of objects per second to just a few per second,
and the threads become unbalanced.

I did notice that when I removed the above loop and evened things out,
memory consumption did not spike as badly as it did before. I maxed
out at 3GB instead of 4.5GB.

Linus suggested memory fragmentation could be a culprit. Evening the
threads out changed the allocation pattern. It is possible that this
avoided a fragmentation problem. It is also possible that evening
things out split the work so that less memory needed to be allocated.

Don't take any of these numbers as gospel. I am using the machine
for other things while I run these tests and there may be
interactions.
>
> The code that does get influenced by the source pack, though, is all
> concentrated in sha1_file.c.
>
>
> Nicolas
>
--
Jon Smirl
jonsmirl@gmail.com