* Multiple threads of compression
@ 2012-11-25 16:27 Thorsten Glaser
From: Thorsten Glaser @ 2012-11-25 16:27 UTC (permalink / raw)
To: git
Hi,
I’m asking here informally first, because my information relates
to a quite old version (the one from lenny-backports). A tl;dr
is at the end.
On a multi-core machine, git's garbage collection, as well as the
pack compression done on the server side when someone clones a
repository remotely, normally runs automatically using multiple
threads of execution.
That may be fine for your typical setups, but in my case, I have
two scenarios where it isn't:
ⓐ A machine where I want git to use only, say, 2 of my 4 or 8 cores,
as I'm also running some VMs on the box which eat up a lot of CPU
and which I don't want to slow down.
ⓑ A server VM which has been given 2 or 3 VCPUs to cope with all
the load generated by clients, but which is RAM-constrained to only
512 or, when lucky, 768 MiB. It previously served only http/https
and *yuk* Subversion, but now git comes into play, and I've
seen the one server box I'm thinking of go down *HARD* because git
ate up all RAM *and* swap when someone wanted to update their clone
of a repository after someone else committed… well, an ~100 MiB
binary file they shouldn't have. (It required manual intervention
on the server to kill that revision and the objects coupled with it,
and even *that* didn't work; read on for more.)
In both cases, I had to apply a quick hack. One I can still
reproduce is that, on the first box, I added a --threads=2 to the
line calling git pack-objects in /usr/lib/git-core/git-repack,
like this:
  args="$args $local ${GIT_QUIET:+-q} $no_reuse$extra"
  names=$(git pack-objects --threads=2 --keep-true-parents --honor-pack-keep --non-empty --all --reflog $arg
  exit 1
(By the way, wrapping source code at 80c is still the way to go IMHO.)
On the second box, IIRC I added --threads=1, but that box was
subsequently upgraded from lenny to wheezy, so any local modification
is lost (luckily, the problem hasn't occurred again recently, or at
least I haven't noticed it, save for the VM load going up to 6-8
several times a day).
tl;dr: I would like to have a *global* option for git to restrict
the number of threads of execution it uses. Several subcommands,
like pack-objects, are already equipped with an option for this,
but unfortunately, these are seldom invoked by hand¹, so this can't
work in my situation.
① automatic garbage collection, “git gc --aggressive --prune=now”,
and cloning are the use cases I have at hand right now.
À propos, while here: is gc --aggressive safe to run on a live,
online-shared repository, or does it break other users accessing
the repository concurrently? (If it’s safe I’d very much like to do
that in a, say weekly, cronjob on FusionForge, our hosting system.)
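If it does turn out to be safe, such a job could look like the
following crontab sketch; the schedule, user, and repository path are
purely hypothetical placeholders, not FusionForge specifics:

```shell
# /etc/cron.d/git-gc (hypothetical): run gc weekly, Sundays at 03:00.
# "www-data" and the --git-dir path are illustrative placeholders.
0 3 * * 0  www-data  git --git-dir=/srv/git/project.git gc --quiet
```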
Thanks in advance!
//mirabilos
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: Multiple threads of compression
From: Brandon Casey @ 2012-11-26 2:37 UTC (permalink / raw)
To: Thorsten Glaser; +Cc: git@vger.kernel.org
On Sun, Nov 25, 2012 at 8:27 AM, Thorsten Glaser <tg@debian.org> wrote:
> tl;dr: I would like to have a *global* option for git to restrict
> the number of threads of execution it uses. Several subcommands,
> like pack-objects, are already equipped with an option for this,
> but unfortunately, these are seldom invoked by hand¹, so this can't
> work in my situation.
See the git-config man page.
The number of threads that pack uses can be configured in the global
or system gitconfig file by setting pack.threads.
e.g.
$ git config --system pack.threads 1
Also, modern git accepts a '-c' option which allows you to set
configuration options on the command line, e.g. 'git -c pack.threads=1
gc'.
The other setting you should probably look at is pack.windowMemory
which should help you control the amount of memory git uses while
packing. Also look at core.packedGitWindowSize and
core.packedGitLimit if your repository is really large.
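These settings can be pinned in a config file once and for all; a
minimal sketch, using --global against a throwaway HOME so it doesn't
touch a real ~/.gitconfig (the values here are illustrative, not tuned
recommendations — with --system and root the same keys would land in
/etc/gitconfig instead):

```shell
# Throwaway HOME so this example doesn't modify your real ~/.gitconfig.
export HOME="$(mktemp -d)"
git config --global pack.threads 2           # cap delta-search threads
git config --global pack.windowMemory 64m    # cap per-thread window memory
git config --global core.packedGitLimit 128m # cap mapped pack memory
git config --global --get pack.threads       # prints: 2
```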
>
> ① automatic garbage collection, “git gc --aggressive --prune=now”,
> and cloning are the use cases I have at hand right now.
>
> À propos, while here: is gc --aggressive safe to run on a live,
> online-shared repository, or does it break other users accessing
> the repository concurrently? (If it’s safe I’d very much like to do
> that in a, say weekly, cronjob on FusionForge, our hosting system.)
Running 'git gc' with --aggressive should be as safe as running it
without --aggressive.
But, you should think about whether you really need to run it more
than once, or at all. When you use --aggressive, git will perform the
entire delta search again for _every_ object in the repository. The
general use case for --aggressive is when you suspect that the original
delta search produced sub-optimal deltas or if you modify the size of
the delta search window or depth and want to regenerate your packed
objects to improve compression or access speed. Even then, you will
not likely get much benefit from running with --aggressive a second
time.
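For comparison, an --aggressive gc boils down (roughly, per the git-gc
documentation of that era) to a full repack with -f, which is what
forces the whole delta search to run again for every object. A sketch
on a throwaway repository — the window/depth values are an assumption
mirroring what --aggressive used at the time:

```shell
# Build a tiny throwaway repository to repack.
tmp="$(mktemp -d)" && cd "$tmp"
git init -q demo && cd demo
echo hello > file.txt && git add file.txt
git -c user.name=demo -c user.email=demo@example.org commit -qm init
# Roughly what `git gc --aggressive` does: -f discards existing deltas
# and recomputes them all; -a -d repacks everything into one pack.
git -c pack.threads=1 repack -a -d -f -q --window=250 --depth=250
ls .git/objects/pack/
```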
hth,
-Brandon
* Re: Multiple threads of compression
From: Thorsten Glaser @ 2012-11-26 9:59 UTC (permalink / raw)
To: Brandon Casey; +Cc: git@vger.kernel.org
Brandon Casey dixit:
>The number of threads that pack uses can be configured in the global
>or system gitconfig file by setting pack.threads.
[…]
>The other setting you should probably look at is pack.windowMemory
>which should help you control the amount of memory git uses while
>packing. Also look at core.packedGitWindowSize and
>core.packedGitLimit if your repository is really large.
OK, thanks a lot!
I can’t really say much about the repositories beforehand
because it’s a generic code hosting platform, several instances
of which we run at my employer’s place (I also run one privately
now), and which is also run by e.g. Debian. But I’ll try to figure
out some somewhat sensible defaults.
>Running 'git gc' with --aggressive should be as safe as running it
>without --aggressive.
OK, thanks.
>But, you should think about whether you really need to run it more
>than once, or at all. When you use --aggressive, git will perform the
[…]
Great explanation!
I think I'd want to run it once, after the repository has begun to
be used, but I'll have to figure out a way to do so… I'll just leave
--aggressive out of the cronjob then.
Much appreciated,
//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font. -- Rob Pike in "Notes on Programming in C"
* Re: Multiple threads of compression
From: Sebastian Leske @ 2012-11-26 7:23 UTC (permalink / raw)
To: git
Hi,
[Thorsten Glaser <tg@debian.org>, 2012-11-25 17:27]:
> On a multi-core machine, git's garbage collection, as well as the
> pack compression done on the server side when someone clones a
> repository remotely, normally runs automatically using multiple
> threads of execution.
>
> That may be fine for your typical setups, but in my case, I have
> two scenarios where it isn't:
>
> ⓐ A machine where I want git to use only, say, 2 of my 4 or 8 cores,
> as I'm also running some VMs on the box which eat up a lot of CPU
> and which I don't want to slow down.
> ⓑ A server VM which has been given 2 or 3 VCPUs to cope with all
> the load generated by clients, but which is RAM-constrained to only
> 512 or, when lucky, 768 MiB. It previously served only http/https
> and *yuk* Subversion, but now git comes into play, and I've
> seen the one server box I'm thinking of go down *HARD* because git
> ate up all RAM *and* swap when someone wanted to update their clone
> of a repository after someone else committed… well, an ~100 MiB
> binary file they shouldn't have.
Unfortunately I can't really speak to the git side of things, but both
of these cases just sound like standard resource starvation. So why
don't you address them using the usual OS mechanisms?
If you want to prevent git from sucking up CPU, nice(1) it, and if it
eats too much RAM, use the parent shell's ulimit mechanism.
Granted, this might also require some changes to git, but wouldn't that
be a simpler and more general approach to solving starvation problems?
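A sketch of that wrapper approach — the 512 MiB cap and the niceness
value are purely illustrative, and note that ulimit -v takes KiB units
on Linux:

```shell
# Throwaway repository so the wrapped command has something to act on.
tmp="$(mktemp -d)" && cd "$tmp" && git init -q
(
  ulimit -v $((512 * 1024))        # cap the address space at 512 MiB
  exec nice -n 10 git gc --quiet   # lower CPU priority via nice(1)
)
echo "gc exit status: $?"
```

The subshell matters: ulimit applies to the current shell and its
children, so wrapping it this way keeps the limit from leaking into
the rest of the script.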