git.vger.kernel.org archive mirror
* git-daemon memory usage, disconnection.
@ 2006-04-19 13:22 David Woodhouse
  2006-04-19 14:59 ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: David Woodhouse @ 2006-04-19 13:22 UTC (permalink / raw)
  To: git

I'm running git-daemon from xinetd and it seems a little greedy...

Cpu(s):  2.7% us,  6.4% sy,  0.0% ni,  1.7% id, 87.7% wa,  1.4% hi,  0.0% si
Mem:    253680k total,   250076k used,     3604k free,      568k buffers
Swap:   500960k total,   500864k used,       96k free,    24696k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
31232 nobody    18   0  155m  29m 7224 D  1.3 11.9   0:25.56 git-rev-list
30743 nobody    18   0  179m  29m 9480 D  0.7 11.9   0:42.60 git-rev-list
31277 nobody    18   0  147m  28m 7476 D  2.6 11.4   0:20.90 git-rev-list
30314 nobody    18   0  233m  26m 7696 D  0.0 10.6   1:20.24 git-rev-list
30612 nobody    18   0  204m  23m 7432 D  1.3  9.4   0:59.19 git-rev-list
30574 nobody    18   0  190m  20m 7608 D  0.3  8.3   0:50.77 git-rev-list
30208 nobody    18   0  140m  14m 7632 D  0.3  5.9   0:15.23 git-pack-object

Now, this wouldn't be _so_ bad if there were only two of them running.
The clients for the other four have actually given up and disconnected
long ago, but git-daemon doesn't seem to have reacted to that.

-- 
dwmw2


* Re: git-daemon memory usage, disconnection.
  2006-04-19 13:22 git-daemon memory usage, disconnection David Woodhouse
@ 2006-04-19 14:59 ` Linus Torvalds
  2006-04-19 15:27   ` David Woodhouse
  0 siblings, 1 reply; 4+ messages in thread
From: Linus Torvalds @ 2006-04-19 14:59 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git



On Wed, 19 Apr 2006, David Woodhouse wrote:
>
> I'm running git-daemon from xinetd and it seems a little greedy...
> 
> Cpu(s):  2.7% us,  6.4% sy,  0.0% ni,  1.7% id, 87.7% wa,  1.4% hi,  0.0% si
> Mem:    253680k total,   250076k used,     3604k free,      568k buffers
> Swap:   500960k total,   500864k used,       96k free,    24696k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 31232 nobody    18   0  155m  29m 7224 D  1.3 11.9   0:25.56 git-rev-list
> 30743 nobody    18   0  179m  29m 9480 D  0.7 11.9   0:42.60 git-rev-list
> 31277 nobody    18   0  147m  28m 7476 D  2.6 11.4   0:20.90 git-rev-list
> 30314 nobody    18   0  233m  26m 7696 D  0.0 10.6   1:20.24 git-rev-list
> 30612 nobody    18   0  204m  23m 7432 D  1.3  9.4   0:59.19 git-rev-list
> 30574 nobody    18   0  190m  20m 7608 D  0.3  8.3   0:50.77 git-rev-list
> 30208 nobody    18   0  140m  14m 7632 D  0.3  5.9   0:15.23 git-pack-object

Well, you've probably got two issues: 

 - it looks like you aren't packing your archives (which explains why the 
   disk accesses are horrid, which in turn explains the "D" part).

   For a git server, you _really_ want all trees to be mostly packed, or 
   you want absolutely tons of memory (and 256MB is definitely not "tons"
   as far as git is concerned).

 - git-rev-list won't notice that there is nobody listening until it gets 
   an EPIPE, and it won't get an EPIPE until it actually outputs something, 
   and it won't output anything until it is largely done traversing the 
   tree.

> Now, this wouldn't be _so_ bad if there were only two of them running.
> The clients for the other four have actually given up and disconnected
> long ago, but git-daemon doesn't seem to have reacted to that.

Well, the way things work under UNIX, you normally don't notice that the 
other end isn't interested until you try to write, and you get a "nobody 
is listening". And sadly, the packing stuff does most (not all) of the 
heavy lifting before it can even start to write things out.
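
A minimal sketch of that behaviour, assuming Python 3 on a POSIX system. 
The function name is hypothetical and this is not git code: it only shows 
that closing the read end of a pipe goes unnoticed by the writer until the 
first write, which then fails with EPIPE.

```python
import errno
import os


def write_after_reader_closed() -> str:
    """Report what happens when we write to a pipe that nobody reads."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the "client" gives up and disconnects

    # Any amount of work could happen here; nothing tells the writer
    # that the other end has gone away.

    try:
        # Python ignores SIGPIPE by default, so the failed write shows up
        # as BrokenPipeError (errno EPIPE) rather than killing the process.
        os.write(write_fd, b"first line of output\n")
        return "write succeeded"
    except BrokenPipeError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(write_fd)


if __name__ == "__main__":
    print(write_after_reader_closed())
```

A C program that leaves SIGPIPE at its default disposition would be 
killed by the signal at the same point instead of seeing the error return.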

That said, I should probably take a look at git-rev-list --objects memory 
usage once again. It's never been exactly "lean" (and it can't really be: 
it does end up needing the total object list in memory for a full clone, 
and with something like the kernel, that's about 250 _thousand_ objects).

We should probably also make send-pack.c use the nice revision library, 
because right now it's doing that pipe to git-rev-list for no good reason.

		Linus


* Re: git-daemon memory usage, disconnection.
  2006-04-19 14:59 ` Linus Torvalds
@ 2006-04-19 15:27   ` David Woodhouse
  2006-04-19 15:49     ` Linus Torvalds
  0 siblings, 1 reply; 4+ messages in thread
From: David Woodhouse @ 2006-04-19 15:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

On Wed, 2006-04-19 at 07:59 -0700, Linus Torvalds wrote:
> Well, you've probably got two issues: 
> 
>  - it looks like you aren't packing your archives (which explains why the 
>    disk accesses are horrid, which in turn explains the "D" part).

Hm, good point. They're fairly new trees -- I had foolishly assumed that
they would at least start off packed. That isn't the case though --
perhaps it should be? Did the original clone receive a pack on the wire
and then _split_ it?

If the tools would automatically pack when the number of unpacked
objects reaches a threshold, that would be useful.

Since this repo is only available through git:// and git+ssh:// URLs, I
can safely use git-repack's '-a -d' options, right?

I'll do 'git-repack -l' nightly and 'git-repack -a -d -l' weekly -- does
that seem sane?

>    For a git server, you _really_ want all trees to be mostly packed, or 
>    you want absolutely tons of memory (and 256MB is definitely not "tons"
>    as far as git is concerned).
> 
> Well, the way things work under UNIX, you normally don't notice that the 
> other end isn't interested until you try to write, and you get a "nobody 
> is listening". And sadly, the packing stuff does most (not all) of the 
> heavy lifting before it can even start to write things out.

Well, it does that with SIGALRM happening periodically, theoretically
for the purpose of providing progress output. Perhaps we could do a
getpeername() or something else to check on the output fd each time?

-- 
dwmw2


* Re: git-daemon memory usage, disconnection.
  2006-04-19 15:27   ` David Woodhouse
@ 2006-04-19 15:49     ` Linus Torvalds
  0 siblings, 0 replies; 4+ messages in thread
From: Linus Torvalds @ 2006-04-19 15:49 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git



On Wed, 19 Apr 2006, David Woodhouse wrote:

> On Wed, 2006-04-19 at 07:59 -0700, Linus Torvalds wrote:
> > Well, you've probably got two issues: 
> > 
> >  - it looks like you aren't packing your archives (which explains why the 
> >    disk accesses are horrid, which in turn explains the "D" part).
> 
> Hm, good point. They're fairly new trees -- I had foolishly assumed that
> they would at least start off packed. That isn't the case though --
> perhaps it should be? Did the original clone receive a pack on the wire
> and then _split_ it?

For old versions of git, yes.

> If the tools would automatically pack when the number of unpacked
> objects reaches a threshold, that would be useful.

Well, packing is still best done in the background: you don't generally 
want the tools to just stop for a minute to repack while you're doing 
something. You'd normally want to do a cron run at 4AM or something, see 
if there is lots to pack, and repack that.

The one exception is probably a large conversion process (from CVS, SVN, 
whatever). The conversion process itself probably takes ages, and it will 
be even slower if it were to keep the potentially huge result unpacked all 
the time.

But for normal ops, you really don't want to repack synchronously.

> Since this repo is only available through git:// and git+ssh:// URLs, I
> can safely use git-repack's '-a -d' options, right?

Yes.

> I'll do 'git-repack -l' nightly and 'git-repack -a -d -l' weekly -- does
> that seem sane?

Absolutely. The one exception might be trees that really don't change very 
much (which is quite common), so you might make it conditional on seeing 
if there are _any_ objects at all in .git/objects/00/, for example. Not 
that repack will be very expensive, but still..
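
A rough sketch of that conditional repack, assuming Python 3 and a git 
binary on the PATH. The helper names has_loose_objects and maybe_repack 
are made up for illustration. Object names are hashes, so loose objects 
spread roughly evenly over the 256 fan-out directories and objects/00/ 
serves as a cheap sample of whether anything is unpacked.

```python
import os
import subprocess


def has_loose_objects(git_dir: str, fanout: str = "00") -> bool:
    """True if <git_dir>/objects/<fanout>/ holds at least one file."""
    path = os.path.join(git_dir, "objects", fanout)
    try:
        with os.scandir(path) as entries:
            return any(entry.is_file() for entry in entries)
    except FileNotFoundError:
        # No fan-out directory at all means no loose objects in it.
        return False


def maybe_repack(git_dir: str) -> bool:
    """Run the weekly-style full repack only if the sample is non-empty."""
    if not has_loose_objects(git_dir):
        return False
    # Same options as discussed in the thread: -a -d -l.
    subprocess.run(
        ["git", "--git-dir", git_dir, "repack", "-a", "-d", "-l"],
        check=True,
    )
    return True
```

A cron job could call maybe_repack() for each repository path; a tree 
that has not changed returns False without starting a repack.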

> Well, it does that with SIGALRM happening periodically, theoretically
> for the purpose of providing progress output. Perhaps we could do a
> getpeername() or something else to check on the output fd each time?

Yes, that's possibly a good idea. Of course, for git-rev-list, it's just a 
pipe, and it's hard to do that check at least portably. On Linux, doing a 
"poll()" on a pipe for writing, with newer kernels you'll get a POLLERR if 
the other side has hung up, but that's by no means portable.

(On some other systems, doing a zero-sized write() _might_ do it, but at 
least Linux will happily say "ok, wrote 0 bytes" even if the other end 
isn't listening).
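
The poll() check described above could look roughly like this, assuming 
Python 3 on Linux. The function name is hypothetical, and as noted the 
result on other systems may differ, so treat it as a sketch rather than a 
portable test.

```python
import os
import select


def reader_has_hung_up(write_fd: int) -> bool:
    """Poll the write end of a pipe for hangup without writing to it."""
    poller = select.poll()
    # An empty event mask: POLLERR and POLLHUP are reported regardless,
    # so we hear about a vanished reader but not about plain writability.
    poller.register(write_fd, 0)
    events = poller.poll(0)  # timeout 0: return immediately
    return any(
        flags & (select.POLLERR | select.POLLHUP) for _fd, flags in events
    )


if __name__ == "__main__":
    read_fd, write_fd = os.pipe()
    print(reader_has_hung_up(write_fd))  # reader still open
    os.close(read_fd)
    print(reader_has_hung_up(write_fd))  # reader gone
    os.close(write_fd)
```

A long-running producer could call such a check from its periodic 
progress hook and stop early once it reports that the reader is gone.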

And git-rev-list isn't doing the SIGALRM anyway.

In other words, to do this, we'd have to change send-pack to use the 
revision library. Which, as mentioned, is worthwhile anyway, but it's not 
totally trivial.

		Linus
