From: Avery Pennarun <apenwarr@gmail.com>
To: Ivan Kanis <expire-by-2010-08-14@kanis.fr>
Cc: Dmitry Potapov <dpotapov@gmail.com>,
    Nguyen Thai Ngoc Duy <pclouds@gmail.com>,
    jaredhance@gmail.com, jnareb@gmail.com, git <git@vger.kernel.org>
Subject: Re: Excessive mmap [was Git server eats all memory]
Date: Mon, 9 Aug 2010 12:50:30 -0400
Message-ID: <AANLkTiktriuvciNTNPD4941AG3th6rWwUYT4v_UnaAz3@mail.gmail.com>
In-Reply-To: <westyn3n3sa.fsf@kanis.fr>

On Mon, Aug 9, 2010 at 12:34 PM, Ivan Kanis
<expire-by-2010-08-14@kanis.fr> wrote:
> Dmitry Potapov <dpotapov@gmail.com> wrote:
>> On 64-bit architecture, you have plenty virtual space, and mapping
>> a file to memory should not take much physical memory (only space
>> needed for system tables).
>
> What I can tell from the mmap man page is that it should map memory to a
> file. I assume it shouldn't take up physical memory. However I am seeing
> physical memory being consumed. It might be a feature of the kernel. Is
> there a way to turn it off?
'ps axu' will show two columns: VSZ (virtual size) and RSS (resident
set size). The only one that actually matters is RSS.

When you mmap a file, it will immediately consume a lot of VSZ - but
this won't affect your available system memory, because you have only
consumed "virtual" memory. The kernel never has to write that memory
out to the swap file, because it knows that this chunk of virtual
memory is already on disk - inside the mmap'd file.

When you access some of the pages of the mmap'd file, the kernel will
page them into memory, which increases RSS. This uses *real*
memory on the system.

As git generates a new pack file, it needs to access every single page
of every single pack that it's reading from, so eventually, all the
stuff you need will get sucked into RSS, and you'll see that number
grow and grow. If your packfiles are huge, this is a lot of memory.

Now, the kernel is supposed to be smart enough to release old pages
out of RSS if you stop using them; it's no different from what the
kernel does with any cached file data. So it shouldn't be expensive
to mmap instead of just reading the file.
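To see the VSZ/RSS difference described above for yourself, something
like the following rough sketch may help. It is Linux-specific: it
assumes /proc/self/statm is available and reads its first two fields
(total and resident size, in pages). The names mem_usage,
read_mem_usage, touch_pages and mmap_usage_demo are made up for this
example and have nothing to do with git's code.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Virtual and resident size of this process, in pages. */
struct mem_usage {
	long vsz_pages;
	long rss_pages;
};

/* Linux-specific assumption: first two fields of /proc/self/statm. */
static int read_mem_usage(struct mem_usage *out)
{
	FILE *fp = fopen("/proc/self/statm", "r");
	int fields;

	if (!fp)
		return -1;
	fields = fscanf(fp, "%ld %ld", &out->vsz_pages, &out->rss_pages);
	fclose(fp);
	return fields == 2 ? 0 : -1;
}

/* Read one byte from every page so the kernel has to fault it in. */
static void touch_pages(const volatile unsigned char *map, size_t len)
{
	long page = sysconf(_SC_PAGESIZE);

	assert(page > 0);
	for (size_t off = 0; off < len; off += (size_t)page)
		(void)map[off];
}

/*
 * Sample memory usage three times: before mmap (usage[0]), right
 * after mmap (usage[1]), and after touching every page (usage[2]).
 * Returns 0 on success, -1 on error.
 */
int mmap_usage_demo(const char *path, struct mem_usage usage[3])
{
	struct stat st;
	void *map;
	int ret = -1;
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size <= 0 ||
	    read_mem_usage(&usage[0]) < 0) {
		close(fd);
		return -1;
	}
	map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (map == MAP_FAILED)
		return -1;
	if (read_mem_usage(&usage[1]) == 0) {
		touch_pages(map, (size_t)st.st_size);
		ret = read_mem_usage(&usage[2]);
	}
	munmap(map, (size_t)st.st_size);
	return ret;
}
```

Calling mmap_usage_demo() on a large file and printing the three
samples should show vsz_pages jumping as soon as the file is mapped,
while rss_pages only moves once the pages have been touched. The exact
numbers will vary with the kernel and with whatever else the process
is doing.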
> Looking some more into it today the bulk of the memory allocation
> happens in write_pack_file in the following loop.
>
>     for (; i < nr_objects; i++) {
>         if (!write_one(f, objects + i, &offset))
>             break;
>         display_progress(progress_state, written);
>     }
>
> This eventually calls write_object, here I am wondering if the
> unuse_pack function is doing its job. As far as I can tell it writes a
> null in memory, that I think is not enough to reclaim memory.
What do you mean by the "memory allocation" happens here? How are you
measuring it?

unuse_pack indeed doesn't free any memory; it just zeroes a pointer
and decreases a refcount. I don't know much about this code, but I
assume something else goes and cleans up the mmaps later.

In any case, mmap/munmap have little to do with your "real" memory
usage. munmap() won't free any actual kernel memory; the used pages
will still be floating around in disk cache.
> I also looked at the use_pack function where the mmap is
> happening. Would it be worth refactoring this function so that it uses
> an index within a file instead of mmap?
>
> Unless I hear of a better idea I'll be trying that tomorrow...
I wouldn't expect this to help, but I would be interested to hear if it does.
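For concreteness, reading at an offset instead of mapping might look
roughly like the sketch below, using pread(). The helpers read_at and
read_window are hypothetical names for this example, not anything in
git. Note that the data still passes through the kernel's disk cache
on its way in, and the heap buffer it is copied into counts toward RSS
as well.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes starting at offset ofs, retrying on short
 * reads and EINTR. Returns the number of bytes read (less than len
 * only at end of file), or -1 on error.
 */
ssize_t read_at(int fd, void *buf, size_t len, off_t ofs)
{
	char *p = buf;
	size_t done = 0;

	assert(ofs >= 0);
	while (done < len) {
		ssize_t n = pread(fd, p + done, len - done, ofs + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			break;	/* end of file */
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/*
 * Hypothetical replacement for "map a window of the file": copy the
 * window into a heap buffer instead. The caller must free() the
 * result. *got receives the number of bytes actually read.
 */
void *read_window(const char *path, off_t ofs, size_t len, size_t *got)
{
	int fd = open(path, O_RDONLY);
	void *buf;
	ssize_t n;

	if (fd < 0)
		return NULL;
	buf = malloc(len ? len : 1);
	n = buf ? read_at(fd, buf, len, ofs) : -1;
	close(fd);
	if (n < 0) {
		free(buf);
		return NULL;
	}
	*got = (size_t)n;
	return buf;
}
```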
If the problem is simply that you're flooding the kernel disk cache
with data you'll use only once, to the detriment of everything else on
the system, then one thing that might help is posix_fadvise:

    posix_fadvise(fd, ofs, len, POSIX_FADV_DONTNEED);

bup uses this when backing up huge files, since it knows it's only
going to use each block once, and this seemed to decrease system load
(without affecting bup's own performance) in some test cases.
However, it uses this for filesystem files, not packs, so it's a
different use case.
On the other hand, perhaps a more important question is: why does git
feel like it needs to generate entirely new packs for each person
doing a clone on your system? Shouldn't it be reusing existing ones
and just streaming them straight out to the recipient?

Have fun,

Avery