From: Junio C Hamano <junkio@cox.net>
To: Linus Torvalds <torvalds@osdl.org>
Cc: git@vger.kernel.org, Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Subject: Re: git-diff opens too many files?
Date: Mon, 20 Nov 2006 11:51:22 -0800
Message-ID: <7vvel928xx.fsf@assigned-by-dhcp.cox.net>
In-Reply-To: <Pine.LNX.4.64.0611200832450.3692@woody.osdl.org> (Linus Torvalds's message of "Mon, 20 Nov 2006 09:00:55 -0800 (PST)")
Linus Torvalds <torvalds@osdl.org> writes:
> Anyway, there's two possible solutions:
>
> - simply make sure that you can have that many open files.
>
> If it's a Linux system, just increase the value of the file
> /proc/sys/fs/file-max, and you're done. Of course, if you're not the
> admin of the box, you may need to ask somebody else to do it for you..
>
> - we could try to make git not keep them mmap'ed for the whole time.
>
> Junio? This is your speciality, I'm not sure how painful it would be to
> unmap and remap on demand.. (or switch it to some kind of "keep the last
> <n> mmaps active" kind of thing to avoid having thousands and thousands of
> mmaps active).

60,000 files at 1kB each is 60MB, which is peanuts these days, but
10kB each would already be a nontrivial burden on 32-bit (600MB, or
20% of the 3GB user address space under a 3+1 split), so even if we
do the "read in small files instead of mapping" approach we would
still need diff_unpopulate_filespec() calls.
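
The arithmetic behind those figures, as a small sketch. The helper
names total_bytes() and percent_of() are made up for illustration, and
decimal units (1kB = 1000 bytes, 3GB = 3,000,000,000 bytes) are assumed
to match the round numbers above.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to hold `nfiles` files of `bytes_each` in memory at once. */
static uint64_t total_bytes(uint64_t nfiles, uint64_t bytes_each)
{
	return nfiles * bytes_each;
}

/* Whole-number percentage of `user_space` bytes taken up by `used` bytes. */
static unsigned percent_of(uint64_t used, uint64_t user_space)
{
	assert(user_space > 0);
	return (unsigned)(used * 100 / user_space);
}
```

Calling percent_of(total_bytes(60000, 10000), 3000000000ULL) gives the
20% quoted for the 10kB case.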

I think after diffcore_rename() runs, the data in a filespec is
used only once, during final textual diff generation. It would be
used once more before diff generation if diffcore_pickaxe() is in
use. Both codepaths begin with diff_populate_filespec(), so if
we unpopulate the filespecs after diffcore_rename() runs, nobody
would notice (other than a performance degradation and strace
showing us reading the same thing twice).
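
A toy model of that populate/unpopulate cycle, to show why a later
consumer would not notice. This is not git's code: struct toy_filespec,
toy_populate() and toy_unpopulate() are hypothetical stand-ins, the
"file" is just a string, and the loads counter exists only to make the
second read visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_filespec {
	const char *source; /* stands in for the blob or working tree file */
	char *data;         /* NULL while unpopulated */
	size_t size;
	int loads;          /* how many times the contents were (re)read */
};

/* Load the contents unless they are already in memory. */
static int toy_populate(struct toy_filespec *s)
{
	if (s->data)
		return 0;
	s->size = strlen(s->source);
	s->data = malloc(s->size + 1);
	if (!s->data)
		return -1;
	memcpy(s->data, s->source, s->size + 1);
	s->loads++;
	return 0;
}

/* Drop the contents; a later toy_populate() reads them again. */
static void toy_unpopulate(struct toy_filespec *s)
{
	assert(s != NULL);
	free(s->data);
	s->data = NULL;
}
```

A stage that starts by calling toy_populate() sees the same bytes
whether or not an earlier stage dropped them; the only difference is
that loads goes from 1 to 2.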

The diffcore_rename() matrix code expects that all filespecs
involved can be populated at the same time, but it should not be
too hard to change it to keep one dst and all src candidates
populated and drop the others if space gets tight. I need to look
at the code for the details.
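
A rough sketch of the "one dst plus all src" idea, assuming nothing
about how the real matrix code is laid out. Every name here
(toy_spec, toy_budget, toy_score_all) is hypothetical, and the sketch
only tracks how many bytes would be resident instead of comparing any
contents.

```c
#include <assert.h>
#include <stddef.h>

struct toy_spec {
	size_t size;   /* bytes the contents would occupy */
	int populated; /* nonzero while the contents are held */
};

struct toy_budget {
	size_t resident; /* bytes currently populated */
	size_t peak;     /* high-water mark of resident */
};

static void toy_populate(struct toy_spec *s, struct toy_budget *b)
{
	if (s->populated)
		return;
	s->populated = 1;
	b->resident += s->size;
	if (b->resident > b->peak)
		b->peak = b->resident;
}

static void toy_unpopulate(struct toy_spec *s, struct toy_budget *b)
{
	if (!s->populated)
		return;
	s->populated = 0;
	b->resident -= s->size;
}

/* Visit every (dst, src) pair while holding all src but only one dst. */
static size_t toy_score_all(struct toy_spec *src, size_t nsrc,
			    struct toy_spec *dst, size_t ndst,
			    struct toy_budget *b)
{
	size_t i, j, pairs = 0;

	for (i = 0; i < nsrc; i++)
		toy_populate(&src[i], b);
	for (j = 0; j < ndst; j++) {
		toy_populate(&dst[j], b);
		for (i = 0; i < nsrc; i++)
			pairs++; /* a real scorer would compare contents here */
		toy_unpopulate(&dst[j], b);
	}
	for (i = 0; i < nsrc; i++)
		toy_unpopulate(&src[i], b);
	return pairs;
}
```

With src sizes of 10 and 20 and dst sizes of 5, 7 and 3, all six pairs
are still visited but the peak is 37 bytes (both src plus the largest
dst) instead of the 45 needed to hold everything at once.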

But Nguyen's command line did not have -M; I think the filespecs
are populated only during the text generation in that case, so
the above would not help him, although it would still be a
worthwhile change.

Because there is _no_ processing after textual diff generation
that looks at the data, I think diff_flush_patch() can unpopulate
the data from the filepair *p after calling run_diff(), just
before returning, without harming performance.
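
The shape of that change, sketched on a toy model and not on the real
functions. toy_pair, toy_run_diff() and toy_flush_patch() are
hypothetical; the "diff" here only reports whether the two sides
differ, and the real run_diff() and diff_flush_patch() take different
arguments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_side {
	const char *source; /* stands in for the file contents on disk */
	char *data;         /* NULL while unpopulated */
};

struct toy_pair {
	struct toy_side one, two;
};

static int toy_load(struct toy_side *s)
{
	size_t n;

	if (s->data)
		return 0;
	n = strlen(s->source) + 1;
	s->data = malloc(n);
	if (!s->data)
		return -1;
	memcpy(s->data, s->source, n);
	return 0;
}

static void toy_drop(struct toy_side *s)
{
	free(s->data);
	s->data = NULL;
}

/* Stand-in for text generation: 1 if the sides differ, 0 if not. */
static int toy_run_diff(struct toy_pair *p)
{
	if (toy_load(&p->one) || toy_load(&p->two))
		return -1;
	return strcmp(p->one.data, p->two.data) != 0;
}

/* Produce output for one pair, then release both sides before returning. */
static int toy_flush_patch(struct toy_pair *p)
{
	int changed;

	assert(p != NULL);
	changed = toy_run_diff(p);
	toy_drop(&p->one);
	toy_drop(&p->two);
	return changed;
}
```

Flushing a queue of such pairs one by one never holds more than one
pair's contents at a time, which is the property that matters when
there are tens of thousands of them.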

I think diff_flush_stat() and diff_flush_checkdiff() would have
the same issue, though. Ideally these should be able to do
their processing while the main textual diff holds the data in
memory for its own processing, but that is currently not the way
the code is structured.

Thread overview:
2006-11-20 10:12 git-diff opens too many files? Nguyen Thai Ngoc Duy
2006-11-20 15:20 ` Johannes Schindelin
2006-11-20 15:32 ` Nguyen Thai Ngoc Duy
2006-11-20 15:48 ` Johannes Schindelin
2006-11-20 17:00 ` Linus Torvalds
2006-11-20 19:51 ` Junio C Hamano [this message]
2006-11-20 21:02 ` Junio C Hamano