git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* gitweb - option to disable rename detection
@ 2005-08-15 18:31 Yasushi SHOJI
  2005-08-15 18:53 ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Yasushi SHOJI @ 2005-08-15 18:31 UTC (permalink / raw)
  To: git

Hi all,

It seems to me that git-diff-tree needs huge memory if you try to diff
on big change with rename detection enabled.

This isn't problem for sane project but if you create a repo with only
major releases imports, git-diff-tree run by git_commit() eats system
memory and die ;P

so here is a proposal patch to add an option to disable rename
detection on selected project.

# please bear with me! I _know_ my perl code sucks.  this is just to
# show my idea.

regards,
--
         yashi

diff --git a/gitweb.cgi b/gitweb.cgi
--- a/gitweb.cgi
+++ b/gitweb.cgi
@@ -40,6 +40,11 @@ my $home_text =              "indextext.html";
 #my $projects_list = $projectroot;
 my $projects_list = "index/index.aux";
 
+# rename detection on big change require a lot of memory. If your tree
+# has big change commit, list those project names to disable rename
+# detection. most sane project doesn't need to disable this feature.
+my $projects_disable_rename_detection = "";
+
 # input validation and dispatch
 my $action = $cgi->param('a');
 if (defined $action) {
@@ -1572,6 +1577,17 @@ sub git_log {
        git_footer_html();
 }
 
+sub git_diff_tree_options {
+       my $found = 0;
+       my @list = split(/ /, $projects_disable_rename_detection);
+       foreach my $p (@list) {
+               if ($project eq $p) {
+                       $found = 1;
+               }
+       }
+       $found ? "" : "-M";
+}
+
 sub git_commit {
        my %co = git_read_commit($hash);
        if (!%co) {
@@ -1587,7 +1603,8 @@ sub git_commit {
                $root = " --root";
                $parent = "";
        }
-       open my $fd, "-|", "$gitbin/git-diff-tree -r -M $root $parent $hash" or die_error(undef, "Open failed.");
+       my $opts = git_diff_tree_options();
+       open my $fd, "-|", "$gitbin/git-diff-tree -r $opts $root $parent $hash" or die_error(undef, "Open failed.");
        @difftree = map { chomp; $_ } <$fd>;
        close $fd or die_error(undef, "Reading diff-tree failed.");
        git_header_html();

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: gitweb - option to disable rename detection
  2005-08-15 18:31 gitweb - option to disable rename detection Yasushi SHOJI
@ 2005-08-15 18:53 ` Linus Torvalds
  2005-08-15 20:48   ` Yasushi SHOJI
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2005-08-15 18:53 UTC (permalink / raw)
  To: Yasushi SHOJI; +Cc: git



On Tue, 16 Aug 2005, Yasushi SHOJI wrote:
> 
> It seems to me that git-diff-tree needs huge memory if you try to diff
> on big change with rename detection enabled.
> 
> This isn't problem for sane project but if you create a repo with only
> major releases imports, git-diff-tree run by git_commit() eats system
> memory and die ;P

Instead of disabling it entirely, how about just having some limit on it?

The basic rename detection works very well for "normal" changes as you 
point out, but it very fundamentally is O(n^2) in number of files created 
and deleted, so we could instead just limit it and say "if we have tons of 
new/deleted files, disable it in the interest of CPU/memory usage".

			Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: gitweb - option to disable rename detection
  2005-08-15 18:53 ` Linus Torvalds
@ 2005-08-15 20:48   ` Yasushi SHOJI
  2005-08-15 22:43     ` Linus Torvalds
  0 siblings, 1 reply; 6+ messages in thread
From: Yasushi SHOJI @ 2005-08-15 20:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

At Mon, 15 Aug 2005 11:53:20 -0700 (PDT),
Linus Torvalds wrote:
> 
> On Tue, 16 Aug 2005, Yasushi SHOJI wrote:
> > 
> > It seems to me that git-diff-tree needs huge memory if you try to diff
> > on big change with rename detection enabled.
> > 
> > This isn't problem for sane project but if you create a repo with only
> > major releases imports, git-diff-tree run by git_commit() eats system
> > memory and die ;P
> 
> Instead of disabling it entirely, how about just having some limit on it?

ah, that's a good idea.  here is a quick and dirty patch.
--
          yashi

diff --git a/gitweb.cgi b/gitweb.cgi
--- a/gitweb.cgi
+++ b/gitweb.cgi
@@ -40,6 +40,9 @@ my $home_text =               "indextext.html";
 #my $projects_list = $projectroot;
 my $projects_list = "index/index.aux";
 
+# max number of changes to use rename detection on git-diff-tree
+my $rename_detection_threshold = 1000;
+
 # input validation and dispatch
 my $action = $cgi->param('a');
 if (defined $action) {
@@ -1587,7 +1590,18 @@ sub git_commit {
                $root = " --root";
                $parent = "";
        }
-       open my $fd, "-|", "$gitbin/git-diff-tree -r -M $root $parent $hash" or die_error(undef, "Open failed.");
+       my $nr_files;
+       my $opts = "";
+       my $disabled_notice;
+       open my $fd, "-|", "$gitbin/git-diff-tree -r $root $parent $hash" or die_error(undef, "Open failed.");
+       $nr_files++ while <$fd>;
+       close $fd or die_error(undef, "Counting diff-tree failed.");
+       if ($nr_files <  $rename_detection_threshold) {
+               $opts .= " -M";
+       } else {
+               $disabled_notice = "(Rename detection disabled)";
+       }
+       open my $fd, "-|", "$gitbin/git-diff-tree -r $opts $root $parent $hash" or die_error(undef, "Open failed.");
        @difftree = map { chomp; $_ } <$fd>;
        close $fd or die_error(undef, "Reading diff-tree failed.");
        git_header_html();
@@ -1671,7 +1685,7 @@ sub git_commit {
        print "</div>\n";
        print "<div class=\"list_head\">\n";
        if ($#difftree > 10) {
-               print(($#difftree + 1) . " files changed:\n");
+               print(($#difftree + 1) . " files changed" . $disabled_notice  . ":\n");
        }
        print "</div>\n";
        print "<table cellspacing=\"0\">\n";

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: gitweb - option to disable rename detection
  2005-08-15 20:48   ` Yasushi SHOJI
@ 2005-08-15 22:43     ` Linus Torvalds
  2005-08-16  1:21       ` Junio C Hamano
  2005-08-16 23:14       ` Yasushi SHOJI
  0 siblings, 2 replies; 6+ messages in thread
From: Linus Torvalds @ 2005-08-15 22:43 UTC (permalink / raw)
  To: Yasushi SHOJI; +Cc: git



On Tue, 16 Aug 2005, Yasushi SHOJI wrote:
>
> > Instead of disabling it entirely, how about just having some limit on it?
> 
> ah, that's a good idea.  here is a quick and dirty patch.

This makes it somewhat more expensive - I was thinking about disabling it 
in git-diff-tree, since the rename logic already knows how many 
new/deleted files there are.

		Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: gitweb - option to disable rename detection
  2005-08-15 22:43     ` Linus Torvalds
@ 2005-08-16  1:21       ` Junio C Hamano
  2005-08-16 23:14       ` Yasushi SHOJI
  1 sibling, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2005-08-16  1:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> This makes it somewhat more expensive - I was thinking about disabling it 
> in git-diff-tree, since the rename logic already knows how many 
> new/deleted files there are.

That's doable.  I'll rig something up on my next GIT day, along
with the clean-up for diff option handling which we have
postponed for some time.  Clearly this would introduce another
option to diffcore.

OTOH, you could take totally the opposite avenue and take
advantage of the fact that git commit ancestry is immultable.
Once commitdiff is made, you do not have to regenerate it but
cache it and reuse for the next request.  I may be mistaken, but
I vaguely recall kernel.org webservers already does something
like that.

If you have infinite amount of disk space, you could choose to
cache everything, and also prime the cache before any real users
ask for a page.  Once you go that route, you could even give -C
flag for copy detection, with --find-copies-harder which is an
ultra expensive option, and nobody would notice.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: gitweb - option to disable rename detection
  2005-08-15 22:43     ` Linus Torvalds
  2005-08-16  1:21       ` Junio C Hamano
@ 2005-08-16 23:14       ` Yasushi SHOJI
  1 sibling, 0 replies; 6+ messages in thread
From: Yasushi SHOJI @ 2005-08-16 23:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

At Mon, 15 Aug 2005 15:43:26 -0700 (PDT),
Linus Torvalds wrote:
> 
> On Tue, 16 Aug 2005, Yasushi SHOJI wrote:
> >
> > > Instead of disabling it entirely, how about just having some limit on it?
> > 
> > ah, that's a good idea.  here is a quick and dirty patch.
> 
> This makes it somewhat more expensive.

yes.  but much cheaper than OOM ;P

>                                        - I was thinking about disabling it 
> in git-diff-tree, since the rename logic already knows how many 
> new/deleted files there are.

I'll leave this feature for homework.  I'll work on
prepare_temp_file() first since it seems to fix my problem.

regards,
--
         yashi

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-08-16 23:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-08-15 18:31 gitweb - option to disable rename detection Yasushi SHOJI
2005-08-15 18:53 ` Linus Torvalds
2005-08-15 20:48   ` Yasushi SHOJI
2005-08-15 22:43     ` Linus Torvalds
2005-08-16  1:21       ` Junio C Hamano
2005-08-16 23:14       ` Yasushi SHOJI

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).