* gitk: avoid obscene memory consumption
@ 2016-11-04 19:49 Markus Hitter
2016-11-04 22:45 ` Stefan Beller
0 siblings, 1 reply; 7+ messages in thread
From: Markus Hitter @ 2016-11-04 19:49 UTC (permalink / raw)
To: git
Hello all,
after Gitk brought my shabby development machine (Core2Duo, 4 GB RAM, Ubuntu 16.10, no swap to save the SSD) to its knees once more than I'm comfortable with, I decided to investigate this issue.
Result of this investigation is, my Git repo has a commit with a diff of some 365'000 lines and Gitk tries to display all of them, consuming more than 1.5 GB of memory.
The solution is to cut off diffs at 50'000 lines for the display. This consumes about 350 MB RAM, still a lot. These first 50'000 lines are shown, followed by a copyable message on how to view the full diff on the command line. Diffs shorter than this limit are displayed as before.
To test the waters whether such a change is welcome, here's the patch as I currently use it. If this patch makes sense I'll happily apply change requests and bring it more in line with Git's patch submission expectations. The patch is made against git(k) version 2.9.3, the one coming with latest Ubuntu. Please also note that this is the first time I wrote some Tcl code, so the strategy used might not follow best Tcl practices.
$ diff -uw /usr/bin/gitk.org /usr/bin/gitk
--- /usr/bin/gitk.org 2016-08-16 22:32:47.000000000 +0200
+++ /usr/bin/gitk 2016-11-04 20:06:14.805920404 +0100
@@ -7,6 +7,15 @@
# and distributed under the terms of the GNU General Public Licence,
# either version 2, or (at your option) any later version.
+# Markus: trying to limit memory consumption. It happened that
+# complex commits led to more than 1.5 GB of memory usage.
+#
+# The problem was identified to be caused by extremely long diffs. The
+# commit leading to this research had some 365'000 lines of diff, consuming
+# these 1.5 GB when drawn into the canvas. The solution is to limit diffs to
+# 50'000 lines and skipping the rest. In case of a cutoff, a CLI command for
+# getting the full diff is shown.
+
package require Tk
proc hasworktree {} {
@@ -7956,6 +7965,7 @@
proc getblobdiffs {ids} {
global blobdifffd diffids env
+ global parseddifflines
global treediffs
global diffcontext
global ignorespace
@@ -7987,6 +7997,7 @@
}
fconfigure $bdf -blocking 0 -encoding binary -eofchar {}
set blobdifffd($ids) $bdf
+ set parseddifflines 0
initblobdiffvars
filerun $bdf [list getblobdiffline $bdf $diffids]
}
@@ -8063,20 +8074,34 @@
proc getblobdiffline {bdf ids} {
global diffids blobdifffd
+ global parseddifflines
global ctext
set nr 0
+ set maxlines 50000
$ctext conf -state normal
while {[incr nr] <= 1000 && [gets $bdf line] >= 0} {
+ incr parseddifflines
+ if {$parseddifflines >= $maxlines} {
+ break
+ }
if {$ids != $diffids || $bdf != $blobdifffd($ids)} {
catch {close $bdf}
return 0
}
parseblobdiffline $ids $line
}
+ if {$parseddifflines >= $maxlines} {
+ $ctext insert end "\n------------------" hunksep
+ $ctext insert end " Lines exceeding $maxlines skipped " hunksep
+ $ctext insert end "------------------\n\n" hunksep
+ $ctext insert end "To get a full diff, run\n\n" hunksep
+ $ctext insert end " git diff-tree -p -C --cc $ids\n\n" hunksep
+ $ctext insert end "on the command line.\n" hunksep
+ }
$ctext conf -state disabled
blobdiffmaybeseehere [eof $bdf]
- if {[eof $bdf]} {
+ if {[eof $bdf] || $parseddifflines >= $maxlines} {
catch {close $bdf}
return 0
}
@@ -9093,6 +9118,7 @@
proc diffcommits {a b} {
global diffcontext diffids blobdifffd diffinhdr currdiffsubmod
+ global parseddifflines
set tmpdir [gitknewtmpdir]
set fna [file join $tmpdir "commit-[string range $a 0 7]"]
@@ -9114,6 +9140,7 @@
set blobdifffd($diffids) $fd
set diffinhdr 0
set currdiffsubmod ""
+ set parseddifflines 0
filerun $fd [list getblobdiffline $fd $diffids]
}
Cheers,
Markus
--
- - - - - - - - - - - - - - - - - - -
Dipl. Ing. (FH) Markus Hitter
http://www.jump-ing.de/
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gitk: avoid obscene memory consumption
2016-11-04 19:49 gitk: avoid obscene memory consumption Markus Hitter
@ 2016-11-04 22:45 ` Stefan Beller
2016-11-05 11:08 ` Paul Mackerras
0 siblings, 1 reply; 7+ messages in thread
From: Stefan Beller @ 2016-11-04 22:45 UTC (permalink / raw)
To: Markus Hitter, Paul Mackerras; +Cc: git@vger.kernel.org
On Fri, Nov 4, 2016 at 12:49 PM, Markus Hitter <mah@jump-ing.de> wrote:
>
> Hello all,
+cc Paul Mackeras, who maintains gitk.
>
> after Gitk brought my shabby development machine (Core2Duo, 4 GB RAM, Ubuntu 16.10, no swap to save the SSD) to its knees once more than I'm comfortable with, I decided to investigate this issue.
>
> Result of this investigation is, my Git repo has a commit with a diff of some 365'000 lines and Gitk tries to display all of them, consuming more than 1.5 GB of memory.
>
> The solution is to cut off diffs at 50'000 lines for the display. This consumes about 350 MB RAM, still a lot. These first 50'000 lines are shown, followed by a copyable message on how to view the full diff on the command line. Diffs shorter than this limit are displayed as before.
Bikeshedding: I'd argue to even lower the number to 5-10k lines.
>
> To test the waters whether such a change is welcome, here's the patch as I currently use it. If this patch makes sense I'll happily apply change requests and bring it more in line with Git's patch submission expectations.
I have never contributed to gitk myself,
which is hosted at git://ozlabs.org/~paulus/gitk
though I'd expect these guide lines would roughly apply:
https://github.com/git/git/blob/master/Documentation/SubmittingPatches
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gitk: avoid obscene memory consumption
2016-11-04 22:45 ` Stefan Beller
@ 2016-11-05 11:08 ` Paul Mackerras
2016-11-06 10:28 ` Markus Hitter
0 siblings, 1 reply; 7+ messages in thread
From: Paul Mackerras @ 2016-11-05 11:08 UTC (permalink / raw)
To: Stefan Beller; +Cc: Markus Hitter, git@vger.kernel.org
On Fri, Nov 04, 2016 at 03:45:09PM -0700, Stefan Beller wrote:
> On Fri, Nov 4, 2016 at 12:49 PM, Markus Hitter <mah@jump-ing.de> wrote:
> >
> > Hello all,
>
> +cc Paul Mackeras, who maintains gitk.
Thanks.
> >
> > after Gitk brought my shabby development machine (Core2Duo, 4 GB RAM, Ubuntu 16.10, no swap to save the SSD) to its knees once more than I'm comfortable with, I decided to investigate this issue.
> >
> > Result of this investigation is, my Git repo has a commit with a diff of some 365'000 lines and Gitk tries to display all of them, consuming more than 1.5 GB of memory.
> >
> > The solution is to cut off diffs at 50'000 lines for the display. This consumes about 350 MB RAM, still a lot. These first 50'000 lines are shown, followed by a copyable message on how to view the full diff on the command line. Diffs shorter than this limit are displayed as before.
That sounds reasonable.
>
> Bikeshedding: I'd argue to even lower the number to 5-10k lines.
I could go with 10k.
>
> >
> > To test the waters whether such a change is welcome, here's the patch as I currently use it. If this patch makes sense I'll happily apply change requests and bring it more in line with Git's patch submission expectations.
>
> I have never contributed to gitk myself,
> which is hosted at git://ozlabs.org/~paulus/gitk
> though I'd expect these guide lines would roughly apply:
> https://github.com/git/git/blob/master/Documentation/SubmittingPatches
Paul.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gitk: avoid obscene memory consumption
2016-11-05 11:08 ` Paul Mackerras
@ 2016-11-06 10:28 ` Markus Hitter
2016-11-06 20:33 ` Jacob Keller
2016-11-07 4:11 ` Paul Mackerras
0 siblings, 2 replies; 7+ messages in thread
From: Markus Hitter @ 2016-11-06 10:28 UTC (permalink / raw)
To: Paul Mackerras, Stefan Beller; +Cc: git@vger.kernel.org
Am 05.11.2016 um 12:08 schrieb Paul Mackerras:
> On Fri, Nov 04, 2016 at 03:45:09PM -0700, Stefan Beller wrote:
>> On Fri, Nov 4, 2016 at 12:49 PM, Markus Hitter <mah@jump-ing.de> wrote:
>>>
>>> Hello all,
>>
>> +cc Paul Mackeras, who maintains gitk.
>
> Thanks.
>
>>>
>>> after Gitk brought my shabby development machine (Core2Duo, 4 GB RAM, Ubuntu 16.10, no swap to save the SSD) to its knees once more than I'm comfortable with, I decided to investigate this issue.
>>>
>>> Result of this investigation is, my Git repo has a commit with a diff of some 365'000 lines and Gitk tries to display all of them, consuming more than 1.5 GB of memory.
>>>
>>> The solution is to cut off diffs at 50'000 lines for the display. This consumes about 350 MB RAM, still a lot. These first 50'000 lines are shown, followed by a copyable message on how to view the full diff on the command line. Diffs shorter than this limit are displayed as before.
>
> That sounds reasonable.
>
>>
>> Bikeshedding: I'd argue to even lower the number to 5-10k lines.
>
> I could go with 10k.
Thanks for the positive comments.
TBH, the more I think about the problem, the less I'm satisfied with the solution I provided. Including two reasons:
- The list of files affected to the right is still complete and clicking a file name further down results in nothing ... as if the file wasn't part of the diff.
- Local searches. Cutting off diffs makes them unreliable. Global searches still work, but actually viewing a search result in the skipped section is no longer possible.
So I'm watching out for better solutions. So far I can think of these:
- Storing only the actually viewed diff. It's an interactive tool, so there's no advantage in displaying the diff in 0.001 seconds over viewing it in 0.1 seconds. As far as I can see, Gitk currently stores every diff it gets a hold of forever.
- View the diff sparsely. Like rendering only the actually visible portion.
- Enhancing ctext. This reference diff has 28 million characters, so there should be a way to store this with color information in, let's say, 29 MB of memory.
Any additional ideas?
Markus
--
- - - - - - - - - - - - - - - - - - -
Dipl. Ing. (FH) Markus Hitter
http://www.jump-ing.de/
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gitk: avoid obscene memory consumption
2016-11-06 10:28 ` Markus Hitter
@ 2016-11-06 20:33 ` Jacob Keller
2016-11-07 4:11 ` Paul Mackerras
1 sibling, 0 replies; 7+ messages in thread
From: Jacob Keller @ 2016-11-06 20:33 UTC (permalink / raw)
To: Markus Hitter; +Cc: Paul Mackerras, Stefan Beller, git@vger.kernel.org
On Sun, Nov 6, 2016 at 2:28 AM, Markus Hitter <mah@jump-ing.de> wrote:
> - Storing only the actually viewed diff. It's an interactive tool, so there's no advantage in displaying the diff in 0.001 seconds over viewing it in 0.1 seconds. As far as I can see, Gitk currently stores every diff it gets a hold of forever.
>
This seems like the right solution. Store only what we need to view as
we need to view it. (IE: lazily generate the diff and don't keep it
long term, possibly by generating each file separately when that file
is viewed)?
> - View the diff sparsely. Like rendering only the actually visible portion.
>
This also would be valuable, as part of the solution above.
> - Enhancing ctext. This reference diff has 28 million characters, so there should be a way to store this with color information in, let's say, 29 MB of memory.
>
I think all three suggestions here are a better solution that what
you've outlined already as you explain the problems caused by cutting
off the diff.
Thanks,
Jake
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gitk: avoid obscene memory consumption
2016-11-06 10:28 ` Markus Hitter
2016-11-06 20:33 ` Jacob Keller
@ 2016-11-07 4:11 ` Paul Mackerras
2016-11-07 13:43 ` Markus Hitter
1 sibling, 1 reply; 7+ messages in thread
From: Paul Mackerras @ 2016-11-07 4:11 UTC (permalink / raw)
To: Markus Hitter; +Cc: Stefan Beller, git@vger.kernel.org
On Sun, Nov 06, 2016 at 11:28:37AM +0100, Markus Hitter wrote:
>
> Thanks for the positive comments.
>
> TBH, the more I think about the problem, the less I'm satisfied with the solution I provided. Including two reasons:
>
> - The list of files affected to the right is still complete and clicking a file name further down results in nothing ... as if the file wasn't part of the diff.
>
> - Local searches. Cutting off diffs makes them unreliable. Global searches still work, but actually viewing a search result in the skipped section is no longer possible.
>
> So I'm watching out for better solutions. So far I can think of these:
>
> - Storing only the actually viewed diff. It's an interactive tool, so there's no advantage in displaying the diff in 0.001 seconds over viewing it in 0.1 seconds. As far as I can see, Gitk currently stores every diff it gets a hold of forever.
It does? That would be a bug. :)
>
> - View the diff sparsely. Like rendering only the actually visible portion.
>
> - Enhancing ctext. This reference diff has 28 million characters, so there should be a way to store this with color information in, let's say, 29 MB of memory.
Tcl uses Unicode internally, I believe, so 57MB, but yes.
Paul.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: gitk: avoid obscene memory consumption
2016-11-07 4:11 ` Paul Mackerras
@ 2016-11-07 13:43 ` Markus Hitter
0 siblings, 0 replies; 7+ messages in thread
From: Markus Hitter @ 2016-11-07 13:43 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Stefan Beller, git@vger.kernel.org
Am 07.11.2016 um 05:11 schrieb Paul Mackerras:
>> - Storing only the actually viewed diff. It's an interactive tool, so there's no advantage in displaying the diff in 0.001 seconds over viewing it in 0.1 seconds. As far as I can see, Gitk currently stores every diff it gets a hold of forever.
> It does? That would be a bug. :)
>
So far I've found three arrays being populated lazily (which is good) but never being released (which ignores changes to the underlying repo):
$commitinfo: one entry of about 500 bytes per line viewed in the list of commits. Maximum size of the array is the number of commits. As far as I can see, this array should be removed on a reload (Shift-F5).
$blobdifffd: one entry of about 45 bytes for every commit ever read. The underlying file descriptor gets closed, but the entry in this array remains. So far I didn't find the reason why this array exists at all. It's also not removed on a reload.
$treediffs: always the same number of entries as $blobdiffd, but > 1000 bytes/entry. Removed/refreshed on a reload (good!), different number of entries from that point on.
In case you want to play as well, here's the code I wrote for the investigation, it can be appended right at the bottom of the gitk script:
--------------8<---------------
proc variableSizes {} {
# Add variable here to get them shown.
global diffcontext diffids blobdifffd currdiffsubmod commitinfo
global diffnexthead diffnextnote difffilestart
global diffinhdr treediffs
puts "---------------------------------------------------"
foreach V [info vars] {
if { ! [info exists $V] } {
continue
}
set count 0
set bytes 0
if [array exists $V] {
set count [array size $V]
foreach I [array get $V] {
set bytes [expr $bytes + [string bytelength $I]]
}
} elseif [catch {llength [set $V]}] {
set count [llength [set $V]]
# set bytes [string bytelength [list {*}[set $V]]]
} else {
set bytes [string bytelength [set $V]]
}
puts [format "%20s: %5d items, %10d bytes" $V $count $bytes]
}
# catch {
# set output [memory info]
# puts $output
# }
after 3000 variableSizes
}
variableSizes
-------------->8---------------
[memory info] requires a Tcl with memory debug enabled.
Markus
--
- - - - - - - - - - - - - - - - - - -
Dipl. Ing. (FH) Markus Hitter
http://www.jump-ing.de/
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2016-11-07 13:43 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-04 19:49 gitk: avoid obscene memory consumption Markus Hitter
2016-11-04 22:45 ` Stefan Beller
2016-11-05 11:08 ` Paul Mackerras
2016-11-06 10:28 ` Markus Hitter
2016-11-06 20:33 ` Jacob Keller
2016-11-07 4:11 ` Paul Mackerras
2016-11-07 13:43 ` Markus Hitter
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).