* [RFC] Dynamic window size on repack?
@ 2007-07-08 21:16 Brian Downing
2007-07-08 21:35 ` Dana How
2007-07-08 21:35 ` Linus Torvalds
0 siblings, 2 replies; 4+ messages in thread
From: Brian Downing @ 2007-07-08 21:16 UTC
To: git
I have a CVS repository which is mostly sane, but has an approximately
20MB RTF file that has two hundred revisions or so. (Thank you, Windows
help.)
Now, since this is old history, I want to make it as small as possible.
The only problem is that when I use high --window values for repack,
it goes along swimmingly until it gets to this file, at which point
memory usage quickly rises to the point where I'm well into my swap file.
I think what I'd like is an extra option to repack to limit window
memory usage. This would dynamically scale the window size down if it
can't fit within the limit, then scale it back up once you're off of the
nasty file. This would let me repack my repository with --window=100
and have it actually finish someday on the machines I have access to.
The big file may not be as efficiently packed as possible, but I can
live with that.
My question is, is this sane? Does the repack algorithm depend on having
a fixed window size to work? I'd rather not look into implementing this
if it's silly on the face of it.
Thanks,
-bcd
* Re: [RFC] Dynamic window size on repack?
From: Dana How @ 2007-07-08 21:35 UTC
To: Brian Downing; +Cc: git, danahow
On 7/8/07, Brian Downing <bdowning@lavos.net> wrote:
> I have a CVS repository which is mostly sane, but has an approximately
> 20MB RTF file that has two hundred revisions or so. (Thank you, Windows
> help.)
>
> Now, since this is old history, I want to make it as small as possible.
> The only problem is that when I use high --window values for repack,
> it goes along swimmingly until it gets to this file, at which point
> memory usage quickly rises to the point where I'm well into my swap file.
>
> I think what I'd like is an extra option to repack to limit window
> memory usage. This would dynamically scale the window size down if it
> can't fit within the limit, then scale it back up once you're off of the
> nasty file. This would let me repack my repository with --window=100
> and have it actually finish someday on the machines I have access to.
> The big file may not be as efficiently packed as possible, but I can
> live with that.
>
> My question is, is this sane? Does the repack algorithm depend on having
> a fixed window size to work? I'd rather not look into implementing this
> if it's silly on the face of it.
Sounds very sane to me.
It seems like you want something like this
(I've not referred to the code, but there is a loop
that could be modified to include something like this):
    /* build list of delta candidates */
    tot = 0;
    for (i = 0; i < window; ++i) {
        obj = objects_sorted_for_delta[here + i];
        tot += SIZE(obj);
        if (tot > window_limit)
            break;
        /* insert obj in list of things to delta, or just try it here */
        ...
    }
    if (i <= 1)
        break/return;
window_limit could be set automatically like the variables
for the mmap windows are (no new options).
SIZE() should be defined to return the actual bytes consumed
by the object (I think for this it's uncompressed and undeltified,
but as I said I haven't looked at the code).
It would be better if the current list of objects in the window
were a FIFO. Before each deltification attempt, add objects
from the sort list until #objects > window or tot > window_limit.
After each attempt, drop off the object we were trying to delta.
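The FIFO scheme described above can be sketched in C roughly as follows. This is only an illustration, not code from git: `fifo_window_counts`, `sizes`, and `counts` are hypothetical names, `sizes[i]` stands in for SIZE(obj), and the sketch assumes the window stops growing just before the byte budget would be exceeded (the target object itself is always admitted, even if it alone is over the limit).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simulate a look-ahead FIFO window over a delta-sorted object list.
 * sizes[i] is the assumed in-memory size of object i (a hypothetical
 * stand-in for SIZE(obj)). For every target object `head`, counts[head]
 * receives the number of objects in the window, including the target.
 */
static void fifo_window_counts(const size_t *sizes, size_t n,
                               size_t window, size_t window_limit,
                               size_t *counts)
{
    size_t head, tail = 0, tot = 0;

    assert(window > 0);
    for (head = 0; head < n; head++) {
        /* Always admit the target itself, even if it alone exceeds the limit. */
        if (tail == head)
            tot += sizes[tail++];
        /* Add objects until the count or the byte budget would be exceeded. */
        while (tail < n && tail - head < window &&
               tot + sizes[tail] <= window_limit)
            tot += sizes[tail++];
        counts[head] = tail - head;
        /* Drop the object we just tried to deltify. */
        tot -= sizes[head];
    }
}
```

With sizes {1, 1, 1, 20, 20, 1, 1}, window = 4 and window_limit = 25, the window shrinks to 4, 3, 2, 1 objects as the two large objects approach, then recovers to 3, 2, 1 at the end of the list.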
I like your idea and think you should look into implementing it.
--
Dana L. How danahow@gmail.com +1 650 804 5991 cell
* Re: [RFC] Dynamic window size on repack?
From: Linus Torvalds @ 2007-07-08 21:35 UTC
To: Brian Downing; +Cc: git
On Sun, 8 Jul 2007, Brian Downing wrote:
>
> I think what I'd like is an extra option to repack to limit window
> memory usage. This would dynamically scale the window size down if it
> can't fit within the limit, then scale it back up once you're off of the
> nasty file. This would let me repack my repository with --window=100
> and have it actually finish someday on the machines I have access to.
> The big file may not be as efficiently packed as possible, but I can
> live with that.
>
> My question is, is this sane? Does the repack algorithm depend on having
> a fixed window size to work? I'd rather not look into implementing this
> if it's silly on the face of it.
It doesn't sound silly, and it should even be fairly easy. The window code
is all in builtin-pack-objects.c (find_deltas()) and while it's currently
coded for a constant-sized window, it shouldn't be too hard to free more
old entries if you allocate one big one to make sure that the "array"
thing doesn't grow to contain too much data.
In other words, just look at how the variables "struct unpacked *array"
(the whole window array) and the "struct unpacked *n" (the "next entry" in
the array using a simple circular queue using "idx") are accessed.
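As a rough illustration of the idea above (a circular queue that frees extra old entries when a big one arrives), here is a minimal C sketch. It is not the code in builtin-pack-objects.c: `struct slot`, `struct ring`, `ring_insert`, `mem_usage`, and `mem_limit` are all hypothetical names, and the slot is a much-simplified stand-in for "struct unpacked".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified window slot (data == NULL means empty). */
struct slot {
    void *data;
    size_t size;
};

struct ring {
    struct slot *array;   /* the whole window array */
    size_t window;        /* number of slots */
    size_t idx;           /* next slot to (re)use */
    size_t count;         /* occupied slots: the `count` slots before idx */
    size_t mem_usage;     /* bytes currently held in the window */
    size_t mem_limit;     /* assumed byte budget for the window */
};

static int ring_init(struct ring *r, size_t window, size_t mem_limit)
{
    assert(window > 0);
    r->array = calloc(window, sizeof(*r->array));
    if (!r->array)
        return -1;
    r->window = window;
    r->idx = r->count = r->mem_usage = 0;
    r->mem_limit = mem_limit;
    return 0;
}

static void free_slot(struct ring *r, struct slot *s)
{
    free(s->data);
    r->mem_usage -= s->size;
    s->data = NULL;
    s->size = 0;
    r->count--;
}

/* Insert an entry of `size` bytes, evicting old entries as needed. */
static int ring_insert(struct ring *r, size_t size)
{
    struct slot *n = &r->array[r->idx];

    /* Fixed-size behaviour: a full ring reuses its oldest slot. */
    if (n->data)
        free_slot(r, n);
    /* Extra step: free more old entries until the new one fits. */
    while (r->count > 0 && r->mem_usage + size > r->mem_limit) {
        size_t oldest = (r->idx + r->window - r->count) % r->window;
        free_slot(r, &r->array[oldest]);
    }
    n->data = malloc(size ? size : 1);
    if (!n->data)
        return -1;
    n->size = size;
    r->mem_usage += size;
    r->count++;
    r->idx = (r->idx + 1) % r->window;
    return 0;
}

static void ring_release(struct ring *r)
{
    size_t i;

    for (i = 0; i < r->window; i++)
        free(r->array[i].data);
    free(r->array);
    r->array = NULL;
}
```

In this sketch a single entry larger than mem_limit is still admitted once the ring is empty, so the window never drops below one object; whether that is the right policy is a design choice the thread leaves open.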
Linus
* Re: [RFC] Dynamic window size on repack?
From: Linus Torvalds @ 2007-07-08 21:39 UTC
To: Brian Downing; +Cc: git
On Sun, 8 Jul 2007, Linus Torvalds wrote:
>
> In other words, just look at how the variables "struct unpacked *array"
> (the whole window array) and the "struct unpacked *n" (the "next entry" in
> the array using a simple circular queue using "idx") are accessed.
Side note: a limit based on object sizes is likely a much better way to
handle the window than just a "number of objects" thing ever was. Doing
the size in number of objects was easier, and is fine for source code that
tends to have a reasonably normal distribution of sizes, but yeah, if you
have a few really big objects with lots of history, then it's likely the
wrong thing to do just because it can get really expensive.
Linus