* [RFC/PATCH] Add a --nosort option to pack-objects
@ 2007-12-07 21:10 Mike Hommey
2007-12-07 21:24 ` Nicolas Pitre
2007-12-07 21:25 ` Junio C Hamano
0 siblings, 2 replies; 8+ messages in thread
From: Mike Hommey @ 2007-12-07 21:10 UTC (permalink / raw)
To: git; +Cc: Junio C Hamano
While most of the time the heuristics used by pack-objects to sort the
given object list are satisfying enough, there are cases where it can be
useful for the user to sort the list with heuristics that would be better
suited.
The --nosort option disabled the internal sorting used by pack-objects,
and runs the sliding window along the object list litterally as given on
stdin.
Signed-off-by: Mike Hommey <mh@glandium.org>
---
I would obviously add the appropriate documentation for this flag if this
is accepted. I'll also try to send another documentation patch for
pack-objects with some information compiled from Linus's explanation to my
last message about pack-objects.
builtin-pack-objects.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 4f44658..8bc2d5f 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -21,7 +21,7 @@
static const char pack_usage[] = "\
git-pack-objects [{ -q | --progress | --all-progress }] \n\
- [--max-pack-size=N] [--local] [--incremental] \n\
+ [--max-pack-size=N] [--local] [--incremental] [--nosort]\n\
[--window=N] [--window-memory=N] [--depth=N] \n\
[--no-reuse-delta] [--no-reuse-object] [--delta-base-offset] \n\
[--threads=N] [--non-empty] [--revs [--unpacked | --all]*] [--reflog] \n\
@@ -64,6 +64,7 @@ static int non_empty;
static int no_reuse_delta, no_reuse_object, keep_unreachable;
static int local;
static int incremental;
+static int nosort;
static int allow_ofs_delta;
static const char *base_name;
static int progress = 1;
@@ -1715,7 +1716,9 @@ static void prepare_pack(int window, int depth)
if (progress)
progress_state = start_progress("Compressing objects",
nr_deltas);
- qsort(delta_list, n, sizeof(*delta_list), type_size_sort);
+ if (! nosort)
+ qsort(delta_list, n, sizeof(*delta_list),
+ type_size_sort);
ll_find_deltas(delta_list, n, window+1, depth, &nr_done);
stop_progress(&progress_state);
if (nr_done != nr_deltas)
@@ -1988,6 +1991,10 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
incremental = 1;
continue;
}
+ if (!strcmp("--nosort", arg)) {
+ nosort = 1;
+ continue;
+ }
if (!prefixcmp(arg, "--compression=")) {
char *end;
int level = strtoul(arg+14, &end, 0);
--
1.5.3.7
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] Add a --nosort option to pack-objects
2007-12-07 21:10 [RFC/PATCH] Add a --nosort option to pack-objects Mike Hommey
@ 2007-12-07 21:24 ` Nicolas Pitre
2007-12-07 21:44 ` Mike Hommey
2007-12-07 21:25 ` Junio C Hamano
1 sibling, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2007-12-07 21:24 UTC (permalink / raw)
To: Mike Hommey; +Cc: git, Junio C Hamano
On Fri, 7 Dec 2007, Mike Hommey wrote:
> While most of the time the heuristics used by pack-objects to sort the
> given object list are satisfying enough, there are cases where it can be
> useful for the user to sort the list with heuristics that would be better
> suited.
Could you please elaborate on those cases where the current heuristic
would be unsatisfactory?
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] Add a --nosort option to pack-objects
2007-12-07 21:10 [RFC/PATCH] Add a --nosort option to pack-objects Mike Hommey
2007-12-07 21:24 ` Nicolas Pitre
@ 2007-12-07 21:25 ` Junio C Hamano
2007-12-07 21:37 ` Junio C Hamano
2007-12-07 21:46 ` Mike Hommey
1 sibling, 2 replies; 8+ messages in thread
From: Junio C Hamano @ 2007-12-07 21:25 UTC (permalink / raw)
To: Mike Hommey; +Cc: git
Mike Hommey <mh@glandium.org> writes:
> The --nosort option disabled the internal sorting used by pack-objects,
> and runs the sliding window along the object list litterally as given on
> stdin.
I think this is a good way to give people an easier way to experiment.
But it makes me wonder if this is disabling too much, and if the list
should be sorted at least by type, as we won't delta different types of
objects against each other.
At the beginning of try_delta(), when we see that the next candidate is
of a different type, we return -1 telling the caller that "No object in
the window will ever be a good delta base for the current object, please
abort". This relies on the fact that we sort by type first, so I think
one of the following is necessary:
(1) you weaken this check (return 0, saying "This did not delta well but
do not give up yet"),
(2) you document this well so that --nosort user will know, or
(3) you sort --nosort input by type.
> I would obviously add the appropriate documentation for this flag if this
> is accepted. I'll also try to send another documentation patch for
> pack-objects with some information compiled from Linus's explanation to my
> last message about pack-objects.
I need to rant here a bit.
Sometimes people say "Here is my patch. If this is accepted, I'll add
documentation and tests". My reaction is, "Don't you, as the person who
proposes that change, believe in your patch deeply enough yourself to be
willing to perfect it, to make it suitable for consumption by the
general public, whether it is included in my tree or not? A change that
even you do not believe in deeply enough probably to perfect would not
benefit the general public, so thanks but no thanks, I'll pass."
Fortunately we haven't had this problem too many times on this list.
I would not have minded at all if you said:
Obviously, appropriate documentation and tests are needed before
inclusion, but I am sending this out primarily to seek opinions
from the list to make sure this is going in the right direction,
iow, this is an RFC.
What bugged me was the phrase "if this is accepted".
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] Add a --nosort option to pack-objects
2007-12-07 21:25 ` Junio C Hamano
@ 2007-12-07 21:37 ` Junio C Hamano
2007-12-07 21:46 ` Mike Hommey
1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2007-12-07 21:37 UTC (permalink / raw)
To: Mike Hommey; +Cc: git
Junio C Hamano <gitster@pobox.com> writes:
> Mike Hommey <mh@glandium.org> writes:
>
>> The --nosort option disabled the internal sorting used by pack-objects,
>> and runs the sliding window along the object list litterally as given on
>> stdin.
>
> I think this is a good way to give people an easier way to experiment.
Actually, I take the half of this back.
We need a pair of two good sort orders. The order of objects fed to
pack-objects determines the final layout of the resulting pack, and
using something like the "recency order" we currently have is to
optimize the layout in the resulting pack for typical access patterns.
By using your --nosort, you may be able to influence the deltification
process, but the order you use will most likely not match the access
pattern of the resulting pack. So it will be an easy, quick-and-dirty
way to _experiment_ how the deltification sort order affects the final
pack size, but I suspect that the resulting "small" pack won't be useful
in the real life.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] Add a --nosort option to pack-objects
2007-12-07 21:24 ` Nicolas Pitre
@ 2007-12-07 21:44 ` Mike Hommey
0 siblings, 0 replies; 8+ messages in thread
From: Mike Hommey @ 2007-12-07 21:44 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: git, Junio C Hamano
On Fri, Dec 07, 2007 at 04:24:17PM -0500, Nicolas Pitre wrote:
> On Fri, 7 Dec 2007, Mike Hommey wrote:
>
> > While most of the time the heuristics used by pack-objects to sort the
> > given object list are satisfying enough, there are cases where it can be
> > useful for the user to sort the list with heuristics that would be better
> > suited.
>
> Could you please elaborate on those cases where the current heuristic
> would be unsatisfactory?
I imagine it could be useful when importing a huge tree in the first commit,
when some data in the tree is redundant with (or similar to) others in the
same tree. I guess there could be some other VCS use-cases.
The real case where I've been using this is that I use git to store my debian
build logs in an efficient manner, and having a custom-sorted list of objects
ends up being much faster and less memory consuming than using a huge
window (and 1GB of logs became less than 10MB).
Mike
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] Add a --nosort option to pack-objects
2007-12-07 21:25 ` Junio C Hamano
2007-12-07 21:37 ` Junio C Hamano
@ 2007-12-07 21:46 ` Mike Hommey
2007-12-07 22:20 ` Nicolas Pitre
1 sibling, 1 reply; 8+ messages in thread
From: Mike Hommey @ 2007-12-07 21:46 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Fri, Dec 07, 2007 at 01:25:24PM -0800, Junio C Hamano wrote:
> I need to rant here a bit.
>
> Sometimes people say "Here is my patch. If this is accepted, I'll add
> documentation and tests". My reaction is, "Don't you, as the person who
> proposes that change, believe in your patch deeply enough yourself to be
> willing to perfect it, to make it suitable for consumption by the
> general public, whether it is included in my tree or not? A change that
> even you do not believe in deeply enough probably to perfect would not
> benefit the general public, so thanks but no thanks, I'll pass."
As you can seen from my other message, I'm *actually* not sure this is
really material for git as a VCS. I will add documentation unrelated to
--nosort to pack-objects anyways.
Mike
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] Add a --nosort option to pack-objects
2007-12-07 21:46 ` Mike Hommey
@ 2007-12-07 22:20 ` Nicolas Pitre
2007-12-08 8:54 ` Mike Hommey
0 siblings, 1 reply; 8+ messages in thread
From: Nicolas Pitre @ 2007-12-07 22:20 UTC (permalink / raw)
To: Mike Hommey; +Cc: Junio C Hamano, git
On Fri, 7 Dec 2007, Mike Hommey wrote:
> As you can seen from my other message, I'm *actually* not sure this is
> really material for git as a VCS. I will add documentation unrelated to
> --nosort to pack-objects anyways.
Well, I have serious doubts about this patch in the first place.
I think it is simply unneeded.
If you want pack-objects not to change the sort order because you have
some sorting of your own, externally implemented, then you simply have
to run git-pack-objects feeding it the list of object SHA1s along with a
tag of your own which will effectively impose the sorting you want,
based on that tag.
Objects with the same tag will still be sorted amongst themselves which
is still a good thing.
for example, you may have something like:
git rev-list --all --objects |
sed -e 's|foo/logs/.*|LOGS|' |
git pack-objects ...
This will effectively cluster all foo/logs/* files together for delta
compression regardless of their actual name. Maybe that's what you
really want?
Nicolas
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [RFC/PATCH] Add a --nosort option to pack-objects
2007-12-07 22:20 ` Nicolas Pitre
@ 2007-12-08 8:54 ` Mike Hommey
0 siblings, 0 replies; 8+ messages in thread
From: Mike Hommey @ 2007-12-08 8:54 UTC (permalink / raw)
To: Nicolas Pitre; +Cc: Junio C Hamano, git
On Fri, Dec 07, 2007 at 05:20:59PM -0500, Nicolas Pitre wrote:
> On Fri, 7 Dec 2007, Mike Hommey wrote:
>
> > As you can seen from my other message, I'm *actually* not sure this is
> > really material for git as a VCS. I will add documentation unrelated to
> > --nosort to pack-objects anyways.
>
> Well, I have serious doubts about this patch in the first place.
>
> I think it is simply unneeded.
>
> If you want pack-objects not to change the sort order because you have
> some sorting of your own, externally implemented, then you simply have
> to run git-pack-objects feeding it the list of object SHA1s along with a
> tag of your own which will effectively impose the sorting you want,
> based on that tag.
>
> Objects with the same tag will still be sorted amongst themselves which
> is still a good thing.
>
> for example, you may have something like:
>
> git rev-list --all --objects |
> sed -e 's|foo/logs/.*|LOGS|' |
> git pack-objects ...
>
> This will effectively cluster all foo/logs/* files together for delta
> compression regardless of their actual name. Maybe that's what you
> really want?
I've been thinking about this, but I'm not sure the list won't be
mixed up with the rest of the sort... I'll just try to see what the
sorted list look like...
Mike
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2007-12-08 8:54 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-12-07 21:10 [RFC/PATCH] Add a --nosort option to pack-objects Mike Hommey
2007-12-07 21:24 ` Nicolas Pitre
2007-12-07 21:44 ` Mike Hommey
2007-12-07 21:25 ` Junio C Hamano
2007-12-07 21:37 ` Junio C Hamano
2007-12-07 21:46 ` Mike Hommey
2007-12-07 22:20 ` Nicolas Pitre
2007-12-08 8:54 ` Mike Hommey
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).