* [PATCH] git fast-export: add --no-data option
@ 2009-07-25 13:45 Geoffrey Irving
2009-07-25 14:28 ` Johannes Schindelin
2009-07-25 17:25 ` Junio C Hamano
0 siblings, 2 replies; 9+ messages in thread
From: Geoffrey Irving @ 2009-07-25 13:45 UTC
To: git, Junio C Hamano
When using git fast-export and git fast-import to rewrite the history
of a repository with large binary files, almost all of the time is
spent dealing with blobs. This is extremely inefficient if all we want
to do is rewrite the commits and tree structure. --no-data skips the
output of blobs and writes SHA-1s instead of marks, which provides a
massive speedup.
Signed-off-by: Geoffrey Irving <irving@naml.us>
---
I've already done all I need with this change (for now, at least), but
here it is in case it proves useful to others. Amusingly, rewriting
history with
git fast-export --no-data <branch> | <python-script> | git fast-import
is now much, much faster than the equivalent
git filter-branch --prune-empty --msg-filter ...
I haven't investigated why.
Documentation/git-fast-export.txt | 7 +++++++
builtin-fast-export.c | 8 +++++++-
2 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/Documentation/git-fast-export.txt
b/Documentation/git-fast-export.txt
index 0c9eb56..47a96dd 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -71,6 +71,13 @@ marks the same across runs.
allow that. So fake a tagger to be able to fast-import the
output.
+--no-data::
+ Skip output of blob objects and instead refer to blobs via
+ their original SHA-1 hash. This is useful when rewriting the
+ directory structure or history of a repository without
+ touching the contents of individual files. Note that the
+ resulting stream can only be used by a repository which
+ already contains the necessary objects.
EXAMPLES
--------
diff --git a/builtin-fast-export.c b/builtin-fast-export.c
index 9a8a6fc..ac72791 100644
--- a/builtin-fast-export.c
+++ b/builtin-fast-export.c
@@ -25,6 +25,7 @@ static const char *fast_export_usage[] = {
static int progress;
static enum { VERBATIM, WARN, STRIP, ABORT } signed_tag_mode = ABORT;
static int fake_missing_tagger;
+static int no_data;
static int parse_opt_signed_tag_mode(const struct option *opt,
const char *arg, int unset)
@@ -101,6 +102,9 @@ static void handle_object(const unsigned char *sha1)
char *buf;
struct object *object;
+ if (no_data)
+ return;
+
if (is_null_sha1(sha1))
return;
@@ -158,7 +162,7 @@ static void show_filemodify(struct diff_queue_struct *q,
* Links refer to objects in another repositories;
* output the SHA-1 verbatim.
*/
- if (S_ISGITLINK(spec->mode))
+ if (no_data || S_ISGITLINK(spec->mode))
printf("M %06o %s %s\n", spec->mode,
sha1_to_hex(spec->sha1), spec->path);
else {
@@ -504,6 +508,8 @@ int cmd_fast_export(int argc, const char **argv,
const char *prefix)
"Import marks from this file"),
OPT_BOOLEAN(0, "fake-missing-tagger", &fake_missing_tagger,
"Fake a tagger when tags lack one"),
+ OPT_BOOLEAN(0, "no-data", &no_data,
+ "Skip output of blob data"),
OPT_END()
};
--
1.6.3.1
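To make the effect of --no-data on the stream concrete, here is a small sketch. The helper is hypothetical (it is not git's code) and only mimics the two "filemodify" forms the commit message contrasts. By default a file refers to the mark of a blob emitted earlier in the stream. With --no-data no blob is emitted, so the line carries the blob's 40-hex SHA-1, using the same "M %06o %s %s\n" format the patch borrows from the gitlink case in show_filemodify(). The sketch assumes the mark number is already known in the default case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of git: format one fast-import
 * "filemodify" line for a regular file.
 *   no_data == 0: refer to a blob emitted earlier as ":<mark>".
 *   no_data != 0: no blob was emitted, so refer to it by SHA-1,
 *                 like the printf() the --no-data patch reuses.
 * Returns the snprintf() result (length that would have been written).
 */
static int format_filemodify(char *out, size_t outlen, int no_data,
			     unsigned int mode, const char *sha1_hex,
			     unsigned int mark, const char *path)
{
	assert(out && sha1_hex && path);
	assert(strlen(sha1_hex) == 40);	/* a hex SHA-1 is 40 characters */

	if (no_data)
		return snprintf(out, outlen, "M %06o %s %s\n",
				mode, sha1_hex, path);
	return snprintf(out, outlen, "M %06o :%u %s\n", mode, mark, path);
}
```

With mode 0100644 and mark 3, the two forms come out as "M 100644 :3 README" and "M 100644 <40-hex SHA-1> README". The second form only works if the importing repository already has that blob, which is the restriction the documentation hunk spells out.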
* Re: [PATCH] git fast-export: add --no-data option
From: Johannes Schindelin @ 2009-07-25 14:28 UTC
To: Geoffrey Irving; +Cc: git, Junio C Hamano
Hi,
On Sat, 25 Jul 2009, Geoffrey Irving wrote:
> When using git fast-export and git fast-import to rewrite the history
> of a repository with large binary files, almost all of the time is
> spent dealing with blobs. This is extremely inefficient if all we want
> to do is rewrite the commits and tree structure. --no-data skips the
> output of blobs and writes SHA-1s instead of marks, which provides a
> massive speedup.
ACK. I was looking for such an option recently, and was disappointed
that it was not there yet.
Ciao,
Dscho
* Re: [PATCH] git fast-export: add --no-data option
From: Junio C Hamano @ 2009-07-25 17:25 UTC
To: Geoffrey Irving; +Cc: git, Johannes Schindelin
Geoffrey Irving <irving@naml.us> writes:
> @@ -504,6 +508,8 @@ int cmd_fast_export(int argc, const char **argv,
> const char *prefix)
> "Import marks from this file"),
> OPT_BOOLEAN(0, "fake-missing-tagger", &fake_missing_tagger,
> "Fake a tagger when tags lack one"),
> + OPT_BOOLEAN(0, "no-data", &no_data,
> + "Skip output of blob data"),
Shouldn't this be --[no-]data option that defaults to true? Otherwise you
would accept --no-no-data that looks silly.
> OPT_END()
> };
>
> --
> 1.6.3.1
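Junio's objection rests on the automatic negation that parse-options gives long options: an option named X is also accepted as --no-X. The toy matcher below is hypothetical and far simpler than git's parse-options.c; it models only that one rule, which is enough to show why naming the option "no-data" admits "--no-no-data", while naming it "data" yields the natural --data / --no-data pair.

```c
#include <assert.h>
#include <string.h>

/*
 * Toy model (hypothetical, not git's parser): a boolean long option
 * called `name` matches "--<name>" (sets *unset = 0) and, through
 * automatic negation, "--no-<name>" (sets *unset = 1).
 * Returns 1 if arg matches either spelling, 0 otherwise.
 */
static int match_long_bool(const char *arg, const char *name, int *unset)
{
	assert(arg && name && unset);
	if (strncmp(arg, "--", 2) != 0)
		return 0;
	arg += 2;

	if (!strcmp(arg, name)) {
		*unset = 0;
		return 1;
	}
	if (!strncmp(arg, "no-", 3) && !strcmp(arg + 3, name)) {
		*unset = 1;
		return 1;
	}
	return 0;
}
```

With the name "data", the string "--no-no-data" does not match at all, because stripping one "no-" leaves "no-data", which is not the option's name.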
* Re: [PATCH] git fast-export: add --no-data option
From: Johannes Schindelin @ 2009-07-25 17:44 UTC
To: Junio C Hamano; +Cc: Geoffrey Irving, git
Hi,
On Sat, 25 Jul 2009, Junio C Hamano wrote:
> Geoffrey Irving <irving@naml.us> writes:
>
> > @@ -504,6 +508,8 @@ int cmd_fast_export(int argc, const char **argv,
> > const char *prefix)
> > "Import marks from this file"),
> > OPT_BOOLEAN(0, "fake-missing-tagger", &fake_missing_tagger,
> > "Fake a tagger when tags lack one"),
> > + OPT_BOOLEAN(0, "no-data", &no_data,
> > + "Skip output of blob data"),
>
> Shouldn't this be --[no-]data option that defaults to true? Otherwise you
> would accept --no-no-data that looks silly.
Maybe
OPT_NEGBIT(0, "data", &no_data,
"Skip output of blob data", 1),
Hmm?
Ciao,
Dscho
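OPT_NEGBIT reverses the usual bit option: the positive spelling clears the bit(s) given as its last argument, and the negated spelling sets them. A minimal model of that rule follows; the function name is hypothetical and the sketch assumes this reading of how parse-options handles NEGBIT. It shows how Dscho's entry can keep the variable called no_data while the user-facing option is "data".

```c
#include <assert.h>

/*
 * Toy model of the NEGBIT rule (hypothetical function name):
 * `mask` is the bit(s) named by the option's last argument.
 *   positive form, e.g. "--data"    (unset == 0): clear the bits
 *   negated form,  e.g. "--no-data" (unset != 0): set the bits
 * A plain BIT option would do the opposite.
 */
static void apply_negbit(int *value, int mask, int unset)
{
	assert(value);
	if (unset)
		*value |= mask;
	else
		*value &= ~mask;
}
```

Starting from no_data = 0 (blobs exported), --no-data sets it to 1 and a later --data clears it again. The help text, however, is listed under the positive name, so usage prints --data beside "Skip output of blob data", which is the mismatch Geoffrey reports in his reply.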
* Re: [PATCH] git fast-export: add --no-data option
From: Geoffrey Irving @ 2009-07-27 12:48 UTC
To: Johannes Schindelin; +Cc: Junio C Hamano, git
On Sat, Jul 25, 2009 at 1:44 PM, Johannes
Schindelin<Johannes.Schindelin@gmx.de> wrote:
> Hi,
>
> On Sat, 25 Jul 2009, Junio C Hamano wrote:
>
>> Geoffrey Irving <irving@naml.us> writes:
>>
>> > @@ -504,6 +508,8 @@ int cmd_fast_export(int argc, const char **argv,
>> > const char *prefix)
>> > "Import marks from this file"),
>> > OPT_BOOLEAN(0, "fake-missing-tagger", &fake_missing_tagger,
>> > "Fake a tagger when tags lack one"),
>> > + OPT_BOOLEAN(0, "no-data", &no_data,
>> > + "Skip output of blob data"),
>>
>> Shouldn't this be --[no-]data option that defaults to true? Otherwise you
>> would accept --no-no-data that looks silly.
>
> Maybe
>
> OPT_NEGBIT(0, "data", &no_data,
> "Skip output of blob data", 1),
>
> Hmm?
Not quite. That produces
usage: git fast-export [rev-list-opts]
    --progress <n>        show progress after <n> objects
    --signed-tags <mode>  select handling of signed tags
    --export-marks <FILE>
                          Dump marks to this file
    --import-marks <FILE>
                          Import marks from this file
    --fake-missing-tagger
                          Fake a tagger when tags lack one
    --data                Skip output of blob data
I don't see similar uses of OPT_NEGBIT, so maybe the necessary option
case hasn't been written yet (or I'm missing something obvious)?
Geoffrey
* Re: [PATCH] git fast-export: add --no-data option
From: Johannes Schindelin @ 2009-07-27 18:49 UTC
To: Geoffrey Irving; +Cc: Junio C Hamano, git
Hi,
On Mon, 27 Jul 2009, Geoffrey Irving wrote:
> On Sat, Jul 25, 2009 at 1:44 PM, Johannes
> Schindelin<Johannes.Schindelin@gmx.de> wrote:
>
> > On Sat, 25 Jul 2009, Junio C Hamano wrote:
> >
> >> Geoffrey Irving <irving@naml.us> writes:
> >>
> >> > @@ -504,6 +508,8 @@ int cmd_fast_export(int argc, const char **argv,
> >> > const char *prefix)
> >> > "Import marks from this file"),
> >> > OPT_BOOLEAN(0, "fake-missing-tagger", &fake_missing_tagger,
> >> > "Fake a tagger when tags lack one"),
> >> > + OPT_BOOLEAN(0, "no-data", &no_data,
> >> > + "Skip output of blob data"),
> >>
> >> Shouldn't this be --[no-]data option that defaults to true?
> >> Otherwise you would accept --no-no-data that looks silly.
> >
> > Maybe
> >
> > OPT_NEGBIT(0, "data", &no_data,
> > "Skip output of blob data", 1),
> >
> > Hmm?
>
> Not quite. That produces
>
> usage: git fast-export [rev-list-opts]
>
>     --progress <n>        show progress after <n> objects
>     --signed-tags <mode>  select handling of signed tags
>     --export-marks <FILE>
>                           Dump marks to this file
>     --import-marks <FILE>
>                           Import marks from this file
>     --fake-missing-tagger
>                           Fake a tagger when tags lack one
>     --data                Skip output of blob data
>
> I don't see similar uses of OPT_NEGBIT, so maybe the necessary option
> case hasn't been written yet (or I'm missing something obvious)?
There is an ugly solution:
{ OPTION_NEGBIT, 0, "no-data", &no_data, NULL, NULL,
PARSE_OPT_NOARG | PARSE_OPT_HIDDEN, NULL, 0 },
{ OPTION_BIT, 0, "no-data", NULL, NULL,
"Skip output of blob data",
PARSE_OPT_NOARG, NULL, 1 },
and there is a more elegant solution:
[PATCH] parse-opt: optionally show "--no-" option string
It is usually better to have positive options, to avoid confusing double
negations. However, sometimes it is desirable to show the negative option
in the help.
Introduce the flag PARSE_OPT_NEGHELP to do that.
Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---
parse-options.c | 6 ++++--
parse-options.h | 1 +
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/parse-options.c b/parse-options.c
index 68accef..a64a4d6 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -511,7 +511,7 @@ static int usage_with_options_internal(const char * const *usagestr,
continue;
pos = fprintf(stderr, "    ");
- if (opts->short_name) {
+ if (opts->short_name && !(opts->flags & PARSE_OPT_NEGHELP)) {
if (opts->flags & PARSE_OPT_NODASH)
pos += fprintf(stderr, "%c", opts->short_name);
else
@@ -520,7 +520,9 @@ static int usage_with_options_internal(const char * const *usagestr,
if (opts->long_name && opts->short_name)
pos += fprintf(stderr, ", ");
if (opts->long_name)
- pos += fprintf(stderr, "--%s", opts->long_name);
+ pos += fprintf(stderr, "--%s%s",
+ (opts->flags & PARSE_OPT_NEGHELP) ? "no-" : "",
+ opts->long_name);
if (opts->type == OPTION_NUMBER)
pos += fprintf(stderr, "-NUM");
diff --git a/parse-options.h b/parse-options.h
index 4b62361..8f0035a 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -36,6 +36,7 @@ enum parse_opt_option_flags {
PARSE_OPT_LASTARG_DEFAULT = 16,
PARSE_OPT_NODASH = 32,
PARSE_OPT_LITERAL_ARGHELP = 64,
+ PARSE_OPT_NEGHELP = 128,
};
struct option;
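The help-line change in this patch boils down to choosing a prefix for the long name. The sketch below isolates just that step for a long-only option such as fast-export's. TOY_OPT_NEGHELP and format_long_option() are hypothetical stand-ins (only the bit value 128 is copied from the patch), and the short-option handling and column padding of usage_with_options_internal() are deliberately left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_OPT_NEGHELP 128	/* same value the patch gives PARSE_OPT_NEGHELP */

/*
 * Hypothetical sketch of the patched fprintf(): write "--<name>", or
 * "--no-<name>" when the NEGHELP bit is set in flags. Returns the
 * snprintf() result, analogous to how the real code accumulates pos.
 */
static int format_long_option(char *out, size_t len,
			      const char *long_name, int flags)
{
	assert(out && long_name);
	/*
	 * This toy rejects names that already start with "no-": with
	 * NEGHELP they would be shown as "--no-no-...", the spelling
	 * Junio wanted to avoid.
	 */
	assert(strncmp(long_name, "no-", 3) != 0);

	return snprintf(out, len, "--%s%s",
			(flags & TOY_OPT_NEGHELP) ? "no-" : "", long_name);
}
```

Paired with OPTION_NEGBIT, this lets an option store into no_data while its usage line lists --no-data next to "Skip output of blob data".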
* [PATCH] git fast-export: add --no-data option
From: Geoffrey Irving @ 2009-07-28 2:20 UTC
To: Johannes Schindelin; +Cc: Junio C Hamano, git
When using git fast-export and git fast-import to rewrite the history
of a repository with large binary files, almost all of the time is
spent dealing with blobs. This is extremely inefficient if all we want
to do is rewrite the commits and tree structure. --no-data skips the
output of blobs and writes SHA-1s instead of marks, which provides a
massive speedup.
Signed-off-by: Geoffrey Irving <irving@naml.us>
---
Here's my modified patch on top of Johannes' fix to parse-options.
On github:
http://github.com/girving/git/commit/98549f6809a4dc22d088f3c2ee1f798e858cce3e
http://github.com/girving/git/commit/00a7c591b9a1fc6880ad5f88d118bb1d6ea86878
Documentation/git-fast-export.txt | 7 +++++++
builtin-fast-export.c | 9 ++++++++-
2 files changed, 15 insertions(+), 1 deletions(-)
diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index 0c9eb56..47a96dd 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -71,6 +71,13 @@ marks the same across runs.
allow that. So fake a tagger to be able to fast-import the
output.
+--no-data::
+ Skip output of blob objects and instead refer to blobs via
+ their original SHA-1 hash. This is useful when rewriting the
+ directory structure or history of a repository without
+ touching the contents of individual files. Note that the
+ resulting stream can only be used by a repository which
+ already contains the necessary objects.
EXAMPLES
--------
diff --git a/builtin-fast-export.c b/builtin-fast-export.c
index 9a8a6fc..a0f0284 100644
--- a/builtin-fast-export.c
+++ b/builtin-fast-export.c
@@ -25,6 +25,7 @@ static const char *fast_export_usage[] = {
static int progress;
static enum { VERBATIM, WARN, STRIP, ABORT } signed_tag_mode = ABORT;
static int fake_missing_tagger;
+static int no_data;
static int parse_opt_signed_tag_mode(const struct option *opt,
const char *arg, int unset)
@@ -101,6 +102,9 @@ static void handle_object(const unsigned char *sha1)
char *buf;
struct object *object;
+ if (no_data)
+ return;
+
if (is_null_sha1(sha1))
return;
@@ -158,7 +162,7 @@ static void show_filemodify(struct diff_queue_struct *q,
* Links refer to objects in another repositories;
* output the SHA-1 verbatim.
*/
- if (S_ISGITLINK(spec->mode))
+ if (no_data || S_ISGITLINK(spec->mode))
printf("M %06o %s %s\n", spec->mode,
sha1_to_hex(spec->sha1), spec->path);
else {
@@ -504,6 +508,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
"Import marks from this file"),
OPT_BOOLEAN(0, "fake-missing-tagger", &fake_missing_tagger,
"Fake a tagger when tags lack one"),
+ { OPTION_NEGBIT, 0, "data", &no_data, NULL,
+ "Skip output of blob data",
+ PARSE_OPT_NOARG | PARSE_OPT_NEGHELP, NULL, 1 },
OPT_END()
};
--
1.6.3.1
* Re: [PATCH] git fast-export: add --no-data option
From: Stephen Boyd @ 2009-07-28 4:11 UTC
To: Johannes Schindelin; +Cc: Geoffrey Irving, Junio C Hamano, git
Johannes Schindelin wrote:
> There is an ugly solution:
>
> { OPTION_NEGBIT, 0, "no-data", &no_data, NULL, NULL,
> PARSE_OPT_NOARG | PARSE_OPT_HIDDEN, NULL, 0 },
> { OPTION_BIT, 0, "no-data", NULL, NULL,
> "Skip output of blob data",
> PARSE_OPT_NOARG, NULL, 1 },
>
> and there is a more elegant solution:
>
> [PATCH] parse-opt: optionally show "--no-" option string
>
> It is usually better to have positive options, to avoid confusing double
> negations. However, sometimes it is desirable to show the negative option
> in the help.
>
> Introduce the flag PARSE_OPT_NEGHELP to do that.
Perhaps with this documentation throw in?
diff --git a/parse-options.h b/parse-options.h
index 90e577d..14162e9 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -81,6 +81,9 @@ typedef int parse_opt_cb(const struct option *, const char *arg, int unset);
* PARSE_OPT_LITERAL_ARGHELP: says that argh shouldn't be enclosed in brackets
* (i.e. '<argh>') in the help message.
* Useful for options with multiple parameters.
+ * PARSE_OPT_NEGHELP: says that the long option should always be shown with
+ * the --no prefix in the usage message. Sometimes
+ * useful for users of OPTION_NEGBIT.
*
* `callback`::
* pointer to the callback to use for OPTION_CALLBACK.
* Re: [PATCH] git fast-export: add --no-data option
From: Johannes Schindelin @ 2009-07-28 8:01 UTC
To: Stephen Boyd; +Cc: Geoffrey Irving, Junio C Hamano, git
Hi,
On Mon, 27 Jul 2009, Stephen Boyd wrote:
> Johannes Schindelin wrote:
> >
> > [PATCH] parse-opt: optionally show "--no-" option string
> >
> > It is usually better to have positive options, to avoid confusing
> > double negations. However, sometimes it is desirable to show the
> > negative option in the help.
> >
> > Introduce the flag PARSE_OPT_NEGHELP to do that.
>
> Perhaps with this documentation throw in?
>
> diff --git a/parse-options.h b/parse-options.h
> index 90e577d..14162e9 100644
> --- a/parse-options.h
> +++ b/parse-options.h
> @@ -81,6 +81,9 @@ typedef int parse_opt_cb(const struct option *, const char *arg, int unset);
> * PARSE_OPT_LITERAL_ARGHELP: says that argh shouldn't be enclosed in brackets
> * (i.e. '<argh>') in the help message.
> * Useful for options with multiple parameters.
> + * PARSE_OPT_NEGHELP: says that the long option should always be shown with
> + * the --no prefix in the usage message. Sometimes
> + * useful for users of OPTION_NEGBIT.
> *
> * `callback`::
> * pointer to the callback to use for OPTION_CALLBACK.
>
Thanks.
Ciao,
Dscho