* [PATCH 0/4] Add git-rewrite-commits @ 2007-07-08 16:23 skimo 2007-07-08 16:23 ` [PATCH 1/4] export get_short_sha1 skimo ` (3 more replies) 0 siblings, 4 replies; 25+ messages in thread From: skimo @ 2007-07-08 16:23 UTC (permalink / raw) To: git, Junio C Hamano From: Sven Verdoolaege <skimo@kotnet.org> This patch series adds git-rewrite-commits and depends (for the tests in the fourth part) on my earlier patch to allow negative matches (or an extended patch by Johannes). [PATCH 1/4] export get_short_sha1 [PATCH 2/4] export add_ref_decoration [PATCH 3/4] revision: mark commits that didn't match a pattern for later use [PATCH 4/4] Add git-rewrite-commits The first two should be fairly uncontroversial. The third may be considered a waste of a precious bit. If so, any suggestions for other ways of passing on this information are welcomed. The fourth contains the actual git-rewrite-commits builtin. My main motivation was that cg-admin-rewritehist doesn't change the SHA1's in commit message and I don't like shell programming. skimo ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/4] export get_short_sha1 2007-07-08 16:23 [PATCH 0/4] Add git-rewrite-commits skimo @ 2007-07-08 16:23 ` skimo 2007-07-08 16:23 ` [PATCH 2/4] export add_ref_decoration skimo ` (2 subsequent siblings) 3 siblings, 0 replies; 25+ messages in thread From: skimo @ 2007-07-08 16:23 UTC (permalink / raw) To: git, Junio C Hamano From: Sven Verdoolaege <skimo@kotnet.org> Sometimes it is useful to check whether a given string is exactly a short SHA1. Signed-off-by: Sven Verdoolaege <skimo@kotnet.org> --- cache.h | 1 + sha1_name.c | 3 +-- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/cache.h b/cache.h index e64071e..8fda8ee 100644 --- a/cache.h +++ b/cache.h @@ -391,6 +391,7 @@ static inline unsigned int hexval(unsigned char c) extern int get_sha1(const char *str, unsigned char *sha1); extern int get_sha1_with_mode(const char *str, unsigned char *sha1, unsigned *mode); +extern int get_short_sha1(const char *name, int len, unsigned char *sha1, int quietly); extern int get_sha1_hex(const char *hex, unsigned char *sha1); extern char *sha1_to_hex(const unsigned char *sha1); /* static buffer result! */ extern int read_ref(const char *filename, unsigned char *sha1); diff --git a/sha1_name.c b/sha1_name.c index 858f08c..0bed79d 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -154,8 +154,7 @@ static int find_unique_short_object(int len, char *canonical, return 0; } -static int get_short_sha1(const char *name, int len, unsigned char *sha1, - int quietly) +int get_short_sha1(const char *name, int len, unsigned char *sha1, int quietly) { int i, status; char canonical[40]; -- 1.5.3.rc0.68.geec71-dirty ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 2/4] export add_ref_decoration 2007-07-08 16:23 [PATCH 0/4] Add git-rewrite-commits skimo 2007-07-08 16:23 ` [PATCH 1/4] export get_short_sha1 skimo @ 2007-07-08 16:23 ` skimo 2007-07-08 16:23 ` [PATCH 3/4] revision: mark commits that didn't match a pattern for later use skimo 2007-07-08 16:23 ` [PATCH 4/4] Add git-rewrite-commits skimo 3 siblings, 0 replies; 25+ messages in thread From: skimo @ 2007-07-08 16:23 UTC (permalink / raw) To: git, Junio C Hamano From: Sven Verdoolaege <skimo@kotnet.org> add_ref_decoration is also useful outside of git-log. Since the name_decoration declaration appears in commit.h, the function is moved to commit.c. Signed-off-by: Sven Verdoolaege <skimo@kotnet.org> --- builtin-log.c | 25 ------------------------- commit.c | 27 +++++++++++++++++++++++++++ commit.h | 3 +++ log-tree.c | 2 -- 4 files changed, 30 insertions(+), 27 deletions(-) diff --git a/builtin-log.c b/builtin-log.c index 13bae31..c14eea5 100644 --- a/builtin-log.c +++ b/builtin-log.c @@ -21,31 +21,6 @@ static const char *fmt_patch_subject_prefix = "PATCH"; /* this is in builtin-diff.c */ void add_head(struct rev_info *revs); -static void add_name_decoration(const char *prefix, const char *name, struct object *obj) -{ - int plen = strlen(prefix); - int nlen = strlen(name); - struct name_decoration *res = xmalloc(sizeof(struct name_decoration) + plen + nlen); - memcpy(res->name, prefix, plen); - memcpy(res->name + plen, name, nlen + 1); - res->next = add_decoration(&name_decoration, obj, res); -} - -static int add_ref_decoration(const char *refname, const unsigned char *sha1, int flags, void *cb_data) -{ - struct object *obj = parse_object(sha1); - if (!obj) - return 0; - add_name_decoration("", refname, obj); - while (obj->type == OBJ_TAG) { - obj = ((struct tag *)obj)->tagged; - if (!obj) - break; - add_name_decoration("tag: ", refname, obj); - } - return 0; -} - static void cmd_log_init(int argc, const char **argv, const char *prefix, struct rev_info *rev) { diff --git a/commit.c b/commit.c index 03436b1..24d7dd4 100644 --- a/commit.c +++ b/commit.c @@ -1553,3 +1553,30 @@ int in_merge_bases(struct commit *commit, struct commit **reference, int num) free_commit_list(bases); return ret; } + +struct decoration name_decoration = { "object names" }; + +static void add_name_decoration(const char *prefix, const char *name, struct object *obj) +{ + int plen = strlen(prefix); + int nlen = strlen(name); + struct name_decoration *res = xmalloc(sizeof(struct name_decoration) + plen + nlen); + memcpy(res->name, prefix, plen); + memcpy(res->name + plen, name, nlen + 1); + res->next = add_decoration(&name_decoration, obj, res); +} + +int add_ref_decoration(const char *refname, const unsigned char *sha1, int flags, void *cb_data) +{ + struct object *obj = parse_object(sha1); + if (!obj) + return 0; + add_name_decoration("", refname, obj); + while (obj->type == OBJ_TAG) { + obj = ((struct tag *)obj)->tagged; + if (!obj) + break; + add_name_decoration("tag: ", refname, obj); + } + return 0; +} diff --git a/commit.h b/commit.h index 467872e..bf23535 100644 --- a/commit.h +++ b/commit.h @@ -122,4 +122,7 @@ extern struct commit_list *get_shallow_commits(struct object_array *heads, int depth, int shallow_flag, int not_shallow_flag); int in_merge_bases(struct commit *, struct commit **, int); + +extern int add_ref_decoration(const char *refname, const unsigned char *sha1, int flags, void *cb_data); + #endif /* COMMIT_H */ diff --git a/log-tree.c b/log-tree.c index 8624d5a..b69f029 100644 --- a/log-tree.c +++ b/log-tree.c @@ -4,8 +4,6 @@ #include "log-tree.h" #include "reflog-walk.h" -struct decoration name_decoration = { "object names" }; - static void show_parents(struct commit *commit, int abbrev) { struct commit_list *p; -- 1.5.3.rc0.68.geec71-dirty ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 3/4] revision: mark commits that didn't match a pattern for later use 2007-07-08 16:23 [PATCH 0/4] Add git-rewrite-commits skimo 2007-07-08 16:23 ` [PATCH 1/4] export get_short_sha1 skimo 2007-07-08 16:23 ` [PATCH 2/4] export add_ref_decoration skimo @ 2007-07-08 16:23 ` skimo 2007-07-08 16:23 ` [PATCH 4/4] Add git-rewrite-commits skimo 3 siblings, 0 replies; 25+ messages in thread From: skimo @ 2007-07-08 16:23 UTC (permalink / raw) To: git, Junio C Hamano From: Sven Verdoolaege <skimo@kotnet.org> Signed-off-by: Sven Verdoolaege <skimo@kotnet.org> --- revision.c | 4 +++- revision.h | 1 + 2 files changed, 4 insertions(+), 1 deletions(-) diff --git a/revision.c b/revision.c index 0035d40..8d3bed2 100644 --- a/revision.c +++ b/revision.c @@ -1421,8 +1421,10 @@ static struct commit *get_revision_1(struct rev_info *revs) if (revs->no_merges && commit->parents && commit->parents->next) continue; - if (!commit_match(commit, revs)) + if (!commit_match(commit, revs)) { + commit->object.flags |= PRUNED; continue; + } if (revs->prune_fn && revs->dense) { /* Commit without changes? */ if (!(commit->object.flags & TREECHANGE)) { diff --git a/revision.h b/revision.h index 9728d4c..d3609ab 100644 --- a/revision.h +++ b/revision.h @@ -10,6 +10,7 @@ #define CHILD_SHOWN (1u<<6) #define ADDED (1u<<7) /* Parents already parsed and added? */ #define SYMMETRIC_LEFT (1u<<8) +#define PRUNED (1u<<9) struct rev_info; struct log_info; -- 1.5.3.rc0.68.geec71-dirty ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 4/4] Add git-rewrite-commits 2007-07-08 16:23 [PATCH 0/4] Add git-rewrite-commits skimo ` (2 preceding siblings ...) 2007-07-08 16:23 ` [PATCH 3/4] revision: mark commits that didn't match a pattern for later use skimo @ 2007-07-08 16:23 ` skimo 2007-07-08 16:37 ` Johannes Schindelin 2007-07-08 23:56 ` Johannes Schindelin 3 siblings, 2 replies; 25+ messages in thread From: skimo @ 2007-07-08 16:23 UTC (permalink / raw) To: git, Junio C Hamano From: Sven Verdoolaege <skimo@kotnet.org> This builtin is similar to git-filter-branch (and cg-admin-rewritehist). The main difference is that git-rewrite-commits will automatically rewrite any SHA1 in a commit message to the rewritten SHA1 as well as any reference (in .git/refs/) that points to a rewritten commit. It's also a lot faster than git-filter-branch if no external command is called. For example, running either to eliminating a commit specified as a graft results in the following timings, both being performed on a freshly cloned copy of a small repo: bash-3.00$ time git-filter-branch test Rewrite 274fe3dfb8e8c7d0a6ce05138bdb650de7b459ea (425/425) Rewritten history saved to the test branch real 0m30.845s user 0m13.400s sys 0m19.640s bash-3.00$ time git-rewrite-commits real 0m0.223s user 0m0.080s sys 0m0.140s The command line is more reminiscent of git-log. For example you can say git-rewrite-commits --all to incorporate grafts in all branches, or git rewrite-commits --author='!Darl McBribe' --all to remove all commits by Darl McBribe. Signed-off-by: Sven Verdoolaege <skimo@kotnet.org> --- .gitignore | 1 + Documentation/cmd-list.perl | 1 + Documentation/git-rewrite-commits.txt | 147 +++++++++++ Makefile | 1 + builtin-rewrite-commits.c | 451 +++++++++++++++++++++++++++++++++ builtin.h | 1 + git.c | 1 + t/t7005-rewrite-commits.sh | 74 ++++++ 8 files changed, 677 insertions(+), 0 deletions(-) create mode 100644 Documentation/git-rewrite-commits.txt create mode 100644 builtin-rewrite-commits.c create mode 100755 t/t7005-rewrite-commits.sh diff --git a/.gitignore b/.gitignore index 20ee642..bcd95a9 100644 --- a/.gitignore +++ b/.gitignore @@ -109,6 +109,7 @@ git-reset git-rev-list git-rev-parse git-revert +git-rewrite-commits git-rm git-runstatus git-send-email diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl index 2143995..63911b7 100755 --- a/Documentation/cmd-list.perl +++ b/Documentation/cmd-list.perl @@ -166,6 +166,7 @@ git-reset mainporcelain git-revert mainporcelain git-rev-list plumbinginterrogators git-rev-parse ancillaryinterrogators +git-rewrite-commits mainporcelain git-rm mainporcelain git-runstatus ancillaryinterrogators git-send-email foreignscminterface diff --git a/Documentation/git-rewrite-commits.txt b/Documentation/git-rewrite-commits.txt new file mode 100644 index 0000000..d82b777 --- /dev/null +++ b/Documentation/git-rewrite-commits.txt @@ -0,0 +1,147 @@ +git-rewrite-commits(1) +====================== + +NAME +---- +git-rewrite-commits - Rewrite commits + +SYNOPSIS +-------- +'git-rewrite-commits' [--index-map <command>] [--commit-map <command>] + [<rev-list options>...] + +DESCRIPTION +----------- +Lets you rewrite the commits selected by the gitlink:git-rev-list[1] +options, optionally applying custom maps on each of them. +These maps either modify the tree associated to a commit +(through manipulations of the index) or the (raw) commit itself. +Any commit within the specified range that is filtered out +through a pattern is removed from its children (assuming they +are in range) and replaced by the closest ancestors that +are either not being rewritten or not filtered out. + +Any branch pointing to any of the rewritten commits is replaced +by a pointer to the new commit. The original pointers are saved +in the refs/rewritten hierarchy. Any SHA1 in the commit message +of any of the rewritten commits that points to (another) rewritten +commit is replaced by the SHA1 of the corresponding new commit. + +Besides the actions specified by the maps, the new commits +will also reflect any grafts that may apply to any of the selected commits. + +*WARNING*! The rewritten history will have different object names for all +the objects and will not converge with the original branch. You will not +be able to easily push and distribute the rewritten branch on top of the +original branch. Please do not use this command if you do not know the +full implications, and avoid using it anyway, if a simple single commit +would suffice to fix your problem. + +Maps +~~~~ + +The maps are applied in the order as listed below. The <command> +argument is run as "sh -c '<command'>". + + +OPTIONS +------- + +--index-map <command>:: + This map should only modify the index specified by 'GIT_INDEX_FILE'. + Prior to running the map, this temporary index is populated + with the tree of the commit that is being rewritten. + This tree will be replaced according to the new state of this + temporary index after the map finishes. + +--commit-map <command>:: + This map receives the (raw) commit on stdin and should produce + the (possibly) modified commit on stdout. In the input commit, + the tree has already been modified by the index map (if any) + and has all references to older commits (including the parents) + changed to the possibly rewritten commits. + +<rev-list-options>:: + Selects the commits to be rewritten, defaulting to the history + that lead to HEAD. If commits are filtered using a (negative) + pattern then all the commits filtered out will be removed + from the history of the selected commits. + + +Examples +-------- + +Suppose you want to remove a file (containing confidential information +or copyright violation) from all commits: + +-------------------------------------------------------------------- +git rewrite-commits --index-map 'git update-index --remove filename' +-------------------------------------------------------------------- + +Now, you will get the rewritten history saved in your current branch +(the old branch is saved in refs/rewritten). + +To set a commit "$graft-id" (which typically is at the tip of another +history) to be the parent of the current initial commit "$commit-id", in +order to paste the other history behind the current history: + +----------------------------------------------- +echo "$commit-id $graft-id" >> .git/info/grafts +git rewrite-commits +----------------------------------------------- + +To remove commits authored by "Darl McBribe" from the history: + +-------------------------------------------- +git rewrite-commits --author='!Darl McBribe' +-------------------------------------------- + +Note that the changes introduced by the commits, and not reverted by +subsequent commits, will still be in the rewritten branch. If you want +to throw out _changes_ together with the commits, you should use the +interactive mode of gitlink:git-rebase[1]. + +Consider this history: + +------------------ + D--E--F--G--H + / / +A--B-----C +------------------ + +To rewrite only commits D,E,F,G,H, but leave A, B and C alone, use: + +-------------------------------- +git rewrite-commits ... C..H +-------------------------------- + +To rewrite commits E,F,G,H, use one of these: + +---------------------------------------- +git rewrite-commits ... C..H --not D +git rewrite-commits ... D..H --not C +---------------------------------------- + +To move the whole tree into a subdirectory, or remove it from there: + +--------------------------------------------------------------- +git rewrite-commits --index-map \ + 'git ls-files -s | sed "s-\t-&newsubdir/-" | + GIT_INDEX_FILE=$GIT_INDEX_FILE.new \ + git update-index --index-info && + mv $GIT_INDEX_FILE.new $GIT_INDEX_FILE' +--------------------------------------------------------------- + + +Author +------ +Written by Sven Verdoolaege. +Inspired by cg-admin-rewritehist by Petr "Pasky" Baudis <pasky@suse.cz>. + +Documentation +-------------- +Documentation by Petr Baudis, Sven Verdoolaege and the git list. + +GIT +--- +Part of the gitlink:git[7] suite diff --git a/Makefile b/Makefile index 5b30e5c..f84bcd7 100644 --- a/Makefile +++ b/Makefile @@ -370,6 +370,7 @@ BUILTIN_OBJS = \ builtin-rev-list.o \ builtin-rev-parse.o \ builtin-revert.o \ + builtin-rewrite-commits.o \ builtin-rm.o \ builtin-runstatus.o \ builtin-shortlog.o \ diff --git a/builtin-rewrite-commits.c b/builtin-rewrite-commits.c new file mode 100644 index 0000000..eebf816 --- /dev/null +++ b/builtin-rewrite-commits.c @@ -0,0 +1,451 @@ +#include "cache.h" +#include "refs.h" +#include "builtin.h" +#include "commit.h" +#include "diff.h" +#include "revision.h" +#include "list-objects.h" +#include "run-command.h" +#include "cache-tree.h" +#include "grep.h" + +struct decoration rewrite_decoration = { "rewritten as" }; +static const char *rewritten_prefix = "rewritten"; +static const char *commit_map; +static const char *index_map; + +struct rewrite_decoration { + struct rewrite_decoration *next; + unsigned char sha1[20]; +}; + +static void add_rewrite_decoration(struct object *obj, unsigned char *sha1) +{ + struct rewrite_decoration *deco = xmalloc(sizeof(struct rewrite_decoration)); + hashcpy(deco->sha1, sha1); + deco->next = add_decoration(&rewrite_decoration, obj, deco); +} + +static int get_rewritten_sha1(unsigned char *sha1) +{ + struct rewrite_decoration *deco; + struct object *obj = lookup_object(sha1); + + if (!obj) + return 1; + + deco = lookup_decoration(&rewrite_decoration, obj); + if (!deco) + return 1; + + hashcpy(sha1, deco->sha1); + return 0; +} + +static char *add_parents(char *dest, struct commit_list *parents) +{ + unsigned char sha1[20]; + struct commit_list *list; + + for (list = parents; list; list = list->next) { + hashcpy(sha1, list->item->object.sha1); + get_rewritten_sha1(sha1); + memcpy(dest, "parent ", 7); + dest += 7; + memcpy(dest, sha1_to_hex(sha1), 40); + dest += 40; + *dest++ = '\n'; + } + return dest; +} + +static int get_one_line(char *buf, unsigned long len) +{ + char *end = memchr(buf, '\n', len); + if (end) + return end - buf + 1; + return len; +} + +static char *map_index(char *orig_hex) +{ + int argc; + const char *argv[10]; + static char index_env[16+PATH_MAX]; + char *tmp_index = git_path("rewrite_index"); + const char *env[] = { index_env, NULL }; + char *hex; + + memcpy(index_env, "GIT_INDEX_FILE=", 15); + strcpy(index_env+15, tmp_index); + tmp_index = index_env+15; + + /* First write out tree to temporary index */ + argc = 0; + argv[argc++] = "read-tree"; + argv[argc++] = orig_hex; + argv[argc] = NULL; + if (run_command_v_opt_cd_env(argv, RUN_GIT_CMD, NULL, env)) + die("Cannot write index '%s' for map", tmp_index); + + /* Then map the index */ + argc = 0; + argv[argc++] = "sh"; + argv[argc++] = "-c"; + argv[argc++] = index_map; + argv[argc] = NULL; + if (run_command_v_opt_cd_env(argv, 0, NULL, env)) + die("Index map '%s' failed", index_map); + + /* Finally read it back in */ + if (read_cache_from(tmp_index) < 0) + die("Error reading index '%s'", tmp_index); + active_cache_tree = cache_tree(); + if (cache_tree_update(active_cache_tree, active_cache, active_nr, + 0, 0) < 0) + die("Error building trees"); + hex = sha1_to_hex(active_cache_tree->sha1); + discard_cache(); + + unlink(tmp_index); + + return hex; +} + +static char *rewrite_header(char *dest, unsigned long *len_p, char **buf_p, + struct commit_list *parents) +{ + int linelen; + do { + char *line = *buf_p; + linelen = get_one_line(*buf_p, *len_p); + + if (!linelen) + return dest; + *buf_p += linelen; + *len_p -= linelen; + + if (index_map && !memcmp(line, "tree ", 5)) { + if (linelen != 46) + die("bad tree line in commit"); + memcpy(dest, "tree ", 5); + line[45] = '\0'; + memcpy(dest+5, map_index(line+5), 40); + line[45] = '\n'; + dest[45] = '\n'; + dest += 46; + continue; + } + + /* drop old parents */ + if (!memcmp(line, "parent ", 7)) { + if (linelen != 48) + die("bad parent line in commit"); + continue; + } + + /* insert new parents before author */ + if (!memcmp(line, "author ", 7)) + dest = add_parents(dest, parents); + + memcpy(dest, line, linelen); + dest += linelen; + } while (linelen > 1); + return dest; +} + +static size_t memspn(const char *s, const char *accept, size_t n) +{ + size_t offset; + + for (offset = 0; offset < n; ++offset) + if (!strchr(accept, s[offset])) + break; + return offset; +} + +static size_t memcspn(const char *s, const char *accept, size_t n) +{ + size_t offset; + + for (offset = 0; offset < n; ++offset) + if (strchr(accept, s[offset])) + break; + return offset; +} + +/* Replace any (short) sha1 of a rewritten commit by the new (short) sha1 */ +static char *rewrite_body(char *dest, unsigned long len, char *buf) +{ + static const char hex[] = "0123456789abcdef"; + unsigned char sha1[20]; + + while (len) { + size_t ll = memcspn(buf, hex, len); + memcpy(dest, buf, ll); + dest += ll; + buf += ll; + len -= ll; + + ll = memspn(buf, hex, len); + if (ll >= 8 && ll <= 40 && + !get_short_sha1(buf, ll, sha1, 1) && + !get_rewritten_sha1(sha1)) + memcpy(dest, sha1_to_hex(sha1), ll); + else + memcpy(dest, buf, ll); + dest += ll; + buf += ll; + len -= ll; + } + return dest; +} + +static void write_ref_sha1_or_die(const char *ref, const unsigned char *old_sha1, + const unsigned char *new_sha1, + const char *logmsg, int flags) +{ + struct ref_lock *lock; + + lock = lock_any_ref_for_update(ref, old_sha1, flags); + if (!lock) + die("%s: cannot lock the ref", ref); + if (write_ref_sha1(lock, new_sha1, logmsg) < 0) + die("%s: cannot update the ref", ref); +} + +static int is_ref_to_be_rewritten(const char *ref) +{ + unsigned char sha1[20]; + int flag; + + if (prefixcmp(ref, "refs/")) + return 0; + if (!prefixcmp(ref, "refs/remotes/")) + return 0; + + resolve_ref(ref, sha1, 0, &flag); + if (flag & REF_ISSYMREF) + return 0; + + return 1; +} + +static void rewrite_sha1(struct object *obj, unsigned char *new_sha1) +{ + struct name_decoration *deco; + int prefix_len; + + if (!hashcmp(obj->sha1, new_sha1)) + return; + + add_rewrite_decoration(obj, new_sha1); + + prefix_len = strlen(rewritten_prefix); + deco = lookup_decoration(&name_decoration, obj); + for (; deco; deco = deco->next) { + char buffer[256]; + char *p; + int len = strlen(deco->name); + + if (!is_ref_to_be_rewritten(deco->name)) + continue; + if (len + prefix_len + 1 + 1 > sizeof(buffer)) + die("rewrite ref of '%s' too long", deco->name); + p = buffer; + memcpy(p, "refs/", 5); + p += 5; + memcpy(p, rewritten_prefix, prefix_len); + p += prefix_len; + memcpy(p, deco->name+4, len-4 + 1); + + write_ref_sha1_or_die(buffer, NULL, obj->sha1, + "copied during rewrite", 0); + + if (is_null_sha1(new_sha1)) + delete_ref(deco->name, obj->sha1); + else + write_ref_sha1_or_die(deco->name, obj->sha1, new_sha1, + "rewritten", 0); + } +} + +static void map_and_write_commit(char *commit, size_t len, unsigned char *sha1) +{ + char commit_path[PATH_MAX]; + struct child_process cmd; + int argc; + const char *argv[10]; + int fd; + + fd = git_mkstemp(commit_path, sizeof(commit_path), ".commit_XXXXXX");; + write_or_die(fd, commit, len); + + argc = 0; + argv[argc++] = "sh"; + argv[argc++] = "-c"; + argv[argc++] = commit_map; + argv[argc] = NULL; + memset(&cmd, 0, sizeof(cmd)); + cmd.in = open(commit_path, O_RDONLY); + if (cmd.in < 0) + die("Unable to read commit from file '%s'", commit_path); + unlink(commit_path); + cmd.out = -1; + cmd.argv = argv; + if (start_command(&cmd)) + die("Commit map '%s' failed to start", commit_map); + if (index_pipe(sha1, cmd.out, commit_type, 1)) + die("Unable to write new commit"); + if (finish_command(&cmd)) + die("Commit map '%s' failed", commit_map); + close(cmd.in); +} + +/* + * Replace any parent that has been removed by its parents + * and return the number of new parents. + * We directly modify the parent list, so any libification + * should probably adapt this function. + */ +static int rewrite_parents(struct commit *commit) +{ + int n; + struct commit_list *list, *parents, **prev; + unsigned char sha1[20]; + + for (n = 0, prev = &commit->parents; *prev; ++n) { + list = *prev; + + if (!(list->item->object.flags & PRUNED)) { + prev = &list->next; + continue; + } + + hashcpy(sha1, list->item->object.sha1); + get_rewritten_sha1(sha1); + if (!is_null_sha1(sha1)) { + hashclr(sha1); + rewrite_sha1(&list->item->object, sha1); + } + + parents = list->item->parents; + if (!parents) { + *prev = list->next; + free(list); + --n; + continue; + } + list->item = parents->item; + prev = &list->next; + list = list->next; + while ((parents = parents->next)) { + commit_list_insert(parents->item, prev); + prev = &(*prev)->next; + ++n; + } + *prev = list; + } + + return n; +} + +static void rewrite_commit(struct commit *commit) +{ + char *buf, *p; + int n; + char *orig_buf = commit->buffer; + unsigned long orig_len = strlen(orig_buf); + unsigned char sha1[20]; + + n = rewrite_parents(commit); + + /* Make enough remove for n (possibly extra) parents */ + p = buf = xmalloc(orig_len + n*48); + p = rewrite_header(p, &orig_len, &orig_buf, commit->parents); + p = rewrite_body(p, orig_len, orig_buf); + if (!commit_map) { + if (write_sha1_file(buf, p-buf, commit_type, sha1)) + die("Unable to write new commit"); + } else + map_and_write_commit(buf, p-buf, sha1); + free(buf); + + rewrite_sha1(&commit->object, sha1); +} + +static void move_head_forward(char *ref, unsigned char *old_sha1, int detached) +{ + unsigned char new_sha1[20]; + int rewritten; + + hashcpy(new_sha1, old_sha1); + rewritten = !get_rewritten_sha1(new_sha1); + if (rewritten && !is_bare_repository()) { + int argc; + const char *argv[10]; + + argc = 0; + argv[argc++] = "read-tree"; + argv[argc++] = "-m"; + argv[argc++] = "-u"; + argv[argc++] = sha1_to_hex(old_sha1); + argv[argc++] = sha1_to_hex(new_sha1); + argv[argc] = NULL; + if (run_command_v_opt(argv, RUN_GIT_CMD)) + die("Cannot move HEAD forward"); + } + if (!detached) + create_symref("HEAD", ref, "reattached after rewrite"); + else if (rewritten) + write_ref_sha1_or_die("HEAD", NULL, new_sha1, + "rewritten", REF_NODEREF); +} + +int cmd_rewrite_commits(int argc, const char **argv, const char *prefix) +{ + struct rev_info rev; + struct commit *commit; + unsigned char HEAD_sha1[20]; + char *HEAD_ref; + int flag; + int i, j; + + init_revisions(&rev, prefix); + + for (i = 1, j = 1; i < argc; ++i) { + if (!strcmp(argv[i], "--index-map")) { + if (++i == argc) + die("Argument required for --index-map"); + index_map = argv[i]; + } else if (!strcmp(argv[i], "--commit-map")) { + if (++i == argc) + die("Argument required for --commit-map"); + commit_map = argv[i]; + } else + argv[j++] = argv[i]; + } + argc = j; + argc = setup_revisions(argc, argv, &rev, "HEAD"); + rev.ignore_merges = 0; + rev.topo_order = 1; + rev.reverse = 1; + + for_each_ref(add_ref_decoration, NULL); + + HEAD_ref = xstrdup(resolve_ref("HEAD", HEAD_sha1, 1, &flag)); + if (flag & REF_ISSYMREF) { + /* Detach HEAD at its current position */ + write_ref_sha1_or_die("HEAD", NULL, HEAD_sha1, + "detached for rewrite", REF_NODEREF); + } + + prepare_revision_walk(&rev); + while ((commit = get_revision(&rev)) != NULL) { + rewrite_commit(commit); + } + + move_head_forward(HEAD_ref, HEAD_sha1, !(flag & REF_ISSYMREF)); + + return 0; +} diff --git a/builtin.h b/builtin.h index 661a92f..e01d337 100644 --- a/builtin.h +++ b/builtin.h @@ -63,6 +63,7 @@ extern int cmd_rerere(int argc, const char **argv, const char *prefix); extern int cmd_rev_list(int argc, const char **argv, const char *prefix); extern int cmd_rev_parse(int argc, const char **argv, const char *prefix); extern int cmd_revert(int argc, const char **argv, const char *prefix); +extern int cmd_rewrite_commits(int argc, const char **argv, const char *prefix); extern int cmd_rm(int argc, const char **argv, const char *prefix); extern int cmd_runstatus(int argc, const char **argv, const char *prefix); extern int cmd_shortlog(int argc, const char **argv, const char *prefix); diff --git a/git.c b/git.c index a647f9c..ce02b0b 100644 --- a/git.c +++ b/git.c @@ -356,6 +356,7 @@ static void handle_internal_command(int argc, const char **argv) { "rev-list", cmd_rev_list, RUN_SETUP }, { "rev-parse", cmd_rev_parse, RUN_SETUP }, { "revert", cmd_revert, RUN_SETUP | NEED_WORK_TREE }, + { "rewrite-commits", cmd_rewrite_commits, RUN_SETUP }, { "rm", cmd_rm, RUN_SETUP | NEED_WORK_TREE }, { "runstatus", cmd_runstatus, RUN_SETUP | NEED_WORK_TREE }, { "shortlog", cmd_shortlog, RUN_SETUP | USE_PAGER }, diff --git a/t/t7005-rewrite-commits.sh b/t/t7005-rewrite-commits.sh new file mode 100755 index 0000000..93f438f --- /dev/null +++ b/t/t7005-rewrite-commits.sh @@ -0,0 +1,74 @@ +#!/bin/sh + +test_description='git-rewrite-commits' +. ./test-lib.sh + +make_commit () { + lower=$(echo $1 | tr A-Z a-z) + echo $lower > $lower + git add $lower + test_tick + git commit -m $1 + git tag $1 +} + +test_expect_success 'setup' ' + make_commit A + make_commit B + git checkout -b branch B + make_commit D + make_commit E + git checkout master + make_commit C + git checkout branch + git merge C + git tag F + make_commit G + make_commit H +' + +orig_H=$(git rev-parse H) +test_expect_success 'rewrite identically' ' + git-rewrite-commits +' + +test_expect_success 'result is really identical' ' + test $orig_H = $(git rev-parse H) +' + +# for lack of 'git-mv --cached d doh' +test_expect_success 'rewrite, renaming a specific file' ' + git-rewrite-commits --index-map \ + "git-ls-files -s | sed \"s-\\td\\\$-\\tdoh-\" | + GIT_INDEX_FILE=\$GIT_INDEX_FILE.new \ + git-update-index --index-info && + mv \$GIT_INDEX_FILE.new \$GIT_INDEX_FILE" +' + +test_expect_success 'test that the file was renamed' ' + test d = $(git-show H:doh) && + ! git-show H:d -- +' + +orig_H=$(git rev-parse H) +test_expect_success 'use index-map to move into a subdirectory' ' + git-rewrite-commits --index-map \ + "git ls-files -s | sed \"s-\\t-&newsubdir/-\" | + GIT_INDEX_FILE=\$GIT_INDEX_FILE.new \ + git update-index --index-info && + mv \$GIT_INDEX_FILE.new \$GIT_INDEX_FILE" && + test -z "$(git diff $orig_H H:newsubdir)"' + +test_expect_success 'remove merge commit' ' + git-rewrite-commits --grep="!Merge" && + test 2 = `git-log ^G^@ G --pretty=format:%P | wc -w`' + +test_expect_success 'remove first commit' ' + git-rewrite-commits --grep="!A"' + +test_expect_success 'stops when index map fails' ' + ! git-rewrite-commits --index-map false && + git-checkout branch +' + +test_done -- 1.5.3.rc0.68.geec71-dirty ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 16:23 ` [PATCH 4/4] Add git-rewrite-commits skimo @ 2007-07-08 16:37 ` Johannes Schindelin 2007-07-08 17:30 ` Sven Verdoolaege 2007-07-08 18:04 ` Steven Grimm 2007-07-08 23:56 ` Johannes Schindelin 1 sibling, 2 replies; 25+ messages in thread From: Johannes Schindelin @ 2007-07-08 16:37 UTC (permalink / raw) To: skimo; +Cc: git, Junio C Hamano Hi, On Sun, 8 Jul 2007, skimo@liacs.nl wrote: > bash-3.00$ time git-filter-branch test > Rewrite 274fe3dfb8e8c7d0a6ce05138bdb650de7b459ea (425/425) > Rewritten history saved to the test branch > > real 0m30.845s > user 0m13.400s > sys 0m19.640s > > bash-3.00$ time git-rewrite-commits > > real 0m0.223s > user 0m0.080s > sys 0m0.140s That is to be expected. After all, the first is a script. However, I really have ask: how often per hour do you want to run that program? > The command line is more reminiscent of git-log. > For example you can say > > git-rewrite-commits --all > > to incorporate grafts in all branches, or > > git rewrite-commits --author='!Darl McBribe' --all > > to remove all commits by Darl McBribe. I am really unhappy that so much is talked about filtering out commits. That is amost certainly not what you want in most cases. In particular, I suspect that most users would expect the _changes_ filtered out by such a command, which is just not true. Further, I do not see much value in making this operation faster. It is meant to be a one-time operation, for example when you open-source a product and have to cull a lot of history that you must not show for legal reasons. It is a one-shot operation. And having a script which works reliably is more valuable than having two programs, each with its own set of bugs. So there are two things I see here that filter-branch cannot do yet. The first is to rewrite _all_ branches, which should be easy to do, it only has to be done. The second is to rewrite the commit messages so that the hashes are mapped, too. But that should be relatively easy, too: you can provide a message filter, and you can use the provided "map" function. If this seems to be what many people need, you can write a simple function and put it into filter-branch for common use. BTW I am not at all opposed to changing the name from filter-branch to something that fits better. Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 16:37 ` Johannes Schindelin @ 2007-07-08 17:30 ` Sven Verdoolaege 2007-07-08 18:17 ` Johannes Schindelin 2007-07-08 18:04 ` Steven Grimm 1 sibling, 1 reply; 25+ messages in thread From: Sven Verdoolaege @ 2007-07-08 17:30 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, Junio C Hamano On Sun, Jul 08, 2007 at 05:37:22PM +0100, Johannes Schindelin wrote: > That is to be expected. After all, the first is a script. However, I > really have ask: how often per hour do you want to run that program? I have a project that needs some cleaning-up and I'd like to do it incrementally. I think I'll have to run it about a dozen times. > I am really unhappy that so much is talked about filtering out commits. > That is amost certainly not what you want in most cases. In particular, I > suspect that most users would expect the _changes_ filtered out by such a > command, which is just not true. I don't care about that either. I'm just mentioning it because it's mentioned in the git-filter-branch documentation (which you added). > The second is to rewrite the commit messages so that the hashes are > mapped, too. But that should be relatively easy, too: you can provide a > message filter, and you can use the provided "map" function. If this > seems to be what many people need, you can write a simple function and put > it into filter-branch for common use. It's not going to be me (as I sais, I don't like shell programming). skimo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 17:30 ` Sven Verdoolaege @ 2007-07-08 18:17 ` Johannes Schindelin 2007-07-08 19:11 ` Sven Verdoolaege 0 siblings, 1 reply; 25+ messages in thread From: Johannes Schindelin @ 2007-07-08 18:17 UTC (permalink / raw) To: skimo; +Cc: git, Junio C Hamano Hi, On Sun, 8 Jul 2007, Sven Verdoolaege wrote: > On Sun, Jul 08, 2007 at 05:37:22PM +0100, Johannes Schindelin wrote: > > That is to be expected. After all, the first is a script. However, I > > really have ask: how often per hour do you want to run that program? > > I have a project that needs some cleaning-up and I'd like to do it > incrementally. I think I'll have to run it about a dozen times. A dozen times seems not that bad, especially since you can run both programs on a limited set of commits. So the cost is not that big. > > I am really unhappy that so much is talked about filtering out > > commits. That is amost certainly not what you want in most cases. > > In particular, I suspect that most users would expect the _changes_ > > filtered out by such a command, which is just not true. > > I don't care about that either. I'm just mentioning it because it's > mentioned in the git-filter-branch documentation (which you added). Which I copied. And this is not the first, let alone the only example in filter-branch's documentation. > > The second is to rewrite the commit messages so that the hashes are > > mapped, too. But that should be relatively easy, too: you can provide > > a message filter, and you can use the provided "map" function. If > > this seems to be what many people need, you can write a simple > > function and put it into filter-branch for common use. > > It's not going to be me (as I sais, I don't like shell programming). Yes, you made that clear. However, this leaves things only in half-finished states. - "git filter-branch" did not learn the useful features that you seem to need, and - your builtin is at most a start of a builtin replacement for filter-branch, which changes the semantics, to be sure. I have no doubts that it will stay that way for a while, since this builtin seems to be good enough for what you want it to do. Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 18:17 ` Johannes Schindelin @ 2007-07-08 19:11 ` Sven Verdoolaege 0 siblings, 0 replies; 25+ messages in thread From: Sven Verdoolaege @ 2007-07-08 19:11 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, Junio C Hamano On Sun, Jul 08, 2007 at 07:17:16PM +0100, Johannes Schindelin wrote: > On Sun, 8 Jul 2007, Sven Verdoolaege wrote: > > On Sun, Jul 08, 2007 at 05:37:22PM +0100, Johannes Schindelin wrote: > > > I am really unhappy that so much is talked about filtering out > > > commits. That is amost certainly not what you want in most cases. > > > In particular, I suspect that most users would expect the _changes_ > > > filtered out by such a command, which is just not true. > > > > I don't care about that either. I'm just mentioning it because it's > > mentioned in the git-filter-branch documentation (which you added). > > Which I copied. And this is not the first, let alone the only example in > filter-branch's documentation. All I'm saying is that you shouldn't blame me for doing something you have done yourself. And if you're not blaming me, but just making a general comment, then all I can say is that I agree with your comment. > However, this leaves things only in half-finished states. > > - "git filter-branch" did not learn the useful features that you seem to > need, and > > - your builtin is at most a start of a builtin replacement for > filter-branch, which changes the semantics, to be sure. > > I have no doubts that it will stay that way for a while, since this > builtin seems to be good enough for what you want it to do. If people find rewrite-commits useful, but think that something is missing, then I'd be willing to look into that. I'm personally not likely to work on fiter-branch, but maybe someone else, possibly inspired by rewrite-commits, will. But it is true that rewrite-commits does everything I want now. skimo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 16:37 ` Johannes Schindelin 2007-07-08 17:30 ` Sven Verdoolaege @ 2007-07-08 18:04 ` Steven Grimm 2007-07-08 18:15 ` Sven Verdoolaege 2007-07-08 18:41 ` Johannes Schindelin 1 sibling, 2 replies; 25+ messages in thread From: Steven Grimm @ 2007-07-08 18:04 UTC (permalink / raw) To: Johannes Schindelin; +Cc: skimo, git, Junio C Hamano Johannes Schindelin wrote: > I am really unhappy that so much is talked about filtering out commits. > That is amost certainly not what you want in most cases. In particular, I > suspect that most users would expect the _changes_ filtered out by such a > command, which is just not true. > I agree that unless it's named and documented very carefully, users might expect this to tweak history such that the commits in question never happened (unlike revert, which of course adds a new commit and leaves the old ones alone.) The documentation for this command could stand to be more explicit about that. > Further, I do not see much value in making this operation faster. It is > meant to be a one-time operation, for example when you open-source a > product and have to cull a lot of history that you must not show for legal > reasons. It is a one-shot operation. > Your recent changes to git-rebase (which, BTW, are great) include a feature that's very similar to this: the "squash these commits together in my history" feature. That'd be my use case for this, when I want to publish my changes to other developers who don't care about all my intermediate checkpoints of work in progress, and when the commits I'm removing haven't been published anywhere else yet. With this command, I could do something like: git rewrite-commits --grep="!@@@checkpoint" git push and it would strip out all my intermediate checkpoint commits (assuming I've marked them as such in my commit comments, which I always do) before pushing to my project's shared repo. Right now that's a much more cumbersome, and very manual, operation. Even with the new git-rebase changes, I still have to pick out those commits by hand, and it assumes that I otherwise want to do a rebase in the first place. > So there are two things I see here that filter-branch cannot do yet. The > first is to rewrite _all_ branches, which should be easy to do, it only > has to be done. > I wonder if it makes sense to go that direction, though, or to make this command do the things that filter-branch can do, for the simple reason that filter-branch is a shell script and this is already a nice non-shell-dependent C program. Obviously you end up in the same place either way eventually once filter-branch percolates to the top of the "port these scripts to C" list, but it seems odd to me to port features from a C program back to a shell script only to have to convert the shell script to C later on. Ironically, this app doesn't really speed up the one thing I find too slow in filter-branch: the "remove a file from the tree in all revisions" case. To do that you still have to launch a filter app for every commit, which is especially bad when the file in question only appears in a few revisions deep in the history of a repo. This command points us in the direction of a "remove/rename this file in history" feature that doesn't require forking tens of thousands of child processes on a repo with lots of history. For that alone I think it's worthwhile, even though it's not there yet; that will never happen with a shell script. And yeah, that's not a frequent operation, but it's sure nice when even the infrequent operations are lightning fast. -Steve ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 18:04 ` Steven Grimm @ 2007-07-08 18:15 ` Sven Verdoolaege 2007-07-08 18:41 ` Johannes Schindelin 1 sibling, 0 replies; 25+ messages in thread From: Sven Verdoolaege @ 2007-07-08 18:15 UTC (permalink / raw) To: Steven Grimm; +Cc: Johannes Schindelin, git, Junio C Hamano On Sun, Jul 08, 2007 at 11:04:22AM -0700, Steven Grimm wrote: > This command points us in the direction of a "remove/rename this file in > history" feature that doesn't require forking tens of thousands of child > processes on a repo with lots of history. For that alone I think it's > worthwhile, even though it's not there yet; that will never happen with > a shell script. And yeah, that's not a frequent operation, but it's sure > nice when even the infrequent operations are lightning fast. I couldn't come up with a clean interface to specify removes/renames on the command line (other than copying the external command idea from cg-admin-rewritehist), but this is one of the operations I have to perform on the project I'm cleaning up, so if you have any suggestions, I'd be interested to hear them. skimo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 18:04 ` Steven Grimm 2007-07-08 18:15 ` Sven Verdoolaege @ 2007-07-08 18:41 ` Johannes Schindelin 2007-07-08 21:10 ` Sven Verdoolaege 1 sibling, 1 reply; 25+ messages in thread From: Johannes Schindelin @ 2007-07-08 18:41 UTC (permalink / raw) To: Steven Grimm; +Cc: skimo, git, Junio C Hamano Hi, On Sun, 8 Jul 2007, Steven Grimm wrote: > Johannes Schindelin wrote: > > > I am really unhappy that so much is talked about filtering out > > commits. That is amost certainly not what you want in most cases. In > > particular, I suspect that most users would expect the _changes_ > > filtered out by such a command, which is just not true. > > > > I agree that unless it's named and documented very carefully, users > might expect this to tweak history such that the commits in question > never happened (unlike revert, which of course adds a new commit and > leaves the old ones alone.) The documentation for this command could > stand to be more explicit about that. In the documentation of filter-branch, you have at least a couple of more useful examples, and the misunderstanding is cleared up. At least a little. > > Further, I do not see much value in making this operation faster. It > > is meant to be a one-time operation, for example when you open-source > > a product and have to cull a lot of history that you must not show for > > legal reasons. It is a one-shot operation. > > Your recent changes to git-rebase (which, BTW, are great) include a > feature that's very similar to this: the "squash these commits together > in my history" feature. That'd be my use case for this, when I want to > publish my changes to other developers who don't care about all my > intermediate checkpoints of work in progress, and when the commits I'm > removing haven't been published anywhere else yet. > > With this command, I could do something like: > > git rewrite-commits --grep="!@@@checkpoint" > git push git reset --head HEAD~<n> && git rev-list --pretty=oneline ..HEAD@{1} | while read sha1 message do git cherry-pick -n $sha1 || exit case "$message" in *@@@checkpoint*) ;; *) git commit -C $sha1;; esac done Yes, it is longer than the command line you gave. But it does not need any fancy new stuff, so it works even on 1.4.4.4. And you can make that a script, or if you really need to, an alias. Since you want to avoid the interactivity, rebase -i is kind of the wrong place to look. > and it would strip out all my intermediate checkpoint commits (assuming > I've marked them as such in my commit comments, which I always do) > before pushing to my project's shared repo. Right now that's a much more > cumbersome, and very manual, operation. Even with the new git-rebase > changes, I still have to pick out those commits by hand, and it assumes > that I otherwise want to do a rebase in the first place. If you _have_ to use some fancy stuff, you can fake an editor by resetting VISUAL, like illustrated in t3404. > > So there are two things I see here that filter-branch cannot do yet. > > The first is to rewrite _all_ branches, which should be easy to do, it > > only has to be done. > > I wonder if it makes sense to go that direction, though, or to make this > command do the things that filter-branch can do, for the simple reason > that filter-branch is a shell script and this is already a nice > non-shell-dependent C program. Exactly it is a C program. And if you are fleshing out issues, it is always easier to work on a shell script. That is why git-remote is still a perl script; the semantics are not yet flashed out, and it is orders of magnitudes slower to change them in C. And to test them in C. Besides, it is way easier to fsck up some obscure detail in C, since you have low-level access to everything. And those bugs are harder to spot. In shell, you have a limited interface, but for prototyping, that is actually an advantage. > Obviously you end up in the same place either way eventually once > filter-branch percolates to the top of the "port these scripts to C" > list, but it seems odd to me to port features from a C program back to a > shell script only to have to convert the shell script to C later on. Hey, if skimo would be willing to work until it is a complete replacement for filter-branch, and more, I will stop working on filter-branch at once. Somehow I did not get this impression, however. > Ironically, this app doesn't really speed up the one thing I find too > slow in filter-branch: the "remove a file from the tree in all > revisions" case. To do that you still have to launch a filter app for > every commit, which is especially bad when the file in question only > appears in a few revisions deep in the history of a repo. Semantics. That is the stage we are at. Trying to find the best semantics. Filtering out single files might well suffice for you. However, I think that this is only a very special corner case of something much bigger and more versatile. ATM I am quite certain that the eval'ed filters are the way to go. Since even in your corner case, you usually do it _once_. > This command points us in the direction of a "remove/rename this file in > history" feature that doesn't require forking tens of thousands of child > processes on a repo with lots of history. For that alone I think it's > worthwhile, even though it's not there yet; that will never happen with > a shell script. And yeah, that's not a frequent operation, but it's sure > nice when even the infrequent operations are lightning fast. It is sure nice when the frequent operations get lightning fast _first_. Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 18:41 ` Johannes Schindelin @ 2007-07-08 21:10 ` Sven Verdoolaege 2007-07-09 9:01 ` Johannes Sixt 0 siblings, 1 reply; 25+ messages in thread From: Sven Verdoolaege @ 2007-07-08 21:10 UTC (permalink / raw) To: Johannes Schindelin; +Cc: Steven Grimm, git, Junio C Hamano On Sun, Jul 08, 2007 at 07:41:53PM +0100, Johannes Schindelin wrote: > Hey, if skimo would be willing to work until it is a complete replacement > for filter-branch, and more, I will stop working on filter-branch at once. As I said, I'm willing to put some more time into this. However, I didn't want to wait until I had all the bells and whistles before sending something out. If people don't like the direction I'm going, then there is no point continuing and then I'll just use it for myself. I guess the major thing that is missing is --subdirectory-filter. Anything else? skimo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 21:10 ` Sven Verdoolaege @ 2007-07-09 9:01 ` Johannes Sixt 2007-07-09 9:48 ` Sven Verdoolaege 2007-07-09 11:57 ` Johannes Schindelin 0 siblings, 2 replies; 25+ messages in thread From: Johannes Sixt @ 2007-07-09 9:01 UTC (permalink / raw) To: git Sven Verdoolaege wrote: > I guess the major thing that is missing is --subdirectory-filter. > Anything else? Yes, how about this: $ git rewrite-commits --index-map ' testresult=$($HOME/bin/expensive-test); [ $testresult = t ] && $HOME/bin/tweak-index ' \ --commit-map ' [ $testresult = t ] && $HOME/bin/tweak-commit ' :-P -- Hannes ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-09 9:01 ` Johannes Sixt @ 2007-07-09 9:48 ` Sven Verdoolaege 2007-07-09 11:57 ` Johannes Schindelin 1 sibling, 0 replies; 25+ messages in thread From: Sven Verdoolaege @ 2007-07-09 9:48 UTC (permalink / raw) To: Johannes Sixt; +Cc: git On Mon, Jul 09, 2007 at 11:01:34AM +0200, Johannes Sixt wrote: > Sven Verdoolaege wrote: > > I guess the major thing that is missing is --subdirectory-filter. > > Anything else? > > Yes, how about this: > > $ git rewrite-commits --index-map ' > testresult=$($HOME/bin/expensive-test); > [ $testresult = t ] && $HOME/bin/tweak-index ' \ > --commit-map ' > [ $testresult = t ] && $HOME/bin/tweak-commit ' > > :-P You can write the result of expensive-test to a temporary file. skimo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-09 9:01 ` Johannes Sixt 2007-07-09 9:48 ` Sven Verdoolaege @ 2007-07-09 11:57 ` Johannes Schindelin 1 sibling, 0 replies; 25+ messages in thread From: Johannes Schindelin @ 2007-07-09 11:57 UTC (permalink / raw) To: Johannes Sixt; +Cc: git Hi, On Mon, 9 Jul 2007, Johannes Sixt wrote: > Sven Verdoolaege wrote: > > I guess the major thing that is missing is --subdirectory-filter. > > Anything else? > > Yes, how about this: > > $ git rewrite-commits --index-map ' > testresult=$($HOME/bin/expensive-test); > [ $testresult = t ] && $HOME/bin/tweak-index ' \ > --commit-map ' > [ $testresult = t ] && $HOME/bin/tweak-commit ' As skimo almost hinted: this will not work. Once the index "map" exits, testresult is forgotten. Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 16:23 ` [PATCH 4/4] Add git-rewrite-commits skimo 2007-07-08 16:37 ` Johannes Schindelin @ 2007-07-08 23:56 ` Johannes Schindelin 2007-07-09 9:47 ` Sven Verdoolaege 2007-07-09 12:36 ` Jeff King 1 sibling, 2 replies; 25+ messages in thread From: Johannes Schindelin @ 2007-07-08 23:56 UTC (permalink / raw) To: skimo; +Cc: git, Junio C Hamano Hi, I am way too tired to comment in detail, but here are some preliminary findings: - your command line interface is very nice and elegant. - you use a lot of what was in cg-admin-rewritehist (including adjustments I made in the documentation for filter-branch), but you also make it more confusing for people used to that tool: - instead of leaving the original branches as they are, you overwrite them. That's okay. But then you put the originals into refs/rewritten. Without cg-admin-rewritehist using that name for the _result_, you could explain your way out of confusion. As it is, you cannot. - in spite of doing the same as cg-admin-rewritehist with filters, you call them maps. But they are no maps. They are manipulators, you can call them mutators or filters, too. Given what people know of cg-admin-rewritehist, you really should keep the name "filter". - the name "map" itself is used in cg-admin-rewritehist, to map commit names from old to new. By using that name differently, again you contribute to confusion, for no good reason. - neat idea to abuse the decorator for rewriting purposes. - get_one_line() is a misnomer. It wants to be named get_linelen(). - instead of spawning read-tree, you could use unpack_trees() to boost performance even more. But I guess it is probably left for later, to make it easier to review the patch. - your memspn() and memcspn() functions are very inefficient. Better walk the memory, introduce a GIT_HEXCHAR into ctype.c, ishexchar() into git-compat-util.h, and use that. - The example you give with "git update-index --remove" can fail, right? Tell the user about that. - The commit filter again deviates from the usage in cg-admin-rewritehist. I can see that you wanted to make it more versatile. However, it makes the tool potentially also a bit more cumbersome to use. Besides, you use a temporary file where there is no need to. - "map" is missing. This is a function that you can use in all filters in cg-admin-rewritehist, except (unfortunately) in the commit filter (for technical reasons). Alas, these technical reasons also prevent such a function to be usable by any of the filters ("maps", as you call them). - the more fundamental problem with the missing "map", I do not see a reasonable way to get the same functionality from any of the code snippets passed to rewrite-commits. Indeed, even the workaround of cg-admin-rewritehist, to read $TEMP/map/$sha1, does not work, since you are keeping it all in memory. On IRC, gitster suggested to use a bidirectional pipe (such as stdin/stdout) to get at the new commit names, but because of buffering, I guess this is no joy. As commented on IRC, the env/tree/parent/msg filters of cg-admin-rewritehist could be all emulated by commit filters. However, that would be really inconvenient. So at a later stage, these would have to be integrated into rewrite-commits (even if it would be possible to drive rewrite-commits by a shell porcelain, but I guess you are opposed to that idea, since you want to do everything else in C, too). However, the biggest and very real problem is that your filters do not have a "map" function to get the rewritten sha1 for a given sha1. That is what makes the filters so versatile, though, since you can skip revisions by much more complex rules than just greps on the commit message or header. But hey, maybe it _is_ time to rethink the whole filter business, and introduce some kind of regular expression based action language. Something like git rewrite-commits -e '/^author Darl McBribe/skip-commit' \ -e 'substitute/^author Joahnnes/author Johannes/header' \ -e 'substitute/poreclain/porcelain/body' \ -e 'rewrite-commit-names' Hmm? Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 23:56 ` Johannes Schindelin @ 2007-07-09 9:47 ` Sven Verdoolaege 2007-07-09 12:57 ` Johannes Schindelin 2007-07-09 12:36 ` Jeff King 1 sibling, 1 reply; 25+ messages in thread From: Sven Verdoolaege @ 2007-07-09 9:47 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, Junio C Hamano Thanks for the review. On Mon, Jul 09, 2007 at 12:56:04AM +0100, Johannes Schindelin wrote: > - you use a lot of what was in cg-admin-rewritehist (including > adjustments I made in the documentation for filter-branch), but you also > make it more confusing for people used to that tool: > > - instead of leaving the original branches as they are, you > overwrite them. That's okay. But then you put the originals into > refs/rewritten. Without cg-admin-rewritehist using that name > for the _result_, you could explain your way out of confusion. > As it is, you cannot. I thought you had to specify the name of the new branch on the command line. Anyway, I don't really care about the name of this hierarchy. I just picked a name that is somewhat related to "rewrite-commits". Suggestions are welcome. I could also just not create them. The old values are also available in reflog. > - in spite of doing the same as cg-admin-rewritehist with filters, > you call them maps. But they are no maps. They are manipulators, > you can call them mutators or filters, too. Given what people > know of cg-admin-rewritehist, you really should keep the name > "filter". Nonesense. They are not filters (http://en.wikipedia.org/wiki/Filter_%28higher-order_function%29). They are maps (http://en.wikipedia.org/wiki/Map_%28higher-order_function%29). (In cg-admin-rewritehist some of them are (partial) filters, but the ones I have are not filters). I could extend the commit-map to remove the commit it the output is empty, but it'd still be closer to a map than to a filter. (You can map a commit to nothing, but a filter can't alter the elements of a list, it only determines which elements are kept.) > - the name "map" itself is used in cg-admin-rewritehist, to map > commit names from old to new. By using that name differently, > again you contribute to confusion, for no good reason. There is a good reason to call what I call maps maps. They _are_ maps. Still, I'm open for suggestions. > - get_one_line() is a misnomer. It wants to be named get_linelen(). Hmmm... I guess you missed Linus' mistake when he introduced it in commit.c (e3bc7a3bc7b77f44d686003f5a9346a135529f73). Do you want me to rename that one as well? > - instead of spawning read-tree, you could use unpack_trees() to boost > performance even more. But I guess it is probably left for later, to > make it easier to review the patch. Yeah, it looked a bit tricky for an initial implementation, especially where I move the HEAD forward. > - The example you give with "git update-index --remove" can fail, right? Yes. Spectacularly, even. > - The commit filter again deviates from the usage in cg-admin-rewritehist. > I can see that you wanted to make it more versatile. However, it makes > the tool potentially also a bit more cumbersome to use. Besides, you use > a temporary file where there is no need to. Are you saying I should use two pipes? What if the commit message is larger than the pipe buffer? > - the more fundamental problem with the missing "map", I do not see a > reasonable way to get the same functionality from any of the code > snippets passed to rewrite-commits. Indeed, even the workaround of > cg-admin-rewritehist, to read $TEMP/map/$sha1, does not work, since you > are keeping it all in memory. On IRC, gitster suggested to use a > bidirectional pipe (such as stdin/stdout) to get at the new commit > names, but because of buffering, I guess this is no joy. I could add an option to write $TEMP/map/$sha1, but it's not clear to me when such a map would be useful. Please enlighten me. > As commented on IRC, the env/tree/parent/msg filters of > cg-admin-rewritehist could be all emulated by commit filters. However, > that would be really inconvenient. So at a later stage, these would have > to be integrated into rewrite-commits (even if it would be possible to > drive rewrite-commits by a shell porcelain, but I guess you are opposed to > that idea, since you want to do everything else in C, too). I'm not opposed to running a few commands and connecting stuff in shell. (See git-submodule add, although I admit that I would have preferred to do all of it in C.) > However, the biggest and very real problem is that your filters do not > have a "map" function to get the rewritten sha1 for a given sha1. That is > what makes the filters so versatile, though, since you can skip revisions > by much more complex rules than just greps on the commit message or > header. I thought your were opposed to the idea of skipping commits, since you still carry along the changes in those commits. Do you have a use case? > But hey, maybe it _is_ time to rethink the whole filter business, and > introduce some kind of regular expression based action language. Something > like > > git rewrite-commits -e '/^author Darl McBribe/skip-commit' \ What's wrong with --author='!Darl McBribe' ? > -e 'substitute/^author Joahnnes/author Johannes/header' \ > -e 'substitute/poreclain/porcelain/body' \ > -e 'rewrite-commit-names' Hmmm... some of these would basically need a builtin sed. I was thinking about adding --remove and --rename, though. skimo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-09 9:47 ` Sven Verdoolaege @ 2007-07-09 12:57 ` Johannes Schindelin 2007-07-09 13:49 ` Sven Verdoolaege 0 siblings, 1 reply; 25+ messages in thread From: Johannes Schindelin @ 2007-07-09 12:57 UTC (permalink / raw) To: skimo; +Cc: git, Junio C Hamano Hi, On Mon, 9 Jul 2007, Sven Verdoolaege wrote: > On Mon, Jul 09, 2007 at 12:56:04AM +0100, Johannes Schindelin wrote: > > > - you use a lot of what was in cg-admin-rewritehist (including > > adjustments I made in the documentation for filter-branch), but you > > also make it more confusing for people used to that tool: > > > > - instead of leaving the original branches as they are, you > > overwrite them. That's okay. But then you put the originals into > > refs/rewritten. Without cg-admin-rewritehist using that name > > for the _result_, you could explain your way out of confusion. > > As it is, you cannot. > > I thought you had to specify the name of the new branch on the command > line. Anyway, I don't really care about the name of this hierarchy. > I just picked a name that is somewhat related to "rewrite-commits". > Suggestions are welcome. How much more explicit should I formulate my suggestion? > I could also just not create them. The old values are also available in > reflog. There are two types of convenience. One for the programmer, and one for the user. As a user I would definitely like to see which branches were actually changed. So even if you _have_ to keep your current behaviour, namely that the results are stored under the original names, you should store the original refs in something like refs/original (overridable by a command line option), but _only if_ they actually changed. > > - in spite of doing the same as cg-admin-rewritehist with filters, > > you call them maps. But they are no maps. They are manipulators, > > you can call them mutators or filters, too. Given what people > > know of cg-admin-rewritehist, you really should keep the name > > "filter". > > Nonesense. Quite a strong word. Let's see if you can defend it: > They are not filters > (http://en.wikipedia.org/wiki/Filter_%28higher-order_function%29). They > are maps (http://en.wikipedia.org/wiki/Map_%28higher-order_function%29). Aha. Functional programming. Since when do we do functional programming here? C++ is much closer to what we do. So let's see: http://en.wikipedia.org/wiki/Map_%28C%2B%2B_container%29 Does not apply at all here. But does something else apply? Ah, there is something about mathematics. The basis of programming. Maybe it is helpful: http://en.wikipedia.org/wiki/Map_%28mathematics%29 Ah, no! It only says that it is a function. But there is a mention about partial maps! And you said something about that: > (In cg-admin-rewritehist some of them are (partial) filters, > but the ones I have are not filters). Alas, Wikipedia says about a partial function that it is a function which maps from one space into another space (not necessarily different spaces, though), but does not necessarily have a result for every input. But in that sense, everything is a map. Even rewrite-commits. And commits. (Maybe you should rename that to map-maps?) So maybe more luck with "filter"? Since we still do not write in Haskell or something similar obscure, let's see if Wikipedia has something which is more likely applying to us. Yes. Software. That surely must apply: http://en.wikipedia.org/wiki/Filter_%28software%29 and even more to the point: UNIX filters: http://en.wikipedia.org/wiki/Filter_%28Unix%29 You will see that both definitions apply _perfectly_ to what you do here. You pipe the commit into the "map" and write it out as a new commit! The _perfect_ example for a UNIX filter. > I could extend the commit-map to remove the commit it the output is > empty, but it'd still be closer to a map than to a filter. (You can map > a commit to nothing, but a filter can't alter the elements of a list, it > only determines which elements are kept.) This is not what the commit-filter of cg-admin-rewritehist does. If at all, your commit-map is a better example for a filter than rewritehist's commit-filter is. So, you cannot defend that bold statement. Enough ranted about an unmerited rant, let's get back to the interesting stuff: All you demonstrated here is that the way "commit-map"s work right now is not sufficient to make more complicated rewrite rules. You cannot split one commit in two, and you cannot skip it. > > - the name "map" itself is used in cg-admin-rewritehist, to map > > commit names from old to new. By using that name differently, > > again you contribute to confusion, for no good reason. > > There is a good reason to call what I call maps maps. They _are_ maps. > Still, I'm open for suggestions. Are you? Well, my suggestion still stands... I mean, the suggestion I already gave in the post you quoted. You know, to keep the name as it was in the program you based your work on? Okay, I'll be explicit for once: Filter. > > - get_one_line() is a misnomer. It wants to be named get_linelen(). > > Hmmm... I guess you missed Linus' mistake when he introduced it > in commit.c (e3bc7a3bc7b77f44d686003f5a9346a135529f73). Yes. But finger pointing will not help you here. I reviewed _your_ code. > Do you want me to rename that one as well? Be my guest. But let's do that independently, will ya? > > - instead of spawning read-tree, you could use unpack_trees() to boost > > performance even more. But I guess it is probably left for later, to > > make it easier to review the patch. > > Yeah, it looked a bit tricky for an initial implementation, especially > where I move the HEAD forward. Later. > > - The example you give with "git update-index --remove" can fail, right? > > Yes. Spectacularly, even. Does that mean you acknowledge that the man page should tell about this? And probably also reveal the "|| :" remedy? > > - The commit filter again deviates from the usage in cg-admin-rewritehist. > > I can see that you wanted to make it more versatile. However, it makes > > the tool potentially also a bit more cumbersome to use. Besides, you use > > a temporary file where there is no need to. > > Are you saying I should use two pipes? Umm. Why not? > What if the commit message is larger than the pipe buffer? You start_command(). Then you write() until it is all written, or the pipe is broken. Then you get the output via index_pipe(). Which is a single sha1. I do not understand your question. > > - the more fundamental problem with the missing "map", I do not see a > > reasonable way to get the same functionality from any of the code > > snippets passed to rewrite-commits. Indeed, even the workaround of > > cg-admin-rewritehist, to read $TEMP/map/$sha1, does not work, since you > > are keeping it all in memory. On IRC, gitster suggested to use a > > bidirectional pipe (such as stdin/stdout) to get at the new commit > > names, but because of buffering, I guess this is no joy. > > I could add an option to write $TEMP/map/$sha1, but it's not clear > to me when such a map would be useful. Please enlighten me. To be flexible, you have to have a way to ask "what was this commit rewritten to?" from within a filter. Example: maybe you expect the rewritten commit to have a tree identical to commits in a certain branch, which was _also_ possibly rewritten. Maybe you want to mark these pairs, for example by a tag. Then you need to get the sha1 of the rewritten commits. > > As commented on IRC, the env/tree/parent/msg filters of > > cg-admin-rewritehist could be all emulated by commit filters. However, > > that would be really inconvenient. So at a later stage, these would > > have to be integrated into rewrite-commits (even if it would be > > possible to drive rewrite-commits by a shell porcelain, but I guess > > you are opposed to that idea, since you want to do everything else in > > C, too). > > I'm not opposed to running a few commands and connecting stuff in shell. > (See git-submodule add, although I admit that I would have preferred to > do all of it in C.) Yes, but your original version was very intrusive and hard to review. In contrast, the simple shell script we have now is hackable by many, and easily reviewed. Indeed, it takes but a couple of minutes to verify that it does what it should do and has no side effects. But I agree, it is easy enough to run the commands from rewrite-commits.c. The harder part is to provide an infrastructure which makes it useful. To enhance on the above example: you're rewriting the commit messages so that commit names are rewritten to match the rewritten commits. That is possible by a message filter in cg-admin-rewritehist. But now somebody comes along, and says "I have a history I need to rewrite. All bug fixes. The commit names were all abbreviated in the commit messages, but they always had 'commit ' in front of them. I want to rewrite them, too." No chance without a "map" function. Except to bug you to change the C code for this very special need. By contrast, filter-branch does not need to be changed for that at all. > > However, the biggest and very real problem is that your filters do not > > have a "map" function to get the rewritten sha1 for a given sha1. That > > is what makes the filters so versatile, though, since you can skip > > revisions by much more complex rules than just greps on the commit > > message or header. > > I thought your were opposed to the idea of skipping commits, since > you still carry along the changes in those commits. Are you trying to misunderstand me? You can tell the user how to skip single commits. But you have to warn them that they might be wanting something different, because the word "commit" not only implies a "revision", which you would skip in this context, but also a "commit diff", which you would _not_. The "much more complex" rules clearly are for other use cases, which brings me to your next question: > Do you have a use case? _You_ already mentioned it. Subdirectory filters. And if you think a little further, you can easily see that for this to become useful, you _have_ to have a "map" function, too. Because you are likely wanting to use rewrite-commits, or filter-branch, to transform big repositories into super/subprojects. > > But hey, maybe it _is_ time to rethink the whole filter business, and > > introduce some kind of regular expression based action language. > > Something like > > > > git rewrite-commits -e '/^author Darl McBribe/skip-commit' \ > > What's wrong with --author='!Darl McBribe' ? It is a very special use case. Not always will you be able to get all the information from the commit object you need for conditional operations. My example only showed that you can do the same with that syntax. But imagine what you could do if we just added a small syntactical sugar: -e '?has-path:README?substitute/v2/v3/' > > -e 'substitute/^author Joahnnes/author Johannes/header' \ > > -e 'substitute/poreclain/porcelain/body' \ > > -e 'rewrite-commit-names' > > Hmmm... some of these would basically need a builtin sed. I was thinking > about adding --remove and --rename, though. IMHO you will not get happy by introducing ever more, but still restricted functionality. At the moment, cg-admin-rewritehist can do much more than what rewrite-commits can do, because the eval'ed filters give you a much greater freedom in what you can do (since "map" is available, and you can even modify the commit name mapping by the filters on the go: they can edit .git-rewrite/!), and how you can do it (no need to work around certain limitations by using temporary files). A lot of the speed of rewrite-commits stems from the fact that it is limited in that way. Don't get me wrong. I _want_ rewrite-commits to replace filter-branch. But there are fundamental problems with filter-branch, and there are fundamental problems with having the same functionality from within C. And it appears to me that you do not even try to address the latter fundamental problems. Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-09 12:57 ` Johannes Schindelin @ 2007-07-09 13:49 ` Sven Verdoolaege 2007-07-09 14:11 ` Johannes Schindelin 0 siblings, 1 reply; 25+ messages in thread From: Sven Verdoolaege @ 2007-07-09 13:49 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, Junio C Hamano On Mon, Jul 09, 2007 at 01:57:10PM +0100, Johannes Schindelin wrote: > On Mon, 9 Jul 2007, Sven Verdoolaege wrote: > > On Mon, Jul 09, 2007 at 12:56:04AM +0100, Johannes Schindelin wrote: > > I thought you had to specify the name of the new branch on the command > > line. Anyway, I don't really care about the name of this hierarchy. > > I just picked a name that is somewhat related to "rewrite-commits". > > Suggestions are welcome. > > How much more explicit should I formulate my suggestion? [..] > actually changed. So even if you _have_ to keep your current behaviour, > namely that the results are stored under the original names, you should > store the original refs in something like refs/original (overridable by a I suppose I can live with "original". > command line option), but _only if_ they actually changed. That's what I do right now. > You will see that both definitions apply _perfectly_ to what you do here. > You pipe the commit into the "map" and write it out as a new commit! > The _perfect_ example for a UNIX filter. OK. Makes sense. I've always found that naming confusing, but it has a long tradition. Sorry for the strong word. > > > - The example you give with "git update-index --remove" can fail, right? > > > > Yes. Spectacularly, even. > > Does that mean you acknowledge that the man page should tell about this? > And probably also reveal the "|| :" remedy? Yes. (Am I on trial, here?) > > > - The commit filter again deviates from the usage in cg-admin-rewritehist. > > > I can see that you wanted to make it more versatile. However, it makes > > > the tool potentially also a bit more cumbersome to use. Besides, you use > > > a temporary file where there is no need to. > > > > Are you saying I should use two pipes? > > Umm. Why not? > > > What if the commit message is larger than the pipe buffer? > > You start_command(). Then you write() until it is all written, or the > pipe is broken. Then you get the output via index_pipe(). Which is a > single sha1. I do not understand your question. Ah, but that is not how my commit "filter" works right now. It should produce the whole commit on stdout (as mentioned in the documentation). So the default filter is "cat" (and not "git-hash-object -t commit --stdin", as you seem to assume). How about I change that to a filter that accepts a single SHA1 and produces zero or more SHA1's as output? A filter for the current rewrite-commits would then be replaced by "xargs | git-cat-file commit | original-filter | git-hash-object -t commit --stdin" > To enhance on the above example: you're rewriting the commit messages so > that commit names are rewritten to match the rewritten commits. That is > possible by a message filter in cg-admin-rewritehist. > > But now somebody comes along, and says "I have a history I need to > rewrite. All bug fixes. The commit names were all abbreviated in the > commit messages, but they always had 'commit ' in front of them. I want > to rewrite them, too." The current git-rewrite-commits will rewrite all SHA1's it can find, irrespective of any 'commit ' that may precede it. I guess I'm trying to misunderstand you again. > Are you trying to misunderstand me? No. > > > But hey, maybe it _is_ time to rethink the whole filter business, and > > > introduce some kind of regular expression based action language. > > > Something like > > > > > > git rewrite-commits -e '/^author Darl McBribe/skip-commit' \ > > > > What's wrong with --author='!Darl McBribe' ? > > It is a very special use case. Not always will you be able to get all the > information from the commit object you need for conditional operations. > My example only showed that you can do the same with that syntax. But > imagine what you could do if we just added a small syntactical sugar: > > -e '?has-path:README?substitute/v2/v3/' So you want to introduce a whole language? Isn't that a bit over-engineering? skimo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-09 13:49 ` Sven Verdoolaege @ 2007-07-09 14:11 ` Johannes Schindelin 2007-07-09 14:42 ` Sven Verdoolaege 0 siblings, 1 reply; 25+ messages in thread From: Johannes Schindelin @ 2007-07-09 14:11 UTC (permalink / raw) To: skimo; +Cc: git, Junio C Hamano Hi, On Mon, 9 Jul 2007, Sven Verdoolaege wrote: > On Mon, Jul 09, 2007 at 01:57:10PM +0100, Johannes Schindelin wrote: > > > On Mon, 9 Jul 2007, Sven Verdoolaege wrote: > > > > > What if the commit message is larger than the pipe buffer? > > > > You start_command(). Then you write() until it is all written, or the > > pipe is broken. Then you get the output via index_pipe(). Which is a > > single sha1. I do not understand your question. > > Ah, but that is not how my commit "filter" works right now. It should > produce the whole commit on stdout (as mentioned in the documentation). > So the default filter is "cat" (and not "git-hash-object -t commit > --stdin", as you seem to assume). Sorry. Somehow I misread that index_pipe() for a read_pipe(). > How about I change that to a filter that accepts a single SHA1 and > produces zero or more SHA1's as output? > A filter for the current rewrite-commits would then be replaced > by "xargs | git-cat-file commit | original-filter | git-hash-object -t commit --stdin" You still have not addressed the fundamental problem! If the commit filter takes the things as a whole commit buffer or as a single sha1, or if it writes one, two or -1 commits is not really a fundamental problem. The real problem is that this filter can only act on _one_ commit. Yes, it wants to change only one commit at a time, but it might need information from _other_ commits, too! > > To enhance on the above example: you're rewriting the commit messages > > so that commit names are rewritten to match the rewritten commits. > > That is possible by a message filter in cg-admin-rewritehist. > > > > But now somebody comes along, and says "I have a history I need to > > rewrite. All bug fixes. The commit names were all abbreviated in the > > commit messages, but they always had 'commit ' in front of them. I > > want to rewrite them, too." > > The current git-rewrite-commits will rewrite all SHA1's it can find, > irrespective of any 'commit ' that may precede it. Even abbreviated ones? > > > What's wrong with --author='!Darl McBribe' ? > > > > It is a very special use case. Not always will you be able to get all > > the information from the commit object you need for conditional > > operations. My example only showed that you can do the same with that > > syntax. But imagine what you could do if we just added a small > > syntactical sugar: > > > > -e '?has-path:README?substitute/v2/v3/' > > So you want to introduce a whole language? > Isn't that a bit over-engineering? We already have a tool which is powerful enough to do that. Yes, it is a little complicated to operate, and yes, it is slower than your version. But darn it, it _does_ more than your version. It is a pity that you did not address the fundamental problem, so I have to spend time (that I should really spend differently) thinking about it. Alas, I think I have a solution. You need a flag: --write-sha1-mappings=<directory> Yes, it makes your code slower again, but only if you need those mappings. Yes, it is way less convenient than the "map" function, but then you could automatically write a script into that directory, providing the convenience functions, and exporting the path so that you can say --commit-filter '. "$G"; ... map "$sha1"' and also add convenience functions "save <varname>" and "restore <varname>" so that finally a bit of convenience is restored to the filter writers. But those are no longer fundamental problems. I'd be glad if you could put these suggestions to use in rewrite-commits. While at it, you should also change the semantics for the commit filter, probably even less so than you suggested: since you rewrite the parents, and the tree, you should continue to pipe this information into the commit filter. That would be another place for a convenience function "commit", which does the same as "git hash-object -w --stdin". Okay now. To be precise, here is my wish list: * rename the darned things to "filter" again. * --write-sha1-mappings=<directory> (or --write-commit-mappings), possibly defaulting to .git/mappings/. Be careful not to overwrite an existing such directory. * change the semantics of the commit filter: the output is a list (possibly empty) of replacement sha1's for this commit. * if any filters are called, provide a script with convenience functions, and an environment variable pointing to it. These functions should include: * map * commit * save * restore Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-09 14:11 ` Johannes Schindelin @ 2007-07-09 14:42 ` Sven Verdoolaege 2007-07-09 14:59 ` Johannes Schindelin 0 siblings, 1 reply; 25+ messages in thread From: Sven Verdoolaege @ 2007-07-09 14:42 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, Junio C Hamano On Mon, Jul 09, 2007 at 03:11:45PM +0100, Johannes Schindelin wrote: > On Mon, 9 Jul 2007, Sven Verdoolaege wrote: > > The current git-rewrite-commits will rewrite all SHA1's it can find, > > irrespective of any 'commit ' that may precede it. > > Even abbreviated ones? Yes. I'll add that to the documentation. > * rename the darned things to "filter" again. > > * --write-sha1-mappings=<directory> (or --write-commit-mappings), possibly > defaulting to .git/mappings/. Be careful not to overwrite an existing > such directory. > > * change the semantics of the commit filter: the output is a list > (possibly empty) of replacement sha1's for this commit. > > * if any filters are called, provide a script with convenience functions, > and an environment variable pointing to it. These functions should > include: > > * map > * commit > * save > * restore Hmm... you're tricking me into write shell code. skimo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-09 14:42 ` Sven Verdoolaege @ 2007-07-09 14:59 ` Johannes Schindelin 0 siblings, 0 replies; 25+ messages in thread From: Johannes Schindelin @ 2007-07-09 14:59 UTC (permalink / raw) To: skimo; +Cc: git, Junio C Hamano Hi, On Mon, 9 Jul 2007, Sven Verdoolaege wrote: > On Mon, Jul 09, 2007 at 03:11:45PM +0100, Johannes Schindelin wrote: > > On Mon, 9 Jul 2007, Sven Verdoolaege wrote: > > > The current git-rewrite-commits will rewrite all SHA1's it can find, > > > irrespective of any 'commit ' that may precede it. > > > > Even abbreviated ones? > > Yes. I'll add that to the documentation. That is definitely something you want to control. I have seen commit messages referencing certain hexadecimal numbers, and they were definitely no commit names. The shorter they are, the more likely they are to be rewritten by your magic. > > * rename the darned things to "filter" again. > > > > * --write-sha1-mappings=<directory> (or --write-commit-mappings), possibly > > defaulting to .git/mappings/. Be careful not to overwrite an existing > > such directory. > > > > * change the semantics of the commit filter: the output is a list > > (possibly empty) of replacement sha1's for this commit. > > > > * if any filters are called, provide a script with convenience functions, > > and an environment variable pointing to it. These functions should > > include: > > > > * map > > * commit > > * save > > * restore > > Hmm... you're tricking me into write shell code. Ah, oh well. If you do the rest, I'll do the shell code. Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-08 23:56 ` Johannes Schindelin 2007-07-09 9:47 ` Sven Verdoolaege @ 2007-07-09 12:36 ` Jeff King 2007-07-09 13:16 ` Johannes Schindelin 1 sibling, 1 reply; 25+ messages in thread From: Jeff King @ 2007-07-09 12:36 UTC (permalink / raw) To: Johannes Schindelin; +Cc: skimo, git, Junio C Hamano On Mon, Jul 09, 2007 at 12:56:04AM +0100, Johannes Schindelin wrote: > But hey, maybe it _is_ time to rethink the whole filter business, and > introduce some kind of regular expression based action language. Something > like > > git rewrite-commits -e '/^author Darl McBribe/skip-commit' \ > -e 'substitute/^author Joahnnes/author Johannes/header' \ > -e 'substitute/poreclain/porcelain/body' \ > -e 'rewrite-commit-names' This is starting to look an awful lot like sed. Which is good, but I wonder if we can get sed to do the heavy lifting. I have had success with similar systems by writing the data structure out into a canonical format, editing it as text, and then "applying" the result. Something like: git rewrite-generate oldbranch | sed 's/^author Darl McBribe/skip-commit/' sed 's/^author Joahnnes/author Johannes/' | git-rewrite-commit-names | git rewrite-apply newbranch where git-rewrite-generate would generate something like git-log output, and git-rewrite-apply would, given a log-ish input, write a new history branch. A nice advantage is that it makes things like this very natural: git rewrite-generate oldbranch >history vi history git rewrite-apply newbranch <history which allows interactive editing. Of course, this is: - possibly inefficient, since rewrite-apply doesn't know what you changed and what you didn't change; it would have to recalculate a lot of sha1 hashes. - doesn't really deal with actual tree rewriting, unless there is some canonical text format for that, and then we are talking about making things _really_ inefficient Hmm. Which makes me think that maybe 'git-format-patch' is really git-rewrite-generate, and 'git-am' is really git-rewrite-apply (but with some extensions to preserve committer info). So maybe a bad idea, but I thought I would throw it out there. -Peff ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/4] Add git-rewrite-commits 2007-07-09 12:36 ` Jeff King @ 2007-07-09 13:16 ` Johannes Schindelin 0 siblings, 0 replies; 25+ messages in thread From: Johannes Schindelin @ 2007-07-09 13:16 UTC (permalink / raw) To: Jeff King; +Cc: skimo, git, Junio C Hamano Hi, On Mon, 9 Jul 2007, Jeff King wrote: > On Mon, Jul 09, 2007 at 12:56:04AM +0100, Johannes Schindelin wrote: > > > But hey, maybe it _is_ time to rethink the whole filter business, and > > introduce some kind of regular expression based action language. Something > > like > > > > git rewrite-commits -e '/^author Darl McBribe/skip-commit' \ > > -e 'substitute/^author Joahnnes/author Johannes/header' \ > > -e 'substitute/poreclain/porcelain/body' \ > > -e 'rewrite-commit-names' > > This is starting to look an awful lot like sed. Yep. Fully intended. > Which is good, but I wonder if we can get sed to do the heavy lifting. I > have had success with similar systems by writing the data structure out > into a canonical format, editing it as text, and then "applying" the > result. > > Something like: > git rewrite-generate oldbranch | > sed 's/^author Darl McBribe/skip-commit/' > sed 's/^author Joahnnes/author Johannes/' | > git-rewrite-commit-names | > git rewrite-apply newbranch > > where git-rewrite-generate would generate something like git-log output, > and git-rewrite-apply would, given a log-ish input, write a new history > branch. A nice advantage is that it makes things like this very natural: > git rewrite-generate oldbranch >history > vi history > git rewrite-apply newbranch <history > > which allows interactive editing. Granted. It is a really nice idea, but again you lack the map function. > Of course, this is: > - possibly inefficient, since rewrite-apply doesn't know what you > changed and what you didn't change; it would have to recalculate > a lot of sha1 hashes. > - doesn't really deal with actual tree rewriting, unless there is some > canonical text format for that, and then we are talking about making > things _really_ inefficient > > Hmm. Which makes me think that maybe 'git-format-patch' is really > git-rewrite-generate, and 'git-am' is really git-rewrite-apply (but with > some extensions to preserve committer info). Of course, you miss out the merge commits. > So maybe a bad idea, but I thought I would throw it out there. Not so bad. What you describe as possibly inefficient has been implemented very, very efficiently already: git-fast-import! So your idea brings me to another idea: Why not write git-fast-export? Actually, let's not all it that, since repo.or.cz has that name already (curiously enough, it is used for scripts exporting from _other_ SCMs, feeding to git-fast-import...), but git-fast-dump. The output should be _exactly_ as expected by git-fast-import, so that "git fast-dump | git fast-import" would be a nop, data-wise. Of course, there is a more fundamental problem with that approach: how to act on the commit message, conditional on the commit header? I know, with perl it would be really easy. But then you have to write a complete perl script, and the whole purpose of this admin-rewritehist/filter-branch/rewrite-commits frackass is that it should be _easy_ to use. Ciao, Dscho ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2007-07-09 15:07 UTC | newest] Thread overview: 25+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-07-08 16:23 [PATCH 0/4] Add git-rewrite-commits skimo 2007-07-08 16:23 ` [PATCH 1/4] export get_short_sha1 skimo 2007-07-08 16:23 ` [PATCH 2/4] export add_ref_decoration skimo 2007-07-08 16:23 ` [PATCH 3/4] revision: mark commits that didn't match a pattern for later use skimo 2007-07-08 16:23 ` [PATCH 4/4] Add git-rewrite-commits skimo 2007-07-08 16:37 ` Johannes Schindelin 2007-07-08 17:30 ` Sven Verdoolaege 2007-07-08 18:17 ` Johannes Schindelin 2007-07-08 19:11 ` Sven Verdoolaege 2007-07-08 18:04 ` Steven Grimm 2007-07-08 18:15 ` Sven Verdoolaege 2007-07-08 18:41 ` Johannes Schindelin 2007-07-08 21:10 ` Sven Verdoolaege 2007-07-09 9:01 ` Johannes Sixt 2007-07-09 9:48 ` Sven Verdoolaege 2007-07-09 11:57 ` Johannes Schindelin 2007-07-08 23:56 ` Johannes Schindelin 2007-07-09 9:47 ` Sven Verdoolaege 2007-07-09 12:57 ` Johannes Schindelin 2007-07-09 13:49 ` Sven Verdoolaege 2007-07-09 14:11 ` Johannes Schindelin 2007-07-09 14:42 ` Sven Verdoolaege 2007-07-09 14:59 ` Johannes Schindelin 2007-07-09 12:36 ` Jeff King 2007-07-09 13:16 ` Johannes Schindelin
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).