* [PATCH] git-filter-branch: add --egrep-filter option
@ 2011-04-15 22:50 Michael O'Cleirigh
2011-04-16 8:16 ` Johannes Sixt
0 siblings, 1 reply; 5+ messages in thread
From: Michael O'Cleirigh @ 2011-04-15 22:50 UTC (permalink / raw)
To: git
The --subdirectory-filter will look for a single directory and then rewrite
history to make its content the root. This is ok except for cases where we
want to retain history of those files before they were moved into that
directory.
The --egrep-filter option allows specifying an egrep regex for the files in the
tree of each commit to keep. For example:
The directories we want are A, B, C and D, and they exist over different
lifetimes: A and B sometimes exist together, then B and C, and finally D.
e.g. git-filter-branch --egrep-filter "(A|B|C|D)"
Each commit will then contain a different combination of A, B, C and D (up to all four).
---
git-filter-branch.sh | 12 ++++++++++++
1 files changed, 12 insertions(+), 0 deletions(-)
diff --git a/git-filter-branch.sh b/git-filter-branch.sh
index 962a93b..2392ad6 100755
--- a/git-filter-branch.sh
+++ b/git-filter-branch.sh
@@ -101,6 +101,7 @@ USAGE="[--env-filter <command>] [--tree-filter <command>]
[--index-filter <command>] [--parent-filter <command>]
[--msg-filter <command>] [--commit-filter <command>]
[--tag-name-filter <command>] [--subdirectory-filter <directory>]
+ [--egrep-filter <filter>]
[--original <namespace>] [-d <directory>] [-f | --force]
[<rev-list options>...]"
@@ -122,6 +123,7 @@ filter_msg=cat
filter_commit=
filter_tag_name=
filter_subdir=
+filter_egrep=
orig_namespace=refs/original/
force=
prune_empty=
@@ -191,6 +193,10 @@ do
filter_subdir="$OPTARG"
remap_to_ancestor=t
;;
+ --egrep-filter)
+ filter_egrep="$OPTARG"
+ remap_to_ancestor=t
+ ;;
--original)
orig_namespace=$(expr "$OPTARG/" : '\(.*[^/]\)/*$')/
;;
@@ -317,6 +323,12 @@ while read commit parents; do
}
esac || die "Could not initialize the index"
+ if [ "$filter_egrep" ]; then
+
+ git ls-tree $commit | egrep "$filter_egrep" | git mktree | xargs git read-tree -i -m
+
+ fi
+
GIT_COMMIT=$commit
export GIT_COMMIT
git cat-file commit "$commit" >../commit ||
--
1.7.2.3
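For readers who want to see what the pipeline in the last hunk does to a single commit, here is a minimal sketch that is not part of the patch. It builds a throwaway repository and runs the same ls-tree, egrep, mktree steps by hand. It assumes `grep -E` in place of `egrep`, and it anchors the regex on the tab that precedes the entry name, because each `git ls-tree` line also carries the mode, type and object name and an unanchored pattern is matched against those as well. The directory names and the demo identity are made up for illustration.

```shell
#!/bin/sh
# Sketch only: shows what "ls-tree | egrep | mktree" yields for one commit.
set -e

# Demo identity so that "git commit" works without any user configuration.
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# Throwaway repository with two wanted directories and two unwanted entries.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir A B other
echo a >A/file
echo b >B/file
echo x >other/file
echo top >README
git add .
git commit -q -m "initial"

# Each ls-tree line is "<mode> <type> <object><TAB><name>", so the regex is
# anchored on the tab to match the name only (an assumption of this sketch;
# the patch hands the user's regex to egrep unchanged).
tab=$(printf '\t')
filtered_tree=$(
	git ls-tree HEAD |
	grep -E "${tab}(A|B|C|D)\$" |
	git mktree
)

# List the top-level entries of the filtered tree on one line.
kept=$(git ls-tree --name-only "$filtered_tree" | xargs)
echo "kept: $kept"
```

Only the top-level entries are filtered; everything below a kept directory is carried over untouched because the subtree objects are reused as they are.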
* Re: [PATCH] git-filter-branch: add --egrep-filter option
2011-04-15 22:50 [PATCH] git-filter-branch: add --egrep-filter option Michael O'Cleirigh
@ 2011-04-16 8:16 ` Johannes Sixt
2011-04-17 1:45 ` Michael O'Cleirigh
0 siblings, 1 reply; 5+ messages in thread
From: Johannes Sixt @ 2011-04-16 8:16 UTC (permalink / raw)
To: Michael O'Cleirigh; +Cc: git
On Saturday, 16 April 2011, Michael O'Cleirigh wrote:
> The --subdirectory-filter will look for a single directory and then rewrite
> history to make its content the root. This is ok except for cases where we
> want to retain history of those files before they were moved into that
> directory.
>
> The --egrep-filter option allows specifying an egrep regex for the files in
> the tree of each commit to keep. For example:
>
> Directories we want are A, B, C, D and they exist in several different
> lifetimes. A and B exist sometimes together then B and C and finally then
> D.
>
> e.g. git-filter-branch --egrep-filter "(A|B|C|D)"
>
> Each commit will then contain different combinations of A or B or C or D
> (up to A and B and C and D).
Why do you need a new --...-filter option for this? Your implementation is
merely an instance of an --index-filter, and at that a very specialized one,
which operates only at the top-most directory level.
> + git ls-tree $commit | egrep "$filter_egrep" | git mktree | xargs git read-tree -i -m
-- Hannes
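To make the --index-filter point concrete, the following is a rough sketch of the same pipeline passed to an unpatched filter-branch. It is an illustration under stated assumptions, not something posted in this thread: `KEEP_RE` is a hypothetical variable name, exported so that the filter can read it; `grep -E` stands in for `egrep`; plain `git read-tree` is used rather than the patch's `read-tree -i -m`; and `--prune-empty` is added to drop commits that no longer change anything. `GIT_COMMIT` is the variable filter-branch itself exports for each commit.

```shell
#!/bin/sh
# Sketch only: the patch's pipeline expressed as a plain --index-filter.
set -e

# Demo identity so that "git commit" works without any user configuration.
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# Newer Git versions pause with a warning unless this is set.
FILTER_BRANCH_SQUELCH_WARNING=1
export FILTER_BRANCH_SQUELCH_WARNING

# Throwaway history: A appears first, B later, "other" is unwanted.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir A other
echo a >A/file
echo x >other/file
git add .
git commit -q -m "add A and other"
echo y >>other/file
git commit -q -a -m "touch other only"
mkdir B
echo b >B/file
git add .
git commit -q -m "add B"

# KEEP_RE is a hypothetical name; the tab anchors the match on the entry name.
KEEP_RE="$(printf '\t')(A|B|C|D)\$"
export KEEP_RE

# Rebuild the index of every commit from the filtered top-level tree.
git filter-branch -f --prune-empty --index-filter '
	git ls-tree "$GIT_COMMIT" |
	grep -E "$KEEP_RE" |
	git mktree |
	xargs git read-tree
' HEAD >/dev/null

commits=$(git rev-list --count HEAD)
kept=$(git ls-tree --name-only HEAD | xargs)
echo "commits: $commits, kept: $kept"
```

In this toy history the commit that touches only `other` is pruned, so two of the three commits survive.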
* Re: [PATCH] git-filter-branch: add --egrep-filter option
2011-04-16 8:16 ` Johannes Sixt
@ 2011-04-17 1:45 ` Michael O'Cleirigh
2011-04-19 8:01 ` Jonathan Nieder
0 siblings, 1 reply; 5+ messages in thread
From: Michael O'Cleirigh @ 2011-04-17 1:45 UTC (permalink / raw)
To: Johannes Sixt; +Cc: git
Hi Johannes,
Thanks for commenting on this patch.
> On Saturday, 16 April 2011, Michael O'Cleirigh wrote:
>> The --subdirectory-filter will look for a single directory and then rewrite
>> history to make its content the root. This is ok except for cases where we
>> want to retain history of those files before they were moved into that
>> directory.
>>
>> The --egrep-filter option allows specifying an egrep regex for the files in
>> the tree of each commit to keep. For example:
>>
>> Directories we want are A, B, C, D and they exist in several different
>> lifetimes. A and B exist sometimes together then B and C and finally then
>> D.
>>
>> e.g. git-filter-branch --egrep-filter "(A|B|C|D)"
>>
>> Each commit will then contain different combinations of A or B or C or D
>> (up to A and B and C and D).
> Why do you need a new --...-filter option for this? Your implementation is
> merely an instance of an --index-filter, and at that a very specialized one,
> which operates only at the top-most directory level.
>
At work we needed to split two more modules out of a 1400-revision
repository that we had imported from Subversion.
Each had originally been created under different names at the top level
and was only recently moved into a more logical single-directory-per-project
structure. When we first ran filter-branch with
--subdirectory-filter we got only 6 commits, instead of the 100 commits
we ended up with after using the --egrep-filter method.
I tried a tree-filter first, but it was slow, and then the same method as
an index filter was slower (I would search for the paths that didn't
match the filter (egrep -v "pattern") and then remove each of them).
With this egrep-filter option it took only 5 minutes per repo, versus more
than 8 hours for the tree-filter approach.
I posted to the list in case it might be useful to others, but I didn't
really know if it would be useful or not.
After considering your comment I have to agree with you that it is a
special case of index-filter and probably not useful/general for enough
other cases to justify adding in a new command line option.
Regards,
Mike
* Re: [PATCH] git-filter-branch: add --egrep-filter option
2011-04-17 1:45 ` Michael O'Cleirigh
@ 2011-04-19 8:01 ` Jonathan Nieder
2011-04-19 16:03 ` Phil Hord
0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Nieder @ 2011-04-19 8:01 UTC (permalink / raw)
To: Michael O'Cleirigh; +Cc: Johannes Sixt, git
Hi,
Michael O'Cleirigh wrote:
> After considering your comment I have to agree with you that it is a
> special case of index-filter and probably not useful/general for
> enough other cases to justify adding in a new command line option.
Now, why do you give up so easily? ;-)
Surely what your patch is hinting at is the possibility of an
--ls-tree-filter (for lack of a better name) that works with trees
without the overhead of unpacking them. On the other hand I do agree
with Hannes that allowing only "egrep" is a bit overspecialized.
In practice I would have used something like
	--commit-filter='
		tree=$1 &&
		new_tree=$(
			git ls-tree "$tree" |
			egrep "$filter_egrep" |
			git mktree
		) &&
		shift &&
		git_commit_non_empty_tree "$new_tree" "$@"
	'
so another (simpler?) solution might be an entry for the EXAMPLES
section of the manual along these lines.
Ciao,
Jonathan
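As a self-contained illustration of the --commit-filter idea above, here is a hedged sketch that can be run against a throwaway repository. It is not taken from the thread: `KEEP_RE` is a hypothetical variable name, exported because the commit filter runs in a separate shell; `grep -E` replaces `egrep`; and the regex is anchored on the tab before the entry name so that it cannot match the mode, type or object name in the ls-tree output. `git_commit_non_empty_tree` is the helper that filter-branch makes available to commit filters.

```shell
#!/bin/sh
# Sketch only: filter top-level tree entries in a --commit-filter.
set -e

# Demo identity so that "git commit" works without any user configuration.
GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.invalid
GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.invalid
export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

# Newer Git versions pause with a warning unless this is set.
FILTER_BRANCH_SQUELCH_WARNING=1
export FILTER_BRANCH_SQUELCH_WARNING

# Throwaway history: A appears first, B later, "other" is unwanted.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir A other
echo a >A/file
echo x >other/file
git add .
git commit -q -m "add A and other"
echo y >>other/file
git commit -q -a -m "touch other only"
mkdir B
echo b >B/file
git add .
git commit -q -m "add B"

# KEEP_RE is a hypothetical name; it must be exported to reach the filter.
KEEP_RE="$(printf '\t')(A|B|C|D)\$"
export KEEP_RE

# $1 is the tree filter-branch would commit; the rest are "-p <parent>" pairs.
git filter-branch -f --commit-filter '
	tree=$1 &&
	new_tree=$(
		git ls-tree "$tree" |
		grep -E "$KEEP_RE" |
		git mktree
	) &&
	shift &&
	git_commit_non_empty_tree "$new_tree" "$@"
' HEAD >/dev/null

commits=$(git rev-list --count HEAD)
kept=$(git ls-tree --name-only HEAD | xargs)
echo "commits: $commits, kept: $kept"
```

Because the filter works on tree objects only, no files are checked out and the index is not rewritten per commit; the helper skips commits whose filtered tree equals that of their single parent.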
* Re: [PATCH] git-filter-branch: add --egrep-filter option
2011-04-19 8:01 ` Jonathan Nieder
@ 2011-04-19 16:03 ` Phil Hord
0 siblings, 0 replies; 5+ messages in thread
From: Phil Hord @ 2011-04-19 16:03 UTC (permalink / raw)
To: Jonathan Nieder; +Cc: Michael O'Cleirigh, Johannes Sixt, git
On 04/19/2011 04:01 AM, Jonathan Nieder wrote:
> Hi,
>
> Michael O'Cleirigh wrote:
>
>> After considering your comment I have to agree with you that it is a
>> special case of index-filter and probably not useful/general for
>> enough other cases to justify adding in a new command line option.
> Now, why do you give up so easily? ;-)
>
> Surely what your patch is hinting at is the possibility of an
> --ls-tree-filter (for lack of a better name) that works with trees
> without the overhead of unpacking them.
I have invented something similar[*] for git three different times in
three different ways. The last one is the fastest and uses
git-fast-import instead of filter-branch, but I was sure one of the
filter-branch methods would have been more efficient. More examples
would be very welcome.
Phil
[*] My implementations mostly focused on applying a ".gitignore" file to
the repo history. I spent many hours on this. I wound up with a script
that also handles file and branch renaming (the latter important so I
can run different filters on the same repo and drop results into
different branches). It's not patch-worthy (yet), but I would have
loved to have more examples along the way such as the mktree one you
just provided.