git.vger.kernel.org archive mirror
* Understanding git filter-branch --subdirectory-filter behaviour
From: David Tweed @ 2008-05-20 20:11 UTC (permalink / raw)
  To: git mailing list

Hi, I'm experimenting with git filter-branch --subdirectory-filter
(being specific since it appears to have several special code branches
in the script) and getting results that I don't understand. Firstly,
can I confirm what appears implied by the man-page but I can't find
explicitly stated:

git filter-branch <how to filter> HEAD

is expected to filter the branch that HEAD is on across the entire
DAG, all the way back to the initial commit, even if this is a DAG with
multiple branches splitting off and remerging?

I'm trying this on a copy of a repo containing a directory WRITING
(which exists for most, though not quite all, of the repo's history) and getting:

$ git filter-branch --subdirectory-filter WRITING/ HEAD
Rewrite 42f24be8d8198738134a19471697b39359199fa3 (351/351)
Ref 'refs/heads/master' was rewritten

$ git rev-list HEAD | wc
     55      55    2255

Looking at this with gitk and git log confirms 55 commits, and the
first commit is the one immediately after the first merge encountered
(the commit that occurred just after the merge) when walking backwards
in history. Is this something that would be expected?

Digging a little into the shell-script I find the list of commits is
generated with

git rev-list --reverse --topo-order --default HEAD --parents HEAD
--full-history -- WRITING

and (adding --pretty so I can easily read it) running this manually
gives 351 entries and looks to contain the expected commits. So I'm
confused about what's happening.
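
For reference, a minimal sketch of producing that candidate list on a
throwaway repository. Only the rev-list options come from the command
quoted above; the repository layout, file names and commit messages are
hypothetical and are not taken from the repo discussed here.

```shell
#!/bin/sh
# Hypothetical throwaway repo; only the rev-list options mirror the thread.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

# Commit 1: create the directory of interest.
mkdir WRITING
echo draft > WRITING/notes.txt
git add WRITING
git commit -q -m "create WRITING"

# Commit 2: a change that does not touch WRITING.
echo code > tool.sh
git add tool.sh
git commit -q -m "unrelated change outside WRITING"

# Commit 3: another change inside WRITING.
echo more >> WRITING/notes.txt
git commit -q -a -m "edit WRITING"

# Same options as quoted above, with the wrapped line rejoined.
# --parents prints each commit followed by its (rewritten) parents.
candidates=$(git rev-list --reverse --topo-order --default HEAD --parents HEAD \
    --full-history -- WRITING)
echo "$candidates"

# Number of commits rev-list selected for the path.
candidate_count=$(echo "$candidates" | wc -l | tr -d ' ')
total_count=$(git rev-list HEAD | wc -l | tr -d ' ')
echo "candidates: $candidate_count of $total_count commits"
```

Comparing the candidate count with `git rev-list HEAD | wc -l` in the
filtered repository is the check that shows the 351 versus 55 gap.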

If this is expected, is there a refspec I'm missing to get
filter-branch to filter the entire repo?

(FWIW, git version 1.5.5.1.316.g377d9 on x86-64 Linux.)

Many thanks,

-- 
cheers, dave tweed__________________________
david.tweed@gmail.com
Rm 124, School of Systems Engineering, University of Reading.
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot


* Re: Understanding git filter-branch --subdirectory-filter behaviour
From: Johannes Sixt @ 2008-05-21  6:26 UTC (permalink / raw)
  To: David Tweed; +Cc: git mailing list

David Tweed wrote:
> $ git filter-branch --subdirectory-filter WRITING/ HEAD
> Rewrite 42f24be8d8198738134a19471697b39359199fa3 (351/351)
> Ref 'refs/heads/master' was rewritten
> 
> $ git rev-list HEAD | wc
>      55      55    2255
> 
...
> 
> Digging a little into the shell-script I find the list of commits is
> generated with
> 
> git rev-list --reverse --topo-order --default HEAD --parents HEAD
> --full-history -- WRITING
> 
> and (adding --pretty so I can easily read it) running this manually
> gives 351 entries and looks to contain the expected commits. So I'm
> confused what's happening?

That's difficult to tell without a peek at the repository.

Did you compare 'gitk HEAD' to 'gitk HEAD -- WRITING'? I'd expect the
latter to be a subset of the former. Note that with a path specified
"history simplification" happens, which means that you won't see as many
merges as when no path is specified.
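
A minimal sketch of that effect on a throwaway repository, assuming a
side branch that never touches the directory. The branch name, file
names and commit messages are hypothetical; the script only compares
how many commits each form of rev-list reports.

```shell
#!/bin/sh
# Hypothetical throwaway repo: one merge whose side branch ignores WRITING.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email "you@example.com"
git config user.name "Example"

mkdir WRITING
echo one > WRITING/a.txt
git add WRITING
git commit -q -m "add WRITING/a.txt"

# Remember the default branch name (it varies between git setups).
main=$(git symbolic-ref --short HEAD)

# Side branch: changes a file outside WRITING only.
git checkout -q -b side
echo x > other.txt
git add other.txt
git commit -q -m "side: touch other.txt only"

# Main line: changes a file inside WRITING.
git checkout -q "$main"
echo two >> WRITING/a.txt
git commit -q -a -m "main: edit WRITING/a.txt"

# Merge the side branch back; this creates a merge commit.
git merge -q -m "merge side" side

# Count commits with no path, with a path, and with --full-history.
all=$(git rev-list HEAD | wc -l | tr -d ' ')
limited=$(git rev-list HEAD -- WRITING | wc -l | tr -d ' ')
full=$(git rev-list --full-history HEAD -- WRITING | wc -l | tr -d ' ')
echo "all=$all path-limited=$limited full-history=$full"
```

The path-limited count is smaller because the merge is identical to its
first parent as far as WRITING is concerned, so the merge and the side
branch are simplified away.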

-- Hannes


* Re: Understanding git filter-branch --subdirectory-filter behaviour
From: David Tweed @ 2008-05-22 18:05 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git mailing list

On Wed, May 21, 2008 at 7:26 AM, Johannes Sixt <j.sixt@viscovery.net> wrote:
> David Tweed wrote:
> That's difficult to tell without a peek at the repository.
>
> Did you compare 'gitk HEAD' to 'gitk HEAD -- WRITING'? I'd expect the
> latter to be a subset of the former. Note that with a path specified
> "history simplification" happens, which means that you won't see as many
> merges as when no path is specified.

Just did that in the before-filtering repository. "gitk HEAD --
WRITING" doesn't show any branches after the simplification, but it
does go back to the first commit in the repository that creates WRITING
(presumably simplifying out several branches that didn't affect
WRITING). The filtered repository, by contrast, starts on the commit
immediately after the first merge you encounter walking backwards in
time. I was prepared for the branch structure to possibly simplify
while keeping all the commits that change that directory, but was a
bit surprised that it stopped before the first merge.

<in original>
$ git log HEAD -- WRITING | wc -l
   2033

<in filtered repo>
$ git log | wc -l
329

So filter-branch is definitely producing a smaller history than
path-limited git log shows. If
you would be interested in looking at the actual repo (about 17M) let
me know and I'll send you tarball details via personal mail.

Anyway, many thanks for the insight and assistance,
-- 
cheers, dave tweed__________________________
david.tweed@gmail.com
Rm 124, School of Systems Engineering, University of Reading.
"while having code so boring anyone can maintain it, use Python." --
attempted insult seen on slashdot

