git.vger.kernel.org archive mirror
* import files w/ history
@ 2009-03-03 12:54 Csaba Henk
  2009-03-03 13:00 ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Csaba Henk @ 2009-03-03 12:54 UTC (permalink / raw)
  To: git

Hi,

How could I import some files from an unrelated git repo with history?
And if I'd like to use different paths? Eg:

Say the other repo has these files:

lib/trees/rb_tree.{c,h}

and I want to import them into my repo as

include/rb_tree.h
src/rb_tree.c

(In fact I just have a single file to import and I don't want to
vary paths, yet I'm curious about this extended case, too.)

Thanks,
Csaba


* Re: import files w/ history
  2009-03-03 12:54 import files w/ history Csaba Henk
@ 2009-03-03 13:00 ` Jeff King
  2009-03-06 13:29   ` Csaba Henk
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2009-03-03 13:00 UTC (permalink / raw)
  To: Csaba Henk; +Cc: git

On Tue, Mar 03, 2009 at 12:54:54PM +0000, Csaba Henk wrote:

> How could I import some files from an unrelated git repo with history?

Just "git pull" from the other repo, which will include all of its
history. If you want to pretend that the other history contains just a
subset of the true history, use "git filter-branch" to rewrite it first.
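A minimal sketch of the plain import, assuming two throwaway repositories created under a temporary directory (the names "mine" and "other" are hypothetical). It uses fetch plus merge, which is what "git pull" does in two steps. Note that Git 2.9 and later refuse to merge unrelated histories unless --allow-unrelated-histories is given; the git of this thread's era did not need that flag.

```shell
set -e
# Identity for the demo commits (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# "mine" is the importing repo, "other" the unrelated one.
git init -q "$work/mine"
git init -q "$work/other"

cd "$work/other"
echo 'int x;' > rb_tree.c
git add rb_tree.c
git commit -q -m 'other: add rb_tree.c'

cd "$work/mine"
echo hello > README
git add README
git commit -q -m 'mine: initial commit'

# Fetch the other repo's HEAD and merge it, keeping its full history.
git fetch -q "$work/other" HEAD
git merge -q --no-edit --allow-unrelated-histories FETCH_HEAD

git log --oneline
```

After the merge, "mine" contains both root commits plus a merge commit that joins them.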

> And if I'd like to use different paths? Eg:
> 
> Say the other repo has these files:
> 
> lib/trees/rb_tree.{c,h}
> 
> and I want to import them into my repo as
> 
> include/rb_tree.h
> src/rb_tree.c

If you are rewriting the history, you can rename the files as you see
fit. There is even an example of this in "git help filter-branch".
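One possible way to do the rename, sketched on a throwaway repository; this is not the example from the manual page, just an illustration under stated assumptions. The index filter empties the index, lists only the two wanted files from the original commit with "git ls-tree", rewrites their paths with sed, and feeds the result to "git update-index --index-info". The demo file contents and commit messages are made up. FILTER_BRANCH_SQUELCH_WARNING only matters on recent git versions, where it suppresses the startup warning.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1

other=$(mktemp -d)
git init -q "$other"
cd "$other"

mkdir -p lib/trees
echo 'int x;' > lib/trees/rb_tree.c
echo '/* header */' > lib/trees/rb_tree.h
echo readme > README
git add .
git commit -q -m 'add rb_tree'
echo more >> README
git commit -q -a -m 'unrelated change'
echo 'int y;' >> lib/trees/rb_tree.c
git commit -q -a -m 'extend rb_tree.c'

# Keep only the two files, under their new paths; drop commits that
# become empty once everything else is gone.
git filter-branch --prune-empty --index-filter '
  git rm -r -q --cached --ignore-unmatch . &&
  git ls-tree -r "$GIT_COMMIT" -- lib/trees/rb_tree.c lib/trees/rb_tree.h |
    sed -e "s|lib/trees/rb_tree\.h\$|include/rb_tree.h|" \
        -e "s|lib/trees/rb_tree\.c\$|src/rb_tree.c|" |
    git update-index --index-info
' HEAD

git ls-files
git log --oneline
```

The rewritten branch can then be fetched and merged into the importing repository like any other unrelated history.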

-Peff


* Re: import files w/ history
  2009-03-03 13:00 ` Jeff King
@ 2009-03-06 13:29   ` Csaba Henk
  2009-03-06 15:33     ` Miklos Vajna
  2009-03-08  0:10     ` Jeff King
  0 siblings, 2 replies; 8+ messages in thread
From: Csaba Henk @ 2009-03-06 13:29 UTC (permalink / raw)
  To: git

On 2009-03-03, Jeff King <peff@peff.net> wrote:
> On Tue, Mar 03, 2009 at 12:54:54PM +0000, Csaba Henk wrote:
>
>> How could I import some files from an unrelated git repo with history?
>
> Just "git pull" from the other repo, which will include all of its
> history. If you want to pretend that the other history contains just a
> subset of the true history, use "git filter-branch" to rewrite it first.

Thanks Jeff, but it didn't work well for a large repo. At least not what
I could carve out myself.

The repo in question is the DragonFlyBSD repository, and I wanted to
cut out the history of sys/dev/disk/vn/vn.c. After reading
git-filter-branch(1) I came up with the following: first I wanted to
select those commits where the file in question was modified. I tried
to use the following filtration:

$ git filter-branch --commit-filter '
   if [ $# -lt 3 ] || git diff --stat $3 $1 | grep -q 'sys/dev/disk/vn/vn\.c'
   then
     git commit-tree "$@"
   else
     skip_commit "$@"
   fi' HEAD

It should select those commits where vn.c differs from the vn.c in the _first_
parent, so probably it's not exactly what I want, but anyway, I went on
to give it a try.

I have even tested this filter script on a small repo and it worked
well. Then I ran it against the Dfly repo, and after 23 hours of
processing I ended up with:

...
23575b3e0b087120b0475ae93c505c72a9779fdb
35ac2f0aa5ac0ca78109781817c524fa354e8691
23575b3e0b087120b0475ae93c505c72a9779fdb
35ac2f0aa5ac0ca78109781817c524fa354e8691
23575b3e0b087120b0475ae93c505c72a9779fdb
35ac2f0aa5ac0ca78109781817c524fa354e8691
23575b3e0b087120b0475ae93c505c72a9779fdb
35ac2f0aa5ac0ca78109781817c524fa354e8691
23575b3e0b087120b0475ae93c505c72a9779fdb
35ac2f0aa5ac0ca78109781817c524fa354e8691
WARNING: Ref 'refs/heads/__rewrite' points to the first one now.

And the result is completely f*cked up.
Neither those two commits which occur repeatedly at the end of the
output, nor the commit at the actual position of the __rewrite branch
has a parent, and the upstream commits from which these were derived
didn't affect vn.c.

  *  *  *

OK, I then tried to do more RTFM and be more clever and efficient, and
find a way to specify directly those commits which affect vn.c. As "git
rev-list" can be invoked like "git rev-list <commit> <path>", and the
synopsis of "git filter-branch" is like

 git filter-branch [options] [--] [<rev-list options>...]

I then gave a try to:

$ git filter-branch --  master sys/dev/disk/vn/vn.c

but no dice -- I got:

  fatal: ambiguous argument 'sys/dev/disk/vn/vn.c': unknown revision or
  path not in the working tree.
  Use '--' to separate paths from revisions
  Could not get the commits

Any idea?

Thanks,
Csaba


* Re: import files w/ history
  2009-03-06 13:29   ` Csaba Henk
@ 2009-03-06 15:33     ` Miklos Vajna
  2009-03-08  0:10     ` Jeff King
  1 sibling, 0 replies; 8+ messages in thread
From: Miklos Vajna @ 2009-03-06 15:33 UTC (permalink / raw)
  To: Csaba Henk; +Cc: git


On Fri, Mar 06, 2009 at 01:29:38PM +0000, Csaba Henk <csaba-ml@creo.hu> wrote:
> $ git filter-branch --commit-filter '
>    if [ $# -lt 3 ] || git diff --stat $3 $1 | grep -q 'sys/dev/disk/vn/vn\.c'
>    then
>      git commit-tree "$@"
>    else
>      skip_commit "$@"
>    fi' HEAD

Did you notice --subdirectory-filter? Maybe it would be more efficient
to run --subdirectory-filter sys/dev/disk/vn first, then you can play
with the resulting small repo to suit your needs. :)
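A small sketch of the --subdirectory-filter suggestion, run on a throwaway repository whose layout only mimics the path from this thread (file contents and commit messages are made up). After the rewrite, the chosen directory becomes the project root and only commits that touched it remain.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Only relevant on recent git versions, which warn before running.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir -p sys/dev/disk/vn
echo v1 > sys/dev/disk/vn/vn.c
echo a > other.c
git add .
git commit -q -m 'add vn.c and other.c'
echo b >> other.c
git commit -q -a -m 'touch other.c only'
echo v2 >> sys/dev/disk/vn/vn.c
git commit -q -a -m 'touch vn.c'

# Make sys/dev/disk/vn the new root and keep only its history.
git filter-branch --subdirectory-filter sys/dev/disk/vn HEAD

git ls-files
git log --oneline
```

In this demo the result is a two-commit history with vn.c at the top level, which is much cheaper to post-process than the full repository.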



* Re: import files w/ history
  2009-03-06 13:29   ` Csaba Henk
  2009-03-06 15:33     ` Miklos Vajna
@ 2009-03-08  0:10     ` Jeff King
  2009-03-09  5:15       ` Csaba Henk
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff King @ 2009-03-08  0:10 UTC (permalink / raw)
  To: Csaba Henk; +Cc: git

On Fri, Mar 06, 2009 at 01:29:38PM +0000, Csaba Henk wrote:

> $ git filter-branch --commit-filter '
>    if [ $# -lt 3 ] || git diff --stat $3 $1 | grep -q 'sys/dev/disk/vn/vn\.c'
>    then
>      git commit-tree "$@"
>    else
>      skip_commit "$@"
>    fi' HEAD

Wow, I'll bet that was slow to run. And it's not really what you want.

You are picking commits that changed a particular file, and then
including the _whole_ tree. Remember that commits really record a tree
state; we only think of them as "changes" because they point to a prior
commit with its own tree state. So you are just selecting some subset of
the states, but not cutting down the tree in each state.

What you really want to do is say:

  - for every commit, narrow the tree to _just_ the one file

  - if there were no changes in the narrowed tree, just throw out the
    commit

You can use an --index-filter to do the former, and a --commit-filter to
do the latter (or just use --prune-empty, which is a shorthand).
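The two steps above can be sketched as follows, on a throwaway repository. This is one commonly used form of the idea, not necessarily the recipe from the linked article: the index filter empties the index and then restores just the one file from the commit being rewritten, and --prune-empty drops the commits that no longer change anything. The demo commits are made up.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Only relevant on recent git versions, which warn before running.
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir -p sys/dev/disk/vn
echo v1 > sys/dev/disk/vn/vn.c
echo a > other.c
git add .
git commit -q -m 'add vn.c and other.c'
echo b >> other.c
git commit -q -a -m 'touch other.c only'
echo v2 >> sys/dev/disk/vn/vn.c
git commit -q -a -m 'touch vn.c'

# Narrow every commit's tree to the single file, then discard
# commits whose narrowed tree equals their parent's.
git filter-branch --prune-empty --index-filter '
  git rm -r -q --cached --ignore-unmatch . &&
  git reset -q "$GIT_COMMIT" -- sys/dev/disk/vn/vn.c
' HEAD

git ls-files
git log --oneline
```

Unlike the commit-filter attempt quoted above, every surviving commit here contains only vn.c, so the history is safe to merge into another project.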

Another poster had a similar problem, and you can see the right
filter-branch recipe there:

  http://article.gmane.org/gmane.comp.version-control.git/111991

>   *  *  *
> 
> OK, I then tried to do more RTFM and be more clever and efficient, and
> find a way to specify directly those commits which affect vn.c. As "git
> rev-list" can be invoked like "git rev-list <commit> <path>", and the
> synopsis of "git filter-branch" is like
> 
>  git filter-branch [options] [--] [<rev-list options>...]
> 
> I then gave a try to:
> 
> $ git filter-branch --  master sys/dev/disk/vn/vn.c
> 
> but no dice -- I got:
> 
>   fatal: ambiguous argument 'sys/dev/disk/vn/vn.c': unknown revision or
>   path not in the working tree.
>   Use '--' to separate paths from revisions
>   Could not get the commits
> 
> Any idea?

I think you need an extra '--' to separate the paths from the revisions
in the rev-list arguments:

  git filter-branch -- master -- sys/dev/disk/vn/vn.c

but even that doesn't quite do what you want. It limits the commits that
are shown, similar to your first attempt above, but it doesn't cut down
the tree itself (OTOH, limiting by path rather than using --prune-empty
is likely to run faster, since you won't even look at commits that are
uninteresting. However, it may change the shape of your history with
respect to branching and merging).
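To see what the path limiter does to the commit selection, one can compare rev-list with and without it. The sketch below uses a throwaway repository with made-up commits; the second "--" is what separates the revision from the path.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir -p sys/dev/disk/vn
echo v1 > sys/dev/disk/vn/vn.c
echo a > other.c
git add .
git commit -q -m 'add vn.c and other.c'
echo b >> other.c
git commit -q -a -m 'touch other.c only'
echo v2 >> sys/dev/disk/vn/vn.c
git commit -q -a -m 'touch vn.c'
echo c >> other.c
git commit -q -a -m 'touch other.c again'

# All commits reachable from HEAD versus only those touching vn.c.
all=$(git rev-list --count HEAD)
limited=$(git rev-list --count HEAD -- sys/dev/disk/vn/vn.c)
echo "all commits: $all, commits touching vn.c: $limited"
```

The limited list is the set of commits filter-branch would visit when given the same arguments, which is where the speedup comes from.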

-Peff


* Re: import files w/ history
  2009-03-08  0:10     ` Jeff King
@ 2009-03-09  5:15       ` Csaba Henk
  2009-03-10 18:03         ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Csaba Henk @ 2009-03-09  5:15 UTC (permalink / raw)
  To: git

On 2009-03-08, Jeff King <peff@peff.net> wrote:
> What you really want to do is say:
>
>   - for every commit, narrow the tree to _just_ the one file
>
>   - if there were no changes in the narrowed tree, just throw out the
>     commit
>
> You can use an --index-filter to do the former, and a --commit-filter to
> do the latter (or just use --prune-empty, which is a shorthand).
>
> Another poster had a similar problem, and you can see the right
> filter-branch recipe there:
>
>   http://article.gmane.org/gmane.comp.version-control.git/111991

Thanks, this did the job.

>
> I think you need an extra '--' to separate the paths from the revisions
> in the rev-list arguments:
>
>   git filter-branch -- master -- sys/dev/disk/vn/vn.c
>
> but even that doesn't quite do what you want. It limits the commits that
> are shown, similar to your first attempt above, but it doesn't cut down
> the tree itself (OTOH, limiting by path rather than using --prune-empty
> is likely to run faster, since you won't even look at commits that are
> uninteresting. However, it may change the shape of your history with
> respect to branching and merging).

Finally I chose to add the path to the rev-list args -- 80 vs
15000 commits does make a difference. (I can still check if there was
any histroy [I just coined this from "history" and "destroy" :)] and
go back to the full-scan way if yes.)

But I still had a hard time with it... Finally I realized that if I do
filtering this way, I have to start filtering from the topmost commit
which affects the given file.

If I just start from origin/HEAD (assuming that it's on a commit which
does not affect the file), then it won't be found as a key of the mapping
created by git-filter-branch (as it's ignored because rev-listing was
narrowed down to the file), and therefore filter-branch finally punts
with "WARNING: Ref '<sha1>' is unchanged". I don't know if it's an
intended behaviour, or something which could/should be improved, or at
least documented... seems to be some sort of POLS violation to me (at
least I was surprised :) ).
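A sketch of finding that topmost commit, on a throwaway repository with made-up commits: "git rev-list -1" with a path limiter prints the newest commit that touches the file, and a branch placed there (the name "vn-top" is hypothetical) gives filter-branch a ref that is inside the limited set. Later versions of the git-filter-branch documentation describe a "remap to ancestor" behaviour for path limiters, so this workaround may not be needed with a newer git.

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

mkdir -p sys/dev/disk/vn
echo v1 > sys/dev/disk/vn/vn.c
echo a > other.c
git add .
git commit -q -m 'add vn.c and other.c'
echo v2 >> sys/dev/disk/vn/vn.c
git commit -q -a -m 'touch vn.c'
echo b >> other.c
git commit -q -a -m 'touch other.c only'

# HEAD does not touch vn.c; find the newest commit that does.
top=$(git rev-list -1 HEAD -- sys/dev/disk/vn/vn.c)
git branch vn-top "$top"

git log -1 --format='%h %s' vn-top
```

The branch vn-top can then be used as the starting revision in place of origin/HEAD.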

Regards,
Csaba


* Re: import files w/ history
  2009-03-09  5:15       ` Csaba Henk
@ 2009-03-10 18:03         ` Jeff King
  2009-03-11  0:11           ` Csaba Henk
  0 siblings, 1 reply; 8+ messages in thread
From: Jeff King @ 2009-03-10 18:03 UTC (permalink / raw)
  To: Csaba Henk; +Cc: git

On Mon, Mar 09, 2009 at 05:15:16AM +0000, Csaba Henk wrote:

> But I still had a hard time with it... Finally I realized that if I do
> filtering this way, I have to start filtering from the topmost commit
> which affects the given file.
> 
> If I just start from origin/HEAD (assuming that it's on a commit which
> does not affect the file), then it won't be found as a key of the mapping
> created by git-filter-branch (as it's ignored because rev-listing was
> narrowed down to the file), and therefore filter-branch finally punts
> with "WARNING: Ref '<sha1>' is unchanged". I don't know if it's an
> intended behaviour, or something which could/should be improved, or at
> least documented... seems to be some sort of POLS violation to me (at
> least I was surprised :) ).

I think passing path limiters to filter-branch is just something that
nobody ever really tried before. I think the solutions are, in order of
decreasing easiness and increasing difficulty:

  1. document the problem in Documentation/git-filter-branch.txt

  2. create a failing test for it in the test suite

  3. fix the failing test. ;)

Do you want to try a patch for one (or more!) of those?

-Peff


* Re: import files w/ history
  2009-03-10 18:03         ` Jeff King
@ 2009-03-11  0:11           ` Csaba Henk
  0 siblings, 0 replies; 8+ messages in thread
From: Csaba Henk @ 2009-03-11  0:11 UTC (permalink / raw)
  To: git

On 2009-03-10, Jeff King <peff@peff.net> wrote:
> I think passing path limiters to filter-branch is just something that
> nobody ever really tried before. I think the solutions are, in order of
> decreasing easiness and increasing difficulty:
>
>   1. document the problem in Documentation/git-filter-branch.txt
>
>   2. create a failing test for it in the test suite
>
>   3. fix the failing test. ;)
>
> Do you want to try a patch for one (or more!) of those?

I'm eager to contribute... next month. I'm pretty much overwhelmed
now. (Accidentally I almost wrote "overclocked" instead of
"overwhelmed"... only if that were true!)

Regards,
Csaba
