* Re: git pull transfers useless files
2012-09-24 17:51 git pull transfers useless files Angelo Borsotti
2012-09-24 18:59 ` Junio C Hamano
@ 2012-09-24 19:17 ` Jeff King
From: Jeff King @ 2012-09-24 19:17 UTC
To: Angelo Borsotti; +Cc: git
On Mon, Sep 24, 2012 at 07:51:20PM +0200, Angelo Borsotti wrote:
> #!/bin/bash
>
> set -v
> cd remote
> rm -rf * .git/
> git init
> echo '*.pdf -crlf -diff merge=binary' >.git/info/attributes
>
> touch f1
> git add f1
> echo "aaa" >f1.pdf
> git add f1.pdf
> cp <very large pdf file, some 100 Mbytes>.pdf f2.pdf
> git add f2.pdf
> git commit -m A
> cd ..
>
> cd local
> rm -rf * .git/
> git init
> echo '*.pdf -crlf -diff merge=binary' >.git/info/attributes
> git remote add remote ../remote
>
> touch f3
> git add f3
> git commit -m B
> git checkout -b develop
>
> echo "bbb" >f2.pdf
> git add f2.pdf
> git commit -m C
> git pull -v --squash remote master
>
> ls
> cat <f2.pdf
>
> set +v
>
> Replace <very large pdf file, some 100 Mbytes>.pdf with the path of a pdf file
> that is really large and run it.
> When it executes the git pull it spends on my computer some 30 seconds,
> obviously transferring the pdf file, that then it disregards because of the
> merge=binary attribute.

It does not disregard the file. The working tree is left with your
existing version of f2.pdf, but note that the index still marks the
conflict. Your next step would be to resolve the conflict in some way.
Towards that end, you can now inspect both sides:

  git show :2:f2.pdf ;# our side
  git show :3:f2.pdf ;# their side

Or you can invoke a mergetool to start a third-party merge helper on the
binary files:

  git mergetool

Or you can just resolve in favor of "their" side:

  git checkout --theirs f2.pdf
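
To see those index stages on something small, here is a minimal sketch.
It is not your setup: it uses one throwaway repository with two branches
instead of two repositories, and tiny made-up contents in place of the
real PDFs. The branch name "theirs" and the identity settings are
assumptions for illustration only.

```shell
#!/bin/sh
set -e

# throwaway repository (hypothetical, created under a temp directory)
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.invalid"
mkdir -p .git/info
echo '*.pdf -diff merge=binary' >.git/info/attributes

# common ancestor
echo base >f2.pdf
git add f2.pdf
git commit -q -m base
ours=$(git symbolic-ref --short HEAD)   # whatever the default branch is

# "their" side
git checkout -q -b theirs
echo bbb >f2.pdf
git commit -q -a -m theirs

# "our" side
git checkout -q "$ours"
echo aaa >f2.pdf
git commit -q -a -m ours

# the merge stops with a conflict; do not let "set -e" abort the script
git merge theirs >/dev/null 2>&1 || true

git ls-files -u f2.pdf          # unmerged entries: stages 1, 2 and 3
stage2=$(git show :2:f2.pdf)    # our side
stage3=$(git show :3:f2.pdf)    # their side

# resolve in favor of "their" side and record the resolution
git checkout --theirs f2.pdf
git add f2.pdf
resolved=$(cat f2.pdf)
```

Until the final "git add", the working tree file is still our version
and the index carries all three stages, which is the state your script
leaves you in after the pull.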

From your description, I imagine your intent is to simply resolve in
favor of "ours", and never look at the other side. However, git does
not have enough information to know that.

There is no "merge=ours" attribute (and indeed, it would be kind of
crazy, since your result would depend on which direction you were
merging, which is something you only know at the time of merge. Hence it
makes sense as a command-line option for a strategy, but not as an
attribute of a file).
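
As a sketch of the command-line form: the default merge strategy accepts
"-X ours", which resolves conflicting content in favor of our side for
that one merge. This assumes a git recent enough that the binary merge
driver honors "-X ours"; the repository, the branch name "theirs" and the
file contents below are made up for illustration.

```shell
#!/bin/sh
set -e

# throwaway repository (hypothetical, created under a temp directory)
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.invalid"
mkdir -p .git/info
echo '*.pdf -diff merge=binary' >.git/info/attributes

# common ancestor
echo base >f2.pdf
git add f2.pdf
git commit -q -m base
ours=$(git symbolic-ref --short HEAD)   # whatever the default branch is

# "their" side
git checkout -q -b theirs
echo "their version" >f2.pdf
git commit -q -a -m theirs

# "our" side
git checkout -q "$ours"
echo "our version" >f2.pdf
git commit -q -a -m ours

# favor our side for conflicting content, for this merge only
git merge -q --squash -X ours theirs

result=$(cat f2.pdf)
unmerged=$(git ls-files -u)
echo "$result"
```

The direction is chosen when you run the merge, so the same repository
can use "-X theirs" the next time without touching any attributes.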
All that being said, we can construct a case where the contents of the
PDF really _don't_ matter at all to the result. Like this:

  # new repo
  git init parent
  cd parent

  # make a commit with a giant file
  echo small >foo.txt
  cp <your-giant-file>.pdf big.pdf
  git add .
  git commit -m one

  # now get rid of the giant file
  git rm big.pdf
  git commit -m two

  # now merge it into another history
  git init ../child
  cd ../child
  echo unrelated >file.txt
  git add .
  git commit -m three
  git pull -v --squash ../parent master

Because we are doing a squash merge, we will throw away most of the
history we fetch, and only ever look at the tip of parent/master (which
in this case does not contain the PDF), and the shared ancestor (which
in this case is empty, since there is no shared history).
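
Both claims can be checked by hand. The sketch below rebuilds the two
histories with a small stand-in for the giant file, then runs only the
fetch half of the pull (a pull is a fetch followed by a merge) and looks
at what the merge would consider. The directory names and identity
settings are assumptions for illustration.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
# made-up identity so that commits work in a clean environment
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=user@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=user@example.invalid

# history with the file added and then removed
git init -q parent
cd parent
echo small >foo.txt
echo "stand-in for the giant file" >big.pdf
git add .
git commit -q -m one
git rm -q big.pdf
git commit -q -m two

# unrelated history
git init -q ../child
cd ../child
echo unrelated >file.txt
git add .
git commit -q -m three

# fetch the tip of the other repository into FETCH_HEAD
git fetch -q ../parent HEAD

# files in the tip commit that the squash merge would look at
tip_files=$(git ls-tree -r --name-only FETCH_HEAD)
echo "$tip_files"

# merge-base exits non-zero when the histories share no commit
if base=$(git merge-base HEAD FETCH_HEAD); then
	echo "shared ancestor: $base"
else
	base=
	echo "no shared ancestor"
fi
```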

So in theory we could get by with fetching all the commits (to do the
history traversal), and the trees and blobs only from the tip commit.
But that is not a good idea in general for two reasons:

  1. Even if that PDF is not used in the actual merge algorithm, the
     contents of the earlier commits are useful for figuring out what
     happened (e.g., when resolving another conflict, you might want to
     refer back via "git log").

  2. It breaks git's reachability assumptions. Git always makes sure
     that if you have object X, you have all of the objects it refers
     to, the ones they refer to, and so on. This assumption underlies
     many of git's operations (e.g., what we need to send to a remote
     who claims to have commit X).

     In this case, since you are using --squash, you could presumably
     throw away the original history after doing the squash merge. But
     it would be quite complex to special-case this in the protocol, and
     almost certainly not worth it for this corner case.
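
The reachability rule is easy to observe. In this sketch (small stand-in
file, made-up directory names and identity settings), the fetched tip
commit no longer contains big.pdf, yet the blob is present in the
receiving repository because it is reachable through the older commit.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)
cd "$work"
# made-up identity so that commits work in a clean environment
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=user@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=user@example.invalid

git init -q parent
cd parent
echo small >foo.txt
echo "stand-in for the giant file" >big.pdf
git add .
git commit -q -m one
big=$(git rev-parse HEAD:big.pdf)    # object name of the blob
git rm -q big.pdf
git commit -q -m two

git init -q ../child
cd ../child
echo unrelated >file.txt
git add .
git commit -q -m three

git fetch -q ../parent HEAD

# the blob is listed among the objects reachable from the fetched tip
listed=$(git rev-list --objects FETCH_HEAD | grep ' big.pdf$')
# and it really is in our object database now
type=$(git cat-file -t "$big")
echo "$listed ($type)"
```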
-Peff