Git development
* Re: git and binary files
From: Nicolas Pitre @ 2008-01-16 15:58 UTC (permalink / raw)
  To: Petko Manolov; +Cc: Jakub Narebski, Johannes Schindelin, git
In-Reply-To: <alpine.DEB.1.00.0801161715570.5260@bender.nucleusys.com>

On Wed, 16 Jan 2008, Petko Manolov wrote:

> On Wed, 16 Jan 2008, Nicolas Pitre wrote:
> 
> > On Wed, 16 Jan 2008, Petko Manolov wrote:
> > 
> > > On Wed, 16 Jan 2008, Jakub Narebski wrote:
> > > 
> > > > Petko Manolov wrote:
> > > > > 
> > > > > Unfortunately this is not the case.  These binary blobs are already
> > > > > compressed and/or encrypted and adding even a few bytes ends up
> > > > > storing new version in full size.
> > > > 
> > > > Can't you store them uncompressed?
> > > 
> > > Not really, but i can convert them into ascii format and store only the
> > > delta.
> > 
> > If you don't have the original uncompressed unencrypted file, what will
> > converting them to ascii actually give you?
> 
> I hope that in the case of incremental changes (0 to 5MB file is the same,
> last 64KB are actually new) the delta will be small and should be able to
> compress well.
> 
> This won't work for random changes along the length of the whole file.

But my question remains.

If you cannot create good deltas out of your binary files, converting 
those binaries into ascii will do nothing to compression performance.


Nicolas


* Re: git on MacOSX and files with decomposed utf-8 file names
From: Kevin Ballard @ 2008-01-16 15:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Mark Junker, git
In-Reply-To: <alpine.LSU.1.00.0801161531030.17650@racer.site>


On Jan 16, 2008, at 10:34 AM, Johannes Schindelin wrote:

> On Wed, 16 Jan 2008, Mark Junker wrote:
>
>> I have some files like "Lüftung.txt" in my repository. The strange thing is
>> that I can pull / add / commit / push those files without problem but
>> git-status always complains that those files are untracked (but not missing).
>
> This is a known problem.  Unfortunately, no one has implemented a fix,
> although if you're serious about it, I can point you to threads where it
> has been hinted how to solve the issue.
>
> FWIW the issue is that Mac OS X decides that it knows better how to encode
> your filename than you could yourself.


More like, Mac OS X has standardized on Unicode and the rest of the  
world hasn't caught up yet. Git is the only tool I've ever heard of  
that has a problem with OS X using Unicode.

-Kevin Ballard

-- 
Kevin Ballard
http://kevin.sb.org
kevin@sb.org
http://www.tildesoft.com





* Re: Git Cygwin - unable to create any repository - help!
From: Paul Umbers @ 2008-01-16 15:42 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Robin Rosenberg, git
In-Reply-To: <20080116071832.GA2896@steel.home>

OK, I think this worked (I'm a Java man, not C/C++). I downloaded the
latest 1.5.3 source from the git repository and ran "make" with
GIT_TEST_OPTS="--verbose --debug". Here's the output:

paulumbers@Devteam29 ~/workspace/git/git-1.5.3/t
$ make
*** t0000-basic.sh ***
*   ok 1: .git/objects should be empty after git init in an empty repo.
*   ok 2: .git/objects should have 3 subdirectories.
*   ok 3: git update-index without --add should fail adding.
*   ok 4: git update-index with --add should succeed.
* FAIL 5: writing tree out with git write-tree
        tree=$(git write-tree)
* FAIL 6: validate object ID of a known tree.
        test "$tree" = 7bb943559a305bdd6bdee2cef6e5df2413c3d30a
*   ok 7: git update-index without --remove should fail removing.
*   ok 8: git update-index with --remove should be able to remove.
*   ok 9: git write-tree should be able to write an empty tree.
*   ok 10: validate object ID of a known tree.
*   ok 11: adding various types of objects with git update-index --add.
*   ok 12: showing stage with git ls-files --stage
*   ok 13: validate git ls-files output for a known tree.
* FAIL 14: writing tree out with git write-tree.
        tree=$(git write-tree)
* FAIL 15: validate object ID for a known tree.
        test "$tree" = 087704a96baf1c2d1c869a8b084481e121c88b5b
* FAIL 16: showing tree with git ls-tree
        git ls-tree $tree >current
* FAIL 17: git ls-tree output for a known tree.
        diff current expected
* FAIL 18: showing tree with git ls-tree -r
        git ls-tree -r $tree >current
* FAIL 19: git ls-tree -r output for a known tree.
        diff current expected
* FAIL 20: showing tree with git ls-tree -r -t
        git ls-tree -r -t $tree >current
* FAIL 21: git ls-tree -r output for a known tree.
        diff current expected
* FAIL 22: writing partial tree out with git write-tree --prefix.
        ptree=$(git write-tree --prefix=path3)
* FAIL 23: validate object ID for a known tree.
        test "$ptree" = 21ae8269cacbe57ae09138dcc3a2887f904d02b3
* FAIL 24: writing partial tree out with git write-tree --prefix.
        ptree=$(git write-tree --prefix=path3/subp3)
* FAIL 25: validate object ID for a known tree.
        test "$ptree" = 3c5e5399f3a333eddecce7a9b9465b63f65f51e2
*   ok 26: put invalid objects into the index.
*   ok 27: writing this tree without --missing-ok.
*   ok 28: writing this tree with --missing-ok.
* FAIL 29: git read-tree followed by write-tree should be idempotent.
        git read-tree $tree &&
             test -f .git/index &&
             newtree=$(git write-tree) &&
             test "$newtree" = "$tree"
* FAIL 30: validate git diff-files output for a know cache/work tree state.
        git diff-files >current && diff >/dev/null -b current expected
*   ok 31: git update-index --refresh should succeed.
*   ok 32: no diff after checkout and git update-index --refresh.
* FAIL 33: git commit-tree records the correct tree in a commit.
        commit0=$(echo NO | git commit-tree $P) &&
             tree=$(git show --pretty=raw $commit0 |
                 sed -n -e "s/^tree //p" -e "/^author /q") &&
             test "z$tree" = "z$P"
* FAIL 34: git commit-tree records the correct parent in a commit.
        commit1=$(echo NO | git commit-tree $P -p $commit0) &&
             parent=$(git show --pretty=raw $commit1 |
                 sed -n -e "s/^parent //p" -e "/^author /q") &&
             test "z$commit0" = "z$parent"
* FAIL 35: git commit-tree omits duplicated parent in a commit.
        commit2=$(echo NO | git commit-tree $P -p $commit0 -p $commit0) &&
             parent=$(git show --pretty=raw $commit2 |
                 sed -n -e "s/^parent //p" -e "/^author /q" |
                 sort -u) &&
             test "z$commit0" = "z$parent" &&
             numparent=$(git show --pretty=raw $commit2 |
                 sed -n -e "s/^parent //p" -e "/^author /q" |
                 wc -l) &&
             test $numparent = 1
*   ok 36: update-index D/F conflict
* FAIL 37: absolute path works as expected

                mkdir first &&
                ln -s ../.git first/.git &&
                mkdir second &&
                ln -s ../first second/other &&
                mkdir third &&
                dir="$(cd .git; pwd -P)" &&
                dir2=third/../second/other/.git &&
                test "$dir" = "$(test-absolute-path $dir2)" &&
                file="$dir"/index &&
                test "$file" = "$(test-absolute-path $dir2/index)" &&
                ln -s ../first/file .git/syml &&
                sym="$(cd first; pwd -P)"/file &&
                test "$sym" = "$(test-absolute-path $dir2/syml)"

* failed 20 among 37 test(s)
make: *** [t0000-basic.sh] Error 1

paulumbers@Devteam29 ~/workspace/git/git-1.5.3/t
$ make -v -d
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.

This program built for i686-pc-cygwin

paulumbers@Devteam29 ~/workspace/git/git-1.5.3/t
$ export GIT_TEST_OPTS="--verbose --debug"

paulumbers@Devteam29 ~/workspace/git/git-1.5.3/t
$ make
*** t0000-basic.sh ***
* expecting success: cmp -s /dev/null should-be-empty
*   ok 1: .git/objects should be empty after git init in an empty repo.

* expecting success: test $(wc -l < full-of-directories) = 3
*   ok 2: .git/objects should have 3 subdirectories.

* expecting failure: git update-index should-be-empty
error: should-be-empty: cannot add to the index - missing --add option?
fatal: Unable to process path should-be-empty
*   ok 3: git update-index without --add should fail adding.

* expecting success: git update-index --add should-be-empty
*   ok 4: git update-index with --add should succeed.

* expecting success: tree=$(git write-tree)
error: invalid object e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
fatal: git-write-tree: error building trees
* FAIL 5: writing tree out with git write-tree
        tree=$(git write-tree)

* expecting success: test "$tree" = 7bb943559a305bdd6bdee2cef6e5df2413c3d30a
* FAIL 6: validate object ID of a known tree.
        test "$tree" = 7bb943559a305bdd6bdee2cef6e5df2413c3d30a

* expecting failure: git update-index should-be-empty
error: should-be-empty: does not exist and --remove not passed
fatal: Unable to process path should-be-empty
*   ok 7: git update-index without --remove should fail removing.

* expecting success: git update-index --remove should-be-empty
*   ok 8: git update-index with --remove should be able to remove.

* expecting success: tree=$(git write-tree)
*   ok 9: git write-tree should be able to write an empty tree.

* expecting success: test "$tree" = 4b825dc642cb6eb9a060e54bf8d69288fbee4904
*   ok 10: validate object ID of a known tree.

* expecting success: find path* ! -type d -print | xargs git update-index --add
*   ok 11: adding various types of objects with git update-index --add.

* expecting success: git ls-files --stage >current
*   ok 12: showing stage with git ls-files --stage

* expecting success: diff current expected
*   ok 13: validate git ls-files output for a known tree.

* expecting success: tree=$(git write-tree)
error: invalid object 3feff949ed00a62d9f7af97c15cd8a30595e7ac7
fatal: git-write-tree: error building trees
* FAIL 14: writing tree out with git write-tree.
        tree=$(git write-tree)

* expecting success: test "$tree" = 087704a96baf1c2d1c869a8b084481e121c88b5b
* FAIL 15: validate object ID for a known tree.
        test "$tree" = 087704a96baf1c2d1c869a8b084481e121c88b5b

* expecting success: git ls-tree $tree >current
usage: git-ls-tree [-d] [-r] [-t] [-l] [-z] [--name-only]
[--name-status] [--full-name] [--abbrev[=<n>]] <tree-ish> [path...]
* FAIL 16: showing tree with git ls-tree
        git ls-tree $tree >current

* expecting success: diff current expected
0a1,4
> 100644 blob f87290f8eb2cbbea7857214459a0739927eab154  path0
> 120000 blob 15a98433ae33114b085f3eb3bb03b832b3180a01  path0sym
> 040000 tree 58a09c23e2ca152193f2786e06986b7b6712bdbe  path2
> 040000 tree 21ae8269cacbe57ae09138dcc3a2887f904d02b3  path3
* FAIL 17: git ls-tree output for a known tree.
        diff current expected

* expecting success: git ls-tree -r $tree >current
usage: git-ls-tree [-d] [-r] [-t] [-l] [-z] [--name-only]
[--name-status] [--full-name] [--abbrev[=<n>]] <tree-ish> [path...]
* FAIL 18: showing tree with git ls-tree -r
        git ls-tree -r $tree >current

* expecting success: diff current expected
0a1,8
> 100644 blob f87290f8eb2cbbea7857214459a0739927eab154  path0
> 120000 blob 15a98433ae33114b085f3eb3bb03b832b3180a01  path0sym
> 100644 blob 3feff949ed00a62d9f7af97c15cd8a30595e7ac7  path2/file2
> 120000 blob d8ce161addc5173867a3c3c730924388daedbc38  path2/file2sym
> 100644 blob 0aa34cae68d0878578ad119c86ca2b5ed5b28376  path3/file3
> 120000 blob 8599103969b43aff7e430efea79ca4636466794f  path3/file3sym
> 100644 blob 00fb5908cb97c2564a9783c0c64087333b3b464f  path3/subp3/file3
> 120000 blob 6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c  path3/subp3/file3sym
* FAIL 19: git ls-tree -r output for a known tree.
        diff current expected

* expecting success: git ls-tree -r -t $tree >current
usage: git-ls-tree [-d] [-r] [-t] [-l] [-z] [--name-only]
[--name-status] [--full-name] [--abbrev[=<n>]] <tree-ish> [path...]
* FAIL 20: showing tree with git ls-tree -r -t
        git ls-tree -r -t $tree >current

* expecting success: diff current expected
0a1,11
> 100644 blob f87290f8eb2cbbea7857214459a0739927eab154  path0
> 120000 blob 15a98433ae33114b085f3eb3bb03b832b3180a01  path0sym
> 040000 tree 58a09c23e2ca152193f2786e06986b7b6712bdbe  path2
> 100644 blob 3feff949ed00a62d9f7af97c15cd8a30595e7ac7  path2/file2
> 120000 blob d8ce161addc5173867a3c3c730924388daedbc38  path2/file2sym
> 040000 tree 21ae8269cacbe57ae09138dcc3a2887f904d02b3  path3
> 100644 blob 0aa34cae68d0878578ad119c86ca2b5ed5b28376  path3/file3
> 120000 blob 8599103969b43aff7e430efea79ca4636466794f  path3/file3sym
> 040000 tree 3c5e5399f3a333eddecce7a9b9465b63f65f51e2  path3/subp3
> 100644 blob 00fb5908cb97c2564a9783c0c64087333b3b464f  path3/subp3/file3
> 120000 blob 6649a1ebe9e9f1c553b66f5a6e74136a07ccc57c  path3/subp3/file3sym
* FAIL 21: git ls-tree -r output for a known tree.
        diff current expected

* expecting success: ptree=$(git write-tree --prefix=path3)
error: invalid object 3feff949ed00a62d9f7af97c15cd8a30595e7ac7
fatal: git-write-tree: error building trees
* FAIL 22: writing partial tree out with git write-tree --prefix.
        ptree=$(git write-tree --prefix=path3)

* expecting success: test "$ptree" = 21ae8269cacbe57ae09138dcc3a2887f904d02b3
* FAIL 23: validate object ID for a known tree.
        test "$ptree" = 21ae8269cacbe57ae09138dcc3a2887f904d02b3

* expecting success: ptree=$(git write-tree --prefix=path3/subp3)
error: invalid object 3feff949ed00a62d9f7af97c15cd8a30595e7ac7
fatal: git-write-tree: error building trees
* FAIL 24: writing partial tree out with git write-tree --prefix.
        ptree=$(git write-tree --prefix=path3/subp3)

* expecting success: test "$ptree" = 3c5e5399f3a333eddecce7a9b9465b63f65f51e2
* FAIL 25: validate object ID for a known tree.
        test "$ptree" = 3c5e5399f3a333eddecce7a9b9465b63f65f51e2

* expecting success: git update-index --index-info < badobjects
*   ok 26: put invalid objects into the index.

* expecting failure: git write-tree
error: invalid object 1000000000000000000000000000000000000000
fatal: git-write-tree: error building trees
*   ok 27: writing this tree without --missing-ok.

* expecting success: git write-tree --missing-ok
851a367613bb6e1f0b2b518323eafed530b5b4c4
*   ok 28: writing this tree with --missing-ok.

* expecting success: git read-tree $tree &&
     test -f .git/index &&
     newtree=$(git write-tree) &&
     test "$newtree" = "$tree"
* FAIL 29: git read-tree followed by write-tree should be idempotent.
        git read-tree $tree &&
             test -f .git/index &&
             newtree=$(git write-tree) &&
             test "$newtree" = "$tree"

* expecting success: git diff-files >current && diff >/dev/null -b current expected
* FAIL 30: validate git diff-files output for a know cache/work tree state.
        git diff-files >current && diff >/dev/null -b current expected

* expecting success: git update-index --refresh
*   ok 31: git update-index --refresh should succeed.

* expecting success: git diff-files >current && cmp -s current /dev/null
*   ok 32: no diff after checkout and git update-index --refresh.

* expecting success: commit0=$(echo NO | git commit-tree $P) &&
     tree=$(git show --pretty=raw $commit0 |
         sed -n -e "s/^tree //p" -e "/^author /q") &&
     test "z$tree" = "z$P"
error: unable to find 087704a96baf1c2d1c869a8b084481e121c88b5b
fatal: 087704a96baf1c2d1c869a8b084481e121c88b5b is not a valid object
* FAIL 33: git commit-tree records the correct tree in a commit.
        commit0=$(echo NO | git commit-tree $P) &&
             tree=$(git show --pretty=raw $commit0 |
                 sed -n -e "s/^tree //p" -e "/^author /q") &&
             test "z$tree" = "z$P"

* expecting success: commit1=$(echo NO | git commit-tree $P -p $commit0) &&
     parent=$(git show --pretty=raw $commit1 |
         sed -n -e "s/^parent //p" -e "/^author /q") &&
     test "z$commit0" = "z$parent"
error: unable to find 087704a96baf1c2d1c869a8b084481e121c88b5b
fatal: 087704a96baf1c2d1c869a8b084481e121c88b5b is not a valid object
* FAIL 34: git commit-tree records the correct parent in a commit.
        commit1=$(echo NO | git commit-tree $P -p $commit0) &&
             parent=$(git show --pretty=raw $commit1 |
                 sed -n -e "s/^parent //p" -e "/^author /q") &&
             test "z$commit0" = "z$parent"

* expecting success: commit2=$(echo NO | git commit-tree $P -p $commit0 -p $commit0) &&
     parent=$(git show --pretty=raw $commit2 |
         sed -n -e "s/^parent //p" -e "/^author /q" |
         sort -u) &&
     test "z$commit0" = "z$parent" &&
     numparent=$(git show --pretty=raw $commit2 |
         sed -n -e "s/^parent //p" -e "/^author /q" |
         wc -l) &&
     test $numparent = 1
error: unable to find 087704a96baf1c2d1c869a8b084481e121c88b5b
fatal: 087704a96baf1c2d1c869a8b084481e121c88b5b is not a valid object
* FAIL 35: git commit-tree omits duplicated parent in a commit.
        commit2=$(echo NO | git commit-tree $P -p $commit0 -p $commit0) &&
             parent=$(git show --pretty=raw $commit2 |
                 sed -n -e "s/^parent //p" -e "/^author /q" |
                 sort -u) &&
             test "z$commit0" = "z$parent" &&
             numparent=$(git show --pretty=raw $commit2 |
                 sed -n -e "s/^parent //p" -e "/^author /q" |
                 wc -l) &&
             test $numparent = 1

* expecting success:
        mv path0 tmp &&
        mv path2 path0 &&
        mv tmp path2 &&
        git update-index --add --replace path2 path0/file2 &&
        numpath0=$(git ls-files path0 | wc -l) &&
        test $numpath0 = 1

*   ok 36: update-index D/F conflict

* expecting success:
        mkdir first &&
        ln -s ../.git first/.git &&
        mkdir second &&
        ln -s ../first second/other &&
        mkdir third &&
        dir="$(cd .git; pwd -P)" &&
        dir2=third/../second/other/.git &&
        test "$dir" = "$(test-absolute-path $dir2)" &&
        file="$dir"/index &&
        test "$file" = "$(test-absolute-path $dir2/index)" &&
        ln -s ../first/file .git/syml &&
        sym="$(cd first; pwd -P)"/file &&
        test "$sym" = "$(test-absolute-path $dir2/syml)"

* FAIL 37: absolute path works as expected

                mkdir first &&
                ln -s ../.git first/.git &&
                mkdir second &&
                ln -s ../first second/other &&
                mkdir third &&
                dir="$(cd .git; pwd -P)" &&
                dir2=third/../second/other/.git &&
                test "$dir" = "$(test-absolute-path $dir2)" &&
                file="$dir"/index &&
                test "$file" = "$(test-absolute-path $dir2/index)" &&
                ln -s ../first/file .git/syml &&
                sym="$(cd first; pwd -P)"/file &&
                test "$sym" = "$(test-absolute-path $dir2/syml)"


* failed 20 among 37 test(s)
make: *** [t0000-basic.sh] Error 1
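
[Editorial note on the log above. The first failure reports "invalid object
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391". That ID is the ID of the empty
blob, which matches the empty file should-be-empty added in test 4, and the
4b825dc642cb6eb9a060e54bf8d69288fbee4904 expected in test 10 is the ID of
the empty tree. So the IDs themselves look right; the log suggests, but does
not prove, that the object written by "git update-index --add" could not be
read back. The sketch below only recomputes the two IDs. The function name
object_id is ours, not part of git.]

```python
import hashlib

def object_id(obj_type: str, body: bytes) -> str:
    """Return the SHA-1 object ID git assigns to an object of this type.

    The hash covers a header "<type> <size>\\0" followed by the body.
    """
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# IDs of the empty blob and the empty tree, as seen in the test log.
EMPTY_BLOB = object_id("blob", b"")
EMPTY_TREE = object_id("tree", b"")

print(EMPTY_BLOB)
print(EMPTY_TREE)
```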


* Re: git on MacOSX and files with decomposed utf-8 file names
From: Johannes Schindelin @ 2008-01-16 15:34 UTC (permalink / raw)
  To: Mark Junker; +Cc: git
In-Reply-To: <478E1FED.5010801@web.de>


Hi,

On Wed, 16 Jan 2008, Mark Junker wrote:

> I have some files like "Lüftung.txt" in my repository. The strange thing is
> that I can pull / add / commit / push those files without problem but
> git-status always complains that those files are untracked (but not missing).

This is a known problem.  Unfortunately, no one has implemented a fix, 
although if you're serious about it, I can point you to threads where it 
has been hinted how to solve the issue.

FWIW the issue is that Mac OS X decides that it knows better how to encode 
your filename than you could yourself.

Ciao,
Dscho


* Re: git and binary files
From: Petko Manolov @ 2008-01-16 15:18 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Jakub Narebski, Johannes Schindelin, git
In-Reply-To: <alpine.LFD.1.00.0801160958170.25841@xanadu.home>

On Wed, 16 Jan 2008, Nicolas Pitre wrote:

> On Wed, 16 Jan 2008, Petko Manolov wrote:
>
>> On Wed, 16 Jan 2008, Jakub Narebski wrote:
>>
>>> Petko Manolov wrote:
>>>>
>>>> Unfortunately this is not the case.  These binary blobs are already
>>>> compressed and/or encrypted and adding even a few bytes ends up storing
>>>> new version in full size.
>>>
>>> Can't you store them uncompressed?
>>
>> Not really, but i can convert them into ascii format and store only the delta.
>
> If you don't have the original uncompressed unencrypted file, what will
> converting them to ascii actually give you?

I hope that in the case of incremental changes (0 to 5MB file is the same, 
last 64KB are actually new) the delta will be small and should be able to 
compress well.

This won't work for random changes along the length of the whole file.


 		Petko


* git on MacOSX and files with decomposed utf-8 file names
From: Mark Junker @ 2008-01-16 15:17 UTC (permalink / raw)
  To: git

Hi,

I have some files like "Lüftung.txt" in my repository. The strange thing 
is that I can pull / add / commit / push those files without problem but 
git-status always complains that those files are untracked (but not 
missing). My assumption is that it's a problem with the way MacOSX 
stores the file names (decomposed UTF-8). So something like 
"Lüftung.txt" (precomposed "ü", U+00FC) becomes "Lu" + U+0308 + "ftung.txt" 
("u" followed by a combining diaeresis).

It seems that git-status does two things:
1. Find files under version control (i.e. search for missing files)
2. Find files not under version control (i.e. search for untracked files)

I guess that the first look-up succeeds because MacOS X converts 
composed UTF-8 to decomposed UTF-8 when searching for a file. But it 
seems that the second look-up takes the file names as-is (decomposed) 
without converting them to composed UTF-8.

Is there an easy way to fix this behaviour? It's really annoying to see 
all those "untracked" files that are already under version control when 
executing a git-status.

Regards,
Mark
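
[Editorial note: the composed/decomposed mismatch described above can be
reproduced without git. This is a minimal sketch that assumes the file
system's decomposed form is close to Unicode NFD; the helper name
same_file_name is ours. It shows that the two spellings differ byte for
byte and only compare equal after both are normalized to one form.]

```python
import unicodedata

name = "L\u00fcftung.txt"                 # precomposed "ü" (U+00FC)
nfc = unicodedata.normalize("NFC", name)  # composed form, as typed
nfd = unicodedata.normalize("NFD", name)  # "u" + combining diaeresis U+0308

def same_file_name(a: str, b: str) -> bool:
    """Compare two names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(nfc == nfd)                                          # raw comparison
print(len(nfc.encode("utf-8")), len(nfd.encode("utf-8")))  # byte lengths
print(same_file_name(nfc, nfd))                            # normalized comparison
```

A byte-wise comparison of a name from the index against a name returned by
a directory listing fails in exactly this way when the two are in different
forms.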


* Re: git and binary files
From: Rogan Dawes @ 2008-01-16 15:05 UTC (permalink / raw)
  To: Petko Manolov; +Cc: Jeff King, Johannes Schindelin, Git Mailing List
In-Reply-To: <alpine.DEB.1.00.0801161634080.5260@bender.nucleusys.com>

Petko Manolov wrote:
> On Wed, 16 Jan 2008, Jeff King wrote:
> 
>> OK, that was the answer I was looking for; it looks like you are out
>> of luck.
> 
> Story of my life. :-)
> 
>> As an experiment, it might be worth trying to store the uncompressed
>> versions instead (git will delta _and_ compress them for you).
> 
> I don't have them uncompressed.
> 
> I can try to convert those files into ascii format and then save them in 
> the repository.  Since most changes are incremental git should be able 
> to generate relatively small delta, which should compress well enough.
> 
> Thanks for the hint.
> 
> 
>         Petko

That is unlikely to help, since git can find deltas in binary files just 
as easily as in text files. All you are doing is changing the encoding.

Rogan
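
[Editorial note: a small sketch of the point above, using base64 as a
stand-in for "ascii format" (the thread does not say which encoding was
meant). Two hypothetical binaries share their first 80%; after encoding,
the ASCII versions still share exactly 80%, and each one is a third larger.
The encoding changes the representation, not the amount of common data.]

```python
import base64
import os

def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Stand-in for an already compressed or encrypted blob: random bytes.
old = os.urandom(3000)
# New version: first 2400 bytes unchanged, every later byte inverted.
new = old[:2400] + bytes(b ^ 0xFF for b in old[2400:])

old_ascii = base64.b64encode(old)
new_ascii = base64.b64encode(new)

raw_shared = common_prefix_len(old, new) / len(old)
ascii_shared = common_prefix_len(old_ascii, new_ascii) / len(old_ascii)

print(len(old), len(old_ascii))   # size before and after encoding
print(raw_shared, ascii_shared)   # shared fraction before and after
```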


* Re: git and binary files
From: Nicolas Pitre @ 2008-01-16 15:01 UTC (permalink / raw)
  To: Petko Manolov; +Cc: Jakub Narebski, Johannes Schindelin, git
In-Reply-To: <alpine.DEB.1.00.0801161640010.5260@bender.nucleusys.com>

On Wed, 16 Jan 2008, Petko Manolov wrote:

> On Wed, 16 Jan 2008, Jakub Narebski wrote:
> 
> > Petko Manolov wrote:
> > > 
> > > Unfortunately this is not the case.  These binary blobs are already
> > > compressed and/or encrypted and adding even a few bytes ends up storing
> > > new version in full size.
> > 
> > Can't you store them uncompressed?
> 
> Not really, but i can convert them into ascii format and store only the delta.

If you don't have the original uncompressed unencrypted file, what will 
converting them to ascii actually give you?


Nicolas


* Re: git and binary files
From: Petko Manolov @ 2008-01-16 14:45 UTC (permalink / raw)
  To: Wincent Colaiuta; +Cc: Johannes Schindelin, David Symonds, git
In-Reply-To: <D3716EB3-10B1-4D96-AB12-BD86CBB189CB@wincent.com>

On Wed, 16 Jan 2008, Wincent Colaiuta wrote:

> If the exact contents of these large binaries *really* don't matter, as 
> you say they don't, then why don't you just commit one and never touch 
> it again?

Unfortunately those binaries do change, although the process is slow and 
not very frequent.  And this is why it pokes me in the eye - for changing 
a few bytes i end up with much larger repository.


 		Petko


* Re: git and binary files
From: Petko Manolov @ 2008-01-16 14:43 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Johannes Schindelin, git
In-Reply-To: <200801161520.44668.jnareb@gmail.com>

On Wed, 16 Jan 2008, Jakub Narebski wrote:

> Petko Manolov wrote:
>> On Wed, 16 Jan 2008, Jakub Narebski wrote:
>>
>>> You can always tag a blob (like junio-gpg-pub tag in git.git repository),
>>> but it wouldn't be in a working directory. But it would get distributed
>>> on clone.
>>
>> Hm, how does it work?
>
> You use git-hash-object to put file (-t blob) into the object database.
> It would return sha1 of added object. Use git-tag to create tag to blob
> (use returned sha1 for head). You can get file (to stdout) with
> "git cat-file blob tagname^{blob}".

Sounds like i'll have to play with the above.  Thanks for the tip.

> The file would be in object database, but not in working directory
> by default.

Not a big problem.

>> Unfortunately this is not the case.  These binary blobs are already
>> compressed and/or encrypted and adding even a few bytes ends up storing
>> new version in full size.
>
> Can't you store them uncompressed?

Not really, but i can convert them into ascii format and store only the 
delta.  This will admittedly increase the initial size of the repository, 
but hopefully not by much.


 		Petko


* Re: git and binary files
From: Petko Manolov @ 2008-01-16 14:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git
In-Reply-To: <20080116143219.GA22744@coredump.intra.peff.net>

On Wed, 16 Jan 2008, Jeff King wrote:

> OK, that was the answer I was looking for; it looks like you are out
> of luck.

Story of my life. :-)

> As an experiment, it might be worth trying to store the uncompressed
> versions instead (git will delta _and_ compress them for you).

I don't have them uncompressed.

I can try to convert those files into ascii format and then save them in 
the repository.  Since most changes are incremental git should be able to 
generate relatively small delta, which should compress well enough.

Thanks for the hint.


 		Petko


* Re: git and binary files
From: Wincent Colaiuta @ 2008-01-16 14:34 UTC (permalink / raw)
  To: Petko Manolov; +Cc: Johannes Schindelin, David Symonds, git
In-Reply-To: <alpine.DEB.1.00.0801161549140.5260@bender.nucleusys.com>

El 16/1/2008, a las 14:58, Petko Manolov escribió:

> On Wed, 16 Jan 2008, Johannes Schindelin wrote:
>
>> I think that you're missing the point of version control.  It's not  
>> only about having an up-to-date source tree, but also about being  
>> able to go back to a certain revision.
>
> No contradiction here.  In my case old source code will work  
> perfectly with new binaries/firmware.  That's why i don't _need_ the  
> history, only the latest stuff in order to save space.

You may be interested in the history, but the entire purpose of any  
version control system (not just Git) is to record exactly that:  
history.

If the exact contents of these large binaries *really* don't matter,  
as you say they don't, then why don't you just commit one and never  
touch it again?

Cheers,
Wincent


* Re: git and binary files
From: Jeff King @ 2008-01-16 14:32 UTC (permalink / raw)
  To: Petko Manolov; +Cc: Johannes Schindelin, git
In-Reply-To: <alpine.DEB.1.00.0801161622030.5260@bender.nucleusys.com>

On Wed, Jan 16, 2008 at 04:25:28PM +0200, Petko Manolov wrote:

>> Right, as loose objects. Did you try running "git-gc" to repack?
>
> I did "git repack -f -a -d", but it didn't reduce the repository size.  
> Those binaries are already compressed so any change adds up their size  
> once again.

OK, that was the answer I was looking for; it looks like you are out
of luck.

BTW, the main space-saver in repacking is _not_ compression, but rather
finding deltas between similar objects (e.g., two versions of the same
file that, although large, differ only by a small amount). So even
compressed files can still produce space savings during a repack, though
perhaps not as well because of randomness introduced by the compression.

As an experiment, it might be worth trying to store the uncompressed
versions instead (git will delta _and_ compress them for you).

-Peff
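
[Editorial note: to illustrate why deltas, not compression, save the space,
here is a toy delta that stores only the bytes between the common prefix
and the common suffix of two versions. This is NOT git's delta algorithm,
only a sketch of the idea; make_delta and apply_delta are names invented
for this note.]

```python
def make_delta(old: bytes, new: bytes):
    """Toy delta: reuse old's common prefix and suffix, keep only the middle."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    literal = new[prefix:len(new) - suffix]
    return prefix, literal, suffix

def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the new version from the old version plus the delta."""
    prefix, literal, suffix = delta
    tail = old[len(old) - suffix:] if suffix else b""
    return old[:prefix] + literal + tail

# A 100 KiB file that gains 64 new bytes at the end.
old = bytes(range(256)) * 400
new = old + b"sixty-four new bytes".ljust(64, b".")

delta = make_delta(old, new)
print(len(new), len(delta[1]))          # full size vs. bytes stored in delta
print(apply_delta(old, delta) == new)   # the delta reproduces the new file
```

If the two versions were compressed or encrypted before being stored, they
might share far fewer bytes, and a delta of this kind would have much less
to reuse.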


* Re: git and binary files
From: Petko Manolov @ 2008-01-16 14:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git
In-Reply-To: <20080116141836.GA22639@coredump.intra.peff.net>

On Wed, 16 Jan 2008, Jeff King wrote:

> On Wed, Jan 16, 2008 at 04:14:30PM +0200, Petko Manolov wrote:
>
>>> How big are your firmware files? How often do they change, and how large
>>> are the changes? IOW, have you confirmed that repacking does not produce
>>> an acceptable delta, meaning you get versioning for very low space cost?
>>
>> Changes don't happen too often, but the size of everything binary in the
>> tree easily goes to about 100MB.  Three commits later it ends up at about
>> 300MB...
>
> Right, as loose objects. Did you try running "git-gc" to repack?

I did "git repack -f -a -d", but it didn't reduce the repository size. 
Those binaries are already compressed so any change adds up their size 
once again.


cheers,
Petko


* Re: git and binary files
From: Petko Manolov @ 2008-01-16 14:21 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Symonds, git
In-Reply-To: <alpine.LSU.1.00.0801161405330.17650@racer.site>

On Wed, 16 Jan 2008, Johannes Schindelin wrote:

> No, you _do_ miss the point here.  You might _think_ that they work 
> perfectly, but with revision control you want to have _exactly_ the same 
> setup.  You want to be able to go back to a certain _revision_ 
> (including the then-current firmware).

I _know_ older code will work with new binaries.  I know because i've done 
it many times and the application is the sort that is not going to forgive 
any frivolity.

Unfortunately this is very specific to what i'm doing and does not apply 
for 99.99% of what people usually need.

> And that's what you don't want.  So git is not for you.

I use git for SCM from day one.  It's great.  I was just thinking aloud 
about something i've stumbled upon.  ;-)


 		Petko


* Re: git and binary files
From: Jakub Narebski @ 2008-01-16 14:20 UTC (permalink / raw)
  To: Petko Manolov; +Cc: Johannes Schindelin, git
In-Reply-To: <alpine.DEB.1.00.0801161600030.5260@bender.nucleusys.com>

Petko Manolov wrote:
> On Wed, 16 Jan 2008, Jakub Narebski wrote:
> 
> > You can always tag a blob (like junio-gpg-pub tag in git.git repository),
> > but it wouldn't be in a working directory. But it would get distributed
> > on clone.
> 
> Hm, how does it work?

You use git-hash-object to put file (-t blob) into the object database.
It would return sha1 of added object. Use git-tag to create tag to blob
(use returned sha1 for head). You can get file (to stdout) with 
"git cat-file blob tagname^{blob}".

The file would be in object database, but not in working directory
by default.
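
A minimal sketch of the steps above, run in a throwaway repository. The
file name firmware.bin and the tag name firmware-blob are hypothetical,
chosen only for illustration:

```shell
set -e

# Throwaway repository so the sketch does not touch real data.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Hypothetical stand-in for the real binary.
printf 'example firmware payload\n' > firmware.bin

# -w actually writes the blob into the object database;
# without it, hash-object only prints the hash.
sha=$(git hash-object -w -t blob firmware.bin)

# Lightweight tag that points directly at the blob, not at a commit.
git tag firmware-blob "$sha"

# Read the blob back through the tag and compare with the original.
git cat-file blob 'firmware-blob^{blob}' > restored.bin
cmp firmware.bin restored.bin
```

The tag keeps the blob reachable, so it is not pruned and travels with
the other tags on clone, but no commit or tree refers to it, which is
why it never shows up in a checkout.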

> > BTW. if those large binary files don't differ much between versions, 
> > they should get well compressed even if you would store them normally, 
> > all revisions.
> 
> Unfortunately this is not the case.  These binary blobs are already 
> compressed and/or encrypted and adding even a few bytes ends up storing 
> new version in full size.

Can't you store them uncompressed?

-- 
Jakub Narebski
Poland

^ permalink raw reply

* Re: git and binary files
From: Jeff King @ 2008-01-16 14:18 UTC (permalink / raw)
  To: Petko Manolov; +Cc: Johannes Schindelin, git
In-Reply-To: <alpine.DEB.1.00.0801161606160.5260@bender.nucleusys.com>

On Wed, Jan 16, 2008 at 04:14:30PM +0200, Petko Manolov wrote:

>> How big are your firmware files? How often do they change, and how large 
>> are the changes? IOW, have you confirmed that repacking does not produce 
>> an acceptable delta, meaning you get versioning for very low space cost?
>
> Changes don't happen too often, but the size of everything binary in the  
> tree easily goes to about 100MB.  Three commits later it ends up at about  
> 300MB...

Right, as loose objects. Did you try running "git-gc" to repack?

-Peff

^ permalink raw reply

* Re: git and binary files
From: Petko Manolov @ 2008-01-16 14:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git
In-Reply-To: <20080116135420.GA21588@coredump.intra.peff.net>

On Wed, 16 Jan 2008, Jeff King wrote:

> On Wed, Jan 16, 2008 at 03:39:06PM +0200, Petko Manolov wrote:
>
>> What i am trying to suggest is that there might be cases when you need
>> something in the repository, but you don't want GIT to keep its history
>> nor its predecessors.  Leaving it out breaks the atomicity of such
>> repository and makes the project management more complex.
>
> But not versioning some files while versioning others breaks the
> atomicity of project version, which is at the core of git's model. There
> is no such thing as "this file is at revision X, but that one is at
> revision Y." There is only "the project is at revision X."

Sigh.  You are right.

However, the said project is kind of exception.  The binaries are there 
from the very beginning - they are indivisible part of the project and it 
won't work without them.  This is why i am not worried if i revert to 
previous source code version, but actually check-out fresh binary - in my 
case it won't break things.

>> There's a few examples out there that shows how to solve this, but it
>> seems inconvenient and involves branching, cloning, etc.  Isn't it
>> possible to add something like:
>>
>> 	"git nohistory firmware.bin"
>>
>> or
>> 	"git nohistory -i-understand-this-might-be-dangerous firmware.bin"
>
> Not easily. It goes against the underlying data model at the core of
> git.

Damn, i knew you'd say something like this. :-)

> How big are your firmware files? How often do they change, and how large 
> are the changes? IOW, have you confirmed that repacking does not produce 
> an acceptable delta, meaning you get versioning for very low space cost?

Changes don't happen too often, but the size of everything binary in the 
tree easily goes to about 100MB.  Three commits later it ends up at about 
300MB...


cheers,
Petko

^ permalink raw reply

* Re: git and binary files
From: Johannes Schindelin @ 2008-01-16 14:07 UTC (permalink / raw)
  To: Petko Manolov; +Cc: David Symonds, git
In-Reply-To: <alpine.DEB.1.00.0801161549140.5260@bender.nucleusys.com>

Hi,

On Wed, 16 Jan 2008, Petko Manolov wrote:

> On Wed, 16 Jan 2008, Johannes Schindelin wrote:
> 
> > I think that you're missing the point of version control.  It's not 
> > only about having an up-to-date source tree, but also about being able 
> > to go back to a certain revision.
> 
> No contradiction here.  In my case old source code will work perfectly 
> with new binaries/firmware.  That's why i don't _need_ the history, only 
> the latest stuff in order to save space.

No, you _do_ miss the point here.  You might _think_ that they work 
perfectly, but with revision control you want to have _exactly_ the same 
setup.  You want to be able to go back to a certain _revision_ (including 
the then-current firmware).

And that's what you don't want.  So git is not for you.

Ciao,
Dscho

^ permalink raw reply

* Re: git and binary files
From: Petko Manolov @ 2008-01-16 14:04 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: Johannes Schindelin, git
In-Reply-To: <m37ii9nagt.fsf@roke.D-201>

On Wed, 16 Jan 2008, Jakub Narebski wrote:

> You can always tag a blob (like junio-gpg-pub tag in git.git repository),
> but it wouldn't be in a working directory. But it would get distributed
> on clone.

Hm, how does it work?

> BTW. if those large binary files don't differ much between versions, 
> they should get well compressed even if you would store them normally, 
> all revisions.

Unfortunately this is not the case.  These binary blobs are already 
compressed and/or encrypted and adding even a few bytes ends up storing 
new version in full size.


cheers,
Petko

^ permalink raw reply

* Re: git and binary files
From: Petko Manolov @ 2008-01-16 13:58 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: David Symonds, git
In-Reply-To: <alpine.LSU.1.00.0801161341430.17650@racer.site>

On Wed, 16 Jan 2008, Johannes Schindelin wrote:

> I think that you're missing the point of version control.  It's not only 
> about having an up-to-date source tree, but also about being able to go 
> back to a certain revision.

No contradiction here.  In my case old source code will work perfectly 
with new binaries/firmware.  That's why i don't _need_ the history, only 
the latest stuff in order to save space.

I do realize that what i am talking about is statistically microscopic 
scenario, but it does exist.  If there's no such feature then i don't have 
much choice, but stick with my current way of doing things.

> What you want is most likely covered by "rsync -au".

Yeah, just like in the old days when "git pull" didn't do everything for 
you.


thanks,
Petko

^ permalink raw reply

* Re: git and binary files
From: Jeff King @ 2008-01-16 13:54 UTC (permalink / raw)
  To: Petko Manolov; +Cc: Johannes Schindelin, git
In-Reply-To: <alpine.DEB.1.00.0801161521500.5260@bender.nucleusys.com>

On Wed, Jan 16, 2008 at 03:39:06PM +0200, Petko Manolov wrote:

> What i am trying to suggest is that there might be cases when you need  
> something in the repository, but you don't want GIT to keep its history
> nor its predecessors.  Leaving it out breaks the atomicity of such
> repository and makes the project management more complex.

But not versioning some files while versioning others breaks the
atomicity of project version, which is at the core of git's model. There
is no such thing as "this file is at revision X, but that one is at
revision Y." There is only "the project is at revision X."

> There's a few examples out there that shows how to solve this, but it  
> seems inconvenient and involves branching, cloning, etc.  Isn't it  
> possible to add something like:
>
> 	"git nohistory firmware.bin"
>
> or
> 	"git nohistory -i-understand-this-might-be-dangerous firmware.bin"

Not easily. It goes against the underlying data model at the core of
git.

How big are your firmware files? How often do they change, and how large
are the changes? IOW, have you confirmed that repacking does not produce
an acceptable delta, meaning you get versioning for very low space cost?
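
One way to check this, sketched in a throwaway repository: compare
"git count-objects -v" before and after a full repack. The file name
data.txt, its generated contents and the committer identity are made up
for the example. The file is plain text that deltas well, so the numbers
only show how to measure, not what compressed firmware would give:

```shell
set -e

# Throwaway repository with three revisions of one file.
repo=$(mktemp -d)
cd "$repo"
git init -q
for i in 1 2 3; do
    seq 1 20000 > data.txt            # large, mostly unchanged content
    echo "revision $i" >> data.txt    # small change per revision
    git add data.txt
    git -c user.name=Example -c user.email=example@example.invalid \
        commit -q -m "revision $i"
done

# Before: everything is stored as loose objects ("count" and "size").
git count-objects -v

# Repack everything into one pack, recomputing deltas from scratch (-f)
# and dropping the now-redundant loose objects (-d).
git repack -a -d -f -q

# After: "size-pack" is the packed size in KiB.
git count-objects -v
loose_after=$(git count-objects -v | sed -n 's/^count: //p')
packs_after=$(git count-objects -v | sed -n 's/^packs: //p')
```

If "size-pack" after the repack stays close to the sum of the
individual file sizes, the deltas are not helping, and that is the
signal to look at how the files themselves are stored.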

-Peff

^ permalink raw reply

* Re: git and binary files
From: Jakub Narebski @ 2008-01-16 13:53 UTC (permalink / raw)
  To: Petko Manolov; +Cc: Johannes Schindelin, git
In-Reply-To: <alpine.DEB.1.00.0801161521500.5260@bender.nucleusys.com>

Petko Manolov <petkan@nucleusys.com> writes:

> On Wed, 16 Jan 2008, Johannes Schindelin wrote:
> 
> > The answer is no.  You cannot ask git to have the newest version of
> > something, but not the old ones.  It contradicts the distributedness
> > of git, too.
> 
> I don't agree here.  Assume that whatever you're working on requires
> firmware for a device that won't change during the lifetime of the
> software project.  The newest version of the said firmware is mostly
> bugfixes and you basically don't want to revert to the older
> ones. Consider the microcode for modern Pentiums, Core 2, etc.
> 
> What i am trying to suggest is that there might be cases when you need
> something in the repository, but you don't want GIT to keep its
> history nor its predecessors.  Leaving it out breaks the atomicity of
> such repository and makes the project management more complex.
> 
> There's a few examples out there that shows how to solve this, but it
> seems inconvenient and involves branching, cloning, etc.  Isn't it
> possible to add something like:
> 
>  	"git nohistory firmware.bin"
> 
> or
>  	"git nohistory -i-understand-this-might-be-dangerous firmware.bin"

You can always tag a blob (like junio-gpg-pub tag in git.git repository),
but it wouldn't be in a working directory. But it would get distributed
on clone.

BTW. if those large binary files don't differ much between versions,
they should get well compressed even if you would store them normally,
all revisions.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply

* Re: git and binary files
From: Johannes Schindelin @ 2008-01-16 13:42 UTC (permalink / raw)
  To: Petko Manolov; +Cc: David Symonds, git
In-Reply-To: <alpine.DEB.1.00.0801161517260.5260@bender.nucleusys.com>

Hi,

On Wed, 16 Jan 2008, Petko Manolov wrote:

> On Wed, 16 Jan 2008, David Symonds wrote:
> 
> > If you don't care about versioning those files, why would you use a 
> > version control system? Just store them somewhere else, and use 
> > symlinks.
> 
> That is certainly a way of doing it.  However, it will be much simpler 
> and fast to be able to "git clone" and then "git pull" every once in a 
> while. The alternative involves "cp -a" or most likely "scp -r" the 
> binaries along with the repository and you can never be sure that both 
> are in sync.

I think that you're missing the point of version control.  It's not only 
about having an up-to-date source tree, but also about being able to go 
back to a certain revision.

What you want is most likely covered by "rsync -au".

Hth,
Dscho

^ permalink raw reply

* Re: git and binary files
From: Petko Manolov @ 2008-01-16 13:39 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <alpine.LSU.1.00.0801161113170.17650@racer.site>

On Wed, 16 Jan 2008, Johannes Schindelin wrote:

> Your subject is a little bit misleading, no?  It's not about the 
> binariness (git handles binary files just fine, thankyouverymuch), but 
> about the not-tracking them.

You're absobloodylutely correct.  I was too preoccupied defining my problem 
in a better way, which left the subject kind of dumb.  Well, quite dumb. 
:-)

> The answer is no.  You cannot ask git to have the newest version of 
> something, but not the old ones.  It contradicts the distributedness of 
> git, too.

I don't agree here.  Assume that whatever you're working on requires 
firmware for a device that won't change during the lifetime of the 
software project.  The newest version of the said firmware is mostly 
bugfixes and you basically don't want to revert to the older ones. 
Consider the microcode for modern Pentiums, Core 2, etc.

What i am trying to suggest is that there might be cases when you need 
something in the repository, but you don't want GIT to keep its history 
nor its predecessors.  Leaving it out breaks the atomicity of such 
repository and makes the project management more complex.

There's a few examples out there that shows how to solve this, but it 
seems inconvenient and involves branching, cloning, etc.  Isn't it 
possible to add something like:

 	"git nohistory firmware.bin"

or
 	"git nohistory -i-understand-this-might-be-dangerous firmware.bin"



cheers,
Petko

^ permalink raw reply

