Git development
* [BUG?] "git submodule foreach" when command is ssh
From: Chris Packham @ 2011-01-05 22:32 UTC
  To: GIT

Hi All,

I just noticed something odd with "git submodule foreach". I was
running a script to create a backup of each submodule on a server I
have ssh access to. I was surprised to find that git submodule foreach
stopped silently after the first submodule.

After a little debugging, I found that

git submodule foreach 'ssh localhost "ls /"' - stops silently after
the first module (note that the command does produce the expected
listing and there is no error about the command failing).

git submodule foreach 'echo foo' - works as expected

Any thoughts as to what's going on?

---
git version 1.7.3.2


* Re: concurrent fetches to update same mirror
From: Neal Kreitzinger @ 2011-01-05 22:34 UTC
  To: Jeff King; +Cc: Shawn Pearce, Neal Kreitzinger, git
In-Reply-To: <20110105211313.GB7808@sigill.intra.peff.net>

On 1/5/2011 3:13 PM, Jeff King wrote:
> On Wed, Jan 05, 2011 at 03:53:25PM -0500, Jeff King wrote:
>
>>> If both fetch processes try to update the same ref at the same time,
>>> one will get the lock and continue, and the other will crash with an
>>> error (because the lock was busy).  If one is slightly slower than the
>>> other, they will probably update the refs twice, with the slower fetch
>>> updating what the faster one had just updated.  :-)
>>
>> I assumed it would take the "old" value at the very beginning of the
>> fetch (before talking with the remote), and then see that the ref was
>> changed under our feet. Or does it simply do it at the end?
>
> Hmm. Weirder even, builtin/fetch.c:s_update_ref takes a "check_old"
> flag, and we do always use it for branch updates. But not for tag
> updates. I can't think of why. The code blames all the way back to the
> original builtin-fetch.
>
> Anyway, when we do check, we check the value from the beginning of the
> fetch. So you can get lock conflicts. For example, doing this:
>
>    mkdir repo && cd repo && git init
>    echo contents >foo && git add . && git commit -m one
>    git update-ref refs/remotes/origin/master refs/heads/master
>    git remote add origin some-remote-repo-that-takes-a-few-seconds
>    xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'
>
> I.e., putting some cruft into the ref and then updating it. One fetch
> will force-write over the ref properly:
>
>     + ac32203...4e64590 master     -> origin/master  (forced update)
>
> but the other one will barf on the lock:
>
>    error: Ref refs/remotes/origin/master is at 4e6459052ab329914c7712a926773e566b8c821d but expected ac32203727daa3bcb5fc041786aa45adbbe86299
>    ...
>     ! ac32203...4e64590 master     -> origin/master  (unable to update local ref)
>
> Interestingly, in the case of ref _creation_, not update, like this:
>
>    mkdir repo && cd repo && git init
>    git remote add origin some-remote-repo-that-takes-a-few-seconds
>    xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'
>
> then both will happily update, the second one overwriting the results of
> the first. It seems in the case of locking a ref which previously didn't
> exist, we don't enforce that it still doesn't exist.
>
> I wonder if we should, but perhaps there is some corner case I am not
> considering. The code is in lock_ref_sha1_basic, but blaming didn't turn
> up anything helpful.
>
> -Peff

This was actually the case in my test.  Updates to the mirror are always 
new branches except for master.  The only pre-existing branch that might 
get updated is master, but in that test it didn't.  The new branches and 
tags were updated.  The new tags always point to the new branches.  I'm 
running 1.7.1 on both servers.

v/r,
Neal


* Re: concurrent fetches to update same mirror
From: Neal Kreitzinger @ 2011-01-05 22:42 UTC
  To: Jeff King; +Cc: Shawn Pearce, Neal Kreitzinger, git
In-Reply-To: <20110105211313.GB7808@sigill.intra.peff.net>

On 1/5/2011 3:13 PM, Jeff King wrote:
> On Wed, Jan 05, 2011 at 03:53:25PM -0500, Jeff King wrote:
>
>>> If both fetch processes try to update the same ref at the same time,
>>> one will get the lock and continue, and the other will crash with an
>>> error (because the lock was busy).  If one is slightly slower than the
>>> other, they will probably update the refs twice, with the slower fetch
>>> updating what the faster one had just updated.  :-)
>>
>> I assumed it would take the "old" value at the very beginning of the
>> fetch (before talking with the remote), and then see that the ref was
>> changed under our feet. Or does it simply do it at the end?
>
> Hmm. Weirder even, builtin/fetch.c:s_update_ref takes a "check_old"
> flag, and we do always use it for branch updates. But not for tag
> updates. I can't think of why. The code blames all the way back to the
> original builtin-fetch.
>
> Anyway, when we do check, we check the value from the beginning of the
> fetch. So you can get lock conflicts. For example, doing this:
>
>    mkdir repo && cd repo && git init
>    echo contents >foo && git add . && git commit -m one
>    git update-ref refs/remotes/origin/master refs/heads/master
>    git remote add origin some-remote-repo-that-takes-a-few-seconds
>    xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'
>
> I.e., putting some cruft into the ref and then updating it. One fetch
> will force-write over the ref properly:
>
>     + ac32203...4e64590 master     -> origin/master  (forced update)
>
> but the other one will barf on the lock:
>
>    error: Ref refs/remotes/origin/master is at 4e6459052ab329914c7712a926773e566b8c821d but expected ac32203727daa3bcb5fc041786aa45adbbe86299
>    ...
>     ! ac32203...4e64590 master     -> origin/master  (unable to update local ref)
>
> Interestingly, in the case of ref _creation_, not update, like this:
>
>    mkdir repo && cd repo && git init
>    git remote add origin some-remote-repo-that-takes-a-few-seconds
>    xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'
>
> then both will happily update, the second one overwriting the results of
> the first. It seems in the case of locking a ref which previously didn't
> exist, we don't enforce that it still doesn't exist.
>
> I wonder if we should, but perhaps there is some corner case I am not
> considering. The code is in lock_ref_sha1_basic, but blaming didn't turn
> up anything helpful.
>
> -Peff

In the case of concurrent pulls to the same non-bare repo, could the 
working tree or index get corrupted, or does git have concurrency 
control mechanisms for this too?

v/r,
Neal


* Re: [BUG?] "git submodule foreach" when command is ssh
From: Chris Packham @ 2011-01-05 22:50 UTC
  To: GIT
In-Reply-To: <AANLkTi=x2i6NvDNRzbszhk-a-z5AYe46-iUBxQsxJJHC@mail.gmail.com>

On Thu, Jan 6, 2011 at 11:32 AM, Chris Packham <judge.packham@gmail.com> wrote:
> Hi All,
>
> I just noticed something odd with "git submodule foreach". I was
> running a script to create a backup of each submodule on a server I
> have ssh access to. I was surprised to find that git submodule foreach
> stopped silently after the first submodule.
>
> After a little debugging, I found that
>
> git submodule foreach 'ssh localhost "ls /"' - stops silently after
> the first module (note that the command does produce the expected
> listing and there is no error about the command failing).
>
> git submodule foreach 'echo foo' - works as expected
>
> Any thoughts as to what's going on?
>
> ---
> git version 1.7.3.2
>

Actually this might be a ssh/bash bug (feature?). There is different
behaviour between

  find . -maxdepth 1 -type d -a ! -name '\.*' |
  while read; do echo $REPLY && ssh localhost ls /; done

and

  find . -maxdepth 1 -type d -a ! -name '\.*' |
  while read; do echo $REPLY && ls /; done


* Re: concurrent fetches to update same mirror
From: Jeff King @ 2011-01-05 22:57 UTC
  To: Neal Kreitzinger; +Cc: Shawn Pearce, Neal Kreitzinger, git
In-Reply-To: <4D24F3E9.3070904@gmail.com>

On Wed, Jan 05, 2011 at 04:42:49PM -0600, Neal Kreitzinger wrote:

> In the case of concurrent pulls to the same non-bare repo, could the
> working tree or index get corrupted, or does git have concurrency
> control mechanisms for this too?

There's a lock on the index, so it shouldn't be corruptible; one process
will just end up waiting. I'm not sure offhand whether writing working
tree files is done under any lock, but I would tend to think not, since
it can be a long process. However, writing the same file twice should be
OK; we unlink the old version and create the new from scratch. So the
first writer will get its write-in-progress unlinked, and the second one
will "win".
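The "second writer wins" outcome can be reproduced in a single shell process. This is a minimal sketch, assuming both writers replace the file by unlinking it and creating it afresh as described above; it demonstrates only the underlying Unix file semantics, not git's actual checkout code. Writer 1 keeps an open descriptor (fd 3) on the original file, so anything it writes after writer 2's unlink lands in an inode that no longer has a name.

```shell
#!/bin/sh
# Simulate two writers replacing the same working-tree file.
dir=$(mktemp -d) || exit 1
file="$dir/foo"

exec 3>"$file"                # writer 1: creates the file and starts writing
printf 'first, part 1\n' >&3

rm -f "$file"                 # writer 2: unlinks writer 1's file...
printf 'second\n' >"$file"    # ...and creates a new one from scratch

printf 'first, part 2\n' >&3  # writer 1 finishes, but into the unlinked inode
exec 3>&-

cat "$file"                   # prints: second
```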

-Peff


* Re: [BUG?] "git submodule foreach" when command is ssh
From: Jeff King @ 2011-01-05 23:03 UTC
  To: Chris Packham; +Cc: GIT
In-Reply-To: <AANLkTini=GaGSHDX4e1jhPVxKaSayUJoWa=w4u4Rz-+5@mail.gmail.com>

On Thu, Jan 06, 2011 at 11:50:58AM +1300, Chris Packham wrote:

> Actually this might be a ssh/bash bug (feature?). There is different
> behaviour between
> 
>   find . -maxdepth 1 -type d -a ! -name '\.*' |
>   while read; do echo $REPLY && ssh localhost ls /; done
> 
> and
> 
>   find . -maxdepth 1 -type d -a ! -name '\.*' |
>   while read; do echo $REPLY && ls /; done

Ssh will opportunistically eat data on stdin to send to the other side,
even though the command on the other side ("ls" in this case) will never
read it. Of course, ssh has no way of knowing that, and it is trying
to behave like an interactive terminal. So it ends up eating some random
amount of the data you expected to go to the "read" call.

You can use the "-n" option (which redirects ssh's stdin from /dev/null)
to suppress it. For example:

  $ (echo foo; echo bar) |
    while read line; do
      echo local $line
      ssh host "echo remote $line"
    done

produces:

  local foo
  remote foo

but:

  $ (echo foo; echo bar) |
    while read line; do
      echo local $line
      ssh -n host "echo remote $line"
    done

produces:

  local foo
  remote foo
  local bar
  remote bar

which is what you want.
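The same stdin-stealing effect can be shown without a remote host. In this sketch, drain_stdin is a hypothetical stand-in for ssh that simply reads and discards all of its inherited stdin (real ssh may read an unpredictable amount); redirecting its stdin from /dev/null, which is what "ssh -n" does, lets the loop's read see every line.

```shell
#!/bin/sh
# Hypothetical stand-in for ssh: reads (and discards) all of its stdin.
drain_stdin () {
	cat >/dev/null
}

# The stand-in swallows "bar", so the loop body runs only once.
without_fix () {
	printf 'foo\nbar\n' |
	while read -r line; do
		echo "local $line"
		drain_stdin
	done
}

# With the stand-in's stdin taken from /dev/null, both lines reach read.
with_fix () {
	printf 'foo\nbar\n' |
	while read -r line; do
		echo "local $line"
		drain_stdin </dev/null
	done
}

without_fix   # prints: local foo
with_fix      # prints: local foo, then local bar
```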

-Peff


* Re: patch for git-p4
From: Junio C Hamano @ 2011-01-05 23:16 UTC
  To: kusmabite; +Cc: Andrew Garber, git
In-Reply-To: <AANLkTimdMH_HcF-Qk3SSmqT24OgxynYnXpSLiDtU7Y6c@mail.gmail.com>

Erik Faye-Lund <kusmabite@gmail.com> writes:

> We tend not to write commit messages in past tense, e.g. "git-p4:
> replace tabs with spaces" (notice that I removed a 'd').

Just for the record, we do not write in present tense either.  That
"replace" is imperative.


* Re: [BUG?] "git submodule foreach" when command is ssh
From: Chris Packham @ 2011-01-05 23:22 UTC
  To: Jeff King; +Cc: GIT
In-Reply-To: <20110105230334.GB9774@sigill.intra.peff.net>

On Thu, Jan 6, 2011 at 12:03 PM, Jeff King <peff@peff.net> wrote:
> On Thu, Jan 06, 2011 at 11:50:58AM +1300, Chris Packham wrote:
>
>> Actually this might be a ssh/bash bug (feature?). There is different
>> behaviour between
>>
>>   find . -maxdepth 1 -type d -a ! -name '\.*' |
>>   while read; do echo $REPLY && ssh localhost ls /; done
>>
>> and
>>
>>   find . -maxdepth 1 -type d -a ! -name '\.*' |
>>   while read; do echo $REPLY && ls /; done
>
> Ssh will opportunistically eat data on stdin to send to the other side,
> even though the command on the other side ("ls" in this case) will never
> read it. Because of course ssh has no way of knowing that, and is trying
> to be an interactive terminal. So it ends up eating some random amount
> of the data you expected to go to the "read" call.
>
> You can use the "-n" option to suppress it. For example:
>
>  $ (echo foo; echo bar) |
>    while read line; do
>      echo local $line
>      ssh host "echo remote $line"
>    done
>
> produces:
>
>  local foo
>  remote foo
>
> but:
>
>  $ (echo foo; echo bar) |
>    while read line; do
>      echo local $line
>      ssh -n host "echo remote $line"
>    done
>
> produces:
>
>  local foo
>  remote foo
>  local bar
>  remote bar
>
> which is what you want.
>
> -Peff

Thanks that makes sense and adding -n to my ssh invocations solves the problem.


* Re: Resumable clone/Gittorrent (again)
From: Maaartin @ 2011-01-05 23:28 UTC
  To: git
In-Reply-To: <AANLkTinUV9Z_w85Gz13J+bm8xqnxJ9jBJXJm9bn5Y2ec@mail.gmail.com>

Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:

> I've been analyzing bittorrent protocol and come up with this. The
> last idea about a similar thing [1], gittorrent, was given by Nicolas.
> This keeps close to that idea (i.e. the transfer protocol must be around git
> objects, not file chunks) with a slight difference.
>
> The idea is to transfer a chain of objects (trees or blobs), including
> base object and delta chain. Objects are chained according to the
> worktree layout, e.g. all objects of path/to/any/blob will form a
> chain, from a commit tip down to the root commits. Chains can have
> gaps, and don't need to start from commit tip. The transfer is
> resumable because if a delta chain is corrupt at some point, we can
> just request another chain from where it stops. Base object is
> obviously resumable.

I may be talking nonsense; please bear with me.

I'm not sure if it works well, since chains defined this way change over time. 
I may request commits A and B while declaring to possess commits C and D. One 
server may be ahead of A, so should it send me more data or repack the chain so 
that the non-requested versions get excluded? At the same time the server may 
be missing B and possess only some ancestors of it. Should it send me only a
part of the chain or should I better ask a different server?

Moreover, if a directory gets renamed, the content may get transferred
needlessly. This is probably not a big problem.

I haven't read the whole other thread yet, but what about going the other way 
round? Use a single commit as a chain, create deltas assuming that all 
ancestors are already available. The packs may arrive out of order, so the 
decompression may have to wait. The number of commits may be one order of 
magnitude larger than the number of paths (there are currently 2254 paths
and 24235 commits in git.git), so grouping consequent commits into one larger 
pack may be useful.

The advantage is that the packs stay stable over time; you may create them
using the most aggressive and time-consuming settings and store them forever. 
You could create packs for single commits, packs for non-overlapping 
consecutive pairs of them, for non-overlapping pairs of pairs, etc. I mean with 
commits numbered 0, 1, 2, ... create packs [0,1], [2,3], ..., [0,3], [4,7], 
etc. The reason for this is obviously to allow reading groups of commits from 
different servers so that they fit together (similar to Buddy memory 
allocation). Of course, there are things like branches bringing chaos in this 
simple scheme, but I'm sure this can be solved somehow.
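A small sketch of the grouping described above, assuming commits are numbered 0..n-1 along a single linear history (branches are ignored, as the message concedes). The hypothetical buddy_ranges function prints every aligned block of size 1, 2, 4, ...; trailing blocks that would run past commit n-1 are simply omitted.

```shell
#!/bin/sh
# Print aligned, buddy-allocator-style commit ranges [start,end] for n commits.
buddy_ranges () {
	n=$1
	size=1
	while [ "$size" -le "$n" ]; do
		start=0
		# Emit only blocks that fit entirely inside 0..n-1.
		while [ $((start + size)) -le "$n" ]; do
			printf '[%d,%d]\n' "$start" $((start + size - 1))
			start=$((start + size))
		done
		size=$((size * 2))
	done
}

buddy_ranges 4   # [0,0] [1,1] [2,2] [3,3] [0,1] [2,3] [0,3], one per line
```

With this layout, a client wanting commits 4..7 can fetch the single pack [4,7] from one server, or [4,5] and [6,7] from two different servers, and the pieces fit together.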

Another problem is the client requesting commits A and B while declaring to 
possess commits C and D. When both C and D are ancestors of either A or B, you 
can ignore it (as you assume this while packing, anyway). The other case is 
less probable, unless e.g. C is the master and A is a developing branch. 
Currently, I have no idea how to optimize this or whether this could be
important.

I see no disadvantage when compared to path-based chains, but am probably 
overlooking something obvious.


* Re: concurrent fetches to update same mirror
From: Junio C Hamano @ 2011-01-05 23:29 UTC
  To: Jeff King; +Cc: Shawn Pearce, Neal Kreitzinger, git
In-Reply-To: <20110105211313.GB7808@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Interestingly, in the case of ref _creation_, not update, like this:
>
>   mkdir repo && cd repo && git init
>   git remote add origin some-remote-repo-that-takes-a-few-seconds
>   xterm -e 'git fetch -v; read' & xterm -e 'git fetch -v; read'
>
> then both will happily update, the second one overwriting the results of
> the first. It seems in the case of locking a ref which previously didn't
> exist, we don't enforce that it still doesn't exist.

We probably should, especially when neither --force nor a "+" refspec
prefix is involved.
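To make the missing check concrete, here is a minimal shell sketch of a lock-then-verify ref update. It borrows the "<ref>.lock" naming convention but is not git's lock_ref_sha1_basic; the function name update_ref_checked and the plain-file ref layout are assumptions for illustration. An empty expected value means "the ref must not exist yet", which is exactly the check that the ref-creation case above lacks.

```shell
#!/bin/sh
# Hypothetical helper: update_ref_checked <ref-file> <new-value> <expected-old>
# An empty <expected-old> means "the ref must not exist yet".
update_ref_checked () {
	ref=$1 new=$2 old=$3

	# Take the lock: with noclobber (set -C), ">" refuses to overwrite an
	# existing file, so a second writer fails here instead of proceeding.
	if ! (set -C; printf '%s\n' "$new" >"$ref.lock") 2>/dev/null; then
		echo "error: unable to lock $ref" >&2
		return 1
	fi

	# Verify the current value while holding the lock.
	if [ -f "$ref" ]; then
		cur=$(cat "$ref")
	else
		cur=
	fi
	if [ "$cur" != "$old" ]; then
		echo "error: $ref is at '${cur:-<none>}' but expected '${old:-<none>}'" >&2
		rm -f "$ref.lock"
		return 1
	fi

	# Commit: rename the lock file over the ref.
	mv -f "$ref.lock" "$ref"
}

dir=$(mktemp -d) || exit 1
update_ref_checked "$dir/master" aaa ""  && echo "created"
update_ref_checked "$dir/master" bbb ""  || echo "rejected: ref already exists"
update_ref_checked "$dir/master" bbb aaa && echo "updated"
```

The second call is the case Peff describes: without the "must not exist" check it would silently overwrite the value written by the first.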


* Re: [BUG?] "git submodule foreach" when command is ssh
From: Seth Robertson @ 2011-01-05 23:02 UTC
  To: Chris Packham; +Cc: GIT
In-Reply-To: <AANLkTi=x2i6NvDNRzbszhk-a-z5AYe46-iUBxQsxJJHC@mail.gmail.com>


In message <AANLkTi=x2i6NvDNRzbszhk-a-z5AYe46-iUBxQsxJJHC@mail.gmail.com>, Chris Packham writes:

    I just noticed something odd with "git submodule foreach". I was
    running a script to create a backup of each submodule on a server I
    have ssh access to. I was surprised to find that git submodule foreach
    stopped silently after the first submodule.

    After a little debugging, I found that

    git submodule foreach 'ssh localhost "ls /"' - stops silently after

Putting some input redirection will work around the problem.
Presumably some pipe input is going into the ssh accidentally.

git submodule foreach 'ssh localhost "ls /" < /dev/null'

I'll also just take this moment to advertise gitslave
(http://gitslave.sf.net) as an alternative to submodules, which may help
(or hinder) you--depending on your workflow.  It doesn't suffer from
this particular problem in any case.

					-Seth Robertson


* Status of the svn remote helper project (Jan 2011, #1)
From: Jonathan Nieder @ 2011-01-05 23:39 UTC
  To: git
  Cc: Ramkumar Ramachandra, Sverre Rabbelier, David Barr, Sam Vilain,
	Stephen Bash, Tomas Carnecky
In-Reply-To: <20101205113717.GH4332@burratino>

Here are the topics that are cooking in vcs-svn-pu.

Hopefully v1.7.4-rc0 is treating you well and free git time has been
going into finding and fixing regressions.  So if you are bored and
looking for something to do, please skip to [1] and ignore the rest of
this message.

December was a busy month.  Excluding changes from the 'jch' branch:

 39 files changed, 1472 insertions(+), 2020 deletions(-)

That breaks down as

   1.7% Documentation/
   1.8% contrib/svn-fe/
  17.5% t/
  65.4% vcs-svn/

The other 13%:

 .gitignore                        |    3 -
 Makefile                          |   17 +--
 fast-import.c                     |  134 ++++++++-----
 quote.h                           |    3 +-

Users will probably notice that svn-fe requires ls and cat-blob
support from fast-import (alas); hopefully the lower memory footprint,
code simplification and incremental import support can justify that
cost.

The fast-import changes are a mixture of enhancements to the "ls"
command and general optimizations.  The optimizations are not in their
final form and likely have bugs but are being sent out now for some
early exposure.

As always, a merge of the branches listed below is available as

	git://repo.or.cz/git/jrn.git vcs-svn-pu

and individual topic branches are available in that repository under
the refs/topics namespace.

Let's get svn-fe3 polished so when the next merge window comes around
it is ready to be merged quickly.  

--------------------------------------------------
[Graduated to "master"]
* db/fast-import-object-reuse (2010-11-24) 1 commit
 - fast-import: insert new object entries at start of hash bucket

* jn/fast-import-ondemand-checkpoint (2010-11-24) 1 commit
 - fast-import: treat SIGUSR1 as a request to access objects early

* jn/svn-fe-makefile (2010-12-04) 1 commit
 - Makefile: dependencies for vcs-svn tests

* rr/svnfe-tests-no-perl (2010-11-23) 1 commit
 - t9010 (svn-fe): Eliminate dependency on svn perl bindings

* jn/maint-svn-fe (2010-12-05) 2 commits
 - vcs-svn: fix intermittent repo_tree corruption
 - treap: make treap_insert return inserted node

* db/fast-import-blob-access (2010-12-04) 4 commits
 - fast-import: Allow cat-blob requests at arbitrary points in stream
 - fast-import: let importers retrieve blobs
 - fast-import: clarify documentation of "feature" command
 - fast-import: stricter parsing of integer options

The old tip commit that adds an 'ls' command was reworked (see below).

--------------------------------------------------
[New Topics]
* jn/line-buffer-error (2010-12-28) 4 commits
 - vcs-svn: improve reporting of input errors
 - vcs-svn: make buffer_copy_bytes return length read
 - vcs-svn: make buffer_skip_bytes return length read
 - vcs-svn: allow input errors to be detected promptly

From jn/svndiff0 but expanded; waiting for feedback from the list.
These let the calling function take care of reporting the error with
more context; is that worth it or would it make more sense to die()
directly?

* jn/line-buffer-large-file (2010-12-24) 1 commit
 - vcs-svn: improve support for reading large files

From jn/svndiff0.  Will merge soon if there are no objections.

* jn/line-buffer (2011-01-02) 14 commits
 - Merge branch 'jn/line-buffer-large-file' into jn/line-buffer
 - Merge branch 'jn/line-buffer-error' into jn/line-buffer
 - vcs-svn: teach line_buffer about temporary files
 - vcs-svn: allow input from file descriptor
 - vcs-svn: allow character-oriented input
 - vcs-svn: add binary-safe read function
 - t0081 (line-buffer): add buffering tests
 - vcs-svn: tweak test-line-buffer to not assume line-oriented input
 - tests: give vcs-svn/line_buffer its own test script
 - vcs-svn: make test-line-buffer input format more flexible
 - vcs-svn: teach line_buffer to handle multiple input files
 - vcs-svn: collect line_buffer data in a struct
 - vcs-svn: replace buffer_read_string memory pool with a strbuf
 - vcs-svn: eliminate global byte_buffer
 (this branch uses jn/line-buffer-error and jn/line-buffer-large-file.)

From jn/svndiff0 and db/text-delta.  Putting temporary files with
meaningless names in /tmp is unfortunate (maybe buffer_tmpfile_init
could use a filename prefix argument?).

* jn/unsigned-overflow (2010-12-25) 1 commit
 - compat: helper for detecting unsigned overflow

From jn/svndiff0.  Should submit for separate inclusion.

* jn/sliding-window (2011-01-02) 1 commit
 - vcs-svn: learn to maintain a sliding view of a file
 (this branch uses jn/line-buffer, jn/line-buffer-error, and
  jn/line-buffer-large-file.)

From jn/svndiff0 but clarified somewhat.

* db/vcs-svn-incremental (2011-01-05) 20 commits
 - svn-fe: WIP testme.sh performance enhancements
 - svn-fe: testme.sh update
 - vcs-svn: use mark from previous import for parent commit
 - vcs-svn: handle filenames with dq correctly
 - vcs-svn: quote paths correctly for ls command
 - vcs-svn: eliminate repo_tree structure
 - vcs-svn: add a comment before each commit
 - vcs-svn: simplify repo_modify_path and repo_copy
 - vcs-svn: prepare to eliminate repo_tree structure
 - vcs-svn: do not rely on marks for old blobs
 - vcs-svn: split off function to export result from delta application
 - vcs-svn: make apply_delta caller retrieve preimage
 - vcs-svn: explicitly close streams used for delta application at exit
 - vcs-svn: introduce cat_mark function to retrieve a marked blob
 - vcs-svn: save marks for imported commits
 - vcs-svn: use higher mark numbers for blobs
 - vcs-svn: check for errors reading from cat-blob-fd
 - quote.h: simplify the inclusion
 - Makefile: update dependencies for test-svn-fe.c
 - Merge branch 'db/fast-import-blob-access' into db/vcs-svn-incremental
 (this branch uses db/text-delta, db/prop-delta, jn/svndiff0,
  jn/sliding-window, jn/line-buffer, jn/line-buffer-error,
  jn/line-buffer-large-file, and db/fast-import-blob-access.)

Support for importing different revs in different svn-fe runs.

* db/optimize-vcs-svn (2011-01-05) 9 commits
 - vcs-svn: use strchr to find RFC822 delimiter
 - vcs-svn: drop obj_pool.h
 - vcs-svn: drop trp.h
 - vcs-svn: drop string_pool
 - vcs-svn: factor out usage of string_pool
 - vcs-svn: implement perfect hash for top-level keys
 - vcs-svn: implement perfect hash for node-prop keys
 - vcs-svn: avoid using ls command twice
 - vcs-svn: pass paths through to fast-import
 (this branch uses db/vcs-svn-incremental, db/text-delta, db/prop-delta,
  jn/svndiff0, jn/sliding-window, jn/line-buffer, jn/line-buffer-error,
  jn/line-buffer-large-file, and db/fast-import-blob-access.)

The diffstat says it all. ;-)

* db/optimize-fast-import (2011-01-05) 3 commits
 - WIP
 - WIP
 - WIP Hash/bitmap combo
 (this branch uses db/fast-import-blob-access.)

Very rough.  Testers beware.

I suspect "struct hash_table" may provide a simpler approach to
avoiding filling fast-import's tables.

--------------------------------------------------
[Cooking]
* jn/svndiff0 (2011-01-05) 11 commits
 - vcs-svn: microcleanup in svndiff0 window-reading code
 - vcs-svn: let deltas use data from preimage
 - vcs-svn: let deltas use data from postimage
 - vcs-svn: verify that deltas consume all inline data
 - vcs-svn: implement copyfrom_data delta instruction
 - vcs-svn: read instructions from deltas
 - vcs-svn: read inline data from deltas
 - vcs-svn: read the preimage when applying deltas
 - vcs-svn: parse svndiff0 window header
 - vcs-svn: skeleton of an svn delta parser
 - Merge branch 'jn/unsigned-overflow' into jn/svndiff0
 (this branch uses jn/sliding-window, jn/line-buffer,
  jn/line-buffer-error, and jn/line-buffer-large-file.)

Well tested and should be ready for wide use.

Is there some tool (a hex editor?) that can be used to read and write
deltas?  The "printf and test against 'svnadmin load'" method is a bit
time-consuming.

* db/prop-delta (2010-12-09) 18 commits
 - vcs-svn: Simplify handling of deleted properties
 - vcs-svn: Allow change nodes for root of tree (/)
 - vcs-svn: Implement Prop-delta handling
 - vcs-svn: Sharpen parsing of property lines
 - vcs-svn: Split off function for handling of individual properties
 - vcs-svn: Make source easier to read on small screens
 - vcs-svn: More dump format sanity checks
 - vcs-svn: Reject path nodes without Node-action
 - vcs-svn: Delay read of per-path properties
 - vcs-svn: Combine repo_replace and repo_modify functions
 - vcs-svn: Replace = Delete + Add
 - vcs-svn: handle_node: Handle deletion case early
 - vcs-svn: Use mark to indicate nodes with included text
 - vcs-svn: Unclutter handle_node by introducing have_props var
 - vcs-svn: Eliminate node_ctx.mark global
 - vcs-svn: Eliminate node_ctx.srcRev global
 - vcs-svn: Check for errors from open()
 - Allow simple v3 dumps (no deltas yet)

All but the tip commit are in 'jch'.

* db/fast-import-blob-access (2011-01-03) 3 commits
 - fast-import: add 'ls' command
 - fast-import: treat filemodify with empty tree as delete
 - fast-import: clarify handling of cat-blob feature

A proof of concept for 'cat-blob' and 'ls' support targeting another
repository format (hg?) would be a great comfort.

Synchronization overhead seems to be a problem.  If someone can use
"top -b" output to produce a nice timechart then I would be happy to
take a look.

* db/text-delta (2011-01-04) 5 commits
 - svn-fe: Test script for handling of dumps with --deltas
 - vcs-svn: implement text-delta handling
 - vcs-svn: introduce repo_read_path to check the content at a path
 - Merge branch 'db/prop-delta' into db/text-delta
 - Merge branch 'jn/svndiff0' into db/text-delta
 (this branch uses db/prop-delta, jn/svndiff0, jn/sliding-window,
  jn/line-buffer, jn/line-buffer-error, and jn/line-buffer-large-file.)

Still seems to work. ;-)

* db/svn-extract-branches (2010-11-20) 1 commit
 - svn-fe: Script to remap svn history

Very rough but let's merge it so it doesn't get forgotten.

--------------------------------------------------
[Ejected]
* db/recognize-v3 (2010-11-20) 1 commit
 - vcs-svn: Allow simple v3 dumps (no deltas yet)

Not worth maintaining as a separate branch from db/prop-delta.

* xx/thinner-wrapper-svndiff0 (2010-11-07) 1 commit
 - svn-fe: stop linking to libz and libxdiff
 (this branch used jn/svndiff0.)

Fixed when jn/svndiff0 was rebased on top of jn/thinner-wrapper.

--------------------------------------------------
[Out of tree, stalled]

* tc/remote-helper-usability: $gmane/157860
 . Register new packs after the remote helper is done fetching
 . Properly record history of the notes ref
 . Fix ls-remote output when displaying impure refs
 . Add git-remote-svn
 . Introduce the git fast-import-helper
 . Rename get_mode() to decode_tree_mode() and export it
 . Allow the transport fetch command to add additional refs
 . Allow more than one keepfile in the transport
 . Remote helper: accept ':<value> <name>' as a response to 'list'

I hear there has been some work to make this work with the usual
fast-import.

* rr/remote-helper: http://github.com/artagnon/git
 . [WIP] Temporary commit
 . remote-svn: Write in fetch functionality
 . run-command: Protect the FD 3 from being grabbed
 . remote-svn: Build a pipeline for the import using svnrdump
 . run-command: Extend child_process to include a backchannel FD
 . Allow the transport fetch command to add additional refs
 . Remote helper: accept ':<value> <name>' as a response to 'list'
 . test-svn-fe: Allow for a dumpfile on stdin
 . Add Tom's remote helper for reference
 . Add a stubby remote-svn remote helper
 . Add a correct svndiff applier

Work in progress, waiting on lower levels to stabilize.

* sb/svn-fe-example: $gmane/159054

[1] Debian: 8 reports[2]
    Fedora: 6 reports[3]
    Gentoo: 1 report[4]
[2] http://bugs.debian.org/cgi-bin/pkgreport.cgi?src=git;include=tags:upstream;exclude=tags:fixed-upstream;exclude=severity:wishlist
[3] https://bugzilla.redhat.com/buglist.cgi?component=git&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED
[4] http://bugs.gentoo.org/buglist.cgi?short_desc_type=allwordssubstr&short_desc=dev-vcs/git&product=Gentoo+Linux&bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED


* [PATCH 3/3] t3032: limit sed branch labels to 8 characters
From: Brandon Casey @ 2011-01-06  0:30 UTC
  To: gitster; +Cc: git, Brandon Casey
In-Reply-To: <gmeXEearzUOUEst4-B2b8sVUo0XhywYUDm7rCJikom1xi9tIroh9GnJRv-bJTzbCbvqI-4DOU3A@cipher.nrlssc.navy.mil>

From: Brandon Casey <drafnel@gmail.com>

POSIX leaves as unspecified the handling of labels longer than 8
characters.  Apparently, Sun decided to treat them as errors.  Make sed on
Solaris happy by trimming the length of labels to 8 characters.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
---
 t/t3032-merge-recursive-options.sh |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t3032-merge-recursive-options.sh b/t/t3032-merge-recursive-options.sh
index 2293797..de9ff89 100755
--- a/t/t3032-merge-recursive-options.sh
+++ b/t/t3032-merge-recursive-options.sh
@@ -16,13 +16,13 @@ test_description='merge-recursive options
 test_expect_success 'setup' '
 	conflict_hunks () {
 		sed -n -e "
-			/^<<<</ b inconflict
+			/^<<<</ b conflict
 			b
-			: inconflict
+			: conflict
 			p
 			/^>>>>/ b
 			n
-			b inconflict
+			b conflict
 		" "$@"
 	} &&
 
-- 
1.7.3.1
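A quick way to exercise the patched helper: the sketch below runs the new conflict_hunks sed program, whose label is now exactly 8 characters, over a small made-up file containing one conflict, and prints only the lines from "<<<<" through ">>>>". The sample input is hypothetical and exists only for illustration.

```shell
#!/bin/sh
# conflict_hunks as it reads after this patch: print each conflict hunk.
conflict_hunks () {
	sed -n -e "
		/^<<<</ b conflict
		b
		: conflict
		p
		/^>>>>/ b
		n
		b conflict
	" "$@"
}

# Hypothetical input with one conflict hunk between two ordinary lines.
tmp=$(mktemp) || exit 1
printf '%s\n' line1 '<<<<<<< ours' a '=======' b '>>>>>>> theirs' line2 >"$tmp"
conflict_hunks "$tmp"   # prints the five lines from <<<<<<< ours to >>>>>>> theirs
```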


* [PATCH 2/3] t0001,t1510,t3301: use sane_unset which always returns with status 0
From: Brandon Casey @ 2011-01-06  0:30 UTC
  To: gitster; +Cc: git, Brandon Casey
In-Reply-To: <gmeXEearzUOUEst4-B2b8sVUo0XhywYUDm7rCJikom1xi9tIroh9GnJRv-bJTzbCbvqI-4DOU3A@cipher.nrlssc.navy.mil>

From: Brandon Casey <drafnel@gmail.com>

On some shells (like /usr/xpg4/bin/sh on Solaris), unset will exit
non-zero when passed the name of a variable that has not been set.  Use
sane_unset instead so that the return value of unset can be ignored while
the && linkage of the test script can be preserved.
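For reference, a sane_unset along these lines just discards unset's exit status so that it is always safe inside an && chain. This is a sketch consistent with the description above; the real definition lives in git's test library and may differ in detail, and NEVER_SET_VARIABLE is a hypothetical name.

```shell
#!/bin/sh
# Unset the named variables, ignoring unset's exit status, which some
# shells (e.g. /usr/xpg4/bin/sh on Solaris) make non-zero for variables
# that were never set.
sane_unset () {
	unset "$@"
	return 0
}

FOO=1
sane_unset FOO NEVER_SET_VARIABLE && echo "chain continues"
```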

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
---
 t/t0001-init.sh       |    8 +++---
 t/t1510-repo-setup.sh |   64 ++++++++++++++++++++++++------------------------
 t/t3301-notes.sh      |    2 +-
 3 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index af8b9c5..f684993 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -35,7 +35,7 @@ test_expect_success 'plain' '
 
 test_expect_success 'plain nested in bare' '
 	(
-		unset GIT_DIR GIT_WORK_TREE &&
+		sane_unset GIT_DIR GIT_WORK_TREE &&
 		git init --bare bare-ancestor.git &&
 		cd bare-ancestor.git &&
 		mkdir plain-nested &&
@@ -47,7 +47,7 @@ test_expect_success 'plain nested in bare' '
 
 test_expect_success 'plain through aliased command, outside any git repo' '
 	(
-		unset GIT_DIR GIT_WORK_TREE GIT_CONFIG_NOGLOBAL &&
+		sane_unset GIT_DIR GIT_WORK_TREE GIT_CONFIG_NOGLOBAL &&
 		HOME=$(pwd)/alias-config &&
 		export HOME &&
 		mkdir alias-config &&
@@ -65,7 +65,7 @@ test_expect_success 'plain through aliased command, outside any git repo' '
 
 test_expect_failure 'plain nested through aliased command' '
 	(
-		unset GIT_DIR GIT_WORK_TREE &&
+		sane_unset GIT_DIR GIT_WORK_TREE &&
 		git init plain-ancestor-aliased &&
 		cd plain-ancestor-aliased &&
 		echo "[alias] aliasedinit = init" >>.git/config &&
@@ -78,7 +78,7 @@ test_expect_failure 'plain nested through aliased command' '
 
 test_expect_failure 'plain nested in bare through aliased command' '
 	(
-		unset GIT_DIR GIT_WORK_TREE &&
+		sane_unset GIT_DIR GIT_WORK_TREE &&
 		git init --bare bare-ancestor-aliased.git &&
 		cd bare-ancestor-aliased.git &&
 		echo "[alias] aliasedinit = init" >>config &&
diff --git a/t/t1510-repo-setup.sh b/t/t1510-repo-setup.sh
index 500ffaf..c3798ce 100755
--- a/t/t1510-repo-setup.sh
+++ b/t/t1510-repo-setup.sh
@@ -80,7 +80,7 @@ test_repo() {
 #  - cwd can't be outside worktree
 
 test_expect_success '#0: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 0 0/sub &&
 	cd 0 && git init && cd ..
 '
@@ -123,7 +123,7 @@ EOF
 # GIT_WORK_TREE is ignored -> #0
 
 test_expect_success '#1: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 1 1/sub 1.wt 1.wt/sub 1/wt 1/wt/sub &&
 	cd 1 &&
 	git init &&
@@ -174,7 +174,7 @@ EOF
 #  - cwd can't be outside worktree
 
 test_expect_success '#2: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 2 2/sub &&
 	cd 2 && git init && cd ..
 '
@@ -241,7 +241,7 @@ EOF
 #  - cwd can be outside worktree
 
 test_expect_success '#3: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 3 3/sub 3/sub/sub 3.wt 3.wt/sub 3/wt 3/wt/sub &&
 	cd 3 && git init && cd ..
 '
@@ -504,7 +504,7 @@ EOF
 # core.worktree is ignored -> #0
 
 test_expect_success '#4: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 4 4/sub &&
 	cd 4 &&
 	git init &&
@@ -550,7 +550,7 @@ EOF
 # GIT_WORK_TREE/core.worktree are ignored -> #0
 
 test_expect_success '#5: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 5 5/sub &&
 	cd 5 &&
 	git init &&
@@ -602,7 +602,7 @@ EOF
 #  - cwd can be outside worktree
 
 test_expect_success '#6: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 6 6/sub 6/sub/sub 6.wt 6.wt/sub 6/wt 6/wt/sub &&
 	cd 6 && git init && cd ..
 '
@@ -889,7 +889,7 @@ EOF
 # core.worktree is overridden by GIT_WORK_TREE -> #3
 
 test_expect_success '#7: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 7 7/sub 7/sub/sub 7.wt 7.wt/sub 7/wt 7/wt/sub &&
 	cd 7 &&
 	git init &&
@@ -1155,7 +1155,7 @@ EOF
 # #0 except that git_dir is set by .git file
 
 test_expect_success '#8: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 8 8/sub &&
 	cd 8 &&
 	git init &&
@@ -1202,7 +1202,7 @@ EOF
 # #1 except that git_dir is set by .git file
 
 test_expect_success '#9: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 9 9/sub 9.wt 9.wt/sub 9/wt 9/wt/sub &&
 	cd 9 &&
 	git init &&
@@ -1251,7 +1251,7 @@ EOF
 # #2 except that git_dir is set by .git file
 
 test_expect_success '#10: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 10 10/sub &&
 	cd 10 &&
 	git init &&
@@ -1318,7 +1318,7 @@ EOF
 # #3 except that git_dir is set by .git file
 
 test_expect_success '#11: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 11 11/sub 11/sub/sub 11.wt 11.wt/sub 11/wt 11/wt/sub &&
 	cd 11 &&
 	git init &&
@@ -1586,7 +1586,7 @@ EOF
 
 
 test_expect_success '#12: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 12 12/sub 12/sub/sub 12.wt 12.wt/sub 12/wt 12/wt/sub &&
 	cd 12 &&
 	git init &&
@@ -1634,7 +1634,7 @@ EOF
 # #5 except that git_dir is set by .git file
 
 test_expect_success '#13: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 13 13/sub 13/sub/sub 13.wt 13.wt/sub 13/wt 13/wt/sub &&
 	cd 13 &&
 	git init &&
@@ -1684,7 +1684,7 @@ EOF
 # #6 except that git_dir is set by .git file
 
 test_expect_success '#14: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 14 14/sub 14/sub/sub 14.wt 14.wt/sub 14/wt 14/wt/sub &&
 	cd 14 &&
 	git init &&
@@ -1975,7 +1975,7 @@ EOF
 # #7 except that git_dir is set by .git file
 
 test_expect_success '#15: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 15 15/sub 15/sub/sub 15.wt 15.wt/sub 15/wt 15/wt/sub &&
 	cd 15 &&
 	git init &&
@@ -2247,7 +2247,7 @@ EOF
 #  - cwd can't be outside worktree
 
 test_expect_success '#16.1: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 16 16/sub &&
 	cd 16 &&
 	git init &&
@@ -2378,7 +2378,7 @@ EOF
 # GIT_WORK_TREE is ignored -> #16.1 (with warnings perhaps)
 
 test_expect_success '#17.1: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 17 17/sub &&
 	cd 17 &&
 	git init &&
@@ -2511,7 +2511,7 @@ EOF
 #  - cwd can't be outside worktree
 
 test_expect_success '#18: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 18 18/sub &&
 	cd 18 &&
 	git init &&
@@ -2578,7 +2578,7 @@ EOF
 # bare repo is overridden by GIT_WORK_TREE -> #3
 
 test_expect_success '#19: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 19 19/sub 19/sub/sub 19.wt 19.wt/sub 19/wt 19/wt/sub &&
 	cd 19 &&
 	git init &&
@@ -2844,7 +2844,7 @@ EOF
 # core.worktree is ignored -> #16.1
 
 test_expect_success '#20.1: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 20 20/sub &&
 	cd 20 &&
 	git init &&
@@ -2972,7 +2972,7 @@ EOF
 # GIT_WORK_TREE/core.worktree are ignored -> #20.1
 
 test_expect_success '#21.1: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 21 21/sub &&
 	cd 21 &&
 	git init &&
@@ -3108,7 +3108,7 @@ EOF
 #  - cwd can be outside worktree
 
 test_expect_success '#22.1: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 22 &&
 	cd 22 &&
 	git init &&
@@ -3439,7 +3439,7 @@ test_expect_success '#22.2: at root' '
 # core.worktree is overridden by GIT_WORK_TREE -> #19
 
 test_expect_success '#23: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 23 23/sub 23/sub/sub 23.wt 23.wt/sub 23/wt 23/wt/sub &&
 	cd 23 &&
 	git init &&
@@ -3706,7 +3706,7 @@ EOF
 # #16.2 except git_dir is set according to .git file
 
 test_expect_success '#24: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 24 24/sub &&
 	cd 24 &&
 	git init &&
@@ -3754,7 +3754,7 @@ EOF
 # #17.2 except git_dir is set according to .git file
 
 test_expect_success '#25: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 25 25/sub &&
 	cd 25 &&
 	git init &&
@@ -3804,7 +3804,7 @@ EOF
 # #18 except git_dir is set according to .git file
 
 test_expect_success '#26: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 26 26/sub &&
 	cd 26 &&
 	git init &&
@@ -3872,7 +3872,7 @@ EOF
 # #19 except git_dir is set according to .git file
 
 test_expect_success '#27: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 27 27/sub 27/sub/sub 27.wt 27.wt/sub 27/wt 27/wt/sub &&
 	cd 27 &&
 	git init &&
@@ -4140,7 +4140,7 @@ EOF
 # core.worktree is ignored -> #24
 
 test_expect_success '#28: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 28 28/sub &&
 	cd 28 &&
 	git init &&
@@ -4189,7 +4189,7 @@ EOF
 # GIT_WORK_TREE/core.worktree are ignored -> #28
 
 test_expect_success '#29: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 29 29/sub &&
 	cd 29 &&
 	git init &&
@@ -4239,7 +4239,7 @@ EOF
 # core.worktree and core.bare conflict, won't fly.
 
 test_expect_success '#30: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 30 &&
 	cd 30 &&
 	git init &&
@@ -4278,7 +4278,7 @@ test_expect_success '#30: at root' '
 # #23 except git_dir is set according to .git file
 
 test_expect_success '#31: setup' '
-	unset GIT_DIR GIT_WORK_TREE &&
+	sane_unset GIT_DIR GIT_WORK_TREE &&
 	mkdir 31 31/sub 31/sub/sub 31.wt 31.wt/sub 31/wt 31/wt/sub &&
 	cd 31 &&
 	git init &&
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
index dc2e04a..1921ca3 100755
--- a/t/t3301-notes.sh
+++ b/t/t3301-notes.sh
@@ -1067,7 +1067,7 @@ test_expect_success 'git notes copy diagnoses too many or too few parameters' '
 
 test_expect_success 'git notes get-ref (no overrides)' '
 	git config --unset core.notesRef &&
-	unset GIT_NOTES_REF &&
+	sane_unset GIT_NOTES_REF &&
 	test "$(git notes get-ref)" = "refs/notes/commits"
 '
 
-- 
1.7.3.1

^ permalink raw reply related

* [PATCH 1/3] trace.c: ensure NULL is not passed to printf
From: Brandon Casey @ 2011-01-06  0:30 UTC (permalink / raw)
  To: gitster; +Cc: git, Brandon Casey

From: Brandon Casey <drafnel@gmail.com>

GNU printf, and many others, will print the string "(null)" if a NULL
pointer is passed as the argument to a "%s" format specifier.  Some
implementations (like on Solaris) do not detect a NULL pointer and will
produce a segfault in this case.

So, fix this by ensuring that the pointers passed to printf are never
NULL: assign the string "(null)" to any of these variables that are NULL.

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
---
 trace.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/trace.c b/trace.c
index 02279b8..35d388d 100644
--- a/trace.c
+++ b/trace.c
@@ -154,6 +154,7 @@ static const char *quote_crnl(const char *path)
 /* FIXME: move prefix to startup_info struct and get rid of this arg */
 void trace_repo_setup(const char *prefix)
 {
+	const char *git_work_tree;
 	char cwd[PATH_MAX];
 	char *trace = getenv("GIT_TRACE");
 
@@ -164,8 +165,14 @@ void trace_repo_setup(const char *prefix)
 	if (!getcwd(cwd, PATH_MAX))
 		die("Unable to get current working directory");
 
+	if (!(git_work_tree = get_git_work_tree()))
+		git_work_tree = "(null)";
+
+	if (!prefix)
+		prefix = "(null)";
+
 	trace_printf("setup: git_dir: %s\n", quote_crnl(get_git_dir()));
-	trace_printf("setup: worktree: %s\n", quote_crnl(get_git_work_tree()));
+	trace_printf("setup: worktree: %s\n", quote_crnl(git_work_tree));
 	trace_printf("setup: cwd: %s\n", quote_crnl(cwd));
 	trace_printf("setup: prefix: %s\n", quote_crnl(prefix));
 }
-- 
1.7.3.1

^ permalink raw reply related

* Re: patch for git-p4
From: Andrew Garber @ 2011-01-06  0:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Erik Faye-Lund, git
In-Reply-To: <7vfwt7c56w.fsf@alter.siamese.dyndns.org>

I'll fix up the patch and resubmit it properly this weekend. Thanks!

^ permalink raw reply

* [ANNOUNCE] Git 1.7.3.5
From: Junio C Hamano @ 2011-01-06  0:52 UTC (permalink / raw)
  To: git

The latest maintenance release Git 1.7.3.5 is available at the
usual places:

  http://www.kernel.org/pub/software/scm/git/

  git-1.7.3.5.tar.{gz,bz2}			(source tarball)
  git-htmldocs-1.7.3.5.tar.{gz,bz2}		(preformatted docs)
  git-manpages-1.7.3.5.tar.{gz,bz2}		(preformatted docs)

The RPM binary packages for a few architectures are found in:

  RPMS/$arch/git-*-1.7.3.5-1.fc11.$arch.rpm	(RPM)

Just a handful of small fixes here and there, nothing spectacular, except
perhaps for a fix to the "am --abort" safety issue Linus noticed the other
day.

----------------------------------------------------------------

Changes since v1.7.3.4 are as follows:

Brandon Casey (1):
      test-lib.sh/test_decode_color(): use octal not hex in awk script

Jakub Narebski (1):
      gitweb: Include links to feeds in HTML header only for '200 OK' response

Jeff King (1):
      ident: die on bogus date format

Jiang Xin (1):
      Fix typo in git-gc document.

Jonathan Nieder (2):
      t0050: fix printf format strings for portability
      gitweb: skip logo in atom feed when there is none

Junio C Hamano (5):
      commit: die before asking to edit the log message
      am --abort: keep unrelated commits since the last failure and warn
      rebase --skip: correctly wrap-up when skipping the last patch
      Prepare for 1.7.3.5
      Git 1.7.3.5

Kevin Ballard (1):
      status: Quote paths with spaces in short format

Kirill Smelkov (2):
      t/t8006: Demonstrate blame is broken when cachetextconv is on
      fill_textconv(): Don't get/put cache if sha1 is not valid

Mark Lodato (1):
      fsck docs: remove outdated and useless diagnostic

Michael J Gruber (2):
      git-difftool.txt: correct the description of $BASE and describe $MERGED
      difftool: provide basename to external tools

Ramsay Allan Jones (1):
      t3419-*.sh: Fix arithmetic expansion syntax error

René Scharfe (1):
      close file on error in read_mmfile()

Robin H. Johnson (2):
      Fix false positives in t3404 due to SHELL=/bin/false
      t9001: Fix test prerequisites

Thomas Rast (1):
      userdiff: fix typo in ruby and python word regexes

Vasyl' Vavrychuk (1):
      trace.c: mark file-local function static

^ permalink raw reply

* [ANNOUNCE] Git 1.7.4-rc1
From: Junio C Hamano @ 2011-01-06  0:54 UTC (permalink / raw)
  To: git

A release candidate Git 1.7.4-rc1 is available at the usual places
for testing:

  http://www.kernel.org/pub/software/scm/git/

  git-1.7.4.rc1.tar.{gz,bz2}			(source tarball)
  git-htmldocs-1.7.4.rc1.tar.{gz,bz2}		(preformatted docs)
  git-manpages-1.7.4.rc1.tar.{gz,bz2}		(preformatted docs)

The RPM binary packages for a few architectures are found in:

  testing/git-*-1.7.4.rc1-1.fc13.$arch.rpm	(RPM)


Git v1.7.4 Release Notes (draft)
================================

Updates since v1.7.3
--------------------

 * The documentation Makefile now assumes by default asciidoc 8 and
   docbook-xsl >= 1.73. If you have older versions, you can set
   ASCIIDOC7 and ASCIIDOC_ROFF, respectively.

 * The option parsers of various commands that create new branches (or
   rename existing ones to a new name) were too loose and users were
   allowed to give a branch a name that begins with a dash by creative
   abuse of their command line options, which only led to burning
   themselves.  The name of a branch cannot begin with a dash now.

 * System-wide fallback default attributes can be stored in
   /etc/gitattributes; core.attributesfile configuration variable can
   be used to customize the path to this file.

 * The thread structure generated by "git send-email" has changed
   slightly.  Setting the cover letter of the latest series as a reply
   to the cover letter of the previous series with --in-reply-to used
   to make the new cover letter and all the patches replies to the
   cover letter of the previous series; this has been changed to make
   the patches in the new series replies to the new cover letter.

 * Bash completion script in contrib/ has been adjusted to be usable with
   Bash 4 (options with '=value' didn't complete).  It has also been made
   usable with zsh.

 * Different pagers can be chosen depending on which subcommand is
   being run under the pager, using "pager.<subcommand>" variable.

 * The hardcoded tab-width of 8 used in whitespace breakage checks is now
   configurable via the attributes mechanism.

 * Support of case insensitive filesystems (i.e. "core.ignorecase") has
   been improved.  For example, the gitignore mechanism didn't pay attention
   to the case insensitivity.

 * The <tree>:<path> syntax to name a blob in a tree, and :<path>
   syntax to name a blob in the index (e.g. "master:Makefile",
   ":hello.c") have been extended.  You can start <path> with "./" to
   implicitly have the (sub)directory you are in prefixed to the
   lookup.  Similarly, ":../Makefile" from a subdirectory would mean
   "the Makefile of the parent directory in the index".

 * "git blame" learned --show-email option to display the e-mail
   addresses instead of the names of authors.

 * "git commit" learned --fixup and --squash options to help a later
   invocation of interactive rebase.

 * Command line options to "git cvsimport" whose names are in capital
   letters (-A, -M, -R and -S) can now be specified as the default in
   the .git/config file by their longer names (cvsimport.authorsFile,
   cvsimport.mergeRegex, cvsimport.trackRevisions, cvsimport.ignorePaths).

 * "git daemon" can be built in MinGW environment.

 * "git daemon" can take more than one --listen option to listen to
   multiple addresses.

 * "git describe --exact-match" was optimized not to read commit
   objects unnecessarily.

 * "git diff" and "git grep" learned what functions and subroutines
   in Fortran look like.

 * "git fetch" learned "--recurse-submodules" option.

 * "git mergetool" tells vim/gvim to show three-way diff by default
   (use vimdiff2/gvimdiff2 as the tool name for old behaviour).

 * "git log -G<pattern>" limits the output to commits whose change has
   added or deleted lines that match the given pattern.

 * "git read-tree" with no argument as a way to empty the index is
   deprecated; we might want to remove it in the future.  Users can
   use the new --empty option to be more explicit instead.

 * "git repack -f" does not spend cycles to recompress objects in the
   non-delta representation anymore (use -F if you really mean it
   e.g. after you changed the core.compression variable setting).

 * "git merge --log" used to limit the resulting merge log to 20
   entries; this is now customizable by giving e.g. "--log=47".

 * "git merge" may work better when all files were moved out of a
   directory in one branch while a new file is created in place of that
   directory in the other branch.

 * "git rebase --autosquash" can use SHA-1 object names to name which
   commit to fix up (e.g. "fixup! e83c5163").

 * The default "recursive" merge strategy learned --rename-threshold
   option to influence the rename detection, similar to the -M option
   of "git diff".  From "git merge" frontend, "-X<strategy option>"
   interface, e.g. "git merge -Xrename-threshold=50% ...", can be used
   to trigger this.

 * The "recursive" strategy also learned to ignore various whitespace
   changes; the most notable is -Xignore-space-at-eol.

 * "git send-email" learned "--to-cmd", similar to "--cc-cmd", to read
   the recipient list from the output of a command.

 * "git send-email" learned to read and use "To:" from its input files.

 * You can extend "git shell", which is often used as the login shell on
   boxes that allow git-only login over ssh, with a custom set of
   commands.

 * The current branch name in "git status" output can be colored differently
   from the generic header color by setting "color.status.branch" variable.

 * "git submodule sync" updates metainformation for all submodules,
   not just the ones that have been checked out.

 * gitweb can use a custom 'highlight' command, set in its configuration
   file.

 * Other gitweb updates.


Also contains various documentation updates.


Fixes since v1.7.3
------------------

All of the fixes in v1.7.3.X maintenance series are included in this
release, unless otherwise noted.

 * "git log --author=me --author=her" did not find commits written by
   me or by her; instead it looked for commits written by me and by
   her, which is impossible.

 * "git push --progress" shows progress indicators now.

 * "git repack" places its temporary packs under $GIT_OBJECT_DIRECTORY/pack
   instead of $GIT_OBJECT_DIRECTORY/ to avoid cross directory renames.

 * "git submodule update --recursive --other-flags" passes flags down
   to its subinvocations.

^ permalink raw reply

* Re: Resumable clone/Gittorrent (again)
From: Nguyen Thai Ngoc Duy @ 2011-01-06  1:32 UTC (permalink / raw)
  To: Maaartin; +Cc: git
In-Reply-To: <loom.20110105T222915-261@post.gmane.org>

On Thu, Jan 6, 2011 at 6:28 AM, Maaartin <grajcar1@seznam.cz> wrote:
> Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:
>
>> I've been analyzing bittorrent protocol and come up with this. The
>> last idea about a similar thing [1], gittorrent, was given by Nicolas.
>> This keeps close to that idea (i.e. the transfer protocol must be built
>> around git objects, not file chunks) with a slight difference.
>>
>> The idea is to transfer a chain of objects (trees or blobs), including
>> the base object and the delta chain. Objects are chained according to
>> the worktree layout, e.g. all objects of path/to/any/blob will form a
>> chain, from a commit tip down to the root commits. Chains can have
>> gaps, and don't need to start from commit tip. The transfer is
>> resumable because if a delta chain is corrupt at some point, we can
>> just request another chain from where it stops. Base object is
>> obviously resumable.
>
> I may be talking nonsense, please bear with me.
>
> I'm not sure if it works well, since chains defined this way change over time.
> I may request commits A and B while declaring to possess commits C and D. One
> server may be ahead of A, so should it send me more data or repack the chain so
> that the non-requested versions get excluded? At the same time the server may
> be missing B and possess only some ancestors of it. Should it send me only a
> part of the chain or should I rather ask a different server?

I'll keep it simple. A chain is defined by one commit head. Such a
chain can't change over time. But you can ask for just part of the
chain; rev-list syntax can be used here. For example, if you already
have commits C and D and there are 10 deltas in the chain (linear history
for simplicity here), requesting "give me A~10 ^C ^D" should give the
required commits.

> Moreover, in case a directory gets renamed, the content may get transferred
> needlessly. This is probably no big problem.

Yes, the chain constraint can backfire in these cases. We can mix
standard upload-pack/fetch-pack with this if the server can recognize
these cases, by cutting commit history into chunks. The directory-rename
chunks can then be fetched with git-fetch.

> I haven't read the whole other thread yet, but what about going the other way
> round? Use a single commit as a chain, create deltas assuming that all
> ancestors are already available. The packs may arrive out of order, so the
> decompression may have to wait. The number of commits may be one order of
> magnitude larger than the number of paths (there are currently 2254 paths
> and 24235 commits in git.git), so grouping consecutive commits into one larger
> pack may be useful.

The number of commits can increase fast. I'd rather have a
small/stable number over time. And commits depend on other commits so
you can't verify a commit until you have got all of its parents. That
applies to files too, but a file chain does not interfere with other
file chains.

> The advantage is that the packs stay stable over time, you may create them
> using the most aggressive and time-consuming settings and store them forever.
> You could create packs for single commits, packs for non-overlapping
> consecutive pairs of them, for non-overlapping pairs of pairs, etc. I mean with
> commits numbered 0, 1, 2, ... create packs [0,1], [2,3], ..., [0,3], [4,7],
> etc. The reason for this is obviously to allow reading groups of commits from
> different servers so that they fit together (similar to Buddy memory
> allocation). Of course, there are things like branches bringing chaos in this
> simple scheme, but I'm sure this can be solved somehow.
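The aligned ranges in the quoted paragraph behave like buddy blocks: any commit range can be covered by a handful of them. A minimal sketch, assuming commits are simply numbered 0, 1, 2, ... along a linear history (branches are ignored, as the paragraph itself admits); the helper name is hypothetical.

```sh
# Hypothetical helper: cover the commit-number range [lo, hi] with the
# largest aligned power-of-two blocks ([0,1], [2,3], [0,3], [4,7], ...),
# i.e. the pre-built packs a client could fetch, possibly each from a
# different server. Assumes a linear history numbered from 0.
buddy_ranges () {
	lo=$1 hi=$2 out=
	while [ "$lo" -le "$hi" ]; do
		size=1
		# Double the block while lo stays aligned to it and it still fits.
		while [ $((lo % (size * 2))) -eq 0 ] &&
		      [ $((lo + size * 2 - 1)) -le "$hi" ]; do
			size=$((size * 2))
		done
		out="${out}[$lo,$((lo + size - 1))] "
		lo=$((lo + size))
	done
	echo "${out% }"
}
```

For example, `buddy_ranges 3 12` prints `[3,3] [4,7] [8,11] [12,12]`: four packs that could come from four different servers.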

Pack encoding can change. And packs can contain objects you don't want
to share (i.e. hidden from public view).

> Another problem is the client requesting commits A and B while declaring to
> possess commits C and D. When both C and D are ancestors of either A or B, you
> can ignore it (as you assume this while packing, anyway). The other case is
> less probable, unless e.g. C is the master and A is a developing branch.
> Currently, I've no idea how to optimize this or whether this could be
> important.

As I said, we can request just part of a chain (from A+B to C+D).
git-fetch should be used if the repo is quite up to date, though. It's
just more efficient.
-- 
Duy

^ permalink raw reply

* Re: Resumable clone/Gittorrent (again)
From: Nguyen Thai Ngoc Duy @ 2011-01-06  1:47 UTC (permalink / raw)
  To: Luke Kenneth Casson Leighton; +Cc: Thomas Rast, Git Mailing List, Nicolas Pitre
In-Reply-To: <AANLkTikn+89iGbkt90Bv1Hndiimf4brcCNOo0HBX-oPy@mail.gmail.com>

On Thu, Jan 6, 2011 at 1:07 AM, Luke Kenneth Casson Leighton
<luke.leighton@gmail.com> wrote:
>  the plan is to turn that variation in the git pack-objects responses,
> across multiple peers, into an *advantage* not a liability.  how?
> like this:
>
>  * a client requiring objects from commit abcd0123 up to commit
> efga3456 sends out a DHT broadcast query to all and sundry who have
> commits abcd0123 and everything in between up to efga3456.
>
>  * those clients that can be bothered to respond, do so [refinements below]
>
>  * the requestor selects a few of them, and asks them to create git
> pack-objects.  this takes time, but that's ok.  once created, the size
> of the git pack-object is sent as part of the acknowledgement.
>
>  * the requestor, on receipt of all the sizes, selects the *smallest*
> one to begin the p2p (.torrent) from (by asking the remote client to
> create a .torrent specifically for that purpose, with the filename
> abcd0123-ebga3456).

That defeats the purpose of distributing. You are putting pressure on
certain peers.

>  now, an immediately obvious refinement of this is that those .torrent
> (pack-objects) "stick around", in a cache (with a hard limit defined
> on the cache size of course).  and so, when the client that requires a
> pack-object makes the request, of course, those remote clients that
> *already* have that cached pack-object for that specific commit-range
> should be given first priority, to avoid other clients from having to
> make massive amounts of git pack-objects.

Caches have their limits too. Suppose I half-fetch a pack, then stop
and go wild for a month. When I restart the fetch the next month, the
pack may no longer be in the cache. A new pack may or may not be
identical to the old pack.

Also if you go with packs, you are tied to the peer that generates
that pack. Two different peers can, in theory, generate different
packs (in encoding) for the same input.

Another thing with packs (ok, not exactly with packs) is how you
verify that you have got what you asked for. BitTorrent can verify
every piece a peer receives because the SHA-1 sums of those pieces are
recorded in the .torrent file. We have SHA-1 all over the place, but if
you don't have the base objects to undeltify, you can't use those SHA-1s
to verify.
Verification is an important step before you advertise to other peers
"I have these".
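To make the verification point concrete: a git object name is the SHA-1 of a short header plus the object's complete, undeltified content, so a delta cannot be checked against its name until its base is available. A minimal sketch for blobs, assuming a `sha1sum` binary (GNU coreutils) is installed; the helper name is hypothetical, and `git hash-object` is the real tool for this job.

```sh
# Hypothetical helper: compute the object name git would give the file
# "$1" as a blob, i.e. SHA-1 over "blob <size>", a NUL byte, then the
# full content. Assumes sha1sum (GNU coreutils) is on PATH.
blob_id () {
	size=$(wc -c <"$1" | tr -d ' ')
	{ printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

A file containing just "hello" and a newline hashes to ce013625030ba8dba906f756967f9e9ca394464a, the same name `echo hello | git hash-object --stdin` reports. Without every byte of that content, there is nothing to compare the name against.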

> so, can you see that a) this is a far cry from the "simplistic
> transfer of blobs and trees" b) it's *not* going to overload peoples'
> systems by splattering (eek!) millions of md5 sums across the internet
> as bittorrent files c) it _does_ fit neatly into the bittorrent
> protocol d) it combines the best of git with the best of p2p
> distributed networking principles...

How can you advertise what you have to another peer?
-- 
Duy

^ permalink raw reply

* Re: Resumable clone/Gittorrent (again) - stable packs?
From: Zenaan Harkness @ 2011-01-06  2:29 UTC (permalink / raw)
  To: git

Bittorrent requires some stability around torrent files.

Can packs be generated deterministically?
If not by two separate repos, what about by one particular repo?

For Linus' linux-2.6.git, that repo is considered 'canonical' by many.

Pack-torrents could be ~1MiB, ~10MiB, ~100MiB, ~1GiB, or whatever size
is configured in a particular repo, that repo being the canonical
location for pack-torrents for everyone who considers it canonical.

Perhaps a heuristic/ algorithm: once ten 10MiB (sequentially
generated) pack-torrents are floating around,
they could be simply concatenated to create a 100MiB pack-torrent,
with a deterministic name and SHA etc,
so that all those 10MiB pack-torrent files that torrent clients have,
can be re-used and locally combined into the 100MiB torrent as needed,
on demand.

Same for 100MiB -> 1GiB pack-torrents.
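The roll-up heuristic amounts to counting in base ten. A minimal sketch, assuming exactly ten pack-torrents of one tier make one of the next (treating 1GiB as ten 100MiB, as the message does). It models only the bookkeeping, not how the pack data would actually be combined, and the helper name is hypothetical.

```sh
# Hypothetical helper: given how many 10MiB pack-torrents have been
# generated in sequence, report how many 1GiB and 100MiB pack-torrents
# they can be rolled up into, and how many 10MiB ones remain on their own.
tier_counts () {
	n=$1
	gib=$((n / 100))
	mib100=$(( (n % 100) / 10 ))
	mib10=$((n % 10))
	echo "1GiB:$gib 100MiB:$mib100 10MiB:$mib10"
}
```

For example, `tier_counts 237` prints `1GiB:2 100MiB:3 10MiB:7`.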

Individual extra commits:
While "small" number of additional commits go into a repo, clients
fall back to git-fetch, _after .

If Linus' linux-2.6.git (the currently configured "canonical" repo) goes
offline, simply configure a new remote canonical repo.

Branches:
Other "branches" repos of linux-2.6.git could create their own
consistent 50MiB (or as configured) pack-torrents which are
commits-only-missing-from-linux-2.6 pack-torrents (ie, those missing
from that repo's "canonical" upstream).

This would require clients have a recursive torrent locator (I start
at linux-net.git, which requires linux-2.6.git, so I go get those
packs as well as the linux-net.git packs).

Perhaps have a system-wide or user-wide git repo/torrent config, or ask
the user running "git clone linux-net.git": "Do you have an existing
git.vger.kernel.org/linux-2.6.git archive?".

Zen

^ permalink raw reply

* Re: Resumable clone/Gittorrent (again)
From: Maaartin-1 @ 2011-01-06  3:34 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: git
In-Reply-To: <AANLkTi=_R53fm5Er0CdtZCFvDpE-Dqt8tMHAubcjOUBb@mail.gmail.com>

On 11-01-06 02:32, Nguyen Thai Ngoc Duy wrote:
> On Thu, Jan 6, 2011 at 6:28 AM, Maaartin <grajcar1@seznam.cz> wrote:
>> Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:

>> I haven't read the whole other thread yet, but what about going the other way
>> round? Use a single commit as a chain, create deltas assuming that all
>> ancestors are already available. The packs may arrive out of order, so the
>> decompression may have to wait. The number of commits may be one order of
>> magnitude larger than the number of paths (there are currently 2254 paths
>> and 24235 commits in git.git), so grouping consecutive commits into one larger
>> pack may be useful.
> 
> The number of commits can increase fast. I'd rather have a
> small/stable number over time.

In theory, I could create many commits per second. I could create many
unique paths per second, too. But I don't think that really happens. I
don't have any repository larger than git.git at hand, and I don't want
to download one just to see how many commits, paths, and objects it
contains, but I'd suppose it's less than one million commits, which
should be manageable, especially when commits get grouped together as I
described below.

> And commits depend on other commits so
> you can't verify a commit until you have got all of its parents. That
> applies to files too, but a file chain does not interfere with other
> file chains.

That's true, but verification is done locally on the client; it
consumes no network traffic and no server resources, so I consider it
cheap. Verifying the whole git.git repository (36 MB) takes me less
than half a minute on a single core. This is no problem, even if it has
to wait until the download finishes. I'm sure the OP of [1] would be
happy if he could wait for this.

>> The advantage is that the packs stays stable over time, you may create them
>> using the most aggressive and time-consuming settings and store them forever.
>> You could create packs for single commits, packs for non-overlapping
>> consecutive pairs of them, for non-overlapping pairs of pairs, etc. I mean with
>> commits numbered 0, 1, 2, ... create packs [0,1], [2,3], ..., [0,3], [4,7],
>> etc. The reason for this is obviously to allow reading groups of commits from
>> different servers so that they fit together (similar to Buddy memory
>> allocation). Of course, there are things like branches bringing chaos in this
>> simple scheme, but I'm sure this can be solved somehow.
> 
> Pack encoding can change.

I see I didn't explain it clearly enough (or I am missing something
completely). I know why the packs normally used by git can't be used for
this purpose. Let me retry: let's assume there's a commit chain
A-B-C-D-E-F-..., the client already has commit B and requests commit F.
It may send requests to up to 4 servers, asking for C, D, E, and F,
respectively. The server asked for E _creates_ a pack containing
all the information needed to create E given _all of_ A, B, C, D. As
the base for any blob/whatever in E it may choose any blob contained in
any of these commits. Of course, it may also choose a blob already
packed in this pack. It may not choose any other blob, so any client
having all ancestors of E can use the pack. Different servers and/or
program versions may create different packs for E, but all of them are
_interchangeable_. Because of this, it makes sense to _store_ them for
future reuse.
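To make the rule above concrete, here is a small Python model of it. It assumes a purely linear history and reduces each commit to the set of blob ids it references; `is_self_contained` and the sample data are illustrative only and do not correspond to anything in git's pack code. A pack for commit E is accepted only if every delta base is in an ancestor of E or earlier in the same pack, which is what would make packs from different servers interchangeable.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Toy model (an assumption): a commit is reduced to the set of blob ids it uses.
Commit = Set[str]
# A pack entry is (object id, delta base id), with None meaning a full object.
PackEntry = Tuple[str, Optional[str]]


def is_self_contained(chain: Sequence[Commit], index: int,
                      pack: List[PackEntry]) -> bool:
    """Check the proposed rule for a pack that rebuilds chain[index].

    Delta bases may come only from ancestors (chain[:index]) or from entries
    earlier in the same pack, so any client holding all ancestors can apply
    the pack, whichever server produced it.
    """
    ancestor_objects: Set[str] = set().union(*chain[:index])
    packed: Set[str] = set()
    for obj, base in pack:
        if base is not None and base not in ancestor_objects | packed:
            return False  # the client is not guaranteed to have this base
        packed.add(obj)
    # Afterwards every object of the target commit must be available.
    return chain[index] <= ancestor_objects | packed


# Linear history A..E, with one blob per path ("a", "b") in each commit.
A, B, C, D, E = {"a1"}, {"a1", "b1"}, {"a2", "b1"}, {"a2", "b2"}, {"a3", "b2"}
chain = [A, B, C, D, E]
print(is_self_contained(chain, 4, [("a3", "a2")]))  # True: delta against D's blob
print(is_self_contained(chain, 4, [("a3", "zz")]))  # False: unknown base
```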

Compared to the way git packing normally works, this is a restriction,
but I don't think it leads to significantly worse compression. You guys
working on git can confirm or disprove it.

> And packs can contain objects you don't want
> to share (i.e. hidden from public view).

This pack would contain only commit E. I also described pairing, intended
for greater efficiency. In this case a server creates a pack allowing the
client, e.g., to create commits E and F given all their ancestors (while
another server creates a pack for C and D). This way the number of packs
needed may be a fraction of the total number of commits requested.
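The pairing scheme ([0,1], [2,3], [0,3], [4,7], ...) resembles buddy allocation: every group is a power-of-two run of commits aligned on a multiple of its own size. A Python sketch of splitting a requested range into such runs, assuming commits on a linear history are numbered from 0 (branches are ignored, as in the proposal); `aligned_blocks` is a hypothetical helper name.

```python
from typing import List, Tuple


def aligned_blocks(lo: int, hi: int) -> List[Tuple[int, int]]:
    """Split commits lo..hi-1 into aligned power-of-two runs (start, end).

    A run of 2**k commits always starts at a multiple of 2**k, as in
    [0,1], [2,3], [0,3], [4,7], so packs stored independently on different
    servers line up with each other (buddy-allocation style).
    """
    blocks: List[Tuple[int, int]] = []
    while lo < hi:
        # Largest run aligned at lo: its lowest set bit (any size if lo == 0).
        size = (lo & -lo) if lo else 1 << (hi - lo).bit_length()
        while size > hi - lo:  # shrink until it fits in the request
            size //= 2
        blocks.append((lo, lo + size - 1))
        lo += size
    return blocks


# Commits numbered A=0, B=1, ...; the client has A and B and wants up to F=5.
print(aligned_blocks(2, 6))  # [(2, 3), (4, 5)]: one pack for C+D, one for E+F
```

With A=0, ..., F=5 this reproduces the example above: one server supplies the C+D pack and another the E+F pack. Because the split depends only on commit numbers, every block it produces belongs to a fixed set of aligned ranges, so servers could precompute and store exactly those packs.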

>> Another problem is the client requesting commits A and B while declaring to
>> possess commits C and D. When both C and D are ancestors of either A or B, you
>> can ignore it (as you assume this while packing, anyway). The other case is
>> less probable, unless e.g. C is the master and A is a development branch.
>> Currently, I have no idea how to optimize this or whether it could be
>> important.
> 
> As I said, we can request just part of a chain (from A+B to C+D).
> git-fetch should be used if the repo is quite up to date, though. It's
> just more efficient.
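The ancestor test behind "when both C and D are ancestors of either A or B, you can ignore it" could look like the Python sketch below. The graph is a hypothetical parent map held by the server; `unrelated_haves` returns only the "have" commits that are not reachable from any wanted commit, i.e. the less common case that would still need special handling.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Hypothetical commit graph on the server: commit id -> parent ids.
Graph = Dict[str, List[str]]


def reachable(graph: Graph, tips: Iterable[str]) -> Set[str]:
    """All commits reachable from tips, tips included (breadth-first)."""
    seen: Set[str] = set()
    queue = deque(tips)
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        queue.extend(graph.get(commit, []))
    return seen


def unrelated_haves(graph: Graph, wants: List[str], haves: List[str]) -> List[str]:
    """'Have' commits that are not ancestors of any wanted commit.

    Haves that are ancestors are already assumed by ancestor-based packs and
    can be ignored; only the returned ones need extra handling.
    """
    ancestors = reachable(graph, wants)
    return [h for h in haves if h not in ancestors]


# master: m1-m2-m3, topic branched at m1: t1-t2.
graph = {"m1": [], "m2": ["m1"], "m3": ["m2"], "t1": ["m1"], "t2": ["t1"]}
print(unrelated_haves(graph, wants=["t2"], haves=["m1", "m3"]))  # ['m3']
```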

[1] http://article.gmane.org/gmane.comp.version-control.git/164564

* git mergetool broken when rerere active
From: Martin von Zweigbergk @ 2011-01-06  3:39 UTC (permalink / raw)
  To: git
  Cc: Magnus Baeck, Avery Pennarun, Jay Soffian, David Aguilar,
	Junio C Hamano

Hi,

When rerere is enabled, git mergetool uses 'git rerere status' to find
out which files to run the merge tool on. This was introduced in
bb0a484 (mergetool: Skip autoresolved paths, 2010-08-17). Before that,
'git ls-files -u' was used, whether or not rerere was active.

This change caused two problems:

 (1) Before this change, it used to be the case that all conflicts
     would be resolved and added to the index after running 'git
     mergetool' without arguments, i.e. on all files. After the
     change, conflicts of type 'deleted by them' or 'deleted by us'
     would be ignored, since they are not listed by 'git rerere
     status'. Previously, git mergetool would ask whether to pick the
     modified file or to delete the file.

 (2) When running mergetool again after resolving some (or all)
     conflicts, so that some of the files have already been added to
     the index, mergetool will now print something like

     file1: file does not need merging
     Continue merging other unresolved paths (y/n) ?

     Before the change, any files that were already added to the index
     would just be skipped, without mergetool asking the user whether
     to continue.

I would like to have both the original properties in (1) and (2) back,
i.e. being ready for commit once 'git mergetool' has been successfully
completed, and having it ignore any files that have already been added
to the index.

I was reading the original thread [1], but I didn't quite understand
why just enabling rerere.autoupdate would not solve the problem. Maybe
it was just that the goal was a solution that works even with
rerere.autoupdate disabled? Can we fix it in some way by combining the
output of 'git rerere status' and 'git ls-files -u'?
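One way to combine the two commands, sketched in Python rather than in mergetool's shell: take the candidate paths from 'git ls-files -u', always keep delete/modify conflicts, and keep content conflicts only while 'git rerere status' still lists them. This is a hypothetical selection rule, not existing mergetool behaviour; it assumes unquoted paths and that rerere drops a path from its status once it has autoresolved it.

```python
from collections import defaultdict
from typing import Dict, List, Set


def unmerged_stages(ls_files_u: str) -> Dict[str, Set[int]]:
    """Map each unmerged path to its index stages.

    Expects 'git ls-files -u' lines: "<mode> <object> <stage>", a TAB, then
    the path. Quoted paths (core.quotePath) are not handled.
    """
    stages: Dict[str, Set[int]] = defaultdict(set)
    for line in ls_files_u.splitlines():
        if not line:
            continue
        info, path = line.split("\t", 1)
        _mode, _object, stage = info.split()
        stages[path].add(int(stage))
    return dict(stages)


def paths_to_merge(ls_files_u: str, rerere_status: str) -> List[str]:
    """Hypothetical rule for choosing the paths mergetool should visit.

    - Only paths still unmerged in the index are candidates, so files the
      user has already added are skipped silently.
    - Delete/modify conflicts (stage 2 or 3 missing) are always kept, since
      'git rerere status' does not list them.
    - Other conflicts are kept only while rerere still lists them, i.e. rerere
      has not autoresolved them.
    """
    still_conflicted = {p for p in rerere_status.splitlines() if p}
    selected = []
    for path, stages in sorted(unmerged_stages(ls_files_u).items()):
        deleted_on_one_side = not {2, 3} <= stages
        if deleted_on_one_side or path in still_conflicted:
            selected.append(path)
    return selected
```

Paths already added to the index never reach the loop, which would address (2), and 'deleted by us/them' paths no longer depend on rerere, which would address (1).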


Regards,
Martin

[1] http://thread.gmane.org/gmane.comp.version-control.git/153420

* Re: "git svn fetch" on a branch is broken after "git svn reset"
From: Albert Dvornik @ 2011-01-06  5:00 UTC (permalink / raw)
  To: git
In-Reply-To: <AANLkTikhaPP0bHEEeFf_2RgK_bdE-i+gaCKopfQjqgHP@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1988 bytes --]

Here is a reproducer using a very simple public SVN repo.

[I spent a bit of effort trying to figure out what's going on, but
git-svn code, er, has a learning curve. =)  After a reset, fetch_all()
correctly backs up $base when fetching on the trunk, but not on the
branch, because $gs->rev_map_max returns a correctly reset value in
one case but not the other.  This may be because $fetch ALWAYS seems
to refer to the trunk; is that supposed to be the case?]

--bert Dvornik

On Wed, Jan 5, 2011 at 1:22 AM, Albert Dvornik <dvornik+git@gmail.com> wrote:
> The documentation for git svn claims that this should work:
>
>    git svn reset -r2 -p
>    git svn fetch
>
> But when I tried it (using an SVN tree that has recent commits only in
> a branch, not the trunk), it didn't work correctly.  "fetch" grabbed
> just the latest version from SVN, and not all revs from <revnum> to
> the head!  Note that it matters that this is in an SVN branch-- if I
> repeat the test using revs in the trunk, everything works as expected.
>
> Specifically, what I did was this:
>
>    git co -b testing refs/remotes/test-branch
>    git svn fetch
>
>    git svn reset -r 850
>    # does correctly rewind to rev 850, undoing commits in test-branch
>
>    git svn fetch
>    # oops, only fetches the *head* revision (rev 856) from SVN!
>    # In refs/remotes/test-branch, SVN rev 850 is now followed by 856!
>
> I then tried this again, but between reset and fetching I manually
> edited .git/svn/.metadata and moved back the *-maxRev versions to 850;
> after doing this, the fetch does the right thing.  I tried examining
> the logic in git-svn.perl to figure out why this happens and why it
> would be affecting a branch but not the trunk, but I didn't get very
> far.
>
> I can reproduce the problem on Linux (git version 1.7.2.1) and Windows
> (Git for Windows version 1.7.3.2.msysgit.0.4.ga4f3f or Cygwin git
> 1.7.2.3).
>
> Thoughts?
>
> --bert
>

[-- Attachment #2: svn-reset-test.sh --]
[-- Type: application/x-sh, Size: 2517 bytes --]
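Until the underlying bug is understood, the manual workaround described in the quoted message (moving the *-maxRev values in .git/svn/.metadata back to the reset revision) can be scripted. A minimal Python sketch; it assumes the file uses git-config syntax with keys ending in "-maxRev" (for example "branches-maxRev"), and it only lowers values, never raises them. `rewind_max_revs` is a hypothetical helper, not part of git-svn.

```python
import re

# Matches lines such as "branches-maxRev = 856" in .git/svn/.metadata
# (assumed git-config layout; only the "-maxRev" key suffix is relied on).
MAXREV_LINE = re.compile(r"^([ \t]*[\w.-]*-maxRev[ \t]*=[ \t]*)(\d+)", re.MULTILINE)


def rewind_max_revs(metadata: str, rev: int) -> str:
    """Lower every *-maxRev value that is above rev down to rev."""
    def replace(match: re.Match) -> str:
        return match.group(1) + str(min(int(match.group(2)), rev))

    return MAXREV_LINE.sub(replace, metadata)


sample = '[svn-remote "svn"]\n\tbranches-maxRev = 856\n\ttags-maxRev = 849\n'
print(rewind_max_revs(sample, 850))
```

To apply it, read the file, pass its text through `rewind_max_revs(text, 850)`, and write it back after keeping a copy. If git-svn does read this file through git config, `git config -f .git/svn/.metadata <key> <value>` should also work for a single key.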

* Re: Resumable clone/Gittorrent (again)
From: Nguyen Thai Ngoc Duy @ 2011-01-06  6:36 UTC (permalink / raw)
  To: Maaartin-1; +Cc: git
In-Reply-To: <4D25385B.3010103@seznam.cz>

On Thu, Jan 6, 2011 at 10:34 AM, Maaartin-1 <grajcar1@seznam.cz> wrote:
> In theory, I could create many commits per second. I could create many
> unique paths per second, too. But I don't think that really happens. I
> don't know the numbers for any repository larger than git.git, and I
> don't want to download one just to see how many commits, paths, and
> objects it contains, but I'd suppose it has fewer than one million
> commits, which should be manageable, especially when commits get
> grouped together as I described below.

In practice, commits are created every day in an active project. Paths,
on the other hand, are added less often (except perhaps in webkit).

I've got some numbers:

 - wine.git has 72k commits, 260k trees, 200k blobs, 12k paths
 - git.git has 24k commits, 39k trees, 24k blobs, 2.7k paths
 - linux-2.6.git has 160k commits, 760k trees, 442k blobs, 46k paths

Large repos are more interesting because small ones can be cloned with
git-clone.

Listing all those commits in linux-2.6.git takes 160k * 20 bytes = 3.2M
(I suppose compression is useless because SHA-1 values are random). A
compressed listing of those 46k paths takes 200k.
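The estimate above in numbers: a raw SHA-1 is 20 bytes, so an uncompressed list of commit ids grows linearly with history. A short Python check using the commit counts quoted above; reading the "200k" path listing as 200,000 bytes is an assumption.

```python
SHA1_BYTES = 20  # raw SHA-1; the hex form would take 40 bytes

# Commit counts quoted above, rounded to thousands.
COMMITS = {"wine.git": 72_000, "git.git": 24_000, "linux-2.6.git": 160_000}
PATH_LISTING_BYTES = 200_000  # "200k" for linux-2.6.git's 46k paths, read as bytes


def commit_list_bytes(commits: int, id_bytes: int = SHA1_BYTES) -> int:
    """Size of an uncompressed list of raw commit ids."""
    return commits * id_bytes


for repo, count in COMMITS.items():
    print(f"{repo}: {commit_list_bytes(count) / 1e6:.2f} MB")
# wine.git: 1.44 MB
# git.git: 0.48 MB
# linux-2.6.git: 3.20 MB

print(commit_list_bytes(COMMITS["linux-2.6.git"]) / PATH_LISTING_BYTES)  # 16.0
```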

>> And commits depend on other commits, so
>> you can't verify a commit until you have got all of its parents. That
>> also applies to files, but one file chain does not interfere with other
>> file chains.
>
> That's true, but verification is done locally on the client; it
> consumes no network traffic and no server resources, so I consider it
> cheap. Verifying the whole git.git repository (36 MB) takes me less
> than half a minute on a single core. This is no problem, even if it has
> to wait until the download finishes. I'm sure the OP of [1] would be
> happy if he could wait for this.

The point is that you need to fetch a commit's parents before you can
verify it. Fetching a whole commit is more expensive than fetching a
single file. So while you can fetch a few commit bases and request packs
built on those bases in parallel, the cost of fetching the initial
commit bases will be high.
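The ordering constraint described here can be modelled as deferred verification: a commit that arrives before its parents is parked until they have been verified. A Python sketch with a hypothetical parent map; "verification" stands for whatever check runs once a commit's ancestors are present, and the sketch only tracks the ordering.

```python
from typing import Dict, Iterable, List

# Hypothetical commit graph: commit id -> parent ids.
Parents = Dict[str, List[str]]


def verification_order(parents: Parents, arrivals: Iterable[str],
                       have: Iterable[str] = ()) -> List[str]:
    """Return commits in the order in which they become verifiable.

    A commit is verified once all of its parents are verified or were
    already present locally (`have`); until then it waits on one missing
    parent and is retried when that parent is verified.
    """
    verified = set(have)
    waiting: Dict[str, List[str]] = {}  # parent id -> commits parked on it
    order: List[str] = []

    for commit in arrivals:
        pending = [commit]
        while pending:
            current = pending.pop()
            missing = [p for p in parents[current] if p not in verified]
            if missing:
                waiting.setdefault(missing[0], []).append(current)
                continue
            verified.add(current)
            order.append(current)
            pending.extend(waiting.pop(current, []))  # wake up parked children
    return order


# Chain A-B-C-D-E-F; the client has A and B, and packs arrive out of order.
chain = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["E"]}
print(verification_order(chain, ["F", "E", "C", "D"], have=["A", "B"]))
# ['C', 'D', 'E', 'F']
```

Fetching C, D, E and F in parallel still leaves E and F waiting on C and D, which is the cost of the initial commit bases mentioned above.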

> I see I didn't explain it clearly enough (or I am missing something
> completely). I know why the packs normally used by git can't be used for
> this purpose. Let me retry: let's assume there's a commit chain
> A-B-C-D-E-F-..., the client already has commit B and requests commit F.
> It may send requests to up to 4 servers, asking for C, D, E, and F,
> respectively. The server asked for E _creates_ a pack containing
> all the information needed to create E given _all of_ A, B, C, D. As
> the base for any blob/whatever in E it may choose any blob contained in
> any of these commits. Of course, it may also choose a blob already
> packed in this pack. It may not choose any other blob, so any client
> having all ancestors of E can use the pack. Different servers and/or
> program versions may create different packs for E, but all of them are
> _interchangeable_. Because of this, it makes sense to _store_ them for
> future reuse.

They are interchangeable as a whole, yes. But you cannot fetch half
the pack from server A and the other half from server B. You can try
to recover as many deltas as possible in a broken pack, but how do you
request a server to send the rest of the pack to you?
-- 
Duy
