Git development
* Re: [PATCH] git log [diff-tree options]...
From: Johannes Schindelin @ 2006-04-09 22:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.63.0604092312340.29136@wbgn013.biozentrum.uni-wuerzburg.de>

Hi,

On Sun, 9 Apr 2006, Johannes Schindelin wrote:

> On Sun, 9 Apr 2006, Linus Torvalds wrote:
> 
> >  - keep it - for historical reasons - as an internal shorthand, and just
> >    turn it into "git log --diff -cc"
> 
> It is "git log --cc", right?

Like this?

---

 git.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

751e205a9ffd3a55094a0c0f657735023776cf74
diff --git a/git.c b/git.c
index 8776088..3a94afa 100644
--- a/git.c
+++ b/git.c
@@ -385,6 +385,13 @@ static int cmd_log(int argc, const char 
 	return 0;
 }
 
+static int cmd_whatchanged(int argc, const char **argv, char **envp)
+{
+	memmove(argv + 2, argv + 1, sizeof(*argv) * (argc - 1));
+	argv[1] = "--cc";
+	return cmd_log(argc + 1, argv, envp);
+}
+
 static void handle_internal_command(int argc, const char **argv, char **envp)
 {
 	const char *cmd = argv[0];
@@ -395,6 +402,7 @@ static void handle_internal_command(int 
 		{ "version", cmd_version },
 		{ "help", cmd_help },
 		{ "log", cmd_log },
+		{ "whatchanged", cmd_whatchanged },
 	};
 	int i;
 
-- 
1.2.0.g61002-dirty
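
The delicate part of the patch above is the argv shuffle: memmove() and
memcpy() take a length in bytes, so an element count has to be scaled by
the pointer size. A minimal sketch of the same "prepend an option" idea
follows. insert_arg() is a hypothetical helper, not a function in git.c,
and it copies into a freshly allocated vector on the assumption that the
caller's argv has no spare slot to shift into.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from git.c): return a newly allocated,
 * NULL-terminated copy of argv with `opt` inserted at index 1.
 * Returns NULL on allocation failure or if argc < 1.
 * The caller frees the returned vector (not the strings).
 */
static const char **insert_arg(int argc, const char **argv, const char *opt)
{
	const char **out;

	if (argc < 1)
		return NULL;
	/* argc existing entries + 1 new option + 1 terminating NULL */
	out = malloc(sizeof(*out) * (argc + 2));
	if (!out)
		return NULL;
	out[0] = argv[0];
	out[1] = opt;
	/* memcpy counts bytes, hence the sizeof() factor */
	memcpy(out + 2, argv + 1, sizeof(*out) * (argc - 1));
	out[argc + 1] = NULL;
	return out;
}
```

With argv = { "whatchanged", "-p", "HEAD" }, insert_arg(3, argv, "--cc")
yields { "whatchanged", "--cc", "-p", "HEAD", NULL }, which could then be
handed to a cmd_log()-style function together with argc + 1.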


* Re: [ANNOUNCE] git-svnconvert: YASI (Yet Another SVN importer)
From: Johannes Schindelin @ 2006-04-09 21:30 UTC (permalink / raw)
  To: git; +Cc: Jakub Narebski, git
In-Reply-To: <20060409211505.GA30567@nospam.com>

Hi,

On Sun, 9 Apr 2006, Rutger Nijlunsing wrote:

> On Sun, Apr 09, 2006 at 06:43:53PM +0200, Jakub Narebski wrote:
> > 
> > Instead adding dependence on Ruby, eh?
> 
> Take some, lose some ;)
> 
> Seriously, though, a dependency on a mainstream language like
> Python/Perl/Ruby/.. isn't a problem since a package is available for
> all distributions. However, packages for mainstream languages are
> quite often out-of-date or are not supported at all. Seeing a program
> being dependent on a non-packaged module is enough for a truckload of
> people to not even try it.

I have _never_ seen a setup where Ruby was installed by default. Perl 
always, Python often.

Furthermore, my feeling is that we are in the beginning phase of migration 
from scripting languages (which are good for prototyping) towards plain C. 
So adding yet another scripting language dependency is a little backwards.

Ciao,
Dscho


* Re: [ANNOUNCE] git-svnconvert: YASI (Yet Another SVN importer)
From: Rutger Nijlunsing @ 2006-04-09 21:15 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e1bdjq$qf6$1@sea.gmane.org>

On Sun, Apr 09, 2006 at 06:43:53PM +0200, Jakub Narebski wrote:
> Rutger Nijlunsing wrote:
> 
> > Since I didn't succeed in importing a (private) SVN repo into git, I
> > wrote a new converter to handle more cases like:
> 
> Both git-svn[*1*] and git-svnimport failed? Have you tried Tailor tool:
>   http://www.darcs.net/DarcsWiki/Tailor

git-svn and tailor can only track one branch (or trunk). As the
git-svn page states, it is for contributing to such a branch /
trunk. git-svnconvert is for incrementally converting a whole
repository, including its branches, which (IMHO) are important to
keep and convert.

git-svnimport does handle multiple branches, but could not cope with
proxy + repo authentication, the weird repo layout I've had to cope
with (branches not only in /branches, several trunks) and some
revisions which contain nonsensical actions.

> >   - use command line svn instead of some perl library to have fewer
> >     dependencies and to support proxy + repo authentication.
> >     Might even work on MacOSX ;)
> 
> Instead adding dependence on Ruby, eh?

Take some, lose some ;)

Seriously, though, a dependency on a mainstream language like
Python/Perl/Ruby/.. isn't a problem since a package is available for
all distributions. However, packages for mainstream languages are
quite often out-of-date or are not supported at all. Seeing a program
being dependent on a non-packaged module is enough for a truckload of
people to not even try it.

-- 
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------


* Re: [PATCH] git log [diff-tree options]...
From: Johannes Schindelin @ 2006-04-09 21:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0604091158310.9504@g5.osdl.org>

Hi,

On Sun, 9 Apr 2006, Linus Torvalds wrote:

>  - keep it - for historical reasons - as an internal shorthand, and just
>    turn it into "git log --diff -cc"

It is "git log --cc", right? And BTW, I was burnt by the difference between
"git-log" and "git log" this time. "git-log" does not understand "--cc".
Could we kill "git-log", please?

Ciao,
Dscho


* Re: [PATCH] git log [diff-tree options]...
From: Linus Torvalds @ 2006-04-09 19:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v3bgmbm8b.fsf@assigned-by-dhcp.cox.net>



On Sun, 9 Apr 2006, Junio C Hamano wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > Well, on the other hand, the new "git log --diff" should get the revision 
> > counting right even if it's _not_ done by the caller.
> 
> Not if the user uses --diff-filter and/or --pickaxe, and after
> we start omitting the log message part when no diff is output.

Fair enough. At that point the counting does have to be done in the 
caller, I guess.

> A merge commit touching a path but not actually changing the contents of 
> the path from parents might be a significant event.

Yes. The fact that git-whatchanged happens to ignore such things right now 
is just an implementation detail, not a "good thing". The new git log seems
to be better in pretty much all respects.

The bigger conceptual difference is actually that once you do revision 
pruning based on the pathname limiter, we prune away parents of merges 
that seem "uninteresting". So before, when you had the same change come 
through two different branches, "git-whatchanged" would actually show it 
twice, while the new "git log" approach would tend to show it just once 
(because it would pick one of the histories and ignore the other).

I think that's fine (and probably even preferable), but it's another 
example of something where we might want to have an option to not 
simplify the merge history. It's likely that nobody will ever care, but 
who knows..

			Linus


* Re: [PATCH] git log [diff-tree options]...
From: Junio C Hamano @ 2006-04-09 19:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604091158310.9504@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> Well, on the other hand, the new "git log --diff" should get the revision 
> counting right even if it's _not_ done by the caller.

Not if the user uses --diff-filter and/or --pickaxe, and after
we start omitting the log message part when no diff is output.

> So I'd suggest:
>  - drop git-whatchanged entirely
>  - keep it - for historical reasons - as an internal shorthand, and just
>    turn it into "git log --diff -cc"
>
> and everybody will be happy (yeah, it will show a few merge commits 
> without diffs, because the diffs end up being uninteresting, but that's 
> _fine_, even if it's not 100% the same thing git-whatchanged used to do)

I tend to agree.  A merge commit touching a path but not
actually changing the contents of the path from parents might be
a significant event.


* Re: git ident
From: sean @ 2006-04-09 19:02 UTC (permalink / raw)
  To: Jeremy English; +Cc: git
In-Reply-To: <44395711.7000902@jeremyenglish.org>

On Sun, 09 Apr 2006 13:48:49 -0500
Jeremy English <jhe@jeremyenglish.org> wrote:

> I keep a local project in a git archive.  After the last upgrade I get an
> ident error when trying to commit.  It works after I set the environment 
> variables.  What I don't like is that the error comes up after I have 
> typed in my comment, then my comment is lost, that's frustrating.  The 
> other thing is I don't care if the commit is coming from a valid person, 
> why require this?

Believe it is required to reduce the number of commits made in the 
kernel project with incorrect attribution.   To remove the need to
set environment variables, use the repo-config command to set some
defaults:

$ git repo-config user.email "you@email.com"
$ git repo-config user.name "your name"

HTH,
Sean


* Re: [PATCH] git log [diff-tree options]...
From: Linus Torvalds @ 2006-04-09 19:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vbqvabn8f.fsf@assigned-by-dhcp.cox.net>



On Sun, 9 Apr 2006, Junio C Hamano wrote:
> 
> Also, I might have to rethink --max-count logic -- I think it is
> reasonable to skip the commit when doing limiting by diff like
> "whatchanged" does, but one thing I find suboptimal with the
> current whatchanged is that it does not count commits that are
> actually shown (it counts what the upstream rev-list feeds
> diff-tree).  With the "git log --diff" based whatchanged, it
> becomes trivial to skip the revs->max_count limiting and have
> the caller count the commits it actually does something
> user-visible to, instead of counting the commits it pulled out
> of get_revision().

Well, on the other hand, the new "git log --diff" should get the revision 
counting right even if it's _not_ done by the caller.

Really, the only reason "git-whatchanged" exists at all is that it was
originally impossible, and later on too expensive, to do the
commit-limiting by pathname. With the new incremental path-limiting, the
reason for "git-whatchanged" simply goes away.

So I'd suggest:
 - drop git-whatchanged entirely
 - keep it - for historical reasons - as an internal shorthand, and just
   turn it into "git log --diff -cc"

and everybody will be happy (yeah, it will show a few merge commits 
without diffs, because the diffs end up being uninteresting, but that's 
_fine_, even if it's not 100% the same thing git-whatchanged used to do)

			Linus


* Re: git ident
From: Junio C Hamano @ 2006-04-09 19:01 UTC (permalink / raw)
  To: Jeremy English; +Cc: git
In-Reply-To: <44395711.7000902@jeremyenglish.org>

Jeremy English <jhe@jeremyenglish.org> writes:

> What I don't like is that the error comes up
> after I have typed in my comment, then my comment is lost, that's
> frustrating.

Sympathizable, but presumably a new user needs to be burned only
once (set them either in $HOME/.profile or .git/config if you
want to use separate identity per project).

> ....  The other thing is I don't care if the commit is coming
> from a valid person, why require this?

Because public projects like the kernel want to prevent
otherwise good commits from a misconfigured repository from
propagating into them.  We could have a separate per-repository
configuration to say "broken identity is not a problem for this
project", but if the user has to set that in the configuration,
she would be better off setting her identity there.

And making it the default not to require the identity is going
backwards. Our primary focus is to support public, multi-person,
distributed development projects.


* git ident
From: Jeremy English @ 2006-04-09 18:48 UTC (permalink / raw)
  To: git

I keep a local project in a git archive.  After the last upgrade I get an
ident error when trying to commit.  It works after I set the environment 
variables.  What I don't like is that the error comes up after I have 
typed in my comment, then my comment is lost, that's frustrating.  The 
other thing is I don't care if the commit is coming from a valid person, 
why require this?


* Re: [PATCH] git log [diff-tree options]...
From: Junio C Hamano @ 2006-04-09 18:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604090950590.9504@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> I wonder... This all looks fine, but there are actually two different 
> "diffs" that can be shown for "git log --diff <pathlimiter>":
>
>  - the whole diff for a commit
>  - the path-limited diff

Yes, exactly the same way sometimes you would want just pickaxe,
sometimes you would want it with --pickaxe-all.

Also, I might have to rethink --max-count logic -- I think it is
reasonable to skip the commit when doing limiting by diff like
"whatchanged" does, but one thing I find suboptimal with the
current whatchanged is that it does not count commits that are
actually shown (it counts what the upstream rev-list feeds
diff-tree).  With the "git log --diff" based whatchanged, it
becomes trivial to skip the revs->max_count limiting and have
the caller count the commits it actually does something
user-visible to, instead of counting the commits it pulled out
of get_revision().

BTW I think I could remove the log message generation part of
"git log" and have it use the one in log-tree (which I will
probably rewrite not to format the message into the static
this_header[] buffer when it is not shown).

Another thing that might be useful is to teach diff-* to do the
diffstat part internally.  After that is in place we could
introduce --pretty=patch to have "git log" produce format-patch
compatible output.
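
The --max-count point above can be illustrated outside git. A minimal
sketch, assuming a hypothetical struct toy_commit in place of git's real
commit and rev_info structures: the caller applies the limit to commits
that actually produce output, rather than to every commit pulled from the
revision walk.

```c
#include <stddef.h>

/* Hypothetical, simplified commit record; not git's struct commit. */
struct toy_commit {
	int has_visible_diff;	/* nonzero if the filtered diff is non-empty */
};

/*
 * Walk up to `n` commits and "show" at most `max_count` of them,
 * counting only commits that produce output. Stores the number shown
 * in *shown_out and returns the number of commits consumed from the
 * input, which may exceed the number shown.
 */
static size_t show_limited(const struct toy_commit *list, size_t n,
			   size_t max_count, size_t *shown_out)
{
	size_t i, shown = 0;

	for (i = 0; i < n && shown < max_count; i++) {
		if (!list[i].has_visible_diff)
			continue;	/* skipped commits do not use up the limit */
		shown++;	/* a real caller would print the commit here */
	}
	*shown_out = shown;
	return i;
}
```

If the limit were instead applied to the commits pulled from the walk, a
limit of 2 over the sequence skip, show, skip, skip, show would display
only one commit; counting in the caller displays two.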


* Re: [RFH] Exploration of an alternative diff_delta() algorithm
From: Nicolas Pitre @ 2006-04-09 18:08 UTC (permalink / raw)
  To: Peter Eriksen; +Cc: git
In-Reply-To: <20060409175316.GA21455@erlang.gbar.dtu.dk>

On Sun, 9 Apr 2006, Peter Eriksen wrote:

> On Sun, Apr 09, 2006 at 01:40:14PM -0400, Nicolas Pitre wrote:
> ...
> > It also has lots of compilation warnings.
> 
> Hm, I don't get any warnings.  Would you mind pasting them, so I
> can see what it's about?

gcc -o diff-delta.o -c -g -O2 -Wall -DSHA1_HEADER='<openssl/sha.h>'  diff-delta.c
diff-delta.c: In function 'diff_delta':
diff-delta.c:123: warning: pointer targets in passing argument 1 of 'init_hash' differ in signedness
diff-delta.c:124: warning: pointer targets in passing argument 1 of 'init_hash' differ in signedness
diff-delta.c:170: warning: pointer targets in passing argument 1 of 'hash' differ in signedness
diff-delta.c:171: warning: pointer targets in passing argument 1 of 'hash' differ in signedness
diff-delta.c:203: warning: pointer targets in passing argument 1 of 'init_hash' differ in signedness
diff-delta.c:204: warning: pointer targets in passing argument 1 of 'init_hash' differ in signedness

Also you should avoid declaring new variables after code in the same 
scope, like you do with version_offset for example.  This is a feature 
that many C compilers don't support.


Nicolas
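
Both remarks can be shown in a few lines. A minimal sketch, in which
toy_hash() and hash_text() are hypothetical stand-ins rather than
functions from diff-delta.c: the explicit cast addresses the "differ in
signedness" warnings, and all declarations sit at the top of their block,
as C89 requires.

```c
#include <stddef.h>

/* Hypothetical stand-in for a hash() that expects unsigned bytes. */
static unsigned int toy_hash(const unsigned char *data, size_t len)
{
	unsigned int h = 0;	/* C89: declarations come first ... */
	size_t i;

	for (i = 0; i < len; i++)	/* ... statements afterwards */
		h = h * 31 + data[i];
	return h;
}

/* Caller holding plain `char` data, as text buffers usually do. */
static unsigned int hash_text(const char *text, size_t len)
{
	/* The explicit cast makes the signedness conversion intentional. */
	return toy_hash((const unsigned char *)text, len);
}
```

Changing the buffer's own type to unsigned char would avoid the cast
altogether; which is preferable depends on how the rest of the code uses
the buffer.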


* Re: [RFH] Exploration of an alternative diff_delta() algorithm
From: Peter Eriksen @ 2006-04-09 17:53 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0604091333140.2215@localhost.localdomain>

On Sun, Apr 09, 2006 at 01:40:14PM -0400, Nicolas Pitre wrote:
...
> Out of curiosity I just tried your diff-delta version with test-delta 
> and it produced a segmentation fault on the first attempt.

Yes, I get that too with your README example.

> It also has lots of compilation warnings.

Hm, I don't get any warnings.  Would you mind pasting them, so I
can see what it's about?

At least now I have one segmentation fault to work on.  
Thanks.

Peter


* Re: [RFH] Exploration of an alternative diff_delta() algorithm
From: Nicolas Pitre @ 2006-04-09 17:45 UTC (permalink / raw)
  To: Peter Eriksen; +Cc: git
In-Reply-To: <20060409173409.GB23908@erlang.gbar.dtu.dk>

On Sun, 9 Apr 2006, Peter Eriksen wrote:

> On Sun, Apr 09, 2006 at 01:14:31PM -0400, Nicolas Pitre wrote:
> ...
> > Nothing outside diff-delta.c and patch-delta.c is aware of the delta 
> > data format.  So if your version is meant to be a transparent 
> > replacement then it should pass all tests.  If it doesn't then it is 
> > broken.
> > 
> > To help you play around you could try the test-delta utility (make 
> > test-delta to build it).
> > 
> > So:
> > 
> > 	test-delta -d file1 file2 delta_file
> > 	test-delta -p file1 delta_file file3
> > 	cmp file2 file3
> 
> My tests of this kind don't show any errors.

Try this with the README file from the git source tree:

	sed s/git/GIT/g < ./README > /tmp/README.mod
	test-delta -d ./README /tmp/README.mod /tmp/README.delta
	[BOOM!]

> Though, if file2 is empty, test-delta writes: "file2: Invalid 
> argument".

We never delta against or towards empty files.


Nicolas


* Re: [RFH] Exploration of an alternative diff_delta() algorithm
From: Nicolas Pitre @ 2006-04-09 17:40 UTC (permalink / raw)
  To: Peter Eriksen; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604091307460.2215@localhost.localdomain>

On Sun, 9 Apr 2006, Nicolas Pitre wrote:

> On Sun, 9 Apr 2006, Peter Eriksen wrote:
> 
> > Greetings Gitlings,
> > 
> > I've been trying to implement an alternative algorithm
> > for diff_delta().  I'm getting close to something that
> > works, but now I'm stuck!
> 
> Nothing outside diff-delta.c and patch-delta.c is aware of the delta 
> data format.  So if your version is meant to be a transparent 
> replacement then it should pass all tests.  If it doesn't then it is 
> broken.
> 
> To help you play around you could try the test-delta utility (make 
> test-delta to build it).
> 
> So:
> 
> 	test-delta -d file1 file2 delta_file
> 	test-delta -p file1 delta_file file3
> 	cmp file2 file3
> 
> You should always have file3 identical to file2.

Out of curiosity I just tried your diff-delta version with test-delta 
and it produced a segmentation fault on the first attempt.

It also has lots of compilation warnings.


Nicolas


* Re: [RFH] Exploration of an alternative diff_delta() algorithm
From: Peter Eriksen @ 2006-04-09 17:34 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0604091307460.2215@localhost.localdomain>

On Sun, Apr 09, 2006 at 01:14:31PM -0400, Nicolas Pitre wrote:
...
> Nothing outside diff-delta.c and patch-delta.c is aware of the delta 
> data format.  So if your version is meant to be a transparent 
> replacement then it should pass all tests.  If it doesn't then it is 
> broken.
> 
> To help you play around you could try the test-delta utility (make 
> test-delta to build it).
> 
> So:
> 
> 	test-delta -d file1 file2 delta_file
> 	test-delta -p file1 delta_file file3
> 	cmp file2 file3

My tests of this kind don't show any errors.  Though, if file2 is
empty, test-delta writes: "file2: Invalid argument".

Peter


* Re: [RFH] Exploration of an alternative diff_delta() algorithm
From: Nicolas Pitre @ 2006-04-09 17:14 UTC (permalink / raw)
  To: Peter Eriksen; +Cc: git
In-Reply-To: <20060409143117.GA23908@erlang.gbar.dtu.dk>

On Sun, 9 Apr 2006, Peter Eriksen wrote:

> Greetings Gitlings,
> 
> I've been trying to implement an alternative algorithm
> for diff_delta().  I'm getting close to something that
> works, but now I'm stuck!  I think it has something to
> do with pack-objects.c, but I'm not sure.  Here's the
> first test that fails:
> 
> *** t5500-fetch-pack.sh ***
> * FAIL 1: 1st pull
>         git-fetch-pack -v .. B A > log.txt 2>&1
> * FAIL 2: fsck
>         git-fsck-objects --full > fsck.txt 2>&1
> * FAIL 3: new object count after 1st pull
>         test 33 = 0
> * FAIL 4: minimal count
>         test 33 = 0
> * FAIL 5: repack && prune-packed in client
>         (git-repack && git-prune-packed)2>>log.txt
> *   ok 5: 2nd pull
> *   ok 6: fsck
> * FAIL 7: new object count after 2nd pull
>         test 192 = 198
> * FAIL 8: minimal count
>         test 192 = 198
> * FAIL 9: repack && prune-packed in client
>         (git-repack && git-prune-packed)2>>log.txt
> *   ok 9: 3rd pull
> *   ok 10: fsck
> * FAIL 11: new object count after 3rd pull
>         test 3 = 228
> * FAIL 12: minimal count
>         test 3 = 30
> * failed 8 among 12 test(s)
> 
> I've been looking all around the current diff_delta(), and I
> can't see, what I'm missing.  Any ideas?  The file is meant to
> replace the current diff-delta.c.

Nothing outside diff-delta.c and patch-delta.c is aware of the delta 
data format.  So if your version is meant to be a transparent 
replacement then it should pass all tests.  If it doesn't then it is 
broken.

To help you play around you could try the test-delta utility (make 
test-delta to build it).

So:

	test-delta -d file1 file2 delta_file
	test-delta -p file1 delta_file file3
	cmp file2 file3

You should always have file3 identical to file2.


Nicolas


* Re: [PATCH] git log [diff-tree options]...
From: Linus Torvalds @ 2006-04-09 16:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v7j5zce7x.fsf@assigned-by-dhcp.cox.net>



On Sun, 9 Apr 2006, Junio C Hamano wrote:
>
> And this makes "git log" take common diff-tree options, so
> that it can be used as "git whatchanged".

I wonder... This all looks fine, but there are actually two different 
"diffs" that can be shown for "git log --diff <pathlimiter>":

 - the whole diff for a commit

 - the path-limited diff

and I think we'd likely want to have some way to select the output. 
Probably with the path-limited diff being the default (that's what 
"git-whatchanged" does), but perhaps with "--full-diff" showing the whole 
commit diff (which is what "gitk" does).

		Linus


* Re: [ANNOUNCE] git-svnconvert: YASI (Yet Another SVN importer)
From: Jakub Narebski @ 2006-04-09 16:43 UTC (permalink / raw)
  To: git
In-Reply-To: <20060409163426.GA22103@nospam.com>

Rutger Nijlunsing wrote:

> Since I didn't succeed in importing a (private) SVN repo into git, I
> wrote a new converter to handle more cases like:

Both git-svn[*1*] and git-svnimport failed? Have you tried Tailor tool:
  http://www.darcs.net/DarcsWiki/Tailor

>   - use command line svn instead of some perl library to have fewer
>     dependencies and to support proxy + repo authentication.
>     Might even work on MacOSX ;)

Instead adding dependence on Ruby, eh?


References
----------
[*1*] contrib/git-svn and http://git-svn.yhbt.net/
-- 
Jakub Narebski
Warsaw, Poland


* [ANNOUNCE] git-svnconvert: YASI (Yet Another SVN importer)
From: Rutger Nijlunsing @ 2006-04-09 16:34 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1038 bytes --]

Hi all,

Since I didn't succeed in importing a (private) SVN repo into git, I
wrote a new converter to handle more cases like:

  - use command line svn instead of some perl library to have fewer
    dependencies and to support proxy + repo authentication.
    Might even work on MacOSX ;)
  - automatic merge detection by looking at from which revision a
    revision gets its files
  - visualisation of the branch tree with dotty to check what git-svnconvert
    would produce _before_ importing it.
  - /trunk is moved to /branches/old, /branches/new_branch becomes /trunk
    in next revision (ARGH!)
  - be able to resume after a ^C, continuing where
    it stopped.
  - have configurable repo layout to handle things like:
    - files next to branches in /branches
    - /branches/

Regards,
Rutger.

-- 
Rutger Nijlunsing ---------------------------------- eludias ed dse.nl
never attribute to a conspiracy which can be explained by incompetence
----------------------------------------------------------------------

[-- Attachment #2: git-svnconvert.rb --]
[-- Type: text/plain, Size: 29546 bytes --]

#!/usr/bin/env ruby

# Convert a Subversion repository with all its branches
# into git incrementally.
#
# The main difference with git-svnimport is
# that it handles a badly-broken archive which I wanted to 
# convert which git-svnimport did not handle ;)
#
# But, of course, your mileage may (will!) vary. svn only defines
# snapshot-trees and interpretation of each leaf in the tree is left
# as an exercise for the reader.
#
# Features:
#  - import _all_ branches
#  - uses svn command line, so:
#    - supports HTTP proxy with authentication
#    - supports repository authentification
#    - supports incorrect (self-signed) SSL certificates ;)
#  - can handle multiple branch changes in one revision
#  - supports extra files in /branches next to real branches
#  - gets merge information by looking at which revisions we're based on
#    (so no need to parse commit messages)
#    This (only) works when the merge of two branches results
#    in files from both branches.
#  - handles missing revisions
#  - no extra packages needed, just ruby and subversion.
#  - outputs GraphViz .dot files for visualisation of branches
#  - fairly interrupt-safe. Once in a while it saves its state
#    to be able to continue next time where it left off.
#  - works parallel: one process fetches, while another copies
#    it into git.
# Con:
#  - needs a complete, checked-out SVN repository locally
#    _for_ _each_ _branch_.
#    This takes disk space. However, disk space is cheap ;)
# TODO:
#  - parse XML as XML instead of text. In case the exact XML formatting
#    changes in the future...
#  - add tagging support
#  - handle file properties like execute and ignore
#  - use 'svn switch' to checkout other branches fast
#
# Workings
#  - Get a log of all commits
#  - Create a branches graph from this log
#  - Check out each revision separately and write a commit
#    for each revision.
#
# (c)2006 R. Nijlunsing <git-svnconvert@tux.tmfweb.nl>
# Released under the GNU General Public License, version 2.


$VERBOSE = true			# Let Ruby warn more

require 'set'
require 'time'
require 'find'
require 'fileutils'
require 'optparse'
require 'cgi'			# For unescapeHTML()

#################### Configuration

# Root directories of the branches and trunk(s).
#   - one line per directory
#   - '*' matches one filename component and is matched last
#   - names are case sensitive
#   - whitespaces at start and end are ignored
$branch_dirs = %q{
  /branches/*
  /branches/cc_test/*
  /branches/pre-dev/trunk
  /branches/pre-dev/trunk_old
  /branches/tasks/*
  /trunk
}

# List of paths which might get matched by $branch_dirs,
# but which are not the root of branches.
#
# Argh. People put files into the root of branches,
# which ends up as a branch (so for example /branches/README
# ends up as a branch). List here all files in the roots
# to ignore them.
$not_branch_dirs = %q{
  /branches/README
  /branches/pre-dev
  /branches/cc_test
  /branches/tasks
}

#################### End of configuration

def read_svn_authors(authors_filename)
  users = {}
  begin
    IO.foreach(authors_filename) { |line|
      if line =~ %r{^(\S+?)\s*=\s*(.+?)\s*<(.+)>\s*$}
	user, name, email = $1, $2, $3
	users[user] = [name, email]
      end
    }
  rescue Errno::ENOENT, Errno::EACCES
    die("Could not read #{authors_filename}: #{$!}")
  end
  if $verbose
    puts "Read #{users.size} authors from #{authors_filename}"
  end
  return users
end

def write_svn_authors(users, authors_filename)
  begin
    File.open(authors_filename, "wb") { |io|
      users.keys.sort.each { |user|
	io.puts "#{user} = #{users[user][0]} <#{users[user][1]}>"
      }
    }
    if $verbose
      puts "Wrote #{users.size} authors to #{authors_filename}"
    end
  rescue Errno::EACCES
    die("Could not write #{authors_filename}: #{$!}")
  end
end

# Given a string with on each line a root directory, generate a regular
# expression matching one of those root directories.
def root_dirs_to_regexp(rootdirs, is_prefix)
  Regexp.new(
    "^(?:" +			# Match at start without capturing
    rootdirs.
      split("\n").
      find_all { |dir| dir.strip != "" }.	# Delete empty lines
      collect { |dir|
        # make path absolute; use '/' as path separator;
        ("/" + dir.strip).gsub(%r{[\\/]+}, "/")
      }.sort { |dir1, dir2|
        # Sort on size so '/branches/development/trunk' comes before
        # '/branches/*' and will therefore be matched.
        dir2.size <=> dir1.size
      }.collect { |dir|
        Regexp.escape(dir)	# Escape everything
      }.join("|").gsub("\\*", "[^/]+") + # Unescape '*' back
    ")" +
    # If not a prefix, must match at end
    # If prefix, must match at path delimiter (or at end)
    (is_prefix ? "(?=/|$)" : "$")  
  )
end

# Execute shell command; bail out at error
def safe_system(cmd)
  puts cmd if $verbose
  system(cmd)
  if $? != 0
    puts cmd if !$verbose
    puts "!!! Command returned non-zero exit code: #{$?}"
    puts "!!! Working dir: #{Dir.pwd}"
    exit($?.exitstatus || 1)
  end
end

def safe_popen(cmd, mode = "w+", &callback)
  puts "|" + cmd if $verbose
  res = IO.popen(cmd, mode, &callback)
  if $? != 0
    puts cmd if !$verbose
    puts "!!! Command returned non-zero exit code: #{$?}"
    puts "!!! Working dir: #{Dir.pwd}"
    exit($?.exitstatus || 1)
  end
  return res
end

module Shell
  # Escape string so that the shell parses it back to the string itself.
  # Compare to Regexp.escape .
  def self.escape(string)
    string !~ %r{[ "\\]}i ? 
      string : '"' + string.gsub(%r{(["\\])}i, '\\\\\1') + '"'
  end
end

def svn_common_args(branch = "")
  repo_url = $repo_url
  repo_url += "/" + branch if branch != ""
  res = "--non-interactive"
  res += " --username #{Shell.escape($username)}" if $username
  res += " --password #{Shell.escape($password)}" if $password
  res += " #{Shell.escape(repo_url)}"
  res
end

def svn_get_current_revision
  $stderr.print "Retrieving current HEAD revision... " if $verbose
  # 'svn info' doesn't always work to get most recent revision.
  # So parse output of 'svn log -r HEAD:HEAD'.
  svn_info = `svn log --xml -r HEAD:HEAD #{svn_common_args}`
  svn_info =~ %r{\srevision=\"(\d+)\"}m
  $stderr.puts "r#{$1}" if $verbose
  $1.to_i
end

# Prefix each line with "!!!" and only warn:
#  - once for each unique warning
#  - at most a fixed number of times per caller
$warning_txt = Set.new		# All dumped warnings texts
$warning_callers = {}		# Per caller: number of warnings
def warning(*txt)
  txt = txt.flatten.collect { |t| t.split("\n") }.flatten
  if !$warning_txt.include?(txt)
    $warning_txt << txt
    backtrace = caller[0]
    $warning_callers[backtrace] ||= 0
    times_warned = ($warning_callers[backtrace] += 1)
    if times_warned <= 5
      first = true
      txt.each { |line|
	$stderr.puts((first ? "!!!" : "   ") + " #{line}")
	first = false
      }
      if times_warned == 5
	$stderr.puts(
          "    (more of this type of warnings will be suppressed)")
      end
    end
  end
end

def die(*txt); warning(*txt); exit 1; end

class BranchColor
  @@available_colors = [
    "green", "yellow", "orange", "cyan", "steelblue3",
    "lightblue", "thistle", "red", '".7 .3 1.0"',
    "navy", "violet", "crimson", "azure", "linen", "peru",
    "tan", "darkgreen", "coral"
  ]
  @@branch_color = {}		# String branch -> String color

  def self.color(branch)
    # For a new branch, take a fixed color or a random one if out-of-colors.
    @@branch_color[branch] ||= (
      @@available_colors.shift ||
      ('"%.1f %.1f %.1f"' % [rand + 0.1, rand + 0.1, rand + 0.1])
    )
  end
end

# Map a filename into a branch name part and element name part.
# Branchname may contain '/''s.
# Element name starts with a '/' or is empty.
#
# SVN does not contain branches, but only contains one large tree with
# objects of (potentially different) versions. By only looking at a
# subtree, a branch is emulated. However, we have to know where those
# subtrees are rooted to be able to convert them to branches.
def path_to_branch(filename)
  if filename =~ $branch_dirs
    # We matched a branch-name.
    branch, elem = $&, filename[$&.size..-1]
    return nil if branch =~ $not_branch_dirs
    return [branch[1..-1], elem] # Remove leading '/' from branch name
  else
    # Outside the branches (e.g. a tag). Skip this.
    return nil
  end
end

# Revision of a branch; part of a revision.
# Forms a directed acyclic graph with other BranchRevisions
class BranchRevision
  attr_accessor :must_add_implicit_dep	# Boolean
  attr_accessor :depends_on	# Set of BranchRevision: parents
  attr_accessor :dependers      # Set of BranchRevision: children
  attr_reader :branch		# String: branchname
  attr_accessor :empty		# Bool: true if no reason to keep this rev
  attr_accessor :deleted	# Bool: true if deleted at end
  attr_accessor :commit_sha1	# String. git's SHA1 commit hash.

  # If true, implicitly adds a dependency on the previous
  # version of this branch. This is the default.
  # However, when we detect that this branch consists of totally new files
  # (for example, when copying /trunk to /branches/branch_name) we set
  # this to false.
  # @must_add_implicit_dep  # Boolean

  def initialize(revision, branch)
    @revision = revision	# Revision
    @branch = branch
    @must_add_implicit_dep = true
    @depends_on = Set.new
    @dependers = Set.new
    @empty = true
    @deleted = false
    @commit_sha1 = nil
  end

  # Compare on revision number
  def <=>(other); self.nr <=> other.nr; end

  def to_s; "#{@branch}:#{nr}"; end

  def nr; @revision.nr; end	# Fixnum: revision nr

  # Returns true if this BranchRevision is a root revision since:
  #   - it does not depend on another BranchRevision
  #   - it contains changes _within_ the branch (== not empty)
  def root?; @depends_on.empty? && !@empty; end

  def add_depends_on(branch_rev)
    @depends_on << branch_rev
    branch_rev.dependers << self
  end

  def remove_depends_on(branch_rev)
    @depends_on.delete(branch_rev)
    branch_rev.dependers.delete(self)
  end

  def rev; @revision; end

  def other_branch_depends_on
    @depends_on.find_all { |b| b.branch != @branch }
  end

  # Returns all BranchRevisions we depend on in the same branch
  def same_branch_depends_on
    @depends_on.find_all { |b| b.branch == @branch }
  end

  # Returns the branch revision we depend on that is the fewest
  # revisions back on the same branch. Since we go back, this is
  # the highest-numbered revision among the dependencies.
  def closest_same_branch_depends_on
    same_branch_depends_on.max
  end

  # Returns the branch revision depending on us that is the fewest
  # revisions forward on the same branch.
  def closest_same_branch_depender
    @dependers.find_all { |b| b.branch == @branch }.min
  end

  # A dependency is unneeded when:
  #   - the dependency belongs to the same branch and the same revs
  #     can still be reached after removing this dependency, or
  #   - the dependency belongs to another branch on which a rev in
  #     the same branch above us is already depending
  def remove_unneeded_depends_on
    return if @depends_on.size <= 1 # Optimisation

    # TODO: look farther back.

    # Same-branch dependency remover:
    # Follow the dependency chain along the closest parent
    # till we reach the farthest dep.
    same_branch_deps = same_branch_depends_on()
    closest_same_branch_dep = same_branch_deps.max
    farthest_same_branch_nr = !same_branch_deps.empty? && same_branch_deps.min.nr
    current_dep = closest_same_branch_dep
    same_branch_deps_rec = Set.new
    while current_dep && current_dep.nr >= farthest_same_branch_nr
      same_branch_deps_rec.add(current_dep)
      current_dep = current_dep.closest_same_branch_depends_on
    end
    same_branch_deps.each { |red_rev|
      if red_rev != closest_same_branch_dep &&
	  same_branch_deps_rec.include?(red_rev)
#	puts "r#{nr}: Dep. on same branch r#{red_rev.nr} redundant" if $verbose
	remove_depends_on(red_rev)
      end
    }

    # Via-other-branch redundant dependency remover:
    other_branch_depends_on.each { |other_dep|
      if !((other_dep.dependers & same_branch_deps_rec).empty?)
#	puts "r#{nr}: Dep. on other branch r#{other_dep.nr} redundant" if $verbose
	remove_depends_on(other_dep)
      end
    }
  end

  # Signal the fact that the last action has been added to this revision.
  # Now the dependencies can be calculated.
  def last_action_added
    if @must_add_implicit_dep
      prev_branch_rev = Revision.find_revision_with_branch(nr - 1, @branch)
      add_depends_on(prev_branch_rev) if prev_branch_rev
    end
    @depends_on.dup.each { |br|
      # We might depend on empty branch revisions. Since an empty branch
      # revision will not be checked out, copy its dependencies instead.
      if br.empty
	br.depends_on.each { |parent_depends_on|
	  add_depends_on(parent_depends_on)
	}
	remove_depends_on(br)
      end
    }
    remove_unneeded_depends_on
  end

  # Internal dotty label.
  def label; "_#{@branch}_#{nr}".delete("^0-9a-zA-Z_"); end

  def to_dotty
    return "\t/* r#{nr} is empty */\n" if @empty
    res = "\t#{label}[label=\"#{File.basename(@branch)} " +
      "#{nr}\\n#{rev.author}\",color=#{BranchColor.color(@branch)}];"
    @depends_on.each { |d|
      res += " #{d.label} -> #{label}"
      res += "[style=dashed]" if @branch != d.branch # A merge or branch
      res += ";"
    }
    res + "\n"
  end
end

# One revision in SVN. Contains zero or more BranchRevisions.
class Revision
  attr_reader :nr		# Fixnum
  attr_accessor :author		# String
  attr_accessor :msg		# String
  attr_accessor :time		# String
  attr_accessor :branches	# Hash: String rootdir to BranchRevision

  @@all_revs = []		# Array: all revisions, indexed by nr

  def self.get_all_revs; @@all_revs; end

  def self.set_all_revs(new_all_revs); @@all_revs = new_all_revs; end

  # Returns first revision number we're interested in.
  def self.get_next_log_nr
    @@all_revs[-1] ? @@all_revs[-1].nr + 1 : $start_revision
  end

  def self.[](nr); @@all_revs[nr]; end
  # Iterate sorted over each revision
  def self.each; @@all_revs.each { |rev| yield(rev) if rev }; end

  # Search for most recent BranchRevision which changed given branch.
  def self.find_revision_with_branch(start_nr, branch)
    start_nr.downto($start_revision) { |nr|
      rev = @@all_revs[nr]
      return rev.branches[branch] if rev && rev.branches.has_key?(branch)
    }
    warning(
      "Could not find branch #{branch.inspect} from revision #{start_nr} back"
    ) if start_nr >= $start_revision
    return nil
  end

  def initialize(nr)
    @nr = nr
    @branches = {}
    # Store new revision in global revision array
    @@all_revs[nr] = self
  end

  # Add an action to this revision
  #   'copyto_path' is the destination being changed. In all cases,
  #   this can be a file or directory.
  #   action == :R  : Replace. copyfrom_* always filled in.
  #   action == :M  : Modify
  #   action == :D  : Delete
  #   action == :A  : Create as new or copy from other revision
  #                   If from other revision, copyfrom_* are filled in.
  $branch_paths = Set.new
  def add_action(action, copyto_path, copyfrom_path, copyfrom_rev)
    branch, elem = *path_to_branch(copyto_path)
    return if !branch		# Action is outside a branch

    if $verbose && !$branch_paths.include?(branch)
      puts "r#{nr}: New branch: #{branch}"
      $branch_paths.add(branch)
    end

    branch_rev = (@branches[branch] ||= BranchRevision.new(self, branch))

    from_branch = from_elem = nil
    if copyfrom_path
      from_branch, from_elem = *path_to_branch(copyfrom_path)
      if !from_branch
	# Not a branch. Must be a tag.
	warning(
          "Ignoring dependency of r#{@nr} on a non-branch: " +
          "#{copyfrom_path.inspect}:#{copyfrom_rev}"
        )
      else
	from_branch_rev = 
	  Revision.find_revision_with_branch(copyfrom_rev, from_branch)
	branch_rev.add_depends_on(from_branch_rev) if from_branch_rev
      end
    end

    # We cannot checkout deleted branches, so record the fact that
    # it is deleted at the end of the revision.
    branch_rev.deleted = true if elem == "" && action == :D

    # If something changes the root of the branch (deleted, copied
    # from other branch, ...) we break the implicit dependency chain.
    # However, _modifying_ the root directory (adding files, removing files)
    # does not remove the dependency.
    branch_rev.must_add_implicit_dep = false if elem == "" && action != :M

    # Check whether this action results in a revision. For example,
    # deleting the root of the branch or creating the root dir does not
    # change anything _within_ the branch.
    branch_rev.empty = false if elem != "" # Real files. Not empty.
  end

  # Signal the fact that the last action has been added to this revision.
  # Now the dependencies can be calculated.
  def last_action_added
    @branches.each_value { |b| b.last_action_added }
  end

  def to_dotty
    @branches.values.collect { |b| b.to_dotty }.join("")
  end
end

# Parse output of 'svn log --xml --verbose'
def parse_svn_log_xml(svnlog_filename)
  $stderr.puts "Branch analysis on svn log files..." if $verbose
  last_rev_nr = nil
  rev = nil			# Current Revision
  File.open(svnlog_filename, "rb") { |io|
    while line = io.gets
      case line
      when %r{^   revision="(\d+)"}
	rev_nr = $1.to_i
	$stderr.print "r#{rev_nr}" + ("\010" * 10)
	# Only create a new revision when not already loaded from
	# state file.
	if rev_nr > $start_revision && Revision[rev_nr] == nil
	  rev = Revision.new(rev_nr)
	  if last_rev_nr && (rev_nr != last_rev_nr + 1)
	    # Missing revisions are a sign of not getting the log of the
	    # whole archive, but only one of its branches (or subprojects).
	    # Which could be valid if we're after a subproject, but not
	    # if we want to import the whole repository.
	    warning("Missing revision(s) #{last_rev_nr + 1}-#{rev_nr - 1}")
	  end
	  last_rev_nr = rev_nr
	end
      when %r{^\<path}: copyfrom_path = nil; copyfrom_rev = nil
      when %r{^   copyfrom-path="(.*)"}: copyfrom_path = $1
      when %r{^   copyfrom-rev="(\d+)"}: copyfrom_rev = $1.to_i
      when %r{^   action="([A-Z])"\>(.*)\</path\>}
	rev.add_action($1.to_sym, $2, copyfrom_path, copyfrom_rev) if rev
      when %r{^\</paths\>}
	# End of this revision.
	next if !rev
	rev.last_action_added
	root_branches = rev.branches.values.find_all { |branch| branch.root? }
	root_branches.each { |bi|
	  puts "r#{rev.nr} is root of #{bi.branch}"
	}
	changed_branches = rev.branches.values.find_all { |b| !b.empty }
	if changed_branches.size > 1
	  changed_branches_names = changed_branches.collect { |b| b.branch }
	  warning(
            "More than one branch changed in r#{rev.nr}..?",
            changed_branches_names
          )
	end
      when %r{^\<author\>(.*)\</author\>}
	rev.author = $1 if rev
      when %r{^\<msg\>(.*)}
	msg = $1.rstrip
	while !msg.gsub!("\</msg\>", "")
	  msg += "\n" + io.gets.rstrip
	end
	msg = CGI::unescapeHTML(msg.strip + "\n") # &amp; -> & etc.
	rev.msg = msg if rev
      when %r{^\<date\>(.*)\</date\>}
	rev.time = Time.parse($1) if rev
      end
    end
  }
end

$state_version = "20060409"

# Load state of git-svnconvert. Returns true in case of success.
def load_state
  begin
    state = File.open($state, "rb") { |io| Marshal.load(io) }
    version = state.shift
    if version != $state_version
      warning(
        "Ignoring previous state: has different version (#{version}) " +
        "than this git-svnconvert version (#{$state_version})"
      )
    else
      $repo_url, all_new_revs = *state
      Revision.set_all_revs(all_new_revs)
      return true
    end
  rescue Errno::ENOENT
  end
  return false
end

def save_state
  start_time = Time.now
  File.open($state + ".new", "wb") { |io|
    io.write(Marshal.dump([$state_version, $repo_url, Revision.get_all_revs]))
  }
  # Atomically rename the new state into _the_ state
  File.rename($state + ".new", $state)
  $stderr.puts "Saved new state in %.1fs" % (Time.now - start_time) if $verbose
end

# Export current graph of revisions to a GraphViz dotty file
def export_to_dot(dot_file)
  $stderr.puts "Writing branch graph to \"#{dot_file}\"..."
  File.open(dot_file, "wb") { |io|
    io.puts "strict digraph svn {"
    io.puts "\tnode[shape=box,style=filled];\n\n"
    Revision.each { |rev|
      next if !rev || (rev.nr < $start_revision)
      io.print rev.to_dotty
    }
    io.puts "}"
  }
end

# Runs an svn command, and run git-update-index on each file
# echoed to stdout by svn.
def svn_cmd_with_update_index(svn_cmd)
  safe_popen(svn_cmd, "r") { |svn_io|
    safe_popen("git-update-index --add --remove --stdin", "w") { |git_io|
      git_io.sync = true
      # Have a buffer of one line: svn echoes the file it is
      # working on. When svn echoes the next line, we then know
      # it finished the previous file, and can therefore be added
      # to git now.
      last_file = nil
      while line = svn_io.gets
	$stderr.puts line if $verbose
	if line =~ %r{^[A-Z][A-Z ]{3} (.*)$}
	  git_io.puts last_file if last_file 
	  last_file = $1
	end
      end
      git_io.puts last_file if last_file 
    }
  }
end

# Unknown contents in directory; recreate index
def git_update_index
  # Remove incomplete index
  git_index_file = ENV['GIT_INDEX_FILE']
  [git_index_file, git_index_file + ".lock"].each { |f|
    File.delete(f) if File.exist?(f)
  }
  # Add all files
  to_prune = Set.new([".git", ".svn"]) # Directories not to add
  safe_popen("git-update-index --add --remove --stdin") { |io|
    elems = 0
    Find.find(".") { |elem|
      Find.prune if to_prune.include?(File.basename(elem))
      if elem != "."
	elem = elem[2..-1]  # Remove ./
	io.puts elem
	elems += 1
      end
    }
    puts "Dir contained #{elems} elements" if $verbose
  }
end

######################################## Main

# Default values for options
$verbose = true
$username = nil			# Repository username
$password = nil			# Repository password
$start_revision = 1
dot_filename = nil
svn_authors_file = nil		# Mapping from 
$run_dotty = false

# Parse options
$opts = OptionParser.new
$opts.banner = %q{Converter for a subversion archive into a git archive.

git-svnconvert [options] URL[@REV] [DIR]
  ..to convert complete SVN archive at URL into directory DIR.
  The URL must point to the root directory of the repo, not, for example, to /trunk!
  By default, DIR will be the basename of the URL.
  Directory DIR will be created if it does not exist.
  Otherwise, DIR must already be a git repository.
git-svnconvert [options]
  ..to incrementally convert newly added revisions and add them
    to the git repo in the current directory.

Examples:

git-svnconvert svn://svn.0pointer.net/fusedav .
  ..checks out svn://svn.0pointer.net/fusedav into current dir.
git-svnconvert svn://svn.0pointer.net/fusedav@10
  ..checks out starting at revision 10 into directory fusedav.
git-svnconvert
  ..updates already imported svn archive in current dir.

}
$opts.on("Options (defaults between []'s):")
$opts.on("--verbose", "-v", "Toggle verbose mode [#{$verbose}]") {
  $verbose = !$verbose
}
$opts.on("--help", "-h", "This usage") { puts $opts; exit 1 }
$opts.on("--revision REV", "-s", "Start revision [#{$start_revision}]") {
  |$start_revision|
}
$opts.on("--dot FILENAME", "-d", "Export branch tree as .dot file.") {
  |dot_filename|
}
$opts.on("--dotty", "-D", "Export branch tree and run dotty on it") {
  $run_dotty = true
}
$opts.on("--authors FILENAME", "-A", "Filename of svnauthors file to add") {
  |svn_authors_file|
}
$opts.on("\nOptions passed to 'svn':");
$opts.on("--username NAME", "-u", "Specify a username for SVN repo") {
  |$username|
}
$opts.on("--password PWD", "-p", "Specify a password for SVN repo") {
  |$password|
}

begin
  $opts.parse!(ARGV)
rescue OptionParser::InvalidOption
  die($!.to_s, $opts.to_s)
end

$repo_url = (ARGV.size > 0) ? ARGV.shift.dup : nil
# Parse @REV part of URL@REV if given...
$start_revision = $1.to_i if $repo_url && $repo_url.gsub!(%r{@(\d+)$}, "")
dest_dir = (ARGV.size > 0) ? ARGV.shift : nil
dest_dir = File.basename($repo_url) if $repo_url && !dest_dir
dest_dir ||= Dir.pwd	# No URL and no destdir given: use current dir
dest_dir = File.expand_path(dest_dir)
git_dir = ENV["GIT_DIR"] || "#{dest_dir}/.git"
svnconvert_dir = "#{git_dir}/svnconvert"
# Name of state file with all parsed log files and connection to git archive.
$state = "#{svnconvert_dir}/state"

if !File.exist?($state) && !$repo_url
  puts $opts.to_s
  puts "\n!!! Need a URL to check out or a previously git-svnconvert'ed dir"
  exit 1
end

# Mapping from SVN author name to full name and email
default_svn_authors_file = "#{git_dir}/svn-authors"
users = File.readable?(default_svn_authors_file) ?
  read_svn_authors(default_svn_authors_file) : {}
# If authors file explicitly given, add
if svn_authors_file
  users_to_add = read_svn_authors(svn_authors_file)
  if users_to_add.size > 0
    users.merge!(users_to_add) 
    write_svn_authors(users, default_svn_authors_file)
  end
end

FileUtils.mkdir_p(svnconvert_dir)	# Create destination if not existing
Dir.chdir(dest_dir)

$branch_dirs = root_dirs_to_regexp($branch_dirs, true)
$not_branch_dirs = root_dirs_to_regexp($not_branch_dirs, false)

# Load complete state, or failing that, save an empty state with the new repo URL
load_state || save_state
next_revision_nr = Revision.get_next_log_nr
if next_revision_nr < svn_get_current_revision
  svnlog = "#{svnconvert_dir}/svnlog.xml"
  puts "Retrieving revision log from #{next_revision_nr} and upwards..."
  cmd = "svn log #{svn_common_args} -r #{next_revision_nr}:HEAD --xml --verbose >#{svnlog}"
  safe_system(cmd)
  parse_svn_log_xml(svnlog)
  save_state
else
  puts "No new revision to get log of."
end

# The graph eases debugging, so export it always...
default_dot_filename = "#{svnconvert_dir}/branches.dot"
export_to_dot(dot_filename || default_dot_filename)
if $run_dotty
  safe_system("dotty #{Shell.escape(dot_filename || default_dot_filename)}")
end
exit 0 if dot_filename || $run_dotty

# Directory of all branches checked out:
svn_co_dir = "#{svnconvert_dir}/checkedout" 
ENV['GIT_DIR'] = git_dir

head_existed = File.exist?("#{git_dir}/HEAD")
safe_system("git-init-db")
if !head_existed
  $stderr.puts "Creating HEAD pointing to trunk"
  File.open("#{git_dir}/HEAD", "wb") { |io|
    io.puts "ref: refs/heads/trunk"
  }
end
Revision.each { |rev|
  next if rev.nr < $start_revision

  rev_commit_sha1s = []
  branches = rev.branches.values
  branches.each { |br|
    branch_dir = "#{svn_co_dir}/#{br.branch}"
    if br.commit_sha1
      # Skip if already imported
    elsif br.deleted
      # Branch contains only a 'delete this branch' command.
      $stderr.puts "r#{rev.nr}: #{br.branch} deleting #{branch_dir}"
#      FileUtils.rm_rf(branch_dir)
    elsif br.empty
      $stderr.puts "r#{rev.nr}: #{br.branch} is empty; skipping..."
    else
      # Hide the git index inside .svn dir.
      git_index_file = ENV['GIT_INDEX_FILE'] = "#{branch_dir}/.svn/index.git"
      if !File.directory?(branch_dir)
        # Start of new branch. Make (empty) directory to start new branch in
        FileUtils.mkdir_p(branch_dir)
	FileUtils.mkdir_p(File.dirname(ENV['GIT_INDEX_FILE']))
	Dir.chdir(branch_dir)
        puts "Checking out new branch #{br.branch}:#{rev.nr} in #{Dir.pwd}"
        cmd = "svn checkout -r #{rev.nr} #{svn_common_args(br.branch)} ."
        svn_cmd_with_update_index(cmd)
      else
	# Branch already checked out; update to new revision.
	Dir.chdir(branch_dir)
	svn_cmd = "svn update -r #{rev.nr} ."
	if File.exist?(git_index_file + ".lock")
	  $stderr.puts "Git index file already locked. Removing lock and recreating index."
	  safe_system(svn_cmd)
	  git_update_index
	else
	  puts "Updating branch to new revision: #{br.branch}:#{rev.nr}"
	  svn_cmd_with_update_index(svn_cmd)
	end
      end

      # git index is now up-to-date. Write the tree.
      tree_sha1 = `git-write-tree`.chomp

      # Now write this tree into a commit
      author = rev.author
      email = "unknown"
      author, email = users[author] if users[author]
      $stderr.puts "r#{rev.nr}: Author: #{author} <#{email}>" if $verbose
      ENV['GIT_AUTHOR_NAME'] = ENV['GIT_COMMITTER_NAME'] = author
      ENV['GIT_AUTHOR_EMAIL'] = ENV['GIT_COMMITTER_EMAIL'] = email
      ENV['GIT_AUTHOR_DATE'] = ENV['GIT_COMMITTER_DATE'] =
	rev.time.strftime("+0000 %Y-%m-%d %H:%M:%S") 
      parents = br.depends_on.find_all { |p|
	# Filter out all non-converted parents
	raise if !p.commit_sha1 && p.nr >= $start_revision
	p.commit_sha1 != nil
      }.collect { |p| "-p #{p.commit_sha1}" }.join(" ")
      puts "Committing #{tree_sha1}..."
      cmd = "git-commit-tree #{tree_sha1} #{parents}"
      puts cmd if $verbose
      safe_popen(cmd) { |io|
        io.puts rev.msg.strip
        io.close_write
        br.commit_sha1 = io.read.chomp
        rev_commit_sha1s << br.commit_sha1    
      }
      puts "#{br.branch}:#{rev.nr} has id #{br.commit_sha1}" if $verbose

      Dir.chdir(dest_dir)
      # Tag current branch + revision.
#    safe_system("git-tag -f #{br.branch}/r#{rev.nr} #{br.commit_sha1}")
      # Update HEAD of this branch
      safe_system("git-update-ref refs/heads/#{br.branch} #{br.commit_sha1}")
      puts
    end
  }
  if rev_commit_sha1s.size == 1
    # '-f' added to make this script restartable (state might not get saved)
    safe_system("git-tag -f r#{rev.nr} #{rev_commit_sha1s}")
  end
  save_state if !rev_commit_sha1s.empty?
}
puts "Done."
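
For readers following the branch detection: path_to_branch above splits an
SVN path into a branch part and an element part using $branch_dirs, whose
construction (root_dirs_to_regexp) is not shown in this part of the script.
The following is only a minimal standalone sketch, assuming a conventional
/trunk and /branches/NAME layout. BRANCH_DIRS_RE and split_branch are
hypothetical names for illustration, not part of git-svnconvert.

```ruby
# Hypothetical stand-in for $branch_dirs: matches "/trunk" or "/branches/NAME"
# at the start of a path, but only when followed by "/" or the end of the path.
BRANCH_DIRS_RE = %r{\A/(?:trunk|branches/[^/]+)(?=/|\z)}

# Split an SVN path into [branch, element], or return nil outside any branch.
# The branch loses its leading "/"; the element is "" or starts with "/".
def split_branch(path)
  match = BRANCH_DIRS_RE.match(path)
  return nil unless match
  [match[0][1..-1], match.post_match]
end

p split_branch("/trunk/src/main.c")   # a file inside trunk
p split_branch("/branches/stable")    # the root of a branch: empty element
p split_branch("/tags/v1.0/README")   # outside the assumed branch dirs
```

An empty element part is what add_action treats as "the root of the branch",
which is why the split keeps it distinct from nil.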

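The restartability of the script rests on save_state and load_state: the
state is written to a ".new" file, renamed into place, and ignored on load
when its version tag differs. A minimal sketch of that pattern in isolation,
assuming a POSIX filesystem where rename within one directory is atomic.
save_atomically and load_versioned are hypothetical names, and the URL is a
placeholder.

```ruby
require "tmpdir"

# Write obj to a temporary sibling file, then rename it over the target,
# so a crash mid-write never leaves a truncated state file behind.
def save_atomically(path, obj)
  tmp = path + ".new"
  File.open(tmp, "wb") { |io| io.write(Marshal.dump(obj)) }
  File.rename(tmp, path)
end

# Return the stored payload, or nil if the file is missing or was written
# with a different version tag (the first array element).
def load_versioned(path, expected_version)
  state = File.open(path, "rb") { |io| Marshal.load(io) }
  version = state.shift
  version == expected_version ? state : nil
rescue Errno::ENOENT
  nil
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "state")
  save_atomically(path, ["20060409", "svn://example.invalid/repo", [1, 2, 3]])
  p load_versioned(path, "20060409")   # payload without the version tag
  p load_versioned(path, "19990101")   # version mismatch
end
```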

* [RFH] Exploration of an alternative diff_delta() algorithm
From: Peter Eriksen @ 2006-04-09 14:31 UTC (permalink / raw)
  To: git

Greetings Gitlings,

I've been trying to implement an alternative algorithm
for diff_delta().  I'm getting close to something that
works, but now I'm stuck!  I think it has something to
do with pack-objects.c, but I'm not sure.  Here's the
first test that fails:

*** t5500-fetch-pack.sh ***
* FAIL 1: 1st pull
        git-fetch-pack -v .. B A > log.txt 2>&1
* FAIL 2: fsck
        git-fsck-objects --full > fsck.txt 2>&1
* FAIL 3: new object count after 1st pull
        test 33 = 0
* FAIL 4: minimal count
        test 33 = 0
* FAIL 5: repack && prune-packed in client
        (git-repack && git-prune-packed)2>>log.txt
*   ok 5: 2nd pull
*   ok 6: fsck
* FAIL 7: new object count after 2nd pull
        test 192 = 198
* FAIL 8: minimal count
        test 192 = 198
* FAIL 9: repack && prune-packed in client
        (git-repack && git-prune-packed)2>>log.txt
*   ok 9: 3rd pull
*   ok 10: fsck
* FAIL 11: new object count after 3rd pull
        test 3 = 228
* FAIL 12: minimal count
        test 3 = 30
* failed 8 among 12 test(s)

I've been looking all around the current diff_delta(), and I
can't see what I'm missing.  Any ideas?  The file is meant to
replace the current diff-delta.c.

Peter

----->8--diff-delta.c----->8----
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include "delta.h"


#define BASE 257
#define PREFIX_SIZE 3

#define SIZE 10
#define HASH_TABLE_SIZE (1<<SIZE)

#define DELTA_SIZE (1024 * 1024)


unsigned int init_hash(unsigned char* data) {
  return data[0]*BASE*BASE + data[1]*BASE + data[2];
}

unsigned int hash(unsigned char* data, unsigned int hash) {
  return (hash - data[-1]*BASE*BASE)*BASE + data[2];
}

#define GR_PRIME 0x9e370001
#define HASH(v) ((v * GR_PRIME) >> (32 - SIZE))

struct entry {
  char file;
  char* offset;
};


void flush(struct entry* table) {
  memset(table, 0, HASH_TABLE_SIZE * sizeof(struct entry));
}


int same_prefixes(char* data1, char* data2) {
  return !memcmp(data1, data2, PREFIX_SIZE);  
}


void encode_add(char* out, int* outpos, char* version_start, char* version_copy) {
  unsigned int size = version_copy - version_start;
  if (!size) return;
  int pos = *outpos;

  while(size > 127) {
    out[pos++] = 127;
    memcpy(out + pos, version_start, 127);
    pos += 127;
    version_start += 127;
    size -= 127;
  }
  out[pos++] = size;
  memcpy(out + pos, version_start, size);  
  pos += size;

  *outpos = pos;
}


void encode_copy(char* out, int* outpos, int offset, int size) {
     int pos = (*outpos) + 1;
     int i = 0x80;

     if (offset & 0xff) { out[pos++] = offset; i |= 0x01; }
     offset >>= 8;
     if (offset & 0xff) { out[pos++] = offset; i |= 0x02; }
     offset >>= 8;
     if (offset & 0xff) { out[pos++] = offset; i |= 0x04; }
     offset >>= 8;
     if (offset & 0xff) { out[pos++] = offset; i |= 0x08; }

     if (size & 0xff) { out[pos++] = size; i |= 0x10; }
     size >>= 8;
     if (size & 0xff) { out[pos++] = size; i |= 0x20; }

     out[*outpos] = i;
     *outpos = pos;
}



void encode_size(char* out, int* outpos, unsigned long size) {
  int pos = *outpos;
  out[pos] = size;
  size >>= 7;
  while (size) {
    out[pos++] |= 0x80;
    out[pos] = size;
    size >>= 7;
  }
  *outpos = ++pos;
}


void *diff_delta(void *from_buf, unsigned long from_size,
		 void *to_buf, unsigned long to_size,
		 unsigned long *delta_size,
		 unsigned long max_size) {
  int index;
  int l;
  char* base = from_buf;
  char* version = to_buf;
  unsigned long base_size = from_size;
  unsigned long version_size = to_size;

  char* base_copy = base;
  char* version_copy = version;
  struct entry* table = calloc(HASH_TABLE_SIZE, sizeof(struct entry));
  //int delta_alloc = DELTA_SIZE;
  char* delta = malloc(DELTA_SIZE);
  int deltapos = 0;
  char* base_top = base + base_size;
  char* version_top = version + version_size;

  encode_size(delta, &deltapos, base_size);
  encode_size(delta, &deltapos, version_size);

  char* base_offset = base;
  char* version_offset = version;
  unsigned int base_hash = init_hash(base);
  unsigned int version_hash = init_hash(version);
  char* version_start = version;

  while(base_offset + PREFIX_SIZE < base_top && 
	version_offset  + PREFIX_SIZE < version_top) {  
    // step2:
    
    index = HASH(base_hash);
    switch (table[index].file) {
    case '\0': {
      table[index].file = 'b';
      table[index].offset = base_offset;
      break;
    }
    case 'v': {
      if (same_prefixes(base_offset, table[index].offset)) {
	base_copy = base_offset;
	version_copy = table[index].offset;
	goto step3;
      } else break;
    }
    case 'b': break;
    default: printf("AAAAAARGH 2b\n");
    }
    
    index = HASH(version_hash);
    switch (table[index].file) {
    case '\0': {
      table[index].file = 'v';
      table[index].offset = version_offset;
      break;
    }
    case 'b': {
      if (same_prefixes(table[index].offset, version_offset)) {
	base_copy = table[index].offset;
	version_copy = version_offset;
	goto step3;
      } else break;
    }
    case 'v': break;
    default: printf("AAAAAARGH 2v\n");
    }
    
    base_offset++;
    version_offset++;

    base_hash = hash(base_offset, base_hash);
    version_hash = hash(version_offset, version_hash);
    continue;  //  goto step2;
    
  step3:
    l = 0;
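    // Extend the match, but stop at the end of either buffer.
    while(base_copy + l < base_top && version_copy + l < version_top &&
	  base_copy[l] == version_copy[l]) l++;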
    base_offset = base_copy + l;
    version_offset = version_copy + l;
    
    /*
    // Make sure we don't run out of delta buffer when encoding.
    if((delta_alloc - deltapos) < 
       (version_start - version_copy) + 1 + 8 + (PREFIX_SIZE + 1)) {
      delta_alloc = delta_alloc * 3 / 2;
      delta = (char*) realloc(delta, delta_alloc);
    }
    */
	if(max_size && deltapos > max_size) {
		free(delta);
		free(table);
		return NULL;
	}

    // step4:
    encode_add(delta, &deltapos, version_start, version_copy);
    encode_copy(delta, &deltapos, base_copy - base, l);
    
    // step5:
    flush(table);
    
    version_start = version_offset;
    
    base_hash = init_hash(base_offset);
    version_hash = init_hash(version_offset);
    
  }  //  goto step2;
  
  encode_add(delta, &deltapos, version_start, version + version_size);
  *delta_size = deltapos;

  free(table);

  return delta;
}
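
One way to check what the encoders above emit is to decode a delta by hand.
The following is only a minimal Ruby sketch, assuming the layout implied by
encode_size, encode_add and encode_copy in this file: two base-128 sizes,
then add commands (a length byte of 1..127 followed by literal data) and
copy commands (0x80 plus flag bits saying which offset and size bytes
follow). It is not git's own delta applier; read_size and apply_delta are
hypothetical names.

```ruby
# Read a size stored 7 bits per byte, low bits first; bit 0x80 means
# "another byte follows". Returns [size, next position].
def read_size(bytes, pos)
  size = 0
  shift = 0
  loop do
    byte = bytes[pos]
    pos += 1
    size |= (byte & 0x7f) << shift
    shift += 7
    break if (byte & 0x80).zero?
  end
  [size, pos]
end

# Rebuild the target from base and delta, mirroring the encoders above.
def apply_delta(base, delta)
  bytes = delta.bytes
  base_size, pos = read_size(bytes, 0)
  raise ArgumentError, "base size mismatch" unless base_size == base.bytesize
  result_size, pos = read_size(bytes, pos)
  out = "".b
  while pos < bytes.size
    cmd = bytes[pos]
    pos += 1
    if (cmd & 0x80) != 0
      offset = 0
      size = 0
      4.times do |i|                 # flags 0x01..0x08: offset bytes present
        next if (cmd & (1 << i)).zero?
        offset |= bytes[pos] << (8 * i)
        pos += 1
      end
      2.times do |i|                 # flags 0x10, 0x20: size bytes present
        next if (cmd & (0x10 << i)).zero?
        size |= bytes[pos] << (8 * i)
        pos += 1
      end
      out << base.byteslice(offset, size)
    elsif cmd > 0
      out << delta.byteslice(pos, cmd)   # literal data of length cmd
      pos += cmd
    else
      raise ArgumentError, "unexpected delta opcode 0"
    end
  end
  raise ArgumentError, "result size mismatch" unless out.bytesize == result_size
  out
end

# Hand-built delta: copy 6 bytes at offset 0, add "there ", copy 5 at offset 6.
delta = [11, 17, 0x90, 6, 6].pack("C*") + "there " + [0x91, 6, 5].pack("C*")
p apply_delta("hello world", delta)
```

Seen from the decoding side, encode_copy writes at most two size bytes, so a
match length above 0xffff loses its upper bits, and a length whose low 16
bits are all zero is written with no size bytes at all. That may be worth
checking against the failing tests.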


* [PATCH] Improve the git-diff-tree -c/-cc documentation
From: Petr Baudis @ 2006-04-09 13:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslonaxq6.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Sun, Apr 09, 2006 at 11:45:37AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> Petr Baudis <pasky@ucw.cz> writes:
> 
> >   Now, the -c option documentation says:
> >
> > 	  It shows the differences from each of the parents to the merge
> > 	result simultaneously, instead of showing pairwise diff between
> > 	a parent and the result one at a time, which '-m' option output
> > 	does.
> >
> >   This sounds as exactly what I want. Well, the only problem is that the
> > same diff command as above with -c option added produces no diff at all,
> > just the header and commit messages. Did I misunderstand the -c
> > description and does it do something different?
> 
> The --combined diff option is to show merges more sensibly than
> plain -m option.  Now, the definition of "sensible" is to say
> that a merge is not interesting if it takes a version from one
> of the parents.  The paths whose results do not match any of the
> parents' version are deemed interesting and are shown.
> 
> I think you could tweak and give an option to intersect_paths()
> in combine-diff.c, so that paths that match one of the parents
> are also included in the output.  I haven't thought about it too
> much, but my gut feeling is it would not be a very involved
> change.

Aha, thanks! I actually think this is more sensible to do, so now I'm
only pondering if in cg-log -f it is more useful to show the list of
files changed relative to the first parent or the files which were
subjected to a content merge...

---
This tries to clarify the -c/-cc documentation and clean up the style and
grammar.

Signed-off-by: Petr Baudis <pasky@suse.cz>
---

 Documentation/git-diff-tree.txt |   25 ++++++++++++++++---------
 1 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-diff-tree.txt b/Documentation/git-diff-tree.txt
index 9153e4c..d7e529b 100644
--- a/Documentation/git-diff-tree.txt
+++ b/Documentation/git-diff-tree.txt
@@ -60,7 +60,8 @@ separated with a single space are given.
 -m::
 	By default, "git-diff-tree --stdin" does not show
 	differences for merge commits.  With this flag, it shows
-	differences to that commit from all of its parents.
+	differences to that commit from all of its parents. See
+	also '-c'.
 
 -s::
 	By default, "git-diff-tree --stdin" shows differences,
@@ -81,19 +82,25 @@ separated with a single space are given.
 	git-diff-tree outputs a line with the commit ID when
 	applicable.  This flag suppressed the commit ID output.
 
--c,--cc::
-	These flags change the way a merge commit is displayed
+-c::
+	This flag changes the way a merge commit is displayed
 	(which means it is useful only when the command is given
 	one <tree-ish>, or '--stdin').  It shows the differences
-	from each of the parents to the merge result
-	simultaneously, instead of showing pairwise diff between
-	a parent and the result one at a time, which '-m' option
-	output does.  '--cc' further compresses the output by
-	omiting hunks that show differences from only one
+	from each of the parents to the merge result simultaneously
+	instead of showing pairwise diff between a parent and the
+	result one at a time (which is what the '-m' option does).
+	Furthermore, it lists only files which were modified
+	in both parents.
+
+-cc::
+	This flag changes the way a merge commit patch is displayed,
+	in a similar way to the '-c' option. It implies the '-c'
+	and '-p' options and further compresses the patch output
+	by omitting hunks that show differences from only one
 	parent, or show the same change from all but one parent
 	for an Octopus merge.  When this optimization makes all
 	hunks disappear, the commit itself and the commit log
-	message is not shown, just like any other "empty diff" cases.
+	message is not shown, just like in any other "empty diff" case.
 
 --always::
 	Show the commit itself and the commit log message even


-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply related

* Re: Union diff
From: Junio C Hamano @ 2006-04-09  9:45 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20060326102100.GF18185@pasky.or.cz>

Petr Baudis <pasky@ucw.cz> writes:

>   Now, the -c option documentation says:
>
> 	  It shows the differences from each of the parents to the merge
> 	result simultaneously, instead of showing pairwise diff between
> 	a parent and the result one at a time, which '-m' option output
> 	does.
>
>   This sounds as exactly what I want. Well, the only problem is that the
> same diff command as above with -c option added produces no diff at all,
> just the header and commit messages. Did I misunderstand the -c
> description and does it do something different?

The combined diff option (-c) shows merges more sensibly than
the plain -m option.  Here, "sensible" means that a merge is not
interesting for a path if it takes the version from one of the
parents.  Only paths whose results do not match any of the
parents' versions are deemed interesting and are shown.

I think you could tweak and give an option to intersect_paths()
in combine-diff.c, so that paths that match one of the parents
are also included in the output.  I haven't thought about it too
much, but my gut feeling is that it would not be a very involved
change.
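
The rule described above can be sketched roughly as follows.  This
is only an illustration under stated assumptions:
path_is_interesting() is a hypothetical helper, not the actual
intersect_paths() in combine-diff.c, and blob IDs are modelled as
plain strings to keep the sketch self-contained.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not git's intersect_paths()): a path is
 * "interesting" for a combined diff only when the merge result
 * matches none of the parents' versions of that path.
 */
static int path_is_interesting(const char *result,
			       const char **parents, int num_parents)
{
	int i;

	assert(result != NULL && num_parents >= 0);
	for (i = 0; i < num_parents; i++)
		if (!strcmp(result, parents[i]))
			return 0;	/* taken verbatim from parent i */
	return 1;			/* differs from every parent */
}
```

With parents "aaaa" and "bbbb", a result of "aaaa" is skipped and
a result of "cccc" is shown.  An option such as the one suggested
above would amount to bypassing the early "return 0".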

^ permalink raw reply

* Re: [PATCH] git log [diff-tree options]...
From: Junio C Hamano @ 2006-04-09  9:16 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds
In-Reply-To: <7v7j5zce7x.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> This makes "git log" take common diff-tree options, so
> that it can be used as "git whatchanged".

BTW, this is *not* "whatchanged", in that it does not omit the
log when there is no diff output.  It may not matter much --
when we give path limiters to the command, the commits that are
shown are already limited by those path limiters.
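
The difference in behaviour can be sketched as below.  The names
(header_policy, should_show_log) are made up for illustration and
do not appear in git; the sketch assumes the length of the diff
output for the commit is already known.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; not taken from git's sources. */
enum header_policy {
	HEADER_ALWAYS,		/* "git log" with this patch */
	HEADER_ONLY_WITH_DIFF	/* classic "git whatchanged" */
};

/* Decide whether the commit log should be printed. */
static int should_show_log(enum header_policy policy, size_t diff_len)
{
	assert(policy == HEADER_ALWAYS || policy == HEADER_ONLY_WITH_DIFF);
	if (policy == HEADER_ALWAYS)
		return 1;
	return diff_len > 0;	/* omit the log for an empty diff */
}
```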

^ permalink raw reply

* [PATCH] git log [diff-tree options]...
From: Junio C Hamano @ 2006-04-09  9:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

This makes "git log" take common diff-tree options, so
that it can be used as "git whatchanged".

The recent revision walker updates by Linus to make path
limiting low-latency help this quite a bit.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 git.c |   32 +++++++++++++++++++++++++++++++-
 1 files changed, 31 insertions(+), 1 deletions(-)

52b70d56bd23811003a72866cc23a0a44b9da1b7
diff --git a/git.c b/git.c
index fa58232..8776088 100644
--- a/git.c
+++ b/git.c
@@ -16,6 +16,8 @@ #include "common-cmds.h"
 #include "cache.h"
 #include "commit.h"
 #include "revision.h"
+#include "diff.h"
+#include "log-tree.h"
 
 #ifndef PATH_MAX
 # define PATH_MAX 4096
@@ -285,7 +287,10 @@ static int cmd_log(int argc, const char 
 	int abbrev = DEFAULT_ABBREV;
 	int abbrev_commit = 0;
 	const char *commit_prefix = "commit ";
+	struct log_tree_opt opt;
+	int do_diff = 0;
 
+	init_log_tree_opt(&opt);
 	argc = setup_revisions(argc, argv, &rev, "HEAD");
 	while (1 < argc) {
 		const char *arg = argv[1];
@@ -310,9 +315,31 @@ static int cmd_log(int argc, const char 
 			else if (40 < abbrev)
 				abbrev = 40;
 		}
-		else
+		else {
+			int cnt = log_tree_opt_parse(&opt, argv+1, argc-1);
+			if (0 < cnt) {
+				do_diff = 1;
+				argv += cnt;
+				argc -= cnt;
+				continue;
+			}
 			die("unrecognized argument: %s", arg);
+		}
+
 		argc--; argv++;
+	}
+	if (do_diff) {
+		opt.diffopt.abbrev = abbrev;
+		opt.verbose_header = 0;
+		opt.always_show_header = 0;
+		opt.no_commit_id = 1;
+		if (opt.combine_merges)
+			opt.ignore_merges = 0;
+		if (opt.dense_combined_merges)
+			opt.diffopt.output_format = DIFF_FORMAT_PATCH;
+		if (opt.diffopt.output_format == DIFF_FORMAT_PATCH)
+			opt.diffopt.recursive = 1;
+		diff_setup_done(&opt.diffopt);
 	}
 
 	prepare_revision_walk(&rev);
@@ -350,6 +377,9 @@ static int cmd_log(int argc, const char 
 		pretty_print_commit(commit_format, commit, ~0, buf,
 				    LOGSIZE, abbrev);
 		printf("%s\n", buf);
+
+		if (do_diff)
+			log_tree_commit(&opt, commit);
 	}
 	free(buf);
 	return 0;
-- 
1.2.6.gad0b
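
The argument loop in the patch above relies on a sub-parser that
returns how many arguments it consumed.  A minimal sketch of that
pattern follows, assuming a hypothetical sketch_opt_parse() in
place of log_tree_opt_parse() and handling only two options; the
real cmd_log() also handles its own options and calls die()
rather than returning -1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical option state; not git's struct log_tree_opt. */
struct sketch_opt {
	int patch;
	int combined;
};

/*
 * Hypothetical stand-in for log_tree_opt_parse(): returns the number
 * of arguments consumed, or 0 if argv[0] is not a known option.
 */
static int sketch_opt_parse(struct sketch_opt *opt,
			    const char **argv, int argc)
{
	if (argc < 1)
		return 0;
	if (!strcmp(argv[0], "-p")) {
		opt->patch = 1;
		return 1;
	}
	if (!strcmp(argv[0], "--cc")) {
		opt->combined = 1;
		opt->patch = 1;	/* --cc implies patch output */
		return 1;
	}
	return 0;
}

/*
 * Returns 1 if any diff option was seen, 0 if there were none,
 * and -1 on an unrecognized argument.
 */
static int sketch_parse_args(struct sketch_opt *opt,
			     int argc, const char **argv)
{
	int do_diff = 0;

	assert(opt != NULL);
	while (1 < argc) {
		int cnt = sketch_opt_parse(opt, argv + 1, argc - 1);

		if (cnt <= 0)
			return -1;
		do_diff = 1;
		argv += cnt;	/* skip everything the sub-parser ate */
		argc -= cnt;
	}
	return do_diff;
}
```

Advancing by the returned count lets a sub-parser consume an
option together with a separate value without the caller knowing
which options take one.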

^ permalink raw reply related

