Git development


* Re: 2.6.17-rc6-mm2
From: Goo GGooo @ 2006-06-16  5:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, git
In-Reply-To: <Pine.LNX.4.64.0606151937360.5498@g5.osdl.org>

On 6/16/06, Linus Torvalds <torvalds@osdl.org> wrote:

> So to recap:
>  - http is fundamentally weaker, and needs some server-side help to work
>  - rsync is fine for the initial clone, but doesn't actually know what
>    it's doing, so the end result can actually even be a corrupted
>    repository, because you happened to rsync just as it was updating.
>  - the native git protocol generally should be considered the golden
>    standard, where the other ones are just fallbacks in case of problems
>    (like firewalls that don't let git:// through, or more commonly hosted
>    servers that don't do the git protocol at all).
>
> Which hopefully clarifies the issue a bit.

Thanks for explanation. Unfortunately I can't use git:// with "git
pull" (at least in git-1.3.2). First it does some traffic, that
suddenly stops - I guess the server starts doing *something*, perhaps
preparing the update for me or whatnot. After a pretty long while it
sends some more data but in the meanwhile my ADSL router dropped the
NAT entry and git sits on my side waiting for data forever. Recently I
tried the same on a system with direct Inet connection and that worked
just fine.

I suggest adding SO_KEEPALIVE option on the git socket.
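
A minimal sketch of what that suggestion amounts to at the socket level, written in Python rather than git's own C. The helper name `enable_keepalive` and its `idle_seconds` parameter are made up for illustration; `SO_KEEPALIVE` is the standard socket option, and `TCP_KEEPIDLE` is a platform-specific one that is only set when the platform exposes it.

```python
import socket


def enable_keepalive(sock, idle_seconds=None):
    """Ask the kernel to send TCP keepalive probes on an idle connection.

    Hypothetical helper for illustration; not taken from git.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE (seconds of idleness before the first probe) is not
    # available everywhere, so only touch it when the platform has it.
    if idle_seconds is not None and hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return sock
```

Note that on Linux the default idle time before the first probe is two hours, so `SO_KEEPALIVE` alone may not be enough to keep a short-lived NAT entry alive; the idle time would likely need lowering as well.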

Goo


* Re: Security problem
From: Alexander Litvinov @ 2006-06-16  5:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0606152137410.5498@g5.osdl.org>

> Well, they may not be "safe" - you just need to work a _lot_ harder to
> corrupt a pack-file in any interesting manner. And again, git-fsck-objects
> would pick up any such thing going on.
As is shown in pack-objects.c, each object has a stored sha1, almost the same
as file rename.

> The first is that git-fsck-objects will definitely find any repository
> inconsistency, and to get around that, you either have to get around the
> basic properties of SHA-1 (ie break the hash) _or_ you have to actually
> change the repository so that it's still a valid repo, just with different
> content.
I still believe SHA-1 is good enough to hash files - I have not heard of anyone
generating a reasonable duplicate that can compile and work :-)

>  - if you corrupt the repository, subsequent clones (or even pulls) from
>    the corrupt repository simply won't work if you use the native
>    protocol, because the native protocol doesn't actually trust anything
>    but the actual contents (so if the contents won't match, then neither
>    will the SHA1 names). So the corruption is pretty strictly limited to
>    the _one_ repository that the attacker had write access to.
As I understand it, the sent pack file will contain the actual SHA-1 of the
objects, and any hack will be clearly visible.

>    So there's a pretty fundamental "corruption containment" part there.
...
The situation with an evil repo is clear to me: you can trust only a trusted
commit identified by SHA-1.

> But yeah, I actually still personally do a fair number of
> "git-fsck-objects". I've never found anything that way since very early on
> (and back then, the real problem was rsync getting objects that weren't
> reachable), but I still do it. It makes me feel happier.
As a result: always fsck the repo after pull/clone!


* Re: Security problem
From: Linus Torvalds @ 2006-06-16  5:00 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Junio C Hamano, git
In-Reply-To: <200606161054.46813.lan@academsoft.ru>

On Fri, 16 Jun 2006, Alexander Litvinov wrote:
>
> You are right, I trust my file system. But if our team had a central repo with
> ssh access to that machine, every developer can hack the central repo.
> 
> Would git-clone/git-fetch warn me about this?

Using the native protocol, yes. Using rsync, unless you explicitly fsck 
the result, no.

> It can't checkout object (3609f20ebd357679b111783e8afaf36ec46427f3 is the 
> original file). It seems packed repos are safe from this point.

Well, they may not be "safe" - you just need to work a _lot_ harder to 
corrupt a pack-file in any interesting manner. And again, git-fsck-objects 
would pick up any such thing going on.

Anyway, what it boils down to is that anybody who has write access to a 
particular repository can certainly change the repo in "interesting" ways. 

However, there are various inherent safety valves in place that make it 
really hard to corrupt on a bigger scale.

The first is that git-fsck-objects will definitely find any repository 
inconsistency, and to get around that, you either have to get around the 
basic properties of SHA-1 (ie break the hash) _or_ you have to actually 
change the repository so that it's still a valid repo, just with different 
content.

So let's take a look at those two cases:

 - if you corrupt the repository, subsequent clones (or even pulls) from 
   the corrupt repository simply won't work if you use the native 
   protocol, because the native protocol doesn't actually trust anything 
   but the actual contents (so if the contents won't match, then neither 
   will the SHA1 names). So the corruption is pretty strictly limited to 
   the _one_ repository that the attacker had write access to.

   So there's a pretty fundamental "corruption containment" part there.

   (Side note: there's no question that we might well be able to do 
   better. A _malicious_ server could actually send a corrupt pack, and 
   it's possible that a properly corrupted remote archive could cause even 
   a "good" git-send-pack to just silently send a corrupt pack, so that 
   you'd need to use "git-fsck-objects" on the receiving side to notice 
   that you are missing objects, for example)

 - if the repository is good (ie fsck is fine), then obviously a "git 
   pull" will also succeed. However, you can't _hide_ the data the way you 
   tried to do: when the receiver checks out the most recent version, it 
   will definitely use the data in the object, there's no way to get the 
   server to serve different data in objects and in the working tree 
   (because the server literally doesn't even send the working tree at 
   all).

   So you can always convince somebody to pull from an "evil repository", 
   and that's no different from committing a bug by mistake. But at least 
   you can't try to hide the bug just in the object store and have it not 
   show up in diffs and in checked-out copies.

The latter case is true even with http and rsync, the actual pull event 
always pulls just the database, never any checked-out state (in fact, 
the common case is obviously to pull from a bare repository that doesn't 
even _have_ checked-out state). So you can't hide things in the index or 
in the checked-out state except in the filesystem that you have direct 
write access to.

But yeah, I actually still personally do a fair number of 
"git-fsck-objects". I've never found anything that way since very early on 
(and back then, the real problem was rsync getting objects that weren't 
reachable), but I still do it. It makes me feel happier.

Of course, bugs always happen. But I can pretty much guarantee that git is 
fundamentally harder to corrupt than most things. We've had git-fsck-cache 
since April 8th last year (or, put another way, literally since "Day 2" in 
git terms - it's the eighth commit in the whole git history).

Git also has an almost total lack of redundant information. There's 
basically no "duplicate" information in the repository format itself where 
you could hide something so that it wouldn't be noticed.

In a checked-out project, the checked-out state itself is "duplicate 
information" (and that was where your "attack" tried to hide things), and 
there's the index (which is actually a much better and subtle place to 
hide things ;). But neither of them have any life outside of that 
particular repository.

			Linus


* Re: Security problem
From: Alexander Litvinov @ 2006-06-16  3:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0606151948230.5498@g5.osdl.org>

> If you can't trust your local filesystem, you are screwed.

You are right, I trust my file system. But if our team had a central repo with
ssh access to that machine, every developer can hack the central repo.

Would git-clone/git-fetch warn me about this?

My own test with (another) local repo says:
lan@lan:~/tmp/git/test> git clone 1 2
Generating pack...
Done counting 3 objects.
Deltifying 3 objects.
 100% (3/3) done
Total 3, written 3 (delta 0), reused 0 (delta 0)
error: git-checkout-index: unable to read sha1 file of a 
(3609f20ebd357679b111783e8afaf36ec46427f3)

It can't checkout object (3609f20ebd357679b111783e8afaf36ec46427f3 is the 
original file). It seems packed repos are safe from this point.


* Re: Security problem
From: Linus Torvalds @ 2006-06-16  2:56 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Junio C Hamano, git
In-Reply-To: <200606160931.29553.lan@academsoft.ru>

On Fri, 16 Jun 2006, Alexander Litvinov wrote:
> 
> I have found the ability to hack git repo. After this hacking people will 
> checkout hacked files from the "trusted" commit. Only git-fsck-objects will 
> complain at this.

Right.

If you can't trust your local filesystem, you are screwed. 

git-fsck-objects will notice when somebody has done something bad, but 

> Why does not git-checkout check if file content match name of the object ?

Why would it? It really just slows things down, and if you don't trust 
your local repo, people can "hack" you much more easily by just generating 
a _proper_ tree with the _proper_ data, and git checkout checking the SHA1 
wouldn't help at all.

The way to security lies in using git-fsck-objects, together with an 
_external_ source of trust. For example, that external source of trust may 
be a signed tag, or, perhaps even more simply, just by saving off the top 
commit name on some trusted medium.
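
The "save off the top commit name" idea can be sketched as follows. This is an illustration only: the function names and the injectable `resolve` parameter are hypothetical, and the `git -C <path> rev-parse HEAD` invocation assumes a much more recent git than the one discussed in this thread. A matching name only means something if the object store itself is intact, so this would be paired with an fsck run.

```python
import subprocess


def current_head(repo_path):
    """Return the commit name HEAD points at (assumes a modern `git` on PATH)."""
    out = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def head_is_trusted(trusted_commit, repo_path=".", resolve=current_head):
    """Compare HEAD against a commit name recorded on a trusted medium.

    `resolve` is injectable so the comparison can be exercised without a
    real repository; by default it shells out to git.
    """
    return resolve(repo_path) == trusted_commit.strip().lower()
```

For example, `head_is_trusted(open("/mnt/usb/top-commit").read(), "/path/to/repo")` would compare against a name kept on removable media; both paths here are placeholders.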

But you do need a "point of trust" to start with. Without that, it's a lot 
easier to "hack" a git repo by doing

	echo 'Hacked file' > a
	git commit --amend a
	git prune

and now the file "a" has changed to "Hacked file", and even 
git-fsck-objects can't tell that anything bad happened.

(Btw, if you want to _hide_ the fact that "a" now contains "Hacked file", 
you do so by faking it in the index. You can have the checked-out copy say 
what it should say - ie "Usual file" - and if you don't want git to show 
you the difference to HEAD, you edit the .git/index file by hand so that 
the timestamp, size and inode matches the real SHA1, even though the 
_contents_ match "Usual file").

See?

You do need to trust something. Normally you'd trust your own filesystem, 
but git certainly supports other forms of trust through either the native 
support for signed certificates in the form of tags, or any other form of 
external trust.

			Linus


* Re: 2.6.17-rc6-mm2
From: Linus Torvalds @ 2006-06-16  2:46 UTC (permalink / raw)
  To: Goo GGooo; +Cc: linux-kernel, git
In-Reply-To: <ef5305790606151814i252c37c4mdd005f11f06ceac@mail.gmail.com>

On Fri, 16 Jun 2006, Goo GGooo wrote:
> 
> That's confusing - I believed all protocols should behave the same way...?

Not really. The primary protocol is the native git one, and the others try 
to do a best effort, but the http protocol really can't do a very good 
job unless the server side has run "git update-server-info" to help the 
http client along.

I suspect that the -mm git tree simply doesn't do that. In fact, even the 
main tree didn't use to do it, but I finally just broke down and added the 
proper hook to make it always do it automatically when I push.

(In case Andrew wants to do that, the way to do it is:

	printf '#!/bin/sh\nexec git-update-server-info\n' > hooks/post-update
	chmod +x hooks/post-update

inside the git repository - all it will do is always execute that script, 
and this "git-update-server-info", after you've updated the repo).

Finally, the rsync protocol just copies all objects over, and since it 
doesn't even know _which_ objects it is getting, it doesn't do the normal 
tag following that the native git protocol does.

So to recap:
 - http is fundamentally weaker, and needs some server-side help to work
 - rsync is fine for the initial clone, but doesn't actually know what 
   it's doing, so the end result can actually even be a corrupted 
   repository, because you happened to rsync just as it was updating.
 - the native git protocol generally should be considered the golden 
   standard, where the other ones are just fallbacks in case of problems 
   (like firewalls that don't let git:// through, or more commonly hosted 
   servers that don't do the git protocol at all).

Which hopefully clarifies the issue a bit.

		Linus


* Re: Security problem
From: Linus Torvalds @ 2006-06-16  2:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alexander Litvinov, git
In-Reply-To: <7vbqsuc60q.fsf@assigned-by-dhcp.cox.net>

On Thu, 15 Jun 2006, Junio C Hamano wrote:
>
> Alexander Litvinov <lan@academsoft.ru> writes:
> 
> > Why does not git-checkout check if file content match name of the object ?
> 
> Good point.  We could do a few things:

I missed the original mail. What's the problem?

If this is about the remote end lying about the SHA1 name, it's a total 
non-issue for any of the native protocols, since the native protocols 
don't actually send SHA1 names at all, they just send the data (and we 
re-create the SHA1 name on receipt).

So there's no way to have the name of an object not match its content, 
unless you have actual corruption (which is for git-fsck-objects to find, 
not something that should slow down any normal operation), or if you use 
one of the dumb protocols.
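
To make "we re-create the SHA1 name on receipt" concrete, here is a small Python sketch of how an object name is derived from content: the SHA-1 of a header (`<type> <size>` followed by a NUL byte) plus the raw data. The function name `git_object_id` is made up for this example; it is not git's C code.

```python
import hashlib


def git_object_id(obj_type, data):
    """Compute the object name for `data` of the given type.

    The name is the SHA-1 of "<type> <size>\\0" followed by the content,
    so it can always be recomputed from the content alone.
    """
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


print(git_object_id("blob", b"hello\n"))
# → ce013625030ba8dba906f756967f9e9ca394464a
```

Because the receiver derives the name this way, a sender has no field in which to claim a name that disagrees with the content.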

And if you use the dumb protocols, the data should probably be validated 
_there_ (by fetch(), rather than anywhere else). And for "rsync", you 
really don't have much choice apart from doing a full fsck, I suspect.

So I don't see the security issue, unless you don't trust the local 
filesystem, in which case nothing git can do matters at all..

		Linus


* Re: 2.6.17-rc6-mm2
From: Goo GGooo @ 2006-06-16  1:14 UTC (permalink / raw)
  To: linux-kernel, git
In-Reply-To: <ef5305790606142040r5912ce58kf9f889c3d61b2cc0@mail.gmail.com>

On 6/15/06, Goo GGooo <googgooo@gmail.com> wrote:
> Andrew Morton wrote:
>
> > - To fetch an -mm tree using git, use (for example)
> >
> >  git fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git
> > v2.6.16-rc2-mm1
>
> I'm not able to get -mm tree from GIT. In
> http://git.kernel.org/.../smurf/linux-trees.git/refs/tags/ I can see
> the most recent tags like v2.6.17-rc6-mm2 but cg-clone
> http://git.kernel.org/.../smurf/linux-trees.git gives me only
> 2.6.16-rc3 :(
>
> I tried "cg-fetch v2.6.17-rc6-mm2" which seemed to fetch some more
> tags, then played with git-checkout & friends but still can't get the
> most recent source tree.

All right, finally this worked out:
git pull rsync://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git \
      tag v2.6.17-rc6-mm2

Strange enough with http:// instead of rsync:// I got some message
about nonexistent tag.

Now when I try git pull with http:// again it says the tree is up to
date. However with git:// it started downloading more things and tags.

That's confusing - I believed all protocols should behave the same way...?

Goo


* Re: Security problem
From: Junio C Hamano @ 2006-06-16  0:12 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: git
In-Reply-To: <200606151709.22752.lan@academsoft.ru>

Alexander Litvinov <lan@academsoft.ru> writes:

> Why does not git-checkout check if file content match name of the object ?

Good point.  We could do a few things:

 - entry.c:write_entry() could validate after read_sha1_file(). 

 - read_sha1_file() could do the checking; this has performance
   implications, though.

Cloning over git-aware protocols validates the objects coming
over the wire, so it may make sense to cheat and do the former,
so that we do not have to pay the validation cost every time we
access any object.
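
As a rough illustration of what "validate after reading" could look like, here is a Python sketch that reads a loose object from disk and refuses it when the recomputed SHA-1 does not match the name it was stored under. It is not the C code in entry.c or read_sha1_file(); all function and class names are hypothetical, and it assumes the zlib-compressed loose object layout (`objects/xx/yyyy...`).

```python
import hashlib
import os
import tempfile
import zlib


class CorruptObject(Exception):
    """Raised when an object's content does not hash to its name."""


def encode_object(obj_type, data):
    # "<type> <size>\0<content>" is what gets hashed and stored.
    return f"{obj_type} {len(data)}".encode("ascii") + b"\0" + data


def write_loose_object(objects_dir, obj_type, data):
    raw = encode_object(obj_type, data)
    name = hashlib.sha1(raw).hexdigest()
    os.makedirs(os.path.join(objects_dir, name[:2]), exist_ok=True)
    with open(os.path.join(objects_dir, name[:2], name[2:]), "wb") as f:
        f.write(zlib.compress(raw))
    return name


def read_loose_object_checked(objects_dir, hex_name):
    path = os.path.join(objects_dir, hex_name[:2], hex_name[2:])
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    # The extra hash here is the per-read cost the message refers to.
    if hashlib.sha1(raw).hexdigest() != hex_name:
        raise CorruptObject(hex_name)
    header, _, body = raw.partition(b"\0")
    obj_type, _size = header.split(b" ")
    return obj_type.decode("ascii"), body


def demo():
    with tempfile.TemporaryDirectory() as objects_dir:
        name = write_loose_object(objects_dir, "blob", b"Usual file\n")
        ok = read_loose_object_checked(objects_dir, name)
        # Replace the content while keeping the old file name.
        path = os.path.join(objects_dir, name[:2], name[2:])
        with open(path, "wb") as f:
            f.write(zlib.compress(encode_object("blob", b"Hacked file\n")))
        try:
            read_loose_object_checked(objects_dir, name)
            detected = False
        except CorruptObject:
            detected = True
        return ok, detected
```

Doing the check only at checkout time, as the first option suggests, would confine the hashing cost to the files actually written to the working tree.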


* Setting up git server?
From: lamikr @ 2006-06-15 23:19 UTC (permalink / raw)
  To: git

Hi

I have git-repo cloned from the linux-omap-2.6 that we have used as a
base for our h6300 development.
Earlier we have kept our kernel in svn (sync between git-branches and
svn has happened about once in a month by using
traditional diff files...)

I have now pulled the server to "/repos/git/linux-omap-h6300-2.6" and
setup the /etc/xinetd.d/git-daemon by using docs in
http://www.kernel.org/pub/software/scm/git/docs/everyday.html

How can I now create the git url for this? For example something like
this: git://aragorn.kortex.jyu.fi/repos/git/linux-omap-h6300-2.6.git

Mika


* Re: Autoconf/Automake
From: Johannes Schindelin @ 2006-06-15 23:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Yann Dirson, Alex Riesen, Pavel Roskin, git
In-Reply-To: <Pine.LNX.4.64.0606151545050.5498@g5.osdl.org>

Hi,

On Thu, 15 Jun 2006, Linus Torvalds wrote:

> On Thu, 15 Jun 2006, Yann Dirson wrote:
> > 
> > In the case of jam, the doc issue can certainly be raised, but the
> > most prominent problem is probably that everyone and their dog knows
> > make,
> 
> Oh, I agree. A "simpler" thing that people don't know is often much 
> inferior to a complex thing that people are generally intimately familiar 
> with.
> 
> I just personally believe that autoconf/automake are the worst of both 
> worlds (ie it's a _complex_ thing that a lot of people don't know).
> 
> GNU make in many ways is actually not that bad. Yeah, the makefiles get 
> more complex, but it's usually not totally unreadable, and you can do some 
> clever stuff with it. 

I can add to that with first-hand experience of ant and maven.

A whole sh*t-load of people think make is broken. It does not live up to 
what they want, and it is slow.

And then they invent a _DISEASE_ like ant, which _does not begin_ to sport 
the features of make.

In a project I am stuck in, maven is used. It tries -- of all things -- to 
fix a few shortcomings of ant -- which was supposed to fix shortcomings of 
make! And let's face it. Maven is complicated, slow as a dog lacking all 
four feet, and it still does not do the things I can do in three lines 
with make. It's a complete disaster.

So to keep the discussion on topic: tell me what you want to fix wrt the 
current setup of git, and I'll try to fix it in less than 10 lines of make 
code. If that is impossible, let's continue then and there with the 
discussion about a switch to a newer, less tested, replacement of make, 
okay?

Ciao,
Dscho


* Re: Autoconf/Automake
From: Johannes Schindelin @ 2006-06-15 22:58 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Phil Richards, git
In-Reply-To: <20060615220534.GL7766@nowhere.earth>

Hi,

On Fri, 16 Jun 2006, Yann Dirson wrote:

> On Thu, Jun 15, 2006 at 10:42:40PM +0200, Johannes Schindelin wrote:
> > As for now, I fail to see why the current system is not adequate for git!
> 
> I can reassure you, gazillions of people still fail to see why cvs is
> not adequate for their project.  And the ratio of devs in the
> corporate world not knowing git to those not knowing cvs is far
> superior to 2.  And everyone here knows cvs is not more adequate than
> git for so many tasks :)

You know as well as I that this comparison is unfair. I am _NOT_ a 
corporate person. I hope that you do not judge me as a complete airhead.

The point is: the right tool solves the problem. You can have a tool which 
is mighty cool, but way too powerful (AKA complicated).

As for CVS: there _are_ a few use cases where CVS is just the right tool. 
There are many more use cases where git is more than adequate, where CVS 
is not.

_BUT_: there are cases where something like autoconf/jam/cmake/blablabla 
is adequate, but I still fail to see why for git, the makefile system 
should not work. It is the most transparent way to configure a make system 
I encountered. It is short, concise, and does the job. And I understand 
it. As opposed to autoconf/jam/cmake/blablabla.

Hth,
Dscho


* Re: Autoconf/Automake
From: Linus Torvalds @ 2006-06-15 22:54 UTC (permalink / raw)
  To: Yann Dirson; +Cc: Alex Riesen, Pavel Roskin, git
In-Reply-To: <20060615211454.GK7766@nowhere.earth>

On Thu, 15 Jun 2006, Yann Dirson wrote:
> 
> In the case of jam, the doc issue can certainly be raised, but the
> most prominent problem is probably that everyone and their dog knows
> make,

Oh, I agree. A "simpler" thing that people don't know is often much 
inferior to a complex thing that people are generally intimately familiar 
with.

I just personally believe that autoconf/automake are the worst of both 
worlds (ie it's a _complex_ thing that a lot of people don't know).

GNU make in many ways is actually not that bad. Yeah, the makefiles get 
more complex, but it's usually not totally unreadable, and you can do some 
clever stuff with it. 

The kernel makefiles are a pretty extreme example (and it hides a lot of 
the complexity in files that get included and that most people never ever 
need to look at). I suspect that git could more easily do something like 
that (on a _much_ smaller scale - don't get me wrong).

			Linus


* Re: Autoconf/Automake
From: Yann Dirson @ 2006-06-15 22:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Phil Richards, git
In-Reply-To: <Pine.LNX.4.63.0606152239270.7480@wbgn013.biozentrum.uni-wuerzburg.de>

On Thu, Jun 15, 2006 at 10:42:40PM +0200, Johannes Schindelin wrote:
> As for now, I fail to see why the current system is not adequate for git!

I can reassure you, gazillions of people still fail to see why cvs is
not adequate for their project.  And the ratio of devs in the
corporate world not knowing git to those not knowing cvs is far
superior to 2.  And everyone here knows cvs is not more adequate than
git for so many tasks :)

Best regards,
-- 
Yann Dirson    <ydirson@altern.org> |
Debian-related: <dirson@debian.org> |   Support Debian GNU/Linux:
                                    |  Freedom, Power, Stability, Gratis
     http://ydirson.free.fr/        | Check <http://www.debian.org/>


* Re: observations on parsecvs testing
From: Keith Packard @ 2006-06-15 22:04 UTC (permalink / raw)
  To: Sean; +Cc: keithp, Nicolas Pitre, git
In-Reply-To: <20060615164742.570e33a0.seanlkml@sympatico.ca>


On Thu, 2006-06-15 at 16:47 -0400, Sean wrote:
> las,
> 
> That was a planned optimization which I did mention to Keith previously.
> Was kinda waiting to hear back how it was working for him, and if there
> was an interest to put more work into it to include in his mainline.

The rcs2git code is working great and is on 'master' at this point;
optimizations to generate all of the revisions in one pass would be
greatly appreciated.

-- 
keith.packard@intel.com



* Re: observations on parsecvs testing
From: Keith Packard @ 2006-06-15 22:03 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: keithp, git
In-Reply-To: <Pine.LNX.4.64.0606151529350.16002@localhost.localdomain>


On Thu, 2006-06-15 at 16:37 -0400, Nicolas Pitre wrote:
> My machine is a P4 @ 3GHz with 1GB ram.
> 
> Feeding parsecvs with the Mozilla repository, it first ran for 175 
> minutes with about 98% CPU spent in user space reading the 100458 ,v 
> files and writing 700000+ blob objects.  Memory usage grew to 1789MB 
> total while the resident memory saturated around 700MB.  This part was 
> fine even with 1GB of ram since unused memory was gently pushed to swap.  
> Only problem is that spawned git-pack-object instances started failing 
> with memory allocation by that time, which is unfortunate but not 
> fatal.

Right, the ,v -> blob conversion process uses around 160 bytes per
revision as best I can count (one rev_commit, one rev_file and 
a 41-byte sha1 string); 700000 revisions would therefore use 1.1GB just
for the revision objects. It should be possible to reduce the size of
this data structure fairly significantly; converting the sha1 value to
binary and compressing the CVS revision number to minimal length.
Switching from the general git/cvs structure to this cvs-specific
structure is 'on the list' of things I'd like to do.
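
The arithmetic behind that estimate can be checked with a few lines of Python. The inputs are the figures quoted above (160 bytes per revision, 700000 revisions, a 41-byte hex sha1 string); the 20-byte size of a binary SHA-1 is a known constant. Taken at face value, 160 bytes times 700000 revisions comes to about 112 MB rather than 1.1GB, so either the per-revision cost is larger than counted or the total includes other data; the message does not say which.

```python
REVISIONS = 700_000
BYTES_PER_REVISION = 160      # figure quoted in the message
HEX_SHA1_BYTES = 41           # 40 hex digits plus a terminating NUL
BINARY_SHA1_BYTES = 20        # raw SHA-1 digest size


def revision_memory_bytes(revisions, bytes_per_revision):
    """Total memory for per-revision bookkeeping, in bytes."""
    return revisions * bytes_per_revision


total = revision_memory_bytes(REVISIONS, BYTES_PER_REVISION)
sha1_savings = REVISIONS * (HEX_SHA1_BYTES - BINARY_SHA1_BYTES)

print(total)         # 112000000 bytes, about 112 MB
print(sha1_savings)  # 14700000 bytes saved by storing the sha1 in binary
```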

> But then things started to go bad after all ,v files were parsed.  The 
> parsecvs dropped to 3% CPU while the rest of the time was spent waiting 
> after swap IO and therefore no substantial progress was made at that 
> point.

Yeah, after this point, parsecvs is merging the computed revision
histories of the individual files into a global history. This means it's
walking across the whole set of files to compute each git commit. For
each branch, it computes the set of files visible at the head of that
branch and then sorts the last revision of the visible files to discover
the last change set along that branch, constructing a commit for each
logical changeset backwards from the present into the past. As it's
constructing commits from the present backwards, it must go all the way
to the past before it can emit any commits to the repository. So, it has
to save them somewhere; right now, it's saving them in memory. What it
could do is construct tree objects for each commit, saving only the sha1
that results and dump the rest of the data. That should save plenty of
memory, but would require a radical restructuring of the code (which is
desperately needed, btw). With this change, parsecvs should actually
*shrink* over time, instead of grow.

> So the Mozilla clearly requires 2GB of ram to realistically be converted 
> to GIT using parsecvs, unless its second phase is reworked to avoid 
> totally random access in memory in order to improve swap behavior, or 
> its in-memory data set is shrunk at least by half.

Changing the data structures used in the first phase will shrink them
significantly; replacing the second state data structures with sha1 tree
hash values and disposing of the first phase objects incrementally
should elicit a shrinking memory pattern rather than growing. It might
well be easier at this point to just take the basic CVS parser and start
afresh though; the code is a horror show of incremental refinements.

> Also rcs2git() is very inefficient especially with files having many 
> revisions as it reconstructs the delta chain on every call.  For example 
> mozilla/configure,v has at least 1690 revisions, and actually converting 
> it into GIT blobs goes at a rate of 2.4 objects per second _only_ on my 
> machine.  Can't objects be created as the delta list is walked/applied 
> instead?  That would significantly reduce the initial conversion time.

Yes, I wanted to do this, but also wanted to ensure that the constructed
versions exactly matched the native rcs output. Starting with 'real' rcs
code seemed likely to ensure the latter. This "should" be easy to fix...
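
The optimization being discussed can be sketched in Python: instead of re-applying the whole delta chain for every revision, walk the chain once and emit each revision as it is produced. This is a toy model, not the rcs2git code. The delta representation (tuples of `("d", first_line, count)` and `("a", after_line, new_lines)`) is a simplified stand-in for RCS-style delete/append edit commands, and the function names are made up.

```python
def apply_delta(lines, delta):
    """Apply one delta to a list of lines and return the new list.

    Commands use 1-based line numbers of the *input* text:
      ("d", first_line, count)     delete `count` lines starting at first_line
      ("a", after_line, new_lines) insert new_lines after after_line (0 = top)
    """
    result = list(lines)
    # Work from the bottom of the file upwards so earlier line numbers stay
    # valid; at the same line number, insert before deleting.
    for cmd in sorted(delta, key=lambda c: (c[1], c[0] == "a"), reverse=True):
        if cmd[0] == "d":
            _, start, count = cmd
            del result[start - 1:start - 1 + count]
        elif cmd[0] == "a":
            _, after, new_lines = cmd
            result[after:after] = new_lines
        else:
            raise ValueError(f"unknown command {cmd[0]!r}")
    return result


def iter_revisions(head_lines, deltas):
    """Yield every revision while walking the delta chain exactly once."""
    current = list(head_lines)
    yield current
    for delta in deltas:
        current = apply_delta(current, delta)
        yield current


head = ["a", "b", "c"]
chain = [
    [("d", 2, 1), ("a", 2, ["B"])],   # replace line 2
    [("a", 0, ["top"])],              # insert at the top
]
revisions = list(iter_revisions(head, chain))
```

With n deltas this performs n delta applications in total, whereas rebuilding each revision from the head costs on the order of n squared, which is where a file with 1690 revisions would hurt.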

-- 
keith.packard@intel.com



* [PATCH] Add a "--notags" option for git-p4import.
From: Sean @ 2006-06-15 21:26 UTC (permalink / raw)
  To: git


P4import currently creates a git tag for every commit it imports.
When importing from a large repository too many tags can be created
for git to manage, so this provides an option to shut that feature
off if necessary.

Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>
---
 Documentation/git-p4import.txt |    5 ++++-
 git-p4import.py                |   12 ++++++++----
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-p4import.txt b/Documentation/git-p4import.txt
index c198ff2..0858e5e 100644
--- a/Documentation/git-p4import.txt
+++ b/Documentation/git-p4import.txt
@@ -8,7 +8,7 @@ git-p4import - Import a Perforce reposit
 
 SYNOPSIS
 --------
-`git-p4import` [-q|-v] [--authors <file>] [-t <timezone>] <//p4repo/path> <branch>
+`git-p4import` [-q|-v] [--notags] [--authors <file>] [-t <timezone>] <//p4repo/path> <branch>
 
 `git-p4import` --stitch <//p4repo/path>
 
@@ -43,6 +43,9 @@ OPTIONS
 	Specify an authors file containing a mapping of Perforce user
 	ids to full names and email addresses (see Notes below).
 
+\--notags::
+	Do not create a tag for each imported commit.
+
 \--stitch::
 	Import the contents of the given perforce branch into the
 	currently checked out git branch.
diff --git a/git-p4import.py b/git-p4import.py
index 74172ab..908941d 100644
--- a/git-p4import.py
+++ b/git-p4import.py
@@ -23,7 +23,6 @@ s = signal(SIGINT, SIG_DFL)
 if s != default_int_handler:
    signal(SIGINT, s)
 
-
 def die(msg, *args):
     for a in args:
         msg = "%s %s" % (msg, a)
@@ -38,6 +37,7 @@ verbosity = 1
 logfile = "/dev/null"
 ignore_warnings = False
 stitch = 0
+tagall = True
 
 def report(level, msg, *args):
     global verbosity
@@ -261,10 +261,9 @@ class git_command:
         self.make_tag("p4/%s"%id, commit)
         self.git("update-ref HEAD %s %s" % (commit, current) )
 
-
 try:
     opts, args = getopt.getopt(sys.argv[1:], "qhvt:",
-                    ["authors=","help","stitch=","timezone=","log=","ignore"])
+            ["authors=","help","stitch=","timezone=","log=","ignore","notags"])
 except getopt.GetoptError:
     usage()
 
@@ -275,6 +274,8 @@ for o, a in opts:
         verbosity += 1
     if o in ("--log"):
         logfile = a
+    if o in ("--notags"):
+        tagall = False
     if o in ("-h", "--help"):
         usage()
     if o in ("--ignore"):
@@ -350,7 +351,10 @@ for id in changes:
     report(1, "Importing changeset", id)
     change = p4.describe(id)
     p4.sync(id)
-    git.commit(change.author, change.email, change.date, change.msg, id)
+    if tagall :
+            git.commit(change.author, change.email, change.date, change.msg, id)
+    else:
+            git.commit(change.author, change.email, change.date, change.msg, "import")
     if stitch == 1:
         git.clean_directories()
         stitch = 0
-- 
1.4.0.rc2


* Re: Autoconf/Automake
From: Yann Dirson @ 2006-06-15 21:14 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alex Riesen, Pavel Roskin, git
In-Reply-To: <Pine.LNX.4.64.0606150954430.5498@g5.osdl.org>

On Thu, Jun 15, 2006 at 10:02:10AM -0700, Linus Torvalds wrote:
> Too many developers shrug off the "it's hard to use" argument. THEY think 
> it's fine. THEY think it's "lack of training". THEY think the tools are 
> fine, and the problem is the user.
> 
> THEY are wrong.
> 
> Almost every time when a user says "it's hard to use", the user is right. 
> Sometimes it's a lack of documentation, but quite often it's just that the 
> tool interfaces are bad.

In the case of jam, the doc issue can certainly be raised, but the
most prominent problem is probably that everyone and their dog knows
make, and expects a replacement to work in a similar fashion.  The
current documentation and tutorial unfortunately does not show
precisely how people used to "make" can easily switch to jam.

For those not knowing about jam, I'd say the 1st thing to anchor in
one's mind is that jam gives complete (programmatic) control on the
dependency tree (eg. you just have to write once that the results of a
compilation have to be removed by "jam clean", and every time you
declare a file to be built with your rule, you don't have to remember
to add it to the Clean rule - and more importantly, as soon as you
remove that declaration, you don't have to fear the Clean target to
remove it, in case it would be precious).
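
To make the "declare once, clean follows automatically" idea concrete, here is a small model of it. This is Python, not Jam syntax, and the `BuildGraph` class and its method names are made up for illustration; it only assumes that the clean list is derived from the current set of declarations rather than maintained by hand.

```python
class BuildGraph:
    """Toy dependency tree: the clean list is derived, never written by hand."""

    def __init__(self):
        self.targets = {}  # output file -> (source files, action name)

    def declare(self, output, sources, action):
        # Declaring a built file is the only step the user performs.
        self.targets[output] = (tuple(sources), action)

    def undeclare(self, output):
        # Once a declaration is gone, the file is no longer a clean candidate.
        self.targets.pop(output, None)

    def clean_list(self):
        # Everything currently declared as a build result, and nothing else.
        return sorted(self.targets)


if __name__ == "__main__":
    graph = BuildGraph()
    graph.declare("foo.o", ["foo.c"], "compile")
    graph.declare("bar.o", ["bar.c"], "compile")
    print(graph.clean_list())
    graph.undeclare("bar.o")  # bar.o might now be a precious hand-made file
    print(graph.clean_list())
```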

> Sometimes the problem space makes the interfaces fundamentally hard. But 
> sometimes the program itself just makes things ugly and hard, and autoconf 
> and automake definitely didn't make it easier for users - they were 
> designed for people who knew fifteen different versions of UNIX, and not 
> for sane people.
> 
> These days, there aren't fifteen different versions of UNIX. There's a 
> couple, and it's perfectly ok to actually say "fix your damn system and 
> just install GNU make". It's easier to install GNU make than it is to 
> install autoconf/automake.

Right, autoconf would be much saner if it did not insist on
supporting vintage unices. OTOH, people who have to work on these
systems (e.g. for professional reasons - not everyone has the luck to
work with modern systems all the time) are more than happy to be able
to build some recent tools to make their task easier.  Except when it
fails in that task (e.g. a configure script for the bash package
failing to run on a years-old LynxOS version because of a sh bug in
the OS), it still does a wonderful job in the end.

But I agree having to carry all this compat stuff, when one just wants
to benefit from higher-level features (like those mentioned by
Oliver), is annoying.  Maybe the support for legacy platforms could be
restricted in some way to the bare minimum.  E.g. using a "legacy"
backend where the cruft would go, and stubs for modern things, that
would generate a hopefully-more-portable-but-limited
./configure-simple script, and a "modern" backend generating a sane
full-fledged bash script.

But I'm going off-topic :)

Best regards,
-- 
Yann Dirson    <ydirson@altern.org> |
Debian-related: <dirson@debian.org> |   Support Debian GNU/Linux:
                                    |  Freedom, Power, Stability, Gratis
     http://ydirson.free.fr/        | Check <http://www.debian.org/>

^ permalink raw reply

* Re: observations on parsecvs testing
From: Nicolas Pitre @ 2006-06-15 20:55 UTC (permalink / raw)
  To: Sean; +Cc: keithp, git
In-Reply-To: <BAYC1-PASMTP10021C1A6034B8753D06DDAE820@CEZ.ICE>

On Thu, 15 Jun 2006, Sean wrote:

> On Thu, 15 Jun 2006 16:37:30 -0400 (EDT)
> Nicolas Pitre <nico@cam.org> wrote:
> 
> > Also rcs2git() is very inefficient especially with files having many 
> > revisions as it reconstructs the delta chain on every call.  For example 
> > mozilla/configure,v has at least 1690 revisions, and actually converting 
> > it into GIT blobs goes at a rate of 2.4 objects per second _only_ on my 
> > machine.  Can't objects be created as the delta list is walked/applied 
> > instead?  That would significantly reduce the initial conversion time.
> 
> Hi Nicolas,
> 
> That was a planned optimization which I did mention to Keith previously.
> Was kinda waiting to hear back how it was working for him, and if there
> was an interest to put more work into it to include in his mainline.

I think it is really worth it.  I'd expect the first half of the 
conversion to go significantly faster then.
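
To illustrate why creating objects while walking the delta list helps, here is a toy model in Python. It is not parsecvs code: the delta format (a list of line replacements) and all function names are assumptions made for the sketch. Rebuilding every revision from the head applies roughly n*n/2 deltas for n revisions, while a single walk applies each delta once.

```python
def apply_delta(lines, delta):
    """Apply a toy delta: a list of (index, new_line) replacements."""
    out = list(lines)
    for index, new_line in delta:
        out[index] = new_line
    return out


def revisions_rebuilt_each_time(head, deltas):
    """Reconstruct the chain from the head for every revision (quadratic)."""
    result = []
    for n in range(len(deltas) + 1):
        lines = head
        for delta in deltas[:n]:
            lines = apply_delta(lines, delta)
        result.append(lines)
    return result


def revisions_single_walk(head, deltas):
    """Walk the chain once and hand out each revision as it is produced."""
    lines = head
    yield lines
    for delta in deltas:
        lines = apply_delta(lines, delta)
        yield lines


if __name__ == "__main__":
    head = ["a", "b"]
    deltas = [[(0, "x")], [(1, "y")]]
    for revision in revisions_single_walk(head, deltas):
        print(revision)  # a blob could be written here, one per step
```

Both functions produce the same revisions; only the amount of repeated work differs.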


Nicolas

^ permalink raw reply

* [PATCH 2/3] git-svn: fix several small bugs, enable branch optimization
From: Eric Wong @ 2006-06-15 20:55 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Eric Wong
In-Reply-To: <11504049313192-git-send-email-normalperson@yhbt.net>

Share the repack counter between branches when doing
multi-fetch.

Pass the -d flag to git repack by default.  That's the
main reason we will want automatic pack generation, to
save space and improve disk cache performance.  I won't
add -a by default since it can generate extremely large
packs that make RAM-starved systems unhappy.

We no longer generate the .git/svn/$GIT_SVN_ID/info/uuid
file, either.  It was never read in the first place.

Check for and create .rev_db if we need to during fetch (in case
somebody manually blew away their .rev_db and wanted to start
over.  Mainly makes debugging easier).

Croak with $? instead of $! if there's an error closing pipes

Quiet down some of the chatter, too.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 contrib/git-svn/git-svn.perl |  146 +++++++++++++++++++++++-------------------
 1 files changed, 81 insertions(+), 65 deletions(-)

diff --git a/contrib/git-svn/git-svn.perl b/contrib/git-svn/git-svn.perl
index 88af9c5..27f1d68 100755
--- a/contrib/git-svn/git-svn.perl
+++ b/contrib/git-svn/git-svn.perl
@@ -368,7 +368,6 @@ sub fetch_lib {
 		defined(my $pid = fork) or croak $!;
 		if (!$pid) {
 			$SVN::Error::handler = \&libsvn_skip_unknown_revs;
-			print "Fetching revisions $min .. $max\n";
 
 			# Yes I'm perfectly aware that the fourth argument
 			# below is the limit revisions number.  Unfortunately
@@ -391,7 +390,6 @@ sub fetch_lib {
 							$log_msg, @parents);
 					}
 				});
-			$SVN::Error::handler = sub { 'quiet warnings' };
 			exit 0;
 		}
 		waitpid $pid, 0;
@@ -463,7 +461,7 @@ sub commit_lib {
 	my (@revs) = @_;
 	my ($r_last, $cmt_last) = svn_grab_base_rev();
 	defined $r_last or die "Must have an existing revision to commit\n";
-	my $fetched = fetch_lib();
+	my $fetched = fetch();
 	if ($r_last != $fetched->{revision}) {
 		print STDERR "There are new revisions that were fetched ",
 				"and need to be merged (or acknowledged) ",
@@ -523,7 +521,7 @@ sub commit_lib {
 				$no = 1;
 			}
 		}
-		close $fh or croak $!;
+		close $fh or croak $?;
 		if (! defined $r_new && ! defined $cmt_new) {
 			unless ($no) {
 				die "Failed to parse revision information\n";
@@ -633,17 +631,8 @@ sub multi_init {
 sub multi_fetch {
 	# try to do trunk first, since branches/tags
 	# may be descended from it.
-	if (-d "$GIT_DIR/svn/trunk") {
-		print "Fetching trunk\n";
-		defined(my $pid = fork) or croak $!;
-		if (!$pid) {
-			$GIT_SVN = $ENV{GIT_SVN_ID} = 'trunk';
-			init_vars();
-			fetch(@_);
-			exit 0;
-		}
-		waitpid $pid, 0;
-		croak $? if $?;
+	if (-e "$GIT_DIR/svn/trunk/info/url") {
+		fetch_child_id('trunk', @_);
 	}
 	rec_fetch('', "$GIT_DIR/svn", @_);
 }
@@ -725,6 +714,41 @@ out:
 
 ########################### utility functions #########################
 
+sub fetch_child_id {
+	my $id = shift;
+	print "Fetching $id\n";
+	my $ref = "$GIT_DIR/refs/remotes/$id";
+	my $ca = file_to_s($ref) if (-r $ref);
+	defined(my $pid = fork) or croak $!;
+	if (!$pid) {
+		$GIT_SVN = $ENV{GIT_SVN_ID} = $id;
+		init_vars();
+		fetch(@_);
+		exit 0;
+	}
+	waitpid $pid, 0;
+	croak $? if $?;
+	return unless $_repack || -r $ref;
+
+	my $cb = file_to_s($ref);
+
+	defined($pid = open my $fh, '-|') or croak $!;
+	my $url = file_to_s("$GIT_DIR/svn/$id/info/url");
+	$url = qr/\Q$url\E/;
+	if (!$pid) {
+		exec qw/git-rev-list --pretty=raw/,
+				$ca ? "$ca..$cb" : $cb or croak $!;
+	}
+	while (<$fh>) {
+		if (/^    git-svn-id: $url\@\d+ [a-f0-9\-]+$/) {
+			check_repack();
+		} elsif (/^    git-svn-id: \S+\@\d+ [a-f0-9\-]+$/) {
+			last;
+		}
+	}
+	close $fh;
+}
+
 sub rec_fetch {
 	my ($pfx, $p, @args) = @_;
 	my @dir;
@@ -733,16 +757,7 @@ sub rec_fetch {
 			$pfx .= '/' if $pfx && $pfx !~ m!/$!;
 			my $id = $pfx . basename $_;
 			next if $id eq 'trunk';
-			print "Fetching $id\n";
-			defined(my $pid = fork) or croak $!;
-			if (!$pid) {
-				$GIT_SVN = $ENV{GIT_SVN_ID} = $id;
-				init_vars();
-				fetch(@args);
-				exit 0;
-			}
-			waitpid $pid, 0;
-			croak $? if $?;
+			fetch_child_id($id, @args);
 		} elsif (-d $_) {
 			push @dir, $_;
 		}
@@ -943,7 +958,6 @@ sub read_uuid {
 		$SVN_UUID = $info->{'Repository UUID'} or
 					croak "Repository UUID unreadable\n";
 	}
-	s_to_file($SVN_UUID,"$GIT_SVN_DIR/info/uuid");
 }
 
 sub quiet_run {
@@ -1107,7 +1121,7 @@ sub parse_diff_tree {
 			croak "Error parsing $_\n";
 		}
 	}
-	close $diff_fh or croak $!;
+	close $diff_fh or croak $?;
 
 	return \@mods;
 }
@@ -1348,7 +1362,7 @@ sub get_commit_message {
 				print $msg $_ or croak $!;
 			}
 		}
-		close $msg_fh or croak $!;
+		close $msg_fh or croak $?;
 	}
 	close $msg or croak $!;
 
@@ -1562,7 +1576,7 @@ sub svn_info {
 			push @{$ret->{-order}}, $1;
 		}
 	}
-	close $info_fh or croak $!;
+	close $info_fh or croak $?;
 	return $ret;
 }
 
@@ -1638,7 +1652,7 @@ sub do_update_index {
 		}
 		print $ui $x,"\0";
 	}
-	close $ui or croak $!;
+	close $ui or croak $?;
 }
 
 sub index_changes {
@@ -1765,11 +1779,15 @@ sub git_commit {
 
 	# this output is read via pipe, do not change:
 	print "r$log_msg->{revision} = $commit\n";
+	check_repack();
+	return $commit;
+}
+
+sub check_repack {
 	if ($_repack && (--$_repack_nr == 0)) {
 		$_repack_nr = $_repack;
 		sys("git repack $_repack_flags");
 	}
-	return $commit;
 }
 
 sub set_commit_env {
@@ -1877,6 +1895,10 @@ sub svn_cmd_checkout {
 }
 
 sub check_upgrade_needed {
+	if (!-r $REVDB) {
+		open my $fh, '>>',$REVDB or croak $!;
+		close $fh;
+	}
 	my $old = eval {
 		my $pid = open my $child, '-|';
 		defined $pid or croak $!;
@@ -2026,7 +2048,8 @@ sub migration_check {
 sub find_rev_before {
 	my ($r, $id, $eq_ok) = @_;
 	my $f = "$GIT_DIR/svn/$id/.rev_db";
-	# --$r unless $eq_ok;
+	return (undef,undef) unless -r $f;
+	--$r unless $eq_ok;
 	while ($r > 0) {
 		if (my $c = revdb_get($f, $r)) {
 			return ($r, $c);
@@ -2072,7 +2095,7 @@ sub set_default_vals {
 	if (defined $_repack) {
 		$_repack = 1000 if ($_repack <= 0);
 		$_repack_nr = $_repack;
-		$_repack_flags ||= '';
+		$_repack_flags ||= '-d';
 	}
 }
 
@@ -2352,7 +2375,7 @@ sub libsvn_get_file {
 	close $ho or croak $?;
 	$hash =~ /^$sha1$/o or die "not a sha1: $hash\n";
 	print $gui $mode,' ',$hash,"\t",$p,"\0" or croak $!;
-	close $fd or croak $!;
+	close $fd or croak $?;
 }
 
 sub libsvn_log_entry {
@@ -2381,7 +2404,7 @@ sub process_rm {
 		while (<$ls>) {
 			print $gui '0 ',0 x 40,"\t",$_ or croak $!;
 		}
-		close $ls or croak $!;
+		close $ls or croak $?;
 	} else {
 		print $gui '0 ',0 x 40,"\t",$f,"\0" or croak $!;
 	}
@@ -2411,7 +2434,7 @@ sub libsvn_fetch {
 		$pool->clear;
 	}
 	libsvn_get_file($gui, $_, $rev) foreach (@amr);
-	close $gui or croak $!;
+	close $gui or croak $?;
 	return libsvn_log_entry($rev, $author, $date, $msg, [$last_commit]);
 }
 
@@ -2514,36 +2537,30 @@ sub revisions_eq {
 }
 
 sub libsvn_find_parent_branch {
-	return undef; # XXX this function is disabled atm (not tested enough)
 	my ($paths, $rev, $author, $date, $msg) = @_;
 	my $svn_path = '/'.$SVN_PATH;
 
 	# look for a parent from another branch:
-	foreach (keys %$paths) {
-		next if ($_ ne $svn_path);
-		my $i = $paths->{$_};
-		my $branch_from = $i->copyfrom_path or next;
-		my $r = $i->copyfrom_rev;
-		print STDERR  "Found possible branch point: ",
-					"$branch_from => $svn_path, $r\n";
-		$branch_from =~ s#^/##;
-		my $l_map = read_url_paths();
-		my $url = $SVN->{url};
-		defined $l_map->{$url} or next;
-		my $id  = $l_map->{$url}->{$branch_from} or next;
-		my ($r0, $parent) = find_rev_before($r,$id,1);
-		if (defined $r0 && defined $parent &&
-					revisions_eq($branch_from, $r0, $r)) {
-			unlink $GIT_SVN_INDEX;
-			print STDERR "Found branch parent: $parent\n";
-			sys(qw/git-read-tree/, $parent);
-			return libsvn_fetch($parent, $paths, $rev,
-						$author, $date, $msg);
-		} else {
-			print STDERR
-				"Nope, branch point not imported or unknown\n";
-		}
-	}
+	my $i = $paths->{$svn_path} or return;
+	my $branch_from = $i->copyfrom_path or return;
+	my $r = $i->copyfrom_rev;
+	print STDERR  "Found possible branch point: ",
+				"$branch_from => $svn_path, $r\n";
+	$branch_from =~ s#^/##;
+	my $l_map = read_url_paths();
+	my $url = $SVN->{url};
+	defined $l_map->{$url} or return;
+	my $id = $l_map->{$url}->{$branch_from} or return;
+	my ($r0, $parent) = find_rev_before($r,$id,1);
+	return unless (defined $r0 && defined $parent);
+	if (revisions_eq($branch_from, $r0, $r)) {
+		unlink $GIT_SVN_INDEX;
+		print STDERR "Found branch parent: $parent\n";
+		sys(qw/git-read-tree/, $parent);
+		return libsvn_fetch($parent, $paths, $rev,
+					$author, $date, $msg);
+	}
+	print STDERR "Nope, branch point not imported or unknown\n";
 	return undef;
 }
 
@@ -2556,7 +2573,7 @@ sub libsvn_new_tree {
 	my $pool = SVN::Pool->new;
 	libsvn_traverse($gui, '', $SVN_PATH, $rev, $pool);
 	$pool->clear;
-	close $gui or croak $!;
+	close $gui or croak $?;
 	return libsvn_log_entry($rev, $author, $date, $msg);
 }
 
@@ -2630,7 +2647,7 @@ sub libsvn_commit_cb {
 			exit 1;
 		}
 	} else {
-		fetch_lib("$rev=$c");
+		fetch("$rev=$c");
 	}
 }
 
@@ -2664,7 +2681,6 @@ sub libsvn_skip_unknown_revs {
 	# 175002 - http(s)://
 	#   More codes may be discovered later...
 	if ($errno == 175002 || $errno == 160013) {
-		print STDERR "directory non-existent\n";
 		return;
 	}
 	croak "Error from SVN, ($errno): ", $err->expanded_message,"\n";
-- 
1.4.0

^ permalink raw reply related
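
The countdown that this patch factors out into check_repack() can be modelled as follows. This is a Python sketch of the same logic, not the Perl from the patch; the class name and the callback are hypothetical, and the real code runs "git repack" with the configured flags where the callback is invoked here.

```python
class RepackCounter:
    """Run a repack callback once every `interval` ticks."""

    def __init__(self, interval, run_repack):
        self.interval = interval
        self.remaining = interval
        self.run_repack = run_repack

    def tick(self):
        # Mirrors: if (--$_repack_nr == 0) { reset the counter; repack }
        self.remaining -= 1
        if self.remaining == 0:
            self.remaining = self.interval
            self.run_repack()
            return True
        return False


if __name__ == "__main__":
    counter = RepackCounter(3, lambda: print("would run: git repack -d"))
    # One counter is ticked for commits from every branch, so the interval
    # applies to the whole multi-fetch rather than restarting per branch.
    for branch in ("trunk", "branches/a"):
        for _ in range(3):
            counter.tick()
```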

* [PATCH 3/3] git-svn: Eliminate temp file usage in libsvn_get_file()
From: Eric Wong @ 2006-06-15 20:55 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Eric Wong
In-Reply-To: <11504049322660-git-send-email-normalperson@yhbt.net>

This means we'll have a loose object when we encounter a symlink
but that's not the common case.

We also don't have to worry about svn:eol-style when using the
SVN libraries, either.  So remove the code to deal with that.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 contrib/git-svn/git-svn.perl |   56 +++++++++++++++++-------------------------
 1 files changed, 23 insertions(+), 33 deletions(-)

diff --git a/contrib/git-svn/git-svn.perl b/contrib/git-svn/git-svn.perl
index 27f1d68..149149f 100755
--- a/contrib/git-svn/git-svn.perl
+++ b/contrib/git-svn/git-svn.perl
@@ -31,6 +31,7 @@ use File::Path qw/mkpath/;
 use Getopt::Long qw/:config gnu_getopt no_ignore_case auto_abbrev pass_through/;
 use File::Spec qw//;
 use POSIX qw/strftime/;
+use IPC::Open3;
 use Memoize;
 memoize('revisions_eq');
 
@@ -2335,47 +2336,36 @@ sub libsvn_get_file {
 	my $p = $f;
 	return unless ($p =~ s#^\Q$SVN_PATH\E/?##);
 
-	my $fd = IO::File->new_tmpfile or croak $!;
+	my ($hash, $pid, $in, $out);
 	my $pool = SVN::Pool->new;
-	my ($r, $props) = $SVN->get_file($f, $rev, $fd, $pool);
+	defined($pid = open3($in, $out, '>&STDERR',
+				qw/git-hash-object -w --stdin/)) or croak $!;
+	my ($r, $props) = $SVN->get_file($f, $rev, $in, $pool);
+	$in->flush == 0 or croak $!;
+	close $in or croak $!;
 	$pool->clear;
-	$fd->flush == 0 or croak $!;
-	seek $fd, 0, 0 or croak $!;
-	if (my $es = $props->{'svn:eol-style'}) {
-		my $new_fd = IO::File->new_tmpfile or croak $!;
-		eol_cp_fd($fd, $new_fd, $es);
-		close $fd or croak $!;
-		$fd = $new_fd;
-		seek $fd, 0, 0 or croak $!;
-		$fd->flush == 0 or croak $!;
-	}
-	my $mode = '100644';
-	if (exists $props->{'svn:executable'}) {
-		$mode = '100755';
-	}
+	chomp($hash = do { local $/; <$out> });
+	close $out or croak $!;
+	waitpid $pid, 0;
+	$hash =~ /^$sha1$/o or die "not a sha1: $hash\n";
+
+	my $mode = exists $props->{'svn:executable'} ? '100755' : '100644';
 	if (exists $props->{'svn:special'}) {
 		$mode = '120000';
-		local $/;
-		my $link = <$fd>;
+		my $link = `git-cat-file blob $hash`;
 		$link =~ s/^link // or die "svn:special file with contents: <",
 						$link, "> is not understood\n";
-		seek $fd, 0, 0 or croak $!;
-		truncate $fd, 0 or croak $!;
-		print $fd $link or croak $!;
-		seek $fd, 0, 0 or croak $!;
-		$fd->flush == 0 or croak $!;
-	}
-	my $pid = open my $ho, '-|';
-	defined $pid or croak $!;
-	if (!$pid) {
-		open STDIN, '<&', $fd or croak $!;
-		exec qw/git-hash-object -w --stdin/ or croak $!;
+		defined($pid = open3($in, $out, '>&STDERR',
+				qw/git-hash-object -w --stdin/)) or croak $!;
+		print $in $link;
+		$in->flush == 0 or croak $!;
+		close $in or croak $!;
+		chomp($hash = do { local $/; <$out> });
+		close $out or croak $!;
+		waitpid $pid, 0;
+		$hash =~ /^$sha1$/o or die "not a sha1: $hash\n";
 	}
-	chomp(my $hash = do { local $/; <$ho> });
-	close $ho or croak $?;
-	$hash =~ /^$sha1$/o or die "not a sha1: $hash\n";
 	print $gui $mode,' ',$hash,"\t",$p,"\0" or croak $!;
-	close $fd or croak $?;
 }
 
 sub libsvn_log_entry {
-- 
1.4.0

^ permalink raw reply related
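
For reference, the object name that "git-hash-object --stdin" prints for a blob is the SHA-1 of a short header followed by the content, which is why the data can be streamed to it without a temp file. The Python sketch below only computes that name and does not write an object (so it is not a replacement for the -w flag); the function names are hypothetical. The second helper models the "link " prefix that the patch strips from svn:special files before hashing the symlink target.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the blob object name: SHA-1 of b'blob <size>\\0' + content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def symlink_target(special: bytes) -> bytes:
    """Strip the leading 'link ' that SVN stores for svn:special symlinks."""
    prefix = b"link "
    if not special.startswith(prefix):
        raise ValueError("svn:special file is not understood")
    return special[len(prefix):]


if __name__ == "__main__":
    print(git_blob_sha1(b""))
    print(git_blob_sha1(symlink_target(b"link ../target")))
```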

* [PATCH 1/3] git-svn: avoid creating some small files
From: Eric Wong @ 2006-06-15 20:55 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Eric Wong

repo_path_split() is already pretty fast, and is already
optimized via caching.

We also don't need to create an exclude file if we're
relying on the SVN libraries.

Signed-off-by: Eric Wong <normalperson@yhbt.net>
---
 contrib/git-svn/git-svn.perl |   26 ++++++++------------------
 1 files changed, 8 insertions(+), 18 deletions(-)

diff --git a/contrib/git-svn/git-svn.perl b/contrib/git-svn/git-svn.perl
index 884969e..88af9c5 100755
--- a/contrib/git-svn/git-svn.perl
+++ b/contrib/git-svn/git-svn.perl
@@ -1005,12 +1005,6 @@ sub setup_git_svn {
 	close $fh;
 	s_to_file($SVN_URL,"$GIT_SVN_DIR/info/url");
 
-	open my $fd, '>>', "$GIT_SVN_DIR/info/exclude" or croak $!;
-	print $fd '.svn',"\n";
-	close $fd or croak $!;
-	my ($url, $path) = repo_path_split($SVN_URL);
-	s_to_file($url, "$GIT_SVN_DIR/info/repo_url");
-	s_to_file($path, "$GIT_SVN_DIR/info/repo_path");
 }
 
 sub assert_svn_wc_clean {
@@ -1649,6 +1643,12 @@ sub do_update_index {
 
 sub index_changes {
 	return if $_use_lib;
+
+	if (!-f "$GIT_SVN_DIR/info/exclude") {
+		open my $fd, '>>', "$GIT_SVN_DIR/info/exclude" or croak $!;
+		print $fd '.svn',"\n";
+		close $fd or croak $!;
+	}
 	my $no_text_base = shift;
 	do_update_index([qw/git-diff-files --name-only -z/],
 			'remove',
@@ -2018,9 +2018,6 @@ sub migration_check {
 		my $dn = dirname("$GIT_DIR/svn/$x");
 		mkpath([$dn]) unless -d $dn;
 		rename "$GIT_DIR/$x", "$GIT_DIR/svn/$x" or croak "$!: $x";
-		my ($url, $path) = repo_path_split($u);
-		s_to_file($url, "$GIT_DIR/svn/$x/info/repo_url");
-		s_to_file($path, "$GIT_DIR/svn/$x/info/repo_path");
 	}
 	migrate_revdb() if (-d $GIT_SVN_DIR && !-w $REVDB);
 	print "Done upgrading.\n";
@@ -2138,15 +2135,8 @@ sub write_grafts {
 sub read_url_paths {
 	my $l_map = {};
 	git_svn_each(sub { my $x = shift;
-			my $u = file_to_s("$GIT_DIR/svn/$x/info/repo_url");
-			my $p = file_to_s("$GIT_DIR/svn/$x/info/repo_path");
-			# we hate trailing slashes
-			if ($u =~ s#(?:^\/+|\/+$)##g) {
-				s_to_file($u,"$GIT_DIR/svn/$x/info/repo_url");
-			}
-			if ($p =~ s#(?:^\/+|\/+$)##g) {
-				s_to_file($p,"$GIT_DIR/svn/$x/info/repo_path");
-			}
+			my $url = file_to_s("$GIT_DIR/svn/$x/info/url");
+			my ($u, $p) = repo_path_split($url);
 			$l_map->{$u}->{$p} = $x;
 			});
 	return $l_map;
-- 
1.4.0

^ permalink raw reply related

* Re: observations on parsecvs testing
From: Sean @ 2006-06-15 20:47 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: keithp, git
In-Reply-To: <Pine.LNX.4.64.0606151529350.16002@localhost.localdomain>

On Thu, 15 Jun 2006 16:37:30 -0400 (EDT)
Nicolas Pitre <nico@cam.org> wrote:

> Also rcs2git() is very inefficient especially with files having many 
> revisions as it reconstructs the delta chain on every call.  For example 
> mozilla/configure,v has at least 1690 revisions, and actually converting 
> it into GIT blobs goes at a rate of 2.4 objects per second _only_ on my 
> machine.  Can't objects be created as the delta list is walked/applied 
> instead?  That would significantly reduce the initial conversion time.

Hi Nicolas,

That was a planned optimization which I did mention to Keith previously.
Was kinda waiting to hear back how it was working for him, and if there
was an interest to put more work into it to include in his mainline.

Sean

^ permalink raw reply

* Re: Autoconf/Automake
From: Johannes Schindelin @ 2006-06-15 20:42 UTC (permalink / raw)
  To: Phil Richards; +Cc: git
In-Reply-To: <20060615201000.600939E2BC@derisoft.derived-software.demon.co.uk>

Hi,

On Thu, 15 Jun 2006, Phil Richards wrote:

> On 2006-06-15, Alex Riesen <fork0@t-online.de> wrote:
>
> >  Git already has enough external dependencies (crypto, Python, Perl,
> >  bash, gmake), why create another one?
> > 
> >  If we are about to need a configuration system (and I doubt it),
> >  maybe we should at least select a system small enough to have it always
> >  in git repo? (yes, as linux kernel configuration system is)
> 
> Well, since Python is already a dependency, why not use a build system
> that has Python as its scripting/extension language?  It's also quite
> small, and it's called SCons.  I found it rather easy to learn
> when I was having a quick look around at alternative build systems.

Okay, let's face it. There are gazillions of make clones which "guarantee" 
to fix all shortcomings of make. None of them are even close to make 
(regarding developer exposure: take 3 developers, and 1 does not know 
make, and 2 do not know whatever-your-favourite-make-clone-is).

As for now, I fail to see why the current system is not adequate for git!

Ciao,
Dscho

^ permalink raw reply
