Git development

* Re: Security problem
From: Junio C Hamano @ 2006-06-16  0:12 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: git
In-Reply-To: <200606151709.22752.lan@academsoft.ru>

Alexander Litvinov <lan@academsoft.ru> writes:

> Why doesn't git-checkout check whether the file content matches the name of the object?

Good point.  We could do a few things:

 - entry.c:write_entry() could validate after read_sha1_file(). 

 - read_sha1_file() could do the checking; this has performance
   implications, though.

Cloning over git-aware protocols validates the objects coming
over the wire, so it may make sense to cheat and do the former,
so that we do not have to pay the validation cost every time we
access any object.

^ permalink raw reply

* Re: 2.6.17-rc6-mm2
From: Goo GGooo @ 2006-06-16  1:14 UTC (permalink / raw)
  To: linux-kernel, git
In-Reply-To: <ef5305790606142040r5912ce58kf9f889c3d61b2cc0@mail.gmail.com>

On 6/15/06, Goo GGooo <googgooo@gmail.com> wrote:
> Andrew Morton wrote:
>
> > - To fetch an -mm tree using git, use (for example)
> >
> >  git fetch git://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git
> > v2.6.16-rc2-mm1
>
> I'm not able to get -mm tree from GIT. In
> http://git.kernel.org/.../smurf/linux-trees.git/refs/tags/ I can see
> the most recent tags like v2.6.17-rc6-mm2 but cg-clone
> http://git.kernel.org/.../smurf/linux-trees.git gives me only
> 2.6.16-rc3 :(
>
> I tried "cg-fetch v2.6.17-rc6-mm2" which seemed to fetch some more
> tags, then played with git-checkout & friends but still can't get the
> most recent source tree.

All right, finally this worked out:
git pull rsync://git.kernel.org/pub/scm/linux/kernel/git/smurf/linux-trees.git \
      tag v2.6.17-rc6-mm2

Strangely enough, with http:// instead of rsync:// I got some message
about a nonexistent tag.

Now when I try git pull with http:// again it says the tree is up to
date. However with git:// it started downloading more things and tags.

That's confusing - I believed all protocols should behave the same way...?

Goo

^ permalink raw reply

* Re: Security problem
From: Linus Torvalds @ 2006-06-16  2:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alexander Litvinov, git
In-Reply-To: <7vbqsuc60q.fsf@assigned-by-dhcp.cox.net>

On Thu, 15 Jun 2006, Junio C Hamano wrote:
>
> Alexander Litvinov <lan@academsoft.ru> writes:
> 
> > Why doesn't git-checkout check whether the file content matches the name of the object?
> 
> Good point.  We could do a few things:

I missed the original mail. What's the problem?

If this is about the remote end lying about the SHA1 name, it's a total 
non-issue for any of the native protocols, since the native protocols 
don't actually send SHA1 names at all, they just send the data (and we 
re-create the SHA1 name on receipt).
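
To illustrate the point about names being re-created on receipt, here is a minimal Python sketch (not git's actual C code) of how an object name follows from the data alone. The helper name `git_object_name` is made up for this example; the `"<type> <size>\0"` header is what git hashes in front of the content.

```python
import hashlib


def git_object_name(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 name derived from an object's type and content."""
    # git hashes a small header ("<type> <size>" plus a NUL byte) followed by the raw content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The receiving side names whatever bytes actually arrived, so the sender
# has no way to attach a different name to them.
received = b"Hacked file\n"
print(git_object_name("blob", received))
```

Because the name is computed by the receiver, a tampered object simply shows up under a different name instead of masquerading as the original.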

So there's no way to have the name of an object not match its content, 
unless you have actual corruption (which is for git-fsck-objects to find, 
not something that should slow down any normal operation), or if you use 
one of the dumb protocols.

And if you use the dumb protocols, the data should probably be validated 
_there_ (by fetch(), rather than anywhere else). And for "rsync", you 
really don't have much choice apart from doing a full fsck, I suspect.

So I don't see the security issue, unless you don't trust the local 
filesystem, in which case nothing git can do matters at all..

		Linus

^ permalink raw reply

* Re: 2.6.17-rc6-mm2
From: Linus Torvalds @ 2006-06-16  2:46 UTC (permalink / raw)
  To: Goo GGooo; +Cc: linux-kernel, git
In-Reply-To: <ef5305790606151814i252c37c4mdd005f11f06ceac@mail.gmail.com>

On Fri, 16 Jun 2006, Goo GGooo wrote:
> 
> That's confusing - I believed all protocols should behave the same way...?

Not really. The primary protocol is the native git one, and the others try 
to do a best effort, but the http protocol really can't do a very good 
job unless the server side has run "git update-server-info" to help the 
http client along.

I suspect that the -mm git tree simply doesn't do that. In fact, even the 
main tree didn't use to do it, but I finally just broke down and added the 
proper hook to make it always do it automatically when I push.

(In case Andrew wants to do that, the way to do it is:

	printf '#!/bin/sh\nexec git-update-server-info\n' > hooks/post-update
	chmod +x hooks/post-update

inside the git repository - all it will do is always execute that script, 
and thus "git-update-server-info", after you've updated the repo).

Finally, the rsync protocol just copies all objects over, and since it 
doesn't even know _which_ objects it is getting, it doesn't do the normal 
tag following that the native git protocol does.

So to recap:
 - http is fundamentally weaker, and needs some server-side help to work
 - rsync is fine for the initial clone, but doesn't actually know what 
   it's doing, so the end result can actually even be a corrupted 
   repository, because you happened to rsync just as it was updating.
 - the native git protocol generally should be considered the gold 
   standard, where the other ones are just fallbacks in case of problems 
   (like firewalls that don't let git:// through, or more commonly hosted 
   servers that don't do the git protocol at all).

Which hopefully clarifies the issue a bit.

		Linus

^ permalink raw reply

* Re: Security problem
From: Linus Torvalds @ 2006-06-16  2:56 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Junio C Hamano, git
In-Reply-To: <200606160931.29553.lan@academsoft.ru>

On Fri, 16 Jun 2006, Alexander Litvinov wrote:
> 
> I have found the ability to hack git repo. After this hacking people will 
> checkout hacked files from the "trusted" commit. Only git-fsck-objects will 
> complain at this.

Right.

If you can't trust your local filesystem, you are screwed. 

git-fsck-objects will notice when somebody has done something bad, but 

> Why doesn't git-checkout check whether the file content matches the name of the object?

Why would it? It really just slows things down, and if you don't trust 
your local repo, people can "hack" you much more easily by just generating 
a _proper_ tree with the _proper_ data, and git checkout checking the SHA1 
wouldn't help at all.

The way to security lies in using git-fsck-objects, together with an 
_external_ source of trust. For example, that external source of trust may 
be a signed tag, or, perhaps even more simply, just by saving off the top 
commit name on some trusted medium.

But you do need a "point of trust" to start with. Without that, it's a lot 
easier to "hack" a git repo by doing

	echo 'Hacked file' > a
	git commit --amend a
	git prune

and now the file "a" has changed to "Hacked file", and even 
git-fsck-objects can't tell that anything bad happened.

(Btw, if you want to _hide_ the fact that "a" now contains "Hacked file", 
you do so by faking it in the index. You can have the checked-out copy say 
what it should say - ie "Usual file" - and if you don't want git to show 
you the difference to HEAD, you edit the .git/index file by hand so that 
the timestamp, size and inode matches the real SHA1, even though the 
_contents_ match "Usual file").

See?

You do need to trust something. Normally you'd trust your own filesystem, 
but git certainly supports other forms of trust through either the native 
support for signed certificates in the form of tags, or any other form of 
external trust.

			Linus

^ permalink raw reply

* Re: Security problem
From: Alexander Litvinov @ 2006-06-16  3:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0606151948230.5498@g5.osdl.org>

> If you can't trust your local filesystem, you are screwed.

You are right, I trust my file system. But if our team had a central repo with 
ssh access to that machine, every developer could hack the central repo.

Would git-clone/git-fetch warn me about this?

My own test with (another) local repo says:
lan@lan:~/tmp/git/test> git clone 1 2
Generating pack...
Done counting 3 objects.
Deltifying 3 objects.
 100% (3/3) done
Total 3, written 3 (delta 0), reused 0 (delta 0)
error: git-checkout-index: unable to read sha1 file of a (3609f20ebd357679b111783e8afaf36ec46427f3)

It can't checkout object (3609f20ebd357679b111783e8afaf36ec46427f3 is the 
original file). It seems packed repos are safe from this point.

^ permalink raw reply

* Re: Security problem
From: Linus Torvalds @ 2006-06-16  5:00 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Junio C Hamano, git
In-Reply-To: <200606161054.46813.lan@academsoft.ru>

On Fri, 16 Jun 2006, Alexander Litvinov wrote:
>
> You are right, I trust my file system. But if our team had a central repo with 
> ssh access to that machine, every developer could hack the central repo.
> 
> Would git-clone/git-fetch warn me about this?

Using the native protocol, yes. Using rsync, unless you explicitly fsck 
the result, no.

> It can't checkout object (3609f20ebd357679b111783e8afaf36ec46427f3 is the 
> original file). It seems packed repos are safe from this point.

Well, they may not be "safe" - you just need to work a _lot_ harder to 
corrupt a pack-file in any interesting manner. And again, git-fsck-objects 
would pick up any such thing going on.

Anyway, what it boils down to is that anybody who has write access to a 
particular repository can certainly change the repo in "interesting" ways. 

However, there are various inherent safety valves in place that make it 
really hard to corrupt on a bigger scale.

The first is that git-fsck-objects will definitely find any repository 
inconsistency, and to get around that, you either have to get around the 
basic properties of SHA-1 (ie break the hash) _or_ you have to actually 
change the repository so that it's still a valid repo, just with different 
content.

So let's take a look at those two cases:

 - if you corrupt the repository, subsequent clones (or even pulls) from 
   the corrupt repository simply won't work if you use the native 
   protocol, because the native protocol doesn't actually trust anything 
   but the actual contents (so if the contents won't match, then neither 
   will the SHA1 names). So the corruption is pretty strictly limited to 
   the _one_ repository that the attacker had write access to.

   So there's a pretty fundamental "corruption containment" part there.

   (Side note: there's no question that we might well be able to do 
   better. A _malicious_ server could actually send a corrupt pack, and 
   it's possible that a properly corrupted remote archive could cause even 
   a "good" git-send-pack to just silently send a corrupt pack, so that 
   you'd need to use "git-fsck-objects" on the receiving side to notice 
   that you are missing objects, for example)

 - if the repository is good (ie fsck is fine), then obviously a "git 
   pull" will also succeed. However, you can't _hide_ the data the way you 
   tried to do: when the receiver checks out the most recent version, it 
   will definitely use the data in the object, there's no way to get the 
   server to serve different data in objects and in the working tree 
   (because the server literally doesn't even send the working tree at 
   all).

   So you can always convince somebody to pull from an "evil repository", 
   and that's no different from committing a bug by mistake. But at least 
   you can't try to hide the bug just in the object store and have it not 
   show up in diffs and in checked-out copies.

The latter case is true even with http and rsync, the actual pull event 
always pulls just the database, never any checked-out state (in fact, 
the common case is obviously to pull from a bare repository that doesn't 
even _have_ checked-out state). So you can't hide things in the index or 
in the checked-out state except in the filesystem that you have direct 
write access to.

But yeah, I actually still personally do a fair number of 
"git-fsck-objects". I've never found anything that way since very early on 
(and back then, the real problem was rsync getting objects that weren't 
reachable), but I still do it. It makes me feel happier.

Of course, bugs always happen. But I can pretty much guarantee that git is 
fundamentally harder to corrupt than most things. We've had git-fsck-cache 
since April 8th last year (or, put another way, literally since "Day 2" in 
git terms - it's the eighth commit in the whole git history).

Git also has an almost total lack of redundant information. There's 
basically no "duplicate" information in the repository format itself where 
you could hide something so that it wouldn't be noticed.

In a checked-out project, the checked-out state itself is "duplicate 
information" (and that was where your "attack" tried to hide things), and 
there's the index (which is actually a much better and subtle place to 
hide things ;). But neither of them have any life outside of that 
particular repository.

			Linus

^ permalink raw reply

* Re: Security problem
From: Alexander Litvinov @ 2006-06-16  5:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0606152137410.5498@g5.osdl.org>

> Well, they may not be "safe" - you just need to work a _lot_ harder to
> corrupt a pack-file in any interesting manner. And again, git-fsck-objects
> would pick up any such thing going on.
As shown in pack-objects.c, each object has a stored sha1, almost the same 
as a file rename.

> The first is that git-fsck-objects will definitely find any repository
> inconsistency, and to get around that, you either have to get around the
> basic properties of SHA-1 (ie break the hash) _or_ you have to actually
> change the repository so that it's still a valid repo, just with different
> content.
I still believe SHA-1 is good enough to hash files - I have not heard of anyone 
generating a reasonable duplicate that can compile and work :-)

>  - if you corrupt the repository, subsequent clones (or even pulls) from
>    the corrupt repository simply won't work if you use the native
>    protocol, because the native protocol doesn't actually trust anything
>    but the actual contents (so if the contents won't match, then neither
>    will the SHA1 names). So the corruption is pretty strictly limited to
>    the _one_ repository that the attacker had write access to.
As I understand it, the sent pack file will contain the actual SHA-1 of the objects. And any 
hack will be clearly visible.

>    So there's a pretty fundamental "corruption containment" part there.
...
The situation with an evil repo is clear to me: you can trust only a trusted commit 
identified by SHA-1.

> But yeah, I actually still personally do a fair number of
> "git-fsck-objects". I've never found anything that way since very early on
> (and back then, the real problem was rsync getting objects that weren't
> reachable), but I still do it. It makes me feel happier.
As a result: always fsck the repo after pull/clone!

^ permalink raw reply

* Re: 2.6.17-rc6-mm2
From: Goo GGooo @ 2006-06-16  5:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, git
In-Reply-To: <Pine.LNX.4.64.0606151937360.5498@g5.osdl.org>

On 6/16/06, Linus Torvalds <torvalds@osdl.org> wrote:

> So to recap:
>  - http is fundamentally weaker, and needs some server-side help to work
>  - rsync is fine for the initial clone, but doesn't actually know what
>    it's doing, so the end result can actually even be a corrupted
>    repository, because you happened to rsync just as it was updating.
>  - the native git protocol generally should be considered the golden
>    standard, where the other ones are just fallbacks in case of problems
>    (like firewalls that don't let git:// through, or more commonly hosted
>    servers that don't do the git protocol at all).
>
> Which hopefully clarifies the issue a bit.

Thanks for the explanation. Unfortunately I can't use git:// with "git
pull" (at least in git-1.3.2). First it does some traffic, that
suddenly stops - I guess the server starts doing *something*, perhaps
preparing the update for me or whatnot. After a pretty long while it
sends some more data but in the meanwhile my ADSL router dropped the
NAT entry and git sits on my side waiting for data forever. Recently I
tried the same on a system with direct Inet connection and that worked
just fine.

I suggest adding SO_KEEPALIVE option on the git socket.
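
As a rough illustration of what this would mean at the socket level, here is a minimal Python sketch (git itself is written in C, and this is not its code). The function name `make_keepalive_socket` and the 60-second idle value are assumptions made for the example; `TCP_KEEPIDLE` is Linux-specific, so it is only set when the platform exposes it.

```python
import socket


def make_keepalive_socket(idle_seconds: int = 60) -> socket.socket:
    """Create a TCP socket with keepalive probes enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Ask the kernel to send keepalive probes on an otherwise idle connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Optionally shorten the idle time before the first probe (Linux only).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)
    return sock


sock = make_keepalive_socket()
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Note that SO_KEEPALIVE alone may not be enough here: the default idle time before the first probe on Linux is two hours, which is probably longer than a typical NAT timeout, so the idle time would likely have to be tuned as well.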

Goo

^ permalink raw reply

* Re: Security problem
From: Linus Torvalds @ 2006-06-16  6:27 UTC (permalink / raw)
  To: Alexander Litvinov; +Cc: Junio C Hamano, git
In-Reply-To: <200606161237.21997.lan@academsoft.ru>

On Fri, 16 Jun 2006, Alexander Litvinov wrote:
>
> > Well, they may not be "safe" - you just need to work a _lot_ harder to
> > corrupt a pack-file in any interesting manner. And again, git-fsck-objects
> > would pick up any such thing going on.
>
> As shown in pack-objects.c, each object has a stored sha1, almost the same 
> as a file rename.

Yes and no.

The index file has the stored sha1 (and in that sense you can do almost 
the same thing as a file rename by just modifying the index file).

But when we actually transfer a pack over from one place to another (ie a 
clone or a push), we don't even transfer the index file. Instead, the 
index file gets re-generated at the other end.

That's pretty much an on-going theme in most of git - trying to avoid 
having metadata, if that can instead be calculated directly.

So again, a "rsync" or a "http" thing that just gets the index and 
pack-files directly _as_files_, will actually also download a corrupt 
file. The git native protocol is much harder to fool.

git-fsck-objects actually verifies the pack-files and index files in 
several ways:

 - both the pack-file and the index-file actually contain a SHA1 checksum 
   of themselves, so any accidental corruption will be picked up (but if 
   somebody is able to get at the filesystem, they can obviously 
   re-calculate the SHA1 and update the checksum too)

 - the index file also contains the SHA-1 of the pack-file (and that is 
   then part of the checksum of the index file), again to avoid accidental 
   corruption or mixing of index and pack-files.

 - fsck checks all of these internal SHA-1 checksums, and verifies basic 
   information (ie number of objects must match etc)

 - each object in the index file is unpacked, and its SHA-1 is 
   re-calculated and checked against what the index file claimed.

So exactly as with individual objects, the pack-files are actually 
verified, and on (native-mode) transfer, the names of individual files are 
never actually transferred, rather they are re-calculated from the raw 
contents at the receiving end.
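
The "re-calculate and compare" step can be sketched for the simpler case of loose objects. This is a minimal Python illustration under stated assumptions, not what git-fsck-objects actually runs: the function name `check_loose_objects` is hypothetical, pack-files are ignored, and every loose object is assumed to be a zlib-compressed `"<type> <size>\0<content>"` stream stored under `objects/<first two hex digits>/<remaining hex digits>`.

```python
import hashlib
import os
import zlib


def check_loose_objects(objects_dir: str) -> list:
    """Return the names of loose objects whose data does not hash to their name."""
    bad = []
    for fan in sorted(os.listdir(objects_dir)):
        sub = os.path.join(objects_dir, fan)
        # Loose objects live in two-character fan-out directories;
        # skip everything else (for example "pack" and "info").
        if len(fan) != 2 or not os.path.isdir(sub):
            continue
        for rest in sorted(os.listdir(sub)):
            with open(os.path.join(sub, rest), "rb") as handle:
                raw = handle.read()
            try:
                data = zlib.decompress(raw)
            except zlib.error:
                bad.append(fan + rest)  # not even a valid zlib stream
                continue
            # The name must equal the SHA-1 of the decompressed header plus content.
            if hashlib.sha1(data).hexdigest() != fan + rest:
                bad.append(fan + rest)
    return bad
```

Run against a repository's `.git/objects` directory, this would flag an object file that had been swapped for different content under the old name, which is the kind of tampering discussed earlier in the thread.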

The pack-files then have a few additional sanity-checks of their own that 
should help pinpoint at least the accidental kind of corruption.

But no, the SHA1 checksums of the pack-files are not checked by normal 
operations. That would be deadly - trying to check the SHA1 hash of a 
pack-file obviously would involve reading it all in, something normal 
operations actually try to avoid (normal ops use the index exactly in 
order to only read the parts they need).

Perhaps most importantly, after fsck has checked the SHA-1's of each 
individual object, it will also do a full reachability check. That, in 
many ways, is even more important than checking that each object name 
matches its contents (ie there's no missing history either, and the 
"tips" of the repository end up basically validating all the rest).

So again, the thing is set up so that doing a full fsck actually does a 
_lot_ of integrity checking.

But in the absence of explicit fsck, we do trust the data, even if the 
actual _transfer_ of data will recalculate SHA-1's.

> >  - if you corrupt the repository, subsequent clones (or even pulls) from
> >    the corrupt repository simply won't work if you use the native
> >    protocol, because the native protocol doesn't actually trust anything
> >    but the actual contents (so if the contents won't match, then neither
> >    will the SHA1 names). So the corruption is pretty strictly limited to
> >    the _one_ repository that the attacker had write access to.
>
> As I understand it, the sent pack file will contain the actual SHA-1 of the objects. And any 
> hack will be clearly visible.

No, as mentioned, the actual SHA-1's won't ever be sent, so what happens 
is that if the repository on the sending side was hacked, the _sending_ 
side may never even realize it (since it's not necessarily checking the 
SHA-1's), but the receiving side will only ever see the raw data, and as 
such, it won't ever even _see_ the "false hidden names", because it will 
generate a whole new index that purely depends on the data.

And maybe that's exactly what you meant - yes, the hack will be clearly 
visible, because the names will now be the "real" ones. You can't hide 
things by using a false name.

> >    So there's a pretty fundamental "corruption containment" part there.
> ...
> The situation with an evil repo is clear to me: you can trust only a trusted commit 
> identified by SHA-1.

Yes. Exactly.

And once you have a reason to trust a commit, everything you can reach 
from that commit is also trustworthy, assuming it passes fsck. IOW, you 
only really need to trust the head(s) in your repository.
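
Why trusting one head is enough can be shown with a toy content-addressed store. This is a minimal Python sketch, not git's object format: the names `name_of`, `put` and `verify_from` are made up, and a single "node" type (whose body lists the names of its children) stands in for git's commits and trees.

```python
import hashlib


def name_of(kind: str, body: bytes) -> str:
    """Name an object by hashing its kind, size and body."""
    return hashlib.sha1(f"{kind} {len(body)}".encode("ascii") + b"\0" + body).hexdigest()


def put(store: dict, kind: str, body: bytes) -> str:
    """Store an object under its computed name and return that name."""
    name = name_of(kind, body)
    store[name] = (kind, body)
    return name


def verify_from(store: dict, head: str) -> bool:
    """Check that head and everything reachable from it hash to their names."""
    stack, seen = [head], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current not in store:
            return False  # missing object: history is incomplete
        kind, body = store[current]
        if name_of(kind, body) != current:
            return False  # content no longer matches its name
        if kind == "node":
            stack.extend(body.decode("ascii").split())
    return True


store = {}
blob = put(store, "blob", b"Usual file\n")
head = put(store, "node", blob.encode("ascii"))
print(verify_from(store, head))            # True
store[blob] = ("blob", b"Hacked file\n")   # tamper, keeping the old name
print(verify_from(store, head))            # False
```

Since each node's name covers the names of its children, a trusted head name transitively pins down every object reachable from it.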

> > But yeah, I actually still personally do a fair number of
> > "git-fsck-objects". I've never found anything that way since very early on
> > (and back then, the real problem was rsync getting objects that weren't
> > reachable), but I still do it. It makes me feel happier.
>
> As a result: always fsck the repo after pull/clone!

Well, even better, try to avoid pulling from untrusted sources in the 
first place ;)

But yes, fsck is actually fairly fast if you do incremental pulls and 
repack your repository. To help you do this, there's two modes to fsck: 
there's the "full mode", which goes through _everything_, including 
pack-files, and there's the "fsck only loose objects", which is the common 
one.

So for example, let's say that you only ever repack your repository 
locally when it's been "known good" (in fact, repacking in itself will 
generally find almost all of the problems that fsck can find, since a full 
repack will obviously do the reachability analysis as part of just the 
preparatory work). That means that you only ever need to do the quick 
default "light fsck" after a pull, since an incremental pull (with the 
native protocol) will have unpacked all the pulled objects.

So "fsck after each pull" is not something we do by default, but if you 
keep your repo fairly packed, doing so manually (or by just scripting 
things) won't even really slow you down, because it will only ever need to 
check incrementally - the stuff you've re-packed it doesn't need to check 
(assuming you can now trust your local filesystem).

So git certainly gives you the option to be really anal, and doesn't even 
make it needlessly hard or expensive, even with large repositories.

			Linus

^ permalink raw reply

* Re: 2.6.17-rc6-mm2
From: Linus Torvalds @ 2006-06-16  6:39 UTC (permalink / raw)
  To: Goo GGooo; +Cc: linux-kernel, git
In-Reply-To: <ef5305790606152249n2702873fy7b708d9c47c78470@mail.gmail.com>

On Fri, 16 Jun 2006, Goo GGooo wrote:
> 
> Thanks for the explanation. Unfortunately I can't use git:// with "git
> pull" (at least in git-1.3.2). First it does some traffic, that
> suddenly stops - I guess the server starts doing *something*, perhaps
> preparing the update for me or whatnot.

Yeah, for a big pull, the server will have to think about the objects it 
is going to send you.

> I suggest adding SO_KEEPALIVE option on the git socket.

Actually, the really irritating thing is that we actually generate all 
these nice status updates, which just makes pulling and cloning a lot more 
comfortable, because you actually see what is going on, and what to 
expect. 

Except they only work over ssh, where we have a separate channel (for 
stderr), and with the native git protocol all that nice status work just 
gets flushed to /dev/null :(

Dang. It's literally the most irritating part of the thing: the protocol 
itself is exactly the same whether you go over ssh:// or over git://, but 
that visual information about what is going on is missing, and it's 
surprisingly important from a usability standpoint.

And in your case, the usability downside actually turned into a real 
accessibility bug.

Oh, well.

		Linus

^ permalink raw reply

* Re: Autoconf/Automake
From: Nikolai Weibull @ 2006-06-16  6:51 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Linus Torvalds, Yann Dirson, Alex Riesen, Pavel Roskin, git
In-Reply-To: <Pine.LNX.4.63.0606160105100.7480@wbgn013.biozentrum.uni-wuerzburg.de>

On 6/16/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:

> In a project I am stuck in, maven is used. It tries -- of all things -- to
> fix a few shortcomings of ant -- which was supposed to fix shortcomings of
> make! And let's face it. Maven is complicated, slow as a dog lacking all
> four feet, and it still does not do the things I can do in three lines
> with make. It's a complete disaster.

But...it uses XML...how can it not be a panacea?

  nikolai

^ permalink raw reply

* Re: Security problem
From: Alexander Litvinov @ 2006-06-16  8:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0606152300460.5498@g5.osdl.org>

> So git certainly gives you the option to be really anal, and doesn't even
> make it needlessly hard or expensive, even with large repositories.

Thanks for detailed description. Now I can sleep without any worry about my 
repo :-)

^ permalink raw reply

* Re: Autoconf/Automake
From: Jerome Lovy @ 2006-06-16  9:06 UTC (permalink / raw)
  To: git
In-Reply-To: <20060615174833.GA32247@dspnet.fr.eu.org>

Olivier Galibert wrote:
> On Thu, Jun 15, 2006 at 10:02:10AM -0700, Linus Torvalds wrote:
> 
>>These days, there aren't fifteen different versions of UNIX. There's a 
>>couple, and it's perfectly ok to actually say "fix your damn system and 
>>just install GNU make". It's easier to install GNU make than it is to 
>>install autoconf/automake.
> 
> 
> You should be careful to separate autoconf and automake.  Autoconf is
> not so bad, and you can make clean, maintainable Makefile.in and
> config.h.in files with it, because it uses simple substitution.  It is
> quite useful to detect available libraries when some are optional,
> and also to lightly[1] ensure that prefix and friends will stay the
> same between make and make install.  Also, especially if you hack a
> little bit to alias 'enable' and 'with', you get a sane interface to
> optional feature selection.  Oh, and to separate compilation
> directories too (vpath generation).

I fully agree with Olivier. It seems to me that you don't have to buy 
the whole autoconf/automake/libtool stack to leverage the autoconf 
functionality. autoconf alone provides the full "autoconfiguration" 
framework (running scriptlets and setting substitution variables 
accordingly). You still have to write Makefile.in (with statements 
looking like: CC=@CC@). Therefore the resulting Makefile is just as 
beautiful or as ugly as you wrote the initial Makefile.in: you have full 
control over it.
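
As a toy model of the substitution step described above, here is a minimal Python sketch. It is not how autoconf is implemented (the generated configure script is a shell script); the function name `substitute` and the example values are assumptions, and leaving unknown `@NAME@` tokens untouched is a choice made for this sketch.

```python
import re


def substitute(template: str, values: dict) -> str:
    """Replace @NAME@ tokens in a Makefile.in-style template."""
    def replace(match):
        # Tokens with no known value are left as they are.
        return values.get(match.group(1), match.group(0))

    return re.sub(r"@([A-Za-z_][A-Za-z0-9_]*)@", replace, template)


makefile_in = "CC=@CC@\nprefix=@prefix@\n"
print(substitute(makefile_in, {"CC": "gcc", "prefix": "/usr/local"}))
```

Everything outside the `@NAME@` tokens passes through unchanged, which is why the resulting Makefile stays exactly as readable as the Makefile.in it came from.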

As for dependencies, one shouldn't confuse what is needed on the 
autoconfiguration developer's side (in order to build the configure 
script from the configure.in file) and what is needed on the installer's 
side to run the configure script and process the generated makefile. The 
former needs the autoconf package which itself relies on GNU m4. The 
latter merely needs a decently compatible Bourne shell and a decently 
compatible make.

On the other hand, what you get with automake is a fully automatically 
generated makefile, with make targets conforming to the GNU standards. 
But then you fully lose control over the Makefile: you don't write the 
Makefile.in anymore (automake does it for you) but rather the terse 
Makefile.am. In this respect, automake is like imake: you write few 
lines of (i)makefile, but then you cannot complain if you don't 
understand what comes in the generated makefile ;-) .

Jérôme Lovy

^ permalink raw reply

* Re: [BUG] stgit branch renaming into new dir crashes
From: Catalin Marinas @ 2006-06-16 12:06 UTC (permalink / raw)
  To: Yann Dirson; +Cc: GIT list
In-Reply-To: <20060613214053.GD7766@nowhere.earth>

On 13/06/06, Yann Dirson <ydirson@altern.org> wrote:
> When trying to rename a branch to a name including a slash, there is
> no explicit creation of leading dirs, and stgit crashes:
>
> $ stg branch -r multitag dev/multitag
> Traceback (most recent call last):
[...]

What version of StGIT are you using? It seems to be OK with 0.10.

-- 
Catalin

^ permalink raw reply

* Re: 2.6.17-rc6-mm2
From: Uwe Zeisberger @ 2006-06-16 12:40 UTC (permalink / raw)
  To: git
In-Reply-To: <ef5305790606152249n2702873fy7b708d9c47c78470@mail.gmail.com>

Hello,

> I suggest adding SO_KEEPALIVE option on the git socket.
I suggest doing this "manually", that is, sending a dummy (or status)
packet every x seconds.  Then the server could detect if a cloning
client disconnected and stop generating the pack file.

(Currently I see from time to time a git server process (IIRC
git-pack-objects) that creates a packfile and only when it's done fails
to send it.)

Best regards
Uwe

-- 
Uwe Zeisberger

http://www.google.com/search?q=30+hours+and+4+days+in+seconds

^ permalink raw reply

* Cygwin git and windows network shares
From: Niklas Frykholm @ 2006-06-16 12:58 UTC (permalink / raw)
  To: git

I'm trying to use cygwin git (compiled from the 1.4.0 tarball) to create a 
repository on a windows network share, but I get an error message.

    $ cd //computer/git/project
    $ git init-db
    defaulting to local storage area
    Could not rename the lock file?

The repository seems to be left in an inconsistent state after this:

    $ git clone //computer/git/project/
    fatal: no matching remote head
    fetch-pack from '//computer/git/project/.git' failed.

When working only with local files, I do not get these errors. Does 
anyone know the cause of this error/any way around it?

// Niklas

^ permalink raw reply

* Re: Cygwin git and windows network shares
From: Juergen Ruehle @ 2006-06-16 14:24 UTC (permalink / raw)
  To: Niklas Frykholm; +Cc: git
In-Reply-To: <4492AAFA.20807@grin.se>

Niklas Frykholm writes:
 > I'm trying to use cygwin git (compiled from the 1.4.0 tarball) to create 
 > repository on a windows network share, but I get an error message.
 > 
 >     $ cd //computer/git/project
 >     $ git init-db
 >     defaulting to local storage area
 >     Could not rename the lock file?

cygwin's rename seems to be capable of overwriting an existing target
only on NTFS. The following hack is a workaround, but is probably not
safe.

diff --git a/lockfile.c b/lockfile.c
index 2346e0e..5e78211 100644
--- a/lockfile.c
+++ b/lockfile.c
@@ -48,6 +48,7 @@ int commit_lock_file(struct lock_file *l
 	strcpy(result_file, lk->filename);
 	i = strlen(result_file) - 5; /* .lock */
 	result_file[i] = 0;
+	unlink(result_file);
 	i = rename(lk->filename, result_file);
 	lk->filename[0] = 0;
 	return i;
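
If the unlink() is needed at all, it could at least be limited to the
case where the plain rename() has already failed, so that filesystems
with a working overwrite keep the atomic behaviour. A sketch only:
rename_overwrite() is a made-up helper, and EEXIST/EACCES as the error
codes seen on the share is an unverified assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Move src over dst. Try a plain rename() first; only if that fails
 * with EEXIST or EACCES (assumed failure codes, not verified on cygwin)
 * fall back to unlink() + rename(). The fallback still leaves a short
 * window in which dst does not exist.
 * Hypothetical helper, not part of lockfile.c.
 */
int rename_overwrite(const char *src, const char *dst)
{
	assert(src && dst);

	if (!rename(src, dst))
		return 0;
	if (errno != EEXIST && errno != EACCES)
		return -1;
	if (unlink(dst) < 0 && errno != ENOENT)
		return -1;
	return rename(src, dst);
}
```

In commit_lock_file() this would stand in for the unlink()/rename()
pair, e.g. i = rename_overwrite(lk->filename, result_file);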

^ permalink raw reply related

* Why so much time in the kernel?
From: Jon Smirl @ 2006-06-16 14:49 UTC (permalink / raw)
  To: git

I'm still working on importing Mozilla CVS. I'm at the phase now where
all of the changesets have been identified. The scripts are pulling the
changesets one at a time out of CVS and putting them into git. I've
been running this phase for 2 days now on a 3GB machine and it still
isn't finished.

I am spending over 40% of the time in the kernel. This looks to be
caused by forks and starting small tasks, is that the correct
interpretation? Is the number of processes that have been run recorded
anywhere? 1.4% of the time is spent in the dynamic linker.

Checking with oprofile I see this:

  18262372 41.0441 /home/good/vmlinux
  5465741 12.2841 /usr/bin/cvs
  4374336  9.8312 /lib/libc-2.4.so
  3627709  8.1532 /lib/libcrypto.so.0.9.8a
  2494610  5.6066 /usr/bin/oprofiled
  2471238  5.5540 /usr/lib/libz.so.1.2.3
   945349  2.1246 /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
   933646  2.0983 /usr/local/bin/git-read-tree
   758776  1.7053 /usr/local/bin/git-write-tree
   642502  1.4440 /lib/ld-2.4.so
   472903  1.0628 /nvidia
   379254  0.8524 /usr/local/bin/git-pack-objects

and breaking down the kernel number:

3467889  18.9893  copy_page_range
2190416  11.9941  unmap_vmas
1156011   6.3300  page_fault
887794    4.8613  release_pages
860853    4.7138  page_remove_rmap
633243    3.4675  get_page_from_freelist
398773    2.1836  do_wp_page
344422    1.8860  __mutex_lock_slowpath
280070    1.5336  __handle_mm_fault
241713    1.3236  do_page_fault
238398    1.3054  __d_lookup
236654    1.2959  vm_normal_page

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: Why so much time in the kernel?
From: Linus Torvalds @ 2006-06-16 15:06 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git
In-Reply-To: <9e4733910606160749t4d7a541ev72a67383e96d86da@mail.gmail.com>

On Fri, 16 Jun 2006, Jon Smirl wrote:
> 
> I am spending over 40% of the time in the kernel. This looks to be
> caused by forks and starting small tasks, is that the correct
> interpretation?

Yes. Your kernel profile is all for stuff related to setting up and 
tearing down process space (well, __mutex_lock_slowpath at 1.88% and 
__d_lookup at 1.3% is not, but every single one before that does seem to 
be about fork/exec/exit).

I think it's both the CVS server that continually forks/exits (it doesn't
actually do an exec at all - it seems to be using fork/exit as a way to
control its memory usage - knowing that the OS will free all the temporary
memory on exit - I think the newer CVS development trees don't do this,
but that also seems to be why they leak memory like mad and eventually run
out ;).

AND it's git-cvsimport forking and exec'ing git helper processes. 

So that process overhead is expected.
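
To get a feel for that overhead on a given machine, the fork/exit cycle
can be timed in isolation. A rough sketch; time_forks() is a made-up
name, and the children here are far smaller than a real cvs or git
process, so treat the result as a lower bound rather than a measurement
of the import:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Fork n children that exit immediately and wait for each one.
 * Returns elapsed wall-clock seconds, or -1.0 if fork/waitpid fails.
 * Most of the measured time is address-space setup and teardown,
 * i.e. the kind of work the kernel profile attributes to fork/exit.
 */
double time_forks(int n)
{
	struct timespec t0, t1;
	int i;

	assert(n >= 0);
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < n; i++) {
		pid_t pid = fork();
		int status;

		if (pid < 0)
			return -1.0;
		if (!pid)
			_exit(0);	/* child: do nothing, just exit */
		if (waitpid(pid, &status, 0) != pid)
			return -1.0;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Dividing time_forks(n) by n gives an approximate per-process cost that
can be compared against the number of helpers the import starts.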

What I would _not_ have expected is:

>   933646  2.0983 /usr/local/bin/git-read-tree

I don't see why git-read-tree is so hot for you. We should never need to 
read a tree when we're importing something, unless there are tons of 
branches and we switch back and forth between them.

I guess mozilla really does use a fair number of branches? 

Martin sent out a patch (that I don't think has been merged yet) to avoid 
the git-read-tree overhead when switching branches. Look for an email with 
a subject like "cvsimport: keep one index per branch during import", I 
suspect that would speed up the git part a lot.

(It will also avoid a few fork/exec's, but you'll still have most of them, 
so I don't think you'll see any really _fundamental_ changes to this, but 
the git-read-tree overhead should be basically gone, and some of the 
libz.so pressure would also be gone with it. It should also avoid 
rewriting the index file, so you'd get lower disk pressure, but it looks 
like none of your problems are really due to IO, so again, that probably 
won't make much of a difference for you).

			Linus

^ permalink raw reply

* Re: Why so much time in the kernel?
From: Jon Smirl @ 2006-06-16 15:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0606160755170.5498@g5.osdl.org>

On 6/16/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> On Fri, 16 Jun 2006, Jon Smirl wrote:
> >
> > I am spending over 40% of the time in the kernel. This looks to be
> > caused by forks and starting small tasks, is that the correct
> > interpretation?
>
> Yes. Your kernel profile is all for stuff related to setting up and
> tearing down process space (well, __mutex_lock_slowpath at 1.88% and
> __d_lookup at 1.3% is not, but every single one before that does seem to
> be about fork/exec/exit).
>
> I think it's both the CVS server that continually forks/exits (it doesn't
> actually do an exec at all - it seems to be using fork/exit as a way to
> control its memory usage - knowing that the OS will free all the temporary
> memory on exit - I think the newer CVS development trees don't do this,
> but that also seems to be why they leak memory like mad and eventually run
> out ;).

I am using cvs-1.11.21-3.2
I can try running their development tree.

>
> AND it's git-cvsimport forking and exec'ing git helper processes.

Is it worthwhile to make a library version of these? Svn has lib
versions and they barely show up in oprofile. cvsimport is only using
4-5 low level git functions.

>
> So that process overhead is expected.
>
> What I would _not_ have expected is:
>
> >   933646  2.0983 /usr/local/bin/git-read-tree
>
> I don't see why git-read-tree is so hot for you. We should never need to
> read a tree when we're importing something, unless there are tons of
> branches and we switch back and forth between them.
>
> I guess mozilla really does use a fair number of branches?

Is 1,800 a lot?

>
> Martin sent out a patch (that I don't think has been merged yet) to avoid
> the git-read-tree overhead when switching branches. Look for an email with
> a subject like "cvsimport: keep one index per branch during import", I
> suspect that would speed up the git part a lot.

I'll check this out

> (It will also avoid a few fork/exec's, but you'll still have most of them,
> so I don't think you'll see any really _fundamental_ changes to this, but
> the git-read-tree overhead should be basically gone, and some of the
> libz.so pressure would also be gone with it. It should also avoid
> rewriting the index file, so you'd get lower disk pressure, but it looks
> like none of your problems are really due to IO, so again, that probably
> won't make much of a difference for you).

I have been CPU bound for two days, disk activity is minor.
git-cvsimport is 250MB and I have 2GB of disk cache.

After looking at this process for about a week it doesn't look like
processing chronologically is the best strategy. cvsps can quickly
work out the changesets, in about 15 minutes. Then it might be better to
walk the CVS files one at a time generating git IDs for each revision.
Next use the IDs and changeset info to build the git trees. Finally pack
everything. This strategy would minimize the work load on the CVS
files (applying all those deltas to get random revs).

Can git build a repository in this manner? If this is feasible it may
be possible to do all of this in a single pass over the CVS tree by
modifying cvsps.

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: Why so much time in the kernel?
From: Linus Torvalds @ 2006-06-16 16:09 UTC (permalink / raw)
  To: Jon Smirl; +Cc: git
In-Reply-To: <9e4733910606160825hb538d6fo4c9f1d7d9768e100@mail.gmail.com>

On Fri, 16 Jun 2006, Jon Smirl wrote:
>
> I am using cvs-1.11.21-3.2
> I can try running their development tree.

No, don't. We already know that 1.12 leaks memory and makes the cvsimport 
not work at all.

> > 
> > AND it's git-cvsimport forking and exec'ing git helper processes.
> 
> Is it worthwhile to make a library version of these? Svn has lib
> versions and they barely show up in oprofile. cvsimport is only using
> 4-5 low level git functions.

Eventually, I think that's where we'll get. We're already at the stage 
where most of the core could just be written as a library.

> > I guess mozilla really does use a fair number of branches?
> 
> Is 1,800 a lot?

Yeah. Although even just two is enough, if you just alternate committing 
on them ;)

So it's actually not number of branches, it's more about frequency of 
the branch changing in the cvsps output. And yes, you could probably 
improve performance by sorting the changesets differently, but Martin's 
change to use separate index files should make it all pretty moot.
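
The "frequency, not number" point can be made concrete by counting
switches in the changeset order. A small sketch; count_branch_switches()
and the array-of-branch-names input are illustrative assumptions, not
the format cvsps actually emits:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count how often consecutive changesets land on different branches.
 * branch[i] is the branch name of the i-th changeset in import order.
 * Each switch is a point where a single-index importer would have to
 * reload the index for the other branch.
 */
size_t count_branch_switches(const char *const *branch, size_t n)
{
	size_t i, switches = 0;

	assert(branch || !n);
	for (i = 1; i < n; i++)
		if (strcmp(branch[i - 1], branch[i]))
			switches++;
	return switches;
}
```

Six commits alternating between two branches give five switches; the
same six commits grouped by branch give one.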

		Linus

^ permalink raw reply

* Re: Why so much time in the kernel?
From: Jon Smirl @ 2006-06-16 17:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0606160906250.5498@g5.osdl.org>

Is it a crazy idea to read the cvs files, compute an sha1 on each
expanded delta and then write the delta straight into a pack file? Are
the cvs and git delta formats the same? What about CVS's forward and
reverse delta use? While this is going on, track the
branches/changesets in memory and then finish up by writing these trees
into the pack file too. This should take no more ram than cvsps needs
currently.

This leaves the packfile in a non-optimal format but a repack should
fix that, right?
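
As a sketch of the "compute an sha1" step: the ID git gives a file
revision is the SHA-1 of the header "blob <size>" plus a NUL byte,
followed by the fully expanded contents, so it has to be computed on the
expanded text and not on the delta. The SHA-1 code and the name
git_blob_id() below are illustrative only; a real importer would use an
existing SHA-1 library, and writing the pack file is not shown.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct sha1 {
	uint32_t h[5];
	uint64_t len;			/* total bytes fed so far */
	unsigned char buf[64];
};

static uint32_t rol(uint32_t x, int n)
{
	return (x << n) | (x >> (32 - n));
}

/* Mix one 64-byte block into the state. */
static void sha1_block(struct sha1 *s, const unsigned char *p)
{
	uint32_t w[80], a = s->h[0], b = s->h[1], c = s->h[2];
	uint32_t d = s->h[3], e = s->h[4], f, k, t;
	int i;

	for (i = 0; i < 16; i++)
		w[i] = (uint32_t)p[4 * i] << 24 | (uint32_t)p[4 * i + 1] << 16 |
		       (uint32_t)p[4 * i + 2] << 8 | (uint32_t)p[4 * i + 3];
	for (; i < 80; i++)
		w[i] = rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
	for (i = 0; i < 80; i++) {
		if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
		else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
		else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
		else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
		t = rol(a, 5) + f + e + k + w[i];
		e = d; d = c; c = rol(b, 30); b = a; a = t;
	}
	s->h[0] += a; s->h[1] += b; s->h[2] += c; s->h[3] += d; s->h[4] += e;
}

/* Feed n bytes; a block is processed whenever the buffer fills up. */
static void sha1_update(struct sha1 *s, const void *data, size_t n)
{
	const unsigned char *p = data;

	while (n--) {
		s->buf[s->len++ % 64] = *p++;
		if (s->len % 64 == 0)
			sha1_block(s, s->buf);
	}
}

/* Append the standard padding and length, then emit the 20-byte digest. */
static void sha1_final(struct sha1 *s, unsigned char out[20])
{
	uint64_t bits = s->len * 8;
	unsigned char pad = 0x80, zero = 0, lenbuf[8];
	int i;

	sha1_update(s, &pad, 1);
	while (s->len % 64 != 56)
		sha1_update(s, &zero, 1);
	for (i = 0; i < 8; i++)
		lenbuf[i] = (unsigned char)(bits >> (56 - 8 * i));
	sha1_update(s, lenbuf, 8);
	for (i = 0; i < 20; i++)
		out[i] = (unsigned char)(s->h[i / 4] >> (24 - 8 * (i % 4)));
}

/* Write the 40-character hex ID of a blob with the given contents. */
void git_blob_id(const void *data, size_t len, char hex[41])
{
	struct sha1 s = { { 0x67452301, 0xEFCDAB89, 0x98BADCFE,
			    0x10325476, 0xC3D2E1F0 }, 0, { 0 } };
	unsigned char raw[20];
	char hdr[32];
	int i, n;

	n = snprintf(hdr, sizeof(hdr), "blob %lu", (unsigned long)len);
	assert(n > 0 && n < (int)sizeof(hdr));
	sha1_update(&s, hdr, (size_t)n + 1);	/* +1: header ends in NUL */
	sha1_update(&s, data, len);
	sha1_final(&s, raw);
	for (i = 0; i < 20; i++)
		sprintf(hex + 2 * i, "%02x", raw[i]);
}
```

Because the ID depends only on the contents, identical revisions in
different CVS files or branches end up as the same object.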

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply
