dumb transports not being welcomed..

git.vger.kernel.org archive mirror

* dumb transports not being welcomed..
@ 2005-09-13 21:07 Junio C Hamano
  2005-09-13 21:14 ` Sam Ravnborg
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2005-09-13 21:07 UTC (permalink / raw)
  To: git

I've looked at ~80 git repositories publicly available at
kernel.org and noticed only 23 are prepared to handle dumb
transports.  That probably means either most of these trees are
not pulled by people without kernel.org accounts, or the public is
using rsync or cogito to pull from these trees.  I somehow find
this number very discouraging ...

* Re: dumb transports not being welcomed..
  2005-09-13 21:07 dumb transports not being welcomed Junio C Hamano
@ 2005-09-13 21:14 ` Sam Ravnborg
  2005-09-13 21:30   ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Sam Ravnborg @ 2005-09-13 21:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Sep 13, 2005 at 02:07:58PM -0700, Junio C Hamano wrote:
> I've looked at ~80 git repositories publicly available at
> kernel.org and noticed only 23 are prepared to handle dumb
> transports.  That probably means either most of these trees are
> not pulled by people without kernel.org accounts, or the public is
> using rsync or cogito to pull from these trees.  I somehow find
> this number very discouraging ...

What's wrong with using cogito?
In other words, why do you feel that way when we use cogito to do
cg-update?

	Sam

* Re: dumb transports not being welcomed..
  2005-09-13 21:14 ` Sam Ravnborg
@ 2005-09-13 21:30   ` Junio C Hamano
  2005-09-13 21:42     ` Sam Ravnborg
                       ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Junio C Hamano @ 2005-09-13 21:30 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: git

Sam Ravnborg <sam@ravnborg.org> writes:

> What's wrong with using cogito?
> In other words, why do you feel that way when we use cogito to do
> cg-update?

Using cogito is not a problem at all.  The problem is that the
mechanism to prepare trees to serve a wider audience is not widely used.

* Re: dumb transports not being welcomed..
  2005-09-13 21:30   ` Junio C Hamano
@ 2005-09-13 21:42     ` Sam Ravnborg
  2005-09-13 22:03     ` Horst von Brand
  2005-09-13 22:11     ` Junio C Hamano
  2 siblings, 0 replies; 27+ messages in thread
From: Sam Ravnborg @ 2005-09-13 21:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Sep 13, 2005 at 02:30:16PM -0700, Junio C Hamano wrote:
> Sam Ravnborg <sam@ravnborg.org> writes:
> 
> > What's wrong with using cogito?
> > In other words, why do you feel that way when we use cogito to do
> > cg-update?
> 
> Using cogito is not a problem at all.  The problem is that the
> mechanism to prepare trees to serve a wider audience is not widely used.

What is the right method to use at kernel.org then?

I did:

rm -rf kbuild.git
cp -al ../linus/linus-2.6.git kbuild.git
echo ... > ---/alternates
GIT_DIR=kbuild.git git-prune-cache

Worked like a charm although a bit hacky...
And I had a tree to start out from.

	Sam

* Re: dumb transports not being welcomed..
  2005-09-13 21:30   ` Junio C Hamano
  2005-09-13 21:42     ` Sam Ravnborg
@ 2005-09-13 22:03     ` Horst von Brand
  2005-09-13 22:23       ` Junio C Hamano
  2005-09-13 22:11     ` Junio C Hamano
  2 siblings, 1 reply; 27+ messages in thread
From: Horst von Brand @ 2005-09-13 22:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sam Ravnborg, git

Junio C Hamano <junkio@cox.net> wrote:

[...]

> Using cogito is not a problem at all.  The problem is that the
> mechanism to prepare trees to serve a wider audience is not widely used.

It isn't really documented...
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

* Re: dumb transports not being welcomed..
  2005-09-13 21:30   ` Junio C Hamano
  2005-09-13 21:42     ` Sam Ravnborg
  2005-09-13 22:03     ` Horst von Brand
@ 2005-09-13 22:11     ` Junio C Hamano
  2005-09-13 22:21       ` Jeff Garzik
                         ` (3 more replies)
  2 siblings, 4 replies; 27+ messages in thread
From: Junio C Hamano @ 2005-09-13 22:11 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> Sam Ravnborg <sam@ravnborg.org> writes:
>
>> What's wrong with using cogito?
>> In other words, why do you feel that way when we use cogito to do
>> cg-update?
>
> Using cogito is not a problem at all.  The problem is that the
> mechanism to prepare trees to serve a wider audience is not widely used.

I need to clarify what I meant by 'not welcoming dumb transport'
a bit better.  Namely, those (~80 - 23) = ~57 repositories lack
support for 'git ls-remote' over http, which means you cannot
discover what refs the repository has.

Some people argued that it can be done via recursive wget on
refs/ hierarchy.  Here is what you would get if you do that
against kernel.org:

  $ wget -r -np -nH --cut-dirs=4 http://kernel.org/pub/scm/git/git.git/refs/.
  $ ls -R refs
  refs:
  ./      index.html          index.html?C=N;O=A  index.html?C=S;O=D
  ../     index.html?C=M;O=A  index.html?C=N;O=D  tags/
  heads/  index.html?C=M;O=D  index.html?C=S;O=A

  refs/heads:
  ./          index.html?C=M;O=A  index.html?C=N;O=D  master  todo
  ../         index.html?C=M;O=D  index.html?C=S;O=A  pu
  index.html  index.html?C=N;O=A  index.html?C=S;O=D  rc

  refs/tags:
  ./                  index.html?C=M;O=D  index.html?C=S;O=D  v0.99.2  v0.99.6
  ../                 index.html?C=N;O=A  junio-gpg-pub       v0.99.3
  index.html          index.html?C=N;O=D  v0.99               v0.99.4
  index.html?C=M;O=A  index.html?C=S;O=A  v0.99.1             v0.99.5

Of course, I do not have a branch called index.html there, and
this also means I will not be able to have a branch with that
name even if I wanted to.

Also, some web servers are configured not to allow directory
indexes at all, and those that do may format the index
differently, so excluding anything that matches index.html*
would mostly work, but that is only a heuristic.
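
A minimal sketch of why this stays a heuristic. The helper name
guess_refs() is hypothetical, and the input is simply the list of
names seen in the refs/heads listing above; nothing here is part of
git itself.

```python
def guess_refs(listing):
    """Guess branch names from a scraped directory listing.

    Drops directory entries and anything that looks like a
    web-server index page. This is a heuristic, not a protocol.
    """
    return [
        name
        for name in listing
        if not name.endswith("/") and not name.startswith("index.html")
    ]


# Names taken from the refs/heads listing shown above.
heads = ["./", "../", "index.html", "index.html?C=M;O=A",
         "index.html?C=N;O=D", "master", "pu", "rc", "todo"]

print(guess_refs(heads))

# A branch that really is called "index.html" is silently lost.
print(guess_refs(["index.html", "master"]))
```

The second call shows the cost: the filter cannot tell a generated
index page from a real ref with the same name.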

The file $GIT_DIR/info/refs was introduced to solve this by
listing the available refs for discovery, and hooks/post-update,
when enabled, runs update-server-info to update the file (among
other things) whenever you push into the repository.  info/refs
is not strictly necessary for repositories at kernel.org because
people tend to know what refs are available for pulling and you
can always visit there via gitweb to find it out.
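
As a rough illustration of what a dumb-transport client gains from
info/refs, here is a sketch that parses such a file.  It assumes the
layout update-server-info writes, one "<sha1><TAB><refname>" pair per
line; the function name parse_info_refs() and the placeholder object
names in the sample are made up for the example.

```python
def parse_info_refs(text):
    """Map ref names to object names from the text of an info/refs file."""
    refs = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sha1, name = line.split("\t", 1)
        refs[name] = sha1
    return refs


# Placeholder object names; a real file carries 40-hex SHA-1 values.
sample = (
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/heads/index.html\n"
    "3333333333333333333333333333333333333333\trefs/tags/v0.99\n"
)

refs = parse_info_refs(sample)
for name in sorted(refs):
    print(refs[name], name)
```

Because the refs are listed explicitly, a branch named index.html is
no longer ambiguous, and no directory index is needed on the server.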

I just felt that it is a good habit to get into to prepare your
repositories in a shape usable even when served by an HTTP
server that is less forgiving than what kernel.org runs -- that
was what I felt "discouraging" about.

Another thing is that the missing info/refs file means the
repository is not prepared with update-server-info, so it is
likely that it lacks objects/info/packs to describe what packs
are in the object database.  I believe cogito uses git-http-pull
after you tell which ref to pull, and this step would break if
the repository is packed, objects/info/packs is not available,
and if the downloader does not have an object that is already
prune-packed in the repository.  This means either people are
not packing their repositories (hence nobody complained), or the
public is pulling over the rsync transport (which slurps everything
in sight).  Both are good reasons to feel discouraged.
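
The failure mode can be sketched as follows.  This is a simplified
model, not how git-http-pull is implemented: it assumes
objects/info/packs lists packs as "P <pack-name>" lines, and the
helpers parse_info_packs() and locate(), along with the pack and
object names, are hypothetical.

```python
def parse_info_packs(text):
    """Return the pack file names listed in an objects/info/packs file."""
    return [line[2:].strip() for line in text.splitlines()
            if line.startswith("P ")]


def locate(sha1, loose, advertised_packs, pack_contents):
    """Say where a dumb client could find an object, or None if it cannot."""
    if sha1 in loose:
        return "loose"
    # Only packs the server advertises can be discovered over HTTP.
    for pack in advertised_packs:
        if sha1 in pack_contents.get(pack, ()):
            return pack
    return None


# The server has prune-packed: the object lives only inside a pack.
pack_contents = {"pack-aaaa.pack": {"deadbeef"}}
loose = set()

with_info = parse_info_packs("P pack-aaaa.pack\n\n")
without_info = parse_info_packs("")

print(locate("deadbeef", loose, with_info, pack_contents))
print(locate("deadbeef", loose, without_info, pack_contents))
```

With the packs file present the object is found in its pack; without
it the client has no way to learn that the pack exists.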

* Re: dumb transports not being welcomed..
  2005-09-13 22:11     ` Junio C Hamano
@ 2005-09-13 22:21       ` Jeff Garzik
  2005-09-13 22:29         ` Junio C Hamano
  2005-09-13 22:29       ` Linus Torvalds
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Jeff Garzik @ 2005-09-13 22:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sam Ravnborg, git

Junio C Hamano wrote:
> The file $GIT_DIR/info/refs was introduced to solve this by
> listing the available refs for discovery, and hooks/post-update,
> when enabled, runs update-server-info to update the file (among
> other things) whenever you push into the repository.

This is helpful.  I'll run git-update-server-info before each push, now.

	Jeff

* Re: dumb transports not being welcomed..
  2005-09-13 22:03     ` Horst von Brand
@ 2005-09-13 22:23       ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2005-09-13 22:23 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Sam Ravnborg, git

Horst von Brand <vonbrand@inf.utfsm.cl> writes:

> Junio C Hamano <junkio@cox.net> wrote:
>
> [...]
>
>> Using cogito is not a problem at all.  The problem is that the
>> mechanism to prepare trees to serve a wider audience is not widely used.
>
> It isn't really documented...

True.  The existing documentation might be sketchy.  We only
have the following documentation pages right now.  Clarification
patches are welcome.

http://www.kernel.org/pub/software/scm/git/docs/tutorial.html
    Look for the "Publishing your work" section, and the
    "Working with Others" section, "project lead" and "subsystem
    maintainer" subsections, both bullet point #2.

http://www.kernel.org/pub/software/scm/git/docs/git-update-server-info.html
    The above sections in the tutorial repeatedly mention this
    command.

http://www.kernel.org/pub/software/scm/git/docs/repository-layout.html
    And git-update-server-info documentation refers to this page.
    Look for "objects/info/packs" and "info/refs".

* Re: dumb transports not being welcomed..
  2005-09-13 22:21       ` Jeff Garzik
@ 2005-09-13 22:29         ` Junio C Hamano
  2005-09-14 13:21           ` Jeff Garzik
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2005-09-13 22:29 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Sam Ravnborg, git

Jeff Garzik <jgarzik@pobox.com> writes:

> Junio C Hamano wrote:
>> The file $GIT_DIR/info/refs was introduced to solve this by
>> listing the available refs for discovery, and hooks/post-update,
>> when enabled, runs update-server-info to update the file (among
>> other things) whenever you push into the repository.
>
> This is helpful.  I'll run git-update-server-info before each push, now.

Just to make sure I am not misunderstanding you, is your "publish"
workflow like this?

    - do your work on your private machine
    - git-update-server-info in your private repository
    - rsync that out to master.kernel.org

If so, then "before each push" makes sense.  

I know you understand the following but this is for other people
on the list.

The hooks/post-update I was talking about assumes a different
"publish" workflow:

    - do your work on your private machine
    - git push master.kernel.org:/pub/scm/...

and you have hooks/post-update enabled in the repository on
master.kernel.org; the counterpart program for 'git push' which
runs on master.kernel.org updates the info/refs and objects/info/packs
file over there once your push is done.

* Re: dumb transports not being welcomed..
  2005-09-13 22:11     ` Junio C Hamano
  2005-09-13 22:21       ` Jeff Garzik
@ 2005-09-13 22:29       ` Linus Torvalds
  2005-09-13 22:37         ` Junio C Hamano
                           ` (2 more replies)
  2005-09-14 10:45       ` Sven Verdoolaege
  2005-09-14 14:10       ` Jon Loeliger
  3 siblings, 3 replies; 27+ messages in thread
From: Linus Torvalds @ 2005-09-13 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sam Ravnborg, git

On Tue, 13 Sep 2005, Junio C Hamano wrote:
> 
> I need to clarify what I meant by 'not welcoming dumb transport'
> a bit better.  Namely, those (~80 - 23) = ~57 repositories lack
> support for 'git ls-remote' over http, which means you cannot
> discover what refs the repository has.

You do realize that up until a week ago (six days, to be exact),
kernel.org was running git-0.99.4, which I don't think actually
implemented any of the info stuff?

So out of the 57 repositories, how many haven't been updated in a week?

I suspect that explains a large portion of it.

Also, I really do think that the dumb transports are oversold, and 
git-daemon is undersold. I know all about firewalls, but I also think that 
if people used the smart protocols more, that's a problem that would 
largely solve itself. 

Dumb protocols can never do really well. That's just very fundamental. 

		Linus

* Re: dumb transports not being welcomed..
  2005-09-13 22:29       ` Linus Torvalds
@ 2005-09-13 22:37         ` Junio C Hamano
  2005-09-13 22:55           ` Linus Torvalds
  2005-09-14  0:01         ` Johannes Schindelin
  2005-09-15  9:17         ` Junio C Hamano
  2 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2005-09-13 22:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sam Ravnborg, git

Linus Torvalds <torvalds@osdl.org> writes:

> You do realize that up until a week ago (six days, to be exact),
> kernel.org was running git-0.99.4, which I don't think actually
> implemented any of the info stuff?

Ah, no I didn't.  For future reference, how can I find that kind
of thing myself (not "what was not in 0.99.4", but "what did
kernel.org run a week ago")?

> Dumb protocols can never do really well. That's just very fundamental. 

I agree.  I am waiting for git-daemon to happen on kernel.org.

I am hoping there won't be many problems but am somewhat worried
that customized packing for each client might turn out to be too
much load.

* Re: dumb transports not being welcomed..
  2005-09-13 22:37         ` Junio C Hamano
@ 2005-09-13 22:55           ` Linus Torvalds
  2005-09-13 23:02             ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2005-09-13 22:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sam Ravnborg, git



On Tue, 13 Sep 2005, Junio C Hamano wrote:
> 
> Ah, no I didn't.  For future reference, how can I find that kind
> of thing myself (not "what was not in 0.99.4", but "what did
> kernel.org run a week ago")?

I don't know. Hpa did an announcement when kernel.org switched to 0.99.4, 
but I never saw any announcement of upgrades (I only noticed that the date 
on /usr/bin/git is now Sep 7 a couple of days ago - no proof of upgrade, 
but _something_ happened six days ago ;)

I personally run my own private set of binaries anyway, and I upgrade 
pretty randomly, so what the official kernel.org installation is doesn't 
affect me personally ;)

		Linus

* Re: dumb transports not being welcomed..
  2005-09-13 22:55           ` Linus Torvalds
@ 2005-09-13 23:02             ` Junio C Hamano
  2005-09-14  2:25               ` Kay Sievers
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2005-09-13 23:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sam Ravnborg, git

Linus Torvalds <torvalds@osdl.org> writes:

> On Tue, 13 Sep 2005, Junio C Hamano wrote:
>> 
>> Ah, no I didn't.  For future reference, how can I find that kind
>> of thing myself (not "what was not in 0.99.4", but "what did
>> kernel.org run a week ago")?
>
> I don't know. Hpa did an announcement when kernel.org switched to 0.99.4, 
> but I never saw any announcement of upgrades (I only noticed that the date 
> on /usr/bin/git is now Sep 7 a couple of days ago - no proof of upgrade, 
> but _something_ happened six days ago ;)

Thanks, that much I could have figured out myself.

> I personally run my own private set of binaries anyway, and I upgrade 
> pretty randomly, so what the official kernel.org installation is doesn't 
> affect me personally ;)

Same here.

* Re: dumb transports not being welcomed..
  2005-09-13 22:29       ` Linus Torvalds
  2005-09-13 22:37         ` Junio C Hamano
@ 2005-09-14  0:01         ` Johannes Schindelin
  2005-09-14  0:57           ` Linus Torvalds
  2005-09-15  9:17         ` Junio C Hamano
  2 siblings, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2005-09-14  0:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Sam Ravnborg, git

Hi,

On Tue, 13 Sep 2005, Linus Torvalds wrote:

> 
> 
> Also, I really do think that the dumb transports are oversold, and 
> git-daemon is undersold.

My tests confirm that a single git-pull via git-daemon brings a small 
machine to its knees. Which means that multiple git-pull's bring a nice 
big machine like kernel.org to its knees.

IMHO the culprit is git-rev-list, which takes ages and ages for big 
repositories (beware: this could be my Darwin client which might be 
incapable of stopping the rev enumeration in time; but if that can be done 
unintentionally, it can be done intentionally, too!).

Did anybody think about using the information which helps the dumb 
transports for intelligent transports, too? (A sort of cache for 
git-rev-list would do wonders...) This could at least help the CPU load on 
the server.

(In retrospect it might have been a mistake to make the call to 
git-update-server-info optional: maybe an environment variable should be 
set to _inhibit_ the behaviour for those who absolutely cannot live with 
the performance hit.)

Ciao,
Dscho

* Re: dumb transports not being welcomed..
  2005-09-14  0:01         ` Johannes Schindelin
@ 2005-09-14  0:57           ` Linus Torvalds
  2005-09-14  1:42             ` Linus Torvalds
  2005-09-14  8:38             ` Johannes Schindelin
  0 siblings, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2005-09-14  0:57 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Sam Ravnborg, git

On Wed, 14 Sep 2005, Johannes Schindelin wrote:
> 
> IMHO the culprit is git-rev-list, which takes ages and ages for big 
> repositories (beware: this could be my Darwin client which might be 
> incapable of stopping the rev enumeration in time; but if that can be done 
> unintentionally, it can be done intentionally, too!).

Packed too?

git-rev-list will take a long time if the tree is unpacked and not in the 
cache. It's all disk seeks. That's _especially_ true of a full clone 
(which will walk the whole way down).

But I have tons of memory in my machines, and I haven't looked at how 
badly it does if you don't have that. I know that master.kernel.org is 
certainly not having any trouble at all with me pulling from lots of 
trees.. Maybe git-rev-list uses up lots of your memory.

I'm seeing 14 seconds of CPU-time for a _full_ kernel history, with
"--objects". Yes, it's not exactly cheap, and maybe I should optimize it
(it's all in the "--objects" handling and probably a large portion of it
is because trees actually pack very well indeed, so it's actually
unpacking a lot of trees), but considering that that is preparing the
metadata for pulling down a hundred megs of stuff..

That said, I do think that --objects handling is _very_ CPU-hungry. The 
offender is this old commit of mine:

	4311d328fee11fbd80862e3c5de06a26a0e80046
	Author: Linus Torvalds <torvalds@g5.osdl.org>
	Date:   Sat Jul 23 10:01:49 2005 -0700

	    Be more aggressive about marking trees uninteresting
	...

which is much better about avoiding objects in old trees, but it does so 
at the expense of being _horribly_ CPU-inefficient. It will walk through 
every tree of every commit that we decided was uninteresting.

You can try to just undo that one commit - it will make pack-files have a 
few extraneous objects, but I think it will make a huge difference in the 
CPU cost of "small pulls" (it won't matter at all for the "git clone" 
case: for that case we just always have to walk the whole object tree).

		Linus

* Re: dumb transports not being welcomed..
  2005-09-14  0:57           ` Linus Torvalds
@ 2005-09-14  1:42             ` Linus Torvalds
  2005-09-14  8:38             ` Johannes Schindelin
  1 sibling, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2005-09-14  1:42 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Sam Ravnborg, git

On Tue, 13 Sep 2005, Linus Torvalds wrote:
> 
> That said, I do think that --objects handling is _very_ CPU-hungry. The 
> offender is this old commit of mine:

No, never mind. Even without that, we end up walking a _lot_ of really
uninteresting "internal" trees (ie trees where all parents were
uninteresting, and they were parsed just because we had to parse a lot of 
commits to determine what they reached). 

To explain it a bit better, let's see a common case:

HEAD:		a
	       / \
	      b   \
	     / \   \
	    c   d   \
	   /   / \   \
	  e   f   g   x
	   \ /   /   /
	    h   i   /
	     \ /   /
	      j   /
	       \ /
Old history:    k

Now, imagine that we do 

	git-rev-list b..a

which results in just two commits: 'x' and 'a' (everything else is
reachable from 'b'). This is actually not that uncommon. However, in order 
to realize that, we had to walk through _all_ of a..k and x before we saw 
that 'b'..'k' were all uninteresting, and there was nothing else reachable 
that might be interesting.

Now, that's pretty cheap per se. git-rev-list is optimized for this case, 
and hey, it's usually just a few hundred objects. Not a big deal - 
generating the commit list takes a small fraction of a second.

However, now the true cost of "--objects" is clear: we will walk the two
"positive" trees ('a' and 'x') and mark all their objects (about 35,000
of them) as interesting. So far so good. Just another fraction of a second. 

HOWEVER, then we walk _every_single_uninteresting_commit_ and walk _their_
objects to say "we've got this already". And the uninteresting commits are
often many more than the interesting ones - we might have had to go
several weeks back to list them all. The above example is not at all
extreme: we might have something like 20 interesting commits, and several
hundreds of the uninteresting ones.

Now, the way to optimize things is to realize that there are two "classes" 
of uninteresting commits. There are the uninteresting commits that are 
adjacent to an interesting one (in the above example, they are "b" and 
"k"), and there are the uninteresting commits that are only reachable from 
-other- uninteresting commits ('c'..'j'). Let's call the latter class 
"doubly uninteresting commits", and the former class "uninteresting edge 
commits".
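
The two classes can be made concrete with a small sketch over the
example graph.  The parent links are transcribed from the diagram
above; reachable() and classify() are hypothetical helpers, and the
sketch computes full reachability for clarity, which is exactly the
work a real optimization would try to avoid.

```python
def reachable(parents, tips):
    """Return every commit reachable from the given tips, tips included."""
    seen = set()
    stack = list(tips)
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return seen


def classify(parents, include, exclude):
    """Split the commits touched by 'exclude..include' into three sets."""
    uninteresting = reachable(parents, [exclude])
    interesting = reachable(parents, [include]) - uninteresting
    # Edge commits: uninteresting parents of an interesting commit.
    edge = {p for c in interesting for p in parents.get(c, ())
            if p in uninteresting}
    # Everything else is reachable only through other uninteresting commits.
    doubly = uninteresting - edge
    return interesting, edge, doubly


# child -> parents, as drawn in the diagram.
PARENTS = {
    "a": ["b", "x"], "b": ["c", "d"], "c": ["e"], "d": ["f", "g"],
    "e": ["h"], "f": ["h"], "g": ["i"], "h": ["j"], "i": ["j"],
    "j": ["k"], "x": ["k"], "k": [],
}

interesting, edge, doubly = classify(PARENTS, include="a", exclude="b")
print(sorted(interesting))  # ['a', 'x']
print(sorted(edge))         # ['b', 'k']
print(sorted(doubly))       # ['c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
```

Only the trees of the two edge commits need to be walked to mark
objects as already present; the eight doubly uninteresting commits
add nothing.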

And we really don't need to walk the "doubly uninteresting" trees. But we
do. Because we don't have another phase to discover the edge (we can't do
that during the initial discovery phase, because we don't know if a commit
is going to end up interesting in the end - we might have another commit
that we haven't seen yet that might be the parent of a commit that _looks_
interesting right now, but ends up being uninteresting because that
eventually seen parent ended up being uninteresting).

In other words: I bet I could make "git-rev-list --objects" go from ten
seconds to a single second if I did that edge discovery for most small
incremental updates. Instead, I'm lazy, and I'm describing the problem on 
the list as an "educational experience", and am callously hoping that 
somebody will see it as an interesting challenge ;)

Btw, the above is definitely not made up. If I did my statistics right,
doing "git-rev-list v2.6.14-rc1.." with the current tree results in 178
"interesting" commits, and 6251 "uninteresting" ones. And I bet 99% of
those uninteresting ones are "doubly uninteresting" - and we're just
wasting CPU time looking at what objects are reachable from them..

		Linus

* Re: dumb transports not being welcomed..
  2005-09-13 23:02             ` Junio C Hamano
@ 2005-09-14  2:25               ` Kay Sievers
  2005-09-14  3:32                 ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Kay Sievers @ 2005-09-14  2:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Sam Ravnborg, git

On Tue, Sep 13, 2005 at 04:02:05PM -0700, Junio C Hamano wrote:
> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > On Tue, 13 Sep 2005, Junio C Hamano wrote:
> >> 
> >> Ah, no I didn't.  For future reference, how can I find that kind
> >> of thing myself (not "what was not in 0.99.4", but "what did
> >> kernel.org run a week ago")?
> >
> > I don't know. Hpa did an announcement when kernel.org switched to 0.99.4, 
> > but I never saw any announcement of upgrades (I only noticed that the date 
> > on /usr/bin/git is now Sep 7 a couple of days ago - no proof of upgrade, 
> > but _something_ happened six days ago ;)
> 
> Thanks, that much I could have figured out myself.

That's easier:

  $ rpm -q git-core
  git-core-0.99.6-1

  $ rpm -q cogito
  cogito-0.14-1

Kay

* Re: dumb transports not being welcomed..
  2005-09-14  2:25               ` Kay Sievers
@ 2005-09-14  3:32                 ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2005-09-14  3:32 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Linus Torvalds, Sam Ravnborg, git

Kay Sievers <kay.sievers@vrfy.org> writes:

> That's easier:
>
>   $ rpm -q git-core
>   git-core-0.99.6-1
>
>   $ rpm -q cogito
>   cogito-0.14-1

Sorry, I fail to see how it answers my question: "what did
kernel.org run a week ago?"

* Re: dumb transports not being welcomed..
  2005-09-14  0:57           ` Linus Torvalds
  2005-09-14  1:42             ` Linus Torvalds
@ 2005-09-14  8:38             ` Johannes Schindelin
  2005-09-14 15:07               ` Linus Torvalds
  1 sibling, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2005-09-14  8:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Sam Ravnborg, git

Hi,

On Tue, 13 Sep 2005, Linus Torvalds wrote:
> 
> On Wed, 14 Sep 2005, Johannes Schindelin wrote:
> > 
> > IMHO the culprit is git-rev-list, which takes ages and ages for big 
> > repositories (beware: this could be my Darwin client which might be 
> > incapable of stopping the rev enumeration in time; but if that can be done 
> > unintentionally, it can be done intentionally, too!).
> 
> Packed too?

Yes. Almost all of it.

> git-rev-list will take a long time if the tree is unpacked and not in the 
> cache. It's all disk seeks. That's _especially_ true of a full clone 
> (which will walk the whole way down).

That could be the case, but my test case is a CVS project I track on one 
side, and I fetch on the other side. Therefore, your diagram from your 
other mail does not really apply. My history looks more or less like this:

a b c d origin
          |
          |
          |
          |
\ \ \ \   |
 ---------|

So, the origin is a linear CVS project. With many, many, many commits. At 
one stage I broke off new git branches. I kept tracking the CVS project, 
though.

What I see when fetching all heads (thanks to Junio, this is one call to 
git-fetch now), where all but origin are up to date, is that it takes a 
very long time. Swapping kicks in, and top tells me that 26.6% of the 
memory is occupied by git-rev-list (The server has 128M, with 1G swap, and 
I am unfortunately not the only user of this machine).

I fail to see why it should need those amounts of memory. (I tested this 
over the ssh protocol, which should essentially do the same as git-daemon, 
right?) After all, the merge point between the branches should be marked 
uninteresting after one single step from each of my private branches.

> But I have tons of memory in my machines, and I haven't looked at how 
> badly it does if you don't have that. I know that master.kernel.org is 
> certainly not having any trouble at all with me pulling from lots of 
> trees.. Maybe git-rev-list uses up lots of your memory.

That certainly is the case.

As for master.kernel.org: Unfortunately, you will not be the only puller. 
And if your process needs just 5% of the RAM, then 21 pullers will be too 
many.

> That said, I do think that --objects handling is _very_ CPU-hungry.

In my experience, before the swapping started, the process did not get 
more than 20% CPU.

Nevertheless, I still think that it would be a good idea to reuse the 
files created for the dumb transport for the intelligent transport. 
Especially for a project which is more often fetched than uploaded.

I also see other strange things like packing 0 objects, and packing >0 
objects after just having fetched from that repository. Hopefully I will 
have time to look into that (and understand the code to begin with).

Ciao,
Dscho

* Re: dumb transports not being welcomed..
  2005-09-13 22:11     ` Junio C Hamano
  2005-09-13 22:21       ` Jeff Garzik
  2005-09-13 22:29       ` Linus Torvalds
@ 2005-09-14 10:45       ` Sven Verdoolaege
  2005-09-14 16:14         ` Junio C Hamano
  2005-09-14 14:10       ` Jon Loeliger
  3 siblings, 1 reply; 27+ messages in thread
From: Sven Verdoolaege @ 2005-09-14 10:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sam Ravnborg, git

On Tue, Sep 13, 2005 at 03:11:42PM -0700, Junio C Hamano wrote:
> The file $GIT_DIR/info/refs was introduced to solve this by
> listing the available refs for discovery, and hooks/post-update,
> when enabled, runs update-server-info to update the file (among
> other things) whenever you push into the repository.  

It doesn't help that update-server-info crashes if you run
it for the first time on an old repo.
Maybe it should create the appropriate directory structure on the fly,
but the patch below at least checks whether the new rev-cache could
be created.

skimo
--

write_rev_cache: check whether new cache could be created.

---
commit d30b87459c690ff68e65dfe8ecdc585dab64323a
tree 51127c1af00f8fd63e7b996384e86d7d31ad5562
parent 2ba6c47be1762726ad0c1d5779064c489150d789
author Sven Verdoolaege <skimo@liacs.nl> Wed, 14 Sep 2005 12:40:28 +0200
committer Sven Verdoolaege <skimo@liacs.nl> Wed, 14 Sep 2005 12:40:28 +0200

 rev-cache.c   |    8 +++++++-
 rev-cache.h   |    2 +-
 server-info.c |    7 ++++---
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/rev-cache.c b/rev-cache.c
--- a/rev-cache.c
+++ b/rev-cache.c
@@ -103,7 +103,7 @@ static void write_one_rev_cache(FILE *re
 		write_one_rev_cache(rev_cache_file, rle->ri);
 }
 
-void write_rev_cache(const char *newpath, const char *oldpath)
+int write_rev_cache(const char *newpath, const char *oldpath)
 {
 	/* write the following commit ancestry information in
 	 * $GIT_DIR/info/rev-cache.
@@ -131,6 +131,11 @@ void write_rev_cache(const char *newpath
 		size_t sz;
 		FILE *oldfp = fopen(oldpath, "r");
 		rev_cache_file = fopen(newpath, "w");
+		if (!rev_cache_file) {
+			if (oldfp)
+				fclose(oldfp);
+			return error("cannot open %s", newpath);
+		}
 		if (oldfp) {
 			while (1) {
 				sz = fread(buf, 1, sizeof(buf), oldfp);
@@ -161,6 +166,7 @@ void write_rev_cache(const char *newpath
 		write_one_rev_cache(rev_cache_file, ri);
 	}
 	fclose(rev_cache_file);
+	return 0;
 }
 
 static void add_parent(struct rev_cache *child,
diff --git a/rev-cache.h b/rev-cache.h
--- a/rev-cache.h
+++ b/rev-cache.h
@@ -24,6 +24,6 @@ struct rev_list_elem {
 extern int find_rev_cache(const unsigned char *);
 extern int read_rev_cache(const char *, FILE *, int);
 extern int record_rev_cache(const unsigned char *, FILE *);
-extern void write_rev_cache(const char *new, const char *old);
+extern int write_rev_cache(const char *new, const char *old);
 
 #endif
diff --git a/server-info.c b/server-info.c
--- a/server-info.c
+++ b/server-info.c
@@ -536,6 +536,7 @@ static int update_info_revs(int force)
 	char *path0 = strdup(git_path("info/rev-cache"));
 	int len = strlen(path0);
 	char *path1 = xmalloc(len + 2);
+	int errs = 0;
 
 	strcpy(path1, path0);
 	strcpy(path1 + len, "+");
@@ -548,11 +549,11 @@ static int update_info_revs(int force)
 	for_each_ref(record_rev_cache_ref);
 
 	/* update the rev-cache database */
-	write_rev_cache(path1, force ? "/dev/null" : path0);
-	rename(path1, path0);
+	errs = errs || write_rev_cache(path1, force ? "/dev/null" : path0);
+	errs = errs || rename(path1, path0);
 	free(path1);
 	free(path0);
-	return 0;
+	return errs;
 }
 
 /* public */


* Re: dumb transports not being welcomed..
  2005-09-13 22:29         ` Junio C Hamano
@ 2005-09-14 13:21           ` Jeff Garzik
  0 siblings, 0 replies; 27+ messages in thread
From: Jeff Garzik @ 2005-09-14 13:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sam Ravnborg, git

Junio C Hamano wrote:
> Jeff Garzik <jgarzik@pobox.com> writes:
> 
> 
>>Junio C Hamano wrote:
>>
>>>The file $GIT_DIR/info/refs was introduced to solve this by
>>>listing the available refs for discovery, and hooks/post-update,
>>>when enabled, runs update-server-info to update the file (among
>>>other things) whenever you push into the repository.
>>
>>This is helpful.  I'll run git-update-server-info before each push, now.
> 
> 
> Just to make sure I am not misunderstanding you, is your "publish"
> workflow like this?
> 
>     - do your work on your private machine
>     - git-update-server-info in your private repository
>     - rsync that out to master.kernel.org
> 
> If so, then "before each push" makes sense.  

Correct.

	Jeff


* Re: dumb transports not being welcomed..
  2005-09-13 22:11     ` Junio C Hamano
                         ` (2 preceding siblings ...)
  2005-09-14 10:45       ` Sven Verdoolaege
@ 2005-09-14 14:10       ` Jon Loeliger
  2005-09-14 19:00         ` Junio C Hamano
  3 siblings, 1 reply; 27+ messages in thread
From: Jon Loeliger @ 2005-09-14 14:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sam Ravnborg, Git List

On Tue, 2005-09-13 at 17:11, Junio C Hamano wrote:

> I just felt that it is a good habit to get into to prepare your
> repositories in a shape usable even when served by an HTTP
> server that is less forgiving than what kernel.org runs -- that
> was what I felt "discouraging" about.

Well, that is sort of just it, too.  Why not make the
default, obvious, common repo prep mechanism do all
the necessary steps for proper presentation?  Having
to remember to do 6 steps just begs for an additional
layer of scripting.

>   This means either people are
> not packing their repository (hence nobody complained), or
> public are pulling over rsync transport (which slurps everything
> in sight).  Both are good reasons to feel discouraged about.

I confess, I've been using rsync as it is what appears
to be able to reliably get a repository that works.

jdl


* Re: dumb transports not being welcomed..
  2005-09-14  8:38             ` Johannes Schindelin
@ 2005-09-14 15:07               ` Linus Torvalds
  0 siblings, 0 replies; 27+ messages in thread
From: Linus Torvalds @ 2005-09-14 15:07 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Sam Ravnborg, git

On Wed, 14 Sep 2005, Johannes Schindelin wrote:
> 
> What I see when fetching all heads (thanks to Junio, this is one call to 
> git-fetch now), where all but origin are up to date, is that it takes a 
> very long time. Swapping kicks in, and top tells me that 26.6% of the 
> memory is occupied by git-rev-list (The server has 128M, with 1G swap, and 
> I am unfortunately not the only user of this machine).

Ok. As mentioned, I've not looked at memory usage. The machines I play 
with tend to have 2GB or more, simply because bk needed at least 1GB to be 
nice and cached on the kernel ;)

Git has needed less than bk, so I've not cared ;)

> I fail to see why it should need those amounts of memory. (I tested this 
> over the ssh protocol, which should essentially do the same as git-daemon, 
> right?) After all, the merge point between the branches should be marked 
> uninteresting after one single step from each of my private branches.

One of the issues is that git-rev-list will (for example) keep track of 
the commit messages too for every commit. That in itself can be a lot of 
stuff, depending on how active the tree is and how large the messages are.

Now, that should be easy enough to fix (parse_commit() normally saves the 
buffer it parses into "commit->buffer", so we'd just need to do something 
like

	if (!verbose_header && commit->buffer) {
		free(commit->buffer);
		commit->buffer = NULL;
	}

for each commit.
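
A self-contained sketch of that idea, using a simplified stand-in for git's struct commit. The names parse_commit_sketch() and discard_commit_buffer() are made up for illustration; only the buffer field and the verbose_header flag come from the snippet above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for git's struct commit; only the relevant field. */
struct commit {
	char *buffer;	/* raw commit text kept around by the parser */
};

/* Stand-in for parse_commit(): keeps a private copy of the raw text. */
int parse_commit_sketch(struct commit *c, const char *raw)
{
	size_t len;

	assert(c != NULL && raw != NULL);
	len = strlen(raw) + 1;
	c->buffer = malloc(len);
	if (!c->buffer)
		return -1;
	memcpy(c->buffer, raw, len);
	return 0;
}

/* Drop the text as soon as we know the header will not be printed. */
void discard_commit_buffer(struct commit *c, int verbose_header)
{
	if (!verbose_header && c->buffer) {
		free(c->buffer);
		c->buffer = NULL;
	}
}
```

Resetting the pointer to NULL makes a second call harmless, so the function can be called once per commit without tracking whether the buffer was already released.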

But for --objects, the bigger memory pressure is that it needs to track 
the "struct object" for every single object when it generates the 
reference tracking. And THAT tends to be expensive. The object lists are 
also not very space-efficient (ie one small allocation for each list 
entry).

We could probably make objects/lists more space-efficient.
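
One possible shape for that, sketched under the assumption that a growable array of pointers replaces the one-allocation-per-entry list. struct object_array and its functions are hypothetical names used only for this illustration, not an existing git interface.

```c
#include <assert.h>
#include <stdlib.h>

struct object;	/* opaque here; stands in for git's struct object */

/* Hypothetical replacement for a one-malloc-per-entry linked list. */
struct object_array {
	struct object **items;
	size_t nr;	/* entries in use */
	size_t alloc;	/* entries allocated */
};

/* Append one pointer, doubling the storage whenever it is full. */
int object_array_add(struct object_array *a, struct object *obj)
{
	assert(a != NULL);
	if (a->nr == a->alloc) {
		size_t n = a->alloc ? a->alloc * 2 : 16;
		struct object **p = realloc(a->items, n * sizeof(*p));

		if (!p)
			return -1;
		a->items = p;
		a->alloc = n;
	}
	a->items[a->nr++] = obj;
	return 0;
}

/* Release the storage; the objects themselves are not owned by the array. */
void object_array_clear(struct object_array *a)
{
	free(a->items);
	a->items = NULL;
	a->nr = a->alloc = 0;
}
```

Each entry then costs one pointer plus amortized slack, instead of a separate small allocation carrying its own next pointer and allocator overhead.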

> I also see other strange things like packing 0 objects, and packing >0 
> objects after just having fetched from that repository. Hopefully I will 
> have time to look into that (and understand the code to begin with).

Well, the "packing 0 objects" should be normal. I'm surprised at the ">0" 
case after a fetch: the packing is _not_ guaranteed to be exact, but if 
you have the exact same state as (or a superset of) the other end, you 
should always see a zero.

		Linus


* Re: dumb transports not being welcomed..
  2005-09-14 10:45       ` Sven Verdoolaege
@ 2005-09-14 16:14         ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2005-09-14 16:14 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: git

Sven Verdoolaege <skimo@kotnet.org> writes:

> It doesn't help that update-server-info crashes if you run
> it for the first time on an old repo.
> Maybe it should create the appropriate directory structure on the fly,
> but the patch below at least checks whether new rev-cache could
> be created.

True; thanks for the patch.  However, since nobody seems to use
rev-cache, it _might_ make sense to just yank it out.  If it
turns out to be useful later we could always resurrect it.


* Re: dumb transports not being welcomed..
  2005-09-14 14:10       ` Jon Loeliger
@ 2005-09-14 19:00         ` Junio C Hamano
  2005-09-14 19:13           ` Jon Loeliger
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2005-09-14 19:00 UTC (permalink / raw)
  To: Jon Loeliger; +Cc: Sam Ravnborg, Git List

Jon Loeliger <jdl@freescale.com> writes:

> Well, that is sort of just it, too.  Why not make the
> default, obvious, common repo prep mechanism do all
> the necessary steps for proper presentation?  Having
> to remember to do 6 steps just begs for an additional
> layer of scripting.

Fair enough.  My excuse is that Linus did not want the
update-server-info hook enabled by default.  He does not believe
in dumb transports anyway, but aside from that, it still is a
valid attitude because it is not necessary when you do not
intend to publish your repository over dumb transport at all but
still want to push into it.  And another excuse is I do not
in general think enabling hooks by default is a good idea.

Even if you built your repository with older git tools, you
should be always able to say 'GIT_DIR=that-repository
git-init-db' without damaging its existing contents to install
the disabled hooks in its hooks/ directory.

> I confess, I've been using rsync as it is what appears
> to be able to reliably get a repository that works.

And I thought rsync was a reliable way too, until I saw a
message from Tony Luck this morning X-<.


* Re: dumb transports not being welcomed..
  2005-09-14 19:00         ` Junio C Hamano
@ 2005-09-14 19:13           ` Jon Loeliger
  0 siblings, 0 replies; 27+ messages in thread
From: Jon Loeliger @ 2005-09-14 19:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Sam Ravnborg, Git List

On Wed, 2005-09-14 at 14:00, Junio C Hamano wrote:

> Fair enough.  My excuse is that Linus did not want the
> update-server-info hook enabled by default.  He does not believe
> in dumb transports anyway, but aside from that, it still is a
> valid attitude because it is not necessary when you do not
> intend to publish your repository over dumb transport at all but
> still want to push into it.  And another excuse is I do not
> in general think enabling hooks by default is a good idea.
> 
> Even if you built your repository with older git tools, you
> should be always able to say 'GIT_DIR=that-repository
> git-init-db' without damaging its existing contents to install
> the disabled hooks in its hooks/ directory.

Hmmm.  Maybe this is all begging a form of documentation
down the "Best Practices" line, or "Tips I Learned
By Reading Junio and Linus Postings on Git". :-)

> And I thought rsync was a reliable way too, until I saw a
> message from Tony Luck this morning X-<.

Heh.  I read his problem description too, and it sounded
remarkably close to the HTTP pull problems that I had
been victimized by, thus converting me to rsync (for now).

I have recloned entire repos due to the aborted pulls
being left in an inconsistent state.  In fact, that is
what led me to find git-fsck-cache, which I was expecting
to sort out the missing pieces and detect an incomplete
clone.  Perhaps marking it as "needing completion" via
another clone/pull effort.  Dunno.

jdl


* Re: dumb transports not being welcomed..
  2005-09-13 22:29       ` Linus Torvalds
  2005-09-13 22:37         ` Junio C Hamano
  2005-09-14  0:01         ` Johannes Schindelin
@ 2005-09-15  9:17         ` Junio C Hamano
  2 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2005-09-15  9:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sam Ravnborg, git

Linus Torvalds <torvalds@osdl.org> writes:

> Also, I really do think that the dumb transports are oversold, and 
> git-daemon is undersold. I know all about firewalls, but I also think that 
> if people used the smart protocols more, that's a problem that would 
> largely solve itself. 

This reminds me of one thing I've been wanting to have on the
client side (git-fetch-pack): HTTP CONNECT passthru ala
"tn-gw-nav -H", which is one of the ways many people ssh out
over firewalls if I understand correctly.  CVS pserver protocol
seems to do the same using `proxy' connection option.
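
A rough sketch of the client side of such a passthru, assuming the proxy speaks plain HTTP CONNECT and the target is a git-daemon on its default port 9418. Both function names are hypothetical, and the socket handling is left out: the sketch only formats the request and checks the proxy's status line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the request a client sends to an HTTP proxy to ask for a raw
 * tunnel to host:port.  Returns the length written, or -1 if the
 * request does not fit in buf.
 */
int format_connect_request(char *buf, size_t size, const char *host, int port)
{
	int n;

	assert(buf != NULL && host != NULL);
	n = snprintf(buf, size,
		     "CONNECT %s:%d HTTP/1.1\r\n"
		     "Host: %s:%d\r\n"
		     "\r\n", host, port, host, port);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return n;
}

/* A 2xx status line means the tunnel is up and the git protocol can start. */
int proxy_accepted(const char *status_line)
{
	int major, minor, code;

	if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
		return 0;
	return code >= 200 && code < 300;
}
```

After a 2xx reply the same socket would carry the ordinary fetch-pack conversation, so the rest of the client would not need to know a proxy is involved.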

Anybody interested?



Thread overview: 27+ messages
2005-09-13 21:07 dumb transports not being welcomed Junio C Hamano
2005-09-13 21:14 ` Sam Ravnborg
2005-09-13 21:30   ` Junio C Hamano
2005-09-13 21:42     ` Sam Ravnborg
2005-09-13 22:03     ` Horst von Brand
2005-09-13 22:23       ` Junio C Hamano
2005-09-13 22:11     ` Junio C Hamano
2005-09-13 22:21       ` Jeff Garzik
2005-09-13 22:29         ` Junio C Hamano
2005-09-14 13:21           ` Jeff Garzik
2005-09-13 22:29       ` Linus Torvalds
2005-09-13 22:37         ` Junio C Hamano
2005-09-13 22:55           ` Linus Torvalds
2005-09-13 23:02             ` Junio C Hamano
2005-09-14  2:25               ` Kay Sievers
2005-09-14  3:32                 ` Junio C Hamano
2005-09-14  0:01         ` Johannes Schindelin
2005-09-14  0:57           ` Linus Torvalds
2005-09-14  1:42             ` Linus Torvalds
2005-09-14  8:38             ` Johannes Schindelin
2005-09-14 15:07               ` Linus Torvalds
2005-09-15  9:17         ` Junio C Hamano
2005-09-14 10:45       ` Sven Verdoolaege
2005-09-14 16:14         ` Junio C Hamano
2005-09-14 14:10       ` Jon Loeliger
2005-09-14 19:00         ` Junio C Hamano
2005-09-14 19:13           ` Jon Loeliger
