Git development


* Re: [PATCH 0/2] tagsize < 8kb restriction
From: Linus Torvalds @ 2006-05-23  0:02 UTC (permalink / raw)
  To: Sean; +Cc: Junio C Hamano, BjEngelmann, git
In-Reply-To: <BAYC1-PASMTP1164FE2A24B4D1B4C0A607AE9A0@CEZ.ICE>

On Mon, 22 May 2006, Sean wrote:
> What seems to be becoming clear as more people find new ways to use
> git is that many of them would be well served by having a solid
> infrastructure to handle metadata.  Consider the case above: _git_
> itself doesn't need a structural reference, but users and external
> applications definitely need to be able to look up which metadata
> is associated with any given commit.  Having a git standard for
> this type of data would help.  Tags already do this, so they're
> likely to be used and abused in ways not initially envisioned,
> just because git doesn't have another such facility.

I definitely think we should allow arbitrary tags.

That said, I think that what you actually want to do may be totally 
different.

If _each_ commit has some extra information associated with it, you don't 
want to create a tag that points to the commit, you more likely want to 
create an object that is indexed by the commit ID rather than the other 
way around.

IOW, I _think_ that what you described would be that if you have the 
commit ID, you want to find the data based on that ID. No?

And that you can do quite easily, while _also_ using git to distribute the 
extra per-commit meta-data. Just create a separate branch that has the 
data indexed by commit ID. That could be as simple as having one file per 
commit (using, perhaps, a similar directory layout as the .git/objects/ 
directory itself), and then you could do something like

	# Get the SHA1 of the named commit
	commit=$(git-rev-parse --verify "$cmitname"^0)

	# turn it into a filename (slash between two first chars and the rest)
	filename=$(echo $commit | sed 's:^\(..\)\(.*\):\1/\2:')

	# look it up in the "annotations" branch
	git cat-file blob "annotations:$filename"

which gets the data from the "annotations" branch, indexed by the SHA1 
name.

Now, everybody can track your "annotations" branch using git, and get your 
per-commit annotations for the main branch.

See?

The real advantage of tags is that you can use them for the SHA1 
expressions, and follow them automatically. If that's what you want (ie 
you don't want to index things by the commit SHA1, but by some external 
name, like the name the commit had in some other repository), then by all 
means use tags. But if you just want to associate some data with each 
commit, the above "separate branch for annotations" approach is much more 
efficient.

		Linus
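
The lookup above only reads from the "annotations" branch. A minimal
sketch of the write side follows, assuming a branch named "annotations"
and modern "git <subcommand>" spelling. The helper names annotation_path
and annotate are hypothetical, not part of git. It builds the new tree in
a throwaway index so the working tree and the real index stay untouched.

```shell
#!/bin/sh
# Sketch only: helper names and the branch name are assumptions.

# Map a 40-hex commit ID to the fan-out path used in the lookup above,
# i.e. a slash between the first two characters and the rest.
annotation_path() {
	printf '%s\n' "$1" | sed 's:^\(..\)\(.*\):\1/\2:'
}

# annotate <commit-ish> <file-containing-the-note>
annotate() {
	commit=$(git rev-parse --verify "$1^0") || return 1
	path=$(annotation_path "$commit")
	blob=$(git hash-object -w "$2") || return 1
	tmp_dir=$(mktemp -d) || return 1
	(
		# Temporary index: nothing here touches the checked-out branch.
		GIT_INDEX_FILE=$tmp_dir/index
		export GIT_INDEX_FILE
		parent=$(git rev-parse --verify --quiet refs/heads/annotations) || parent=
		if test -n "$parent"; then
			git read-tree "$parent" || exit 1
		fi
		git update-index --add --cacheinfo 100644 "$blob" "$path" || exit 1
		tree=$(git write-tree) || exit 1
		new=$(echo "Annotate $commit" |
			git commit-tree "$tree" ${parent:+-p "$parent"}) || exit 1
		# Passing the old value makes the update fail if the branch moved.
		git update-ref refs/heads/annotations "$new" ${parent:+"$parent"}
	)
	status=$?
	rm -rf "$tmp_dir"
	return $status
}
```

With that in place, "annotate HEAD note.txt" would record note.txt for
the current commit, and the lookup shown above would read it back.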


* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 23:33 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Matthias Urlichs, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <46a038f90605221623g25325e71hf3faf0a6a6ca628a@mail.gmail.com>

On Tue, 23 May 2006, Martin Langhoff wrote:
> 
> I really don't think that using the local cvs binary is a problem at
> all. In my experience, the thing is fairly fast and optimized when you
> ask it to perform file-oriented questions and that's all we do,
> really.

Fair enough. My worry was mainly that the cvs server was doing something 
stupid, but I suspect most of the fork/exec's are probably from the 
cvsimport perl script itself.

> In any case, we have it already -- parsecvs does it quite well (modulo
> memory leaks!) and I've used it several times in conjunction with
> cvsimport. Just perform the initial import with parsecvs and then
> 'track' the remote project with cvsimport.

I didn't get parsecvs working when I tried it a long time ago, and Donnie 
reported that it ran out of memory, so I didn't even really consider it. 
I'd love for it to work well, and it may be reasonable to do really big 
imports on multi-gigabyte 64-bit machines (after all, they aren't _hard_ 
to find any more, and you only need to do it once).

That said, it still seems pretty stupid to require that much memory just 
to import from CVS.

		Linus


* Re: irc usage..
From: Martin Langhoff @ 2006-05-22 23:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias Urlichs, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <46a038f90605221623g25325e71hf3faf0a6a6ca628a@mail.gmail.com>

On 5/23/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> The problem is that they lead to slightly different trees.

Sorry! s/trees/histories/ there. The trees are (or should be!) the
same, and tree differences should be addressed as bugs. Differences in
how history is parsed are unavoidable right now.

martin


* Re: Local clone/fetch with cogito is glacial
From: Petr Baudis @ 2006-05-22 23:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Sean, git
In-Reply-To: <Pine.LNX.4.64.0605221615030.3697@g5.osdl.org>

> The git-clone script will literally special-case rsync:// and http://. 
> Everything else should work fine with git-fetch-pack.

Aha, I overlooked that what I described goes on in git-clone happens
only with git-clone -l, otherwise it indeed seems to use git-fetch-pack.
Sorry about the confusion.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.


* Re: irc usage..
From: Martin Langhoff @ 2006-05-22 23:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Matthias Urlichs, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221516500.3697@g5.osdl.org>

On 5/23/06, Linus Torvalds <torvalds@osdl.org> wrote:
> I don't think the remote usability is valid, except for some really small
> repositories. The fact that it takes hours even when the CVS server is
> local doesn't bode well for doing it remotely for any but the most trivial
> things.

I really don't think that using the local cvs binary is a problem at
all. In my experience, the thing is fairly fast and optimized when you
ask it to perform file-oriented questions and that's all we do,
really.

If you want to try it, you'll see that local checkouts of large trees
(like this gentoo one) are fairly fast. Not as fast as GIT itself, but
good enough. I think Donnie has hit a bug with a bad version of cvs,
but other than that, my experience with it is that it is fairly well
behaved -- even if the tool is bad, ubiquity has led to resiliency
over the years.

> I really think it would be better to have local use be the optimized case,
> with remote being the "it's _possible_" case.

Agreed, but I think we won't see much benefit in direct parsing. And
we'll have to take the hit of double-implementation.

In any case, we have it already -- parsecvs does it quite well (modulo
memory leaks!) and I've used it several times in conjunction with
cvsimport. Just perform the initial import with parsecvs and then
'track' the remote project with cvsimport.

The problem is that they lead to slightly different trees. So their
output is not consistent, and I don't think that'll be easy to fix.

cheers,

martin


* Re: Local clone/fetch with cogito is glacial
From: Linus Torvalds @ 2006-05-22 23:18 UTC (permalink / raw)
  To: Petr Baudis; +Cc: H. Peter Anvin, Sean, git
In-Reply-To: <20060522225054.GL11941@pasky.or.cz>

On Tue, 23 May 2006, Petr Baudis wrote:
> 
> Even rsync and HTTP cg-clones? git:// and git+ssh:// fetching follows an
> almost entirely different code path and it's much more efficient since
> I just accumulate the tag object ids I want to check and then pour them
> to git-fetch-pack - I cannot do that with git-(local|http)-fetch. :-(

Sure you can. Well, not to http-fetch, but git-fetch-pack should work fine 
for a local repo.

The git-clone script will literally special-case rsync:// and http://. 
Everything else should work fine with git-fetch-pack.

		Linus
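
The transport choice described here can be sketched as a single case
statement. This is an illustration of the rule as stated in the thread,
not the actual git-clone code; the function name clone_transport and the
labels it prints are assumptions.

```shell
#!/bin/sh
# Sketch only: prints which fetch mechanism a URL would use under the
# rule "rsync:// and http:// are special-cased, everything else goes
# through git-fetch-pack".
clone_transport() {
	case "$1" in
	rsync://*) echo rsync ;;
	http://*)  echo http-fetch ;;
	*)         echo fetch-pack ;;   # git://, git+ssh://, local paths
	esac
}
```

For example, "clone_transport /home/me/repo.git" prints "fetch-pack",
which is the point being made: a plain local path needs no special
fetcher.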


* Re: [PATCH 0/2] tagsize < 8kb restriction
From: Sean @ 2006-05-22 23:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: BjEngelmann, git
In-Reply-To: <7vac99c1hv.fsf@assigned-by-dhcp.cox.net>

On Mon, 22 May 2006 12:18:04 -0700
Junio C Hamano <junkio@cox.net> wrote:

> Now, about the usage of such a long tag for your purpose.
> 
> As you noticed, commits and tags are the only types of objects
> that can refer to other commits structurally.  But there are
> cases where you do not even need nor want structural reference.
> For example, 'git cherry-pick' records the commit object name of
> the cherry-picked commit in the commit message as part of the
> text -- such a commit does not have structural reference to the
> original commit, and we would not _want_ one.  I have a strong
> suspicion that your application does not need or want structural
> reference to commits, and it might be better to merely mention
> their object names as part of the text the application produces,
> just like what 'git cherry-pick' does.

What seems to be becoming clear as more people find new ways to use
git is that many of them would be well served by having a solid
infrastructure to handle metadata.  Consider the case above: _git_
itself doesn't need a structural reference, but users and external
applications definitely need to be able to look up which metadata
is associated with any given commit.  Having a git standard for
this type of data would help.  Tags already do this, so they're
likely to be used and abused in ways not initially envisioned,
just because git doesn't have another such facility.

> Presumably you will have one such tag per commit, and by default
> 'fetch' (both cg and git) tries to follow tags, which means
> anybody who fetches new revision would automatically download
> this QA data -- that is one implication of using a tag to store
> this information.  Without knowing the nature of it, I am not
> sure if everybody who tracks the source wants such baggage.  If
> not, then use of a tag for this may not be appropriate.

Right.  It would be much nicer if it was possible to request or
ignore specific types of metadata when fetching; yet another
reason that it would be great if git had something built in
which anticipated this need.

> Another question is if the QA data expected to be amended or
> annotated later, after it is created.
> 
> If the answer is yes, then you probably would not want tags --
> you can create a new tag that points at the same commit to
> update the data, but then you have no structural relationships
> given by git between such tags that point at the same commit.
> You could infer their order by timestamp but that is about it.
> I think you are better off creating a separate QA project that
> adds one new file per commit on the main project, and have the
> file identify the commit object on the main project (either
> start your text file format for QA data with the commit object
> name, or name each such QA data file after the commit object
> name).  Then your automated procedure could scan and add a new
> file to the QA project every time a new commit is made to the
> main project, and the data in the QA project can be amended or
> annotated and the changes will be version controlled.

There are a lot of nice features with using a separate meta-data
branch.  However, you lose the ability to do lookups like you can
with tags.  A tag-like index that gave the ability to associate
commits on otherwise unrelated branches might be a way to get
the best of both worlds.  However, there will be times where
version controlled meta-data is overkill.  Just need to codify a
git-standard for meta data, so that git can help where possible.

> If the answer is no, then it is probably better to just use an
> append-only log file that textually records which entry
> corresponds to which commit in the project.  If it is not
> version controlled, and if it is not part of the main project, I
> do not see much point in putting the data under git control and
> in the same project.

It would be very nice if git gave a standard way to look up and
perhaps even display metadata.  Could add an option to git log
for example that said, show all metadata of a certain type.

There are a limitless number of examples where people want to
associate extra information with each commit.  Other SCMs call
these "attributes" or have other such names.  Given git's design
it isn't too hard to imagine offering the ability for version 
controlled (or not) and public (or not) meta-data.  Very similar
to tags, but perhaps with a few extra features.

If git already offered this feature, there'd be no need for a
flat-file ref-log; the data could be stored in a git-standard
way for metadata and gain the features of whatever tools grow
up around it, like querying, inspecting, purging etc..  All of
a sudden people would be able to look at (and perhaps even update)
their own meta data via git log/qgit/gitk/gitweb etc..   All we
need is a standard that everyone can conform with.

Sean


* Re: irc usage..
From: Martin Langhoff @ 2006-05-22 23:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Matthias Urlichs, git
In-Reply-To: <7v8xotadm3.fsf@assigned-by-dhcp.cox.net>

On 5/23/06, Junio C Hamano <junkio@cox.net> wrote:
> > I simply was too lazy to count the actual filenames' lengths. ;-)
>
> I think cvsimport predates that option, but these days that loop
> can be optimized by feeding --index-info from standard input.

Oh, yep, that'd be a good addition. I think we can also cut down on
the number of fork+exec calls (as Linus points out they are killing
us) by caching some data we should already have that we are repeatedly
asking from git-rev-parse.

Other TODOs from my reading of the code last night...

 - Switch from line-oriented reads to block reads when fetching files
from CVS. This gentoo repo has some large binary blobs in it and
we end up slurping them into memory.

 - Stop abusing globals in commit() -- pass the commit data as parameters.

 - Further profiling? Whatever we are doing, we aren't doing it fast :(

Will be trying to do those things in the next few days, don't mind if
someone jumps in as well.

martin


* Re: Current Issues #3
From: Shawn Pearce @ 2006-05-22 23:12 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0605221738090.6713@iabervon.org>

Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Mon, 22 May 2006, Junio C Hamano wrote:
> 
> > * reflog
> > 
> >   I still haven't merged this series to "next" -- I do not have
> >   much against what the code does, but I am unconvinced if it is
> >   useful.  Also, the objections raised on the list, that this can
> >   be replaced by making sure that a repository with hundreds of
> >   tags is usable, certainly have a point.
> 
> I think it would make gitweb's summary view clearer, and Linus seemed 
> interested in being able to look up what happened in the fast forward 
> which was the first of several merges in a day.
> 
> It could be replaced by a repository with hundreds of machine-readable 
> tags with code to parse dates into queries for suitable tags. But I don't 
> think there's an advantage to using the tag mechanism here, because you 
> never want to look the history up by exactly which history it is (the 
> thing that a tag ref is good for); you'll be looking for whatever reflog 
> item is the newest not after a specified time, where the specified time is 
> almost never a time that a reflog item was created.

The thing is this might also be easily represented as a structure
of tags; for example:

	refs/logs/heads/<ref>/<year>/<month>/<day> <hour>:<min>:<sec>:<seq>

where the tag is a tag of the commit which was valid in that ref
at that time.  Searching for an entry "around a particular time"
isn't that much more difficult than parsing a file, you just have
to walk backwards through the sorted directory listings then read
the tag object which matches; that tag object will point at the
tree/commit/tag which was in that ref.

What's ugly about this is simply the disk storage: a ref file is an
expensive thing (relatively speaking) on most UNIX file systems due
to the inode overhead.  If this was stored in a more compact format
(such as a GIT tree) then this would cost very little.

So the alternative that I have been mentally kicking around for
the past two days is storing the GIT_DIR/refs directory within a
standard GIT tree.  This of course would need to be an option that
gets enabled by the user as currently most tools expect the refs
directory to actually be a directory, not a tree.  The advantage here
is that unlike the proposed reflog it is a compact ref representation
which could be used by other features, such as tagging a GIT
commit with the unique name of the same change from another SCM.
Or tagging your repository on every automated build, which runs
once every 5 minutes.

-- 
Shawn.


* git-diff-tree crashes on ubuntu kernel git repository
From: Torgil Svensson @ 2006-05-22 23:09 UTC (permalink / raw)
  To: git

Hi

It seems like git-diff-tree has some problems with moved files:

$ git-diff-tree -p --stat --summary -M \
    348f179e3195448cea49c98a79cce8c7f446ce26 \
    343ca16424ba031b37e4df49afddaee098a8f347 | wc -l
*** glibc detected *** free(): invalid pointer: 0x12ecbbf0 ***
6101

As can be seen below there is some obvious error in the output just
prior to the crash:
 drivers/w1/{masters => }/ds_w1_bridge.c            |   38

This file is moved into "w1/masters" by commit
bd529cfb40c427d5b5aae0d315afb9f0a1da5e76

$ git --version
git version 1.3.3.g5e36

$ cat .git/remotes/origin
URL: git://git.kernel.org/pub/scm/linux/kernel/git/bcollins/ubuntu-2.6
Pull: refs/heads/master:refs/heads/origin

$ gdb git-diff-tree
(gdb) run -p --stat --summary -M 348f179e3195448cea49c98a79cce8c7f446ce26 343ca16424ba031b37e4df49afddaee098a8f347

<...lots of files...>

 drivers/video/w100fb.c                             |  162
 drivers/video/w100fb.h                             |  748 -
 drivers/w1/Kconfig                                 |   62
 drivers/w1/Makefile                                |   10
 drivers/w1/{masters => }/ds_w1_bridge.c            |   38
*** glibc detected *** free(): invalid pointer: 0x12ecbbf0 ***

Program received signal SIGABRT, Aborted.
0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7d7e9a1 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7d802b9 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0xb7db287a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6
#4  0xb7db8fd4 in malloc_usable_size () from /lib/tls/i686/cmov/libc.so.6
#5  0xb7db934a in free () from /lib/tls/i686/cmov/libc.so.6
#6  0x08056902 in show_stats (data=0x8deff80) at diff.c:392
#7  0x08058466 in diff_flush (options=0x80686b0) at diff.c:1999
#8  0x0805b143 in log_tree_diff_flush (opt=0x8068680) at log-tree.c:82
#9  0x08049d11 in main (argc=0, argv=0xbfcf8a14) at diff-tree.c:130
(gdb)

As shown above I can easily recreate the crash if you want more info.
Thank you for a wonderful tool.

//Torgil


* Re: Local clone/fetch with cogito is glacial
From: Petr Baudis @ 2006-05-22 23:08 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Sean, git
In-Reply-To: <4472432A.8010002@zytor.com>

Dear diary, on Tue, May 23, 2006 at 01:03:06AM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> said that...
> Petr Baudis wrote:
> >
> >Even rsync and HTTP cg-clones? git:// and git+ssh:// fetching follows an
> >almost entirely different code path and it's much more efficient since
> >I just accumulate the tag object ids I want to check and then pour them
> >to git-fetch-pack - I cannot do that with git-(local|http)-fetch. :-(
> >
> 
> No, but git-fetch-pack could operate over a local pipe just fine (after 
> all, all it does is ssh a "git-send-pack" command to the other side.)

Yes, but in that case it couldn't hardlink the objects so you would see
quite a big bump in disk usage if you have many local clones of the same
repo.

That said, hardlinking is probably not all that big an advantage if you
repack often, repack everywhere, and in the many-repositories cases it
might be more sensible to use alternates (which is what cg-clone -l
should really do instead of symlinking), so it might be well worth
the sacrifice.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.
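
The alternates mechanism mentioned above works by listing another
repository's object directory, one path per line, in
objects/info/alternates. A minimal sketch of registering one, assuming
the hypothetical helper name add_alternate and that the caller passes an
absolute path to the source object directory:

```shell
#!/bin/sh
# Sketch only: add_alternate is an illustrative helper, not a git command.
# add_alternate <git-dir-of-clone> <absolute-path-to-source-objects-dir>
add_alternate() {
	git_dir=$1
	objects=$2
	mkdir -p "$git_dir/objects/info" || return 1
	alt=$git_dir/objects/info/alternates
	# Do nothing if this object directory is already listed.
	if test -f "$alt" && grep -qxF -e "$objects" "$alt"; then
		return 0
	fi
	printf '%s\n' "$objects" >>"$alt"
}
```

Compared with hardlinking, the clone then stores no copy of the shared
objects at all, so repacking the source does not inflate disk usage in
the clones.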


* Re: Local clone/fetch with cogito is glacial
From: H. Peter Anvin @ 2006-05-22 23:03 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Sean, git
In-Reply-To: <20060522225054.GL11941@pasky.or.cz>

Petr Baudis wrote:
> 
> Even rsync and HTTP cg-clones? git:// and git+ssh:// fetching follows an
> almost entirely different code path and it's much more efficient since
> I just accumulate the tag object ids I want to check and then pour them
> to git-fetch-pack - I cannot do that with git-(local|http)-fetch. :-(
> 

No, but git-fetch-pack could operate over a local pipe just fine (after all, all it does 
is ssh a "git-send-pack" command to the other side.)

	-hpa


* Re: Local clone/fetch with cogito is glacial
From: Petr Baudis @ 2006-05-22 22:50 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Sean, git
In-Reply-To: <447239F0.9030705@zytor.com>

Dear diary, on Tue, May 23, 2006 at 12:23:44AM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> said that...
> Petr Baudis wrote:
> >git-clone has an advantage here since it clones _everything_ while
> >Cogito fetches only stuff related to the branch you are cloning, and
> >verifying if what it fetches is sensible for you unfortunately takes a
> >lot of time. :/ I guess there is no way to verify presence of multiple
> >objects at once and there is also no way to order local fetch of
> >multiple objects at once.
> 
> Note that non-local cg-clones are at least an order of magnitude faster, 
> even when the nonlocal is just git+ssh:.  One could presumably do the same 
> thing over a pipe.

Even rsync and HTTP cg-clones? git:// and git+ssh:// fetching follows an
almost entirely different code path and it's much more efficient since
I just accumulate the tag object ids I want to check and then pour them
to git-fetch-pack - I cannot do that with git-(local|http)-fetch. :-(

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.


* Re: irc usage..
From: Junio C Hamano @ 2006-05-22 22:39 UTC (permalink / raw)
  To: Matthias Urlichs; +Cc: git
In-Reply-To: <20060522214128.GE16677@kiste.smurf.noris.de>

Matthias Urlichs <smurf@smurf.noris.de> writes:

> Hi,
>
> Linus Torvalds:
>> I wonder why those "git-update-index" calls seem to be (assuming I read 
>> the perl correctly) done only a few files at a time. We can do hundreds 
>> in one go, but it seems to want to do just ten files or something at the 
>> same time.
>
> No, fifty.
>
> I simply was too lazy to count the actual filenames' lengths. ;-)

I think cvsimport predates that option, but these days that loop
can be optimized by feeding --index-info from standard input.
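
A sketch of what feeding --index-info could look like: each input record
is "<mode> SP <sha1> TAB <path>", so one git-update-index process can
take any number of files. The helper name index_info_line is
hypothetical, and how cvsimport would produce the mode, blob ID, and path
for each file is left out.

```shell
#!/bin/sh
# Sketch only: index_info_line is an illustrative helper.
# index_info_line <mode> <sha1> <path>
# Emits one record in the format read by "git update-index --index-info".
index_info_line() {
	printf '%s %s\t%s\n' "$1" "$2" "$3"
}

# Usage sketch (not run here): a single update-index for the whole batch,
# assuming "list" holds "<mode> <sha1> <path>" per line.
#   while read -r mode sha path; do
#       index_info_line "$mode" "$sha" "$path"
#   done <list | git update-index --index-info
```

This replaces one fork+exec per fifty files with one per commit, which
is where the thread expects the time to go.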


* Re: Local clone/fetch with cogito is glacial
From: H. Peter Anvin @ 2006-05-22 22:23 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Sean, git
In-Reply-To: <20060522220206.GA10488@pasky.or.cz>

Petr Baudis wrote:
> 
> What about incremental fetches using git-fetch? From a quick scan of the
> git-fetch automagic tags following code, it seems to be even
> significantly more expensive than Cogito's (in terms of number of
> forks).
> 

Well, I haven't used git-fetch, so I can't comment on that one.

> git-clone has an advantage here since it clones _everything_ while
> Cogito fetches only stuff related to the branch you are cloning, and
> verifying if what it fetches is sensible for you unfortunately takes a
> lot of time. :/ I guess there is no way to verify presence of multiple
> objects at once and there is also no way to order local fetch of
> multiple objects at once.

Note that non-local cg-clones are at least an order of magnitude faster, even when the 
nonlocal is just git+ssh:.  One could presumably do the same thing over a pipe.

	-hpa


* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 22:18 UTC (permalink / raw)
  To: Matthias Urlichs
  Cc: Martin Langhoff, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <20060522214128.GE16677@kiste.smurf.noris.de>

On Mon, 22 May 2006, Matthias Urlichs wrote:
> 
> The beast *was* mainly written to do this remotely...

I don't think the remote usability is valid, except for some really small 
repositories. The fact that it takes hours even when the CVS server is 
local doesn't bode well for doing it remotely for any but the most trivial 
things.

I really think it would be better to have local use be the optimized case, 
with remote being the "it's _possible_" case.

		Linus


* Re: irc usage..
From: Matthias Urlichs @ 2006-05-22 21:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221256090.3697@g5.osdl.org>


Hi,

Linus Torvalds:
> I wonder why those "git-update-index" calls seem to be (assuming I read 
> the perl correctly) done only a few files at a time. We can do hundreds 
> in one go, but it seems to want to do just ten files or something at the 
> same time.

No, fifty.

I simply was too lazy to count the actual filenames' lengths. ;-)

> That thing would probably be an order of magnitude faster if written to 
> use the git library interfaces directly. Of course, the CVS part is 
> probably a big overhead, so it might not help much 

The beast *was* mainly written to do this remotely...

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
The worst form of inequality is to try to make unequal things equal.
					-- Aristotle



* Re: Current Issues #3
From: Carl Worth @ 2006-05-22 22:02 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0605221738090.6713@iabervon.org>


On Mon, 22 May 2006 17:54:28 -0400 (EDT), Daniel Barkalow wrote:
> On Mon, 22 May 2006, Junio C Hamano wrote:
> 
> > * reflog

Am I the only one that read that as re-flog rather than ref-log the
first time?

-Carl



* Re: Local clone/fetch with cogito is glacial
From: Petr Baudis @ 2006-05-22 22:02 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Sean, git
In-Reply-To: <44722A8F.9020609@zytor.com>

Dear diary, on Mon, May 22, 2006 at 11:18:07PM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> said that...
> Sean wrote:
> >On Sun, 21 May 2006 16:47:45 -0700
> >"H. Peter Anvin" <hpa@zytor.com> wrote:
> >
> >>It appears that doing a *local* -- meaning using a file path or file URL 
> >>-- clone or fetch with cogito is just glacial when the repository has an 
> >>even moderate number of tags (and it's fetching the tags that takes all 
> >>the time.)  That's a really serious problem for me.
> >>
> >
> >Peter, does git clone work acceptably for you?
> >
> 
> Well, it does, except it doesn't set up the cogito branches (which one can 
> of course copy manually.)

What about incremental fetches using git-fetch? From a quick scan of the
git-fetch automagic tags following code, it seems to be even
significantly more expensive than Cogito's (in terms of number of
forks).

git-clone has an advantage here since it clones _everything_ while
Cogito fetches only stuff related to the branch you are cloning, and
verifying if what it fetches is sensible for you unfortunately takes a
lot of time. :/ I guess there is no way to verify presence of multiple
objects at once and there is also no way to order local fetch of
multiple objects at once.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.


* Re: Current Issues #3
From: Daniel Barkalow @ 2006-05-22 21:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v8xoue9eo.fsf@assigned-by-dhcp.cox.net>

On Mon, 22 May 2006, Junio C Hamano wrote:

> * reflog
> 
>   I still haven't merged this series to "next" -- I do not have
>   much against what the code does, but I am unconvinced if it is
>   useful.  Also, the objections raised on the list, that this can
>   be replaced by making sure that a repository with hundreds of
>   tags is usable, certainly have a point.

I think it would make gitweb's summary view clearer, and Linus seemed 
interested in being able to look up what happened in the fast forward 
which was the first of several merges in a day.

It could be replaced by a repository with hundreds of machine-readable 
tags with code to parse dates into queries for suitable tags. But I don't 
think there's an advantage to using the tag mechanism here, because you 
never want to look the history up by exactly which history it is (the 
thing that a tag ref is good for); you'll be looking for whatever reflog 
item is the newest not after a specified time, where the specified time is 
almost never a time that a reflog item was created.

	-Daniel
*This .sig left intentionally blank*
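
The query described above, "the newest item not after a specified time",
can be sketched over a plain log. The input format of one
"<unix-time> <sha1>" pair per line and the function name
newest_not_after are assumptions for illustration, not the proposed
reflog format.

```shell
#!/bin/sh
# Sketch only: the log format and helper name are illustrative.
# newest_not_after <unix-time>
# Reads "<unix-time> <sha1>" lines on stdin and prints the sha1 of the
# newest entry whose time is <= the given time (nothing if none match).
newest_not_after() {
	awk -v limit="$1" '
		$1 <= limit && (best == "" || $1 >= best_t) {
			best_t = $1
			best = $2
		}
		END { if (best != "") print best }
	'
}
```

The query time almost never equals an entry's own timestamp, so this is
a range scan rather than an exact-name lookup, which is the argument
against using tag refs for it.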


* Re: irc usage..
From: Donnie Berkholz @ 2006-05-22 21:48 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221312380.3697@g5.osdl.org>


Linus Torvalds wrote:
> The latest stable CVS release is 1.11.21, I think: you seem to be running 
> the "development" version (1.12.x).

Backed down to the 1.11 series, things seem to be going fine so far.

Thanks,
Donnie




* Re: Local clone/fetch with cogito is glacial
From: H. Peter Anvin @ 2006-05-22 21:18 UTC (permalink / raw)
  To: Sean; +Cc: git
In-Reply-To: <BAYC1-PASMTP11FDE05B530CFF43C043E5AE9A0@CEZ.ICE>

Sean wrote:
> On Sun, 21 May 2006 16:47:45 -0700
> "H. Peter Anvin" <hpa@zytor.com> wrote:
> 
>> It appears that doing a *local* -- meaning using a file path or file URL 
>> -- clone or fetch with cogito is just glacial when the repository has an 
>> even moderate number of tags (and it's fetching the tags that takes all 
>> the time.)  That's a really serious problem for me.
>>
> 
> Peter, does git clone work acceptably for you?
> 

Well, it does, except it doesn't set up the cogito branches (which one can of course copy 
manually.)

cg-clone probably should be rewritten as a thin wrapper around git-clone.

	-hpa


* [PATCH] git status: ignore empty directories (because they cannot be added)
From: Matthias Lederhofer @ 2006-05-22 21:02 UTC (permalink / raw)
  To: git

and a new option -u / --untracked-files to show files in untracked
directories.

---
A few things I'm not sure about:
- Should there be another option to disable --no-empty-directory?
- Is the option name --untracked-files ok?
- Should it be documented (probably yes :))? At the moment the
  git-status man page does not tell about any command line option at
  all but for git-commit it does not make sense.

 git-commit.sh |   17 ++++++++++++++---
 1 files changed, 14 insertions(+), 3 deletions(-)

---

1921592d5e7809f72a902cca1a38217b150800a9
diff --git a/git-commit.sh b/git-commit.sh
index 6ef1a9d..6785826 100755
--- a/git-commit.sh
+++ b/git-commit.sh
@@ -3,7 +3,7 @@ #
 # Copyright (c) 2005 Linus Torvalds
 # Copyright (c) 2006 Junio C Hamano
 
-USAGE='[-a] [-s] [-v] [--no-verify] [-m <message> | -F <logfile> | (-C|-c) <commit>) [--amend] [-e] [--author <author>] [[-i | -o] <path>...]'
+USAGE='[-a] [-s] [-v] [--no-verify] [-m <message> | -F <logfile> | (-C|-c) <commit>] [-u] [--amend] [-e] [--author <author>] [[-i | -o] <path>...]'
 SUBDIRECTORY_OK=Yes
 . git-sh-setup
 
@@ -134,13 +134,17 @@ #'
 	report "Changed but not updated" \
 	    "use git-update-index to mark for commit"
 
+        option=""
+        if test -z "$untracked_files"; then
+            option="--directory --no-empty-directory"
+        fi
 	if test -f "$GIT_DIR/info/exclude"
 	then
-	    git-ls-files -z --others --directory \
+	    git-ls-files -z --others $option \
 		--exclude-from="$GIT_DIR/info/exclude" \
 		--exclude-per-directory=.gitignore
 	else
-	    git-ls-files -z --others --directory \
+	    git-ls-files -z --others $option \
 		--exclude-per-directory=.gitignore
 	fi |
 	perl -e '$/ = "\0";
@@ -203,6 +207,7 @@ verbose=
 signoff=
 force_author=
 only_include_assumed=
+untracked_files=
 while case "$#" in 0) break;; esac
 do
   case "$1" in
@@ -340,6 +345,12 @@ do
       verbose=t
       shift
       ;;
+  -u|--u|--un|--unt|--untr|--untra|--untrac|--untrack|--untracke|--untracked|\
+  --untracked-|--untracked-f|--untracked-fi|--untracked-fil|--untracked-file|\
+  --untracked-files)
+      untracked_files=t
+      shift
+      ;;
   --)
       shift
       break
-- 
1.3.2


* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 20:33 UTC (permalink / raw)
  To: Martin Langhoff
  Cc: Donnie Berkholz, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221256090.3697@g5.osdl.org>

On Mon, 22 May 2006, Linus Torvalds wrote:
> 
> Of course, the CVS part is probably a big overhead, so it might not help 
> much (I would not be surprised at all if a number of the fork/exec/exit 
> things are due to the CVS server starting RCS or something, not due to 
> git-cvsimport itself)

Ahh. stracing the CVS server seems to imply that it forks off a subprocess 
for every command. It doesn't actually execute any external program, but 
just does a fork + muck around in the ,v files + exit.

Maybe one of the changes in the 1.12.x versions is to not do that, which 
might explain why Donnie seems to see much better performance, but also 
sees all the memory leakage?

		Linus


* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 20:20 UTC (permalink / raw)
  To: Donnie Berkholz
  Cc: Martin Langhoff, Yann Dirson, Git Mailing List, Matthias Urlichs,
	Johannes Schindelin
In-Reply-To: <447215D4.5020403@gentoo.org>

On Mon, 22 May 2006, Donnie Berkholz wrote:
>
> Linus Torvalds wrote:
> > Hmm. My cvs server doesn't really grow at all. It's at 13M RSS.
> 
> Yeah, that's the thing. RSS stayed about the same (according to top),
> but virtual just kept growing.

Not for me. The virtual size is certainly bigger than RSS, but not by a 
huge amount. So this might be a regression in CVS, since you seem to have 
a newer version than I do.

The latest stable CVS release is 1.11.21, I think: you seem to be running 
the "development" version (1.12.x).

			Linus

