Git development

* Re: Git 1.3.2 on Solaris
From: Stefan Pfetzing @ 2006-05-23  3:20 UTC
  To: Git Mailing List
In-Reply-To: <6471.1147883724@lotus.CS.Berkeley.EDU>

Hi Jason,

2006/5/17, Jason Riedy <ejr@eecs.berkeley.edu>:
> And pkgsrc itself works just fine without the silly g prefix,
> or at least does for me as a mere user (and as well as it does
> work).  But if you intend on adding the package upstream, it'll
> need something to cope with the g.  And pkgsrc handles local
> patches...

Well, I had some problems on NetBSD without the g prefix for the
GNU coreutils; since then I have always used that prefix.

But now I have a completely different problem with the tests on
Solaris. It seems that on Solaris access() always returns 0 if the
file exists and the caller is root (uid 0).

So:
--- snip ---
#include <stdio.h>
#include <unistd.h>

int
main (int argc, char **argv)
{
  /* X_OK asks "may the caller execute this file?"; /etc/motd has no x bits */
  printf ("access: %d\n", access("/etc/motd", X_OK));
  return 0;
}
--- snap ---

will print "access: 0" on Solaris when run as root, even though
/etc/motd is not executable. This seems to break hooks on Solaris, but
I'm not sure whether this is only a Solaris Express bug (I have no
Solaris 10 system to verify it).
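A possible workaround, sketched here as an assumption rather than anything git does: as far as I know, POSIX allows access() to report success for X_OK to a privileged process even when no execute bit is set, so a hook runner that must behave the same for root can inspect the mode bits itself. is_executable_file() is a hypothetical helper name.

```c
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path is a regular file with at least
 * one execute bit set, 0 otherwise (including when stat() fails).
 * Unlike access(path, X_OK), the answer does not change when the caller
 * is root, because only the mode bits are consulted. */
static int is_executable_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) < 0)
        return 0;
    if (!S_ISREG(st.st_mode))
        return 0;
    return (st.st_mode & (S_IXUSR | S_IXGRP | S_IXOTH)) != 0;
}
```

A caller would test is_executable_file(path) before running a hook, in place of access(path, X_OK) == 0. This only looks at the permission bits; it does not check whether the current user in particular may execute the file.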

bye

Stefan

-- 
       http://www.dreamind.de/
Oroborus and Debian GNU/Linux Developer.

* Re: [PATCH] cvsimport: introduce -L<imit> option to workaround memory leaks
From: Martin Langhoff (CatalystIT) @ 2006-05-23  3:15 UTC
  To: Linus Torvalds
  Cc: Git Mailing List, Junio C Hamano, Johannes.Schindelin, spyderous,
	smurf
In-Reply-To: <Pine.LNX.4.64.0605221926270.3697@g5.osdl.org>

Linus Torvalds wrote:

> 
> This stupid patch on top of yours seems to make git happier. It's 
> disgusting, I know, but it just repacks things every kilo-commit.
> 
> I actually think that I found a real ext3 performance bug from trying to 
> determine why git sometimes slows down ridiculously when the tree has been 
> allowed to go too long without a repack.
> 

Acked (in case anyone cares for such an obvious one), and thanks! I 
thought of doing that last night together with that exact patch, but I 
was focussing on the leak.

cheers,


m
-- 
-----------------------------------------------------------------------
Martin @ Catalyst .Net .NZ  Ltd, PO Box 11-053, Manners St,  Wellington
WEB: http://catalyst.net.nz/           PHYS: Level 2, 150-154 Willis St
OFFICE: +64(4)916-7224                              MOB: +64(21)364-017
       Make things as simple as possible, but no simpler - Einstein
-----------------------------------------------------------------------

* Re: [PATCH] cvsimport: introduce -L<imit> option to workaround memory leaks
From: Linus Torvalds @ 2006-05-23  2:28 UTC
  To: Martin Langhoff
  Cc: Git Mailing List, Junio C Hamano, Johannes.Schindelin, spyderous,
	smurf
In-Reply-To: <11482978883713-git-send-email-martin@catalyst.net.nz>



This stupid patch on top of yours seems to make git happier. It's 
disgusting, I know, but it just repacks things every kilo-commit.

I actually think that I found a real ext3 performance bug from trying to 
determine why git sometimes slows down ridiculously when the tree has been 
allowed to go too long without a repack.

		Linus

---
diff --git a/git-cvsimport.perl b/git-cvsimport.perl
index fb56278..c141f5e 100755
--- a/git-cvsimport.perl
+++ b/git-cvsimport.perl
@@ -853,10 +853,14 @@ #	VERSION:1.96->1.96.2.1
 	} elsif($state == 9 and /^\s*$/) {
 		$state = 10;
 	} elsif(($state == 9 or $state == 10) and /^-+$/) {
-		if ($opt_L && $commitcount++ >= $opt_L) {
+		$commitcount++;
+		if ($opt_L && $commitcount > $opt_L) {
 			last;
 		}
 		commit();
+		if (($commitcount & 1023) == 0) {
+			system("git repack -a -d");
+		}
 		$state = 1;
 	} elsif($state == 11 and /^-+$/) {
 		$state = 1;
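Why the increment moved out of the condition: in the old line, $commitcount++ sits to the right of &&, so it is never evaluated when -L is not given and the count stays at 0, which the new every-1024-commits repack check could not use. Incrementing first and comparing with > keeps the same cut-off (old value >= L is the same test as new value > L), and (n & 1023) == 0 is n % 1024 == 0. The loop shape below is a C sketch, not cvsimport itself; commit() and repack() are hypothetical stand-ins that only count calls, and limit plays the role of $opt_L with 0 meaning no limit.

```c
/* Hypothetical stand-ins for cvsimport's commit() and "git repack -a -d". */
static int commits_made, repacks_run;
static void commit(void) { commits_made++; }
static void repack(void) { repacks_run++; }

/* total: number of commits found in the CVS log; limit: like $opt_L. */
static void import_commits(int total, int limit)
{
    int commitcount = 0;

    for (int i = 0; i < total; i++) {
        /* Count unconditionally, so the repack check below sees real
         * numbers even when no limit was requested. */
        commitcount++;
        if (limit && commitcount > limit)
            break;              /* Perl's "last" */
        commit();
        /* Power-of-two mask: true after commits 1024, 2048, ... */
        if ((commitcount & 1023) == 0)
            repack();
    }
}
```

With 3000 commits and no limit, this sketch commits 3000 times and repacks twice, after commits 1024 and 2048.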

* Re: [PATCH] Change GIT-VERSION-GEN to call git commands with "git" not "git-".
From: Junio C Hamano @ 2006-05-23  1:20 UTC
  To: git
In-Reply-To: <BAYC1-PASMTP09B22AA86724B4F2C01F7FAE9A0@CEZ.ICE>

Sean <seanlkml@sympatico.ca> writes:

> GIT-VERSION-GEN can incorrectly return a default version of
> "v1.3.GIT" because it tries to execute git commands using the
> "git-cmd" format that expects all git commands to be in the $PATH.
> Convert these to  "git cmd" format so that a proper answer is
> returned even when the git commands have been moved out of the
> $PATH and into a $gitexecdir.

IIRC, the reason we spelled it "git-describe" with a dash is that
the ancient git wrapper, when given "describe" (which it did not
understand), said "not a git command" without failing.

I think it has been long enough since we introduced "git describe",
so this would be OK. 

* Re: [PATCH] git status: ignore empty directories (because they cannot be added)
From: Junio C Hamano @ 2006-05-23  1:11 UTC
  To: Matthias Lederhofer; +Cc: git
In-Reply-To: <E1FiHXS-0008MC-LB@moooo.ath.cx>

Matthias Lederhofer <matled@gmx.net> writes:

> and a new option -u / --untracked-files to show files in untracked
> directories.
>
> ---
> A few things I'm not sure about:
> - Should there be another option to disable --no-empty-directory?

I am not sure about this.  We used to list everything inside an
untracked directory, which was distracting, and that was the reason
we added --directory there.  Maybe it would
be less confusing if we just updated the message

	    print "#\n# Untracked files:\n";
	    print "#   (use \"git add\" to add to commit)\n";
	    print "#\n";

to say "use 'git add' on these files and files in these
directories you wish to add", or something silly like that,
without this patch?

* Re: [PATCH] Avoid segfault in diff --stat rename output.
From: Junio C Hamano @ 2006-05-23  1:02 UTC
  To: Sean; +Cc: git
In-Reply-To: <BAYC1-PASMTP115C9137E5BDABD705881BAE9B0@CEZ.ICE>

Sean <seanlkml@sympatico.ca> writes:

> Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>
> ---
>  diff.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 7f35e59..a7bb9b9 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -237,7 +237,7 @@ static char *pprint_rename(const char *a
>  		if (a_midlen < 0) a_midlen = 0;
>  		if (b_midlen < 0) b_midlen = 0;
>  
> -		name = xmalloc(len_a + len_b - pfx_length - sfx_length + 7);
> +		name = xmalloc(pfx_length + a_midlen + b_midlen + sfx_length + 7);
>  		sprintf(name, "%.*s{%.*s => %.*s}%s",

Obviously correct given what the sprintf() that immediately
follows does.  Sheesh, what was I smoking back then.  *BLUSH*
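To see why the old size was short, take the rename from Torgil's report, assuming pfx_length is 11 ("drivers/w1/") and sfx_length is 15 ("/ds_w1_bridge.c"), as the printed "drivers/w1/{masters => }/ds_w1_bridge.c" suggests. With len_a = 33 and len_b = 25, b_midlen = 25 - 11 - 15 = -1, which is clamped to 0. The old formula allocates 33 + 25 - 11 - 15 + 7 = 39 bytes, but the output is 39 characters plus a NUL, so one byte lands past the end. In general len_a + len_b - pfx_length - sfx_length undercounts by exactly the amount clamped away. A C sketch of only the allocation arithmetic: format_rename() is hypothetical, it takes the prefix and suffix lengths as parameters instead of computing them, and printing the suffix from a + len_a - sfx_length is my assumption.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *format_rename(const char *a, const char *b,
                           int pfx_length, int sfx_length)
{
    int len_a = (int)strlen(a);
    int len_b = (int)strlen(b);
    int a_midlen = len_a - pfx_length - sfx_length;
    int b_midlen = len_b - pfx_length - sfx_length;
    size_t size;
    char *name;

    /* Prefix and suffix may overlap in the shorter name. */
    if (a_midlen < 0) a_midlen = 0;
    if (b_midlen < 0) b_midlen = 0;

    /* Size from what is actually printed: "{", " => " and "}" add
     * 6 characters, plus 1 for the terminating NUL. */
    size = (size_t)(pfx_length + a_midlen + b_midlen + sfx_length + 7);
    name = malloc(size);
    if (!name)
        return NULL;
    snprintf(name, size, "%.*s{%.*s => %.*s}%s",
             pfx_length, a,
             a_midlen, a + pfx_length,
             b_midlen, b + pfx_length,
             a + len_a - sfx_length);
    return name;
}
```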

Thanks.

* [PATCH] Avoid segfault in diff --stat rename output.
From: Sean @ 2006-05-23  0:36 UTC
  To: Torgil Svensson; +Cc: git
In-Reply-To: <e7bda7770605221609h7c18c2ccpe92db34050d46f9f@mail.gmail.com>


Signed-off-by: Sean Estabrooks <seanlkml@sympatico.ca>
---
 diff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

On Tue, 23 May 2006 01:09:43 +0200
"Torgil Svensson" <torgil.svensson@gmail.com> wrote:

> Hi
> 
> It seems like git-diff-tree has some problems with moved files:
> 
> $ git-diff-tree -p --stat --summary -M
> 348f179e3195448cea49c98a79cce8c7f446ce26
> 343ca16424ba031b37e4df49afddaee098a8f347 | wc -l
> *** glibc detected *** free(): invalid pointer: 0x12ecbbf0 ***
> 6101


diff --git a/diff.c b/diff.c
index 7f35e59..a7bb9b9 100644
--- a/diff.c
+++ b/diff.c
@@ -237,7 +237,7 @@ static char *pprint_rename(const char *a
 		if (a_midlen < 0) a_midlen = 0;
 		if (b_midlen < 0) b_midlen = 0;
 
-		name = xmalloc(len_a + len_b - pfx_length - sfx_length + 7);
+		name = xmalloc(pfx_length + a_midlen + b_midlen + sfx_length + 7);
 		sprintf(name, "%.*s{%.*s => %.*s}%s",
 			pfx_length, a,
 			a_midlen, a + pfx_length,
-- 
1.3.GIT

* cogito testsuite failure - git-read-tree too careful
From: Pavel Roskin @ 2006-05-23  0:28 UTC
  To: Petr Baudis, git

Hello, Petr!

The testsuite for cogito fails in t9300-seek.sh for up-to-date master
branches of git and Cogito.

$ ./t9300-seek.sh -v -i
* expecting success: cg-seek cbd273f56aecaaf28b857ae74da77cbb11a4d659
Warning: uncommitted local changes, trying to bring them along
fatal: Entry 'newdir/newfile' not uptodate. Cannot merge.
cg-seek: cbd273f56aecaaf28b857ae74da77cbb11a4d659: bad commit
* FAIL 21: seeking to the first commit
        cg-seek cbd273f56aecaaf28b857ae74da77cbb11a4d659

As I understand it, "git-read-tree -m" in cg-Xlib refuses to merge if
there are local changes.  This was likely caused by commit
fcc387db9bc453dc7e07a262873481af2ee9e5c8:

read-tree -m -u: do not overwrite or remove untracked working tree
files.

I guess git-read-tree should be using "--reset" or something to restore
the original behavior.

The "tutorial" testsuite also fails:

Should not be doing an Octopus.
No merge strategy handled the merge.
Merging 5de8995e58b4b478dff476788c3607ed5021fc24 ->
ba8b9edd80500d60d68a6630ee415a3e710f6db2
        to a60f36f73018dc1959d8d2cbd28271f93ee5f686 ...
fatal: Untracked working tree file 'stack.h' would be overwritten by
merge.
cg-merge: git-read-tree failed (merge likely blocked by local changes)
162
?
Unexpected error 4 on line 242

In this case, we may want cg-merge to fail, because it's wrong to
overwrite local files without backing them up.

-- 
Regards,
Pavel Roskin

* Re: [PATCH 0/2] tagsize < 8kb restriction
From: Linus Torvalds @ 2006-05-23  0:02 UTC
  To: Sean; +Cc: Junio C Hamano, BjEngelmann, git
In-Reply-To: <BAYC1-PASMTP1164FE2A24B4D1B4C0A607AE9A0@CEZ.ICE>

On Mon, 22 May 2006, Sean wrote:
> What seems to be becoming clear as more people find new ways to use
> git is that many of them would be well served by having a solid
> infrastructure to handle metadata.  Consider the case above: _git_
> itself doesn't need a structural reference, but users and external
> applications definitely need to be able to lookup which metadata
> is associated with any given commit.  Having a git standard for
> this type of data would help.  Tags already do this, so they're
> likely to be used and abused in ways not initially envisioned,
> just because git doesn't have another such facility.

I definitely think we should allow arbitrary tags.

That said, I think that what you actually want to do may be totally 
different.

If _each_ commit has some extra information associated with it, you don't 
want to create a tag that points to the commit, you more likely want to 
create an object that is indexed by the commit ID rather than the other 
way around.

IOW, I _think_ that what you described would be that if you have the 
commit ID, you want to find the data based on that ID. No?

And that you can do quite easily, while _also_ using git to distribute the 
extra per-commit meta-data. Just create a separate branch that has the 
data indexed by commit ID. That could be as simple as having one file per 
commit (using, perhaps, a similar directory layout as the .git/objects/ 
directory itself), and then you could do something like

	# Get the SHA1 of the named commit
	commit=$(git-rev-parse --verify "$cmitname"^0)

	# turn it into a filename (slash between two first chars and the rest)
	filename=$(echo $commit | sed 's:^\(..\)\(.*\):\1/\2:')

	# look it up in the "annotations" branch
	git cat-file blob "annotations:$filename"

which gets the data from the "annotations" branch, indexed by the SHA1 
name.
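The filename step in that script splits the 40-hex-digit commit ID after its first two characters, the same two-level fan-out that .git/objects/ uses. The same mapping as a C sketch; annotation_path() is a hypothetical helper, not part of git, and it does not check that the input really is hex.

```c
#include <stdio.h>
#include <string.h>

/* Map "348f17..." to "34/8f17...", like the sed expression in the script.
 * Needs 40 hex digits + '/' + NUL = 42 bytes of output space.
 * Returns 0 on success, -1 on a wrong-length ID or a too-small buffer. */
static int annotation_path(const char *sha1_hex, char *out, size_t outlen)
{
    if (strlen(sha1_hex) != 40 || outlen < 42)
        return -1;
    snprintf(out, outlen, "%.2s/%s", sha1_hex, sha1_hex + 2);
    return 0;
}
```

The resulting string is what would follow "annotations:" in the git cat-file blob argument.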

Now, everybody can track your "annotations" branch using git, and get your 
per-commit annotations for the main branch.

See?

The real advantage of tags is that you can use them for the SHA1 
expressions, and follow them automatically. If that's what you want (ie 
you don't want to index things by the commit SHA1, but by some external 
name, like the name the commit had in some other repository), then by all 
means use tags. But if you just want to associate some data with each 
commit, the above "separate branch for annotations" approach is much more 
efficient.

		Linus

* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 23:33 UTC
  To: Martin Langhoff
  Cc: Matthias Urlichs, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <46a038f90605221623g25325e71hf3faf0a6a6ca628a@mail.gmail.com>

On Tue, 23 May 2006, Martin Langhoff wrote:
> 
> I really don't think that using the local cvs binary is a problem at
> all. In my experience, the thing is fairly fast and optimized when you
> ask it to perform file-oriented questions and that's all we do,
> really.

Fair enough. My worry was mainly that the cvs server was doing something 
stupid, but I suspect most of the fork/exec's are probably from the 
cvsimport perl script itself.

> In any case, we have it already -- parsecvs does it quite well (modulo
> memory leaks!) and I've used it several times in conjunction with
> cvsimport. Just perform the initial import with parsecvs and then
> 'track' the remote project with cvsimport.

I didn't get parsecvs working when I tried it a long time ago, and Donnie 
reported that it ran out of memory, so I didn't even really consider it. 
I'd love for it to work well, and it may be reasonable to do really big 
imports on multi-gigabyte 64-bit machines (after all, they aren't _hard_ 
to find any more, and you only need to do it once).

That said, it still seems pretty stupid to require that much memory just 
to import from CVS.

		Linus

* Re: irc usage..
From: Martin Langhoff @ 2006-05-22 23:29 UTC
  To: Linus Torvalds
  Cc: Matthias Urlichs, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <46a038f90605221623g25325e71hf3faf0a6a6ca628a@mail.gmail.com>

On 5/23/06, Martin Langhoff <martin.langhoff@gmail.com> wrote:
> The problem is that they lead to slightly different trees.

Sorry! s/trees/histories/ there. The trees are (or should be!) the
same, and tree differences should be addressed as bugs. Differences in
how history is parsed are unavoidable right now.

martin

* Re: Local clone/fetch with cogito is glacial
From: Petr Baudis @ 2006-05-22 23:24 UTC
  To: Linus Torvalds; +Cc: H. Peter Anvin, Sean, git
In-Reply-To: <Pine.LNX.4.64.0605221615030.3697@g5.osdl.org>

> The git-clone script will literally special-case rsync:// and http://. 
> Everything else should work fine with git-fetch-pack.

Aha, I overlooked that what I described as going on in git-clone
happens only with git-clone -l; otherwise it does indeed seem to use
git-fetch-pack. Sorry about the confusion.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: irc usage..
From: Martin Langhoff @ 2006-05-22 23:23 UTC
  To: Linus Torvalds
  Cc: Matthias Urlichs, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221516500.3697@g5.osdl.org>

On 5/23/06, Linus Torvalds <torvalds@osdl.org> wrote:
> I don't think the remote usability is valid, except for some really small
> repositories. The fact that it takes hours even when the CVS server is
> local doesn't bode well for doing it remotely for any but the most trivial
> things.

I really don't think that using the local cvs binary is a problem at
all. In my experience, the thing is fairly fast and optimized when you
ask it to perform file-oriented questions and that's all we do,
really.

If you want to try it, you'll see that local checkouts of large trees
(like this gentoo one) are fairly fast. Not as fast as GIT itself, but
good enough. I think Donnie has hit a bug with a bad version of cvs,
but other than that, my experience with it is that it is fairly well
behaved -- even if the tool is bad, ubiquity has led to resiliency
over the years.

> I really think it would be better to have local use be the optimized case,
> with remote being the "it's _possible_" case.

Agreed, but I think we won't see much benefit in direct parsing. And
we'll have to take the hit of double-implementation.

In any case, we have it already -- parsecvs does it quite well (modulo
memory leaks!) and I've used it several times in conjunction with
cvsimport. Just perform the initial import with parsecvs and then
'track' the remote project with cvsimport.

The problem is that they lead to slightly different trees. So their
output is not consistent, and I don't think that'll be easy to fix.

cheers,

martin

* Re: Local clone/fetch with cogito is glacial
From: Linus Torvalds @ 2006-05-22 23:18 UTC
  To: Petr Baudis; +Cc: H. Peter Anvin, Sean, git
In-Reply-To: <20060522225054.GL11941@pasky.or.cz>

On Tue, 23 May 2006, Petr Baudis wrote:
> 
> Even rsync and HTTP cg-clones? git:// and git+ssh:// fetching follows an
> almost entirely different code path and it's much more efficient since
> I just accumulate the tag object ids I want to check and then pour them
> to git-fetch-pack - I cannot do that with git-(local|http)-fetch. :-(

Sure you can. Well, not to http-fetch, but git-fetch-pack should work fine 
for a local repo.

The git-clone script will literally special-case rsync:// and http://. 
Everything else should work fine with git-fetch-pack.

		Linus

* Re: [PATCH 0/2] tagsize < 8kb restriction
From: Sean @ 2006-05-22 23:12 UTC
  To: Junio C Hamano; +Cc: BjEngelmann, git
In-Reply-To: <7vac99c1hv.fsf@assigned-by-dhcp.cox.net>

On Mon, 22 May 2006 12:18:04 -0700
Junio C Hamano <junkio@cox.net> wrote:

> Now, about the usage of such a long tag for your purpose.
> 
> As you noticed, commits and tags are the only types of objects
> that can refer to other commits structurally.  But there are
> cases where you do not even need nor want structural reference.
> For example, 'git cherry-pick' records the commit object name of
> the cherry-picked commit in the commit message as part of the
> text -- such a commit does not have structural reference to the
> original commit, and we would not _want_ one.  I have a strong
> suspicion that your application does not need or want structural
> reference to commits, and it might be better to merely mention
> their object names as part of the text the application produces,
> just like what 'git cherry-pick' does.

What seems to be becoming clear as more people find new ways to use
git is that many of them would be well served by having a solid
infrastructure to handle metadata.  Consider the case above: _git_
itself doesn't need a structural reference, but users and external
applications definitely need to be able to lookup which metadata
is associated with any given commit.  Having a git standard for
this type of data would help.  Tags already do this, so they're
likely to be used and abused in ways not initially envisioned,
just because git doesn't have another such facility.

> Presumably you will have one such tag per commit, and by default
> 'fetch' (both cg and git) tries to follow tags, which means
> anybody who fetches new revision would automatically download
> this QA data -- that is one implication of using a tag to store
> this information.  Without knowing the nature of it, I am not
> sure if everybody who tracks the source wants such baggage.  If
> not, then use of a tag for this may not be appropriate.

Right.  It would be much nicer if it was possible to request or
ignore specific types of metadata when fetching; yet another
reason that it would be great if git had something built in
which anticipated this need.

> Another question is if the QA data expected to be amended or
> annotated later, after it is created.
> 
> If the answer is yes, then you probably would not want tags --
> you can create a new tag that points at the same commit to
> update the data, but then you have no structural relationships
> given by git between such tags that point at the same commit.
> You could infer their order by timestamp but that is about it.
> I think you are better off creating a separate QA project that
> adds one new file per commit on the main project, and have the
> file identify the commit object on the main project (either
> start your text file format for QA data with the commit object
> name, or name each such QA data file after the commit object
> name).  Then your automated procedure could scan and add a new
> file to the QA project every time a new commit is made to the
> main project, and the data in the QA project can be amended or
> annotated and the changes will be version controlled.

There are a lot of nice features to using a separate metadata
branch.  However, you lose the ability to do lookups like you can
with tags.  A tag-like index that made it possible to associate
commits on otherwise unrelated branches might be a way to get
the best of both worlds.  However, there will be times when
version-controlled metadata is overkill.  We just need to codify a
git standard for metadata, so that git can help where possible.

> If the answer is no, then it is probably better to just use an
> append-only log file that textually records which entry
> corresponds to which commit in the project.  If it is not
> version controlled, and if it is not part of the main project, I
> do not see much point in putting the data under git control and
> in the same project.

It would be very nice if git gave a standard way to look up and
perhaps even display metadata.  We could add an option to git log,
for example, that said "show all metadata of a certain type".

There is a limitless number of examples where people want to
associate extra information with each commit.  Other SCMs call
these "attributes" or have other such names.  Given git's design
it isn't too hard to imagine offering version-controlled (or not)
and public (or not) metadata.  It would be very similar to tags,
but perhaps with a few extra features.

If git already offered this feature, there'd be no need for a
flat-file ref-log; the data could be stored in a git-standard
way for metadata and gain the features of whatever tools grow
up around it, like querying, inspecting, purging, etc.  All of
a sudden people would be able to look at (and perhaps even update)
their own metadata via git log/qgit/gitk/gitweb etc.  All we
need is a standard that everyone can conform to.

Sean

* Re: irc usage..
From: Martin Langhoff @ 2006-05-22 23:15 UTC
  To: Junio C Hamano; +Cc: Matthias Urlichs, git
In-Reply-To: <7v8xotadm3.fsf@assigned-by-dhcp.cox.net>

On 5/23/06, Junio C Hamano <junkio@cox.net> wrote:
> > I simply was too lazy to count the actual filenames' lengths. ;-)
>
> I think cvsimport predates that option, but these days that loop
> can be optimized by feeding --index-info from standard input.

Oh, yep, that'd be a good addition. I think we can also cut down on
the number of fork+exec calls (as Linus points out, they are killing
us) by caching some data we should already have, instead of
repeatedly asking git-rev-parse for it.

Other TODOs from my reading of the code last night...

 - Switch from line-oriented reads to block reads when fetching files
from CVS. This Gentoo repo has some large binary blobs in it, and
we end up slurping them into memory.

 - Stop abusing globals in commit() -- pass the commit data as parameters.

 - Further profiling? Whatever we are doing, we aren't doing it fast :(
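The first item is about memory: a line-oriented read of a binary file with few newlines can pull most of the file into a single "line", so memory use grows with the file size. Reading fixed-size blocks keeps it bounded. cvsimport is Perl, where read() with an explicit length does this; the technique is sketched here in C with stdio, and both copy_in_blocks() and the 64 KiB buffer size are arbitrary choices for illustration.

```c
#include <stdio.h>

/* Copy everything from in to out in fixed-size chunks. Memory use is
 * bounded by the buffer, however large the file is and whether or not it
 * contains newlines. Returns 0 on success, -1 on a read or write error. */
static int copy_in_blocks(FILE *in, FILE *out)
{
    char buf[65536];
    size_t n;

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
    }
    return ferror(in) ? -1 : 0;
}
```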

I will be trying to do those things in the next few days; I don't mind
if someone jumps in as well.

martin

* Re: Current Issues #3
From: Shawn Pearce @ 2006-05-22 23:12 UTC
  To: Daniel Barkalow; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0605221738090.6713@iabervon.org>

Daniel Barkalow <barkalow@iabervon.org> wrote:
> On Mon, 22 May 2006, Junio C Hamano wrote:
> 
> > * reflog
> > 
> >   I still haven't merged this series to "next" -- I do not have
> >   much against what the code does, but I am unconvinced that it is
> >   useful.  Also, the objections raised on the list, that this can be
> >   replaced by making sure that a repository with hundreds of
> >   tags is usable, certainly have a point.
> 
> I think it would make gitweb's summary view clearer, and Linus seemed 
> interested in being able to look up what happened in the fast forward 
> which was the first of several merges in a day.
> 
> It could be replaced by a repository with hundreds of machine-readable 
> tags with code to parse dates into queries for suitable tags. But I don't 
> think there's an advantage to using the tag mechanism here, because you 
> never want to look the history up by exactly which history it is (the 
> thing that a tag ref is good for); you'll be looking for whatever reflog 
> item is the newest not after a specified time, where the specified time is 
> almost never a time that a reflog item was created.
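The lookup described here, the newest entry not after a given time when that time almost never matches an entry exactly, is a binary search over entries sorted by time. A C sketch, assuming the log has already been read into an array sorted oldest first; struct reflog_entry and newest_not_after() are hypothetical, and the on-disk reflog format is not shown in this thread.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical in-memory reflog entry: when the ref changed, and to what. */
struct reflog_entry {
    time_t when;
    const char *new_sha1;
};

/* log must be sorted by when, oldest first. Returns the index of the
 * newest entry whose time is not after t, or -1 if every entry is newer. */
static long newest_not_after(const struct reflog_entry *log, size_t n, time_t t)
{
    size_t lo = 0, hi = n;

    /* Invariant: entries before lo are <= t; entries from hi on are > t. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (log[mid].when <= t)
            lo = mid + 1;
        else
            hi = mid;
    }
    /* lo is now the number of entries with when <= t. */
    return (long)lo - 1;
}
```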

The thing is this might also be easily represented as a structure
of tags; for example:

	refs/logs/heads/<ref>/<year>/<month>/<day> <hour>:<min>:<sec>:<seq>

where the tag is a tag of the commit which was valid in that ref
at that time.  Searching for an entry "around a particular time"
isn't that much more difficult than parsing a file; you just have
to walk backwards through the sorted directory listings, then read
the tag object which matches; that tag object will point at the
tree/commit/tag which was in that ref.
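A sketch of building names in that layout. UTC, zero-padded fields and a four-digit sequence number are my assumptions (the proposal does not specify them), as is the helper name reflog_tag_name(). With fixed-width fields, lexical order of the names within a directory matches chronological order, which is what walking backwards through sorted listings relies on.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: format
 *   refs/logs/heads/<ref>/<year>/<month>/<day> <hour>:<min>:<sec>:<seq>
 * in UTC with zero-padded fields. Returns the snprintf() result (a value
 * >= outlen means the name was truncated), or -1 if gmtime() fails. */
static int reflog_tag_name(char *out, size_t outlen, const char *ref,
                           time_t when, int seq)
{
    const struct tm *tm = gmtime(&when);

    if (!tm)
        return -1;
    return snprintf(out, outlen,
                    "refs/logs/heads/%s/%04d/%02d/%02d %02d:%02d:%02d:%04d",
                    ref, tm->tm_year + 1900, tm->tm_mon + 1, tm->tm_mday,
                    tm->tm_hour, tm->tm_min, tm->tm_sec, seq);
}
```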

What's ugly about this is simply the disk storage: a ref file is an
expensive thing (relatively speaking) on most UNIX file systems due
to the inode overhead.  If this was stored in a more compact format
(such as a GIT tree) then this would cost very little.

So the alternative that I have been mentally kicking around for
the past two days is storing the GIT_DIR/refs directory within a
standard GIT tree.  This of course would need to be an option that
gets enabled by the user, as currently most tools expect the refs
directory to actually be a directory, not a tree.  The advantage here
is that, unlike the proposed reflog, it is a compact ref representation
which could be used by other features, such as tagging a GIT
commit with the unique name of the same change from another SCM.
Or tagging your repository on every automated build, which runs
once every 5 minutes.

-- 
Shawn.

* git-diff-tree crashes on ubuntu kernel git repository
From: Torgil Svensson @ 2006-05-22 23:09 UTC
  To: git

Hi

It seems like git-diff-tree has some problems with moved files:

$ git-diff-tree -p --stat --summary -M
348f179e3195448cea49c98a79cce8c7f446ce26
343ca16424ba031b37e4df49afddaee098a8f347 | wc -l
*** glibc detected *** free(): invalid pointer: 0x12ecbbf0 ***
6101

As can be seen below there is some obvious error in the output just
prior to the crash:
 drivers/w1/{masters => }/ds_w1_bridge.c            |   38

This file is moved into "w1/masters" by commit
bd529cfb40c427d5b5aae0d315afb9f0a1da5e76

$ git --version
git version 1.3.3.g5e36

$ cat .git/remotes/origin
URL: git://git.kernel.org/pub/scm/linux/kernel/git/bcollins/ubuntu-2.6
Pull: refs/heads/master:refs/heads/origin

 $ gdb git-diff-tree
(gdb) run -p --stat --summary -M
348f179e3195448cea49c98a79cce8c7f446ce26
343ca16424ba031b37e4df49afddaee098a8f347

<...lots of files...>

 drivers/video/w100fb.c                             |  162
 drivers/video/w100fb.h                             |  748 -
 drivers/w1/Kconfig                                 |   62
 drivers/w1/Makefile                                |   10
 drivers/w1/{masters => }/ds_w1_bridge.c            |   38
*** glibc detected *** free(): invalid pointer: 0x12ecbbf0 ***

Program received signal SIGABRT, Aborted.
0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0  0xffffe410 in __kernel_vsyscall ()
#1  0xb7d7e9a1 in raise () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7d802b9 in abort () from /lib/tls/i686/cmov/libc.so.6
#3  0xb7db287a in __fsetlocking () from /lib/tls/i686/cmov/libc.so.6
#4  0xb7db8fd4 in malloc_usable_size () from /lib/tls/i686/cmov/libc.so.6
#5  0xb7db934a in free () from /lib/tls/i686/cmov/libc.so.6
#6  0x08056902 in show_stats (data=0x8deff80) at diff.c:392
#7  0x08058466 in diff_flush (options=0x80686b0) at diff.c:1999
#8  0x0805b143 in log_tree_diff_flush (opt=0x8068680) at log-tree.c:82
#9  0x08049d11 in main (argc=0, argv=0xbfcf8a14) at diff-tree.c:130
(gdb)

As shown above I can easily recreate the crash if you want more info.
Thank you for a wonderful tool.

//Torgil

* Re: Local clone/fetch with cogito is glacial
From: Petr Baudis @ 2006-05-22 23:08 UTC
  To: H. Peter Anvin; +Cc: Sean, git
In-Reply-To: <4472432A.8010002@zytor.com>

Dear diary, on Tue, May 23, 2006 at 01:03:06AM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> said that...
> Petr Baudis wrote:
> >
> >Even rsync and HTTP cg-clones? git:// and git+ssh:// fetching follows an
> >almost entirely different code path and it's much more efficient since
> >I just accumulate the tag object ids I want to check and then pour them
> >to git-fetch-pack - I cannot do that with git-(local|http)-fetch. :-(
> >
> 
> No, but git-fetch-pack could operate over a local pipe just fine (after 
> all, all it does is ssh a "git-upload-pack" command to the other side.)

Yes, but in that case it couldn't hardlink the objects so you would see
quite a big bump in disk usage if you have many local clones of the same
repo.

That said, hardlinking is probably not all that big an advantage if you
repack often, repack everywhere, and in the many-repositories cases it
might be more sensible to use alternates (which is what cg-clone -l
should really do instead of symlinking), so it might be well worth
the sacrifice.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: Local clone/fetch with cogito is glacial
From: H. Peter Anvin @ 2006-05-22 23:03 UTC
  To: Petr Baudis; +Cc: Sean, git
In-Reply-To: <20060522225054.GL11941@pasky.or.cz>

Petr Baudis wrote:
> 
> Even rsync and HTTP cg-clones? git:// and git+ssh:// fetching follows an
> almost entirely different code patch and it's much more efficient since
> I just accumulate the tag object ids I want to check and then pour them
> to git-fetch-pack - I cannot do that with git-(local|http)-fetch. :-(
> 

No, but git-fetch-pack could operate over a local pipe just fine (after all,
all it does is ssh a "git-upload-pack" command to the other side.)

	-hpa

* Re: Local clone/fetch with cogito is glacial
From: Petr Baudis @ 2006-05-22 22:50 UTC
  To: H. Peter Anvin; +Cc: Sean, git
In-Reply-To: <447239F0.9030705@zytor.com>

Dear diary, on Tue, May 23, 2006 at 12:23:44AM CEST, I got a letter
where "H. Peter Anvin" <hpa@zytor.com> said that...
> Petr Baudis wrote:
> >git-clone has an advantage here since it clones _everything_ while
> >Cogito fetches only stuff related to the branch you are cloning, and
> >verifying if what it fetches is sensible for you unfortunately takes a
> >lot of time. :/ I guess there is no way to verify presence of multiple
> >objects at once and there is also no way to order local fetch of
> >multiple objects at once.
> 
> Note that non-local cg-clones are at least an order of magnitude faster, 
> even when the nonlocal is just git+ssh:.  One could presumably do the same 
> thing over a pipe.

Even rsync and HTTP cg-clones? git:// and git+ssh:// fetching follows an
almost entirely different code path and it's much more efficient since
I just accumulate the tag object ids I want to check and then pour them
to git-fetch-pack - I cannot do that with git-(local|http)-fetch. :-(
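The "accumulate first, then hand everything to one process" pattern, as a
minimal C sketch. id_list_add(), build_argv() and id_list_clear() are
hypothetical names, and passing the ids as plain command-line arguments is an
assumption; how Cogito actually hands them to git-fetch-pack isn't shown here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of hex object ids, collected before any fork. */
struct id_list {
    char **ids;
    size_t nr, alloc;
};

/* Appends a private copy of `hex`; returns 0 on success, -1 on ENOMEM. */
int id_list_add(struct id_list *l, const char *hex)
{
    if (l->nr == l->alloc) {
        size_t na = l->alloc ? 2 * l->alloc : 16;
        char **p = realloc(l->ids, na * sizeof *p);
        if (!p)
            return -1;
        l->ids = p;
        l->alloc = na;
    }
    size_t len = strlen(hex) + 1;
    char *copy = malloc(len);
    if (!copy)
        return -1;
    memcpy(copy, hex, len);
    l->ids[l->nr++] = copy;
    return 0;
}

/* Builds a NULL-terminated vector "cmd repo id1 id2 ..." for ONE process.
 * The vector borrows the strings, so free() only the returned array. */
const char **build_argv(const char *cmd, const char *repo,
                        const struct id_list *l)
{
    const char **argv = malloc((l->nr + 3) * sizeof *argv);
    if (!argv)
        return NULL;
    argv[0] = cmd;
    argv[1] = repo;
    for (size_t i = 0; i < l->nr; i++)
        argv[2 + i] = l->ids[i];
    argv[l->nr + 2] = NULL;
    return argv;
}

/* Frees every id and resets the list to empty. */
void id_list_clear(struct id_list *l)
{
    for (size_t i = 0; i < l->nr; i++)
        free(l->ids[i]);
    free(l->ids);
    l->ids = NULL;
    l->nr = l->alloc = 0;
}
```

The vector would then go to a single execvp() in a forked child (cast to
char *const *), so N tags cost one fork instead of N.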

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

* Re: irc usage..
From: Junio C Hamano @ 2006-05-22 22:39 UTC
  To: Matthias Urlichs; +Cc: git
In-Reply-To: <20060522214128.GE16677@kiste.smurf.noris.de>

Matthias Urlichs <smurf@smurf.noris.de> writes:

> Hi,
>
> Linus Torvalds:
>> I wonder why those "git-update-index" calls seem to be (assuming I read 
>> the Perl correctly) done only a few files at a time. We can do hundreds
>> in one go, but it seems to want to do just ten files or something at the 
>> same time.
>
> No, fifty.
>
> I simply was too lazy to count the actual filenames' lengths. ;-)

I think cvsimport predates that option, but these days that loop
can be optimized by running a single git-update-index --index-info
and feeding it the entries on its standard input.
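A minimal C sketch of that, assuming the simple "<mode> SP <sha1> TAB <path>"
input form of --index-info, POSIX popen(), and paths without embedded
newlines. The struct and function names are made up for illustration; this is
not cvsimport's code. The point is that every entry goes down one pipe to one
process:

```c
#define _POSIX_C_SOURCE 200809L  /* for popen()/pclose() */
#include <stdio.h>
#include <string.h>

/* One entry for "git update-index --index-info" (assumed line form:
 * "<octal mode> SP <40-hex sha1> TAB <path> LF"). */
struct index_entry {
    unsigned mode;       /* e.g. 0100644 for a regular file */
    const char *sha1;    /* blob id, 40 hex characters */
    const char *path;    /* path relative to the top of the work tree */
};

/* Writes one --index-info line per entry to `out`. Returns the number of
 * lines written, or -1 on an id of the wrong length or a write error. */
int write_index_info(FILE *out, const struct index_entry *e, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(e[i].sha1) != 40)
            return -1;
        if (fprintf(out, "%o %s\t%s\n", e[i].mode, e[i].sha1, e[i].path) < 0)
            return -1;
    }
    return (int)n;
}

/* Sends all entries down one pipe to a single git-update-index process,
 * instead of spawning a new process for every small batch of files. */
int update_index_batch(const struct index_entry *e, size_t n)
{
    FILE *p = popen("git update-index --index-info", "w");
    if (!p)
        return -1;
    int rc = write_index_info(p, e, n);
    int status = pclose(p);
    return (rc < 0 || status != 0) ? -1 : 0;
}
```

Splitting the formatting from the pipe keeps the line format testable without
a repository; update_index_batch() is the only part that needs git installed.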

* Re: Local clone/fetch with cogito is glacial
From: H. Peter Anvin @ 2006-05-22 22:23 UTC
  To: Petr Baudis; +Cc: Sean, git
In-Reply-To: <20060522220206.GA10488@pasky.or.cz>

Petr Baudis wrote:
> 
> What about incremental fetches using git-fetch? From a quick scan of the
> git-fetch automagic tag-following code, it seems to be even
> significantly more expensive than Cogito's (in terms of number of
> forks).
> 

Well, I haven't used git-fetch, so I can't comment on that one.

> git-clone has an advantage here since it clones _everything_ while
> Cogito fetches only stuff related to the branch you are cloning, and
> verifying if what it fetches is sensible for you unfortunately takes a
> lot of time. :/ I guess there is no way to verify presence of multiple
> objects at once and there is also no way to order local fetch of
> multiple objects at once.

Note that non-local cg-clones are at least an order of magnitude faster, even when the 
nonlocal is just git+ssh:.  One could presumably do the same thing over a pipe.

	-hpa

* Re: irc usage..
From: Linus Torvalds @ 2006-05-22 22:18 UTC
  To: Matthias Urlichs
  Cc: Martin Langhoff, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <20060522214128.GE16677@kiste.smurf.noris.de>

On Mon, 22 May 2006, Matthias Urlichs wrote:
> 
> The beast *was* mainly written to do this remotely...

I don't think the remote usability is valid, except for some really small 
repositories. The fact that it takes hours even when the CVS server is 
local doesn't bode well for doing it remotely for any but the most trivial 
things.

I really think it would be better to have local use be the optimized case, 
with remote being the "it's _possible_" case.

		Linus

* Re: irc usage..
From: Matthias Urlichs @ 2006-05-22 21:41 UTC
  To: Linus Torvalds
  Cc: Martin Langhoff, Donnie Berkholz, Yann Dirson, Git Mailing List,
	Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0605221256090.3697@g5.osdl.org>

Hi,

Linus Torvalds:
> I wonder why those "git-update-index" calls seem to be (assuming I read 
> the Perl correctly) done only a few files at a time. We can do hundreds
> in one go, but it seems to want to do just ten files or something at the 
> same time.

No, fifty.

I simply was too lazy to count the actual filenames' lengths. ;-)
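Counting bytes instead of files is not much work. A rough C sketch, assuming
the budget is some conservative fraction of sysconf(_SC_ARG_MAX) (the
environment shares that limit); paths_that_fit() and count_batches() are
hypothetical helpers, not what cvsimport does:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: how many of the `n` paths fit on one command line
 * whose arguments may take at most `budget` bytes? Each argument costs
 * strlen() + 1 for its NUL. Always takes at least one path, so a caller
 * looping over the list is guaranteed to make progress. */
size_t paths_that_fit(const char *const *paths, size_t n, size_t budget)
{
    size_t used = 0, k = 0;
    while (k < n) {
        size_t cost = strlen(paths[k]) + 1;
        if (k > 0 && used + cost > budget)
            break;
        used += cost;
        k++;
    }
    return k;
}

/* Walks the whole list in command-line-sized batches; returns the count. */
size_t count_batches(const char *const *paths, size_t n, size_t budget)
{
    size_t batches = 0;
    while (n > 0) {
        size_t k = paths_that_fit(paths, n, budget);
        /* an importer would run git-update-index on paths[0..k) here */
        paths += k;
        n -= k;
        batches++;
    }
    return batches;
}
```

With short paths this packs hundreds of files per invocation instead of a
fixed fifty; feeding the entries on stdin would sidestep the limit entirely.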

> That thing would probably be an order of magnitude faster if written to 
> use the git library interfaces directly. Of course, the CVS part is 
> probably a big overhead, so it might not help much.

The beast *was* mainly written to do this remotely...

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
 - -
The worst form of inequality is to try to make unequal things equal.
					-- Aristotle
