Git development
* [PATCH 1/6] show-diff.c: clean up private buffer use.
From: Junio C Hamano @ 2005-04-18 20:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

[PATCH 1/6] show-diff.c: clean up private buffer use.

This patch fixes sq_expand() and show_differences() not to use and
hold onto their privately allocated buffers, which was a misguided
attempt to reduce calls to malloc but made later changes harder.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 show-diff.c |   22 ++++++----------------
 1 files changed, 6 insertions(+), 16 deletions(-)

show-diff.c: 1342b3c9848a7675665594a6bc19d95e3b7ff358
--- a/show-diff.c
+++ b/show-diff.c
@@ -23,7 +23,6 @@ static char *diff_cmd = "diff -L '%s' -u
 static char *sq_expand(char *src)
 {
 	static char *buf = NULL;
-	static int buf_size = -1;
 	int cnt, c;
 	char *cp;
 
@@ -32,12 +31,8 @@ static char *sq_expand(char *src)
 		if (*cp == '\'')
 			cnt += 3;
 
-	if (buf_size < cnt) {
-		free(buf);
-		buf_size = cnt;
-		buf = malloc(cnt);
-	}
-
+	if (! (buf = malloc(cnt)))
+	    return buf;
 	cp = buf;
 	while ((c = *src++)) {
 		if (c != '\'')
@@ -55,22 +50,17 @@ static void show_differences(char *name,
 			     unsigned long long old_size)
 {
 	FILE *f;
-	static char *cmd = NULL;
-	static int cmd_size = -1;
-
 	char *name_sq = sq_expand(name);
-	int cmd_required_length = strlen(name_sq) * 2 + strlen(diff_cmd);
+	int cmd_size = strlen(name_sq) * 2 + strlen(diff_cmd);
+	char *cmd = malloc(cmd_size);
 
-	if (cmd_size < cmd_required_length) {
-		free(cmd);
-		cmd_size = cmd_required_length;
-		cmd = malloc(cmd_required_length);
-	}
 	snprintf(cmd, cmd_size, diff_cmd, name_sq, name_sq);
 	f = popen(cmd, "w");
 	if (old_size)
 		fwrite(old_contents, old_size, 1, f);
 	pclose(f);
+	free(name_sq);
+	free(cmd);
 }
 
 static void show_diff_empty(struct cache_entry *ce)
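
The ownership rule this patch moves to (allocate per call, caller frees)
can be sketched in isolation. This is a minimal sketch, not the code in
show-diff.c: the name sq_expand_copy() is hypothetical, and the
replacement sequence '\'' is an assumption inferred from the "cnt += 3"
accounting in the hunk above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for sq_expand(): return a freshly malloc()ed
 * copy of src in which every single quote is replaced by the four
 * characters '\'' so the result can sit between single quotes on a
 * shell command line.  Returns NULL if malloc() fails.  The caller
 * owns the result and must free() it.
 */
char *sq_expand_copy(const char *src)
{
	size_t len = 1;		/* room for the trailing NUL */
	const char *cp;
	char *buf, *out;

	assert(src != NULL);

	/* Each quote grows from one character to four. */
	for (cp = src; *cp; cp++)
		len += (*cp == '\'') ? 4 : 1;

	buf = malloc(len);
	if (!buf)
		return NULL;

	out = buf;
	for (cp = src; *cp; cp++) {
		if (*cp != '\'') {
			*out++ = *cp;
		} else {
			memcpy(out, "'\\''", 4);
			out += 4;
		}
	}
	*out = '\0';
	return buf;
}
```

Because no static state survives the call, two results can be alive at
the same time, at the cost of one malloc()/free() pair per call, which
is the trade the patch description argues for.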



* SCSI trees, merges and git status
From: James Bottomley @ 2005-04-18 20:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, SCSI Mailing List

As of today, I have two SCSI git trees operational:

rsync://www.parisc-linux.org/~jejb/scsi-rc-fixes-2.6.git

and

rsync://www.parisc-linux.org/~jejb/scsi-misc-2.6.git

The latter has a non trivial merge in it because of a conflict in
scsi_device.h, so merges actually do work ...

The trees are exported from BK a changeset at a time (except the merge
bits, which were done manually).  I'll continue to accumulate patches in
the BK trees for the time being since we don't have a nice web browser
interface for the git trees (and also my commit scripts are all BK
based).

Linus, the rc-fixes repo is ready for applying ... it's the same one I
announced on linux-scsi and lkml a while ago just with the git date
information updated to be correct (the misc one should wait until after
2.6.12 is final).

James




* Re: [0/5] Parsers for git objects, porting some programs
From: Daniel Barkalow @ 2005-04-18 20:12 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.58.0504181126480.15725@ppc970.osdl.org>

On Mon, 18 Apr 2005, Linus Torvalds wrote:

> On Sun, 17 Apr 2005, Daniel Barkalow wrote:
> >
> > This series introduces common parsers for objects, and ports the programs
> > that currently use revision.h to them.
> > 
> >  1: the header files
> >  2: the implementations
> >  3: port rev-tree
> >  4: port fsck-cache
> >  5: port merge-base
> 
> Ok, having now looked at the code, I don't have any objections at all. 
> Could you clarify the "fsck" issue about reading the same object twice? 
> When does that happen?

Currently, the fsck-cache code is unpacking the objects to find out what
type they are, and the old code would pass the unpacked objects to the
parsing code. The new code doesn't take the unpacked objects, so it
unpacks them again. (I.e., fsck-cache will look at each object exactly
twice). The right solution is to have the internals reorganized slightly
such that a "parse_object" method, which does what fsck-cache wants (i.e.,
parse this object regardless of what type it is, and tell me the type),
could be fit in efficiently. But it doesn't affect the header file
interface, and it's only relevant to fsck-cache, which wants to look at
random junk that it doesn't have a reference to.

	-Daniel
*This .sig left intentionally blank*



* Re: [PATCH] Pretty-print date in 'git log'
From: Sanjoy Mahajan @ 2005-04-18 19:57 UTC (permalink / raw)
  To: Ray Lee; +Cc: David Woodhouse, git, Petr Baudis
In-Reply-To: <1113850922.23938.54.camel@orca.madrabbit.org>

> ray:~/work/home$ date -ud 'jan 1, 1970 + 1111111111 seconds'
> Fri Mar 18 01:58:31 UTC 2005
> ray:~/work/home$ date -ud 'jan 1, 1970 + 1111111111 seconds + 0800'
> Fri Mar 18 09:58:31 UTC 2005

I sent David a short script to do almost that, except that mine needed
to negate the timezone whereas yours elegantly changes +0800 to + 0800.

In your 2nd example, you'll need 'sed' to replace UTC (or +0000 if
using -R) in the output by +0800, because the 1111111111 is the UTC
seconds, so the actual time is Fri Mar 18 01:58:31 UTC 2005 (as given
by your first example).

-Sanjoy
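
The point that the stored seconds are UTC and the zone only affects how
they are displayed can be sketched in C. This is a minimal sketch under
stated assumptions: format_git_date() is a hypothetical name, the zone
is passed as a signed decimal such as 800 for +0800, and the output
layout is an RFC 2822 style string in the default "C" locale. It is not
the tool from David's patch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: render "<utcseconds> <zone>" as a date string.
 * zone is hours * 100 + minutes with a sign, e.g. 800 or -530.
 * The wall-clock fields are shifted by the zone and the zone itself
 * is appended, so the instant described stays the same.
 * Returns 0 on success, -1 on failure.
 */
int format_git_date(long utc_seconds, int zone, char *out, size_t outlen)
{
	int sign = zone < 0 ? -1 : 1;
	int absz = abs(zone);
	long offset = sign * ((absz / 100) * 3600L + (absz % 100) * 60L);
	time_t shifted = (time_t)(utc_seconds + offset);
	struct tm *tm = gmtime(&shifted);
	char stamp[64];
	int n;

	assert(out != NULL);

	if (!tm || !strftime(stamp, sizeof(stamp),
			     "%a, %d %b %Y %H:%M:%S", tm))
		return -1;

	n = snprintf(out, outlen, "%s %+05d", stamp, zone);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

With the values from the quoted example, 1111111111 and +0800, the
wall-clock part reads 09:58:31 and the suffix is +0800 rather than UTC.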


* Re: [PATCH] fix bug in read-cache.c which loses files when merging a tree
From: Linus Torvalds @ 2005-04-18 19:25 UTC (permalink / raw)
  To: James Bottomley; +Cc: git
In-Reply-To: <1113848239.4998.45.camel@mulgrave>



On Mon, 18 Apr 2005, James Bottomley wrote:
>
> I noticed this when I tried a non-trivial scsi merge and checked the
> results against BK.  The problem is that remove_entry_at() actually
> decrements active_nr, so decrementing it in add_cache_entry() before
> calling remove_entry_at() is a double decrement (hence we lose cache
> entries at the end).

Thanks. Just before I was going to hit the same issue, too.

I've pushed out my first real content merge: since Daniel Barkalow's
object model stuff didn't apply to my tree any more (I had added the
commit type tracking to mine after Daniel did his conversion), I
instead applied his series to the place they were done against,
and used git to merge the result with my current top-of-tree.

I based it on the two example scripts I had sent out, but obviously never 
tested until this point (since both of them had some serious syntax 
errors, and thus clearly wouldn't work).

I also checked in the stupid scripts, in the expectation that somebody
else can improve on them and make them useful. For example, firing up an 
editor when the merge fails is probably a damn good idea.

Anyway, it seems to prove the concept of a real three-way merge, and it 
all actually worked exactly the way I envisioned. Whether the end result 
works or not, that's a different issue ;)

			Linus


* Re: optimize gitdiff-do script
From: Paul Jackson @ 2005-04-18 19:17 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050418183038.GB5554@pasky.ji.cz>

Pasky wrote:
> But what I said still holds - this can go
> in only after we have a shell library sharing the common functions

Ah - thanks for repeating that - it didn't sink in the first time.

Good idea.

> Yes, sorry about that; I had a lot of mail traffic lately ...

No problem.  I hope you're having fun at the center of this cyclone.

> I cannot guarantee I will look at it immediately, though.

Good.  Your priorities sound fine to me.

I'll rework the patches and send them along again in a few days,
when I get a chance.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: [PATCH] Pretty-print date in 'git log'
From: Ray Lee @ 2005-04-18 19:02 UTC (permalink / raw)
  To: David Woodhouse; +Cc: git, Petr Baudis
In-Reply-To: <1113803220.11910.81.camel@localhost.localdomain>

On Mon, 2005-04-18 at 15:46 +1000, David Woodhouse wrote:
> Add tool to render git's "<utcseconds> <zone>" into an RFC2822-compliant
> string, because I don't think date(1) can do it.

I admit it's not obvious, but date(1) includes gnu's full date parser,
so you can pull stunts like:

ray:~/work/home$ date -ud 'jan 1, 1970 + 1111111111 seconds'
Fri Mar 18 01:58:31 UTC 2005
ray:~/work/home$ date -ud 'jan 1, 1970 + 1111111111 seconds + 0800'
Fri Mar 18 09:58:31 UTC 2005

Ray




* Re: [PATCH] Get commits from remote repositories by HTTP
From: Petr Baudis @ 2005-04-18 18:47 UTC (permalink / raw)
  To: tony.luck; +Cc: git
In-Reply-To: <200504181841.j3IIfgP31258@unix-os.sc.intel.com>

Dear diary, on Mon, Apr 18, 2005 at 08:41:42PM CEST, I got a letter
where tony.luck@intel.com told me that...
> >Not a patch ... it is a whole file.  I called it "git-wget", but it might
> >also want to be called "git-pulltop".
> 
> It's been pointed out to me that I based this script on a pre-historic version
> of ls-tree from sometime last week.  Modern versions print the mode with %06o
> so there is a leading 0 on the mode for a directory.  Just change
> 
> 		if [ $mode = 40000 ]
> 
> to
> 
> 		if [ $mode = 040000 ]
> 
> to fix it.

...and this is precisely why ls-tree actually outputs those "blob" and
"tree" tags. ;-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
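
Both ways of avoiding the 40000 vs 040000 trap can be sketched in C:
compare the mode as a number instead of as a string, or look at the type
tag. This is a minimal sketch with hypothetical helper names; the
assumed ls-tree line layout (mode, type, hash, name separated by
whitespace) is an assumption based on the thread, not a documented
format.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: treat the mode as an octal number, so "40000"
 * and "040000" compare equal.  The masks are the usual file-type bits.
 */
int mode_is_dir(const char *mode)
{
	unsigned long m;

	assert(mode != NULL);
	m = strtoul(mode, NULL, 8);
	return (m & 0170000) == 0040000;
}

/*
 * Hypothetical helper: decide from the type tag alone.
 * Returns 1 for a tree, 0 for anything else, -1 if the line does not
 * have at least a mode field and a type field.
 */
int entry_is_tree(const char *line)
{
	char mode[16], type[16];

	assert(line != NULL);
	if (sscanf(line, "%15s %15s", mode, type) != 2)
		return -1;
	return strcmp(type, "tree") == 0;
}
```

The type-tag check does not depend on how the mode happens to be
printed, which is the reason given above for having the tags at all.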


* Re: [PATCH] Get commits from remote repositories by HTTP
From: tony.luck @ 2005-04-18 18:41 UTC (permalink / raw)
  To: git
In-Reply-To: <200504170316.j3H3GaZ03333@unix-os.sc.intel.com>

>Not a patch ... it is a whole file.  I called it "git-wget", but it might
>also want to be called "git-pulltop".

It's been pointed out to me that I based this script on a pre-historic version
of ls-tree from sometime last week.  Modern versions print the mode with %06o
so there is a leading 0 on the mode for a directory.  Just change

		if [ $mode = 40000 ]

to

		if [ $mode = 040000 ]

to fix it.

The script might also be useful for anyone behind a firewall that blocks
rsync transfers.

-Tony


* Re: [darcs-devel] Darcs and git: plan of action
From: Ray Lee @ 2005-04-18 18:35 UTC (permalink / raw)
  To: David Roundy; +Cc: Linus Torvalds, Git Mailing List, darcs-devel
In-Reply-To: <20050418122011.GA13769@abridgegame.org>

On Mon, 2005-04-18 at 08:20 -0400, David Roundy wrote:
> Putting darcs patches *into* git is more complicated, since we'll want to
> get them back again without modification.  Normal "hunk" patches would be
> no problem, provided we never change our diff algorithm (which has been
> discussed recently, in the context of making hunks better align with blocks
> of code).  We could perhaps tell users not to use "replace" patches.  But
> avoiding "mv" patches would be downright silly.

Okay, I still haven't used git yet (and have only toyed around with
darcs for a bit), so take what I'm saying with a grain of salt.
Regardless, I think you may be asking the wrong question. The tracking
of renames was bandied about pretty thoroughly on-list from Wednesday
through Friday (for far better commentary and insight, see Linus'
messages with subject: Merge with git-pasky II.)

git does track changesets that describe the parent tree(s) and the
result. The trees track filenames and hashes. So, doing a fairly
straightforward compare on two trees will let you immediately discover
renames that have occurred, as the filename in the tree changed while
the hash didn't.

So, the question then becomes, can an outside tool cheaply derive all
the information that darcs would need to perform its work? The renames
should be easy, as long as no content changed during the rename. As for
token replacement (and whitespace changes, etc.), that could be
discovered via domain-specific parsers (something specific per language,
for example). Linus tossed a link to one such tool (hmm, where was it.
Sheesh. You sure write a lot, dude :-).)

	http://minnie.tuhs.org/Programs   (see Ctcompare)

...which should be viewed more as a proof-of-concept than a mergeable
code-set. It does show that diff's vocabulary is sadly lacking in
expressiveness, and improving that, I think, would be a useful area to
expend effort. 

Again, I may be off here, especially considering I've a backlog of a
couple hundred messages to read since the weekend. (You guys need to go
outside more often.)

Ray
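
The tree comparison described above (same hash, different filename) can
be sketched as follows. This is a minimal sketch, assuming a tree is
just a flat list of name/hash pairs; struct tree_entry and renamed_to()
are hypothetical and are not git's data structures. It only finds
renames where the content did not change, and it does not tell a rename
apart from a delete plus an unrelated copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat view of one tree entry. */
struct tree_entry {
	const char *name;
	const char *hash;
};

/*
 * If old_entry's name is gone from new_tree but its hash shows up
 * under another name, return that new name.  Return NULL when the
 * old name still exists or no entry carries the same hash.
 */
const char *renamed_to(const struct tree_entry *old_entry,
		       const struct tree_entry *new_tree, size_t n)
{
	const char *candidate = NULL;
	size_t i;

	assert(old_entry != NULL);

	for (i = 0; i < n; i++) {
		if (strcmp(new_tree[i].name, old_entry->name) == 0)
			return NULL;	/* path still present: not a rename */
		if (!candidate &&
		    strcmp(new_tree[i].hash, old_entry->hash) == 0)
			candidate = new_tree[i].name;
	}
	return candidate;
}
```

Calling renamed_to() once per entry of the old tree is quadratic in the
tree size; sorting both lists by hash first would be the obvious next
step for large trees.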



* Re: [0/5] Parsers for git objects, porting some programs
From: Linus Torvalds @ 2005-04-18 18:34 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.21.0504172229240.30848-100000@iabervon.org>



On Sun, 17 Apr 2005, Daniel Barkalow wrote:
>
> This series introduces common parsers for objects, and ports the programs
> that currently use revision.h to them.
> 
>  1: the header files
>  2: the implementations
>  3: port rev-tree
>  4: port fsck-cache
>  5: port merge-base

Ok, having now looked at the code, I don't have any objections at all. 
Could you clarify the "fsck" issue about reading the same object twice? 
When does that happen?

		Linus


* Re: optimize gitdiff-do script
From: Petr Baudis @ 2005-04-18 18:30 UTC (permalink / raw)
  To: Paul Jackson; +Cc: git
In-Reply-To: <20050418082334.25359013.pj@sgi.com>

Dear diary, on Mon, Apr 18, 2005 at 05:23:34PM CEST, I got a letter
where Paul Jackson <pj@sgi.com> told me that...
> Pasky,
> 
> Looks like a couple of questions I asked over the weekend
> got lost along the way.

Yes, sorry about that; I had a lot of mail traffic lately and I'm not so
used to it. ;-)

>  1) How do you want me to fix the indentation on my patch
>     to optimize gitdiff-do script:
> 	- forget my first patch and resend from scratch, or
> 	- a second patch restoring indentation, on top of my first one.

Resend from scratch, please.

I cannot guarantee I will look at it immediately, though. Optimizing is
nice, but gitdiff-do's speed is already usable and there are much more
pressing issues for git-pasky right now.

>  2) Would you be interested in a patch that used a more robust tmp
>     file creation, along the lines of replacing
> 
> 	    t=${TMPDIR:-/usr/tmp}/gitdiff.$$
> 	    trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15
> 
>     with:
> 
> 	    tmp=${TMPDIR-/tmp}
> 	    tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$
> 	    (umask 077 && mkdir $tmp) || {
> 		    echo "Could not create temporary directory! Exiting." 1>&2 
> 		    exit 1
> 	    }
> 	    trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15
> 	    t=$tmp/tmp
> 
>     From the www.linuxsecurity.com link that Dave Jones provided, the
>     above $tmp directory is about as good as using mktemp, while
>     avoiding dependency on mktemp options not everyone has.
> 
>  3) If interested in (2), would you want it instead of my previous mktemp
>     removal patch, or on top of it?

Instead of the previous patch. But what I said still holds - this can go
in only after we have a shell library sharing the common functions - I
don't want to have this horrid stuff in every file.

Actually, if you will make a mktemp shell function, no changes
whatsoever might be needed to the other scripts.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* [PATCH] fix bug in read-cache.c which loses files when merging a tree
From: James Bottomley @ 2005-04-18 18:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

I noticed this when I tried a non-trivial scsi merge and checked the
results against BK.  The problem is that remove_entry_at() actually
decrements active_nr, so decrementing it in add_cache_entry() before
calling remove_entry_at() is a double decrement (hence we lose cache
entries at the end).

James

read-cache.c: 4d4d94f75cceb8039eb466c1956f8b54dc0e24b6
--- read-cache.c
+++ read-cache.c	2005-04-18 13:08:09.000000000 -0500
@@ -402,7 +402,6 @@
 	if (pos < active_nr && ce_stage(ce) == 0) {
 		while (same_name(active_cache[pos], ce)) {
 			ok_to_add = 1;
-			active_nr--;
 			if (!remove_entry_at(pos))
 				break;
 		}
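
The double decrement can be reproduced outside git with a toy array.
This is a minimal sketch: remove_at(), remove_same_name() and
demo_remaining() are hypothetical stand-ins for remove_entry_at() and
the loop in add_cache_entry(), not the real read-cache.c code, and the
bounds check in the loop is an addition for safety.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 8

static const char *entries[MAX_ENTRIES];
static int nr_entries;

/* Remove entries[pos] and shift the tail down; this owns the decrement. */
int remove_at(int pos)
{
	nr_entries--;
	if (pos >= nr_entries)
		return 0;	/* nothing left after pos */
	memmove(entries + pos, entries + pos + 1,
		(nr_entries - pos) * sizeof(entries[0]));
	return 1;
}

/* Remove every consecutive entry at pos whose name matches. */
void remove_same_name(int pos, const char *name, int buggy)
{
	while (pos < nr_entries && strcmp(entries[pos], name) == 0) {
		if (buggy)
			nr_entries--;	/* the extra decrement the patch deletes */
		if (!remove_at(pos))
			break;
	}
}

/* Start from { a, b, b, c }, drop the "b" entries, report what is left. */
int demo_remaining(int buggy)
{
	static const char *initial[] = { "a", "b", "b", "c" };
	int n = (int)(sizeof(initial) / sizeof(initial[0]));

	assert(n <= MAX_ENTRIES);
	memcpy(entries, initial, sizeof(initial));
	nr_entries = n;
	remove_same_name(1, "b", buggy);
	return nr_entries;
}
```

With the extra decrement the count drops by two per removed entry, so
entries at the end of the array fall outside the count even though they
were never meant to be removed.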




* Re: Re-done kernel archive - real one?
From: Petr Baudis @ 2005-04-18 18:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504181003030.15725@ppc970.osdl.org>

Dear diary, on Mon, Apr 18, 2005 at 07:05:19PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> 
> 
> On Mon, 18 Apr 2005, Linus Torvalds wrote:
> > 
> > No, that can't work. The pesky tools are helpful [...]
> > I'm afraid that until Pasky's tools script this properly, [... ]
> > If Pesky wants to take the above script, test it, [...]
> 
> Ok, one out of three isn't too bad, is it? Pesky/Pasky, so close yet so 
> far. Sorry,

No problem. :-) Or you can just call me Petr if you want. ;-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: SHA1 hash safety
From: C. Scott Ananian @ 2005-04-18 17:04 UTC (permalink / raw)
  To: Andy Isaacson; +Cc: omb, git
In-Reply-To: <20050418074323.GA29765@hexapodia.org>

On Mon, 18 Apr 2005, Andy Isaacson wrote:

> If you had actual evidence of a collision, I'd love to see it - even if
> it's just the equivalent of
> % md5 foo
> d3b07384d113edec49eaa6238ad5ff00 foo
> % md5 bar
> d3b07384d113edec49eaa6238ad5ff00 bar
> % cmp foo bar
> foo bar differ: byte 25, line 1
> %
>
> But in the absence of actual evidence, we have to assume (just based on
> the probabilities) that there was some error in your testing.

I've already had a long correspondence with this poster.  He claims that 
"this happened 7 years ago", involved a "commercial contract covered by 
Swiss Banking Law" (with caps!) and that, of course, he "certainly doesn't 
retain [his] client's documents", and even if he *did*, he wouldn't show 
them to *me*.

And then he was unable to comprehend that I couldn't accept his word alone 
as prima facie evidence that the laws of probability did not apply to him or 
his clients.

I've been a coder far too long to attribute to "The Mysterious Hand Of 
God" what can adequately be described by subtle programmer error.

The most reasonable explanation, given the (lack of) evidence, is that 
the programmer involved quickly took refuge in a (wildly improbable, but 
his clients'll never know) "MD5 collision" instead of buckling down and 
finding the bug in his code.
  --scott

ODOATH Ortega FBI SGUAT AEBARMAN India Peking ODACID operation RYBAT 
[Hello to all my fans in domestic surveillance] for Dummies KUCLUB
                          ( http://cscott.net/ )


* Re: Re-done kernel archive - real one?
From: Linus Torvalds @ 2005-04-18 17:05 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0504180802060.7211@ppc970.osdl.org>



On Mon, 18 Apr 2005, Linus Torvalds wrote:
> 
> No, that can't work. The pesky tools are helpful [...]
> I'm afraid that until Pasky's tools script this properly, [... ]
> If Pesky wants to take the above script, test it, [...]

Ok, one out of three isn't too bad, is it? Pesky/Pasky, so close yet so 
far. Sorry,

		Linus


* Re: [PATCH] Add help details to git help command.
From: Steven Cole @ 2005-04-18 16:59 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git
In-Reply-To: <20050418102412.GJ1461@pasky.ji.cz>

Petr Baudis wrote:
> Dear diary, on Mon, Apr 18, 2005 at 06:42:26AM CEST, I got a letter
> where Steven Cole <elenstev@mesatop.com> told me that...
[snippage]
> 
>>This patch will provide the comment lines in the shell script associated
>>with the command, cleaned up a bit for presentation.
>>
>>BUGS: This will also print any comments in the entire file, which may
>>not be desired.  If a command name and shell script filename
>>do not follow the usual convention, this won't work, e.g. ci for commit.
> 
> 
> Hey, those BUGS are the only slightly non-trivial thing on the whole
> thing! I could do this patch myself... ;-) Also, you don't want to print
> the first newline and the Copyright notices.
> 

Fixed extra vertical whitespace, Copyright notice problems, and issue
with git help ci.

Here's a better version.  Didn't fix the more interesting bugs, as I'm
pressed for time (aren't we all).  Perhaps someone can polish this up.

Anyway, I think it's pretty useful in its present form.

Thanks,
Steven

---------

This patch will provide the comment lines in the shell script associated
with the command, cleaned up a bit for presentation.

BUGS: This will also print any comments in the entire file, which may
not be desired.  If a command name and shell script filename
do not follow the usual convention, this won't work.

git: b648169640025bd68d1b27a0fcc85b65d85e4440
--- git
+++ git	2005-04-18 10:34:17.000000000 -0600
@@ -19,6 +19,11 @@


  help () {
+
+command=$1
+scriptfile=git$command.sh
+
+if [ ! $command ]; then
  	cat <<__END__
  The GIT scripted toolkit  $(gitversion.sh)

@@ -48,7 +53,10 @@
  	track		[RNAME]
  	version

+Additional help is available with: git help COMMAND
+
  Note that these expressions can be used interchangably as "ID"s:
+
  	empty string (current HEAD)
  	local (the local branch if tracking a remote one)
  	remote name (as registered with git addremote)
@@ -57,6 +65,14 @@
  	commit object hash (as returned by commit-id)
  	tree object hash (accepted only by some commands)
  __END__
+fi
+if [ $scriptfile = "gitci.sh" ]; then
+	scriptfile="gitcommit.sh"
+fi
+if [ ! $scriptfile = "git.sh" ]; then
+	grep ^# $scriptfile | grep -v "!/bin" | grep -v "(c)" \
+	| cut -c 2- | grep ^.
+fi
  }





* Re: SHA1 hash safety
From: C. Scott Ananian @ 2005-04-18 16:50 UTC (permalink / raw)
  To: Horst von Brand; +Cc: ross, omb, David Lang, Ingo Molnar, git
In-Reply-To: <200504170635.j3H6Z0Ga005661@laptop11.inf.utfsm.cl>

On Sun, 17 Apr 2005, Horst von Brand wrote:

>> crypto-babble about collision whitepapers is uninteresting without a
>> repo that has real collisions.  git is far too cool as is - prove I
> Just copy over a file (might be the first step in splitting it, or a
> header file that is duplicated for convenience, ...)

This is not a collision.  This is a *feature*.
  --scott

payment UKUSA ODOATH AVBLIMP ESSENCE JUBILIST ASW AK-47 CABOUNCE Ortega 
PBPRIME North Korea anthrax Milosevic bomb Soviet  QKFLOWAGE Yeltsin
                          ( http://cscott.net/ )


* Re: Yet another base64 patch
From: David A. Wheeler @ 2005-04-18 16:42 UTC (permalink / raw)
  To: yarcs; +Cc: git
In-Reply-To: <4263AF41.1070806@qualitycode.com>

I asked:
> > Does anyone know of any other issues in how git data is stored that
> > might cause problems for some situations? ...

Kevin said:
> If git is retaining hex naming, and not moving to base64, then I don't
> think what I am about to say is relevant. However, if base64 file naming
> is still being considered, then vfat32 compatibility may be a concern
> (I'm not sure about NTFS).

I can't speak for the git developers. However, I think the current
naming scheme for the object database as used in git-pasky
is actually a very good one and should be left as-is
(SHA-1 hex values, directory of 2-char prefixes,
filenames with the rest of the value).

As far as I can tell from various calculations (& supported by the
performance measurements done by others), the hex values
with one level of directory turn out to work pretty well!
It's easily understood, works with non-massive projects on stupid
filesystems, and it has good performance on good filesystems
even with massive projects with huge histories.  You could
tune it further, but a single approach that works "everywhere"
is a whole lot simpler.  So I'd recommend keeping that
approach.

As far as base64/32 vs. hex names, I think there
are many reasons to stay with the hex names.
Using hex names is a good idea for the simple reason that
normally SHA-1 hashes are presented as hex values;
you'll work WITH instead of AGAINST other tools, and
humans who deal with this stuff will "see what they expect".
It takes a few more characters, but not many, and it's not
like base64 is any more comprehensible to humans.
And the fact that hex values don't allow "all" legal values
means that some errors are trivially detectable.

You're right, base64 eliminates many bits of differentiation,
and in a very non-obvious way (I _hate_ weird surprises like
that, they cause lots of trouble).  I think there's another
problem too that's more insidious. Although the _filesystem_
is case-preserving, I suspect some _tools_ on Windows don't take
care to preserve case.  If that's so, it'd be easily possible for a
Windows user to use some tools that screw up a Unix/Linux user
once they were imported, causing all sorts of "extraneous" files &
files that mysteriously disappeared (they were only accessible
from Windows). Ugh.
This can even happen on Unix/Linux systems if they use
a fileserver with NTFS semantics. In contrast,
if a hex value has its case changed, it's easy to fix locally.

By choosing the more traditional hex representation, you
eliminate lots of problems, and it's easier to explain too.

Kevin added:
> I'll take this opportunity to support David's position that it would be
> fantastic if git could end up being valuable for a wide range of
> projects, rather than just the kernel. I also fully understand that the
> kernel is the primary target, but when there are opportunities to make
> the data structures more generally useful without causing problems for
> the kernel project, I hope they are taken.

Thanks for the vote of confidence!

--- David A. Wheeler
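
The naming scheme discussed above (SHA-1 hex, a directory named after
the first two characters, the remaining 38 as the filename) can be
sketched in C. This is a minimal sketch: object_path() is a hypothetical
helper and the relative "objects/" prefix is an assumption. It also
shows the two properties argued for above: a non-hex character is
rejected outright, and a case change is repaired by lowercasing.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: map a 40-character hex name to
 * "objects/xx/<remaining 38 characters>".
 * Returns 0 on success, -1 if the name is not exactly 40 hex digits
 * or the output buffer is too small.
 */
int object_path(const char *hex, char *out, size_t outlen)
{
	char norm[41];
	int i, n;

	assert(hex != NULL && out != NULL);

	for (i = 0; i < 40; i++) {
		/* A NUL is not a hex digit, so short input stops here. */
		if (!isxdigit((unsigned char)hex[i]))
			return -1;
		norm[i] = (char)tolower((unsigned char)hex[i]);
	}
	if (hex[40] != '\0')
		return -1;	/* too long */
	norm[40] = '\0';

	n = snprintf(out, outlen, "objects/%.2s/%s", norm, norm + 2);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

A base64 name has no equivalent of the lowercasing step: once a tool
folds case, two different names can collide and the original cannot be
recovered.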


* (no subject)
From: Davide Rossetti @ 2005-04-18 16:31 UTC (permalink / raw)
  To: git

subscribe git



* Re: A couple of questions
From: Paul Jackson @ 2005-04-18 16:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: is, git
In-Reply-To: <Pine.LNX.4.58.0504180825280.7211@ppc970.osdl.org>

Linus wrote:
> Nothing beats backups and distribution.

Famous quote from the past:

"Only wimps use tape backup: real men just upload their important stuff on ftp,
 and let the rest of the world mirror it ;)" Linus Torvalds

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* Re: Re-done kernel archive - real one?
From: Linus Torvalds @ 2005-04-18 15:42 UTC (permalink / raw)
  To: Greg KH; +Cc: Russell King, David Woodhouse, Git Mailing List, Peter Anvin
In-Reply-To: <20050418150456.GC12750@kroah.com>



On Mon, 18 Apr 2005, Greg KH wrote:
>
> On Sun, Apr 17, 2005 at 04:24:24PM -0700, Linus Torvalds wrote:
> > 
> > Tools absolutely matter. And it will take time for us to build up that 
> > kind of helper infrastructure. So being newbie might be part of it, but 
> > it's the smaller part, I say. Rough interfaces is a big issue.
> 
> Speaking of tools, you had a "dotest" program to apply patches in email
> form to a bk tree.  And from what I can gather, you've changed that to
> handle git archives, right?

Yup.

It's a git archive at 

	kernel.org:/pub/linux/kernel/people/torvalds/git-tools.git

and it seems to work. It's what I've used for all the kernel patches 
(except for the merge), and it's what I use for the git stuff that shows 
up as authored by others.

		Linus


* Re: Darcs and git: plan of action
From: Linus Torvalds @ 2005-04-18 15:38 UTC (permalink / raw)
  To: David Roundy; +Cc: Git Mailing List, darcs-devel
In-Reply-To: <20050418122011.GA13769@abridgegame.org>



On Mon, 18 Apr 2005, David Roundy wrote:
> 
> I'm cc'ing you on this email, since Juliusz had some interesting ideas as
> to how darcs could interact with git, which then gave me an idea concerning
> which I'd like feedback from you.  In particular, it would make life (that
> is, life interacting back and forth with git) easier if we were to embed
> darcs patches in their entirety in the git comment block.

Hell no.

The commit _does_ specify the patch uniquely and exactly, so I really 
don't see the point. You can always get the patch by just doing a

	git diff $parent_tree $thistree

so putting the patch in the comment is not an option.

Then you can use the patch to index to whatever extra "darcs index" 
information you want to.

> As I say, it's a bit ugly, and before we explore the idea further, it would
> be nice to know if this would cause Linus to vomit in disgust and/or refuse
> patches from darcs users.

That's definitely the case. I will _not_ be taking random files etc just 
to keep other peoples stuff straightened up.

If you want to add a "log ID", you can certainly do that, but the data the 
ID refers to is _your_ data, and will not go into the git archive. So:

> Another slightly less noxious possibility would
> be to store the darcs patch as a "hidden" file, if git were given the
> concept of commit-specific files.

No, git will not track commit-specific files. There's the comment section,
and that _is_ the commit-specific file. But I will refuse to take any
comments that aren't just human-readable explanations, together with maybe 
one extra line of

	# Darcs ID: 780c057447d4feef015a905aaf6c87db894ff58c

(others will want to track _their_ PR numbers etc) and that's it. The 
actual darcs data that that ID refers to can obviously be maintained in 
_another_ git archive, but it's not one I'm going to carry about.

			Linus


* Re: A couple of questions
From: Linus Torvalds @ 2005-04-18 15:31 UTC (permalink / raw)
  To: Imre Simon; +Cc: git
In-Reply-To: <42639F24.90007@ime.usp.br>



On Mon, 18 Apr 2005, Imre Simon wrote:
>
> How will git handle a corrupted (git) file system?
> 
> For instance, what can be done if objects/xy/z{38} does not pass the
> simple consistency test, i.e. if the file's sha1 hash is not xyz{38}?
> This might be a serious problem because, in general, one cannot
> reconstruct the contents of file objects/xy/z{38} from its name
> xyz{38}.

Nothing beats backups and distribution. The distributed nature of git 
means that you can replicate your objects arbitrarily.

> Another problem might come up if the file does pass the simple
> consistency test but the file's contents is not a valid git file,

Run "fsck-cache". It not only tests SHA1 and general object sanity, but it
does full tracking of the resulting reachability and everything else. It
prints out any corruption it finds (missing or bad objects), and if you
use the "--unreachable" flag it will also print out objects that exist but 
that aren't reachable from any of the HEAD nodes (which you need to 
specify).

So for example

	fsck-cache --unreachable $(cat .git/HEAD)

will do quite a _lot_ of verification on the tree. There are a few extra 
validity tests I'm going to add (make sure that tree objects are sorted 
properly etc), but on the whole if "fsck-cache" is happy, you do have a 
valid tree.

Any corrupt objects you will have to find in backups or other archives (ie
you can just remove them and do an "rsync" with some other site in the
hopes that somebody else has the object you have corrupted).

Of course, "valid tree" doesn't mean that it wasn't generated by some evil 
person, and the end result might be crap. Git is a revision tracking 
system, not a quality assurance system ;)

		Linus


* Re: optimize gitdiff-do script
From: Paul Jackson @ 2005-04-18 15:23 UTC (permalink / raw)
  To: pasky; +Cc: git
In-Reply-To: <20050416171009.0bedbab4.pj@sgi.com>

Pasky,

Looks like a couple of questions I asked over the weekend
got lost along the way.

 1) How do you want me to fix the indentation on my patch
    to optimize gitdiff-do script:
	- forget my first patch and resend from scratch, or
	- a second patch restoring indentation, on top of my first one.

 2) Would you be interested in a patch that used a more robust tmp
    file creation, along the lines of replacing

	    t=${TMPDIR:-/usr/tmp}/gitdiff.$$
	    trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15

    with:

	    tmp=${TMPDIR-/tmp}
	    tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$
	    (umask 077 && mkdir $tmp) || {
		    echo "Could not create temporary directory! Exiting." 1>&2 
		    exit 1
	    }
	    trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15
	    t=$tmp/tmp

    From the www.linuxsecurity.com link that Dave Jones provided, the
    above $tmp directory is about as good as using mktemp, while
    avoiding dependency on mktemp options not everyone has.

 3) If interested in (2), would you want it instead of my previous mktemp
    removal patch, or on top of it?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


