Git development

Git development
 help / color / mirror / Atom feed

* Re: What's in git.git
From: Linus Torvalds @ 2006-05-10  4:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vd5emze3h.fsf@assigned-by-dhcp.cox.net>



On Tue, 9 May 2006, Junio C Hamano wrote:
> 
> It is problematic but not more than the current index + "Binary
> files differ" output.  If you have both pre and postimage then
> you do not need the binary data.

Fair enough.

> > But at least in theory we might well want to do "-R" eventually.
> 
> Yes, but even without binary, -R has a funny implication when
> copy-edit patch is involved.  What if a patch copy-edits to
> create a new file B based on old A, and also modifies A
> in-place, and somehow the postimages of A and B you already have
> are not consistent with what that patch does?

Yeah, that could get exciting ;)

		Linus

^ permalink raw reply

* [PATCH] cg-status -- disambiguate parameters
From: Martin Langhoff @ 2006-05-10  5:02 UTC (permalink / raw)
  To: pasky, git, peter.bulmer; +Cc: Martin Langhoff

Signed-off-by: Martin Langhoff <martin@catalyst.net.nz>


---

 cg-status |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

81fa6ce82f47e4973b172eddd6bb6f1b2f2bff93
diff --git a/cg-status b/cg-status
index 944f9c5..214d6cf 100755
--- a/cg-status
+++ b/cg-status
@@ -238,7 +238,7 @@ if [ "$workstatus" ]; then
 		git-diff-index HEAD -- "${basepath:-.}" | cut -f5- -d' ' | 
 		while IFS=$'\t' read -r mode file; do
 			if [ "$mode" = D ]; then
-				[ "$(git-diff-files "$file")" ] && mode=!
+				[ "$(git-diff-files -- "$file")" ] && mode=!
 			elif [ "$mode" = M ] && [ "$commitignore" ]; then
 				fgrep -qx "$file" "$_git/commit-ignore" && mode=m
 			fi
-- 
1.3.2.g82000

^ permalink raw reply related

* Re: What's in git.git
From: Junio C Hamano @ 2006-05-10  5:05 UTC (permalink / raw)
  To: Randal L. Schwartz; +Cc: git
In-Reply-To: <864pzyh4x0.fsf@blue.stonehenge.com>

merlyn@stonehenge.com (Randal L. Schwartz) writes:

>>>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:
>
> Junio> This week's "What's in" is a day early, since I do not expect to
> Junio> be able to do much gitting for the rest of the week.
>
> I just got this with the latest, on the git archive, using git-repack -a:
>
> Generating pack...
> Done counting 19151 objects.
> Deltifying 19151 objects.
> Segmentation fault (core dumped)
>
> This is on OpenBSD.  Is there a secret sabotage afoot?  This is repeatable.
> Is there anything I can try differently?

Tonight's latest (f7a3276) merged Nico's delta patches that was
in "next" branch for some time, so that is what I would suspect
first.

    commit 06a9f9203570d21f9ef5fe219cdde527dcdf0990
    Author: Nicolas Pitre <nico@cam.org>

        improve diff-delta with sparse and/or repetitive data

    commit 2d08e5dd730680f7f8645a6326ec653435e032df
    Author: Nicolas Pitre <nico@cam.org>

        tiny optimization to diff-delta

    commit 3dc5a9e4cdcc7124c665a050547d1285d86a421f
    Author: Nicolas Pitre <nico@cam.org>

        replace adler32 with Rabin's polynomial in diff-delta

    commit f6c7081aa97aa67baa06390a1ef36088c33bf010
    Author: Nicolas Pitre <nico@cam.org>

        use delta index data when finding best delta matches

    commit 08abe669c05521499149dbf84fedefb04a8fa34d
    Author: Nicolas Pitre <nico@cam.org>

        split the diff-delta interface

Could you revert them (i.e. run "git revert 06a9f9", "git revert
2d08e5", ...) to see where it starts behaving again?

Alternatively, you could bisect between 2fc240a (before merging
np/delta branch) and master (f7a3276).

(1) Reverting recipe.

	$ git checkout -b testfix master
        $ make
        $ GIT_EXEC_PATH=. PATH=.:$PATH \
          ./git-repack -a ;# you already know this fails.
        $ EDITOR=: git revert 06a9f92
        $ make
        $ GIT_EXEC_PATH=. PATH=.:$PATH ./git-repack -a ;# does it?
        $ EDITOR=: git revert 2d08e5d
        $ make
        $ GIT_EXEC_PATH=. PATH=.:$PATH ./git-repack -a ;# does it?
        $ EDITOR=: git revert 3dc5a9e
        $ make
        $ GIT_EXEC_PATH=. PATH=.:$PATH ./git-repack -a ;# does it?
        ...

(2) Bisecting recipe.

	$ git checkout -b testfix master
	$ make
        $ GIT_EXEC_PATH=. PATH=.:$PATH \
          ./git-repack -a ;# you already know this fails.
        $ git reset --hard 2fc240a
	$ make
        $ GIT_EXEC_PATH=. PATH=.:$PATH \
          ./git-repack -a ;# before merging np/delta,
                           # should hopefully work
        $ git bisect start
        $ git bisect good 2fc240a
        $ git bisect bad master
        Bisecting: 10 revisions left to test after this
        [143f4d94c6e2188a6bedfdfa268e66b579e3fbf9] Merge branch 'jc/again'
	$ make
        $ GIT_EXEC_PATH=. PATH=.:$PATH ./git-repack -a ;# does this work?

        $ git bisect good ;# if it works, or
        $ git bisect bad  ;# otherwise
	Bisecting: 7 revisions left to test after this
	...

	

^ permalink raw reply

* Re: What's in git.git
From: Martin Langhoff @ 2006-05-10  5:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7viroezi8s.fsf@assigned-by-dhcp.cox.net>

On 5/10/06, Junio C Hamano <junkio@cox.net> wrote:
> * The 'next' branch, in addition, has these.
>
>   - cvsserver and cvsexportcommit updates (Martin Langhoff and Martyn Smith)
>
>     This is a new merge but not very new code.  Martin may want
>     to comment on how ready they are.

They have seen some limited use in-house -- we don't use Eclipse
in-house that much, but that has been the test target. I am currently
looking for a machine with good bandwidth to a backbone and cycles to
spare where we can run anon cvs access to Linus's kernel.git repo.

cheers,

martin

^ permalink raw reply

* Re: What's in git.git
From: Jakub Narebski @ 2006-05-10  6:48 UTC (permalink / raw)
  To: git
In-Reply-To: <7viroezi8s.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

>   - built-in grep (me)
> 
>     I think this is also ready, even though it robs users from
>     having funky "grep" on their $PATH and invoke it.  Compared
>     to GNU grep, it lacks -P (pcre), -Z (NUL-terminated output),
>     -q (totally quiet), -z (NUL-terminated input), but all the
>     commonly used ones including -f (from file), -F (fixed), -w
>     (word regexp), -l/-L (files with/without match) and -n (line
>     number) are implemented.  The same "stop me or else" comment
>     applies.

If there would be possible to use external grep (like one can use external
diff), then lack of some options wouldn't matter.

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Martin Langhoff @ 2006-05-10  7:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sean, junkio, Johannes.Schindelin, git
In-Reply-To: <Pine.LNX.4.64.0605091854380.3718@g5.osdl.org>

On 5/10/06, Linus Torvalds <torvalds@osdl.org> wrote:
> You _can_ be user-friendly and machine-parseable at the same time!

Good one. I'm following this thread with interest, but it feels we've
been attacked by the 'bike shed bug' in the act of redesigning
Windows.ini.

As an end-user, I have personally stayed away from the increasingly
complex scheme for remotes waiting for it to settle, and stuck with
the old-styled .git/branches stuff which is slam-dunk simple and it
just works.

The normal non-branch config options don't need any of this fancy
stuff. And I think the branches is reasonably well managed as files as
is done in .git/remotes which is trivial to work on with standard
shell commands. What I mean is that I can grep them trivially to ask
"how many remotes pull from server X" or from repo Y. Or via rsync.

Also -- repo config is tricky in the sense of scope. I want all my
"dev" repos of different projects on my laptop to have mostly the same
config but radically different remotes listings.

So... call me old-styled... but I'm happy with one-file-per-remote.
Was it broken to start with? Should we restart the track renames
flameway instead?

cheers,

martin

^ permalink raw reply

* Re: git-feed-mail-list.sh
From: Martin Mares @ 2006-05-10  8:49 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Junio C Hamano, git
In-Reply-To: <1147137170.2694.58.camel@shinybook.infradead.org>

Hello!

> I think I'd best wait for it to turn up in the release; preferably
> already capable of MIME quoting.

Wouldn't it be easier to feed the output to a MUA and letting it handle
the MIME stuff for you?

I am using mutt for this purpose:

mutt -x -e 'set charset="utf-8"; set send_charset="us-ascii:iso-8859-2:utf-8"' -s "$subj" "$recipient" <$out

				Have a nice fortnight
-- 
Martin `MJ' Mares   <mj@ucw.cz>   http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Next lecture on time travel will be held on previous Monday.

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Johannes Schindelin @ 2006-05-10 11:07 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90605100019q3b44b87kf49e456668f2e249@mail.gmail.com>

Hi,

On Wed, 10 May 2006, Martin Langhoff wrote:

> So... call me old-styled... but I'm happy with one-file-per-remote.
> Was it broken to start with?

Depends on how you look at it. A bunch of files is okay for quick-n-dirty, 
where you do not care about locking or consistency or extensibility.

Ciao,
Dscho

^ permalink raw reply

* Re: What's in git.git
From: Nicolas Pitre @ 2006-05-10 14:15 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Randal L. Schwartz, Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0605092142050.3718@g5.osdl.org>

On Tue, 9 May 2006, Linus Torvalds wrote:

> 
> 
> On Tue, 9 May 2006, Randal L. Schwartz wrote:
> 
> > >>>>> "Junio" == Junio C Hamano <junkio@cox.net> writes:
> > 
> > Junio> This week's "What's in" is a day early, since I do not expect to
> > Junio> be able to do much gitting for the rest of the week.
> > 
> > I just got this with the latest, on the git archive, using git-repack -a:
> > 
> > Generating pack...
> > Done counting 19151 objects.
> > Deltifying 19151 objects.
> > Segmentation fault (core dumped)
> > 
> > This is on OpenBSD.  Is there a secret sabotage afoot?  This is repeatable.
> > Is there anything I can try differently?
> 
> Can you see what the traceback is with gdb?
> 
> I'd suspect the deltifier changes, the rabin hash in particular. The core 
> file traceback would probably point right at the culprit if so.
> 
> I don't see the problem myself, but if it's an access just past the end of 
> an array or something, it would depend on exactly what the delta pattern 
> is (which, without the "-f" flag, in turn depends on what your previous 
> packs looked like) and also on the allocation strategy (which migth 
> explain why it shows on OpenBSD but Linux people hadn't seen it).

When linking with Electric Fence I can reproduce the segfault on Linux 
as well.

Looking into it now.


Nicolas

^ permalink raw reply

* Re: What's in git.git
From: Alex Riesen @ 2006-05-10 15:00 UTC (permalink / raw)
  To: Nicolas Pitre
  Cc: Linus Torvalds, Randal L. Schwartz, Junio C Hamano,
	Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0605101012250.24505@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 985 bytes --]

On 5/10/06, Nicolas Pitre <nico@cam.org> wrote:
> > >
> > > Junio> This week's "What's in" is a day early, since I do not expect to
> > > Junio> be able to do much gitting for the rest of the week.
> > >
> > > I just got this with the latest, on the git archive, using git-repack -a:
> > >
> > > Generating pack...
> > > Done counting 19151 objects.
> > > Deltifying 19151 objects.
> > > Segmentation fault (core dumped)
> > >

Probably unrelated but I get something similar in cygwin:
$ git pull //pc1/share/dir +ref1:ref2
Generating pack...
Done counting 93 objects.
Result has 62 objects.
Deltifying 62 objects.
git-unpack-objects: fatal: early EOF
git-fetch-pack: fatal: git-unpack-objects died with error code 128

(The program names come from a patch attached to the message)

After I remerged (from git's master) everything but np/delta
(I saw that report about crash on openbsd) it worked again.
This time I cannot make the repository public but will try to
help with testing.

[-- Attachment #2: proc-self-cmdline.patch --]
[-- Type: application/xxxxx, Size: 638 bytes --]

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Linus Torvalds @ 2006-05-10 15:37 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: sean, junkio, Johannes.Schindelin, git
In-Reply-To: <46a038f90605100019q3b44b87kf49e456668f2e249@mail.gmail.com>

On Wed, 10 May 2006, Martin Langhoff wrote:
> 
> Good one. I'm following this thread with interest, but it feels we've
> been attacked by the 'bike shed bug' in the act of redesigning
> Windows.ini.

Sure. It clearly _is_ a bike shed, which is why I posted patches: I think 
the way to resolve bike sheds is to "just do it". In the kernel, the 
general rule ends up being "he who writes the code gets to choose how it 
gets done", because it turns out that there are a lot more people willing 
to _argue_ about code than there are people willing to _write_ it, and 
thus that "real code wins" rule actually works very well in practice.

I don't think I've ever really seen an argument where you ended up having 
seriously competing patches. Yes, you can have patches to do things 
different ways, but once you have that, it tends to be pretty easy for the 
maintainer to just draw a line in the sand. And once one patch has been 
accepted, it's all over.

So the real problem with "bike sheds" is actually when you have a culture 
of arguing, and not enough of a culture of "just do it".

If you have a "just do it" culture, bike sheds are often good things. If 
it really _is_ that easy, a bike shed is a perfect opportunity for 
somebody who isn't all that deeply into a project to make a contribution 
and start feeling more comfy about it. It obviously didn't happen here, 
but it's definitely true that a lot of people in the kernel get introduced 
to doing patches through various "trivial" things.

So don't knock the bike sheds. It's a BSD term, and I think there's a 
_reason_ why it's a BSD term. Those people seem to sometimes like to argue 
about theoretical things (or about anything else, for that matter) more 
than actually getting the damn job done.

> As an end-user, I have personally stayed away from the increasingly
> complex scheme for remotes waiting for it to settle, and stuck with
> the old-styled .git/branches stuff which is slam-dunk simple and it
> just works.

It does work, and I think it's nice in many ways. It was certainly a good 
way to prototype things.

At the same time, especially with moving things more to C (or almost any 
other language, for that matter - shell is really pretty special in making 
it _easier_ to have things in individual files), it's in many ways nicer 
to have a more structured representation that has a nice fixed interface 
and a library and known access methods behind it.

And I'm personally actually pretty fed up with the .git/branches/ and 
.git/remotes/ thing, partly because I can never remember which file is 
which. I had to look at the code of git-parse-remote.sh to remind me which 
had what semantics. We could remove the old one entirely, of course (and 
no, I don't remember which is which now either), and avoid that particular 
problem, but it kind of soured me on it. 

And if we truly have separate files, we should go all the way, and have 
the good old "one file, one value" rule. Which we don't, and which I think 
everybody admits would be horrible for this case for users (even if it 
might be very nice for scripting).

		Linus

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Jakub Narebski @ 2006-05-10 16:03 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0605100823350.3718@g5.osdl.org>

Linus Torvalds wrote:

> On Wed, 10 May 2006, Martin Langhoff wrote:
>> 
>> Good one. I'm following this thread with interest, but it feels we've
>> been attacked by the 'bike shed bug' in the act of redesigning
>> Windows.ini.
[...]
>> As an end-user, I have personally stayed away from the increasingly
>> complex scheme for remotes waiting for it to settle, and stuck with
>> the old-styled .git/branches stuff which is slam-dunk simple and it
>> just works.

> And I'm personally actually pretty fed up with the .git/branches/ and
> .git/remotes/ thing, partly because I can never remember which file is
> which. I had to look at the code of git-parse-remote.sh to remind me which
> had what semantics. We could remove the old one entirely, of course (and
> no, I don't remember which is which now either), and avoid that particular
> problem, but it kind of soured me on it.
> 
> And if we truly have separate files, we should go all the way, and have
> the good old "one file, one value" rule. Which we don't, and which I think
> everybody admits would be horrible for this case for users (even if it
> might be very nice for scripting).

On one hand the remotes/ (or older branches/) is similar to refs/heads and
refs/tags that it contains shortcut names for pulling and pushing. On the
other hand configuration should belong to configuration.

Besides, AFAICT we did not have the place to have branch specific
configuration (like description, default place to pull from + marking
branch as immutable, default place to push to, etc.) and if I understand
correctly branches/ was used for something else. refs/heads/`name` stored
branches, including temporary branches which did not need configuration.

I guess that for the time being we can have remotes both in remotes/ and in
config file, plus script to freely transform between them (unless some
config would be unattainable by remotes/ file).

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* [PATCH] fix diff-delta bad memory access
From: Nicolas Pitre @ 2006-05-10 16:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Randal L. Schwartz, Alex Riesen

It cannot be assumed that the given buffer will never be moved when 
shrinking the allocated memory size with realloc().  So let's ignore 
that optimization for now.

This patch makes Electric Fence happy on Linux.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

I can't tell if that fixes it on BSD and Cygwin though.

diff --git a/Makefile b/Makefile
diff --git a/diff-delta.c b/diff-delta.c
index c618875..25a798d 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -199,7 +199,6 @@ struct delta_index * create_delta_index(
 			entry->next = hash[i];
 			hash[i] = entry++;
 			hash_count[i]++;
-			entries--;
 		}
 	}

@@ -230,10 +229,6 @@ struct delta_index * create_delta_index(
 	}
 	free(hash_count);

-	/* If we didn't use all hash entries, free the unused memory. */
-	if (entries)
-		index = realloc(index, memsize - entries * sizeof(*entry));
-
 	return index;
 }

^ permalink raw reply related

* Re: What's in git.git
From: Linus Torvalds @ 2006-05-10 16:48 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Randal L. Schwartz, Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0605101012250.24505@localhost.localdomain>

On Wed, 10 May 2006, Nicolas Pitre wrote:
> 
> When linking with Electric Fence I can reproduce the segfault on Linux 
> as well.
> 
> Looking into it now.

Yeah, I get 

	#0  0x1000bfe4 in create_delta (index=0xf7d758a0, trg_buf=0xf7d72eb8, trg_size=327, delta_size=0xffb422b4, 
	    max_size=143) at diff-delta.c:308
	#1  0x10005734 in try_delta (trg=0xf7fdbfa0, src=0xf7fdbf94, src_index=0xf7d758a0, max_depth=10)
	    at pack-objects.c:1049
	#2  0x10005b04 in find_deltas (list=0xf7e32f54, window=11, depth=10) at pack-objects.c:1128
	#3  0x10005ca0 in prepare_pack (window=10, depth=10) at pack-objects.c:1161
	#4  0x1000677c in main (argc=3, argv=0xffb436e4) at pack-objects.c:1359

ie the "entry" chain seems to be corrupt in create_delta.

Actually, it's not even the chain: it's the very first entry:

	(gdb) p index->hash[i]
	$2 = (struct index_entry *) 0xf7d7385c
	(gdb) p entry
	$3 = (struct index_entry *) 0xf7d7385c

and it's not a random value - it looks perfectly valid. As do all the 
other index hash entries. It's just that the index hash entries seem to 
all have been freed, so accessing them causes a SIGSEV!

			Linus

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Linus Torvalds @ 2006-05-10 17:00 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605101216360.24505@localhost.localdomain>

On Wed, 10 May 2006, Nicolas Pitre wrote:
>
> It cannot be assumed that the given buffer will never be moved when 
> shrinking the allocated memory size with realloc().  So let's ignore 
> that optimization for now.

Yeah, that was totally bogus. It would re-allocate _part_ of an 
allocation, but that allocation contained not just the index, but all the 
hashes and the hash entries too!

This has nothing to do with moving a buffer - it has everything to do with 
the fact that you shrunk a buffer that still contained all the other 
buffers: you shrunk the "mem" allocation to fit just the first part of it, 
and entirely ignored the "hash" and "entry" parts of it.

Btw, I think that whole "allocate everything in one allocation" thing is 
potentially bogus even the way it is now, if the alignment constraints of 
"index", "hash" and "entry" are different.

When you do

	..
	index = mem;
	mem = index + 1;
	hash = mem;
	mem = hash + hsize;
	entry = mem;
	..

it's perfectly fine for "index", but "hash" and "entry" end up having 
alignments that depend on the size/alignment of "index" (and for "entry" 
on "hash").

So if their alignment requirements are different, you're basically 
screwed.

It may work in practice (maybe they all align on pointer boundaries), but 
it's damn scary. You should re-consider, or at least make that code be a 
lot safer (like actually taking alignment into consideration, both for 
total size and for the offset calculations).

That could be done by using unions or something.

			Linus

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Linus Torvalds @ 2006-05-10 17:18 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605100953090.3718@g5.osdl.org>

On Wed, 10 May 2006, Linus Torvalds wrote:
> 
> Yeah, that was totally bogus. It would re-allocate _part_ of an 
> allocation, but that allocation contained not just the index, but all the 
> hashes and the hash entries too!

Actually, no, you got that part right. Mea culpa. It really only ended up 
being a problem when the area was moved, since the pointers into that area 
weren't updated.

The alignment issue is real, but looking at the different structures, 
they'll all have pointers that end up being the real (and only) alignment 
constraint, so as a result, on any reasonable machine they all have the 
same alignment and things are fine.

Junio - pls apply Nico's patch asap. It's correct, and fixes a really 
nasty problem. And I'll withdraw my other worries as "not consequential".

		Linus

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Nicolas Pitre @ 2006-05-10 17:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605100953090.3718@g5.osdl.org>

On Wed, 10 May 2006, Linus Torvalds wrote:

> On Wed, 10 May 2006, Nicolas Pitre wrote:
> >
> > It cannot be assumed that the given buffer will never be moved when 
> > shrinking the allocated memory size with realloc().  So let's ignore 
> > that optimization for now.
> 
> Yeah, that was totally bogus. It would re-allocate _part_ of an 
> allocation, but that allocation contained not just the index, but all the 
> hashes and the hash entries too!

Yep.

> This has nothing to do with moving a buffer - it has everything to do with 
> the fact that you shrunk a buffer that still contained all the other 
> buffers: you shrunk the "mem" allocation to fit just the first part of it, 
> and entirely ignored the "hash" and "entry" parts of it.

No.

The initial allocation assumes a perfectly even distribution of entries 
in which case the entry array would be all populated.

When there are repeated bytes then consecutive blocks will have the same 
hash, and in that case keeping only the first one is useful.

Which means that the entry pointer won't get to the end of the allocated 
space for all entries and I naively assumed that using realloc would 
only mark the allocated memory as smaller than it originally was without 
actually copying any of the remaining data, which is what happened in 
most cases but evidently not always.

But if the buffer moves the hash array containing _pointers_ to hash 
entries gets totally screwed.

> Btw, I think that whole "allocate everything in one allocation" thing is 
> potentially bogus even the way it is now, if the alignment constraints of 
> "index", "hash" and "entry" are different.

Yeah...  I might just do a separate allocation for hash entries as well.

Or maybe even drop the hash chaining altogether (now that the code does 
backward matching that might be worth the code simplification).

Nicolas

^ permalink raw reply

* [Patch] git-cvsimport: tiny fix
From: Elrond @ 2006-05-10 17:37 UTC (permalink / raw)
  To: git

git-cvsimport: Handle "Removed" from pserver

Sometimes the pserver says "Removed" instead of
"Remove-entry".

Signed-off-by: Elrond <elrond+kernel.org@samba-tng.org>
---
Hi,

At least the above happened to me on a repository I tried
to convert.
Without the patch, it just die("Unknown: Removed ...")s.


    Elrond

--- git-cvsimport.perl
+++ git-cvsimport.perl
@@ -350,7 +350,7 @@ sub _line {
 				return $res;
 			} elsif($line =~ s/^E //) {
 				# print STDERR "S: $line\n";
-			} elsif($line =~ /^Remove-entry /i) {
+			} elsif($line =~ /^(Remove-entry|Removed) /i) {
 				$line = $self->readline(); # filename
 				$line = $self->readline(); # OK
 				chomp $line;

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Linus Torvalds @ 2006-05-10 19:01 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605101311020.24505@localhost.localdomain>

Btw, Nico, that rabin hash code is _extremely_ confusing.

The hash entry pointers point to "data + RABIN_WINDOW", and then to make 
things even _more_ confusing, the hash calculation code is actually offset 
by one, so it will have computed the hash with

	val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];

where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1.

So, if I read that correctly, the "entry->ptr" actually points not to the 
beginning of the data that was hashed, or even the end, but literally to 
the last byte of the data that was hashed in that window.

Isn't that just _really_ confusing?

Or is there some sense to this?

			Linus

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Nicolas Pitre @ 2006-05-10 19:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605101159160.3718@g5.osdl.org>

On Wed, 10 May 2006, Linus Torvalds wrote:

> 
> 
> Btw, Nico, that rabin hash code is _extremely_ confusing.

Possible.

> The hash entry pointers point to "data + RABIN_WINDOW", and then to make 
> things even _more_ confusing, the hash calculation code is actually offset 
> by one, so it will have computed the hash with
> 
> 	val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];
> 
> where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1.
> 
> So, if I read that correctly, the "entry->ptr" actually points not to the 
> beginning of the data that was hashed, or even the end, but literally to 
> the last byte of the data that was hashed in that window.
> 
> Isn't that just _really_ confusing?
> 
> Or is there some sense to this?

Yes, ptr points to the last byte of the window for given hash value.

This is so because in the matching loop the window is scrolled byte by 
byte and the new hash value is always made of ROBIN_WINDOW-1 bytes which 
already have been processed (most probably added as literal bytes in the 
delta buffer) plus one new byte.  So it makes sense to start forward 
byte matching only from that last byte to find the longest source area 
to match, especially since all the other bytes in the window are likely 
to be identical already and comparing them repeatedly for entries with 
the same hash would be wasteful in most cases.

Further down, once the best offset with the longest match in the source 
buffer has been found, backward matching is performed to remove as much 
literal bytes that were added to the delta output as possible, which is 
most likely to catch the whole Robin window that hasn't been compared 
previously, but in that case the window content is compared only once.

And the reason why the reference hash is computed with an offset of 1 to 
RABIN_WINDOW inclusive in create_delta_index() is to allow for quick 
initialization of the Rabin window _outside_ of the main loop in 
create_delta().  There is a comment to that effect near the top of 
create_delta_index but probably a small reminder should be added in the 
index loop as well.

Nicolas

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Nicolas Pitre @ 2006-05-10 19:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605101515420.24505@localhost.localdomain>


And of course s/robin/rabin/ (I can't type RABIN without having my 
fingers decide on ROBIN by themselves).

On Wed, 10 May 2006, Nicolas Pitre wrote:

> On Wed, 10 May 2006, Linus Torvalds wrote:
> 
> > 
> > 
> > Btw, Nico, that rabin hash code is _extremely_ confusing.
> 
> Possible.
> 
> > The hash entry pointers point to "data + RABIN_WINDOW", and then to make 
> > things even _more_ confusing, the hash calculation code is actually offset 
> > by one, so it will have computed the hash with
> > 
> > 	val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];
> > 
> > where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1.
> > 
> > So, if I read that correctly, the "entry->ptr" actually points not to the 
> > beginning of the data that was hashed, or even the end, but literally to 
> > the last byte of the data that was hashed in that window.
> > 
> > Isn't that just _really_ confusing?
> > 
> > Or is there some sense to this?
> 
> Yes, ptr points to the last byte of the window for given hash value.
> 
> This is so because in the matching loop the window is scrolled byte by 
> byte and the new hash value is always made of ROBIN_WINDOW-1 bytes which 
> already have been processed (most probably added as literal bytes in the 
> delta buffer) plus one new byte.  So it makes sense to start forward 
> byte matching only from that last byte to find the longest source area 
> to match, especially since all the other bytes in the window are likely 
> to be identical already and comparing them repeatedly for entries with 
> the same hash would be wasteful in most cases.
> 
> Further down, once the best offset with the longest match in the source 
> buffer has been found, backward matching is performed to remove as much 
> literal bytes that were added to the delta output as possible, which is 
> most likely to catch the whole Robin window that hasn't been compared 
> previously, but in that case the window content is compared only once.
> 
> And the reason why the reference hash is computed with an offset of 1 to 
> RABIN_WINDOW inclusive in create_delta_index() is to allow for quick 
> initialization of the Rabin window _outside_ of the main loop in 
> create_delta().  There is a comment to that effect near the top of 
> create_delta_index but probably a small reminder should be added in the 
> index loop as well.
> 
> 
> Nicolas
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Nicolas

^ permalink raw reply

* common URL for repository and gitweb
From: Martin Waitz @ 2006-05-10 23:00 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1114 bytes --]

hoi :)

I hacked a little bit in gitweb so that it can get the project path
form the URI without using a ?p= parameter.  That is, you can now
use "http://.../cgi-bin/gitweb.cgi/path/to/project/" to show
the summary of your project.

Together with a apache configuration like below, you can give your
gitweb pages the same URL as your repositories:

	<VirtualHost www:80>
		ServerName git.hostname.org
		DocumentRoot /pub/git
		RewriteEngine on
		RewriteRule ^/(.*\.git/(.*\.html)?)?$ /usr/lib/cgi-bin/gitweb.cgi%{REQUEST_URI}  [L]
	</VirtualHost>

This will rewrite all URLs that go to gitweb to use the CGI, while
leaving URLs for the repository intact.

You can see an example at http://git.admingilde.org/.
The gitweb version used for that is available here, too.

As an added bonus, gitweb can now serve the "html" branch of a
repository directly using "text/html", so you can show your
documentation without needing to update a checked out version
of this branch.
For example have a look at the GIT manpages at
http://git.admingilde.org/tali/git.git/git.html

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Martin Langhoff @ 2006-05-10 23:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sean, junkio, Johannes.Schindelin, git
In-Reply-To: <Pine.LNX.4.64.0605100823350.3718@g5.osdl.org>

On 5/11/06, Linus Torvalds <torvalds@osdl.org> wrote:
> Sure. It clearly _is_ a bike shed, which is why I posted patches: I think
> the way to resolve bike sheds is to "just do it". In the kernel, the

there's no disputing that!

> So don't knock the bike sheds. It's a BSD term, and I think there's a
> _reason_ why it's a BSD term. Those people seem to sometimes like to

Apologies -- I didn't want to know it, but I do wonder what the gain
behind the change is. It seems to me that it would be a bad idea to
store refs in the config file and, in my mind at least, branch info is
closer to refs.

> > As an end-user, I have personally stayed away from the increasingly
> > complex scheme for remotes waiting for it to settle, and stuck with
> > the old-styled .git/branches stuff which is slam-dunk simple and it
> > just works.
>
> It does work, and I think it's nice in many ways. It was certainly a good
> way to prototype things.
>
> At the same time, especially with moving things more to C (or almost any
> other language, for that matter - shell is really pretty special in making
> it _easier_ to have things in individual files), it's in many ways nicer
> to have a more structured representation that has a nice fixed interface
> and a library and known access methods behind it.

But it is a bit of a loss for perl/shell porcelains, and for users
that abuse the contents of .git directly on a regular basis...

there's nothing to stop us from having a structured representation of
what's in .git/branches/

> And I'm personally actually pretty fed up with the .git/branches/ and
> .git/remotes/ thing, partly because I can never remember which file is
> which. I had to look at the code of git-parse-remote.sh to remind me

Agreed, it's a mess, but I suspect it's still there to support cogito.
In keeping with the 'talk code' ethos, I'll work on adding support for
.git/remotes in cogito.

> And if we truly have separate files, we should go all the way, and have
> the good old "one file, one value" rule.

What about one semantics, one file? So far we have had 3 semantics:
git config, remotes, refs. And git config has been mostly
project-indepentent. In fact, I have been copying it across my
checkouts of different, unrelated projects. You just don't do that
with remotes or refs.

cheers,

martin

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Linus Torvalds @ 2006-05-10 23:55 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: sean, junkio, Johannes.Schindelin, git
In-Reply-To: <46a038f90605101617x1aa9bd2du959ead77ebf61795@mail.gmail.com>

On Thu, 11 May 2006, Martin Langhoff wrote:
> 
> Apologies -- I didn't want to know it, but I do wonder what the gain
> behind the change is.

I think we can do better in a few pretty important regards:

 - having the information in one place. I agree that the multi-file 
   approach works fine for shell scripts (although I disagree that the new 
   one would be harder - you just use git-repo-config instead), but I 
   think it's quite confusing from a new user perspective.

   I bet that even without any tools, new users can be told to just open 
   ".git/config", and guess how hard a time they would have to add a new 
   branch, if they already had one that said

	[branch "origin"]
		remote = git://git.kernel.org/pub/scm/git/git.git
		branch master

   which would tell you that the local branch "origin" is really branch
   "master" at that remote git repository.

   Yeah, I'm not sure what the actual config rules would be, but think it 
   would be a hell of a lot more intuitive than what we have now. 

   What we have now _works_. It works really well. No question about that. 
   It's just pretty hard to explain. The above syntax wouldn't even need 
   any explanation. You could just tell people to look into their config 
   files.

 - I think we'll have a much easier time (from a purely technical angle) 
   to add special attributes to the local branches. Add a "read-only" 
   specifier? It's _obvious_:

	[branch "origin"] 
		remote = git://git.kernel.org/pub/scm/git/git.git
		branch master
		readonly

   and it's absolutely trivial to parse. And part of the important thing 
   is that this all makes 100% sense EVEN IF IT'S NOT A REMOTE REPO!

   So imagine that it's a purely local branch, but you want to protect it. 
   Solution?

	[branch "July Snapshot"]
		readonly

   and you're done. In contrast, even if you ended up just extending the 
   file format for the .git/remotes/July\ Snapshot file, and just added a 
   "readonly" line to it, it wouldn't make _sense_. Whaa? "remotes"? In 
   contrast, in the .git/config file, it makes a ton of sense, and in fact 
   it's totally obvious.

   (Actually, we should probably have the .git/config file syntax separate 
   local branches like "master" from remote branches like "origin", so it 
   might be more like

	[remote "origin"]
		url = git://git.kernel.org/pub/scm/git/git.git

    which just tells that the _word_ "origin" corresponds to a 
    shorthand for a particular remote repository

	[branch "origin"]
		remote = origin
		branch = master

   or something to show that your _local_ branch named "origin" 
   corresponds to a particular remote (which could be a shorthand like the 
   above, or just spelled out), and a particular branch _at_ that remote 
   repository)

   Anyway, the point is, I think our current .git/remotes/xyzzy files 
   actually mix two different concepts, and they also end up doing it 
   pretty badly. They _work_, but because of the mix-ups, they aren't all 
   that they could be, and it's fundamentally impossible to make them so, 
   because the mixup really is that "origin" means TWO DIFFERENT THINGS 
   (the local branch, and the remote that it corresponds to)

 - Finally, I think it opens the possibility for some other things. For 
   example, once you accept that different branches might want attributes 
   like "readonly", you realize that some other attributes also make 
   sense. Like adding the default pull source per local branch, etc.

Again, I'm not saying that we can't work with the .git/remotes/ files. But 
I think it gets increasingly ugly, and the confusion gets increasingly 
worse.

> But it is a bit of a loss for perl/shell porcelains, and for users
> that abuse the contents of .git directly on a regular basis...

I really disagree. 

The .git/config file is _easier_ to edit by hand than the remotes. It's 
easier to copy-paste within one file than it is to work with two different 
files (and let's face it, copy-paste is usually what at least I would do 
for something like this). And it's _easier_ to just always open one file, 
and search within that one, than try to remember what file it was.

Now, C programs can very easily use the config library, and shell programs 
can equally easily query the variables with "git repo-config". I really 
doubt it's very hard for perl either, but I'm not a perl person, so maybe 
I don't see why this is hard.

		Linus

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Linus Torvalds @ 2006-05-11  0:11 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: sean, junkio, Johannes.Schindelin, git
In-Reply-To: <Pine.LNX.4.64.0605101629230.3718@g5.osdl.org>

On Wed, 10 May 2006, Linus Torvalds wrote:
>
>  - having the information in one place. I agree that the multi-file 
>    approach works fine for shell scripts (although I disagree that the new 
>    one would be harder - you just use git-repo-config instead), but I 
>    think it's quite confusing from a new user perspective.

Btw, I seriously believe that git has come to the point where we've licked 
the real technical issues. Stability hasn't been a concern for the last 
year - and even something as seriously as a repacking bug causing a 
SIGSEGV (yesterday) was actually basically designed to not be able to 
cause problems. The repack failed, and nothing happened to the old data. 
It was scary, but it wasn't "bad".

The last performance problem was a stupid one-liner, where one of the 
shell scripts didn't use the "--aggressive" flag for doing the trivial 
three-way merge, so it ended up forking and executing the "merge-one-file" 
shell script for 4500+ files for one unfortunate project that had a 
strange workflow. Adding the "--aggressive" flag took a 5-minute (where 
all of the time was spent in a shell script basically doing nothing) thing 
down to under a second.

So git should kick butt in performance, scale very well, and seems to take 
less disk space than just about anybody else. 

So what do people actually _complain_ about? 

I don't think we've seen a serious complaint lately that hasn't been about 
nice user interface and/or documentation. Anybody? 

So as far as I can tell, the #1 issue is that "new user" experience. You 
can pretty much forget about anything else. Working with git in a 
distributed manner is really easy and efficient, but from the comments 
I've seen, it's not always easy and obvious how to get to that point. 

Creating a remote repository and filling it. And being able to understand 
what the local vs remote branches actually _mean_. And I think our current 
.git/remotes/ thing is a part of that. It's not exactly user _hostile_, 
but it's very much "implementation friendly, and doesn't care about the 
user". So I think .git/config can help us there.

I also think we could do with a few scripts to just do setup of a remote 
repo:

	git remote clone <remoteaddress>
	git remote branch <remoteaddress> [-D]
	git remote fsck <remoteaddress>
	git remote repack <remoteaddress> -a -d

which would all basically boil down to "ssh to the remote address, cd into 
that directory, and do the named git command there" (well, not clone: 
doing a remote clone involves doing a mkdir/git-init-db/git-receive-pack 
remotely, and doing a git-send-pack locally, so some of them would be 
about doing things _both_ locally and remotely).

And documentation.

Now, I don't do documentation, and I really think somebody else could do 
the whole "git remote <cmd>" thing too. It _should_ really be pretty 
trivial. My real point is that almost none of this is about technology, 
and it's much more about trying to put a whole lot of lipstick on this 
pig. We have _got_ the technology already, and I think most people will 
agree git is doing pretty damn well there.

Because I really think the pig is quite charming, just sometimes you see 
some of its boorish sides right now..

(Or should that be "boarish", when we talk about pigs? ;)

		Linus

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox