Git development
 help / color / mirror / Atom feed
* Re: [PATCH 5/6] gitopt: convert setup_revisions(), and diff_opt_parse()
From: Eric Wong @ 2006-05-11 20:19 UTC (permalink / raw)
  To: git
In-Reply-To: <11471512123005-git-send-email-normalperson@yhbt.net>

In other news, this patch is broken.  Bundled args won't work if some
in the bundle are handled setup_revisions() and some are handled by
diff_opt_parse().

I'll work on fixing this, as well (may take a while working mostly one-handed).

-- 
Eric Wong

^ permalink raw reply

* Re: [PATCH] Fix git-pack-objects for 64-bit platforms
From: Linus Torvalds @ 2006-05-11 19:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Dennis Stosberg, git
In-Reply-To: <7v7j4swg0r.fsf@assigned-by-dhcp.cox.net>



On Thu, 11 May 2006, Junio C Hamano wrote:
> 
> Since I saw a patch that touches only one place, I thought I'd
> better point this out...
> 
> There are a few more places that knows about this
> ((char*)base_pointer + (entry_count * 24)) magic in our code.

Since I _do_ have a 64-bit big-endian architecture, I decided to install 
some of the 64-bit development libraries that I didn't already have, and I 
added "-m64" to the compiler flags.

All the tests seem to pass with the simple fix (and without it, we do 
indeed fail at least t5700-clone-reference.sh).

Of course, there might well be something else that doesn't get coverage, 
but passing all tests is at least a good sign. And since x86-64 has been 
getting pretty extensive coverage, I don't think we have a lot of 64-bit 
bugs per se, this one just happened to hide.

		Linus

---
diff --git a/pack-objects.c b/pack-objects.c
index 523a1c7..1b9e7a1 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -156,7 +156,7 @@ static void prepare_pack_revindex(struct
 
 	rix->revindex = xmalloc(sizeof(unsigned long) * (num_ent + 1));
 	for (i = 0; i < num_ent; i++) {
-		long hl = *((long *)(index + 24 * i));
+		uint32_t hl = *((uint32_t *)(index + 24 * i));
 		rix->revindex[i] = ntohl(hl);
 	}
 	/* This knows the pack format -- the 20-byte trailer

^ permalink raw reply related

* Re: [PATCH] Fix git-pack-objects for 64-bit platforms
From: Linus Torvalds @ 2006-05-11 19:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Dennis Stosberg, git
In-Reply-To: <7v7j4swg0r.fsf@assigned-by-dhcp.cox.net>



On Thu, 11 May 2006, Junio C Hamano wrote:
> 
> Is uint32_t guaranteed to be exactly 32-bit, or merely enough to
> hold 32-bit?

I think it's guaranteed to be 32-bit, but regardless, the current git 
headers already assume that "unsigned int" is 32-bit.

Which is a pretty safe assumption for at least the next ten years or so, 
possibly much longer. So I don't think we need to worry _too_ much about 
this. I think it's more important to try to get git working on Windows, 
than on 16-bit DOS or on a PDP-9, or one of the odd cray machines.

		Linus

^ permalink raw reply

* Re: [PATCH] Fix git-pack-objects for 64-bit platforms
From: Junio C Hamano @ 2006-05-11 18:52 UTC (permalink / raw)
  To: Dennis Stosberg; +Cc: git, Linus Torvalds
In-Reply-To: <Pine.LNX.4.64.0605111054290.3866@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Thu, 11 May 2006, Dennis Stosberg wrote:
>> 
>> I am not sure whether an int cast or an int32_t cast is more
>> appropriate here.  An int is not guaranteed to be four bytes wide,
>> but I don't know of any modern platform where that's not the case.
>> On the other hand int32_t is not necessarily available before C99.
>> 
>> Any opinions?  I wonder why no one has hit this on x86_64...
>
> I think the "ntohl()" hides it. It loads a 64-bit value, but since x86-64 
> is little-endian, the low 32 bits are correct. The htonl() will then strip 
> the high bits and make it all be big-endian.

That sounds sensible.

Since I saw a patch that touches only one place, I thought I'd
better point this out...

There are a few more places that knows about this
((char*)base_pointer + (entry_count * 24)) magic in our code.

$ git grep -n -e '24  *\*' -e '\*  *24' master -- '*.c'

master:pack-objects.c:159:		long hl = *((long *)(index + 24 * i));

	This is yours.

master:sha1_file.c:447:	if (idx_size != 4*256 + nr * 24 + 20 + 20)
master:sha1_file.c:1148:	memcpy(sha1, (index + 24 * n + 4), 20);
master:sha1_file.c:1162:		int cmp = memcmp(index + 24 * mi + 4, sha1, 20);

	These three should be OK, I think.

master:sha1_file.c:1164:			e->offset = ntohl(*((int*)(index + 24 * mi)));

	This you might want to look at; I suspect it is what
	Linus suggests.

Also we _might_ have uglier magic that assumes the base_pointer to
be a pointer to a 4-byte integer and uses offset of multiple of
6 instead of 24, although I do not think it is likely.

I have to leave the keyboard in a few minutes so I cannot verify
nor fix them myself for the next 8 hours or so.  Sorry.

> And while I actually run a 64-bit big-endian machine myself (G5 ppc64), my 
> user space is all 32-bit by default, so it never showed up on linux-ppc64 
> either.
>
> Anyway, the correct type to use is "uint32_t" in this case. That's what 
> htonl() takes.

Is uint32_t guaranteed to be exactly 32-bit, or merely enough to
hold 32-bit?

^ permalink raw reply

* Re: [PATCH] Fix git-pack-objects for 64-bit platforms
From: Linus Torvalds @ 2006-05-11 17:58 UTC (permalink / raw)
  To: Dennis Stosberg; +Cc: git
In-Reply-To: <20060511173632.G60c08b4@leonov.stosberg.net>



On Thu, 11 May 2006, Dennis Stosberg wrote:
> 
> I am not sure whether an int cast or an int32_t cast is more
> appropriate here.  An int is not guaranteed to be four bytes wide,
> but I don't know of any modern platform where that's not the case.
> On the other hand int32_t is not necessarily available before C99.
> 
> Any opinions?  I wonder why no one has hit this on x86_64...

I think the "ntohl()" hides it. It loads a 64-bit value, but since x86-64 
is little-endian, the low 32 bits are correct. The htonl() will then strip 
the high bits and make it all be big-endian.

And while I actually run a 64-bit big-endian machine myself (G5 ppc64), my 
user space is all 32-bit by default, so it never showed up on linux-ppc64 
either.

Anyway, the correct type to use is "uint32_t" in this case. That's what 
htonl() takes.

		Linus

^ permalink raw reply

* [PATCH] Fix git-pack-objects for 64-bit platforms
From: Dennis Stosberg @ 2006-05-11 17:36 UTC (permalink / raw)
  To: git

The offset of an object in the pack is recorded as a 4-byte integer
in the index file.  When reading the offset from the mmap'ed index
in prepare_pack_revindex(), the address is dereferenced as a long*.
This works fine as long as the long type is four bytes wide.  On
NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing
the offset produces garbage.

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>

---

I am not sure whether an int cast or an int32_t cast is more
appropriate here.  An int is not guaranteed to be four bytes wide,
but I don't know of any modern platform where that's not the case.
On the other hand int32_t is not necessarily available before C99.

Any opinions?  I wonder why no one has hit this on x86_64...


 pack-objects.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

026b1b2cdd5332f59e15cd8611a49ead3094d08c
diff --git a/pack-objects.c b/pack-objects.c
index 523a1c7..29bda43 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -156,7 +156,7 @@ static void prepare_pack_revindex(struct
 
 	rix->revindex = xmalloc(sizeof(unsigned long) * (num_ent + 1));
 	for (i = 0; i < num_ent; i++) {
-		long hl = *((long *)(index + 24 * i));
+		long hl = *((int *)(index + 24 * i));
 		rix->revindex[i] = ntohl(hl);
 	}
 	/* This knows the pack format -- the 20-byte trailer
-- 
1.3.2.gbe65

^ permalink raw reply related

* [PATCH] Fix compilation on newer NetBSD systems
From: Dennis Stosberg @ 2006-05-11 17:35 UTC (permalink / raw)
  To: git

NetBSD >=2.0 has iconv() in libc.  A libiconv is not required and
does not exist.

See: http://netbsd.gw.com/cgi-bin/man-cgi?iconv+3+NetBSD-2.0

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>

---

 Makefile |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

2a73fcf43bfd1f73ec5e1e50396d54b83abae5e1
diff --git a/Makefile b/Makefile
index 37fbe78..26fa4e6 100644
--- a/Makefile
+++ b/Makefile
@@ -285,7 +285,9 @@ ifeq ($(uname_S),OpenBSD)
 	ALL_LDFLAGS += -L/usr/local/lib
 endif
 ifeq ($(uname_S),NetBSD)
-	NEEDS_LIBICONV = YesPlease
+	ifeq ($(shell test `uname -r | sed -e 's/^\([0-9]\).*/\1/'` -lt 2 && echo y),y)
+		NEEDS_LIBICONV = YesPlease
+	endif
 	ALL_CFLAGS += -I/usr/pkg/include
 	ALL_LDFLAGS += -L/usr/pkg/lib -Wl,-rpath,/usr/pkg/lib
 endif
-- 
1.3.2.gbe65

^ permalink raw reply related

* Re: Implementing branch attributes in git config
From: Junio C Hamano @ 2006-05-11 17:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Johannes Schindelin, sean
In-Reply-To: <Pine.LNX.4.64.0605091215340.3718@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Tue, 9 May 2006, Junio C Hamano wrote:
>> 
>> If we are shooting for "let's not do this again", I do not think
>> (4) without some quoting convention is good enough.  Today, we
>> are talking about branch names so we could give them artificial
>> limits, which could be weaker than what we already have on the
>> branch names, but we would later regret that, when we start
>> wanting to have other names in the configuration (e.g. people's
>> names).
>
> Here's my suggestion as a patch.
>
> NOTE! This patch could be applied right now, and to all the branches (to
> make 1.x, 1.2.x and 1.3.x all support the _syntax_). Even if nothing 
> actually uses it.

Linus,

I've adjusted this patch, your follow-up patch, and Sean's
"extended section part is case sensitive" patch, along with the
test tweak to "maint" branch.  I also prepared them to be
mergeable to the "master" branch.  Tentatively it is in "next"
for testing.

I'm ready to push out "maint" (not tagged as 1.3.3 yet) and
"next", but have not done so.

I understand the plan is to have 1.3.3 out from this "maint",
and also merge this in "master" about the same time, but I am
expecting I will be offline for the rest of the week most of the
time, so it would happen over the weekend at the earliest.

Does that sound good to you?

I do not think we would need v1.1.7 nor v1.2.7 for this.  People
who have stayed at v1.1.6 or v1.2.6 would need to update if they
are going to use newer git in their repo anyway, and I do not
think of a reason not to update to 1.3.3 but update to 1.1.7 or
1.2.7 in order to stay at the feature level of 1.1.X or 1.2.X
series.  We haven't made incompatible changes as far as I
remember.

Here is what the (adjusted) test case in "next" looks like.
Corresponding one in "maint" lack --list so does not have the
last hunk.  The "maint" and "next" branches I have locally both
passes the test.

-- >8 --

diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 7090ea9..8260d57 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -229,7 +229,7 @@ test_expect_failure 'invalid key' 'git-r
 test_expect_success 'correct key' 'git-repo-config 123456.a123 987'
 
 test_expect_success 'hierarchical section' \
-	'git-repo-config 1.2.3.alpha beta'
+	'git-repo-config Version.1.2.3eX.Alpha beta'
 
 cat > expect << EOF
 [beta] ; silly comment # another comment
@@ -241,8 +241,8 @@ # empty line
 	NoNewLine = wow2 for me
 [123456]
 	a123 = 987
-[1.2.3]
-	alpha = beta
+[Version "1.2.3eX"]
+	Alpha = beta
 EOF
 
 test_expect_success 'hierarchical section value' 'cmp .git/config expect'
@@ -251,7 +251,7 @@ cat > expect << EOF
 beta.noindent=sillyValue
 nextsection.nonewline=wow2 for me
 123456.a123=987
-1.2.3.alpha=beta
+version.1.2.3eX.alpha=beta
 EOF
 
 test_expect_success 'working --list' \

^ permalink raw reply related

* Re: Implementing branch attributes in git config
From: Jeff King @ 2006-05-11 11:39 UTC (permalink / raw)
  To: git
In-Reply-To: <20060511095158.GA23620@coredump.intra.peff.net>

On Thu, May 11, 2006 at 05:51:58AM -0400, Jeff King wrote:

> +remote=`get_remote_url "$1"` shift;

Oops, this should be:
  remote=`get_remote_url "$1"`; shift
but it actually works fine under dash (a bug in dash?)

-Peff

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Johannes Schindelin @ 2006-05-11 10:30 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90605101713n506207aem1326cd2bf5a1f861@mail.gmail.com>

Hi,

On Thu, 11 May 2006, Martin Langhoff wrote:

> [...] where I now I just mostly copy .git/config around.

Why not just edit the template?

Ciao,
Dscho

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Jeff King @ 2006-05-11  9:51 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0605101656110.3718@g5.osdl.org>

On Wed, May 10, 2006 at 05:11:17PM -0700, Linus Torvalds wrote:

> I also think we could do with a few scripts to just do setup of a remote 
> repo:
> 
> 	git remote clone <remoteaddress>
> 	git remote branch <remoteaddress> [-D]
> 	git remote fsck <remoteaddress>
> 	git remote repack <remoteaddress> -a -d

Here's a 'git remote' that handles the easy commands. It makes things
like 'git remote origin repack -a -d' do what you expect. The biggest
problems are:
  - it only works for ssh remotes
  - it assumes your remote path is a git dir (do we have a usual way of
    deciding between $path and $path/.git?)
  - ssh'ing will mangle your shell quoting in the command arguments
  - the url parsing is somewhat ad-hoc (do we have a usual way of
    parsing urls for shell scripts?)

---
Add braindead git-remote script.

This script is a convenience wrapper for performing remote commands on a
repository using ssh.

---

 Makefile      |    2 +-
 git-remote.sh |   19 +++++++++++++++++++
 2 files changed, 20 insertions(+), 1 deletions(-)
 create mode 100644 git-remote.sh

8810ae2524d3339b8a8341b34b2d3f14ddb9c899
diff --git a/Makefile b/Makefile
index 37fbe78..58eddd8 100644
--- a/Makefile
+++ b/Makefile
@@ -125,7 +125,7 @@ SCRIPT_SH = \
 	git-applymbox.sh git-applypatch.sh git-am.sh \
 	git-merge.sh git-merge-stupid.sh git-merge-octopus.sh \
 	git-merge-resolve.sh git-merge-ours.sh git-grep.sh \
-	git-lost-found.sh
+	git-lost-found.sh git-remote.sh
 
 SCRIPT_PERL = \
 	git-archimport.perl git-cvsimport.perl git-relink.perl \
diff --git a/git-remote.sh b/git-remote.sh
new file mode 100644
index 0000000..04b1ce9
--- /dev/null
+++ b/git-remote.sh
@@ -0,0 +1,19 @@
+#!/bin/sh
+
+USAGE='<remote> <command> [options]'
+. git-sh-setup
+. git-parse-remote
+
+case "$#" in
+  0|1) usage ;;
+esac
+
+remote=`get_remote_url "$1"` shift;
+case "$remote" in
+  ssh://*|git+ssh://*|ssh+git://*)
+    host=`echo "$remote" | sed 's!^[^/]*://\([^/]*\).*!\1!'`
+    path=`echo "$remote" | sed 's!^[^/]*://[^/]*\(.*\)!\1!'`
+    exec ssh -n $host "GIT_DIR=$path git $@"
+    ;;
+  *) die "unhandled protocol: $remote" ;;
+esac
-- 
1.3.1

^ permalink raw reply related

* Re: Implementing branch attributes in git config
From: Jakub Narebski @ 2006-05-11  6:02 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0605102148310.24505@localhost.localdomain>

Nicolas Pitre wrote:

> On Wed, 10 May 2006, Linus Torvalds wrote:
> 
>> [branch "origin"]
>> remote = git://git.kernel.org/pub/scm/git/git.git
>> branch master
> 
> I totally agree with the principle, but I find the above really
> confusing.  Which "branch" means what?  At least if it was "remote_url"
> and "remote_branch" then there wouldn't be any possibility for
> confusion.

I'm not sure if remotes shortcuts and configuration (which branches should
be pulled, and to which branches) should be in branches configuration. It
is somewhat confusing. Branches configuration might be used for default
pulling, e.g.


[remote "kernel.org"]
        url = git://git.kernel.org/pub/scm/git/git.git
        pull = master:origin
        ...
        pull = +pu:pu

[branch "origin"]
        pull = kernel.org
        readonly

[branch "pu"]
        pull = kernel.org
        readonly
        fast-forward = no

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Nicolas Pitre @ 2006-05-11  1:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin Langhoff, sean, junkio, Johannes.Schindelin, git
In-Reply-To: <Pine.LNX.4.64.0605101629230.3718@g5.osdl.org>

On Wed, 10 May 2006, Linus Torvalds wrote:

> 	[branch "origin"]
> 		remote = git://git.kernel.org/pub/scm/git/git.git
> 		branch master

I totally agree with the principle, but I find the above really 
confusing.  Which "branch" means what?  At least if it was "remote_url" 
and "remote_branch" then there wouldn't be any possibility for 
confusion.


Nicolas

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Martin Langhoff @ 2006-05-11  0:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sean, junkio, Johannes.Schindelin, git
In-Reply-To: <Pine.LNX.4.64.0605101629230.3718@g5.osdl.org>

On 5/11/06, Linus Torvalds <torvalds@osdl.org> wrote:
>         [branch "origin"]
>                 remote = git://git.kernel.org/pub/scm/git/git.git
>                 branch master

This is confusing at first read -- is it branch origin or branch master?

>    Anyway, the point is, I think our current .git/remotes/xyzzy files
>    actually mix two different concepts, and they also end up doing it
>    pretty badly. They _work_, but because of the mix-ups, they aren't all
>    that they could be, and it's fundamentally impossible to make them so,
>    because the mixup really is that "origin" means TWO DIFFERENT THINGS
>    (the local branch, and the remote that it corresponds to)

As you say, this needs to be explained/exposed better to the user.
Now, how about having one .git/config and one .git/branches file?
Different semantics, etc.

> The .git/config file is _easier_ to edit by hand than the remotes. It's
> easier to copy-paste within one file than it is to work with two

Agreed, but I suspect repo config and branches config travel at
different speeds. Maybe what this means is that if this happens, we'll
start seeing a need for ~/.git/config and /etc/git/config to set
defaults (merge.summary=1 for all my repos, core.sharedrepository=1
for all the repos on this server) where I now I just mostly copy
.git/config around.

Does that make sense?

> I don't see why this is hard.

Must be me... it's not the Perl part... I just do a lot of grep |
xargs | sed stuff in my daily git usage ;-)

cheers,


martin

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Linus Torvalds @ 2006-05-11  0:11 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: sean, junkio, Johannes.Schindelin, git
In-Reply-To: <Pine.LNX.4.64.0605101629230.3718@g5.osdl.org>



On Wed, 10 May 2006, Linus Torvalds wrote:
>
>  - having the information in one place. I agree that the multi-file 
>    approach works fine for shell scripts (although I disagree that the new 
>    one would be harder - you just use git-repo-config instead), but I 
>    think it's quite confusing from a new user perspective.

Btw, I seriously believe that git has come to the point where we've licked 
the real technical issues. Stability hasn't been a concern for the last 
year - and even something as seriously as a repacking bug causing a 
SIGSEGV (yesterday) was actually basically designed to not be able to 
cause problems. The repack failed, and nothing happened to the old data. 
It was scary, but it wasn't "bad".

The last performance problem was a stupid one-liner, where one of the 
shell scripts didn't use the "--aggressive" flag for doing the trivial 
three-way merge, so it ended up forking and executing the "merge-one-file" 
shell script for 4500+ files for one unfortunate project that had a 
strange workflow. Adding the "--aggressive" flag took a 5-minute (where 
all of the time was spent in a shell script basically doing nothing) thing 
down to under a second.

So git should kick butt in performance, scale very well, and seems to take 
less disk space than just about anybody else. 

So what do people actually _complain_ about? 

I don't think we've seen a serious complaint lately that hasn't been about 
nice user interface and/or documentation. Anybody? 

So as far as I can tell, the #1 issue is that "new user" experience. You 
can pretty much forget about anything else. Working with git in a 
distributed manner is really easy and efficient, but from the comments 
I've seen, it's not always easy and obvious how to get to that point. 

Creating a remote repository and filling it. And being able to understand 
what the local vs remote branches actually _mean_. And I think our current 
.git/remotes/ thing is a part of that. It's not exactly user _hostile_, 
but it's very much "implementation friendly, and doesn't care about the 
user". So I think .git/config can help us there.

I also think we could do with a few scripts to just do setup of a remote 
repo:

	git remote clone <remoteaddress>
	git remote branch <remoteaddress> [-D]
	git remote fsck <remoteaddress>
	git remote repack <remoteaddress> -a -d

which would all basically boil down to "ssh to the remote address, cd into 
that directory, and do the named git command there" (well, not clone: 
doing a remote clone involves doing a mkdir/git-init-db/git-receive-pack 
remotely, and doing a git-send-pack locally, so some of them would be 
about doing things _both_ locally and remotely).

And documentation.

Now, I don't do documentation, and I really think somebody else could do 
the whole "git remote <cmd>" thing too. It _should_ really be pretty 
trivial. My real point is that almost none of this is about technology, 
and it's much more about trying to put a whole lot of lipstick on this 
pig. We have _got_ the technology already, and I think most people will 
agree git is doing pretty damn well there.

Because I really think the pig is quite charming, just sometimes you see 
some of its boorish sides right now..

(Or should that be "boarish", when we talk about pigs? ;)

		Linus

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Linus Torvalds @ 2006-05-10 23:55 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: sean, junkio, Johannes.Schindelin, git
In-Reply-To: <46a038f90605101617x1aa9bd2du959ead77ebf61795@mail.gmail.com>



On Thu, 11 May 2006, Martin Langhoff wrote:
> 
> Apologies -- I didn't want to know it, but I do wonder what the gain
> behind the change is.

I think we can do better in a few pretty important regards:

 - having the information in one place. I agree that the multi-file 
   approach works fine for shell scripts (although I disagree that the new 
   one would be harder - you just use git-repo-config instead), but I 
   think it's quite confusing from a new user perspective.

   I bet that even without any tools, new users can be told to just open 
   ".git/config", and guess how hard a time they would have to add a new 
   branch, if they already had one that said

	[branch "origin"]
		remote = git://git.kernel.org/pub/scm/git/git.git
		branch master

   which would tell you that the local branch "origin" is really branch
   "master" at that remote git repository.

   Yeah, I'm not sure what the actual config rules would be, but think it 
   would be a hell of a lot more intuitive than what we have now. 

   What we have now _works_. It works really well. No question about that. 
   It's just pretty hard to explain. The above syntax wouldn't even need 
   any explanation. You could just tell people to look into their config 
   files.

 - I think we'll have a much easier time (from a purely technical angle) 
   to add special attributes to the local branches. Add a "read-only" 
   specifier? It's _obvious_:

	[branch "origin"] 
		remote = git://git.kernel.org/pub/scm/git/git.git
		branch master
		readonly

   and it's absolutely trivial to parse. And part of the important thing 
   is that this all makes 100% sense EVEN IF IT'S NOT A REMOTE REPO!

   So imagine that it's a purely local branch, but you want to protect it. 
   Solution?

	[branch "July Snapshot"]
		readonly

   and you're done. In contrast, even if you ended up just extending the 
   file format for the .git/remotes/July\ Snapshot file, and just added a 
   "readonly" line to it, it wouldn't make _sense_. Whaa? "remotes"? In 
   contrast, in the .git/config file, it makes a ton of sense, and in fact 
   it's totally obvious.

   (Actually, we should probably have the .git/config file syntax separate 
   local branches like "master" from remote branches like "origin", so it 
   might be more like

	[remote "origin"]
		url = git://git.kernel.org/pub/scm/git/git.git

    which just tells that the _word_ "origin" corresponds to a 
    shorthand for a particular remote repository

	[branch "origin"]
		remote = origin
		branch = master

   or something to show that your _local_ branch named "origin" 
   corresponds to a particular remote (which could be a shorthand like the 
   above, or just spelled out), and a particular branch _at_ that remote 
   repository)

   Anyway, the point is, I think our current .git/remotes/xyzzy files 
   actually mix two different concepts, and they also end up doing it 
   pretty badly. They _work_, but because of the mix-ups, they aren't all 
   that they could be, and it's fundamentally impossible to make them so, 
   because the mixup really is that "origin" means TWO DIFFERENT THINGS 
   (the local branch, and the remote that it corresponds to)

 - Finally, I think it opens the possibility for some other things. For 
   example, once you accept that different branches might want attributes 
   like "readonly", you realize that some other attributes also make 
   sense. Like adding the default pull source per local branch, etc.

Again, I'm not saying that we can't work with the .git/remotes/ files. But 
I think it gets increasingly ugly, and the confusion gets increasingly 
worse.

> But it is a bit of a loss for perl/shell porcelains, and for users
> that abuse the contents of .git directly on a regular basis...

I really disagree. 

The .git/config file is _easier_ to edit by hand than the remotes. It's 
easier to copy-paste within one file than it is to work with two different 
files (and let's face it, copy-paste is usually what at least I would do 
for something like this). And it's _easier_ to just always open one file, 
and search within that one, than try to remember what file it was.

Now, C programs can very easily use the config library, and shell programs 
can equally easily query the variables with "git repo-config". I really 
doubt it's very hard for perl either, but I'm not a perl person, so maybe 
I don't see why this is hard.

		Linus

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Martin Langhoff @ 2006-05-10 23:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: sean, junkio, Johannes.Schindelin, git
In-Reply-To: <Pine.LNX.4.64.0605100823350.3718@g5.osdl.org>

On 5/11/06, Linus Torvalds <torvalds@osdl.org> wrote:
> Sure. It clearly _is_ a bike shed, which is why I posted patches: I think
> the way to resolve bike sheds is to "just do it". In the kernel, the

there's no disputing that!

> So don't knock the bike sheds. It's a BSD term, and I think there's a
> _reason_ why it's a BSD term. Those people seem to sometimes like to

Apologies -- I didn't want to know it, but I do wonder what the gain
behind the change is. It seems to me that it would be a bad idea to
store refs in the config file and, in my mind at least, branch info is
closer to refs.

> > As an end-user, I have personally stayed away from the increasingly
> > complex scheme for remotes waiting for it to settle, and stuck with
> > the old-styled .git/branches stuff which is slam-dunk simple and it
> > just works.
>
> It does work, and I think it's nice in many ways. It was certainly a good
> way to prototype things.
>
> At the same time, especially with moving things more to C (or almost any
> other language, for that matter - shell is really pretty special in making
> it _easier_ to have things in individual files), it's in many ways nicer
> to have a more structured representation that has a nice fixed interface
> and a library and known access methods behind it.

But it is a bit of a loss for perl/shell porcelains, and for users
that abuse the contents of .git directly on a regular basis...

there's nothing to stop us from having a structured representation of
what's in .git/branches/

> And I'm personally actually pretty fed up with the .git/branches/ and
> .git/remotes/ thing, partly because I can never remember which file is
> which. I had to look at the code of git-parse-remote.sh to remind me

Agreed, it's a mess, but I suspect it's still there to support cogito.
In keeping with the 'talk code' ethos, I'll work on adding support for
.git/remotes in cogito.

> And if we truly have separate files, we should go all the way, and have
> the good old "one file, one value" rule.

What about one semantics, one file? So far we have had 3 semantics:
git config, remotes, refs. And git config has been mostly
project-indepentent. In fact, I have been copying it across my
checkouts of different, unrelated projects. You just don't do that
with remotes or refs.

cheers,


martin

^ permalink raw reply

* common URL for repository and gitweb
From: Martin Waitz @ 2006-05-10 23:00 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1114 bytes --]

hoi :)

I hacked a little bit in gitweb so that it can get the project path
form the URI without using a ?p= parameter.  That is, you can now
use "http://.../cgi-bin/gitweb.cgi/path/to/project/" to show
the summary of your project.

Together with a apache configuration like below, you can give your
gitweb pages the same URL as your repositories:

	<VirtualHost www:80>
		ServerName git.hostname.org
		DocumentRoot /pub/git
		RewriteEngine on
		RewriteRule ^/(.*\.git/(.*\.html)?)?$ /usr/lib/cgi-bin/gitweb.cgi%{REQUEST_URI}  [L]
	</VirtualHost>

This will rewrite all URLs that go to gitweb to use the CGI, while
leaving URLs for the repository intact.

You can see an example at http://git.admingilde.org/.
The gitweb version used for that is available here, too.

As an added bonus, gitweb can now serve the "html" branch of a
repository directly using "text/html", so you can show your
documentation without needing to update a checked out version
of this branch.
For example have a look at the GIT manpages at
http://git.admingilde.org/tali/git.git/git.html

-- 
Martin Waitz

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Nicolas Pitre @ 2006-05-10 19:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605101515420.24505@localhost.localdomain>


And of course s/robin/rabin/ (I can't type RABIN without having my 
fingers decide on ROBIN by themselves).

On Wed, 10 May 2006, Nicolas Pitre wrote:

> On Wed, 10 May 2006, Linus Torvalds wrote:
> 
> > 
> > 
> > Btw, Nico, that rabin hash code is _extremely_ confusing.
> 
> Possible.
> 
> > The hash entry pointers point to "data + RABIN_WINDOW", and then to make 
> > things even _more_ confusing, the hash calculation code is actually offset 
> > by one, so it will have computed the hash with
> > 
> > 	val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];
> > 
> > where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1.
> > 
> > So, if I read that correctly, the "entry->ptr" actually points not to the 
> > beginning of the data that was hashed, or even the end, but literally to 
> > the last byte of the data that was hashed in that window.
> > 
> > Isn't that just _really_ confusing?
> > 
> > Or is there some sense to this?
> 
> Yes, ptr points to the last byte of the window for given hash value.
> 
> This is so because in the matching loop the window is scrolled byte by 
> byte and the new hash value is always made of ROBIN_WINDOW-1 bytes which 
> already have been processed (most probably added as literal bytes in the 
> delta buffer) plus one new byte.  So it makes sense to start forward 
> byte matching only from that last byte to find the longest source area 
> to match, especially since all the other bytes in the window are likely 
> to be identical already and comparing them repeatedly for entries with 
> the same hash would be wasteful in most cases.
> 
> Further down, once the best offset with the longest match in the source 
> buffer has been found, backward matching is performed to remove as much 
> literal bytes that were added to the delta output as possible, which is 
> most likely to catch the whole Robin window that hasn't been compared 
> previously, but in that case the window content is compared only once.
> 
> And the reason why the reference hash is computed with an offset of 1 to 
> RABIN_WINDOW inclusive in create_delta_index() is to allow for quick 
> initialization of the Rabin window _outside_ of the main loop in 
> create_delta().  There is a comment to that effect near the top of 
> create_delta_index but probably a small reminder should be added in the 
> index loop as well.
> 
> 
> Nicolas
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


Nicolas

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Nicolas Pitre @ 2006-05-10 19:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605101159160.3718@g5.osdl.org>

On Wed, 10 May 2006, Linus Torvalds wrote:

> 
> 
> Btw, Nico, that rabin hash code is _extremely_ confusing.

Possible.

> The hash entry pointers point to "data + RABIN_WINDOW", and then to make 
> things even _more_ confusing, the hash calculation code is actually offset 
> by one, so it will have computed the hash with
> 
> 	val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];
> 
> where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1.
> 
> So, if I read that correctly, the "entry->ptr" actually points not to the 
> beginning of the data that was hashed, or even the end, but literally to 
> the last byte of the data that was hashed in that window.
> 
> Isn't that just _really_ confusing?
> 
> Or is there some sense to this?

Yes, ptr points to the last byte of the window for given hash value.

This is so because in the matching loop the window is scrolled byte by 
byte and the new hash value is always made of ROBIN_WINDOW-1 bytes which 
already have been processed (most probably added as literal bytes in the 
delta buffer) plus one new byte.  So it makes sense to start forward 
byte matching only from that last byte to find the longest source area 
to match, especially since all the other bytes in the window are likely 
to be identical already and comparing them repeatedly for entries with 
the same hash would be wasteful in most cases.

Further down, once the best offset with the longest match in the source 
buffer has been found, backward matching is performed to remove as much 
literal bytes that were added to the delta output as possible, which is 
most likely to catch the whole Robin window that hasn't been compared 
previously, but in that case the window content is compared only once.

And the reason why the reference hash is computed with an offset of 1 to 
RABIN_WINDOW inclusive in create_delta_index() is to allow for quick 
initialization of the Rabin window _outside_ of the main loop in 
create_delta().  There is a comment to that effect near the top of 
create_delta_index but probably a small reminder should be added in the 
index loop as well.


Nicolas

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Linus Torvalds @ 2006-05-10 19:01 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605101311020.24505@localhost.localdomain>



Btw, Nico, that rabin hash code is _extremely_ confusing.

The hash entry pointers point to "data + RABIN_WINDOW", and then to make 
things even _more_ confusing, the hash calculation code is actually offset 
by one, so it will have computed the hash with

	val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];

where "i" goes from _1_ to RABIN_WINDOW instead of 0..WINDOW-1.

So, if I read that correctly, the "entry->ptr" actually points not to the 
beginning of the data that was hashed, or even the end, but literally to 
the last byte of the data that was hashed in that window.

Isn't that just _really_ confusing?

Or is there some sense to this?

			Linus

^ permalink raw reply

* [Patch] git-cvsimport: tiny fix
From: Elrond @ 2006-05-10 17:37 UTC (permalink / raw)
  To: git

git-cvsimport: Handle "Removed" from pserver

Sometimes the pserver says "Removed" instead of
"Remove-entry".

Signed-off-by: Elrond <elrond+kernel.org@samba-tng.org>
---
Hi,

At least the above happened to me on a repository I tried
to convert.
Without the patch, it just die("Unknown: Removed ...")s.


    Elrond

--- git-cvsimport.perl
+++ git-cvsimport.perl
@@ -350,7 +350,7 @@ sub _line {
 				return $res;
 			} elsif($line =~ s/^E //) {
 				# print STDERR "S: $line\n";
-			} elsif($line =~ /^Remove-entry /i) {
+			} elsif($line =~ /^(Remove-entry|Removed) /i) {
 				$line = $self->readline(); # filename
 				$line = $self->readline(); # OK
 				chomp $line;

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Nicolas Pitre @ 2006-05-10 17:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605100953090.3718@g5.osdl.org>

On Wed, 10 May 2006, Linus Torvalds wrote:

> On Wed, 10 May 2006, Nicolas Pitre wrote:
> >
> > It cannot be assumed that the given buffer will never be moved when 
> > shrinking the allocated memory size with realloc().  So let's ignore 
> > that optimization for now.
> 
> Yeah, that was totally bogus. It would re-allocate _part_ of an 
> allocation, but that allocation contained not just the index, but all the 
> hashes and the hash entries too!

Yep.

> This has nothing to do with moving a buffer - it has everything to do with 
> the fact that you shrunk a buffer that still contained all the other 
> buffers: you shrunk the "mem" allocation to fit just the first part of it, 
> and entirely ignored the "hash" and "entry" parts of it.

No.

The initial allocation assumes a perfectly even distribution of entries 
in which case the entry array would be all populated.

When there are repeated bytes then consecutive blocks will have the same 
hash, and in that case keeping only the first one is useful.

Which means that the entry pointer won't get to the end of the allocated 
space for all entries and I naively assumed that using realloc would 
only mark the allocated memory as smaller than it originally was without 
actually copying any of the remaining data, which is what happened in 
most cases but evidently not always.

But if the buffer moves the hash array containing _pointers_ to hash 
entries gets totally screwed.

> Btw, I think that whole "allocate everything in one allocation" thing is 
> potentially bogus even the way it is now, if the alignment constraints of 
> "index", "hash" and "entry" are different.

Yeah...  I might just do a separate allocation for hash entries as well.

Or maybe even drop the hash chaining altogether (now that the code does 
backward matching that might be worth the code simplification).


Nicolas

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Linus Torvalds @ 2006-05-10 17:18 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605100953090.3718@g5.osdl.org>



On Wed, 10 May 2006, Linus Torvalds wrote:
> 
> Yeah, that was totally bogus. It would re-allocate _part_ of an 
> allocation, but that allocation contained not just the index, but all the 
> hashes and the hash entries too!

Actually, no, you got that part right. Mea culpa. It really only ended up 
being a problem when the area was moved, since the pointers into that area 
weren't updated.

The alignment issue is real, but looking at the different structures, 
they'll all have pointers that end up being the real (and only) alignment 
constraint, so as a result, on any reasonable machine they all have the 
same alignment and things are fine.

Junio - pls apply Nico's patch asap. It's correct, and fixes a really 
nasty problem. And I'll withdraw my other worries as "not consequential".

		Linus

^ permalink raw reply

* Re: [PATCH] fix diff-delta bad memory access
From: Linus Torvalds @ 2006-05-10 17:00 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git, Randal L. Schwartz, Alex Riesen
In-Reply-To: <Pine.LNX.4.64.0605101216360.24505@localhost.localdomain>



On Wed, 10 May 2006, Nicolas Pitre wrote:
>
> It cannot be assumed that the given buffer will never be moved when 
> shrinking the allocated memory size with realloc().  So let's ignore 
> that optimization for now.

Yeah, that was totally bogus. It would re-allocate _part_ of an 
allocation, but that allocation contained not just the index, but all the 
hashes and the hash entries too!

This has nothing to do with moving a buffer - it has everything to do with 
the fact that you shrunk a buffer that still contained all the other 
buffers: you shrunk the "mem" allocation to fit just the first part of it, 
and entirely ignored the "hash" and "entry" parts of it.

Btw, I think that whole "allocate everything in one allocation" thing is 
potentially bogus even the way it is now, if the alignment constraints of 
"index", "hash" and "entry" are different.

When you do

	..
	index = mem;
	mem = index + 1;
	hash = mem;
	mem = hash + hsize;
	entry = mem;
	..

it's perfectly fine for "index", but "hash" and "entry" end up having 
alignments that depend on the size/alignment of "index" (and for "entry" 
on "hash").

So if their alignment requirements are different, you're basically 
screwed.

It may work in practice (maybe they all align on pointer boundaries), but 
it's damn scary. You should re-consider, or at least make that code be a 
lot safer (like actually taking alignment into consideration, both for 
total size and for the offset calculations).

That could be done by using unions or something.

			Linus

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox