Git development
 help / color / mirror / Atom feed
* Re: Unresolved issues #2 (shallow clone again)
From: Linus Torvalds @ 2006-05-08 15:32 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20060508042429.GA20249@coredump.intra.peff.net>



On Mon, 8 May 2006, Jeff King wrote:
>
> On Sun, May 07, 2006 at 08:27:02AM -0700, Linus Torvalds wrote:
> 
> > factor for a lot of things for many "common" filesystem setups. You 
> > probably didn't even account for the size of inodes in your "du" setup.
> 
> My numbers came from git-count-objects, which uses the st_blocks sum for
> all objects. The actual du numbers showing space wasted by block
> boundaries are:
>   du -c ??: 1429216
>   du -c --apparent-size ??: 792277
> So it's about 45% wasted space.

And that's actually ignoring inode sizes and directory sizes (well, it 
doesn't "ignore" directory sizes - it counts them - but if you compare it 
to a straight packed format, it's still overhead).

Anyway, looks like it's about 2:1, not 3:1 like I claimed, but the point 
being that blocking factors tend to be at least on the same order of 
magnitude as just plain compression (which also tends to be in the 2:1 
area for normal, fairly easily compressible, stuff).

The delta-packing obviously is much bigger for any project with real 
history. In traditional setups (where you always delta-pack within one 
thing, ie at the level of individual SCCS/RCS files), the delta packing 
obviously _also_ avoids blocking issues, since it means that a thousand 
revisions of the same file will all share the same inode.

So because git uses a whole-file model, it obviously makes the blocking 
issues with its unpacked format _much_ higher than for any traditional 
medium - no conglomeration of different versions of the file in the same 
filesystem object. On the other hand, the packed format also tends to be 
even _more_ efficient than a traditional one, so the end result of it all 
is apparently a pretty big net win even in space consumption).

Side note: I realize that some people think the packs are ugly and 
strange. They aren't linear versions of a file, and instead appear as a 
fairly random "jumble". And they can't be incrementally re-packed, and you 
have to generate a whole new pack-file (which can be incremental in 
_content_, of course). So people think they are ugly.

I'd argue that they are beautiful. They are beautiful because they _don't_ 
contain history in themselves (the objects they contain encode the history 
of course, but the pack-file itself does not).

And they are beautiful because we can use the exact same format for 
streaming data over the network as for the database itself (that, of 
course, was just about _the_ design consideration). Show me another system 
that has exactly the same (not "similar", not "same concepts": _same_) 
network protocol as it internal database.

And they are beautiful exactly because their lack of any internal 
structure allows you to pack things by criteria _you_ care about, ie the 
whole "sort things by recency" thing, so that commonly accessed data can 
be packed at the head of the pack-file - exactly because the pack-file 
doesn't have any internal structure of its own that you need to worry 
about and that constrains your sorting.

The same thing is what allows you to delta any blob against any other 
blob - without worrying about history or other random pack-file rules. You 
can do packign purely by how well you want to pack, not by any secondary 
constraints.

And the "no incremental updates" may sound like a huge downside, but it's 
all the same basic git logic: objects and filesystem contents are 
immutable, and that allows us to avoid a lot of locking overhead. Locking 
is _hard_. Locking is _inefficient_. And locking really really screws you 
when you miss it.

So I'll happily say that pack-files are strange, and that you have to get 
a bit used to the notion that they should be repacked "asynchronously". 
But it's really a matter of "getting used to it", because once you do, 
you'll see that it's actually an absolutely huge deal, and you'll learn to 
love the bomb^H^H^H^Hpack-file.

			Linus "pack-files rule" Torvalds

^ permalink raw reply

* [PATCH] improve base85 generated assembly code
From: Nicolas Pitre @ 2006-05-08 15:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

This code is arguably pretty hot, if you use binary patches of course. 
This patch helps gcc generate both smaller and faster code especially in 
the error free path.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

diff --git a/base85.c b/base85.c
index b97f7f9..a9e97f8 100644
--- a/base85.c
+++ b/base85.c
@@ -44,34 +44,38 @@ int decode_85(char *dst, char *buffer, i
 	say2("decode 85 <%.*s>", len/4*5, buffer);
 	while (len) {
 		unsigned acc = 0;
-		int cnt;
-		for (cnt = 0; cnt < 5; cnt++, buffer++) {
-			int ch = *((unsigned char *)buffer);
-			int de = de85[ch];
-			if (!de)
+		int de, cnt = 4;
+		unsigned char ch;
+		do {
+			ch = *buffer++;
+			de = de85[ch];
+			if (--de < 0)
 				return error("invalid base85 alphabet %c", ch);
-			de--;
-			if (cnt == 4) {
-				/*
-				 * Detect overflow.  The largest
-				 * 5-letter possible is "|NsC0" to
-				 * encode 0xffffffff, and "|NsC" gives
-				 * 0x03030303 at this point (i.e.
-				 * 0xffffffff = 0x03030303 * 85).
-				 */
-				if (0x03030303 < acc ||
-				    (0x03030303 == acc && de))
-					error("invalid base85 sequence %.5s",
-					      buffer-3);
-			}
 			acc = acc * 85 + de;
-			say1(" <%08x>", acc);
-		}
+		} while (--cnt);
+		ch = *buffer++;
+		de = de85[ch];
+		if (--de < 0)
+			return error("invalid base85 alphabet %c", ch);
+		/*
+		 * Detect overflow.  The largest
+		 * 5-letter possible is "|NsC0" to
+		 * encode 0xffffffff, and "|NsC" gives
+		 * 0x03030303 at this point (i.e.
+		 * 0xffffffff = 0x03030303 * 85).
+		 */
+		if (0x03030303 < acc ||
+		    0xffffffff - de < (acc *= 85))
+			error("invalid base85 sequence %.5s", buffer-5);
+		acc += de;
 		say1(" %08x", acc);
-		for (cnt = 0; cnt < 4 && len; cnt++, len--) {
-			*dst++ = (acc >> 24) & 0xff;
-			acc = acc << 8;
-		}
+
+		cnt = (len < 4) ? len : 4;
+		len -= cnt;
+		do {
+			acc = (acc << 8) | (acc >> 24);
+			*dst++ = acc;
+		} while (--cnt);
 	}
 	say("\n");
 
@@ -86,15 +90,17 @@ void encode_85(char *buf, unsigned char 
 	while (bytes) {
 		unsigned acc = 0;
 		int cnt;
-		for (cnt = 0; cnt < 4 && bytes; cnt++, bytes--) {
+		for (cnt = 24; cnt >= 0; cnt -= 8) {
 			int ch = *data++;
-			acc |= ch << ((3-cnt)*8);
+			acc |= ch << cnt;
+			if (--bytes == 0)
+				break;
 		}
 		say1(" %08x", acc);
-		for (cnt = 0; cnt < 5; cnt++) {
+		for (cnt = 4; cnt >= 0; cnt--) {
 			int val = acc % 85;
 			acc /= 85;
-			buf[4-cnt] = en85[val];
+			buf[cnt] = en85[val];
 		}
 		buf += 5;
 	}

^ permalink raw reply related

* Re: Implementing branch attributes in git config
From: Pavel Roskin @ 2006-05-08 15:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vody8howu.fsf@assigned-by-dhcp.cox.net>

On Mon, 2006-05-08 at 02:00 -0700, Junio C Hamano wrote:
> Stating what you do not like about something is a good first
> step to improve that something.  It should not be too hard to
> extend the parser to grok:
> 
> 	repo-config --get branchdata.description '\(.*\) for netdev$'
> 
> and when the value_regex has a capture return what matches
> instead of the entire value.  I think that would do what you
> want.

OK, that would be acceptable.

I still don't like the "for" conversion because it masquerades syntax
(note that text to right of "for" must be unique) as plain text, but
it's a matter of taste, so it's hard for me to argue about it.

-- 
Regards,
Pavel Roskin

^ permalink raw reply

* Re: [patch] munmap-before-rename, cygwin need
From: Yakov Lerner @ 2006-05-08 14:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslnlk04v.fsf@assigned-by-dhcp.cox.net>

[-- Attachment #1: Type: text/plain, Size: 3308 bytes --]

On 5/7/06, Junio C Hamano <junkio@cox.net> wrote:
> "Yakov Lerner" <iler.ml@gmail.com> writes:
> > I found that mmap() works on cygwin, but needs a patch.
> > On Cygwin, rename() fails if target file has active mmap().
> > The patch below adds  munmap() before rename().
> This is interesting in three counts.
>
>  - I from time to time test Cygwin version on my day-job machine
>    (W2K) and my wife's machine (XP); on both machines I usually
>    have less than two weeks old Cygwin installation, and I have
>    not seen the breakage.  I wonder how reproducible this is.
>    Also previously people reported mmap() works for some and
>    fake mmap is needed for others.  Would this patch make things
>    work for everybody?
>
>  - The part you patched is commit_index_file().  This typically
>    is called just before program exit, but some callers, like
>    apply.c, may want to still look at the index after calling
>    it, fully aware that the changes after commit_index will not
>    be written out.  Although I haven't traced the codepath fully
>    in apply.c yet, unmapping would break the access to the index
>    (i.e. active_cache[]).  Does apply still work with your
>    patch?

You are right. Apply did not work when I gave it more than one patchfile on
commandline (and --index option). I fixed this by zeroing active_nr and freeing
active_cache in unmap_cache(). Then I got infinite loop in
remove_lock_file (after multiple calls to hold_index_file_for_update
with same cf, cache_file_list points to cf and cf->next points to
cf creating infinite loop.) The fix in index.c is easy.

The patch below works for me. However, it changes internal
working of apply.c in the scenario 'git-apply --index patch1 patch2 ...'.

(1) With the patch below, apply.c repeats mmap() on index after every patch
argument (because index gets unmapped after every patchfile argument).

(2) Current apply.c does single mmap() at the beginning. It modfies index
on disk and cache in memory and it does not repeat mmap. This mmap
is to original (now deleted) index, if i understand correctly (the
no-name inode).

But (2) this does not work in cygwin. The end results of (1) and
(2) are the same, I think. (2) looks to me bit faster (I didn't do
measurements).

It's up to you whether to make it under #ifdef MUNMAP_BEFORE_RENAME
of not. The changes are now bigger that in original patch.

The fix to index.c prevents circular list. I got infinite loop in
cache_file_list
every time I tried more than 1 patch on commandline of git-apply. I tried
other solution with the function below, but what I put in the atached patch
is shorter than the alternative below.

Yakov

P.S. I am attaching not inlining bacause
gmail totally removes leading tabs from inline text.

P.P.S.
Alternate fix for index.c:

static void clean_cache_file_list(struct cache_file *cf)
{
        struct cache_file *ppcf = &cache_file_list;
        cf->lockfile[0] = 0;
        while( *ppcf ) {
            if(*ppcf == cf ) {
                *ppcf = cf->next;
            } else
                ppcf = &(cf->next);
        }
}
-         cf->lockfile[0] = 0;
+        clean_cache_file_list(cf);
-         cf->lockfile[0] = 0;
+        clean_cache_file_list(cf);

[-- Attachment #2: patch2-unmap-before-rename --]
[-- Type: application/octet-stream, Size: 2712 bytes --]

--- cache.h.000	2006-05-08 16:59:09.000000000 +0000
+++ cache.h	2006-05-08 16:42:15.000000000 +0000
@@ -141,6 +141,7 @@
 
 /* Initialize and use the cache information */
 extern int read_cache(void);
+extern void unmap_cache(void);
 extern int write_cache(int newfd, struct cache_entry **cache, int entries);
 extern int cache_name_pos(const char *name, int namelen);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
@@ -161,6 +162,7 @@
 struct cache_file {
 	struct cache_file *next;
 	char lockfile[PATH_MAX];
+	int enlisted;
 };
 extern int hold_index_file_for_update(struct cache_file *, const char *path);
 extern int commit_index_file(struct cache_file *);
--- index.c.000	2006-05-08 16:19:22.000000000 +0000
+++ index.c	2006-05-08 16:57:14.000000000 +0000
@@ -5,6 +5,7 @@
 #include "cache.h"
 
 static struct cache_file *cache_file_list;
+static int cleanup_installed = 0;
 
 static void remove_lock_file(void)
 {
@@ -27,11 +28,17 @@
 	int fd;
 	sprintf(cf->lockfile, "%s.lock", path);
 	fd = open(cf->lockfile, O_RDWR | O_CREAT | O_EXCL, 0666);
-	if (fd >=0 && !cf->next) {
-		cf->next = cache_file_list;
-		cache_file_list = cf;
-		signal(SIGINT, remove_lock_file_on_signal);
-		atexit(remove_lock_file);
+	if (fd >=0) {
+		if (!cf->enlisted) {
+			cf->next = cache_file_list;
+			cache_file_list = cf;
+			cf->enlisted = 1;
+		}
+		if (!cleanup_installed) {
+			signal(SIGINT, remove_lock_file_on_signal);
+			atexit(remove_lock_file);
+			cleanup_installed = 1;
+		}
 	}
 	return fd;
 }
@@ -43,6 +50,9 @@
 	strcpy(indexfile, cf->lockfile);
 	i = strlen(indexfile) - 5; /* .lock */
 	indexfile[i] = 0;
+//#ifdef MUMNAP_BEFORE_RENAME
+	unmap_cache();
+//#endif
 	i = rename(cf->lockfile, indexfile);
 	cf->lockfile[0] = 0;
 	return i;
--- read-cache.c.000	2006-05-08 17:01:28.000000000 +0000
+++ read-cache.c	2006-05-08 17:02:24.000000000 +0000
@@ -513,6 +513,9 @@
 	return 0;
 }
 
+static void *mapaddr = MAP_FAILED;
+static unsigned long mapsize;
+
 int read_cache(void)
 {
 	int fd, i;
@@ -541,6 +544,8 @@
 		errno = EINVAL;
 		if (size >= sizeof(struct cache_header) + 20)
 			map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
+		mapaddr = map;
+		mapsize = size;
 	}
 	close(fd);
 	if (map == MAP_FAILED)
@@ -565,10 +570,23 @@
 
 unmap:
 	munmap(map, size);
+	mapaddr = MAP_FAILED;
 	errno = EINVAL;
 	die("index file corrupt");
 }
 
+void unmap_cache(void)
+{
+	if ( mapaddr != MAP_FAILED ) {
+		munmap(mapaddr, mapsize);
+		mapaddr = MAP_FAILED;
+	}
+	active_cache_changed = 0;
+	active_nr = 0;
+	free(active_cache);
+	active_cache = NULL;
+}
+
 #define WRITE_BUFFER_SIZE 8192
 static unsigned char write_buffer[WRITE_BUFFER_SIZE];
 static unsigned long write_buffer_len;








^ permalink raw reply

* Re: stgit bug: pulling file deletion patch leaves copy of file
From: Catalin Marinas @ 2006-05-08 14:26 UTC (permalink / raw)
  To: Karl Hasselström; +Cc: git
In-Reply-To: <20060508124127.GA30662@diana.vm.bytemark.co.uk>

On 08/05/06, Karl Hasselström <kha@treskal.com> wrote:
> When I pull (with -m) a branch that has accepted a patch of mine which
> deletes a file, stgit does the right thing wrt detecting the applied
> patch, etc. But it leaves an untracked copy of the deleted file in my
> working directory, which I get to delete.

Indeed, it should but it doesn't. The file is generated while
reverse-applying the patches but the git.reset() function (which calls
git.checkout()) doesn't remove the file. I'll fix it and push the
changes tonight.

Thanks.

--
Catalin

^ permalink raw reply

* stgit bug: pulling file deletion patch leaves copy of file
From: Karl Hasselström @ 2006-05-08 12:41 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: git

When I pull (with -m) a branch that has accepted a patch of mine which
deletes a file, stgit does the right thing wrt detecting the applied
patch, etc. But it leaves an untracked copy of the deleted file in my
working directory, which I get to delete.

I suspect this file should be deleted -- but isn't -- by the patch
merge detection code that undoes the application of the reverse of my
patch.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Johannes Schindelin @ 2006-05-08 12:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Pavel Roskin, git
In-Reply-To: <7vody8howu.fsf@assigned-by-dhcp.cox.net>

Hi,

On Mon, 8 May 2006, Junio C Hamano wrote:

> 	repo-config --get branchdata.description '\(.*\) for netdev$'

POSIX regexps do not want the backslashes...

Something like this?

---
[PATCH] repo-config: allow one group in value regexp

The regular expression for the value can now contain one group. In case
there is one, the output will be just that group, not the whole value.
Now you can say

        git-repo-config --get branch.defaultremote '(.*) for junio'

and for a config like this:

        [branch]
                defaultRemote = sushi for junio

the output will be "sushi".

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>

---

 repo-config.c |   30 ++++++++++++++++++++----------
 1 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/repo-config.c b/repo-config.c
index 63eda1b..9ac4c51 100644
--- a/repo-config.c
+++ b/repo-config.c
@@ -26,31 +26,41 @@ static int show_all_config(const char *k
 static int show_config(const char* key_, const char* value_)
 {
 	char value[256];
-	const char *vptr = value;
+	const char *vptr = value_;
 	int dup_error = 0;
 
 	if (value_ == NULL)
-		value_ = "";
+		vptr = value_ = "";
 
 	if (!use_key_regexp && strcmp(key_, key))
 		return 0;
 	if (use_key_regexp && regexec(key_regexp, key_, 0, NULL, 0))
 		return 0;
-	if (regexp != NULL &&
-			 (do_not_match ^
-			  regexec(regexp, value_, 0, NULL, 0)))
-		return 0;
+	if (regexp != NULL) {
+		regmatch_t matches[2];
+		int len;
+
+		if (do_not_match ^ regexec(regexp, value_, 2, matches, 0))
+			return 0;
+		len = matches[1].rm_eo - matches[1].rm_so;
+		if (!do_not_match && len > 0) {
+			if (len > 255)
+				len = 255;
+			strncpy(value, value_ + matches[1].rm_so, len);
+			value[len] = 0;
+			vptr = value;
+		}
+	}
 
 	if (show_keys)
 		printf("%s ", key_);
 	if (seen && !do_all)
 		dup_error = 1;
-	if (type == T_INT)
+	if (type == T_INT) {
 		sprintf(value, "%d", git_config_int(key_, value_));
-	else if (type == T_BOOL)
+		vptr = value;
+	} else if (type == T_BOOL)
 		vptr = git_config_bool(key_, value_) ? "true" : "false";
-	else
-		vptr = value_;
 	seen++;
 	if (dup_error) {
 		error("More than one value for the key %s: %s",
-- 
1.3.1.g297e8-dirty

^ permalink raw reply related

* Re: [PATCH] Release config lock if the regex is invalid
From: Johannes Schindelin @ 2006-05-08 11:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v7j4xi6lo.fsf@assigned-by-dhcp.cox.net>

Hi,

On Sun, 7 May 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > This is not enough. There are quite a few exit paths. Notice the "goto 
> > out_free"? That is where this must go.
> >
> > This patch is totally untested but obviously correct:
> 
> except that many places you already close(fd) and
> unlink(lock_file).

Yes, my bad. I forgot to look for (and remove) them. Your patch looks fine 
to me.

Ciao,
Dscho

^ permalink raw reply

* Re: [patch] munmap-before-rename, cygwin need
From: Yakov Lerner @ 2006-05-08 10:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vslnlk04v.fsf@assigned-by-dhcp.cox.net>

On 5/8/06, Junio C Hamano <junkio@cox.net> wrote:
> "Yakov Lerner" <iler.ml@gmail.com> writes:
>
> > I found that mmap() works on cygwin, but needs a patch.
> > On Cygwin, rename() fails if target file has active mmap().
> > The patch below adds  munmap() before rename().
>
> This is interesting in three counts.
>
>  - I from time to time test Cygwin version on my day-job machine
>   (W2K) and my wife's machine (XP); on both machines I usually
>   have less than two weeks old Cygwin installation, and I have
>   not seen the breakage.  I wonder how reproducible this is.
>   Also previously people reported mmap() works for some and
>   fake mmap is needed for others.  Would this patch make things
>   work for everybody?
>
>  - The part you patched is commit_index_file().  This typically
>   is called just before program exit, but some callers, like
>   apply.c, may want to still look at the index after calling
>   it, fully aware that the changes after commit_index will not
>   be written out.  Although I haven't traced the codepath fully
>   in apply.c yet, unmapping would break the access to the index
>   (i.e. active_cache[]).  Does apply still work with your
>   patch?

I am checking this. I am trying different options and different
scenarios.

Yakov

^ permalink raw reply

* Re: Locking a branch
From: Catalin Marinas @ 2006-05-08 10:45 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: git
In-Reply-To: <1146970243.24434.77.camel@dv>

Pavel Roskin <proski@gnu.org> wrote:
> I see there is an active discussion about branch attributes.  It
> would be nice to have an attribute to prevent git from changing the
> branch head in any way.  The reason is that it interferes with StGIT
> on StGIT managed branches.  If StGIT is fine with the change, it
> would remove or override the lock temporarily.  StGIT could also
> unlock the branch permanently if there are no applied patches.

I'm fine with this. At the moment, StGIT does extra checking before
some of the commands to ensure that HEAD is the same as the top of the
stack (i.e. no commits were generated outside stg commands). While
I'll probably still keep this check, the addition would be useful for
people mixing GIT and StGIT commands (I personally try to stay with
StGIT commands as much as possible).

-- 
Catalin

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Junio C Hamano @ 2006-05-08  9:00 UTC (permalink / raw)
  To: Pavel Roskin; +Cc: git
In-Reply-To: <1147053329.17371.52.camel@dv>

Pavel Roskin <proski@gnu.org> writes:

> On Mon, 2006-05-08 at 03:27 +0200, Johannes Schindelin wrote:
>> > Now, how can I get a description for the "netdev" branch by one
>> > git-repo-config command, without pipes?
>> 
>> 	git-repo-config --get branchdata.description ' for netdev$'
>
> No, it doesn't remove "for netdev".  What I really don't like is that
> git-repo-config treats it as "not my problem".

Stating what you do not like about something is a good first
step to improve that something.  It should not be too hard to
extend the parser to grok:

	repo-config --get branchdata.description '\(.*\) for netdev$'

and when the value_regex has a capture return what matches
instead of the entire value.  I think that would do what you
want.

^ permalink raw reply

* Re: Unresolved issues #2 (shallow clone again)
From: Jeff King @ 2006-05-08  4:24 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0605070802590.16343@g5.osdl.org>

On Sun, May 07, 2006 at 08:27:02AM -0700, Linus Torvalds wrote:

> factor for a lot of things for many "common" filesystem setups. You 
> probably didn't even account for the size of inodes in your "du" setup.

My numbers came from git-count-objects, which uses the st_blocks sum for
all objects. The actual du numbers showing space wasted by block
boundaries are:
  du -c ??: 1429216
  du -c --apparent-size ??: 792277
So it's about 45% wasted space.

On Sun, May 07, 2006 at 08:33:38PM -0400, Theodore Tso wrote:

> If there are 233338 objects, then the average wasted space due to
> internal fragmentation is 233338 * 2k, or 466676 kilobytes, or only
> 36% of the wasted space.  Most of the savings is probably coming from
> the compression and delta packing.

As Linus indicated, that assumes a uniform distribution of file sizes
(and my numbers above show that it is, in fact, somewhat higher). FYI,
the mean and median of usage of the final 4K block in the linux-2.6
repository are 1309 and 912 bytes, respectively.

-Peff

^ permalink raw reply

* Re: Unresolved issues #2 (shallow clone again)
From: Jakub Narebski @ 2006-05-08  4:24 UTC (permalink / raw)
  To: git
In-Reply-To: <7vy7xdgram.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> Wouldn't it be easier (sorry, no code yet) to have the following:
>>
>>         I WANT to have these
>>         I HAVE these
>>         These are GRAFT PARENTLESS
>>
>> with the target side sending list of all parentless commits in the
>> ... The source side will then do the grafting 'in memory' and
>> send the packs like normal, only with those cauterizing grafts in place.
> 
> I think that is essentially the outline of shallow clone
> proposal, except that you have to be careful and take not just
> "parentless" but other grafts (e.g. one that removes one parent
> from a merge commit to pretend that a side branch did not exist)
> into account as well.  I do not remember if I already coded it
> or not -- I might have.

So, let it be all grafts removing some or all parents from commit.

And that proposal would work I think also for the fetch, not only clone.

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: Unresolved issues #2 (shallow clone again)
From: Jakub Narebski @ 2006-05-08  4:02 UTC (permalink / raw)
  To: git
In-Reply-To: <7vy7xdgram.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:

> Jakub Narebski <jnareb@gmail.com> writes:
> 
>> Wouldn't it be easier (sorry, no code yet) to have the following:
>>
>>         I WANT to have these
>>         I HAVE these
>>         These are GRAFT PARENTLESS
>>
>> with the target side sending list of all parentless commits in the
>> ... The source side will then do the grafting 'in memory' and
>> send the packs like normal, only with those cauterizing grafts in place.
> 
> I think that is essentially the outline of shallow clone
> proposal, except that you have to be careful and take not just
> "parentless" but other grafts (e.g. one that removes one parent
> from a merge commit to pretend that a side branch did not exist)
> into account as well.  I do not remember if I already coded it
> or not -- I might have.

Having grafts file being used for both joining history and cauterizing
history makes re-cauterizing (e.g. changing depth of clone) difficult at
best...

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: Unresolved issues #2 (shallow clone again)
From: Junio C Hamano @ 2006-05-08  2:54 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e3ksoq$is$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> Wouldn't it be easier (sorry, no code yet) to have the following:
>
>         I WANT to have these
>         I HAVE these
>         These are GRAFT PARENTLESS        
>
> with the target side sending list of all parentless commits in the
> ... The source side will then do the grafting 'in memory' and
> send the packs like normal, only with those cauterizing grafts in place.

I think that is essentially the outline of shallow clone
proposal, except that you have to be careful and take not just
"parentless" but other grafts (e.g. one that removes one parent
from a merge commit to pretend that a side branch did not exist)
into account as well.  I do not remember if I already coded it
or not -- I might have.

^ permalink raw reply

* Re: Unresolved issues #2
From: Junio C Hamano @ 2006-05-08  2:51 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e3km6q$f7p$1@sea.gmane.org>

Jakub Narebski <jnareb@gmail.com> writes:

> Junio C Hamano wrote:
>
>>             char *value; /* "existence" bool may have NULL,
>>                           * otherwise probably a string "= value"
>>                           */
>
> Probably " = value" to preserve whitespace (e.g. justify on equal sign in
> hand crafted config file).

Probably even better is to remove the separate *value_comment,
and make this thing point at the entire " = value ; this is the
comment for the value\n" thing.

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: sean @ 2006-05-08  2:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vbqu9i6zl.fsf@assigned-by-dhcp.cox.net>

On Sun, 07 May 2006 19:29:50 -0700
Junio C Hamano <junkio@cox.net> wrote:


> Not at all.  Whatever Porcelain that runs repo-config to record
> the branch name needs to spell that branch name with proper
> quoting, like:

Okay.  It just seems nuts to require quoting because you happen
to use an uppercase character.  People are used to quoting 
special characters like * and $, not uppercase letters.

> I _do_ want to keep my slashes intact and also dots; mangling
> them is not very nice to me X-<.

You're right.

We should just relax the config file rules for legal characters,
in identifiers, at least [a-zA-Z0-9_/-].

Sean.

^ permalink raw reply

* Re: Unresolved issues #2 (shallow clone again)
From: Linus Torvalds @ 2006-05-08  2:42 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git
In-Reply-To: <20060508022432.GA26076@thunk.org>



On Sun, 7 May 2006, Theodore Tso wrote:
> 
> That brings up an interesting question though --- why not skip
> compressing files that are under 4k (or whatever the filesystem
> blocksize happens to be) if they are unpacked?  It burns CPU time;
> maybe not enough to be human-noticeable, but it's still not buying you
> anything.

Well, other filesystems don't have 4kB issues. Reiser can do smaller 
things iirc, and you might obviously have a ext3 filesystem with a 1kB 
blocksize too. And with tails on FFS, you might have a filesystem with a 
8kB blocksize, but despite that it might lay out <1kB files well.

Anyway, packing makes all this basically a non-issue. There are no block 
boundaries in a pack-file, and you only use a single inode. And you'd 
obviously want to pack for other reasons anyway (ie the delta compression 
will makea huge difference over time).

		Linus

^ permalink raw reply

* Re: [PATCH] Release config lock if the regex is invalid
From: Junio C Hamano @ 2006-05-08  2:38 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0605080229220.32508@wbgn013.biozentrum.uni-wuerzburg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> This is not enough. There are quite a few exit paths. Notice the "goto 
> out_free"? That is where this must go.
>
> This patch is totally untested but obviously correct:

except that many places you already close(fd) and
unlink(lock_file).  

Somehow it vaguely reminds me of recent "kernel in C++" thread
in the other mailing list, which I do not want people to start
talking about here, but moving all the clean-up code to exit
path indeed makes things simpler to read.

How about doing something like this?

-- >8 --
diff --git a/config.c b/config.c
index 11d65f8..adb5ce4 100644
--- a/config.c
+++ b/config.c
@@ -420,7 +420,7 @@ int git_config_set_multivar(const char* 
 	const char* value_regex, int multi_replace)
 {
 	int i;
-	int fd, in_fd;
+	int fd = -1, in_fd;
 	int ret;
 	char* config_filename = strdup(git_path("config"));
 	char* lock_file = strdup(git_path("config.lock"));
@@ -478,15 +478,11 @@ int git_config_set_multivar(const char* 
 		if ( ENOENT != errno ) {
 			error("opening %s: %s", config_filename,
 			      strerror(errno));
-			close(fd);
-			unlink(lock_file);
 			ret = 3; /* same as "invalid config file" */
 			goto out_free;
 		}
 		/* if nothing to unset, error out */
 		if (value == NULL) {
-			close(fd);
-			unlink(lock_file);
 			ret = 5;
 			goto out_free;
 		}
@@ -514,8 +510,6 @@ int git_config_set_multivar(const char* 
 				fprintf(stderr, "Invalid pattern: %s\n",
 					value_regex);
 				free(store.value_regex);
-				close(fd);
-				unlink(lock_file);
 				ret = 6;
 				goto out_free;
 			}
@@ -551,8 +545,6 @@ int git_config_set_multivar(const char* 
 		/* if nothing to unset, or too many matches, error out */
 		if ((store.seen == 0 && value == NULL) ||
 				(store.seen > 1 && multi_replace == 0)) {
-			close(fd);
-			unlink(lock_file);
 			ret = 5;
 			goto out_free;
 		}
@@ -601,8 +593,6 @@ int git_config_set_multivar(const char* 
 		unlink(config_filename);
 	}
 
-	close(fd);
-
 	if (rename(lock_file, config_filename) < 0) {
 		fprintf(stderr, "Could not rename the lock file?\n");
 		ret = 4;
@@ -612,10 +602,14 @@ int git_config_set_multivar(const char* 
 	ret = 0;
 
 out_free:
+	if (0 <= fd)
+		close(fd);
 	if (config_filename)
 		free(config_filename);
-	if (lock_file)
+	if (lock_file) {
+		unlink(lock_file);
 		free(lock_file);
+	}
 	return ret;
 }
 

^ permalink raw reply related

* Re: Implementing branch attributes in git config
From: Junio C Hamano @ 2006-05-08  2:29 UTC (permalink / raw)
  To: sean; +Cc: git
In-Reply-To: <BAYC1-PASMTP0334B471C6908E4E40BFD2AEA80@CEZ.ICE>

sean <seanlkml@sympatico.ca> writes:

> On Sun, 07 May 2006 18:27:32 -0700
> Junio C Hamano <junkio@cox.net> wrote:
>
>
>> How about keeping the default syntax as it is (tokens are case
>> insensitive and alnums only, dot separates tokens into
>> sections), and when a token that violates that rule needs to be
>> spelled out, require quoting, so:
>> 
>> 	branch.foo	BranCh.FoO	branch.FOO
>  
>> are the same (section "branch.foo"),
>
> Doesn't that mean you have to then prohibit creating mixed
> case branches with "git branch" and "git checkout -b" ?

Not at all.  Whatever Porcelain that runs repo-config to record
the branch name needs to spell that branch name with proper
quoting, like:

>> 	branch."js/fmt.patch"	or   "branch.js/fmt.patch"        
>> 
>> and the URL variable for that section is
>> 
>> 	$ git repo-config '"branch.js/fmt.patch".url'
>
> How about transforming slashes into dots?  so the above would 
> be:
>
>    [branch.js.fmt.patch]

I _do_ want to keep my slashes intact and also dots; mangling
them is not very nice to me X-<.

^ permalink raw reply

* Re: Unresolved issues #2 (shallow clone again)
From: Theodore Tso @ 2006-05-08  2:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0605071853290.3718@g5.osdl.org>

On Sun, May 07, 2006 at 07:04:48PM -0700, Linus Torvalds wrote:
> Is that without compression?

Yes, without compression.  So yes, that probably explains the
difference between your numbers and mine. 

That brings up an interesting question though --- why not skip
compressing files that are under 4k (or whatever the filesystem
blocksize happens to be) if they are unpacked?  It burns CPU time;
maybe not enough to be human-noticeable, but it's still not buying you
anything.

						- Ted

^ permalink raw reply

* Re: Unresolved issues #2 (shallow clone again)
From: Linus Torvalds @ 2006-05-08  2:04 UTC (permalink / raw)
  To: Theodore Tso; +Cc: git
In-Reply-To: <20060508012632.GD17138@thunk.org>



On Sun, 7 May 2006, Theodore Tso wrote:
> 
> I just ran the numbers on filesizes of a kernel tree I had handy,
> which happened to be 2.6.16.11.  With no object files, git files,
> etc. the average loss was 2351 bytes --- not that far away from the
> average of 2048 bytes.

Is that without compression?

git objects are compressed, and common types (trees) tend to be smaller 
than your normal C file.

So git objects tend to be _smaller_ than the regular files. By about 30%. 
In addition, the non-blob git objects themselves tend to be smaller still.

So for example, right now I have just 58 unpacked objects (I repack pretty 
often). But of those 58 objects, exactly _fifty_ are smaller than 2kB, and 
38 are smaller than 1kB. The median size is 771 bytes.

On master.kernel.org, I've not repacked as recently, so I've got 2268 
unpacked objects. But the median size there is 773 bytes, so it looks like 
the numbers are statistically pretty stable.

			Linus

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Pavel Roskin @ 2006-05-08  1:55 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Linus Torvalds, git
In-Reply-To: <Pine.LNX.4.63.0605080327490.13794@wbgn013.biozentrum.uni-wuerzburg.de>

On Mon, 2006-05-08 at 03:27 +0200, Johannes Schindelin wrote:
> > Now, how can I get a description for the "netdev" branch by one
> > git-repo-config command, without pipes?
> 
> 	git-repo-config --get branchdata.description ' for netdev$'

No, it doesn't remove "for netdev".  What I really don't like is that
git-repo-config treats it as "not my problem".

git-repo-config places extremely tight limitations of the names of the
sections and the keys.  But sometimes a relationship between two loosely
defined strings needs to be presented.  It's a real need.  And
git-repo-config doesn't address this need.

I believe git-repo-config should allow direct retrieval of data from any
depth, and the syntax should be explicit rather than fuzzy.  A dot is
more explicit than "for", especially if the dot appears after a name
that may not contain dots.

Another question is how we want to group the data.  Do we want to have
all descriptions together or in separate sections?  Whatever the answer,
git-repo-config should provide means to extract all data in one command,
without need for postprocessing.

So I understand arguing where to place the branch name.  But what I
don't like is the desire to offload part of the processing on the
callers.

-- 
Regards,
Pavel Roskin

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: sean @ 2006-05-08  1:44 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0605080344480.14008@wbgn013.biozentrum.uni-wuerzburg.de>

On Mon, 8 May 2006 03:45:47 +0200 (CEST)
Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:

> Hi,
> 
> On Sun, 7 May 2006, sean wrote:
> 
> > Not worth it.  Branch names should be alnums and imho should be
> > case sensitive too.
> 
> Why should they be case sensitive? So you have a branch "origin" and 
> another named "Origin" and get totally confused?
> 

I don't care either way, just think we should be consistent.  Currently
we support case sensitivity in branch names and let people get 
totally confused.   But in practice people don't really get confused
by the linux filesystem which is case sensitive.

Sean

^ permalink raw reply

* Re: Implementing branch attributes in git config
From: Johannes Schindelin @ 2006-05-08  1:45 UTC (permalink / raw)
  To: sean; +Cc: git
In-Reply-To: <BAYC1-PASMTP0334B471C6908E4E40BFD2AEA80@CEZ.ICE>

Hi,

On Sun, 7 May 2006, sean wrote:

> Not worth it.  Branch names should be alnums and imho should be
> case sensitive too.

Why should they be case sensitive? So you have a branch "origin" and 
another named "Origin" and get totally confused?

Ciao,
Dscho

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox