Git development

* Re: Porcelain specific metadata under .git?
From: Andreas Ericsson @ 2006-06-14 13:07 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <e6os3v$r5g$1@sea.gmane.org>

Jakub Narebski wrote:
> Andreas Ericsson wrote:
> 
> 
>>Shawn Pearce wrote:
>>
>>>I already assume/know that refs/heads and refs/tags are completely
>>>off-limits as they are for user refs only.
>>>
>>>I also think the core GIT tools already assume that anything
>>>directly under .git which is strictly a file and which is named
>>>entirely with uppercase letters (aside from "HEAD") is strictly a
>>>temporary/short-lived state type item (e.g. COMMIT_MSG) used by a
>>>Porcelain.
>>>
>>>But is saying ".git/refs/eclipse-workspaces" is probably able to
>>>be used for this purpose safe?  :-)
>>>
>>
>>.git/eclipse/whatever-you-like
>>
>>would probably be better. Heads can be stored directly under .git/refs 
>>too. Most likely, nothing will ever be stored under .git/eclipse by 
>>either core git or the current (other) porcelains though.
> 
> 
> I think if it is a ref, which one wants to be visible to git-fsck (and
> git-prune), it should be under .git/refs.
> 

Yes, but I understood him to mean "it's a tree-sha" instead of a 
branch/head thing, which would mean it doesn't fit the .git/refs 
definition of ref.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

* Re: [PATCH] fix git alias
From: Johannes Schindelin @ 2006-06-14 13:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vu06nevse.fsf@assigned-by-dhcp.cox.net>

Hi,

On Wed, 14 Jun 2006, Junio C Hamano wrote:

>  * This would make "git l -n 4" work when you have "alias.l =
>    log -M" in your configuration.  The original code generated
>    an equivalent of "git log -M l -n 4".

Of course, I tested it only with links... (ln git git-l). Thanks.

>    There is another more grave problem I seem to be hitting but
>    haven't figured out (and will probably not figure out while
>    away); I'd appreciate if you can track it down.  With
>    "alias.wh = whatchanged --patch-with-stat", "git wh HEAD --
>    mailinfo.c" segfaults at fclose() in git_config_from_file()
>    when it reads the configuration for the second time (the
>    first time being getting the alias).  The second call comes
>    via init_revisions() calling setup_git_directory().  Oddly
>    I do not seem to be able to reproduce this segfault on amd64.

I will do that.

Note that I have a mmap()ed version in the pipeline. I just wanted to wait 
with that until I manage to implement your cool idea about config 
rewriting. Obviously, this mmap()ed version does not have this problem.

Ciao,
Dscho

* Re: Porcelain specific metadata under .git?
From: Junio C Hamano @ 2006-06-14 13:30 UTC (permalink / raw)
  To: git
In-Reply-To: <44900A2F.7050704@op5.se>

Andreas Ericsson <ae@op5.se> writes:

> Yes, but I understood him to mean "it's a tree-sha" instead of a
> branch/head thing, which would mean it doesn't fit the .git/refs
> definition of ref.

I am not sure what you meant by "it's a tree-sha", but if you
have an impression that .git/refs define "ref" as committish,
you are mistaken.  Linus has .git/refs/tags/v2.6.11-tree which
tags a tree object.  I even have a .git/refs/tags/junio-gpg-pub
which tags a blob (blobish ;-> ?).

* Re: [PATCH] fix git alias
From: Johannes Schindelin @ 2006-06-14 13:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0606141507420.16802@wbgn013.biozentrum.uni-wuerzburg.de>

Hi,

On Wed, 14 Jun 2006, Johannes Schindelin wrote:

> >    There is another more grave problem I seem to be hitting but
> >    haven't figured out (and will probably not figure out while
> >    away); I'd appreciate if you can track it down.  With
> >    "alias.wh = whatchanged --patch-with-stat", "git wh HEAD --
> >    mailinfo.c" segfaults at fclose() in git_config_from_file()
> >    when it reads the configuration for the second time (the
> >    first time being getting the alias).  The second call comes
> >    via init_revisions() calling setup_git_directory().  Oddly
> >    I do not seem to be able to reproduce this segfault on amd64.
> 
> I will do that.

I cannot reproduce, sorry. Valgrind says some objects are not released, 
but I cannot find another error. That's with 'next'.

Ciao,
Dscho

* [PATCH/RFC] Teach diff about -b and -w flags
From: Johannes Schindelin @ 2006-06-14 15:40 UTC (permalink / raw)
  To: davidel, git, junkio


This adds -b (--ignore-space-change) and -w (--ignore-all-space) flags to
diff. The main part of the patch is teaching libxdiff about it.

Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
---

	Note that -b will not treat DOS and Unix line endings as equal,
	although it would be trivial. Is this desired?

	Another question: instead of checking the flags all the time,
	xdl_line_match (and possibly xdl_hash_record) could be split into
	three different functions, and a function pointer could be set
	at init time. This would be faster, but less elegant, no?
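A minimal sketch of the function-pointer variant asked about in the note above: one matcher per mode, chosen once at init time instead of testing the flags on every comparison. The flag constants, the typedef and all function names here are made up for illustration and are not part of the patch or of libxdiff. Each matcher returns 1 for "lines match" and 0 otherwise, and in this sketch trailing whitespace is still significant in the -b style matcher.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical flag values for this sketch only (not the patch's XDF_* flags). */
enum { SKETCH_IGNORE_ALL_SPACE = 1, SKETCH_IGNORE_SPACE_CHANGE = 2 };

/* Every matcher returns 1 if the two lines are considered equal, else 0. */
typedef int (*line_match_fn)(const char *l1, long s1, const char *l2, long s2);

int is_space(char c)
{
	return isspace((unsigned char)c) != 0;
}

/* Default mode: byte-for-byte comparison. */
int match_exact(const char *l1, long s1, const char *l2, long s2)
{
	assert(s1 >= 0 && s2 >= 0);
	return s1 == s2 && memcmp(l1, l2, (size_t)s1) == 0;
}

/* -w style: skip every whitespace character on both sides. */
int match_ignore_all_space(const char *l1, long s1, const char *l2, long s2)
{
	long i1 = 0, i2 = 0;

	for (;;) {
		while (i1 < s1 && is_space(l1[i1]))
			i1++;
		while (i2 < s2 && is_space(l2[i2]))
			i2++;
		if (i1 >= s1 || i2 >= s2)
			return i1 >= s1 && i2 >= s2;
		if (l1[i1++] != l2[i2++])
			return 0;
	}
}

/* -b style: a run of whitespace matches any other non-empty run. */
int match_ignore_space_change(const char *l1, long s1, const char *l2, long s2)
{
	long i1 = 0, i2 = 0;

	while (i1 < s1 && i2 < s2) {
		if (is_space(l1[i1]) && is_space(l2[i2])) {
			while (i1 < s1 && is_space(l1[i1]))
				i1++;
			while (i2 < s2 && is_space(l2[i2]))
				i2++;
		} else if (l1[i1++] != l2[i2++]) {
			return 0;
		}
	}
	return i1 >= s1 && i2 >= s2;
}

/* Pick the matcher once; callers then compare lines without looking at flags. */
line_match_fn select_line_matcher(long flags)
{
	if (flags & SKETCH_IGNORE_ALL_SPACE)
		return match_ignore_all_space;
	if (flags & SKETCH_IGNORE_SPACE_CHANGE)
		return match_ignore_space_change;
	return match_exact;
}
```

A caller would store the result of select_line_matcher() once, for example next to the stored flags, and call it wherever two lines are compared.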

 diff.c           |   13 +++++++++----
 diff.h           |    1 +
 xdiff/xdiff.h    |    3 +++
 xdiff/xdiffi.c   |   12 ++++++------
 xdiff/xdiffi.h   |    1 -
 xdiff/xmacros.h  |    1 -
 xdiff/xprepare.c |   16 ++++++++++------
 xdiff/xutils.c   |   51 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 xdiff/xutils.h   |    3 ++-
 9 files changed, 81 insertions(+), 20 deletions(-)

diff --git a/diff.c b/diff.c
index bc32a4a..5b34f73 100644
--- a/diff.c
+++ b/diff.c
@@ -661,7 +661,7 @@ static void builtin_diff(const char *nam
 		memset(&ecbdata, 0, sizeof(ecbdata));
 		ecbdata.label_path = lbl;
 		ecbdata.color_diff = o->color_diff;
-		xpp.flags = XDF_NEED_MINIMAL;
+		xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
 		xecfg.ctxlen = o->context;
 		xecfg.flags = XDL_EMIT_FUNCNAMES;
 		if (!diffopts)
@@ -686,6 +686,7 @@ static void builtin_diffstat(const char 
 			     struct diff_filespec *one,
 			     struct diff_filespec *two,
 			     struct diffstat_t *diffstat,
+			     struct diff_options *o,
 			     int complete_rewrite)
 {
 	mmfile_t mf1, mf2;
@@ -715,7 +716,7 @@ static void builtin_diffstat(const char 
 		xdemitconf_t xecfg;
 		xdemitcb_t ecb;
 
-		xpp.flags = XDF_NEED_MINIMAL;
+		xpp.flags = XDF_NEED_MINIMAL | o->xdl_opts;
 		xecfg.ctxlen = 0;
 		xecfg.flags = 0;
 		ecb.outf = xdiff_outf;
@@ -1300,7 +1301,7 @@ static void run_diffstat(struct diff_fil
 
 	if (DIFF_PAIR_UNMERGED(p)) {
 		/* unmerged */
-		builtin_diffstat(p->one->path, NULL, NULL, NULL, diffstat, 0);
+		builtin_diffstat(p->one->path, NULL, NULL, NULL, diffstat, o, 0);
 		return;
 	}
 
@@ -1312,7 +1313,7 @@ static void run_diffstat(struct diff_fil
 
 	if (p->status == DIFF_STATUS_MODIFIED && p->score)
 		complete_rewrite = 1;
-	builtin_diffstat(name, other, p->one, p->two, diffstat, complete_rewrite);
+	builtin_diffstat(name, other, p->one, p->two, diffstat, o, complete_rewrite);
 }
 
 static void run_checkdiff(struct diff_filepair *p, struct diff_options *o)
@@ -1517,6 +1518,10 @@ int diff_opt_parse(struct diff_options *
 	}
 	else if (!strcmp(arg, "--color"))
 		options->color_diff = 1;
+	else if (!strcmp(arg, "-w") || !strcmp(arg, "--ignore-all-space"))
+		options->xdl_opts |= XDF_IGNORE_WHITESPACE;
+	else if (!strcmp(arg, "-b") || !strcmp(arg, "--ignore-space-change"))
+		options->xdl_opts |= XDF_IGNORE_WHITESPACE_CHANGE;
 	else
 		return 0;
 	return 1;
diff --git a/diff.h b/diff.h
index 2b821df..7d7b6cd 100644
--- a/diff.h
+++ b/diff.h
@@ -46,6 +46,7 @@ struct diff_options {
 	int setup;
 	int abbrev;
 	const char *stat_sep;
+	long xdl_opts;
 
 	int nr_paths;
 	const char **paths;
diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 2540e8a..2ce10b4 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -29,6 +29,9 @@ #endif /* #ifdef __cplusplus */
 
 
 #define XDF_NEED_MINIMAL (1 << 1)
+#define XDF_IGNORE_WHITESPACE (1 << 2)
+#define XDF_IGNORE_WHITESPACE_CHANGE (1 << 3)
+#define XDF_WHITESPACE_FLAGS (XDF_IGNORE_WHITESPACE | XDF_IGNORE_WHITESPACE_CHANGE)
 
 #define XDL_PATCH_NORMAL '-'
 #define XDL_PATCH_REVERSE '+'
diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index b95ade2..5d09a16 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -45,7 +45,7 @@ static long xdl_split(unsigned long cons
 		      long *kvdf, long *kvdb, int need_min, xdpsplit_t *spl,
 		      xdalgoenv_t *xenv);
 static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1, long chg2);
-static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo);
+static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags);
 
 
 
@@ -397,7 +397,7 @@ static xdchange_t *xdl_add_change(xdchan
 }
 
 
-static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo) {
+static int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
 	long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
 	char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
 	xrecord_t **recs = xdf->recs;
@@ -440,7 +440,7 @@ static int xdl_change_compact(xdfile_t *
 			 * the group.
 			 */
 			while (ixs > 0 && recs[ixs - 1]->ha == recs[ix - 1]->ha &&
-			       XDL_RECMATCH(recs[ixs - 1], recs[ix - 1])) {
+			       xdl_line_match(recs[ixs - 1]->ptr, recs[ixs - 1]->size, recs[ix - 1]->ptr, recs[ix - 1]->size, flags)) {
 				rchg[--ixs] = 1;
 				rchg[--ix] = 0;
 
@@ -468,7 +468,7 @@ static int xdl_change_compact(xdfile_t *
 			 * the group.
 			 */
 			while (ix < nrec && recs[ixs]->ha == recs[ix]->ha &&
-			       XDL_RECMATCH(recs[ixs], recs[ix])) {
+			       xdl_line_match(recs[ixs]->ptr, recs[ixs]->size, recs[ix]->ptr, recs[ix]->size, flags)) {
 				rchg[ixs++] = 0;
 				rchg[ix++] = 1;
 
@@ -546,8 +546,8 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf
 
 		return -1;
 	}
-	if (xdl_change_compact(&xe.xdf1, &xe.xdf2) < 0 ||
-	    xdl_change_compact(&xe.xdf2, &xe.xdf1) < 0 ||
+	if (xdl_change_compact(&xe.xdf1, &xe.xdf2, xpp->flags) < 0 ||
+	    xdl_change_compact(&xe.xdf2, &xe.xdf1, xpp->flags) < 0 ||
 	    xdl_build_script(&xe, &xscr) < 0) {
 
 		xdl_free_env(&xe);
diff --git a/xdiff/xdiffi.h b/xdiff/xdiffi.h
index dd8f3c9..d3b7271 100644
--- a/xdiff/xdiffi.h
+++ b/xdiff/xdiffi.h
@@ -55,6 +55,5 @@ void xdl_free_script(xdchange_t *xscr);
 int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg);
 
-
 #endif /* #if !defined(XDIFFI_H) */
 
diff --git a/xdiff/xmacros.h b/xdiff/xmacros.h
index 78f0260..4c2fde8 100644
--- a/xdiff/xmacros.h
+++ b/xdiff/xmacros.h
@@ -33,7 +33,6 @@ #define XDL_ABS(v) ((v) >= 0 ? (v): -(v)
 #define XDL_ISDIGIT(c) ((c) >= '0' && (c) <= '9')
 #define XDL_HASHLONG(v, b) (((unsigned long)(v) * GR_PRIME) >> ((CHAR_BIT * sizeof(unsigned long)) - (b)))
 #define XDL_PTRFREE(p) do { if (p) { xdl_free(p); (p) = NULL; } } while (0)
-#define XDL_RECMATCH(r1, r2) ((r1)->size == (r2)->size && memcmp((r1)->ptr, (r2)->ptr, (r1)->size) == 0)
 #define XDL_LE32_PUT(p, v) \
 do { \
 	unsigned char *__p = (unsigned char *) (p); \
diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
index add5a75..f2a12ae 100644
--- a/xdiff/xprepare.c
+++ b/xdiff/xprepare.c
@@ -43,12 +43,13 @@ typedef struct s_xdlclassifier {
 	xdlclass_t **rchash;
 	chastore_t ncha;
 	long count;
+	long flags;
 } xdlclassifier_t;
 
 
 
 
-static int xdl_init_classifier(xdlclassifier_t *cf, long size);
+static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags);
 static void xdl_free_classifier(xdlclassifier_t *cf);
 static int xdl_classify_record(xdlclassifier_t *cf, xrecord_t **rhash, unsigned int hbits,
 			       xrecord_t *rec);
@@ -63,9 +64,11 @@ static int xdl_optimize_ctxs(xdfile_t *x
 
 
 
-static int xdl_init_classifier(xdlclassifier_t *cf, long size) {
+static int xdl_init_classifier(xdlclassifier_t *cf, long size, long flags) {
 	long i;
 
+	cf->flags = flags;
+
 	cf->hbits = xdl_hashbits((unsigned int) size);
 	cf->hsize = 1 << cf->hbits;
 
@@ -103,8 +106,9 @@ static int xdl_classify_record(xdlclassi
 	line = rec->ptr;
 	hi = (long) XDL_HASHLONG(rec->ha, cf->hbits);
 	for (rcrec = cf->rchash[hi]; rcrec; rcrec = rcrec->next)
-		if (rcrec->ha == rec->ha && rcrec->size == rec->size &&
-		    !memcmp(line, rcrec->line, rec->size))
+		if (rcrec->ha == rec->ha &&
+				xdl_line_match(rcrec->line, rcrec->size,
+					rec->ptr, rec->size, cf->flags))
 			break;
 
 	if (!rcrec) {
@@ -173,7 +177,7 @@ static int xdl_prepare_ctx(mmfile_t *mf,
 				top = blk + bsize;
 			}
 			prev = cur;
-			hav = xdl_hash_record(&cur, top);
+			hav = xdl_hash_record(&cur, top, xpp->flags);
 			if (nrec >= narec) {
 				narec *= 2;
 				if (!(rrecs = (xrecord_t **) xdl_realloc(recs, narec * sizeof(xrecord_t *)))) {
@@ -268,7 +272,7 @@ int xdl_prepare_env(mmfile_t *mf1, mmfil
 	enl1 = xdl_guess_lines(mf1) + 1;
 	enl2 = xdl_guess_lines(mf2) + 1;
 
-	if (xdl_init_classifier(&cf, enl1 + enl2 + 1) < 0) {
+	if (xdl_init_classifier(&cf, enl1 + enl2 + 1, xpp->flags) < 0) {
 
 		return -1;
 	}
diff --git a/xdiff/xutils.c b/xdiff/xutils.c
index 21ab8e7..3dd5fe1 100644
--- a/xdiff/xutils.c
+++ b/xdiff/xutils.c
@@ -189,12 +189,61 @@ long xdl_guess_lines(mmfile_t *mf) {
 	return nl + 1;
 }
 
+int xdl_line_match(const char *l1, long s1, const char *l2, long s2, long flags)
+{
+	int i1, i2;
+
+	if (flags & XDF_IGNORE_WHITESPACE) {
+		for (i1 = i2 = 0; i1 < s1 && i2 < s2; i1++, i2++) {
+			if (isspace(l1[i1]))
+				while (isspace(l1[i1]) && i1 < s1)
+					i1++;
+			else if (isspace(l2[i2]))
+				while (isspace(l2[i2]) && i2 < s2)
+					i2++;
+			else if (l1[i1] != l2[i2])
+				return l2[i2] - l1[i1];
+		}
+		if (i1 >= s1)
+			return 1;
+		else if (i2 >= s2)
+			return -1;
+	} else if (flags & XDF_IGNORE_WHITESPACE_CHANGE) {
+		for (i1 = i2 = 0; i1 < s1 && i2 < s2; i1++, i2++) {
+			if (isspace(l1[i1])) {
+				if (!isspace(l2[i2]))
+					return -1;
+				while (isspace(l1[i1]) && i1 < s1)
+					i1++;
+				while (isspace(l2[i2]) && i2 < s2)
+					i2++;
+			} else if (l1[i1] != l2[i2])
+				return l2[i2] - l1[i1];
+		}
+		if (i1 >= s1)
+			return 1;
+		else if (i2 >= s2)
+			return -1;
+	} else
+		return s1 == s2 && !memcmp(l1, l2, s1);
+
+	return 0;
+}
 
-unsigned long xdl_hash_record(char const **data, char const *top) {
+unsigned long xdl_hash_record(char const **data, char const *top, long flags) {
 	unsigned long ha = 5381;
 	char const *ptr = *data;
 
 	for (; ptr < top && *ptr != '\n'; ptr++) {
+		if (isspace(*ptr) && (flags & XDF_WHITESPACE_FLAGS)) {
+			while (ptr < top && isspace(*ptr) && ptr[1] != '\n')
+				ptr++;
+			if (flags & XDF_IGNORE_WHITESPACE_CHANGE) {
+				ha += (ha << 5);
+				ha ^= (unsigned long) ' ';
+			}
+			continue;
+		}
 		ha += (ha << 5);
 		ha ^= (unsigned long) *ptr;
 	}
diff --git a/xdiff/xutils.h b/xdiff/xutils.h
index ea38ee9..2701eea 100644
--- a/xdiff/xutils.h
+++ b/xdiff/xutils.h
@@ -33,7 +33,8 @@ void *xdl_cha_alloc(chastore_t *cha);
 void *xdl_cha_first(chastore_t *cha);
 void *xdl_cha_next(chastore_t *cha);
 long xdl_guess_lines(mmfile_t *mf);
-unsigned long xdl_hash_record(char const **data, char const *top);
+int xdl_line_match(const char *l1, long s1, const char *l2, long s2, long flags);
+unsigned long xdl_hash_record(char const **data, char const *top, long flags);
 unsigned int xdl_hashbits(unsigned int size);
 int xdl_num_out(char *out, long val);
 long xdl_atol(char const *str, char const **next);
-- 
1.4.0.ga9a96-dirty

* Re: Repacking many disconnected blobs
From: Linus Torvalds @ 2006-06-14 15:53 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <1150269478.20536.150.camel@neko.keithp.com>

On Wed, 14 Jun 2006, Keith Packard wrote:
>
> parsecvs scans every ,v file and creates a blob for every revision of
> every file right up front. Once these are created, it discards the
> actual file contents and deals solely with the hash values.
> 
> The problem is that while this is going on, the repository consists
> solely of disconnected objects, and I can't make git-repack put those
> into pack objects.

Ok. That's actually _easily_ rectifiable, because it turns out that your 
behaviour is something that re-packing is actually really good at 
handling.

The thing is, "git repack" (the wrapper function) is all about finding all 
the heads of a repository, and then telling the _real_ packing logic which 
objects to pack.

In other words, it literally boils down to basically

	git-rev-list --all --objects $rev_list |
		git-pack-objects --non-empty $pack_objects .tmp-pack

where "$rev_list" and "$pack_objects" are just extra flags to the two 
phases that you don't really care about.

But the important point to recognize is that the pack generation itself 
doesn't care about reachability or anything else AT ALL. The pack is just 
a jumble of objects, nothing more. Which is exactly what you want.

> I'm assuming that if I could get these disconnected blobs all neatly
> tucked into a pack object, things might go a bit faster.

Absolutely. And it's even easy.

What you should do is to just generate a list of objects every once in a 
while, and pass that list off to "git-pack-objects", which will create a 
pack-file for you. Then you just move the generated pack-file (and index 
file) into the .git/objects/pack directory, and then you can run the 
normal "git-prune-packed", and you're done.

There's just two small subtle points to look out for:

 - List the objects in "most important first" order, if you can.  That 
   will improve locality later (the packing will try to generate the pack 
   so that the order you gave the objects in will be a rough order of the 
   result - the first objects will be together at the beginning, the last 
   objects will be at the end)

   This is not a huge deal. If you don't have a good order, give them in 
   any order, and then after you're done (and you do have branches and 
   tag-heads), the final repack (with a regular "git repack") will fix it 
   all up.

   You'll still get all of the size/access advantage of packfiles without 
   this, it just won't have the additional "nice IO patterns within the 
   packfile" behaviour (which mainly matters for the cold-cache case, so 
   you may well not care).

 - append the filename the object is associated with to the object name on 
   the list, if at all possible. This is what git-pack-objects will use as 
   part of the heuristic for finding the deltas, so this is actually a big 
   deal. If you forget (or mess up) the filename, packing will still 
   _work_ - it's just a heuristic, after all, and there are a few others 
   too - but the pack-file will have inferior delta chains.

   (The name doesn't have to be the "real name", it really only needs to 
   be something unique per *,v file, but real name is probably best)

   The corollary to this is that it's better to generate the pack-file 
   from a list of every version of a few files than it is to generate it 
   from a few versions of every file. Ie, if you process things one file 
   at a time, and create every object for that file, that is actually good 
   for packing, since there will be the optimal delta opportunity.

In other words, you should just feed git-pack-objects a list of objects in 
the form "<sha1><space><filename>\n", and git-pack-objects will do the rest.
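As a rough illustration of that list format, here is a small C helper that writes one "<sha1><space><filename>\n" entry to a stream. The function name and its validation rules are assumptions made for this sketch; the only thing taken from the message is the line format itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write one "<sha1> <filename>\n" entry to `out`.
 * `sha1_hex` must be a 40-character lowercase hex object name.
 * Returns 0 on success, -1 on malformed input or a write error.
 */
int write_pack_list_entry(FILE *out, const char *sha1_hex, const char *filename)
{
	assert(out && sha1_hex && filename);

	if (strlen(sha1_hex) != 40 ||
	    strspn(sha1_hex, "0123456789abcdef") != 40)
		return -1;
	/* A newline inside the name would split the entry in two. */
	if (strchr(filename, '\n'))
		return -1;

	return fprintf(out, "%s %s\n", sha1_hex, filename) < 0 ? -1 : 0;
}
```

Calling this for every revision of one ,v file before moving on to the next file gives the per-file grouping that the delta heuristic benefits from.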

Just as a stupid example, if you were to want to pack just the _tree_ that 
is the current version of a git archive, you'd do

	git-rev-list --objects HEAD^{tree} |
		git-pack-objects --non-empty .tmp-pack

which you can try on the current git tree just to see (the first line will 
generate a list of all objects reachable from the current _tree_: no 
history at all; the second line will create two files under the name of 
".tmp-pack-<sha1-of-object-list>.{pack|idx}").

The reason I suggest doing this for the current tree of the git archive is 
simply that you can look at the git-rev-list output with "less", and see 
for yourself what it actually does (and there are just a few hundred 
objects there: a few tree objects, and the blob objects for every file in 
the current HEAD).

So the git pack-format is actually _optimal_ for your particular case, 
exactly because the pack-files don't actually care about any high-level 
semantics: all they contain is a list of objects.

So in phase 1, when you generate all the objects, the simplest thing to do 
is to literally just remember the last five thousand objects or so as you 
generate them, and when that array of objects fills up, you just start the 
"git-pack-objects" thing, and feed it the list of objects, move the 
pack-file into .git/objects/pack/pack-... and do a "git prune-packed". 

Then you just continue.
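The "remember the last five thousand objects, then flush" loop could be sketched in C roughly as follows. All names, the fixed 255-byte name limit and the struct layout are assumptions of this sketch, not anything parsecvs or git defines; the flush only writes the object list, so starting git-pack-objects and moving the resulting files is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BATCH_CAPACITY 5000	/* "five thousand objects or so" */
#define SHA1_HEX_LEN   40
#define NAME_MAX_LEN   255	/* arbitrary limit for this sketch */

struct pending_object {
	char sha1_hex[SHA1_HEX_LEN + 1];
	char name[NAME_MAX_LEN + 1];
};

struct object_batch {
	struct pending_object items[BATCH_CAPACITY];
	size_t count;
};

/*
 * Remember one freshly created object. Over-long names are silently
 * truncated. Returns 1 if the batch is now full and should be flushed.
 */
int batch_add(struct object_batch *b, const char *sha1_hex, const char *name)
{
	struct pending_object *o;

	assert(b->count < BATCH_CAPACITY);
	assert(strlen(sha1_hex) == SHA1_HEX_LEN);

	o = &b->items[b->count++];
	memcpy(o->sha1_hex, sha1_hex, SHA1_HEX_LEN + 1);
	snprintf(o->name, sizeof o->name, "%s", name);
	return b->count == BATCH_CAPACITY;
}

/*
 * Write the batch as "<sha1> <name>\n" lines and empty it.
 * Returns the number of entries written, or -1 on a write error.
 */
long batch_flush(struct object_batch *b, FILE *out)
{
	size_t i, n = b->count;

	for (i = 0; i < n; i++)
		if (fprintf(out, "%s %s\n",
			    b->items[i].sha1_hex, b->items[i].name) < 0)
			return -1;
	b->count = 0;
	return (long)n;
}
```

In a real importer `out` would presumably be a pipe to "git-pack-objects --non-empty .tmp-pack" (for instance opened with popen()), followed by moving the generated pack and index into .git/objects/pack and running "git prune-packed", as described above.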

So this should all fit the parsecvs approach very well indeed.

		Linus

* Re: Repacking many disconnected blobs
From: Keith Packard @ 2006-06-14 17:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: keithp, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606140826200.5498@g5.osdl.org>

On Wed, 2006-06-14 at 08:53 -0700, Linus Torvalds wrote:

>  - You can list the objects with "most important first" order first, if 
>    you can.  That will improve locality later (the packing will try to 
>    generate the pack so that the order you gave the objects in will be a 
>    rough order of the result - the first objects will be together at the 
>    beginning, the last objects will be at the end)

I take every ,v file and construct blobs for every revision. If I
understand this correctly, I should be shuffling the revisions so I send
the latest revision of every file first, then the next-latest revision.
It would be somewhat easier to just send the whole list of revisions for
the first file and then move to the next file, but if shuffling is what
I want, I'll do that.

>    The corollary to this is that it's better to generate the pack-file 
>    from a list of every version of a few files than it is to generate it 
>    from a few versions of every file. Ie, if you process things one file 
>    at a time, and create every object for that file, that is actually good 
>    for packing, since there will be the optimal delta opportunity.

I assumed that was the case. Fortunately, I process each file
separately, so this matches my needs exactly. I should be able to report
on this shortly.

-- 
keith.packard@intel.com

* Re: Repacking many disconnected blobs
From: Linus Torvalds @ 2006-06-14 18:18 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <1150307715.20536.166.camel@neko.keithp.com>

On Wed, 14 Jun 2006, Keith Packard wrote:

> On Wed, 2006-06-14 at 08:53 -0700, Linus Torvalds wrote:
> 
> >  - You can list the objects with "most important first" order first, if 
> >    you can.  That will improve locality later (the packing will try to 
> >    generate the pack so that the order you gave the objects in will be a 
> >    rough order of the result - the first objects will be together at the 
> >    beginning, the last objects will be at the end)
> 
> I take every ,v file and construct blobs for every revision. If I
> understand this correctly, I should be shuffling the revisions so I send
> the latest revision of every file first, then the next-latest revision.
> It would be somewhat easier to just send the whole list of revisions for
> the first file and then move to the next file, but if shuffling is what
> I want, I'll do that.

You don't _need_ to shuffle. As mentioned, it will only affect the 
location of the data in the pack-file, which in turn will mostly matter 
as an IO pattern thing, not anything really fundamental.  If the pack-file 
ends up caching well, the IO patterns obviously will never matter.

Eventually, after the whole import has finished, and you do the final 
repack, that one will do things in "recency order" (or "global 
reachability order" if you prefer), which means that all the objects in 
the final pack will be sorted by how "close" they are to the top-of-tree. 

And that will happen regardless of what the intermediate ordering has 
been.

So if shuffling is inconvenient, just don't do it.

On the other hand, if you know that you generated the blobs "oldest to 
newest", just print them in the reverse order when you end up repacking, 
and you're all done (if you just save the info into some array before you 
repack, just walk the array backwards).
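Walking the saved array backwards is a one-liner; a minimal C sketch, assuming the entries were stored oldest-first as ready-to-print strings (the function name is made up for illustration):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Print entries newest-first, given an array that was filled
 * oldest-first. Returns the number of entries written, or -1 on a
 * write error.
 */
long print_newest_first(FILE *out, const char *const *entries, size_t count)
{
	size_t i;

	assert(out && (entries || count == 0));

	/* i runs from count-1 down to 0 without underflowing size_t. */
	for (i = count; i-- > 0; )
		if (fprintf(out, "%s\n", entries[i]) < 0)
			return -1;
	return (long)count;
}
```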

			Linus

* Re: Repacking many disconnected blobs
From: Linus Torvalds @ 2006-06-14 18:52 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606141113130.5498@g5.osdl.org>



On Wed, 14 Jun 2006, Linus Torvalds wrote:
> 
> You don't _need_ to shuffle. As mentioned, it will only affect the 
> location of the data in the pack-file, which in turn will mostly matter 
> as an IO pattern thing, not anything really fundamental.  If the pack-file 
> ends up caching well, the IO patterns obviously will never matter.

Actually, thinking about it more, the way you do things, shuffling 
probably won't even help.

Why? Because you'll obviously have multiple files, and even if each file 
were to be sorted "correctly", the access patterns from any global 
standpoint won't really matter, because you'd probably bounce back and 
forth in the pack-file anyway.

So if anything, I would say

 - just dump them into the packfile in whatever order is most convenient

 - if you know that later phases will go through the objects and actually 
   use them (as opposed to just building trees out of their SHA1 values) 
   in some particular order, _that_ might be the ordering to use.

 - in many ways, getting good delta chains is _much_ more important, since 
   "git repack -a -d" will re-use good deltas from a previous pack, but 
   will _not_ care about any ordering in the old pack. As well as 
   obviously improving the size of the temporary pack-files anyway.

I'll pontificate more if I can think of any other cases that might matter.

		Linus

* Re: Repacking many disconnected blobs
From: Keith Packard @ 2006-06-14 18:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: keithp, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606141113130.5498@g5.osdl.org>

On Wed, 2006-06-14 at 11:18 -0700, Linus Torvalds wrote:

> You don't _need_ to shuffle. As mentioned, it will only affect the 
> location of the data in the pack-file, which in turn will mostly matter 
> as an IO pattern thing, not anything really fundamental.  If the pack-file 
> ends up caching well, the IO patterns obviously will never matter.

Ok, sounds like shuffling isn't necessary; the only benefit packing
gains me is to reduce the size of each directory in the object store;
the process I follow is to construct blobs for every revision, then just
use the sha1 values to construct an index for each commit. I never
actually look at the blobs myself, so IO access patterns aren't
relevant.

Repacking after the import is completed should undo whatever horror show
I've created in any case.

-- 
keith.packard@intel.com

* Re: Repacking many disconnected blobs
From: Linus Torvalds @ 2006-06-14 19:18 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <1150311567.30681.28.camel@neko.keithp.com>

On Wed, 14 Jun 2006, Keith Packard wrote:
> 
> Ok, sounds like shuffling isn't necessary; the only benefit packing
> gains me is to reduce the size of each directory in the object store;

There's actually a secondary benefit to packing that turned out to be much 
bigger from a performance standpoint: the size benefit coupled with the 
fact that it's all in one file ends up meaning that accessing packed 
objects is _much_ faster than accessing individual files.

The Linux system call overhead is one of the lowest ones out there, but 
it's still much bigger than just a function call, and doing a full 
pathname walk and open/close is bigger yet. In contrast, if you access 
lots of objects and they are all in a pack, you only end up doing one mmap 
and a page fault for each 4kB entry, and that's it.
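To make the cost difference concrete, here is a small POSIX C sketch of the "one mmap, then plain memory access" pattern. It is only an illustration of the idea, not how git's pack access code is written; the struct and function names are invented for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

struct mapped_file {
	const unsigned char *data;
	size_t size;
};

/*
 * Map a whole file read-only: one open/fstat/mmap/close sequence,
 * no matter how many objects are read from it afterwards.
 * Returns 0 on success, -1 on failure (including an empty file).
 */
int map_file(const char *path, struct mapped_file *mf)
{
	struct stat st;
	void *p;
	int fd;

	assert(path && mf);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0 || st.st_size == 0) {
		close(fd);
		return -1;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);	/* the mapping stays valid after close() */
	if (p == MAP_FAILED)
		return -1;

	mf->data = p;
	mf->size = (size_t)st.st_size;
	return 0;
}

/*
 * Reaching any byte range is now pointer arithmetic: no system call
 * per object, only a page fault the first time a page is touched.
 * Returns NULL if the range does not fit inside the file.
 */
const unsigned char *mapped_at(const struct mapped_file *mf,
			       size_t offset, size_t len)
{
	if (offset > mf->size || len > mf->size - offset)
		return NULL;
	return mf->data + offset;
}

void unmap_file(struct mapped_file *mf)
{
	munmap((void *)mf->data, mf->size);
	mf->data = NULL;
	mf->size = 0;
}
```

With loose objects, each access would instead repeat the pathname walk plus open/read/close for a separate file.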

So packing has a large performance benefit outside of the actual disk use 
one, and to some degree that performance benefit is then further magnified 
by good locality (ie you get more effective objects per page fault), but 
in your case that locality issue is secondary.

I assume that you never actually end up looking at the _contents_ of the 
objects any more ever afterwards, because in a very real sense you're 
really interested in the SHA1 names, right? All the latter phases of 
parsecvs will just use the SHA1 names directly, and never actually even 
open the data (packed or not).

So in that sense, you only care about the disksize and a much improved 
directory walk from fewer files (until the repository has actually been 
fully created, at which point a repack will do the right thing).

			Linus

* Re: Repacking many disconnected blobs
From: Nicolas Pitre @ 2006-06-14 19:25 UTC (permalink / raw)
  To: Keith Packard; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <1150311567.30681.28.camel@neko.keithp.com>

On Wed, 14 Jun 2006, Keith Packard wrote:

> On Wed, 2006-06-14 at 11:18 -0700, Linus Torvalds wrote:
> 
> > You don't _need_ to shuffle. As mentioned, it will only affect the 
> > location of the data in the pack-file, which in turn will mostly matter 
> > as an IO pattern thing, not anything really fundamental.  If the pack-file 
> > ends up caching well, the IO patterns obviously will never matter.
> 
> Ok, sounds like shuffling isn't necessary; the only benefit packing
> gains me is to reduce the size of each directory in the object store;
> the process I follow is to construct blobs for every revision, then just
> use the sha1 values to construct an index for each commit. I never
> actually look at the blobs myself, so IO access patterns aren't
> relevant.
> 
> Repacking after the import is completed should undo whatever horror show
> I've created in any case.

The only advantage of feeding object names from latest to oldest has to 
do with the delta direction.  In doing so the deltas are backward, such 
that objects with deeper delta chains are further back in history, and 
this is what you want in the final pack for faster access to the latest 
revision.

Of course the final repack will do that automatically, but only if you 
use -a -f with git-repack.  When -f is not provided, already deltified 
objects from other packs are copied as is without any delta computation, 
making the repack process much faster.  In that case it might be 
preferable that the reused deltified data consists of backward deltas, 
which is the reason you might consider feeding objects in the preferred 
order up front.

Nicolas

* Re: oprofile on svn import
From: Jon Smirl @ 2006-06-14 19:25 UTC (permalink / raw)
  To: git
In-Reply-To: <9e4733910606131932w362c6ddcx5bf36ea5591feba1@mail.gmail.com>

Stats after 18 hours into git-svnimport. Process is now stuck in the
kernel 64% of the time. All of the kernel time is in page management.
Perl svnimport process is 290MB now.

My top candidates for causing the problem are the fork in the perl
code or the execing of a million tiny git processes.

The key low level git functions could be made into a library to avoid
the need to exec them continuously. The svn functions are libraries
and they hardly show up.

   606218  2.4143 /usr/local/bin/git-update-index
   127170  0.5065 /usr/local/bin/git-write-tree
    81153  0.3232 /usr/local/bin/git-read-tree
    13065  0.0520 /usr/local/bin/git-ls-files
     2624  0.0105 /usr/local/bin/git-hash-object
      754  0.0030 /usr/local/bin/git-commit-tree
      462  0.0018 /usr/local/bin/git-ls-tree
      398  0.0016 /usr/local/bin/git-rev-parse

versus

   102784  0.3641 /usr/lib/libsvn_subr-1.so.0.0.0
    70235  0.2488 /usr/lib/libsvn_fs_fs-1.so.0.0.0
    67081  0.2376 /usr/lib/libsvn_delta-1.so.0.0.0
      848  0.0030 /usr/lib/libsvn_swig_perl-1.so.0.0.0
      512  0.0018 /usr/lib/libsvn_ra_local-1.so.0.0.0
      350  0.0012 /usr/lib/libsvn_fs-1.so.0.0.0
      222 7.9e-04 /usr/lib/libsvn_repos-1.so.0.0.0
      124 4.4e-04 /usr/lib/libsvn_ra-1.so.0.0.0

------------------------------------------------------------------------------------------------------------

  4093890 64.3711 /home/good/vmlinux
   906014 14.2459 /lib/libcrypto.so.0.9.8a
   435744  6.8515 /lib/libc-2.4.so
   158325  2.4895 /usr/lib/libz.so.1.2.3
   139995  2.2012 /usr/local/bin/git-update-index
    75322  1.1843 /nvidia
    64349  1.0118 /usr/bin/oprofiled
    52825  0.8306 /usr/lib/perl5/5.8.8/i386-linux-thread-multi/CORE/libperl.so
    51930  0.8165 /usr/lib/libapr-1.so.0.2.2
    42771  0.6725 /usr/local/bin/git-read-tree
    37774  0.5939 /lib/ld-2.4.so
    34761  0.5466 /usr/local/bin/git-write-tree
    29560  0.4648 /usr/lib/libsvn_subr-1.so.0.0.0
    28210  0.4436 /usr/lib/libaprutil-1.so.0.2.2

-----------------------------------------------------------------------------------------------------------------

2471826  32.8741  copy_page_range
1375260  18.2903  unmap_vmas
 574208   7.6367  release_pages
 572189   7.6098  page_remove_rmap
 233367   3.1037  free_pages_and_swap_cache
 191051   2.5409  get_page_from_freelist
 169058   2.2484  unlock_page
 162027   2.1549  vm_normal_page
 155691   2.0706  swap_info_get
 136324   1.8130  swap_duplicate
 119227   1.5857  page_fault
  99729   1.3263  page_waitqueue
  49288   0.6555  remove_exclusive_swap_page
  39611   0.5268  do_wp_page
  39142   0.5206  __wake_up_bit
  34384   0.4573  __copy_from_user_ll
  31111   0.4138  __handle_mm_fault
  29990   0.3989  find_get_page
  29682   0.3948  do_page_fault
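
A quick consistency check on a listing like the one above: dividing a
row's sample count by its percentage gives the total sample count that
row implies, and rows from the same profile should roughly agree.  This
is only a sketch; the four rows are copied from the kernel symbol list
and the variable names are made up:

```shell
#!/bin/sh
# Rows copied from the kernel symbol profile: samples, percent, symbol.
profile='2471826 32.8741 copy_page_range
574208 7.6367 release_pages
572189 7.6098 page_remove_rmap
233367 3.1037 free_pages_and_swap_cache'

# For each row, samples / (percent / 100) estimates the profile's total
# sample count; consistent rows give nearly the same estimate.
implied_totals=$(printf '%s\n' "$profile" |
	awk '{ printf "%s %.0f\n", $3, $1 / ($2 / 100) }')
printf '%s\n' "$implied_totals"
```

A row whose implied total is far from the others usually points at a
mangled digit rather than at a real anomaly in the profile.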


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* [PATCH] auto-detect changed $prefix in Makefile and properly rebuild to avoid broken install
From: Yakov Lerner @ 2006-06-14 19:26 UTC (permalink / raw)
  To: git; +Cc: iler.ml

Many times, I mistakenly used 'make prefix=... install' where prefix value
was different from prefix value during build. This resulted in broken
install. This patch adds auto-detection of $prefix change to the Makefile.
This results in correct install whenever prefix is changed.

Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
---
 Makefile |   29 ++++++++++++++++++++++-------
 1 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 2a1e639..015c9b2 100644
--- a/Makefile
+++ b/Makefile
@@ -464,6 +464,7 @@ DESTDIR_SQ = $(subst ','\'',$(DESTDIR))
 bindir_SQ = $(subst ','\'',$(bindir))
 gitexecdir_SQ = $(subst ','\'',$(gitexecdir))
 template_dir_SQ = $(subst ','\'',$(template_dir))
+prefix_SQ = $(subst ','\'',$(prefix))
 
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
 PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
@@ -484,7 +485,7 @@ all:
 strip: $(PROGRAMS) git$X
 	$(STRIP) $(STRIP_OPTS) $(PROGRAMS) git$X
 
-git$X: git.c common-cmds.h $(BUILTIN_OBJS) $(GITLIBS)
+git$X: git.c common-cmds.h $(BUILTIN_OBJS) $(GITLIBS) .git.prefix
 	$(CC) -DGIT_VERSION='"$(GIT_VERSION)"' \
 		$(ALL_CFLAGS) -o $@ $(filter %.c,$^) \
 		$(BUILTIN_OBJS) $(ALL_LDFLAGS) $(LIBS)
@@ -516,7 +517,7 @@ common-cmds.h: Documentation/git-*.txt
 	chmod +x $@+
 	mv $@+ $@
 
-$(patsubst %.py,%,$(SCRIPT_PYTHON)) : % : %.py
+$(patsubst %.py,%,$(SCRIPT_PYTHON)) : % : %.py .git.prefix
 	rm -f $@ $@+
 	sed -e '1s|#!.*python|#!$(PYTHON_PATH_SQ)|' \
 	    -e 's|@@GIT_PYTHON_PATH@@|$(GIT_PYTHON_DIR_SQ)|g' \
@@ -540,19 +541,19 @@ git$X git.spec \
 	$(patsubst %.py,%,$(SCRIPT_PYTHON)) \
 	: GIT-VERSION-FILE
 
-%.o: %.c
+%.o: %.c .git.prefix
 	$(CC) -o $*.o -c $(ALL_CFLAGS) $<
 %.o: %.S
 	$(CC) -o $*.o -c $(ALL_CFLAGS) $<
 
-exec_cmd.o: exec_cmd.c
+exec_cmd.o: exec_cmd.c .git.prefix
 	$(CC) -o $*.o -c $(ALL_CFLAGS) '-DGIT_EXEC_PATH="$(gitexecdir_SQ)"' $<
 
-http.o: http.c
+http.o: http.c .git.prefix
 	$(CC) -o $*.o -c $(ALL_CFLAGS) -DGIT_USER_AGENT='"git/$(GIT_VERSION)"' $<
 
 ifdef NO_EXPAT
-http-fetch.o: http-fetch.c http.h
+http-fetch.o: http-fetch.c http.h .git.prefix
 	$(CC) -o $*.o -c $(ALL_CFLAGS) -DNO_EXPAT $<
 endif
 
@@ -609,6 +610,14 @@ tags:
 	rm -f tags
 	find . -name '*.[hcS]' -print | xargs ctags -a
 
+### Detect prefix changes
+.git.prefix: .FORCE-git.prefix
+	@PREFIXES='$(bindir_SQ):$(gitexecdir_SQ):$(template_dir_SQ):$(prefix_SQ)';\
+	    if test x"$$PREFIXES" != x"`cat .git.prefix 2>/dev/null`" ; then \
+		echo 1>&2 "    * prefix changed"; \
+		echo "$$PREFIXES" >.git.prefix; \
+            fi
+
 ### Testing rules
 
 # GNU make supports exporting all variables by "export" without parameters.
@@ -632,6 +641,12 @@ test-dump-cache-tree$X: dump-cache-tree.
 check:
 	for i in *.c; do sparse $(ALL_CFLAGS) $(SPARSE_FLAGS) $$i || exit; done
 
+test-prefix-change:
+	mkdir -p "`pwd`/tmp1" "`pwd`/tmp2"
+	$(MAKE) clean install prefix="`pwd`/tmp1"
+	$(MAKE) install prefix="`pwd`/tmp2"
+	@grep -r "`pwd`/tmp1" "`pwd`/tmp2" >/dev/null; if test $$? = 0 ; then\
+	    echo Error, test failed; exit 1; else echo Ok, test passed; fi
 
 
 ### Installation rules
@@ -714,7 +729,7 @@ clean:
 	rm -f GIT-VERSION-FILE
 
 .PHONY: all install clean strip
-.PHONY: .FORCE-GIT-VERSION-FILE TAGS tags
+.PHONY: .FORCE-GIT-VERSION-FILE TAGS tags .FORCE-git.prefix
 
 ### Check documentation
 #
-- 
1.4.0

^ permalink raw reply related

* Re: oprofile on svn import
From: Jakub Narebski @ 2006-06-14 19:38 UTC (permalink / raw)
  To: git
In-Reply-To: <9e4733910606141225n11b406fte6229ea9993825dd@mail.gmail.com>

Jon Smirl wrote:

> Stats after 18 hours into git-svnimport. Process is now stuck in the
> kernel 64% of the time. All of the kernel time is in page management.
> Perl svnimport process is 290MB now.
> 
> My top candidates for causing the problem are the fork in the perl
> code or the execing of a million tiny git processes.
> 
> The key low level git functions could be made into a library to avoid
> the need to exec them continuously. The svn functions are libraries
> and they hardly show up.

There is an ongoing effort to turn git commands into builtins.
Still, you would need to translate the git-svnimport Perl code into C,
or somehow access a git library from Perl.
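
The per-exec cost under discussion can be sketched without git at all.
This is an illustration under stated assumptions, not git-svnimport's
code: "helper" is a hypothetical stand-in for a small command such as
git-update-index, and it only logs one line per invocation so that the
number of spawned processes can be counted:

```shell
#!/bin/sh
# Hypothetical stand-in for a small helper command: it records one
# line per invocation, then exits.
work=$(mktemp -d)
cat >"$work/helper" <<'EOF'
#!/bin/sh
echo "$# argument(s)" >>"$HELPER_LOG"
EOF
chmod +x "$work/helper"

# One process per path: 1000 fork+exec pairs.
HELPER_LOG="$work/per-item.log"; export HELPER_LOG
seq 1 1000 | xargs -n 1 "$work/helper"
per_item=$(wc -l <"$work/per-item.log" | tr -d ' ')

# Batched: xargs packs as many paths as fit into each invocation.
HELPER_LOG="$work/batched.log"
seq 1 1000 | xargs "$work/helper"
batched=$(wc -l <"$work/batched.log" | tr -d ' ')

echo "per-item: $per_item processes, batched: $batched process(es)"
```

A builtin or a library call removes the exec entirely; batching is the
cheaper half-measure for a script that has to keep calling out to
separate programs.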

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply

* Re: [PATCH] auto-detect changed $prefix in Makefile and properly rebuild to avoid broken install
From: Junio C Hamano @ 2006-06-14 20:04 UTC (permalink / raw)
  To: Yakov Lerner; +Cc: git
In-Reply-To: <0J0V00LDT7B9BU00@mxout2.netvision.net.il>

Yakov Lerner <iler.ml@gmail.com> writes:

> Many times, I mistakenly used 'make prefix=... install' where prefix value
> was different from prefix value during build. This resulted in broken
> install. This patch adds auto-detection of $prefix change to the Makefile.
> This results in correct install whenever prefix is changed.
>
> Signed-off-by: Yakov Lerner <iler.ml@gmail.com>

I do not mind this per se, and probably even agree that this is
an improvement compared to the current state of affairs, but a few
points:

 - please make sure you clean that state file in "make clean";

 - we may want to make the state file a bit more visible (IOW, I
   somewhat do mind the name being dot-git-dot-prefix).

 - we might want to later (or at the same time as this patch)
   do "consistent set of compilation flags" (e.g. run early
   part of compilation with openssl SHA-1 implementation,
   interrupt it and build and link the rest with mozilla SHA-1
   implementation -- then you will get a nonsense binary without
   linker errors).  It might make sense to prepare this
   mechanism so we could reuse it for that purpose.

^ permalink raw reply

* Re: [PATCH] gitweb: Adding a `blame' interface.
From: Junio C Hamano @ 2006-06-14 20:27 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Florian Forster, git
In-Reply-To: <46a038f90606111502g607be3cfnf83ce81764a5f909@mail.gmail.com>

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> Florian,
>
> Looks good! git-blame/git-annotate are quite expensive to run. Do you
> think it would make sense making it conditional on a git-repo-config
> option (gitweb.blame=1)?
>
> kernel.org is the flagship user for gitweb, so expensive options
> should default to off :-/

Seconded.  Thanks Florian and Martin.

^ permalink raw reply

* Re: [PATCH] auto-detect changed $prefix in Makefile and properly rebuild to avoid broken install
From: Yakov Lerner @ 2006-06-14 20:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vver3cxlw.fsf@assigned-by-dhcp.cox.net>

On 6/14/06, Junio C Hamano <junkio@cox.net> wrote:
> Yakov Lerner <iler.ml@gmail.com> writes:
>
> > Many times, I mistakenly used 'make prefix=... install' where prefix value
> > was different from prefix value during build. This resulted in broken
> > install. This patch adds auto-detection of $prefix change to the Makefile.
> > This results in correct install whenever prefix is changed.
> >
> > Signed-off-by: Yakov Lerner <iler.ml@gmail.com>
>
> I do not mind this per se, and probably even agree that this is
> an improvement compared to the current state of affairs, but a few
> points:
>
>  - please make sure you clean that state file in "make clean";
done

>  - we may want to make the state file a bit more visible (IOW, I
>    somewhat do mind the name being dot-git-dot-prefix).
I renamed .git.prefix to GIT-PREFIX. Is this OK?

>  - we might want to later (or at the same time as this patch)
>    do "consistent set of compilation flags" (e.g. run early
>    part of compilation with openssl SHA-1 implementation,
>    interrupt it and build and link the rest with mozilla SHA-1
>    implementation -- then you will get a nonsense binary without
>    linker errors).  It might make sense to prepare this
>    mechanism so we could reuse it for that purpose.

Do you think two separate files, GIT-PREFIX and GIT-BUILD-FLAGS, are needed,
or will just one GIT-BUILD-FLAGS do, which would include the
prefixes (as passed with -D... to cc)?

I think single GIT-BUILD-FLAGS
is enough, which will cover prefixes, too. Is this OK ?
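
The mechanism behind such a state file can be sketched in plain shell:
rewrite the file only when the recorded settings differ, so that its
timestamp (and therefore make's idea of what is stale) moves only on a
real change.  The function name write_if_changed and the sample flag
string are made up for illustration; the real Makefile would pass its
own variables:

```shell
#!/bin/sh
# write_if_changed FILE CONTENT
# Rewrites FILE only when its content differs from CONTENT, so FILE's
# timestamp moves (and make rebuilds dependents) only on a real change.
write_if_changed () {
	if test x"$2" != x"$(cat "$1" 2>/dev/null)"; then
		echo "    * $1 changed" >&2
		printf '%s\n' "$2" >"$1"
	fi
}

# Hypothetical settings string covering both build flags and prefix.
state_dir=$(mktemp -d)
flags='cflags=-O2:sha1=openssl:prefix=/usr/local'
write_if_changed "$state_dir/GIT-BUILD-FLAGS" "$flags"
```

Calling it again with the same string leaves the file untouched;
calling it with a different SHA-1 choice or prefix rewrites it, which
is what would force the dependent objects to be rebuilt.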

BTW, I think it's useful to add Makefile itself as prerequisite for all *.o,
so change in Makefile will cause recompilations. Shall I include this
into this patch, too ?

Yakov

^ permalink raw reply

* Re: git-diff --cc broken in 1.4.0?
From: Junio C Hamano @ 2006-06-14 20:32 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: git
In-Reply-To: <46a038f90606112132jaf33a25x5794a19db2a06d8d@mail.gmail.com>

"Martin Langhoff" <martin.langhoff@gmail.com> writes:

> I was looking at some merges in gitk and lamenting the apparent loss
> of the nice two-sided diff we get with -cc, and now during a slightly
> messy merge I did git-diff -cc only to get...

This is not a regression that I know of in 1.4.0; mind showing the
stage 2 and 3 blobs and the file in the resolution result (I do
not need stage 1)?

One thing to note is that --cc does not show a hunk in which you
take only from one side.

>
> $ git-ls-files --unmerged
> 100644 f1d3843b2b2e42ba78adcf37da6440f0d321852e 1       local/version.php
> 100644 9352efa45cd25d9ad58df12b4ac241ac226a8ad4 2       local/version.php
> 100644 50da9b47903f6179f55a3f44290e7feaa08342f4 3       local/version.php

^ permalink raw reply

* Re: git-diff --cc broken in 1.4.0?
From: Martin Langhoff @ 2006-06-14 20:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmzcfcwb5.fsf@assigned-by-dhcp.cox.net>

On 6/15/06, Junio C Hamano <junkio@cox.net> wrote:
> One thing to note is that --cc does not show a hunk in which you
> take only from one side.

Must have been that. I was looking at a big (but clean) merge, and
where gitk used to show a large-ish diff there was nothing. I somehow
expected an even larger diff with the changes against both sides.

thanks for the clarification and sorry about that!

m

^ permalink raw reply

* Re: Repacking many disconnected blobs
From: Keith Packard @ 2006-06-14 21:05 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: keithp, Linus Torvalds, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606141514000.2703@localhost.localdomain>


On Wed, 2006-06-14 at 15:25 -0400, Nicolas Pitre wrote:

> The only advantage of feeding object names from latest to oldest has to
> do with the delta direction.  In doing so the deltas are backward, such
> that objects with deeper delta chains are further back in history, and
> this is what you want in the final pack for faster access to the latest
> revision.

Ok, so I'm feeding them from latest to oldest along each branch, which
optimizes only the 'master' branch, leaving other branches much further
down in the data file. That should mean repacking will help a lot for
repositories with many active branches.

> In that case it
> might be preferable that the reused deltified data already consists of
> backward deltas, which is the reason you might consider feeding objects in
> the preferred order up front.

Hmm. As I'm deltifying along branches, the delta data should actually be
fairly good; the only 'bad' result will be the sub-optimal object
ordering in the pack files. I'll experiment with some larger trees to
see how much additional savings the various repack options yield.

-- 
keith.packard@intel.com


^ permalink raw reply

* Re: Repacking many disconnected blobs
From: Linus Torvalds @ 2006-06-14 21:17 UTC (permalink / raw)
  To: Keith Packard; +Cc: Nicolas Pitre, Git Mailing List
In-Reply-To: <1150319115.30681.54.camel@neko.keithp.com>

On Wed, 14 Jun 2006, Keith Packard wrote:
> 
> > In that case it
> > might be preferable that the reused deltified data already consists of
> > backward deltas, which is the reason you might consider feeding objects in
> > the preferred order up front.
>
> Hmm. As I'm deltifying along branches, the delta data should actually be
> fairly good; the only 'bad' result will be the sub-optimal object
> ordering in the pack files. I'll experiment with some larger trees to
> see how much additional savings the various repack options yield.

The fact that git repacking sorts by filesize after it sorts by filename
should make this a non-issue: we always try to delta against the larger
version (where "larger" is not only almost invariably also "newer", but
the delta is simpler, since deleting data doesn't take up any space in the
delta, while adding data needs to say what the data added was, of course).

		Linus

^ permalink raw reply

* Re: Repacking many disconnected blobs
From: Nicolas Pitre @ 2006-06-14 21:20 UTC (permalink / raw)
  To: Keith Packard; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <1150319115.30681.54.camel@neko.keithp.com>

On Wed, 14 Jun 2006, Keith Packard wrote:

> Hmm. As I'm deltifying along branches, the delta data should actually be
> fairly good; the only 'bad' result will be the sub-optimal object
> ordering in the pack files. I'll experiment with some larger trees to
> see how much additional savings the various repack options yield.

Note that the object list order is unlikely to affect pack size.  It is 
really about optimizing the pack layout for subsequent access to it.


Nicolas

^ permalink raw reply

* Re: [PATCH] auto-detect changed $prefix in Makefile and properly rebuild to avoid broken install
From: Junio C Hamano @ 2006-06-14 21:32 UTC (permalink / raw)
  To: Yakov Lerner; +Cc: Junio C Hamano, git
In-Reply-To: <f36b08ee0606141330l28330d79hab1aec5c741188c7@mail.gmail.com>

"Yakov Lerner" <iler.ml@gmail.com> writes:

> I think single GIT-BUILD-FLAGS
> is enough, which will cover prefixes, too. Is this OK ?

Yes, it was what I was getting at.  I think a single
GIT-BUILD-FLAGS (or whatever name the list can fight over while
I am away) is preferred.

> BTW, I think it's useful to add Makefile itself as prerequisite for all *.o,
> so change in Makefile will cause recompilations. Shall I include this
> into this patch, too ?

I've thought about it but in practice this would make things
more inconvenient for developers without much gain, so I'd leave
it out.

^ permalink raw reply

* Re: [PATCH] auto-detect changed $prefix in Makefile and properly rebuild to avoid broken install
From: Yakov Lerner @ 2006-06-14 21:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vhd2nctjk.fsf@assigned-by-dhcp.cox.net>

On 6/15/06, Junio C Hamano <junkio@cox.net> wrote:
> "Yakov Lerner" <iler.ml@gmail.com> writes:
>
> > I think single GIT-BUILD-FLAGS
> > is enough, which will cover prefixes, too. Is this OK ?
>
> Yes, it was what I was getting at.  I think a single
> GIT-BUILD-FLAGS (or whatever name the list can fight over while
> I am away) is preferred.

Either GIT-CFLAGS or GIT-BUILD-FLAGS,
whichever is shorter :-)

Yakov

^ permalink raw reply
