Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH] Make builtin-tag.c use parse_options.
From: Junio C Hamano @ 2007-11-10  9:26 UTC (permalink / raw)
  To: Kristian Høgsberg; +Cc: git, Carlos Rica
In-Reply-To: <7vabpmpr9y.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> While I freely admit that I do not particularly find the "One -m
> introduces one new line, concatenated to form the final
> paragraph" handling of multiple -m options done by git-commit
> nice nor useful, I suspect that it would make more sense to make
> git-tag and git-commit handle multiple -m option consistently,
> if you are going to change the existing semantics.  Since some
> people really seem to like multiple -m handling of git-commit,
> the avenue of the least resistance for better consistency would
> be to accept and concatenate (with LF in between) multiple -m
> options.
>
> With multiple -F, I think erroring out would be the sensible
> thing to do, but some people might prefer concatenation.  I do
> not care either way as long as commit and tag behave
> consistently.

Alas, this exposes a regression in kh/commit series.

 t/t7501-commit.sh |   23 +++++++++++++++++++++++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 1b444d4..bf5dd86 100644
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -178,4 +178,27 @@ test_expect_success 'amend commit to fix author' '
 	diff expected current
 
 '
+
+test_expect_success 'sign off' '
+
+	>positive &&
+	git add positive &&
+	git commit -s -m "thank you" &&
+	actual=$(git cat-file commit HEAD | sed -ne "s/Signed-off-by: //p") &&
+	expected=$(git var GIT_COMMITTER_IDENT | sed -e "s/>.*/>/") &&
+	test "z$actual" = "z$expected"
+
+'
+
+test_expect_success 'multiple -m' '
+
+	>negative &&
+	git add negative &&
+	git commit -m "one" -m "two" -m "three" &&
+	actual=$(git cat-file commit HEAD | sed -e "1,/^\$/d") &&
+	expected=$(echo one; echo; echo two; echo; echo three) &&
+	test "z$actual" = "z$expected"
+
+'
+
 test_done

^ permalink raw reply related

* Re: [PATCH] builtin-commit: fix --signoff
From: Junio C Hamano @ 2007-11-10  9:06 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, krh, gitster
In-Reply-To: <Pine.LNX.4.64.0711100548071.4362@racer.site>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>  	if (signoff) {
> ...
> +		strbuf_init(&sob, 0);
> +		strbuf_addstr(&sob, sign_off_header);
> +		strbuf_addstr(&sob, git_committer_info(1));
> +		p = strrchr(sob.buf, '>');
> +		if (p)
> +			strbuf_setlen(&sob, p + 1 - sob.buf);
> +		strbuf_addch(&sob, '\n');
> +
> +		for (i = sb.len - 1; i > 0 && sb.buf[i - 1] != '\n'; i--)
> +			; /* do nothing */
> +		if (prefixcmp(sb.buf + i, sob.buf))
> +			strbuf_addbuf(&sb, &sob);
>  	}

At this point doesn't this leak sob.buf?

^ permalink raw reply

* [PATCH 2/2] git-add: make the entry stat-clean after re-adding the same contents
From: Junio C Hamano @ 2007-11-10  9:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Kristian Høgsberg, git
In-Reply-To: <alpine.LFD.0.999.0711091840120.15101@woody.linux-foundation.org>

Earlier in commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe
(add_file_to_index: skip rehashing if the cached stat already
matches), add_file_to_index() were taught not to re-add the path
if it already matches the index.

The change meant well, but was not executed quite right.  It
used ie_modified() to see if the file on the work tree is really
different from the index, and skipped adding the contents if the
function says "not modified".

This was wrong.  There are three possible comparison results
between the index and the file in the work tree:

 - with lstat(2) we _know_ they are different.  E.g. if the
   length or the owner in the cached stat information is
   different from the length we just obtained from lstat(2), we
   can tell the file is modified without looking at the actual
   contents.

 - with lstat(2) we _know_ they are the same.  The same length,
   the same owner, the same everything (but this has a twist, as
   described below).

 - we cannot tell from lstat(2) information alone and need to go
   to the filesystem to actually compare.

The last case arises from what we call 'racy git' situation,
that can be caused with this sequence:

    $ echo hello >file
    $ git add file
    $ echo aeiou >file ;# the same length

If the second "echo" is done within the same filesystem
timestamp granularity as the first "echo", then the timestamp
recorded by "git add" and the timestamp we get from lstat(2)
will be the same, and we can mistakenly say the file is not
modified.  The path is called 'racily clean'.  We need to
reliably detect racily clean paths are in fact modified.

To solve this problem, when we write out the index, we mark the
index entry that has the same timestamp as the index file itself
(that is the time from the point of view of the filesystem) to
tell any later code that does the lstat(2) comparison not to
trust the cached stat info, and ie_modified() then actually goes
to the filesystem to compare the contents for such a path.

That's all good, but it should not be used for this "git add"
optimization, as the goal of "git add" is to actually update the
path in the index and make it stat-clean.  With the false
optimization, we did _not_ cause any data loss (after all, what
we failed to do was only to update the cached stat information),
but it made the following sequence leave the file stat dirty:

    $ echo hello >file
    $ git add file
    $ echo hello >file ;# the same contents
    $ git add file

The solution is not to use ie_modified() which goes to the
filesystem to see if it is really clean, but instead use
ie_match_stat() with "assume racily clean paths are dirty"
option, to force re-adding of such a path.

There was another problem with "git add -u".  The codepath
shares the same issue when adding the paths that are found to be
modified, but in addition, it asked "git diff-files" machinery
run_diff_files() function (which is "git diff-files") to list
the paths that are modified.  But "git diff-files" machinery
uses the same ie_modified() call so that it does not report
racily clean _and_ actually clean paths as modified, which is
not what we want.

The patch allows the callers of run_diff_files() to pass the
same "assume racily clean paths are dirty" option, and makes
"git-add -u" codepath to use that option, to discover and re-add
racily clean _and_ actually clean paths.

We could further optimize on top of this patch to differentiate
the case where the path really needs re-adding (i.e. the content
of the racily clean entry was indeed different) and the case
where only the cached stat information needs to be refreshed
(i.e. the racily clean entry was actually clean), but I do not
think it is worth it.

This patch applies to maint and all the way up.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * And this is a respin of the earlier fix, using the new
   constants.

 builtin-add.c |    2 +-
 diff-lib.c    |    4 +++-
 diff.h        |    2 ++
 read-cache.c  |    3 ++-
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 373f87f..e072320 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -123,7 +123,7 @@ static void update(int verbose, const char *prefix, const char **files)
 	rev.diffopt.format_callback_data = &verbose;
 	if (read_cache() < 0)
 		die("index file corrupt");
-	run_diff_files(&rev, 0);
+	run_diff_files(&rev, DIFF_RACY_IS_MODIFIED);
 }

 static void refresh(int verbose, const char **pathspec)
diff --git a/diff-lib.c b/diff-lib.c
index 9f8afbe..ec1b5e3 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -338,6 +338,8 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 	int entries, i;
 	int diff_unmerged_stage = revs->max_count;
 	int silent_on_removed = option & DIFF_SILENT_ON_REMOVED;
+	unsigned ce_option = ((option & DIFF_RACY_IS_MODIFIED)
+			      ? CE_MATCH_RACY_IS_DIRTY : 0);

 	if (diff_unmerged_stage < 0)
 		diff_unmerged_stage = 2;
@@ -443,7 +445,7 @@ int run_diff_files(struct rev_info *revs, unsigned int option)
 				       ce->sha1, ce->name, NULL);
 			continue;
 		}
-		changed = ce_match_stat(ce, &st, 0);
+		changed = ce_match_stat(ce, &st, ce_option);
 		if (!changed && !revs->diffopt.find_copies_harder)
 			continue;
 		oldmode = ntohl(ce->ce_mode);
diff --git a/diff.h b/diff.h
index de533da..efaa8f7 100644
--- a/diff.h
+++ b/diff.h
@@ -226,6 +226,8 @@ extern const char *diff_unique_abbrev(const unsigned char *, int);

 /* do not report anything on removed paths */
 #define DIFF_SILENT_ON_REMOVED 01
+/* report racily-clean paths as modified */
+#define DIFF_RACY_IS_MODIFIED 02
 extern int run_diff_files(struct rev_info *revs, unsigned int option);
 extern int setup_diff_no_index(struct rev_info *revs,
 		int argc, const char ** argv, int nongit, const char *prefix);
diff --git a/read-cache.c b/read-cache.c
index 9e4d4a9..c3dbf89 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -388,6 +388,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	int size, namelen, pos;
 	struct stat st;
 	struct cache_entry *ce;
+	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_RACY_IS_DIRTY;

 	if (lstat(path, &st))
 		die("%s: unable to stat (%s)", path, strerror(errno));
@@ -422,7 +423,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	pos = index_name_pos(istate, ce->name, namelen);
 	if (0 <= pos &&
 	    !ce_stage(istate->cache[pos]) &&
-	    !ie_modified(istate, istate->cache[pos], &st, CE_MATCH_IGNORE_VALID)) {
+	    !ie_match_stat(istate, istate->cache[pos], &st, ce_option)) {
 		/* Nothing changed, really */
 		free(ce);
 		return 0;
-- 
1.5.3.5.1651.g30bf0

^ permalink raw reply related

* [PATCH 1/2] ce_match_stat, run_diff_files: use symbolic constants for readability
From: Junio C Hamano @ 2007-11-10  9:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Schindelin, Kristian Høgsberg, git
In-Reply-To: <alpine.LFD.0.999.0711091840120.15101@woody.linux-foundation.org>

ce_match_stat() can be told:

 (1) to ignore CE_VALID bit (used under "assume unchanged" mode)
     and perform the stat comparison anyway;

 (2) not to perform the contents comparison for racily clean
     entries and report mismatch of cached stat information;

using its "option" parameter.  Give them symbolic constants.

Similarly, run_diff_files() can be told not to report anything
on removed paths.  Also give it a symbolic constant for that.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 Linus Torvalds <torvalds@linux-foundation.org> writes:

 > On Fri, 9 Nov 2007, Junio C Hamano wrote:
 >>  
 >> -int run_diff_files(struct rev_info *revs, int silent_on_removed)
 >> +int run_diff_files(struct rev_info *revs, int option)
 >
 > Wouldn't it be much better to now 
 >  - make it "unsigned int flags"
 >  - create a few enums or #define's to make the usage be more readable?
 >
 > Because this:
 >
 >>-       run_diff_files(&rev, 0);
 >>+       run_diff_files(&rev, 2);
 >> -	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
 >> +	    !ie_match_stat(istate, istate->cache[pos], &st, 3)) {
 >
 > just went from subtle to "incredibly non-obvious".

 That really is true.  Apparently I am getting much slower
 lately.  This is to just introduce the constants and change the
 types.

 builtin-apply.c |    2 +-
 cache.h         |   14 ++++++++++----
 check-racy.c    |    2 +-
 diff-lib.c      |   16 +++++++++-------
 diff.h          |    4 +++-
 entry.c         |    2 +-
 read-cache.c    |   47 ++++++++++++++++++++++++++++++-----------------
 unpack-trees.c  |    4 ++--
 8 files changed, 57 insertions(+), 34 deletions(-)

diff --git a/builtin-apply.c b/builtin-apply.c
index 5cc90e6..0fff02e 100644
--- a/builtin-apply.c
+++ b/builtin-apply.c
@@ -2099,7 +2099,7 @@ static int verify_index_match(struct cache_entry *ce, struct stat *st)
 			return -1;
 		return 0;
 	}
-	return ce_match_stat(ce, st, 1);
+	return ce_match_stat(ce, st, CE_MATCH_IGNORE_VALID);
 }
 
 static int check_patch(struct patch *patch, struct patch *prev_patch)
diff --git a/cache.h b/cache.h
index fc195bc..31af16a 100644
--- a/cache.h
+++ b/cache.h
@@ -174,8 +174,8 @@ extern struct index_state the_index;
 #define remove_file_from_cache(path) remove_file_from_index(&the_index, (path))
 #define add_file_to_cache(path, verbose) add_file_to_index(&the_index, (path), (verbose))
 #define refresh_cache(flags) refresh_index(&the_index, (flags), NULL, NULL)
-#define ce_match_stat(ce, st, really) ie_match_stat(&the_index, (ce), (st), (really))
-#define ce_modified(ce, st, really) ie_modified(&the_index, (ce), (st), (really))
+#define ce_match_stat(ce, st, options) ie_match_stat(&the_index, (ce), (st), (options))
+#define ce_modified(ce, st, options) ie_modified(&the_index, (ce), (st), (options))
 #endif
 
 enum object_type {
@@ -266,8 +266,14 @@ extern int remove_file_from_index(struct index_state *, const char *path);
 extern int add_file_to_index(struct index_state *, const char *path, int verbose);
 extern struct cache_entry *make_cache_entry(unsigned int mode, const unsigned char *sha1, const char *path, int stage, int refresh);
 extern int ce_same_name(struct cache_entry *a, struct cache_entry *b);
-extern int ie_match_stat(struct index_state *, struct cache_entry *, struct stat *, int);
-extern int ie_modified(struct index_state *, struct cache_entry *, struct stat *, int);
+
+/* do stat comparison even if CE_VALID is true */
+#define CE_MATCH_IGNORE_VALID		01
+/* do not check the contents but report dirty on racily-clean entries */
+#define CE_MATCH_RACY_IS_DIRTY	02
+extern int ie_match_stat(struct index_state *, struct cache_entry *, struct stat *, unsigned int);
+extern int ie_modified(struct index_state *, struct cache_entry *, struct stat *, unsigned int);
+
 extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
 extern int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object, enum object_type type, const char *path);
 extern int read_fd(int fd, char **return_buf, unsigned long *return_size);
diff --git a/check-racy.c b/check-racy.c
index d6a08b4..00d92a1 100644
--- a/check-racy.c
+++ b/check-racy.c
@@ -18,7 +18,7 @@ int main(int ac, char **av)
 
 		if (ce_match_stat(ce, &st, 0))
 			dirty++;
-		else if (ce_match_stat(ce, &st, 2))
+		else if (ce_match_stat(ce, &st, CE_MATCH_RACY_IS_DIRTY))
 			racy++;
 		else
 			clean++;
diff --git a/diff-lib.c b/diff-lib.c
index da55713..9f8afbe 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -173,9 +173,10 @@ static int is_in_index(const char *path)
 }
 
 static int handle_diff_files_args(struct rev_info *revs,
-		int argc, const char **argv, int *silent)
+				  int argc, const char **argv,
+				  unsigned int *options)
 {
-	*silent = 0;
+	*options = 0;
 
 	/* revs->max_count == -2 means --no-index */
 	while (1 < argc && argv[1][0] == '-') {
@@ -192,7 +193,7 @@ static int handle_diff_files_args(struct rev_info *revs,
 			revs->diffopt.no_index = 1;
 		}
 		else if (!strcmp(argv[1], "-q"))
-			*silent = 1;
+			*options |= DIFF_SILENT_ON_REMOVED;
 		else
 			return error("invalid option: %s", argv[1]);
 		argv++; argc--;
@@ -305,9 +306,9 @@ int setup_diff_no_index(struct rev_info *revs,
 
 int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv)
 {
-	int silent_on_removed;
+	unsigned int options;
 
-	if (handle_diff_files_args(revs, argc, argv, &silent_on_removed))
+	if (handle_diff_files_args(revs, argc, argv, &options))
 		return -1;
 
 	if (revs->diffopt.no_index) {
@@ -329,13 +330,14 @@ int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv)
 		perror("read_cache");
 		return -1;
 	}
-	return run_diff_files(revs, silent_on_removed);
+	return run_diff_files(revs, options);
 }
 
-int run_diff_files(struct rev_info *revs, int silent_on_removed)
+int run_diff_files(struct rev_info *revs, unsigned int option)
 {
 	int entries, i;
 	int diff_unmerged_stage = revs->max_count;
+	int silent_on_removed = option & DIFF_SILENT_ON_REMOVED;
 
 	if (diff_unmerged_stage < 0)
 		diff_unmerged_stage = 2;
diff --git a/diff.h b/diff.h
index 4546aad..de533da 100644
--- a/diff.h
+++ b/diff.h
@@ -224,7 +224,9 @@ extern void diff_flush(struct diff_options*);
 
 extern const char *diff_unique_abbrev(const unsigned char *, int);
 
-extern int run_diff_files(struct rev_info *revs, int silent_on_removed);
+/* do not report anything on removed paths */
+#define DIFF_SILENT_ON_REMOVED 01
+extern int run_diff_files(struct rev_info *revs, unsigned int option);
 extern int setup_diff_no_index(struct rev_info *revs,
 		int argc, const char ** argv, int nongit, const char *prefix);
 extern int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv);
diff --git a/entry.c b/entry.c
index fc3a506..ef88f62 100644
--- a/entry.c
+++ b/entry.c
@@ -200,7 +200,7 @@ int checkout_entry(struct cache_entry *ce, const struct checkout *state, char *t
 	strcpy(path + len, ce->name);
 
 	if (!lstat(path, &st)) {
-		unsigned changed = ce_match_stat(ce, &st, 1);
+		unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID);
 		if (!changed)
 			return 0;
 		if (!state->force) {
diff --git a/read-cache.c b/read-cache.c
index 928e8fa..9e4d4a9 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -194,11 +194,12 @@ static int ce_match_stat_basic(struct cache_entry *ce, struct stat *st)
 }
 
 int ie_match_stat(struct index_state *istate,
-		  struct cache_entry *ce, struct stat *st, int options)
+		  struct cache_entry *ce, struct stat *st,
+		  unsigned int options)
 {
 	unsigned int changed;
-	int ignore_valid = options & 01;
-	int assume_racy_is_modified = options & 02;
+	int ignore_valid = options & CE_MATCH_IGNORE_VALID;
+	int assume_racy_is_modified = options & CE_MATCH_RACY_IS_DIRTY;
 
 	/*
 	 * If it's marked as always valid in the index, it's
@@ -238,10 +239,11 @@ int ie_match_stat(struct index_state *istate,
 }
 
 int ie_modified(struct index_state *istate,
-		struct cache_entry *ce, struct stat *st, int really)
+		struct cache_entry *ce, struct stat *st, unsigned int options)
 {
 	int changed, changed_fs;
-	changed = ie_match_stat(istate, ce, st, really);
+
+	changed = ie_match_stat(istate, ce, st, options);
 	if (!changed)
 		return 0;
 	/*
@@ -420,7 +422,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	pos = index_name_pos(istate, ce->name, namelen);
 	if (0 <= pos &&
 	    !ce_stage(istate->cache[pos]) &&
-	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
+	    !ie_modified(istate, istate->cache[pos], &st, CE_MATCH_IGNORE_VALID)) {
 		/* Nothing changed, really */
 		free(ce);
 		return 0;
@@ -782,11 +784,13 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti
  * to link up the stat cache details with the proper files.
  */
 static struct cache_entry *refresh_cache_ent(struct index_state *istate,
-					     struct cache_entry *ce, int really, int *err)
+					     struct cache_entry *ce,
+					     unsigned int options, int *err)
 {
 	struct stat st;
 	struct cache_entry *updated;
 	int changed, size;
+	int ignore_valid = options & CE_MATCH_IGNORE_VALID;
 
 	if (lstat(ce->name, &st) < 0) {
 		if (err)
@@ -794,16 +798,23 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 		return NULL;
 	}
 
-	changed = ie_match_stat(istate, ce, &st, really);
+	changed = ie_match_stat(istate, ce, &st, options);
 	if (!changed) {
-		if (really && assume_unchanged &&
+		/*
+		 * The path is unchanged.  If we were told to ignore
+		 * valid bit, then we did the actual stat check and
+		 * found that the entry is unmodified.  If the entry
+		 * is not marked VALID, this is the place to mark it
+		 * valid again, under "assume unchanged" mode.
+		 */
+		if (ignore_valid && assume_unchanged &&
 		    !(ce->ce_flags & htons(CE_VALID)))
 			; /* mark this one VALID again */
 		else
 			return ce;
 	}
 
-	if (ie_modified(istate, ce, &st, really)) {
+	if (ie_modified(istate, ce, &st, options)) {
 		if (err)
 			*err = EINVAL;
 		return NULL;
@@ -814,13 +825,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 	memcpy(updated, ce, size);
 	fill_stat_cache_info(updated, &st);
 
-	/* In this case, if really is not set, we should leave
-	 * CE_VALID bit alone.  Otherwise, paths marked with
-	 * --no-assume-unchanged (i.e. things to be edited) will
-	 * reacquire CE_VALID bit automatically, which is not
-	 * really what we want.
+	/*
+	 * If ignore_valid is not set, we should leave CE_VALID bit
+	 * alone.  Otherwise, paths marked with --no-assume-unchanged
+	 * (i.e. things to be edited) will reacquire CE_VALID bit
+	 * automatically, which is not really what we want.
 	 */
-	if (!really && assume_unchanged && !(ce->ce_flags & htons(CE_VALID)))
+	if (!ignore_valid && assume_unchanged &&
+	    !(ce->ce_flags & htons(CE_VALID)))
 		updated->ce_flags &= ~htons(CE_VALID);
 
 	return updated;
@@ -834,6 +846,7 @@ int refresh_index(struct index_state *istate, unsigned int flags, const char **p
 	int allow_unmerged = (flags & REFRESH_UNMERGED) != 0;
 	int quiet = (flags & REFRESH_QUIET) != 0;
 	int not_new = (flags & REFRESH_IGNORE_MISSING) != 0;
+	unsigned int options = really ? CE_MATCH_IGNORE_VALID : 0;
 
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new;
@@ -855,7 +868,7 @@ int refresh_index(struct index_state *istate, unsigned int flags, const char **p
 		if (pathspec && !match_pathspec(pathspec, ce->name, strlen(ce->name), 0, seen))
 			continue;
 
-		new = refresh_cache_ent(istate, ce, really, &cache_errno);
+		new = refresh_cache_ent(istate, ce, options, &cache_errno);
 		if (new == ce)
 			continue;
 		if (!new) {
diff --git a/unpack-trees.c b/unpack-trees.c
index ccfeb6e..9411c67 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -406,7 +406,7 @@ static void verify_uptodate(struct cache_entry *ce,
 		return;
 
 	if (!lstat(ce->name, &st)) {
-		unsigned changed = ce_match_stat(ce, &st, 1);
+		unsigned changed = ce_match_stat(ce, &st, CE_MATCH_IGNORE_VALID);
 		if (!changed)
 			return;
 		/*
@@ -927,7 +927,7 @@ int oneway_merge(struct cache_entry **src,
 		if (o->reset) {
 			struct stat st;
 			if (lstat(old->name, &st) ||
-			    ce_match_stat(old, &st, 1))
+			    ce_match_stat(old, &st, CE_MATCH_IGNORE_VALID))
 				old->ce_flags |= htons(CE_UPDATE);
 		}
 		return keep_entry(old, o);
-- 
1.5.3.5.1651.g30bf0

^ permalink raw reply related

* Re: git packs
From: David Brown @ 2007-11-10  7:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Luke Lu, bob, git
In-Reply-To: <alpine.LFD.0.999.0711092254130.15101@woody.linux-foundation.org>

On Fri, Nov 09, 2007 at 10:58:16PM -0800, Linus Torvalds wrote:

>But hearing whether -m64 makes a difference would be interesting. I'm 
>hoping OS X is LP64, not some insane half-way thing like Vista is.

Some casual tests with printf and sizeof makes it look like it is.  At
least sizeof (void *) and sizeof (long) are both 8.

David

^ permalink raw reply

* Re: git packs
From: bob @ 2007-11-10  7:19 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.LFD.0.999.0711092211250.15101@woody.linux-foundation.org>

I will try a few things and see if I can get a script put together
that generates the inflate problem.  The data that I am
using is a backup of my original repository.  So, I can
play all that I want.  But it would be a lot easier if I
could just generate some files using dd or something.

I'll also try the 64-bit compile on my Mac Pro and see
if it works.

My only reason for keeping this directory under git is
that I find git so easy to work with across multiple
machines.  I own 10+ computers and I use git to
provide easy transfers/updates from machine to
machine normally without any issues and a great
deal of reliability.

This directory is a website that I use
internally to keep track of things important to me.
For instance, the one large file is a movie of the
inside of my house before they sheet-rocked it
so that later I would have an easier time finding
things in the walls.  There is some html and php
that I wrote in it which I did want versioned.

Maybe I should just drop back to using two
directories, one of the large files which are
static anyway and a git repo for the html and php.
I was just trying to keep everything in the
repo for simplicity.

No matter which direction that I decide, I will still
try to provide the script.  Thank you both (Luke Lu)
for the feedback.


On Nov 10, 2007, at 1:38 AM, Linus Torvalds wrote:

>
>
> On Sat, 10 Nov 2007, bob wrote:
>>
>> The reason that I ask is that I have been playing different
>> scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
>> all day and every time that
>>
>> A) 	a file approaches or exceeds 2gig on an 'add', it
>> 	results in:
>> 		fatal: Out of memory? mmap failed: Cannot allocate memory
>
> Git wants to handle single files as one single entity, so single  
> big files
> really do end up being very painful. The costs of compressing them and
> generating deltas would probably get prohibitively high *anyway*,  
> but it
> does mean that if you have gigabyte files, you do want a 64-bit VM.
>
> I thought OS X could do 64 bits these days. Maybe not.
>
> Anyway, that explains the "cannot allocate memory". Git simply  
> wants to
> mmap the whole file. You don't have enough VM space for it.
>
> (And if you seriously want to work with multi-gigabyte files, git  
> probbaly
> isn't going to perform wonderfully well, even if it *should* work  
> fine if
> you just have a full 64-bit environment that allows the mmap).
>
>> B) 	the repository size less the .git subdirectory approaches
>> 	4gig on a 'fetch' it results in:
>>
>> 	Resolving 3356 deltas...
>> 	fatal: serious inflate inconsistency: -3 (unknown compression  
>> method)
>
> That sounds really broken. I'm not seeing what would cause that, apart
> from some really bad data corruption and/or broken zlib  
> implementation.
> But if the pack-file really is 2GB+ in size, I could imagine some sign
> issues cropping up.
>
> git will generally use "unsigned long" (which is probably just 32- 
> bit on
> your setup), but since git in those circumstances would be limited  
> by the
> size of the VM _anyway_, that's not really much of a limitation  
> (although
> probably broken on the crazy Windows "LLP64" model). But maybe we have
> some place where we use a signed thing, or zlib does, and I could  
> see that
> causing breakage.
>
> But that code-sequence really should never even come *close* to the  
> 31-bit
> limit, as long as the individual objects themselves aren't bigger  
> than the
> available VM space (and git currently assumes "unsigned long" is
> sufficiently big to cover the VM space, which is not technically  
> correct,
> but should be fine on OS X too).
>
> That said, we should use "off_t" in that function. I suspect we have a
> number of people (read: me) who have grown too used to living in a  
> 64-bit
> world..
>
>> I have been testing on my laptop which has a 32-bit Intel Core Duo.
>
> Ok, so you're 32-bit limited even if there is were to be some 64-bit
> support for OS X.
>
>> Also, I have run the same tests on a dual quad-core Intel processor
>> which is 64 bit, (but not sure that Apple uses the 64 bits in  
>> 10.4.10).  I
>> get the same results as above.
>
> I'm pretty sure OS X defaults to a 32-bit environment, but has at  
> least
> *some* 64-bit support. It would definitely need to be enabled  
> explicitly
> (since they made the *insane* decision to move over to Intel laptop  
> chips
> six months before they got 64-bit support! Somebody at Apple is a  
> total
> idiot, and should get fired).
>
> So it would be interesting to hear if a 64-bit build would make a
> difference.
>
>> The zlib is at the latest revision of 1.2.3 and gcc is at 4.0.1
>> which from what I can tell supports large files, because 'off_t'  
>> is 8 bytes
>> which is the size used for a 'stat' file size.
>
> See above: single files are size-limited, but with a large off_t like
> yours, you should be fine. Except we may have screwed up.
>
>> I am just wondering if these size limitations exist for MacOSX
>> or maybe I am doing something wrong (which is probably
>> the case).
>
> We *have* had issues with broken implementations of "pread()" on some
> systems.
>
> You could try setting NO_PREAD in the Makefile and compiling with the
> compatibility function.. That's the only thing that comes to mind  
> as being
> worth trying in that area.
>
> And if you have some script to generate the repository (ie you aren't
> using "live data", but are testing the limits of the system), if  
> you can
> make that available, so that people with non-OSX environments can  
> test,
> that would be interesting..
>
> I certainly have some 32-bit environments too (old linux boxes),  
> but I'm
> too lazy to write a test-case, so I was hoping you'd be using some  
> simple
> scripts that I could just test and see if I can see the behaviour you
> describe myself.
>
> That said, I have worked with a 3GB pack-file (one of the KDE trial
> repos). That worked fine. But git does tend to want a *lot* of  
> memory for
> really big repositories, so I suspect that if you actually work  
> with 2GB+
> pack-files, you'll be wanting a 64-bit environment just because  
> you'll be
> wanting more than 2GB of physical RAM in order to be able to access it
> efficiently.
>
> 			Linus

^ permalink raw reply

* Re: git packs
From: Linus Torvalds @ 2007-11-10  6:58 UTC (permalink / raw)
  To: Luke Lu; +Cc: bob, git
In-Reply-To: <DF65F7E4-448A-4726-8B42-642776155A8F@vicaya.com>

On Fri, 9 Nov 2007, Luke Lu wrote:
> 
> mmap(2), which git uses by default, is subject to vm limits (typically <2GB),
> regardless of large file support. file `which git` will probably tell you that
> it's a Mach-O executable i386 instead of x86_64. In order to get 64 bit
> binaries on Mactel boxes, you'll need the -m64 flag for gcc. I suspect that
> compiling with NO_MMAP option work as well.

Even with NO_MMAP, git will still want to read in source files in their 
entirety (just with regular reads). So you'll still be VM size-limited: 
the mmap() will just be replaced with a malloc+read in order to avoid 
some broken windows mmap() behaviour.

But hearing whether -m64 makes a difference would be interesting. I'm 
hoping OS X is LP64, not some insane half-way thing like Vista is.

		Linus

^ permalink raw reply

* Re: git packs
From: Linus Torvalds @ 2007-11-10  6:53 UTC (permalink / raw)
  To: bob; +Cc: git
In-Reply-To: <alpine.LFD.0.999.0711092211250.15101@woody.linux-foundation.org>



On Fri, 9 Nov 2007, Linus Torvalds wrote:
> 
> That said, I have worked with a 3GB pack-file (one of the KDE trial 
> repos). That worked fine. But git does tend to want a *lot* of memory for 
> really big repositories, so I suspect that if you actually work with 2GB+ 
> pack-files, you'll be wanting a 64-bit environment just because you'll be 
> wanting more than 2GB of physical RAM in order to be able to access it 
> efficiently.

Just double-checked. Yes, sirree. You definitely want 4GB+ if you are 
cloning a 3GB git pack-file. The "git-pack-objects" phase not only is 
going to walk all over the pack-file, it's going to add its own memory 
footprint on top of that just keeping track of all the objects.

So I doubt 2GB+ pack-files are all that practical on 32-bit hosts. At 
least not with the kind of performance behaviour *I* would accept.

(Of course, since git packs things pretty damn well, it would need to be a 
really really big project to be a 2GB+ pack-file, or just contain a lot of 
generally large non-deltable binary data file - one scenario where git 
definitely doesn't work wonderfully well, although I doubt many other 
SCM's do either..)

			Linus

^ permalink raw reply

* Re: git packs
From: Linus Torvalds @ 2007-11-10  6:38 UTC (permalink / raw)
  To: bob; +Cc: git
In-Reply-To: <FC175E4F-D9BE-42CC-B0BB-561B2EDCD941@mac.com>

On Sat, 10 Nov 2007, bob wrote:
> 
> The reason that I ask is that I have been playing different
> scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
> all day and every time that
> 
> A) 	a file approaches or exceeds 2gig on an 'add', it
> 	results in:
> 		fatal: Out of memory? mmap failed: Cannot allocate memory

Git wants to handle single files as one single entity, so single big files 
really do end up being very painful. The costs of compressing them and 
generating deltas would probably get prohibitively high *anyway*, but it 
does mean that if you have gigabyte files, you do want a 64-bit VM.

I thought OS X could do 64 bits these days. Maybe not.

Anyway, that explains the "cannot allocate memory". Git simply wants to 
mmap the whole file. You don't have enough VM space for it.

(And if you seriously want to work with multi-gigabyte files, git probbaly 
isn't going to perform wonderfully well, even if it *should* work fine if 
you just have a full 64-bit environment that allows the mmap).

> B) 	the repository size less the .git subdirectory approaches
> 	4gig on a 'fetch' it results in:
> 
> 	Resolving 3356 deltas...
> 	fatal: serious inflate inconsistency: -3 (unknown compression method)

That sounds really broken. I'm not seeing what would cause that, apart 
from some really bad data corruption and/or broken zlib implementation. 
But if the pack-file really is 2GB+ in size, I could imagine some sign 
issues cropping up.

git will generally use "unsigned long" (which is probably just 32-bit on 
your setup), but since git in those circumstances would be limited by the 
size of the VM _anyway_, that's not really much of a limitation (although 
probably broken on the crazy Windows "LLP64" model). But maybe we have 
some place where we use a signed thing, or zlib does, and I could see that 
causing breakage.

But that code-sequence really should never even come *close* to the 31-bit 
limit, as long as the individual objects themselves aren't bigger than the 
available VM space (and git currently assumes "unsigned long" is 
sufficiently big to cover the VM space, which is not technically correct, 
but should be fine on OS X too).

That said, we should use "off_t" in that function. I suspect we have a 
number of people (read: me) who have grown too used to living in a 64-bit 
world..

> I have been testing on my laptop which has a 32-bit Intel Core Duo.

Ok, so you're 32-bit limited even if there is were to be some 64-bit 
support for OS X.

> Also, I have run the same tests on a dual quad-core Intel processor
> which is 64 bit, (but not sure that Apple uses the 64 bits in 10.4.10).  I
> get the same results as above.

I'm pretty sure OS X defaults to a 32-bit environment, but has at least 
*some* 64-bit support. It would definitely need to be enabled explicitly 
(since they made the *insane* decision to move over to Intel laptop chips 
six months before they got 64-bit support! Somebody at Apple is a total 
idiot, and should get fired).

So it would be interesting to hear if a 64-bit build would make a 
difference.

> The zlib is at the latest revision of 1.2.3 and gcc is at 4.0.1
> which from what I can tell supports large files, because 'off_t' is 8 bytes
> which is the size used for a 'stat' file size.

See above: single files are size-limited, but with a large off_t like 
yours, you should be fine. Except we may have screwed up.

> I am just wondering if these size limitations exist for MacOSX
> or maybe I am doing something wrong (which is probably
> the case).

We *have* had issues with broken implementations of "pread()" on some 
systems.  

You could try setting NO_PREAD in the Makefile and compiling with the 
compatibility function.. That's the only thing that comes to mind as being 
worth trying in that area.

And if you have some script to generate the repository (ie you aren't 
using "live data", but are testing the limits of the system), if you can 
make that available, so that people with non-OSX environments can test, 
that would be interesting..

I certainly have some 32-bit environments too (old linux boxes), but I'm 
too lazy to write a test-case, so I was hoping you'd be using some simple 
scripts that I could just test and see if I can see the behaviour you 
describe myself.

That said, I have worked with a 3GB pack-file (one of the KDE trial 
repos). That worked fine. But git does tend to want a *lot* of memory for 
really big repositories, so I suspect that if you actually work with 2GB+ 
pack-files, you'll be wanting a 64-bit environment just because you'll be 
wanting more than 2GB of physical RAM in order to be able to access it 
efficiently.

			Linus

^ permalink raw reply

* Re: git packs
From: Luke Lu @ 2007-11-10  6:36 UTC (permalink / raw)
  To: bob; +Cc: git
In-Reply-To: <FC175E4F-D9BE-42CC-B0BB-561B2EDCD941@mac.com>

On Nov 9, 2007, at 10:00 PM, bob wrote:
> When you say toolchain, are you referring to the compiler
> and associated libraries or are you referring to OS programs
> such as ls, md5, cat, etc or both?
>
> The reason that I ask is that I have been playing different
> scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
> all day and every time that
>
> A) 	a file approaches or exceeds 2gig on an 'add', it
> 	results in:
> 	
> 	fatal: Out of memory? mmap failed: Cannot allocate memory
>
>
>
> B) 	the repository size less the .git subdirectory approaches
> 	4gig on a 'fetch' it results in:
>
> 	Resolving 3356 deltas...
> 	fatal: serious inflate inconsistency: -3 (unknown compression method)
> 	fatal: index-pack died with error code 128
> 	fatal: Fetch failure: ../rmwHtmlOld
>
> 	Under B, building the initial repository works fine.
>
> (I added a patch the Linus Torvalds gave out when a previous  
> inflate problem
> was being researched.)  Also, I have been looking in the source
> in particular in builtin-add.c builtin-pack-objects.c and  
> associated headers
> and see int and unsigned long being used a lot, but not any  
> unsigned long
> longs.  I have been testing on my laptop which has a 32-bit Intel  
> Core Duo.
> Also, I have run the same tests on a dual quad-core Intel processor
> which is 64 bit, (but not sure that Apple uses the 64 bits in  
> 10.4.10).  I
> get the same results as above.
>
> The zlib is at the latest revision of 1.2.3 and gcc is at 4.0.1
> which from what I can tell supports large files, because 'off_t' is  
> 8 bytes
> which is the size used for a 'stat' file size.

mmap(2), which git uses by default, is subject to vm limits  
(typically <2GB), regardless of large file support. file `which git`  
will probably tell you that it's a Mach-O executable i386 instead of  
x86_64. In order to get 64 bit binaries on Mactel boxes, you'll need  
the -m64 flag for gcc. I suspect that compiling with NO_MMAP option  
work as well.

__Luke

^ permalink raw reply

* Re: gitweb, updating 'last changed' column on the project page
From: Jon Smirl @ 2007-11-10  6:10 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <9e4733910711092201n5aaeeb7cvfd0e76e43170d481@mail.gmail.com>

On 11/10/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 11/9/07, Jakub Narebski <jnareb@gmail.com> wrote:
> > Jon Smirl wrote:
> >
> > > At http://git.digispeaker.com/ the 'last change' column is not getting updated.
> > >
> > > mpc5200b.git
> > >       DigiSpeaker for Freescale MPC5200B.
> > >       Jon Smirl
> > >       5 weeks ago
> > >       summary | shortlog | log | tree
> > >
> > > It still says 5 weeks ago, but if I click on the project last change is today.
> > >
> > > What controls this? I tried running update-server-info
> >
> > What does
> >
> >   git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate
> >       refs/heads
>
> [daedalus]$ git for-each-ref --format="%(refname):%09%(committer)"
> --sort=-committerdate refs/heads
> refs/heads/m24: Jon Smirl <jonsmirl@gmail.com> 1191362799 -0400
> refs/heads/m25: Jon Smirl <jonsmirl@gmail.com> 1191472422 -0400
> refs/heads/m26: Jon Smirl <jonsmirl@gmail.com> 1194382038 -0500
> refs/heads/m28: Jon Smirl <jonsmirl@gmail.com> 1194385071 -0500
> refs/heads/m29: Jon Smirl <jonsmirl@gmail.com> 1194585780 -0500


It appears to be using the first head instead of the most recent date.


>
> >
> > return? Does adding --count select proper branch, with proper update
> > date?
>
> Is it looking for master, and just picking the first branch instead?
>
> >
> > Which gitweb version is this?
>
> <!-- git web interface version 1.5.3.5.605.g79fa-dirty, (C) 2005-2006,
> Kay Sievers <kay.sievers@vrfy.org>, Christian Gierke -->
> <!-- git core binaries version 1.5.3.5.605.g79fa-dirty -->
>
> >
> > --
> > Jakub Narebski
> > Warsaw, Poland
> > ShadeHawk on #git
> >
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe git" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
>
>
> --
> Jon Smirl
> jonsmirl@gmail.com
>


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: [PATCH] Make builtin-tag.c use parse_options.
From: Junio C Hamano @ 2007-11-10  6:07 UTC (permalink / raw)
  To: Carlos Rica; +Cc: git
In-Reply-To: <473463E0.7000406@gmail.com>

Carlos Rica <jasampler@gmail.com> writes:

> Also, this removes those tests ensuring that repeated
> -m options don't allocate memory more than once, because now
> this is done after parsing options, using the last one
> when more are given. The same for -F.

The reason for this change is...?  Is this because it is
cumbersome to detect and refuse multiple -m options using the
parseopt API?  If so, the API may be what needs to be fixed.
Taking the last one and discarding earlier ones feels to me an
arbitrary choice.

While I freely admit that I do not particularly find the "One -m
introduces one new line, concatenated to form the final
paragraph" handling of multiple -m options done by git-commit
nice nor useful, I suspect that it would make more sense to make
git-tag and git-commit handle multiple -m option consistently,
if you are going to change the existing semantics.  Since some
people really seem to like multiple -m handling of git-commit,
the avenue of the least resistance for better consistency would
be to accept and concatenate (with LF in between) multiple -m
options.

With multiple -F, I think erroring out would be the sensible
thing to do, but some people might prefer concatenation.  I do
not care either way as long as commit and tag behave
consistently.

^ permalink raw reply

* Re: Reducing the memory footprint
From: Jon Smirl @ 2007-11-10  6:07 UTC (permalink / raw)
  To: Brian Downing; +Cc: Git Mailing List
In-Reply-To: <9e4733910711091705i6f77d05uc5ba04f668796a73@mail.gmail.com>

On 11/9/07, Jon Smirl <jonsmirl@gmail.com> wrote:
> On 11/9/07, Brian Downing <bdowning@lavos.net> wrote:
> > On Fri, Nov 09, 2007 at 06:38:00PM -0500, Jon Smirl wrote:
> > > I'm using this config file:
> > >
> > > [pack]
> > >         windowMemory = 1M
> > >         deltaCacheSize = 1M
> > >
> > > And I have NO_MMAP compiled in.
> > >
> > > git is still using over 200MB of memory or address space, my process
> > > gets killed either way.
> >
> > I'm assuming it's dying on repacking since you included the pack
> > parameters.
> >
> > How big is your biggest object?  Even with pack.windowMemory, it still
> > keeps the last object around to try and delta against (in other words,
> > the window only shrinks to size 1), which means you have to have room
> > for it and its delta index.
>
> It's a Linux kernel repository. Git receive-pack is going over 200MB
> and getting zapped.  I don't understand why the process is so large. I
> am compiled with -DNO_MMAP.

I believe I must not have installed everything correctly with my
NO_MMAP build. After debugging for a while and fixing things I'm able
to do a push now in about 80MB of memory.

> I think I have a achieved a work around. I rsync'd in my last several
> weeks of changes. Now I can 'git push' small amounts of changes
> without getting killed.
>
> I'm begging dreamhost to simply install git. Installed commands don't
> get zapped.
>
> --
> Jon Smirl
> jonsmirl@gmail.com
>


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: gitweb, updating 'last changed' column on the project page
From: Jon Smirl @ 2007-11-10  6:01 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <fh337a$ggp$1@ger.gmane.org>

On 11/9/07, Jakub Narebski <jnareb@gmail.com> wrote:
> Jon Smirl wrote:
>
> > At http://git.digispeaker.com/ the 'last change' column is not getting updated.
> >
> > mpc5200b.git
> >       DigiSpeaker for Freescale MPC5200B.
> >       Jon Smirl
> >       5 weeks ago
> >       summary | shortlog | log | tree
> >
> > It still says 5 weeks ago, but if I click on the project last change is today.
> >
> > What controls this? I tried running update-server-info
>
> What does
>
>   git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate
>       refs/heads

[daedalus]$ git for-each-ref --format="%(refname):%09%(committer)"
--sort=-committerdate refs/heads
refs/heads/m24: Jon Smirl <jonsmirl@gmail.com> 1191362799 -0400
refs/heads/m25: Jon Smirl <jonsmirl@gmail.com> 1191472422 -0400
refs/heads/m26: Jon Smirl <jonsmirl@gmail.com> 1194382038 -0500
refs/heads/m28: Jon Smirl <jonsmirl@gmail.com> 1194385071 -0500
refs/heads/m29: Jon Smirl <jonsmirl@gmail.com> 1194585780 -0500

>
> return? Does adding --count select proper branch, with proper update
> date?

Is it looking for master, and just picking the first branch instead?

>
> Which gitweb version is this?

<!-- git web interface version 1.5.3.5.605.g79fa-dirty, (C) 2005-2006,
Kay Sievers <kay.sievers@vrfy.org>, Christian Gierke -->
<!-- git core binaries version 1.5.3.5.605.g79fa-dirty -->

>
> --
> Jakub Narebski
> Warsaw, Poland
> ShadeHawk on #git
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>


-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

* Re: git packs
From: bob @ 2007-11-10  6:00 UTC (permalink / raw)
  To: git
In-Reply-To: <alpine.LFD.0.9999.0711100011150.21255@xanadu.home>

When you say toolchain, are you referring to the compiler
and associated libraries or are you referring to OS programs
such as ls, md5, cat, etc or both?

The reason that I ask is that I have been playing different
scenarios using git 1.5.3.5 under MacOSX 10.4.10 mostly
all day and every time that

A) 	a file approaches or exceeds 2gig on an 'add', it
	results in:

	fatal: Out of memory? mmap failed: Cannot allocate memory

B) 	the repository size less the .git subdirectory approaches
	4gig on a 'fetch' it results in:

	Resolving 3356 deltas...
	fatal: serious inflate inconsistency: -3 (unknown compression method)
	fatal: index-pack died with error code 128
	fatal: Fetch failure: ../rmwHtmlOld

	Under B, building the initial repository works fine.

(I added a patch the Linus Torvalds gave out when a previous inflate  
problem
was being researched.)  Also, I have been looking in the source
in particular in builtin-add.c builtin-pack-objects.c and associated  
headers
and see int and unsigned long being used a lot, but not any unsigned  
long
longs.  I have been testing on my laptop which has a 32-bit Intel  
Core Duo.
Also, I have run the same tests on a dual quad-core Intel processor
which is 64 bit, (but not sure that Apple uses the 64 bits in  
10.4.10).  I
get the same results as above.

The zlib is at the latest revision of 1.2.3 and gcc is at 4.0.1
which from what I can tell supports large files, because 'off_t' is 8  
bytes
which is the size used for a 'stat' file size.

I am just wondering if these size limitations exist for MacOSX
or maybe I am doing something wrong (which is probably
the case).

On Nov 10, 2007, at 12:13 AM, Nicolas Pitre wrote:

> On Fri, 9 Nov 2007, bob wrote:
>
>> When a repository is packed such as for a clone or fetch, is there  
>> just one
>> pack file created that is used for the transfer?
>
> Yes.
>
> And modern Git is able to handle packs larger than 4GB too,  
> assuming it
> is compiled using a toolchain with large file support.
>
>
> Nicolas
> -

^ permalink raw reply

* [PATCH] builtin-commit: fix --signoff
From: Johannes Schindelin @ 2007-11-10  5:49 UTC (permalink / raw)
  To: git, krh, gitster


The Signed-off-by: line contained a spurious timestamp.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin-commit.c |   31 ++++++++++++++++++++-----------
 1 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/builtin-commit.c b/builtin-commit.c
index e8bc4c4..f79ad48 100644
--- a/builtin-commit.c
+++ b/builtin-commit.c
@@ -181,21 +181,30 @@ static int prepare_log_message(const char *index_file, const char *prefix)
 		die("could not open %s\n", git_path(commit_editmsg));
 
 	stripspace(&sb, 0);
-	if (fwrite(sb.buf, 1, sb.len, fp) < sb.len)
-		die("could not write commit template: %s\n",
-		    strerror(errno));
 
 	if (signoff) {
-		const char *info, *bol;
-
-		info = git_committer_info(1);
-		strbuf_addch(&sb, '\0');
-		bol = strrchr(sb.buf + sb.len - 1, '\n');
-		if (!bol || prefixcmp(bol, sign_off_header))
-			fprintf(fp, "\n");
-		fprintf(fp, "%s%s\n", sign_off_header, git_committer_info(1));
+		struct strbuf sob;
+		const char *p;
+		int i;
+
+		strbuf_init(&sob, 0);
+		strbuf_addstr(&sob, sign_off_header);
+		strbuf_addstr(&sob, git_committer_info(1));
+		p = strrchr(sob.buf, '>');
+		if (p)
+			strbuf_setlen(&sob, p + 1 - sob.buf);
+		strbuf_addch(&sob, '\n');
+
+		for (i = sb.len - 1; i > 0 && sb.buf[i - 1] != '\n'; i--)
+			; /* do nothing */
+		if (prefixcmp(sb.buf + i, sob.buf))
+			strbuf_addbuf(&sb, &sob);
 	}
 
+	if (fwrite(sb.buf, 1, sb.len, fp) < sb.len)
+		die("could not write commit template: %s\n",
+		    strerror(errno));
+
 	strbuf_release(&sb);
 
 	if (in_merge && !no_edit)
-- 
1.5.3.5.1674.g6e7f7

^ permalink raw reply related

* Re: git packs
From: Nicolas Pitre @ 2007-11-10  5:13 UTC (permalink / raw)
  To: bob; +Cc: git
In-Reply-To: <F6DD8DCD-416B-4DDF-B384-7213C9ED5565@mac.com>

On Fri, 9 Nov 2007, bob wrote:

> When a repository is packed such as for a clone or fetch, is there just one
> pack file created that is used for the transfer?

Yes.

And modern Git is able to handle packs larger than 4GB too, assuming it 
is compiled using a toolchain with large file support.

Nicolas

^ permalink raw reply

* git packs
From: bob @ 2007-11-10  4:47 UTC (permalink / raw)
  To: git

When a repository is packed such as for a clone or fetch, is there  
just one pack file created that is used for the transfer?

^ permalink raw reply

* Re: [PATCH] git-add: make the entry stat-clean after re-adding the same contents
From: Linus Torvalds @ 2007-11-10  2:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, Kristian Høgsberg, git
In-Reply-To: <7vzlxmq1oj.fsf_-_@gitster.siamese.dyndns.org>



On Fri, 9 Nov 2007, Junio C Hamano wrote:
>  
> -int run_diff_files(struct rev_info *revs, int silent_on_removed)
> +int run_diff_files(struct rev_info *revs, int option)

Wouldn't it be much better to now 
 - make it "unsigned int flags"
 - create a few enums or #define's to make the usage be more readable?

Because this:

>-       run_diff_files(&rev, 0);
>+       run_diff_files(&rev, 2);
> -	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
> +	    !ie_match_stat(istate, istate->cache[pos], &st, 3)) {

just went from subtle to "incredibly non-obvious".

I realize that 3 is "silent + assume_racy" and 2 is just "assume_racy", 
but I may realize that now, and in a month? I'd need to look it up.

		Linus

^ permalink raw reply

* Re: *[PATCH] builtin-commit: Refresh cache after adding files.
From: Junio C Hamano @ 2007-11-10  2:26 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Kristian Høgsberg, git
In-Reply-To: <Pine.LNX.4.64.0711100201310.4362@racer.site>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> But I think that we still need to refresh the index.

You are correct.  See

    http://article.gmane.org/gmane.comp.version-control.git/64254

^ permalink raw reply

* [PATCH] git-add: make the entry stat-clean after re-adding the same contents
From: Junio C Hamano @ 2007-11-10  2:22 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Kristian Høgsberg, git
In-Reply-To: <7vhcjvtgz5.fsf@gitster.siamese.dyndns.org>

Earlier in commit 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe
(add_file_to_index: skip rehashing if the cached stat already
matches), add_file_to_index() were taught not to re-add the path
if it already matches the index.

The change meant well, but was not executed quite right.  It
used ie_modified() to see if the file on the work tree is really
different from the index, and skipped adding the contents if the
function says "not modified".

This was wrong.  There are three possible comparison results
between the index and the file in the work tree:

 - with lstat(2) we _know_ they are different.  E.g. if the
   length or the owner in the cached stat information is
   different from the length we just obtained from lstat(2), we
   can tell the file is modified without looking at the actual
   contents.

 - with lstat(2) we _know_ they are the same.  The same length,
   the same owner, the same everything (but this has a twist, as
   described below).

 - we cannot tell from lstat(2) information alone and need to go
   to the filesystem to actually compare.

The last case arises from what we call 'racy git' situation,
that can be caused with this sequence:

    $ echo hello >file
    $ git add file
    $ echo aeiou >file ;# the same length

If the second "echo" is done within the same filesystem
timestamp granularity as the first "echo", then the timestamp
recorded by "git add" and the timestamp we get from lstat(2)
will be the same, and we can mistakenly say the file is not
modified.  The path is called 'racily clean'.  We need to
reliably detect racily clean paths are in fact modified.

To solve this problem, when we write out the index, we mark the
index entry that has the same timestamp as the index file itself
(that is the time from the point of view of the filesystem) to
tell any later code that does the lstat(2) comparison not to
trust the cached stat info, and ie_modified() then actually goes
to the filesystem to compare the contents for such a path.

That's all good, but it should not be used for this "git add"
optimization, as the goal of "git add" is to actually update the
path in the index and make it stat-clean.  With the false
optimization, we did _not_ cause any data loss (after all, what
we failed to do was only to update the cached stat information),
but it made the following sequence leave the file stat dirty:

    $ echo hello >file
    $ git add file
    $ echo hello >file ;# the same contents
    $ git add file

The solution is not to use ie_modified() which goes to the
filesystem to see if it is really clean, but instead use
ie_match_stat() with "assume racily clean paths are dirty"
option, to force re-adding of such a path.

There was another problem with "git add -u".  The codepath
shares the same issue when adding the paths that are found to be
modified, but in addition, it asked "git diff-files" machinery
run_diff_files() function (which is "git diff-files") to list
the paths that are modified.  But "git diff-files" machinery
uses the same ie_modified() call so that it does not report
racily clean _and_ actually clean paths as modified, which is
not what we want.

The patch allows the callers of run_diff_files() to pass the
same "assume racily clean paths are dirty" option, and makes
"git-add -u" codepath to use that option, to discover and re-add
racily clean _and_ actually clean paths.

We could further optimize on top of this patch to differentiate
the case where the path really needs re-adding (i.e. the content
of the racily clean entry was indeed different) and the case
where only the cached stat information needs to be refreshed
(i.e. the racily clean entry was actually clean), but I do not
think it is worth it.

This patch applies to maint and all the way up.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Blech.  This took longer than I thought.  It's only cached
   stat info and does not cause real data breakage, but I think
   it is a good thing to fix nevertheless.

 builtin-add.c |    2 +-
 diff-lib.c    |    6 ++++--
 diff.h        |    2 +-
 read-cache.c  |    2 +-
 4 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/builtin-add.c b/builtin-add.c
index 373f87f..b6d6bbe 100644
--- a/builtin-add.c
+++ b/builtin-add.c
@@ -123,7 +123,7 @@ static void update(int verbose, const char *prefix, const char **files)
 	rev.diffopt.format_callback_data = &verbose;
 	if (read_cache() < 0)
 		die("index file corrupt");
-	run_diff_files(&rev, 0);
+	run_diff_files(&rev, 2);
 }

 static void refresh(int verbose, const char **pathspec)
diff --git a/diff-lib.c b/diff-lib.c
index da55713..b83f650 100644
--- a/diff-lib.c
+++ b/diff-lib.c
@@ -332,10 +332,12 @@ int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv)
 	return run_diff_files(revs, silent_on_removed);
 }

-int run_diff_files(struct rev_info *revs, int silent_on_removed)
+int run_diff_files(struct rev_info *revs, int option)
 {
 	int entries, i;
 	int diff_unmerged_stage = revs->max_count;
+	int silent_on_removed = option & 01;
+	int assume_racy_is_modified = option & 02;

 	if (diff_unmerged_stage < 0)
 		diff_unmerged_stage = 2;
@@ -441,7 +443,7 @@ int run_diff_files(struct rev_info *revs, int silent_on_removed)
 				       ce->sha1, ce->name, NULL);
 			continue;
 		}
-		changed = ce_match_stat(ce, &st, 0);
+		changed = ce_match_stat(ce, &st, assume_racy_is_modified);
 		if (!changed && !revs->diffopt.find_copies_harder)
 			continue;
 		oldmode = ntohl(ce->ce_mode);
diff --git a/diff.h b/diff.h
index 4546aad..040e18c 100644
--- a/diff.h
+++ b/diff.h
@@ -224,7 +224,7 @@ extern void diff_flush(struct diff_options*);

 extern const char *diff_unique_abbrev(const unsigned char *, int);

-extern int run_diff_files(struct rev_info *revs, int silent_on_removed);
+extern int run_diff_files(struct rev_info *revs, int option);
 extern int setup_diff_no_index(struct rev_info *revs,
 		int argc, const char ** argv, int nongit, const char *prefix);
 extern int run_diff_files_cmd(struct rev_info *revs, int argc, const char **argv);
diff --git a/read-cache.c b/read-cache.c
index 928e8fa..75e2d46 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -420,7 +420,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	pos = index_name_pos(istate, ce->name, namelen);
 	if (0 <= pos &&
 	    !ce_stage(istate->cache[pos]) &&
-	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
+	    !ie_match_stat(istate, istate->cache[pos], &st, 3)) {
 		/* Nothing changed, really */
 		free(ce);
 		return 0;
-- 
1.5.3.5.1651.g30bf0

^ permalink raw reply related

* Re: *[PATCH] builtin-commit: Refresh cache after adding files.
From: Johannes Schindelin @ 2007-11-10  2:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kristian Høgsberg, git
In-Reply-To: <7v640ari76.fsf@gitster.siamese.dyndns.org>

Hi,

On Fri, 9 Nov 2007, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > This discussion exposes a problem with add_files_to_cache()
> > function.
> > ...
> > And add_file_to_cache(), which calls add_file_to_index() on
> > the_index, does call the fill_stat_cache_info() to sync the stat
> > information by itself, as it should be.  I am still puzzled with
> > this.
> 
> Blech.  I think it is 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe 
> (add_file_to_index: skip rehashing if the cached stat already matches) 
> that broke "git add".  I get the same problem not just with "git add -u" 
> but also with an explicit "git add Makefile".

But I think that we still need to refresh the index.  It outputs which 
files names were new, and which ones were removed.  Otherwise git commit 
would be eerily silent on those.

Ciao,
Dscho

^ permalink raw reply

* Re: gitweb, updating 'last changed' column on the project page
From: Jakub Narebski @ 2007-11-10  1:58 UTC (permalink / raw)
  To: git
In-Reply-To: <9e4733910711091709k173bf23flf2824673f82de9bb@mail.gmail.com>

Jon Smirl wrote:

> At http://git.digispeaker.com/ the 'last change' column is not getting updated.
> 
> mpc5200b.git
>       DigiSpeaker for Freescale MPC5200B.
>       Jon Smirl
>       5 weeks ago
>       summary | shortlog | log | tree
> 
> It still says 5 weeks ago, but if I click on the project last change is today.
> 
> What controls this? I tried running update-server-info

What does

  git for-each-ref --format="%(refname):%09%(committer)" --sort=-committerdate
      refs/heads

return? Does adding --count select proper branch, with proper update
date?

Which gitweb version is this?

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply

* Re: *[PATCH] builtin-commit: Refresh cache after adding files.
From: Junio C Hamano @ 2007-11-10  1:40 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Kristian Høgsberg, git
In-Reply-To: <7vhcjvtgz5.fsf@gitster.siamese.dyndns.org>

Junio C Hamano <gitster@pobox.com> writes:

> This discussion exposes a problem with add_files_to_cache()
> function.
> ...
> And add_file_to_cache(), which calls add_file_to_index() on
> the_index, does call the fill_stat_cache_info() to sync the stat
> information by itself, as it should be.  I am still puzzled with
> this.

Blech.  I think it is 0781b8a9b2fe760fc4ed519a3a26e4b9bd6ccffe
(add_file_to_index: skip rehashing if the cached stat already
matches) that broke "git add".  I get the same problem not just
with "git add -u" but also with an explicit "git add Makefile".

In add_file_to_index(), we check if ie_modified() says if the
path is modified, but that is actually a very wrong check in
that function.  ie_modified() knows the racy git condition and
digs deeper to find if the file in the work tree is really
different.  We end up saying "ah, Ok, if that is the same, then
we do not add it to the index again", which is why we do not
update the stat info in this case.

Instead, we should ask ie_match_stat() which answers "does not
match" for a entry that _could_ be racily clean, so that we make
sure we re-add such an entry to clear the stat info.

This applies to maint and all the way up.

I also suspect that run_diff_files() that is called from
builtin-add.c:update() needs to tell ie_match_stat() to assume
that a racily clean path is unclean (that is what bit (1<<1) of
the third parameter to ie_match_stat() is about), so that "add
-u" will call the add_file_to_index() function, but that would
be a bigger patch.

---

 read-cache.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 928e8fa..75e2d46 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -420,7 +420,7 @@ int add_file_to_index(struct index_state *istate, const char *path, int verbose)
 	pos = index_name_pos(istate, ce->name, namelen);
 	if (0 <= pos &&
 	    !ce_stage(istate->cache[pos]) &&
-	    !ie_modified(istate, istate->cache[pos], &st, 1)) {
+	    !ie_match_stat(istate, istate->cache[pos], &st, 3)) {
 		/* Nothing changed, really */
 		free(ce);
 		return 0;

^ permalink raw reply related

* gitweb, updating 'last changed' column on the project page
From: Jon Smirl @ 2007-11-10  1:09 UTC (permalink / raw)
  To: Git Mailing List

At http://git.digispeaker.com/ the 'last change' column is not getting updated.

mpc5200b.git
	DigiSpeaker for Freescale MPC5200B.
	Jon Smirl
	5 weeks ago
	summary | shortlog | log | tree

It still says 5 weeks ago, but if I click on the project last change is today.

What controls this? I tried running update-server-info

-- 
Jon Smirl
jonsmirl@gmail.com

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox