Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v3 16/19] pathspec.c: move reusable code from builtin/add.c
From: Adam Spiers @ 2012-12-29  0:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git list
In-Reply-To: <CAOkDyE8vSyT=eJ4FxRwYgz7Jzqu1+0LSzxtq9iSSzJEgTD1M0g@mail.gmail.com>

On Fri, Dec 28, 2012 at 8:45 PM, Adam Spiers <git@adamspiers.org> wrote:
> On Fri, Dec 28, 2012 at 8:32 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Adam Spiers <git@adamspiers.org> writes:
>>
>>> diff --git a/pathspec.h b/pathspec.h
>>> new file mode 100644
>>> index 0000000..8bb670b
>>> --- /dev/null
>>> +++ b/pathspec.h
>>> @@ -0,0 +1,5 @@
>>> +extern char *find_used_pathspec(const char **pathspec);
>>> +extern void fill_pathspec_matches(const char **pathspec, char *seen, int specs);
>>> +extern const char *treat_gitlink(const char *path);
>>> +extern void treat_gitlinks(const char **pathspec);
>>> +extern const char **validate_pathspec(const char **argv, const char *prefix);
>>
>> Protect this against multiple inclusion with "#ifndef PATHSPEC_H".
>
> Yep good idea, how should I submit this?  It will cause a trivially
> resolvable conflict with the next patch in the series (17/19):
>
>   pathspec.c: extract new validate_path() for reuse

I was wrong about that - it didn't cause a conflict, although it does
marginally change the context at the end of the pathspec.h hunk in the
above patch.

> but I'd prefer not to re-roll 16--19 when just 16 and 17 are sufficient.

Based on your other feedback, all of 16--19 require changes, and as
things stand, conveniently nothing earlier in the series does, so I'll
re-roll those four once the outstanding issues are all resolved.

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-29  0:41 UTC (permalink / raw)
  To: David Michael Barr; +Cc: Git Mailing List
In-Reply-To: <1353911154-23495-1-git-send-email-b@rr-dav.id.au>

On Mon, Nov 26, 2012 at 05:25:54PM +1100, David Michael Barr wrote:

>  The intent is to allow selective recompression of pack data.
>  For small objects/deltas the overhead of deflate is significant.
>  This may improve read performance for the object graph.
> 
>  I ran some unscientific experiments with the chromium repository.
>  With pack.graphcompression = 0, there was a 2.7% increase in pack size.
>  I saw a 35% improvement with cold caches and 43% otherwise on git log --raw.

There wasn't much response to this, but those numbers are encouraging. I
was curious to replicate them, as well as to break it down by trees and
commits. I also wanted to test on more repositories, as well as on both
SSDs and spinning disks (for cold-cache numbers). Maybe that will catch
more people's interest.

As you mentioned in your follow-up, I ran into the "delta size changed"
problem. Not sure if it is related, but I noticed here:

> @@ -379,6 +396,13 @@ static unsigned long write_reuse_object(struct sha1file *f, struct object_entry
>  	offset += entry->in_pack_header_size;
>  	datalen -= entry->in_pack_header_size;
>  
> +	if (!pack_to_stdout &&
> +	    pack_graph_compression_seen &&
> +	    check_pack_compressed(p, &w_curs, offset) != !!compression_level(entry->actual_type)) {
> +		unuse_pack(&w_curs);
> +		return write_no_reuse_object(f, entry, limit, usable_delta);
> +	}
> +

...that we seem to re-compress more than necessary. If I instrument that
block with a message to stderr and run "git repack -ad" repeatedly
without changing the config in between, runs after the first should
never re-compress, right? But they seem to. I'm not sure if your
check_pack_compressed heuristic is off or something else. It may or may
not be related to the "delta sized change" failure.

But we can leave this side a bit for a moment. Conceptually there are
two interesting things going on in your patch:

  1. Per-object-type compression levels

  2. Auto-recompression when levels change.

We can figure out (2) later. The meat of the idea is (1), and the patch
for that is much simpler. In fact, we can test it out with entirely
stock git by creating separate tree, commit, and blob packs, each with
different compression. So that's what I did for my timing, just to keep
things simple.

I timed git.git, linux-2.6.git, and WebKit.git. For each repo, I tested
it with four pack compression scenarios:

  1. all objects at -1 (zlib default)

  2. commits at 0, everything else at -1

  3. trees at 0, everything else at -1

  4. commits and trees at 0, everything else at -1

For each scenario, I timed "git rev-list --count --all" to traverse all
commits (which roughly models things like merge-base, ahead/behind
counts, etc), and then the same thing with "--objects" to traverse all
objects (which would roughly match what "git prune" or the "counting
objects" phase of packing would do). For each command, I timed both warm
and cold disk cache (the latter via "echo 3 >/proc/sys/vm/drop_caches").
Each timing is a best-of-five.  The timings were done on a machine with
an SSD (which probably matters for cold-cache; I have some spinning disk
numbers later).

Here are the git.git numbers:

 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none |  41.48        | 0.78        | 0.33        |  2.35        |  1.94       
commit |  49.34 (+18%) | 0.57 (-26%) | 0.09 (-74%) |  2.48  (+5%) |  1.70 (-12%)
  tree |  45.43  (+9%) | 0.80  (+3%) | 0.33   (0%) |  2.11  (-9%) |  1.74 (-10%)
  both |  53.31 (+28%) | 0.79  (+1%) | 0.08 (-75%) |  2.27  (-3%) |  1.49 (-23%)

The pack column specifies which scenario (i.e., what was left
uncompressed).  The size column is the size of the object-dir (in
megabytes). The other columns are times to run each command in
wall-clock seconds. Percentages are comparisons to the baseline "none"
case (i.e., the status quo).

So you can see that it's a big win for warm-cache pure-commit
traversals. As a sanity check, we can see that the tree-only case is not
helped at all there (because we do not look at trees at all). The
cold-cache case is helped, too, but that benefit goes away (and even
hurts slightly, but that is probably within the noise) when we also
leave the trees uncompressed.

The full-objects traversal doesn't fare quite as well, though there's
still some improvement. I think it argues for leaving both uncompressed,
as the warm case really benefits when both are uncompressed. You lose
the cold-cache benefits in the revs-only case, though.

Here are the numbers for linux-2.6.git:

 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none | 864.61        | 8.66        | 4.07        | 42.76        | 36.32       
commit | 970.46 (+12%) | 8.87  (+2%) | 1.02 (-74%) | 42.94   (0%) | 33.43  (-7%)
  tree | 895.37  (+3%) | 9.08  (+4%) | 4.07   (0%) | 36.01 (-15%) | 29.62 (-18%)
  both | 1001.25 (+15%) | 8.90  (+2%) | 1.03 (-74%) | 35.57 (-16%) | 26.25 (-27%)

Similar warm-cache numbers, but the cold cache for the revs-only case is
hurt a little bit more.

And here's WebKit.git (sizes are in gigabytes this time):

 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none |   3.46        | 1.61        | 1.38        | 20.46        | 18.72       
commit |   3.54  (+2%) | 1.42 (-11%) | 0.34 (-75%) | 20.42   (0%) | 17.57  (-6%)
  tree |   3.59  (+3%) | 1.61   (0%) | 1.39   (0%) | 16.01 (-21%) | 14.00 (-25%)
  both |   3.67  (+6%) | 1.45 (-10%) | 0.34 (-75%) | 15.94 (-22%) | 12.91 (-31%)

Pretty similar again (slightly better on the full object traversal).

And finally, for comparison, here are the numbers from a (much slower)
machine that has spinning disks (albeit in a mirrored raid, which should
improve read times) on git.git:

 Pack  | Size          |  Cold Revs  |  Warm Revs  | Cold Objects | Warm Objects
-------+---------------+-------------+-------------+--------------+--------------
  none |  41.35        | 1.85        | 0.64        |  5.58        |  3.91       
commit |  49.23 (+19%) | 1.94  (+4%) | 0.14 (-77%) |  5.51  (-1%) |  3.40 (-12%)
  tree |  45.27  (+9%) | 1.78  (-3%) | 0.64   (0%) |  5.13  (-8%) |  3.53  (-9%)
  both |  53.16 (+28%) | 1.83  (-1%) | 0.14 (-77%) |  4.96 (-11%) |  3.32 (-14%)

Surprisingly not all that different than the SSD times. Which may mean I
screwed something up. I'm happy to make my test harness available if
anybody else feels like timing on their repos or machines. But it does
point to potentially leaving commits uncompressed, and possibly trees.

I wonder if we could do even better, though. For a traversal, we only
need to look at the commit header. We could potentially do a progressive
inflate and stop before getting to the commit message (which is the bulk
of the data, and the part that is most likely to benefit from
compression).

-Peff

^ permalink raw reply

* Re: [PATCH v3 18/19] setup.c: document get_pathspec()
From: Adam Spiers @ 2012-12-29  0:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git list
In-Reply-To: <CAOkDyE-UXGhe1aiWy_1_y7cvrmfvivSBxY9LHudOmeZD=M-12A@mail.gmail.com>

On Fri, Dec 28, 2012 at 8:40 PM, Adam Spiers <git@adamspiers.org> wrote:
> On Fri, Dec 28, 2012 at 8:36 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Adam Spiers <git@adamspiers.org> writes:
>>
>>> Since we have just created a new pathspec-handling library, now is a
>>> good time to add some comments explaining get_pathspec().
>>>
>>> Signed-off-by: Adam Spiers <git@adamspiers.org>
>>> ---
>>
>> Yes, but we would rather not to see new users of this function added
>> to our codebase in its current form, as explained in the nearby
>> comment.  We would want to migrate everybody to "struct pathspec"
>> based interface to support magic pathspecs in the longer term.
>
> I see.  Please feel free to drop that patch from the series or amend
> as you see fit.

I've added this sentence to the top of the comments above
get_pathspec():

    /*
     * N.B. get_pathspec() is deprecated in favor of the "struct pathspec"
     * based interface - see pathspec_magic above.
     *
    [...]

That should be sufficient to discourage people from adding new users
of get_pathspec().

^ permalink raw reply

* Re: Lockless Refs?
From: Martin Fick @ 2012-12-29  1:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Michael Haggerty, Jeff King, git
In-Reply-To: <7vlicijepv.fsf@alter.siamese.dyndns.org>

On Friday, December 28, 2012 09:58:36 am Junio C Hamano 
wrote:
> Martin Fick <mfick@codeaurora.org> writes:
> > 3) To create a ref, it must be renamed from the null
> > file (sha 0000...) to the new value just as if it were
> > being updated from any other value, but there is one
> > extra condition: before renaming the null file, a full
> > directory scan must be done to ensure that the null
> > file is the only file in the directory...
> 
> While you are scanning this directory to make sure it is
> empty, 

The objective is not to scan for an empty dir, but to scan 
for the existence of only the null file.

> I am contemplating to create the same ref with a
> different value.  You finished checking but haven't
> created the null.

The scan needs to happen after creating the null, not before, 
so I don't believe the rest of the scenario below is possible 
then?

> I have also scanned, created the null
> and renamed it to my value.  Now you try to create the
> null, succeed, and then rename.  We won't know which of
> the two non-null values are valid, but worse yet, I think
> one of them should have failed in the first place.

> Sounds like we would need some form of locking around
> here.  Is your goal "no locks", or "less locks"?
(answered below)

> > I don't know how this new scheme could be made to work
> > with the current scheme,...
> 
> It is much more important to know if/why yours is better
> than the current scheme in the first place.  

The goal is: "no locks which do not have a clearly defined 
reliable recovery procedure".

Stale locks without a reliable recovery procedure will lead 
to stolen locks.  At this point it is only a matter of luck 
whether this leads to data loss or not.  So I do hope to 
convince people first that the current scheme is bad, not that 
my scheme is better!  My scheme was proposed to get people 
thinking that we may not have to use locks to get reliable 
updates.

> Without an
> analysis on how the new scheme interacts with the packed
> refs and gives better behaviour, that is kinda difficult.

Fair enough. I will attempt this if the basic idea seems at 
least sane?  I do hope that eventually the packed-refs piece 
and its locking will be reconsidered also; as Michael pointed 
out it has issues already.  So, I am hoping to get people 
thinking more about lockless approaches to all the pieces. I 
think I have some solutions to some of the other pieces also, 
but I don't want to overwhelm the discussion all at once 
(especially if my first piece is shown to be flawed, or if no 
one has any interest in eliminating the current locks?)

> I think transition plans can wait until that is done.  If
> it is not even marginally better, we do not have to worry
> about transitioning at all.  If it is only marginally
> better, the transition has to be designed to be no impact
> to the existing repositories.  If it is vastly better, we
> might be able to afford a flag day.

OK, makes sense, I jumped the gun a bit,

-Martin

^ permalink raw reply

* Re: [PATCH v3 19/19] Add git-check-ignore sub-command
From: Adam Spiers @ 2012-12-29  1:23 UTC (permalink / raw)
  To: git list
In-Reply-To: <7vtxr5ho01.fsf@alter.siamese.dyndns.org>

On Fri, Dec 28, 2012 at 01:21:02PM -0800, Junio C Hamano wrote:
> Adam Spiers <git@adamspiers.org> writes:
> > +static void output_exclude(const char *path, struct exclude *exclude)
> > +{
> > +	char *bang = exclude->flags & EXC_FLAG_NEGATIVE ? "!" : "";
> > +	char *dir  = (exclude->flags & EXC_FLAG_MUSTBEDIR) ? "/" : "";
> 
> That's inconsistent.  Either s/bang/negated/ or s/dir/slash/ (but
> obviously not both).

OK.  I'll make the use of parentheses there consistent too.

> > +static int check_ignore(const char *prefix, const char **pathspec)
> > +{
> > +	struct dir_struct dir;
> > +...
> > +	if (pathspec) {
> > +...
> > +	} else {
> > +		printf("no pathspec\n");
> > +	}
> 
> Is this an error message, an informative warning, or what?  The
> command is presumably for script consumption downstream of a pipe.
> Does it make sense to emit this message to their standard input?
> Does it even have to be output?  Especially what should happen when
> the user says "--quiet"?
>
> Perhaps
> 
> 	if (!quiet)
> 		fprintf(stderr, "no pathspec given.\n");
> 
> or something?

Hmm.  I suspect this was an unfinished remnant of one of my early
prototypes, because this code path never gets hit.  There's even a
test which checks that a fatal error is generated when no pathspecs
are given via argv.  

However, the test for behaviour when no pathspecs are given via
--stdin *is* missing.  In this case, I think the code you suggest
above makes sense for generating a warning, and the existing behaviour
of returning an exit code of 1 also seems correct.  I have added your
code and a test to cover it.

> > +int cmd_check_ignore(int argc, const char **argv, const char *prefix)
> > +{
> > +	int num_ignored = 0;
> 
> I do not think you need to initialize this one.

Fixed.

> > +	if (stdin_paths) {
> > +		num_ignored = check_ignore_stdin_paths(prefix);
> > +	} else {
> > +		num_ignored = check_ignore(prefix, argv);
> > +		maybe_flush_or_die(stdout, "ignore to stdout");
> > +	}
> > +
> > +	return num_ignored > 0 ? 0 : 1;
> 
> Given that num_ignored will always be >=0, that is a funny way to
> say
> 
> 	return !num_ignored;

Personally I find that harder to read, but I'm not a regular C
programmer so I'll defer to your sense of style.

> or if you plan to report a non-fatal error as negative return from
> the two check_ignore* functions,

I don't think that's necessary here, but I'll bear it in mind.

> > +stderr_contains () {
> > +	regexp="$1"
> > +	if grep -q "$regexp" "$HOME/stderr"
> 
> Please do not use "grep -q"; the output from commands in test
> harness is normally hidden already.  This only makes script more
> cluttered and robs debuggers of potentially useful output when the
> test script is run with "-v".

Fixed.

> > +test_check_ignore () {
> > +	args="$1" expect_code="${2:-0}" global_args="$3"
> > +
> > +	init_vars &&
> > +	rm -f "$HOME/stdout" "$HOME/stderr" "$HOME/cmd" &&
> > +	echo git $global_args check-ignore $quiet_opt $verbose_opt $args \
> > +		>"$HOME/cmd" &&
> 
> Does a debugger needs this "cmd" file when we already have "-v" option?

Yes, I think it's useful; for example:

    $ ./t0008-ignores.sh -i -v

    [... test fails ...]

    $ cd trash\ directory.t0008-ignores
    $ gdb --args ../../$(<cmd)

> > +test_expect_success_multi () {
> > +	prereq=
> > +	if test $# -eq 4
> > +	then
> > +		prereq=$1
> > +		shift
> > +	fi
> > +	testname="$1" expect_verbose="$2" code="$3"
> > +
> > +	expect=$( echo "$expect_verbose" | sed -e 's/.*	//' )
> > +
> > +	test_expect_success $prereq "$testname" "
> > +		expect '$expect' &&
> > +		$code
> > +	"
> 
> This is brittle when $expect may have single quotes, isn't it?

Currently $expect will never have single quotes, but I grant this may
change in the future.

> Perhaps
> 
> 	test_expect_success $prereq "$testname" '
> 		expect "$expect" && eval "$code"
>         '
> 
> may fix it,

Yes that works, thanks.

> but in general I hate to see each test script trying to
> invent their own mini-harness like this (and getting them wrong).

test_expect_success_multi is specific to check-ignore and avoids a
huge amount of code duplication.  I can't think of a better approach
but if you can then I'm all ears.

I believe this only leaves two items from your review of v3 which are
yet to be resolved:

1. What should validate_path() should be renamed to in order to avoid
   ambiguity with other path validation functions?  Currently my
   preferred choice is die_if_path_beyond_symlink() but I don't really
   mind - I'll go with that unless I hear someone prefers another
   name.

2. The now-public functions fill_pathspec_matches() and
   find_used_pathspec(), need to be documented, and in particular need
   to mention that they traverse the index not a tree or the working
   directory.  I'll work on improving my understanding of these
   functions to the point where I can document them accurately (but
   hints are still welcome).

Once these are both resolved, I'll re-roll patches 16--19 only of the
v3 series, labelled as v4.

FYI, attached is the diff between check-ignore-v3 and my current
check-ignore, which is available at github:

    https://github.com/aspiers/git/commits/check-ignore

diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index c825736..06e250e 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -27,8 +27,8 @@ static const struct option check_ignore_options[] = {
 
 static void output_exclude(const char *path, struct exclude *exclude)
 {
-	char *bang = exclude->flags & EXC_FLAG_NEGATIVE ? "!" : "";
-	char *dir  = (exclude->flags & EXC_FLAG_MUSTBEDIR) ? "/" : "";
+	char *bang  = exclude->flags & EXC_FLAG_NEGATIVE  ? "!" : "";
+	char *slash = exclude->flags & EXC_FLAG_MUSTBEDIR ? "/" : "";
 	if (!null_term_line) {
 		if (!verbose) {
 			write_name_quoted(path, stdout, '\n');
@@ -36,7 +36,7 @@ static void output_exclude(const char *path, struct exclude *exclude)
 			quote_c_style(exclude->el->src, NULL, stdout, 0);
 			printf(":%d:%s%s%s\t",
 			       exclude->srcpos,
-			       bang, exclude->pattern, dir);
+			       bang, exclude->pattern, slash);
 			quote_c_style(path, NULL, stdout, 0);
 			fputc('\n', stdout);
 		}
@@ -47,7 +47,7 @@ static void output_exclude(const char *path, struct exclude *exclude)
 			printf("%s%c%d%c%s%s%s%c%s%c",
 			       exclude->el->src, '\0',
 			       exclude->srcpos, '\0',
-			       bang, exclude->pattern, dir, '\0',
+			       bang, exclude->pattern, slash, '\0',
 			       path, '\0');
 		}
 	}
@@ -57,8 +57,10 @@ static int check_ignore(const char *prefix, const char **pathspec)
 {
 	struct dir_struct dir;
 	const char *path;
-	char *seen = NULL;
-	int num_ignored = 0;
+	char *seen;
+	int num_ignored = 0, i;
+	struct path_exclude_check check;
+	struct exclude *exclude;
 
 	/* read_cache() is only necessary so we can watch out for submodules. */
 	if (read_cache() < 0)
@@ -68,38 +70,36 @@ static int check_ignore(const char *prefix, const char **pathspec)
 	dir.flags |= DIR_COLLECT_IGNORED;
 	setup_standard_excludes(&dir);
 
-	if (pathspec) {
-		int i;
-		struct path_exclude_check check;
-		struct exclude *exclude;
-
-		path_exclude_check_init(&check, &dir);
-		if (!seen)
-			seen = find_used_pathspec(pathspec);
-		for (i = 0; pathspec[i]; i++) {
-			const char *full_path;
-			path = pathspec[i];
-			full_path = prefix_path(prefix, prefix
-						? strlen(prefix) : 0, path);
-			full_path = treat_gitlink(full_path);
-			validate_path(full_path, prefix);
-			if (!seen[i] && path[0]) {
-				int dtype = DT_UNKNOWN;
-				exclude = last_exclude_matching_path(&check, full_path,
-								     -1, &dtype);
-				if (exclude) {
-					if (!quiet)
-						output_exclude(path, exclude);
-					num_ignored++;
-				}
+	if (!pathspec || !*pathspec) {
+		if (!quiet)
+			fprintf(stderr, "no pathspec given.\n");
+		return 0;
+	}
+
+	path_exclude_check_init(&check, &dir);
+	seen = find_used_pathspec(pathspec);
+	for (i = 0; pathspec[i]; i++) {
+		const char *full_path;
+		path = pathspec[i];
+		full_path = prefix_path(prefix, prefix
+					? strlen(prefix) : 0, path);
+		full_path = treat_gitlink(full_path);
+		die_if_path_beyond_symlink(full_path, prefix);
+		if (!seen[i] && path[0]) {
+			int dtype = DT_UNKNOWN;
+			exclude = last_exclude_matching_path(&check, full_path,
+							     -1, &dtype);
+			if (exclude) {
+				if (!quiet)
+					output_exclude(path, exclude);
+				num_ignored++;
 			}
 		}
-		free(seen);
-		clear_directory(&dir);
-		path_exclude_check_clear(&check);
-	} else {
-		printf("no pathspec\n");
 	}
+	free(seen);
+	clear_directory(&dir);
+	path_exclude_check_clear(&check);
+
 	return num_ignored;
 }
 
@@ -136,7 +136,7 @@ static int check_ignore_stdin_paths(const char *prefix)
 
 int cmd_check_ignore(int argc, const char **argv, const char *prefix)
 {
-	int num_ignored = 0;
+	int num_ignored;
 
 	git_config(git_default_config, NULL);
 
@@ -144,7 +144,7 @@ int cmd_check_ignore(int argc, const char **argv, const char *prefix)
 			     check_ignore_usage, 0);
 
 	if (stdin_paths) {
-		if (0 < argc)
+		if (argc > 0)
 			die(_("cannot specify pathnames with --stdin"));
 	} else {
 		if (null_term_line)
@@ -166,5 +166,5 @@ int cmd_check_ignore(int argc, const char **argv, const char *prefix)
 		maybe_flush_or_die(stdout, "ignore to stdout");
 	}
 
-	return num_ignored > 0 ? 0 : 1;
+	return !num_ignored;
 }
diff --git a/pathspec.c b/pathspec.c
index 6724121..3789b14 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -36,9 +36,10 @@ char *find_used_pathspec(const char **pathspec)
 }
 
 /*
- * Check whether path refers to a submodule, or something inside a
- * submodule.  If the former, returns the path with any trailing slash
- * stripped.  If the latter, dies with an error message.
+ * Check the index to see whether path refers to a submodule, or
+ * something inside a submodule.  If the former, returns the path with
+ * any trailing slash stripped.  If the latter, dies with an error
+ * message.
  */
 const char *treat_gitlink(const char *path)
 {
@@ -78,9 +79,9 @@ void treat_gitlinks(const char **pathspec)
 
 /*
  * Dies if the given path refers to a file inside a symlinked
- * directory.
+ * directory in the index.
  */
-void validate_path(const char *path, const char *prefix)
+void die_if_path_beyond_symlink(const char *path, const char *prefix)
 {
 	if (has_symlink_leading_path(path, strlen(path))) {
 		int len = prefix ? strlen(prefix) : 0;
@@ -90,7 +91,8 @@ void validate_path(const char *path, const char *prefix)
 
 /*
  * Normalizes argv relative to prefix, via get_pathspec(), and then
- * runs validate_path() on each path in the normalized list.
+ * runs die_if_path_beyond_symlink() on each path in the normalized
+ * list.
  */
 const char **validate_pathspec(const char **argv, const char *prefix)
 {
@@ -99,7 +101,7 @@ const char **validate_pathspec(const char **argv, const char *prefix)
 	if (pathspec) {
 		const char **p;
 		for (p = pathspec; *p; p++) {
-			validate_path(*p, prefix);
+			die_if_path_beyond_symlink(*p, prefix);
 		}
 	}
 
diff --git a/pathspec.h b/pathspec.h
index c251441..1961b19 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -1,6 +1,11 @@
+#ifndef PATHSPEC_H
+#define PATHSPEC_H
+
 extern char *find_used_pathspec(const char **pathspec);
 extern void fill_pathspec_matches(const char **pathspec, char *seen, int specs);
 extern const char *treat_gitlink(const char *path);
 extern void treat_gitlinks(const char **pathspec);
 extern void validate_path(const char *path, const char *prefix);
 extern const char **validate_pathspec(const char **argv, const char *prefix);
+
+#endif /* PATHSPEC_H */
diff --git a/setup.c b/setup.c
index 03d6d5c..9570147 100644
--- a/setup.c
+++ b/setup.c
@@ -250,8 +250,12 @@ static const char *prefix_pathspec(const char *prefix, int prefixlen, const char
 }
 
 /*
- * prefix - a path relative to the root of the working tree
- * pathspec - a list of paths underneath the prefix path
+ * N.B. get_pathspec() is deprecated in favor of the "struct pathspec"
+ * based interface - see pathspec_magic above.
+ *
+ * Arguments:
+ *  - prefix - a path relative to the root of the working tree
+ *  - pathspec - a list of paths underneath the prefix path
  *
  * Iterates over pathspec, prepending each path with prefix,
  * and return the resulting list.
diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index 3937ca4..8780b1e 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -39,7 +39,7 @@ test_stderr () {
 
 stderr_contains () {
 	regexp="$1"
-	if grep -q "$regexp" "$HOME/stderr"
+	if grep "$regexp" "$HOME/stderr"
 	then
 		return 0
 	else
@@ -86,10 +86,10 @@ test_expect_success_multi () {
 
 	expect=$( echo "$expect_verbose" | sed -e 's/.*	//' )
 
-	test_expect_success $prereq "$testname" "
-		expect '$expect' &&
-		$code
-	"
+	test_expect_success $prereq "$testname" '
+		expect "$expect" &&
+		eval "$code"
+	'
 
 	for quiet_opt in '-q' '--quiet'
 	do
@@ -164,6 +164,15 @@ test_expect_success_multi 'empty command line' '' '
 	stderr_contains "fatal: no path specified"
 '
 
+test_expect_success_multi '--stdin with empty STDIN' '' '
+	test_check_ignore "--stdin" 1 </dev/null &&
+	if test -n "$quiet_opt"; then
+		test_stderr ""
+	else
+		test_stderr "no pathspec given."
+	fi
+'
+
 test_expect_success '-q with multiple args' '
 	expect "" &&
 	test_check_ignore "-q one two" 128 &&

^ permalink raw reply related

* Re: [PATCH v3 18/19] setup.c: document get_pathspec()
From: Junio C Hamano @ 2012-12-29  1:36 UTC (permalink / raw)
  To: Adam Spiers; +Cc: git list
In-Reply-To: <CAOkDyE89fm5_z2V=VW5n+8XAvB6tE+DqciXttbhX29X8mWjXTQ@mail.gmail.com>

Adam Spiers <git@adamspiers.org> writes:

> I've added this sentence to the top of the comments above
> get_pathspec():
>
>     /*
>      * N.B. get_pathspec() is deprecated in favor of the "struct pathspec"
>      * based interface - see pathspec_magic above.
>      *
>     [...]
>
> That should be sufficient to discourage people from adding new users
> of get_pathspec().

Yeah, that sounds sensible.

Thanks, and happy new year to you ;-)

^ permalink raw reply

* Re: [PATCH 0/4] pre-push hook support
From: Junio C Hamano @ 2012-12-29  2:01 UTC (permalink / raw)
  To: Aaron Schrab; +Cc: git
In-Reply-To: <1356735452-21667-1-git-send-email-aaron@schrab.com>

Aaron Schrab <aaron@schrab.com> writes:

> There have been at least a couple of submissions to add support for a
> pre-push hook, which were rejected at least partially because they didn't
> provide enough information to a hook script for it to determine what was
> to be pushed any better than a separate wrapper around the 'git push'
> command would be able to do.  In this series I attempt to address that
> problem.
>
> The first two patches in this series do a little bit of refactoring in
> order to make it easier to call hooks with a variable number of arguments.
>
> The third patch actually adds support for calling a pre-push hook.  If it
> exists, it will be called with the name and URL of the destination remote
> (if a named remote isn't being used, the URL will be supplied for both)
> followed by another argument for each ref being pushed; these arguments
> take the form:
>
>   <local ref>:<local sha1>:<remote ref>:<remote sha1>

One lesson we learned long time ago while doing hooks is to avoid
unbound number of command line arguments and instead feed them from
the standard input.  I think this should do the same.

> This should provide enough information for a script to easily determine
> the set of commits that is being pushed, and thus make a decision if that
> should be allowed.

How does the hook communicate its decision to the calling Git?

Will it be "all-or-none", or "I'll allow these but not those"?

^ permalink raw reply

* Re: [PATCH 1/4] hooks: Add function to check if a hook exists
From: Junio C Hamano @ 2012-12-29  2:08 UTC (permalink / raw)
  To: Aaron Schrab; +Cc: git
In-Reply-To: <1356735452-21667-2-git-send-email-aaron@schrab.com>

Aaron Schrab <aaron@schrab.com> writes:

> Create find_hook() function to determine if a given hook exists and is
> executable.  If it is the path to the script will be returned, otherwise
> NULL is returned.

Sounds like a sensible thing to do.  To make sure the API is also
sensible, all the existing hooks should be updated to use this API,
no?

> This is in support for an upcoming run_hook_argv() function which will
> expect the full path to the hook script as the first element in the
> argv_array.  

There is currently a public function called run_hook() that squats
on the good name with a kludgy API that is too specific to using
separate index file.  Back when it was a private helper in the
implementation of "git commit", it was perfectly fine, but it was
exported without giving much thought on the API.

If you are introducing a new run_hook_* function, give it a generic
enough API that lets all the existing hook callers to use it.  I
would imagine that the API requirement may be modelled after
run_command() API so that we can pass argv[] and tweak the hook's
environ[], as well as feeding its stdin and possibly reading from
its stdout.  That would be very useful.

^ permalink raw reply

* Re: [PATCH v3 19/19] Add git-check-ignore sub-command
From: Adam Spiers @ 2012-12-29  3:32 UTC (permalink / raw)
  To: git list
In-Reply-To: <20121229012352.GA20379@pacific.linksys.moosehall>

On Sat, Dec 29, 2012 at 01:23:52AM +0000, Adam Spiers wrote:
> FYI, attached is the diff between check-ignore-v3 and my current
> check-ignore, which is available at github:
> 
>     https://github.com/aspiers/git/commits/check-ignore

[snipped]

> diff --git a/pathspec.c b/pathspec.c
> index 6724121..3789b14 100644
> --- a/pathspec.c
> +++ b/pathspec.c

[snipped]

> -void validate_path(const char *path, const char *prefix)
> +void die_if_path_beyond_symlink(const char *path, const char *prefix)

[snipped]

> diff --git a/pathspec.h b/pathspec.h
> index c251441..1961b19 100644
> --- a/pathspec.h
> +++ b/pathspec.h
> @@ -1,6 +1,11 @@
> +#ifndef PATHSPEC_H
> +#define PATHSPEC_H
> +
>  extern char *find_used_pathspec(const char **pathspec);
>  extern void fill_pathspec_matches(const char **pathspec, char *seen, int specs);
>  extern const char *treat_gitlink(const char *path);
>  extern void treat_gitlinks(const char **pathspec);
>  extern void validate_path(const char *path, const char *prefix);
               ^^^^^^^^^^^^^
I forgot to rename this one.  Will be fixed in v4.

^ permalink raw reply

* Re: [PATCH] Use longer alias names in subdirectory tests
From: Junio C Hamano @ 2012-12-29  3:42 UTC (permalink / raw)
  To: Aaron Schrab; +Cc: git
In-Reply-To: <1356735786-24001-1-git-send-email-aaron@schrab.com>

Aaron Schrab <aaron@schrab.com> writes:

> When testing aliases in t/t1020-subdirectory.sh use longer names so that
> they're less likely to conflict with a git-* command somewhere in the
> $PATH.

Thanks.

In the longer term we might want to rethink the way we run the tests
so that random $PATH the user has has less chance of interacting
with our tests (we had a similar topic around completion output
recently), but until that happens, I think this is a good change.

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Nguyen Thai Ngoc Duy @ 2012-12-29  4:34 UTC (permalink / raw)
  To: Jeff King; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229004104.GA24828@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 7:41 AM, Jeff King <peff@peff.net> wrote:
> I wonder if we could do even better, though. For a traversal, we only
> need to look at the commit header. We could potentially do a progressive
> inflate and stop before getting to the commit message (which is the bulk
> of the data, and the part that is most likely to benefit from
> compression).

Commit cache should solve this efficiently as it also eliminates
parsing cost. We discussed this last time as a side topic of the
reachability bitmap feature.
-- 
Duy

^ permalink raw reply

* Re: [PATCH] Remove the suggestion to use parsecvs, which is currently broken.
From: Eric S. Raymond @ 2012-12-29  4:42 UTC (permalink / raw)
  To: Heiko Voigt; +Cc: git
In-Reply-To: <20121228230149.GA3575@book-mint>

Heiko Voigt <hvoigt@hvoigt.net>:
> Maybe you could add that information to the parsecvs compile
> instructions? I think just because it takes some effort to compile does
> not justify to remove this useful pointer here. When I was converting a
> legacy cvs repository this pointer would have helped me a lot.

I'm parsecvs's maintainer now.  It's not in good shape; there is at
least one other known showstopper besides the build issue.  I would
strongly prefer to direct peoples' attention away from it until I
have time to fix it and cut a release.  This is not a distant 
prospect - two or three weeks out, maybe.

The priority that is between me and fixing parsecvs is getting (a)
cvsps and git-cvsimport to a non-broken state, and (b) having a sound
test suite in place so I *know* it's in a non-broken state. As previously
discussed, I will then apply that test suite to parsecvs.

Heiko, you can speed up the process by (a) adapting your tests for
the new cvsps test code, and (b) merging the fix you wrote so cvsps
would pass the t9603 test.  

The sooner I can get that out of the way, the sooner I will be avble
to pay serious attention to parsecvs.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-29  5:07 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <CACsJy8D_E0shqJAvZH7xqij6F4a6qUxkUPNcZL=0yX5w9bLd_g@mail.gmail.com>

On Sat, Dec 29, 2012 at 11:34:09AM +0700, Nguyen Thai Ngoc Duy wrote:

> On Sat, Dec 29, 2012 at 7:41 AM, Jeff King <peff@peff.net> wrote:
> > I wonder if we could do even better, though. For a traversal, we only
> > need to look at the commit header. We could potentially do a progressive
> > inflate and stop before getting to the commit message (which is the bulk
> > of the data, and the part that is most likely to benefit from
> > compression).
> 
> Commit cache should solve this efficiently as it also eliminates
> parsing cost. We discussed this last time as a side topic of the
> reachability bitmap feature.

I agree that a commit cache would solve this (though it can not help the
tree traversal). But just dropping the compression (or doing partial
decompression when we only care about the beginning part) is way less
code and complexity. There's no cache to manage.

-Peff

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Nguyen Thai Ngoc Duy @ 2012-12-29  5:25 UTC (permalink / raw)
  To: Jeff King; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229050707.GA14475@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 12:07 PM, Jeff King <peff@peff.net> wrote:
> On Sat, Dec 29, 2012 at 11:34:09AM +0700, Nguyen Thai Ngoc Duy wrote:
>
>> On Sat, Dec 29, 2012 at 7:41 AM, Jeff King <peff@peff.net> wrote:
>> > I wonder if we could do even better, though. For a traversal, we only
>> > need to look at the commit header. We could potentially do a progressive
>> > inflate and stop before getting to the commit message (which is the bulk
>> > of the data, and the part that is most likely to benefit from
>> > compression).
>>
>> Commit cache should solve this efficiently as it also eliminates
>> parsing cost. We discussed this last time as a side topic of the
>> reachability bitmap feature.
>
> I agree that a commit cache would solve this (though it can not help the
> tree traversal).

Yeah, caching trees efficiently is not easy.

> But just dropping the compression (or doing partial
> decompression when we only care about the beginning part) is way less
> code and complexity.

I think I tried the partial decompression for commit header and it did
not help much (or I misremember it, not so sure).

> There's no cache to manage.

If reachability bitmap is implemented, we'll have per-pack cache
infrastructure ready, so less work there for commit cache.
-- 
Duy

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-29  5:27 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <CACsJy8AN3y_4wcZ_w0zz+ZAaDasRT-+h8vA_fp2j4+FL00dbLw@mail.gmail.com>

On Sat, Dec 29, 2012 at 12:25:04PM +0700, Nguyen Thai Ngoc Duy wrote:

> > But just dropping the compression (or doing partial
> > decompression when we only care about the beginning part) is way less
> > code and complexity.
> 
> I think I tried the partial decompression for commit header and it did
> not help much (or I misremember it, not so sure).

I'll see if I can dig up the reference, as it was something I was going
to look at next.

> > There's no cache to manage.
> 
> If reachability bitmap is implemented, we'll have per-pack cache
> infrastructure ready, so less work there for commit cache.

True. I don't want to dissuade you from doing any commit cache work. I
only wanted to point out that this alternative may have merit because of
its simplicity (so we can use it until a caching solution exists, or
even after, if managing the cache has downsides).

-Peff

^ permalink raw reply

* Re: [PATCH] refs: do not use cached refs in repack_without_ref
From: Jeff King @ 2012-12-29  7:16 UTC (permalink / raw)
  To: Michael Haggerty; +Cc: git, Junio C Hamano
In-Reply-To: <50DAB447.8000101@alum.mit.edu>

On Wed, Dec 26, 2012 at 09:24:39AM +0100, Michael Haggerty wrote:

> I'm sorry to take so long to respond to this patch.  Thank you for
> tracking down this bug and for your careful analysis.
> 
> I think your patch is correct and should fix the first race condition
> that you described.

Thanks for checking it over. I almost cc'd you, as I know you have been
working on ref caching. But I think that the problem is much older, as
we always cached the packed-refs list in memory.

> But I think the continued existence of the other race conditions is an
> indication of a fundamental problem with the reference locking
> policy--independent of the in-RAM reference cache.
> 
> The tacit assumption of the current locking policy is that changes to
> the packed-refs file are not critical for correctness, because loose
> references take precedence over it anyway.  This is true for adding and
> modifying references.  But it is not true for deleting references,
> because there is no way for a deletion to be represented as a loose
> reference in a way that takes precedence over the packed-refs file
> (i.e., there is no way for a loose reference to say "I am deleted,
> regardless of what packed-refs says").  Thus the race conditions for
> deleting references, whether via delete_ref() or via pack_refs() with
> --prune.

Yeah. It would be much nicer if we could just write the null sha1 or a
similar sentinel into the loose ref for "I am deleted". The problem
(besides backwards compatibility) is the usual D/F conflict. I cannot
delete "refs/heads/foo" and then create "refs/heads/foo/bar" if the old
ref file is in the way.

> There is a problem if two processes try to delete a reference at the
> same time, or if one process tries to delete a reference at the same
> time as another process is trying to pack the references.  The reason is
> that there is no "transaction" that spans both the rewriting of the
> packed-refs file and also the deletion of the loose-ref files, and
> therefore it is possible for conflicting changes to be made in the two
> locations.

Just to be clear, are you talking about the races I wrote about in my
other email? Or are there other races? I didn't (and still don't) see
any actual on-disk data loss races. Just ones where a reader may get an
old, packed value (which is still bad, but is less bad than one where a
write is lost).

> I think that all of the problems would be fixed if a lock would be held
> on the packed-refs file during the whole process of deleting any
> reference; i.e., change the algorithms to:

Yeah, I looked at that, too. In fact, before I had correctly analyzed
the problem, I thought that doing so would solve the problem I was
seeing (which I noticed was wrong when it did not pass my tests :) ).

>From your description, I think you realize this, but I want to point out
for other readers: just updating the pack_refs side to call prune_refs
under the lock would create a deadlock with a simultaneous delete (which
takes the individual ref lock first, then the packed-refs lock). Of
course, I do not think git is capable of deadlock, as we typically just
die() instead of blocking on a lock. So maybe it does not matter so
much. :)

> * Delete reference foo:
> 
>   1. Acquire the lock $GIT_DIR/packed-refs.lock (regardless of whether
>      "foo" is a packed ref)

This suffers from the same problem I mentioned in my earlier email: we
create lock contention on packed-refs.lock when there are two
simultaneous deletes, even when the refs aren't packed. That might be an
acceptable tradeoff, though. It's only for deletion, not for update,
which is presumably rare. And it has to happen anyway when both refs are
packed.

>   2. Write to $GIT_DIR/packed-refs.new a version of the packed-refs
>      file that omits "foo"
> [...]

All of the further steps make sense. By deleting from packed-refs first,
we eliminate the read race-condition I mentioned in my earlier email.
The only downside is the possible increased lock contention on
packed-refs.lock.  I'm very tempted to go this route. It's better to be
correct and sometimes die on lock contention than to sometimes give the
wrong answer.

> I would appreciate a critique of my analysis.  Even if you agree, I
> think it would be OK to apply Peff's patch to fix up the most pressing
> problem, then implement the more complete solution later.

I think your analysis is correct, modulo the issue I mentioned. And I
agree that my patch is a good interim, as it fixes the worst case
(losing writes).

-Peff

^ permalink raw reply

* Re: [PATCH v2] wt-status: Show ignored files in untracked dirs
From: Jeff King @ 2012-12-29  7:22 UTC (permalink / raw)
  To: Antoine Pelisse; +Cc: Junio C Hamano, git
In-Reply-To: <CALWbr2yNR=K3MqBVe=sfSxPaJ+-A8utHBqoiHPHPLxr_9e9SVQ@mail.gmail.com>

On Fri, Dec 28, 2012 at 03:05:30PM +0100, Antoine Pelisse wrote:

> Using the example from Michael's mail, I end up having this:
> $ git status --porcelain --ignored
> ?? .gitignore
> ?? x
> ?? y/
> !! x.ignore-me
> !! y/
> 
> y/ is referred as untracked, because it contains untracked files, and
> then as ignored because it
> contains ignored files.
> 
> Showing it twice doesn't feel right though, so I guess we should only
> show "?? y/" with untracked=normal,
> and "!! y/foo.ignore-me" when using untracked=all
> 
> What do you think ?

Good catch. I agree that showing just "?? y/" in the untracked=normal
case makes sense. It makes the definition of "!!" to mean "all untracked
files in this path are ignored". IOW, showing "??" means there are one
or more untracked, unignored files. There may _also_ be ignored files,
but we do not say (nor we even necessarily need to bother checking).

In retrospect, I think it might have made more sense to use the second
character of an untracked line to represent "ignored". That is, the
output:

  ?? .gitignore
  ?? x
  ?! y/
  !! x.ignore-me

would make sense to me. But that would be a backwards-incompatible
change at this point, and I don't think it's worth it.

-Peff

^ permalink raw reply

* Re: Lockless Refs?  (Was [PATCH] refs: do not use cached refs in repack_without_ref)
From: Jeff King @ 2012-12-29  8:10 UTC (permalink / raw)
  To: Martin Fick; +Cc: Michael Haggerty, git, Junio C Hamano
In-Reply-To: <201212271611.52203.mfick@codeaurora.org>

On Thu, Dec 27, 2012 at 04:11:51PM -0700, Martin Fick wrote:

> For a single user repo this is not a big deal, the lock can 
> always be cleaned up manually (and it is a rare occurrence).  
> However, in a multi user server environment, possibly even 
> from multiple hosts over a shared filesystem such as NFS, 
> stale locks could lead to serious downtime and risky recovery 
> (since it is currently hard to figure out if a lock really is 
> stale).  Even though stale locks are probably rare even today 
> in the larger shared repo case, as git scales to even larger 
> shared repositories, this will eventually become more of a 
> problem *1.  Naturally, this has me thinking that git should 
> possibly consider moving towards a lockless design for refs 
> in the long term.

FWIW, I am involved in cleaning up stale locks for a very large git
hosting site. It actually happens surprisingly little. I think it is
mostly because git holds actual locks for a very short period of time
(just enough to check that the value is unchanged from when we started a
lengthy operation, and then atomically write the new value).

So I agree it would be cool (and maybe open up new realms of
scalability) for git to be lockless, but in my experience, this isn't
that pressing a problem (and any solutions are not going to be backwards
compatible, so there is going to be a high deployment cost).

> My idea is based on using filenames to store sha1s instead of 
> file contents.  To do this, the sha1 one of a ref would be 
> stored in a file in a directory named after the loose ref.  I 
> believe this would then make it possible to have lockless 
> atomic ref updates by renaming the file.
> 
> To more fully illustrate the idea, imagine that any file 
> (except for the null file) in the directory will represent the 
> value of the ref with its name, then the following 
> transitions can represent atomic state changes to a refs 
> value and existence:

Hmm. So basically you are relying on atomic rename() to move the value
around within a directory, rather than using write to move it around
within a file. Atomic rename is usually something we have on local
filesystems (and I think we rely on it elsewhere). Though I would not be
surprised if it is not atomic on all networked filesystems (though it is
on NFS, at least).

> 1) To update the value from a known value to a new value 
> atomically, simply rename the file to the new value.  This 
> operation should only succeed if the file exists and is still 
> named old value before the rename.  This should even be 
> faster than today's approach, especially on remote filesystems 
> since it would require only 1 round trip in the success case 
> instead of 3!

OK. Makes sense.

> 2) To delete the ref, simply delete the filename representing 
> the current value of the ref.  This ensures that you are 
> deleting the ref from a specific value.  I am not sure if git 
> needs to be able to delete refs without knowing their values?  
> If so, this would require reading the value and looping until 
> the delete succeeds, this may be a bit slow for a constantly 
> updated ref, but likely a rare situation (and not likely 
> worse than trying to acquire the ref-lock today).  Overall, 
> this again would likely be faster than today's approach.

We do sometimes delete without knowing the value. In most cases we would
not want to do this, but for some "force"-type commands, we do. You
would actually have the same problem with updating above, as we
sometimes update with the intent to overwrite whatever is there.

> 3) To create a ref, it must be renamed from the null file (sha 
> 0000...) to the new value just as if it were being updated 
> from any other value, but there is one extra condition: 
> before renaming the null file, a full directory scan must be 
> done to ensure that the null file is the only file in the 
> directory (this condition exists because creating the 
> directory and null file cannot be atomic unless the filesystem 
> supports atomic directory renames, an expectation git does 
> not currently make).  I am not sure how this compares to 
> today's approach, but including the setup costs (described 
> below), I suspect it is slower.

Hmm. mkdir is atomic. So wouldn't it be sufficient to just mkdir and
create the correct sha1 file?  A simultaneous creator would fail on the
mkdir and abort. A simultaneous reader might see the directory, but it
would either see it as empty, or with the correct file. In the former
case, it would treat that the same as if the directory did not exist.

Speaking of which, you did not cover reading at all, but it would have
to be:

  dh = opendir(ref);
  if (!dh) {
          if (errno == ENOENT)
                  return 0; /* no such ref */
          else
                  return error("couldn't read ref");
  }

  while ((ent = readdir(dh)) {
          if (ent->d_name[0] == '.')
                  /*
                   * skip "." and "..", and leave room for annotating 
                   * refs via dot-files
                   */
                   continue;
          /* otherwise, we found it */
          if (get_sha1_hex(ent->d_name, sha1) < 0)
                  return error("weird junk in ref dir?");
          return 1; /* found it */
  }
  return 0; /* did not contain an entry; ref being created? Retry? */

Is readdir actually atomic with respect to directory updates? That is,
if I am calling readdir() and somebody else is renaming, what do I get?
POSIX says:

   If a file is removed from or added to the directory after the most
   recent call to opendir() or rewinddir(), whether a subsequent call to
   readdir() returns an entry for that file is unspecified.

If I get one or the other file (that is, the old name or the new one),
it is OK. It does not matter which, as it is a race whether I see the
old value or the new one during an update. But according to POSIX, it is
possible that I may see neither.

I suppose we could rewinddir() and retry. We might hit the race again
(if somebody else is updating quickly), but realistically, this will
happen very infrequently, and we can just keep trying until we win the
race and get a valid read.

> I don't know how this new scheme could be made to work with 
> the current scheme, it seems like perhaps new git releases 
> could be made to understand both the old and the new, and a 
> config option could be used to tell it which method to write 
> new refs with.  Since in this new scheme ref directory names 
> would conflict with old ref filenames, this would likely 
> prevent both schemes from erroneously being used 
> simultaneously (so they shouldn't corrupt each other), except 
> for the fact that refs can be nested in directories which 
> confuses things a bit.  I am not sure what a good solution to 
> this is?

I think you would need to bump core.repositoryformatversion, and just
never let old versions of git access the repository directly. Not the
end of the world, but it certainly increases deployment effort. If we
were going to do that, it would probably make sense to think about
solving the D/F conflict issues at the same time (i.e., start calling
"refs/heads/foo" in the filesystem "refs.d/heads.d/foo.ref" so that it
cannot conflict with "refs.d/heads.d/foo.d/bar.ref").

-Peff

^ permalink raw reply

* Re: Lockless Refs?  (Was [PATCH] refs: do not use cached refs in repack_without_ref)
From: Jeff King @ 2012-12-29  8:12 UTC (permalink / raw)
  To: Martin Fick; +Cc: Michael Haggerty, git, Junio C Hamano, Shawn Pearce
In-Reply-To: <201212280750.14695.mfick@codeaurora.org>

On Fri, Dec 28, 2012 at 07:50:14AM -0700, Martin Fick wrote:

> Hmm, actually I believe that with a small modification to the 
> semantics described here it would be possible to make multi 
> repo/branch commits work.   Simply allow the ref filename to 
> be locked by a transaction by appending the transaction ID to 
> the filename.  So if transaction 123 wants to lock master 
> which points currently to abcde, then it will move 
> master/abcde to master/abcde_123.  If transaction 123 is 
> designed so that any process can commit/complete/abort it 
> without requiring any locks which can go stale, then this ref 
> lock will never go stale either (easy as long as it writes 
> all its proposed updates somewhere upfront and has atomic 
> semantics for starting, committing and aborting).  On commit, 
> the ref lock gets updated to its new value: master/newsha and 
> on abort it gets unlocked: master/abcde.

Hmm. I thought our goal was to avoid locks? Isn't this just locking by
another name?

I guess your point is to have no locks in the "normal" case, and have
locked transactions as an optional add-on?

-Peff

^ permalink raw reply

* Re: Lockless Refs?
From: Jeff King @ 2012-12-29  8:16 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Martin Fick, Michael Haggerty, git, Shawn Pearce
In-Reply-To: <7vhan6jdx3.fsf@alter.siamese.dyndns.org>

On Fri, Dec 28, 2012 at 09:15:52AM -0800, Junio C Hamano wrote:

> Martin Fick <mfick@codeaurora.org> writes:
> 
> > Hmm, actually I believe that with a small modification to the 
> > semantics described here it would be possible to make multi 
> > repo/branch commits work....
> >
> > Shawn talked about adding multi repo/branch transaction 
> > semantics to jgit, this might be something that git wants to 
> > support also at some point?
> 
> Shawn may have talked about it and you may have listened to it, but
> others wouldn't have any idea what kind of "multi repo/branch
> transaction" you are talking about.  Is it about "I want to push
> this ref to that repo and push this other ref to that other repo",
> in what situation will it be used/useful, what are the failure
> modes, what are failure tolerances by the expected use cases, ...?
> 
> Care to explain?

I cannot speak for Martin, but I am assuming the point is to atomically
update 2 (or more) refs on the same repo. That is, if I have a branch
"refs/heads/foo" and a ref pointing to meta-information (say, notes
about commits in foo, in "refs/notes/meta/foo"), I would want to "git
push" them, and only update them if _both_ will succeed, and otherwise
fail and update nothing.

I think Shawn mentioned this at the last GitTogether as a stumbling
block for pushing more of Gerrit's meta-information as refs over the git
protocol. But I might be mis-remembering.

-Peff

^ permalink raw reply

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-29  9:05 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229052747.GA14928@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 12:27:47AM -0500, Jeff King wrote:

> > I think I tried the partial decompression for commit header and it did
> > not help much (or I misremember it, not so sure).
> 
> I'll see if I can dig up the reference, as it was something I was going
> to look at next.

I tried the simple patch below, but it actually made things slower!  I'm
assuming it is because the streaming setup is not micro-optimized very
well. A custom read_sha1_until_blank_line() could probably do better.

diff --git a/commit.c b/commit.c
index e8eb0ae..efd6c06 100644
--- a/commit.c
+++ b/commit.c
@@ -8,6 +8,7 @@
 #include "notes.h"
 #include "gpg-interface.h"
 #include "mergesort.h"
+#include "streaming.h"
 
 static struct commit_extra_header *read_commit_extra_header_lines(const char *buf, size_t len, const char **);
 
@@ -306,6 +307,39 @@ int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long s
 	return 0;
 }
 
+static void *read_commit_header(const unsigned char *sha1,
+				enum object_type *type,
+				unsigned long *size)
+{
+	static const int chunk_size = 256;
+	struct strbuf buf = STRBUF_INIT;
+	struct git_istream *st;
+
+	st = open_istream(sha1, type, size, NULL);
+	if (!st)
+		return NULL;
+	while (1) {
+		size_t start = buf.len;
+		ssize_t readlen;
+
+		strbuf_grow(&buf, chunk_size);
+		readlen = read_istream(st, buf.buf + start, chunk_size);
+		buf.buf[start + readlen + 1] = '\0';
+		buf.len += readlen;
+
+		if (readlen < 0) {
+			close_istream(st);
+			strbuf_release(&buf);
+			return NULL;
+		}
+		if (!readlen || strstr(buf.buf + start, "\n\n"))
+			break;
+	}
+
+	close_istream(st);
+	return strbuf_detach(&buf, size);
+}
+
 int parse_commit(struct commit *item)
 {
 	enum object_type type;
@@ -317,7 +351,11 @@ int parse_commit(struct commit *item)
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	buffer = read_sha1_file(item->object.sha1, &type, &size);
+
+	if (!save_commit_buffer)
+		buffer = read_commit_header(item->object.sha1, &type, &size);
+	else
+		buffer = read_sha1_file(item->object.sha1, &type, &size);
 	if (!buffer)
 		return error("Could not read %s",
 			     sha1_to_hex(item->object.sha1));

^ permalink raw reply related

* Re: [RFC] pack-objects: compression level for non-blobs
From: Jeff King @ 2012-12-29  9:48 UTC (permalink / raw)
  To: Nguyen Thai Ngoc Duy; +Cc: David Michael Barr, Git Mailing List
In-Reply-To: <20121229090558.GA31291@sigill.intra.peff.net>

On Sat, Dec 29, 2012 at 04:05:58AM -0500, Jeff King wrote:

> On Sat, Dec 29, 2012 at 12:27:47AM -0500, Jeff King wrote:
> 
> > > I think I tried the partial decompression for commit header and it did
> > > not help much (or I misremember it, not so sure).
> > 
> > I'll see if I can dig up the reference, as it was something I was going
> > to look at next.
> 
> I tried the simple patch below, but it actually made things slower!  I'm
> assuming it is because the streaming setup is not micro-optimized very
> well. A custom read_sha1_until_blank_line() could probably do better.

Something like the patch below, which does speed things up. But not
nearly as much as I'd hoped:

  [before]
  $ best-of-five git rev-list --count --all
  real    0m4.197s
  user    0m4.112s
  sys     0m0.072s

  [after]
  $ best-of-five git rev-list --count --all
  real    0m3.782s
  user    0m3.708s
  sys     0m0.064s

Only about a 10% speedup (versus ~75% with uncompressed commits).

---
diff --git a/cache.h b/cache.h
index 18fdd18..a494d3b 100644
--- a/cache.h
+++ b/cache.h
@@ -724,6 +724,7 @@ int offset_1st_component(const char *path);
 
 /* object replacement */
 #define READ_SHA1_FILE_REPLACE 1
+#define READ_SHA1_FILE_HEADER 2
 extern void *read_sha1_file_extended(const unsigned char *sha1, enum object_type *type, unsigned long *size, unsigned flag);
 static inline void *read_sha1_file(const unsigned char *sha1, enum object_type *type, unsigned long *size)
 {
@@ -1059,7 +1060,7 @@ extern int is_pack_valid(struct packed_git *);
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t);
 extern off_t find_pack_entry_one(const unsigned char *, struct packed_git *);
 extern int is_pack_valid(struct packed_git *);
-extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *);
+extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *, int);
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *);
diff --git a/commit.c b/commit.c
index e8eb0ae..0a088dc 100644
--- a/commit.c
+++ b/commit.c
@@ -312,12 +312,13 @@ int parse_commit(struct commit *item)
 	void *buffer;
 	unsigned long size;
 	int ret;
+	int flags = save_commit_buffer ? 0 : READ_SHA1_FILE_HEADER;
 
 	if (!item)
 		return -1;
 	if (item->object.parsed)
 		return 0;
-	buffer = read_sha1_file(item->object.sha1, &type, &size);
+	buffer = read_sha1_file_extended(item->object.sha1, &type, &size, flags);
 	if (!buffer)
 		return error("Could not read %s",
 			     sha1_to_hex(item->object.sha1));
diff --git a/fast-import.c b/fast-import.c
index c2a814e..a140d57 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1303,7 +1303,7 @@ static void *gfi_unpack_entry(
 		 */
 		p->pack_size = pack_size + 20;
 	}
-	return unpack_entry(p, oe->idx.offset, &type, sizep);
+	return unpack_entry(p, oe->idx.offset, &type, sizep, 0);
 }
 
 static const char *get_mode(const char *str, uint16_t *modep)
diff --git a/pack-check.c b/pack-check.c
index 63a595c..e4a43c0 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -116,7 +116,7 @@ static int verify_packfile(struct packed_git *p,
 					    sha1_to_hex(entries[i].sha1),
 					    p->pack_name, (uintmax_t)offset);
 		}
-		data = unpack_entry(p, entries[i].offset, &type, &size);
+		data = unpack_entry(p, entries[i].offset, &type, &size, 0);
 		if (!data)
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    sha1_to_hex(entries[i].sha1), p->pack_name,
diff --git a/sha1_file.c b/sha1_file.c
index 40b2329..1f1f31a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1469,16 +1469,19 @@ static void *unpack_sha1_file(void *map, unsigned long mapsize, enum object_type
 	return *hdr ? -1 : type_from_string(type);
 }
 
-static void *unpack_sha1_file(void *map, unsigned long mapsize, enum object_type *type, unsigned long *size, const unsigned char *sha1)
+static void *unpack_sha1_file(void *map, unsigned long mapsize, enum object_type *type, unsigned long *size, const unsigned char *sha1, int stop_at_blank)
 {
 	int ret;
 	git_zstream stream;
-	char hdr[8192];
+	char hdr[512];
 
 	ret = unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr));
 	if (ret < Z_OK || (*type = parse_sha1_header(hdr, size)) < 0)
 		return NULL;
-
+	if (stop_at_blank && strstr(hdr, "\n\n")) {
+		*size = strlen(hdr);
+		return xstrdup(hdr);
+	}
 	return unpack_sha1_rest(&stream, hdr, *size, sha1);
 }
 
@@ -1667,8 +1670,11 @@ static void *unpack_compressed_entry(struct packed_git *p,
 static void *unpack_compressed_entry(struct packed_git *p,
 				    struct pack_window **w_curs,
 				    off_t curpos,
-				    unsigned long size)
+				    unsigned long *sizep,
+				    int stop_at_blank)
 {
+	static const int chunk_size = 256;
+	unsigned long size = *sizep;
 	int st;
 	git_zstream stream;
 	unsigned char *buffer, *in;
@@ -1676,15 +1682,27 @@ static void *unpack_compressed_entry(struct packed_git *p,
 	buffer = xmallocz(size);
 	memset(&stream, 0, sizeof(stream));
 	stream.next_out = buffer;
-	stream.avail_out = size + 1;
+
+	if (stop_at_blank)
+		stream.avail_out = chunk_size;
+	else
+		stream.avail_out = size + 1;
 
 	git_inflate_init(&stream);
 	do {
 		in = use_pack(p, w_curs, curpos, &stream.avail_in);
 		stream.next_in = in;
 		st = git_inflate(&stream, Z_FINISH);
-		if (!stream.avail_out)
-			break; /* the payload is larger than it should be */
+		if (!stream.avail_out) {
+			if (!stop_at_blank)
+				break; /* the payload is larger than it should be */
+			if (memmem(buffer, chunk_size, "\n\n", 2)) {
+				git_inflate_end(&stream);
+				*sizep = chunk_size;
+				return buffer;
+			}
+			stream.avail_out = size + 1 - chunk_size;
+		}
 		curpos += stream.next_in - in;
 	} while (st == Z_OK || st == Z_BUF_ERROR);
 	git_inflate_end(&stream);
@@ -1731,7 +1749,8 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 }
 
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
-	unsigned long *base_size, enum object_type *type, int keep_cache)
+	unsigned long *base_size, enum object_type *type, int keep_cache,
+	int stop_at_blank)
 {
 	void *ret;
 	unsigned long hash = pack_entry_hash(p, base_offset);
@@ -1739,9 +1758,9 @@ static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
 
 	ret = ent->data;
 	if (!ret || ent->p != p || ent->base_offset != base_offset)
-		return unpack_entry(p, base_offset, type, base_size);
+		return unpack_entry(p, base_offset, type, base_size, stop_at_blank);
 
-	if (!keep_cache) {
+	if (!stop_at_blank && !keep_cache) {
 		ent->data = NULL;
 		ent->lru.next->prev = ent->lru.prev;
 		ent->lru.prev->next = ent->lru.next;
@@ -1810,7 +1829,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
 }
 
 static void *read_object(const unsigned char *sha1, enum object_type *type,
-			 unsigned long *size);
+			 unsigned long *size, int stop_at_blank);
 
 static void *unpack_delta_entry(struct packed_git *p,
 				struct pack_window **w_curs,
@@ -1832,7 +1851,7 @@ static void *unpack_delta_entry(struct packed_git *p,
 		return NULL;
 	}
 	unuse_pack(w_curs);
-	base = cache_or_unpack_entry(p, base_offset, &base_size, type, 0);
+	base = cache_or_unpack_entry(p, base_offset, &base_size, type, 0, 0);
 	if (!base) {
 		/*
 		 * We're probably in deep shit, but let's try to fetch
@@ -1851,12 +1870,12 @@ static void *unpack_delta_entry(struct packed_git *p,
 		      sha1_to_hex(base_sha1), (uintmax_t)base_offset,
 		      p->pack_name);
 		mark_bad_packed_object(p, base_sha1);
-		base = read_object(base_sha1, type, &base_size);
+		base = read_object(base_sha1, type, &base_size, 0);
 		if (!base)
 			return NULL;
 	}
 
-	delta_data = unpack_compressed_entry(p, w_curs, curpos, delta_size);
+	delta_data = unpack_compressed_entry(p, w_curs, curpos, &delta_size, 0);
 	if (!delta_data) {
 		error("failed to unpack compressed delta "
 		      "at offset %"PRIuMAX" from %s",
@@ -1895,7 +1914,8 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 int do_check_packed_object_crc;
 
 void *unpack_entry(struct packed_git *p, off_t obj_offset,
-		   enum object_type *type, unsigned long *sizep)
+		   enum object_type *type, unsigned long *sizep,
+		   int stop_at_blank)
 {
 	struct pack_window *w_curs = NULL;
 	off_t curpos = obj_offset;
@@ -1929,7 +1949,8 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 	case OBJ_TREE:
 	case OBJ_BLOB:
 	case OBJ_TAG:
-		data = unpack_compressed_entry(p, &w_curs, curpos, *sizep);
+		data = unpack_compressed_entry(p, &w_curs, curpos, sizep,
+					       stop_at_blank);
 		break;
 	default:
 		data = NULL;
@@ -2208,14 +2229,15 @@ static void *read_packed_sha1(const unsigned char *sha1,
 }
 
 static void *read_packed_sha1(const unsigned char *sha1,
-			      enum object_type *type, unsigned long *size)
+			      enum object_type *type, unsigned long *size,
+			      int stop_at_blank)
 {
 	struct pack_entry e;
 	void *data;
 
 	if (!find_pack_entry(sha1, &e))
 		return NULL;
-	data = cache_or_unpack_entry(e.p, e.offset, size, type, 1);
+	data = cache_or_unpack_entry(e.p, e.offset, size, type, 1, stop_at_blank);
 	if (!data) {
 		/*
 		 * We're probably in deep shit, but let's try to fetch
@@ -2226,7 +2248,7 @@ static void *read_packed_sha1(const unsigned char *sha1,
 		error("failed to read object %s at offset %"PRIuMAX" from %s",
 		      sha1_to_hex(sha1), (uintmax_t)e.offset, e.p->pack_name);
 		mark_bad_packed_object(e.p, sha1);
-		data = read_object(sha1, type, size);
+		data = read_object(sha1, type, size, stop_at_blank);
 	}
 	return data;
 }
@@ -2255,7 +2277,7 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
 }
 
 static void *read_object(const unsigned char *sha1, enum object_type *type,
-			 unsigned long *size)
+			 unsigned long *size, int stop_at_blank)
 {
 	unsigned long mapsize;
 	void *map, *buf;
@@ -2268,17 +2290,18 @@ static void *read_object(const unsigned char *sha1, enum object_type *type,
 		return xmemdupz(co->buf, co->size);
 	}
 
-	buf = read_packed_sha1(sha1, type, size);
+	buf = read_packed_sha1(sha1, type, size, stop_at_blank);
 	if (buf)
 		return buf;
 	map = map_sha1_file(sha1, &mapsize);
 	if (map) {
-		buf = unpack_sha1_file(map, mapsize, type, size, sha1);
+		buf = unpack_sha1_file(map, mapsize, type, size, sha1,
+				       stop_at_blank);
 		munmap(map, mapsize);
 		return buf;
 	}
 	reprepare_packed_git();
-	return read_packed_sha1(sha1, type, size);
+	return read_packed_sha1(sha1, type, size, stop_at_blank);
 }
 
 /*
@@ -2296,9 +2319,10 @@ void *read_sha1_file_extended(const unsigned char *sha1,
 	const struct packed_git *p;
 	const unsigned char *repl = (flag & READ_SHA1_FILE_REPLACE)
 		? lookup_replace_object(sha1) : sha1;
+	int stop_at_blank = !!(flag & READ_SHA1_FILE_HEADER);
 
 	errno = 0;
-	data = read_object(repl, type, size);
+	data = read_object(repl, type, size, stop_at_blank);
 	if (data)
 		return data;
 
@@ -2597,7 +2621,7 @@ int force_object_loose(const unsigned char *sha1, time_t mtime)
 
 	if (has_loose_object(sha1))
 		return 0;
-	buf = read_packed_sha1(sha1, &type, &len);
+	buf = read_packed_sha1(sha1, &type, &len, 0);
 	if (!buf)
 		return error("cannot read sha1_file for %s", sha1_to_hex(sha1));
 	hdrlen = sprintf(hdr, "%s %lu", typename(type), len) + 1;

^ permalink raw reply related

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jonathan Nieder @ 2012-12-29 10:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20120728150524.GB25269@sigill.intra.peff.net>

Hi Peff,

Jeff King wrote:

> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1800,6 +1800,8 @@ int write_index(struct index_state *istate, int newfd)
>  			continue;
>  		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
>  			ce_smudge_racily_clean_entry(ce);
> +		if (is_null_sha1(ce->sha1))
> +			return error("cache entry has null sha1: %s", ce->name);
>  		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
>  			return -1;

Quick heads up: this is tripping for me in the "git am --abort"
codepath:

  $ cd ~/src/linux
  $ git checkout v3.2.35
  $ git am -3sc /tmp/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch 
  Applying: ALSA: usb-audio: Fix missing autopm for MIDI input
  Using index info to reconstruct a base tree...
  Falling back to patching base and 3-way merge...
  error: Your local changes to the following files would be overwritten by merge:
	sound/usb/midi.c
  Please, commit your changes or stash them before you can merge.
  Aborting
  Failed to merge in the changes.
  Patch failed at 0001 ALSA: usb-audio: Fix missing autopm for MIDI input
  When you have resolved this problem run "git am --resolved".
  If you would prefer to skip this patch, instead run "git am --skip".
  To restore the original branch and stop patching run "git am --abort".
  $ git am --abort
  error: cache entry has null sha1: sound/usb/midi.c
  fatal: unable to write new index file
  Unstaged changes after reset:
  M       sound/usb/midi.c
  $

Reproducible using v1.8.1-rc3 and master.  Bisects to 4337b585 (do not
write null sha1s to on-disk index, 2012-07-28).  For comparison, older
gits produced

  $ git am --abort
  Unstaged changes after reset:
  M       sound/usb/midi.c

which is also not great but at least didn't involve any obviously
invalid behavior.  Known problem?

Thanks,
Jonathan

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jeff King @ 2012-12-29 10:27 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229100130.GA31497@elie.Belkin>

On Sat, Dec 29, 2012 at 02:03:46AM -0800, Jonathan Nieder wrote:

> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1800,6 +1800,8 @@ int write_index(struct index_state *istate, int newfd)
> >  			continue;
> >  		if (!ce_uptodate(ce) && is_racy_timestamp(istate, ce))
> >  			ce_smudge_racily_clean_entry(ce);
> > +		if (is_null_sha1(ce->sha1))
> > +			return error("cache entry has null sha1: %s", ce->name);
> >  		if (ce_write_entry(&c, newfd, ce, previous_name) < 0)
> >  			return -1;
> 
> Quick heads up: this is tripping for me in the "git am --abort"
> codepath:

Thanks. The intent was that this should never happen, and we are
protecting against bogus sha1s slipping into the index. So either we
have found a bug in "am --abort", or the assumption that it should never
happen was wrong. :)

>   $ cd ~/src/linux
>   $ git checkout v3.2.35
>   $ git am -3sc /tmp/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch 
>   Applying: ALSA: usb-audio: Fix missing autopm for MIDI input
>   Using index info to reconstruct a base tree...
>   Falling back to patching base and 3-way merge...
>   error: Your local changes to the following files would be overwritten by merge:
> 	sound/usb/midi.c
>   Please, commit your changes or stash them before you can merge.
>   Aborting
>   Failed to merge in the changes.
>   Patch failed at 0001 ALSA: usb-audio: Fix missing autopm for MIDI input
>   When you have resolved this problem run "git am --resolved".
>   If you would prefer to skip this patch, instead run "git am --skip".
>   To restore the original branch and stop patching run "git am --abort".
>   $ git am --abort
>   error: cache entry has null sha1: sound/usb/midi.c
>   fatal: unable to write new index file
>   Unstaged changes after reset:
>   M       sound/usb/midi.c
>   $

I can't reproduce here. I can checkout v3.2.35, and I guess that the
patch you are applying comes from f5f1654, but I don't know your
local modification to sound/usb/midi.c. If I just make a fake
modification, I get this:

  $ git checkout v3.2.35
  $ echo fake >sound/usb/midi.c
  $ git format-patch --stdout -1 f5f1654 | git am -3 ~/patch
  [same errors as you]
  $ git am --abort
  Unstaged changes after reset:
  M       sound/usb/midi.c

So I wonder if there is something in the way your working tree is
modified. Can you give more details?

> Reproducible using v1.8.1-rc3 and master.  Bisects to 4337b585 (do not
> write null sha1s to on-disk index, 2012-07-28).  For comparison, older
> gits produced
> 
>   $ git am --abort
>   Unstaged changes after reset:
>   M       sound/usb/midi.c

What does your index look like afterwards? Does it have a null sha1 in
it (check "ls-files -s")?

-Peff

^ permalink raw reply

* Re: [PATCH 2/3] do not write null sha1s to on-disk index
From: Jonathan Nieder @ 2012-12-29 10:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Jens Lehmann, Junio C Hamano
In-Reply-To: <20121229102707.GA26730@sigill.intra.peff.net>

Jeff King wrote:

> I can't reproduce here. I can checkout v3.2.35, and I guess that the
> patch you are applying comes from f5f1654, but I don't know your
> local modification to sound/usb/midi.c.

No local modification.  The unstaged change after "git am --abort" to
recover from a conflicted git am is a longstanding bug (at least a
couple of years old).

The patch creating conflicts is

	queue-3.2/alsa-usb-audio-fix-missing-autopm-for-midi-input.patch

from git://git.kernel.org/pub/scm/linux/kernel/git/bwh/linux-3.2.y-queue.git

[...]
>>   $ git am --abort
>>   Unstaged changes after reset:
>>   M       sound/usb/midi.c
>
> What does your index look like afterwards? Does it have a null sha1 in
> it (check "ls-files -s")?

$ git diff-index --abbrev HEAD
:100644 100644 eeefbce3873c... 000000000000... M	sound/usb/midi.c
$ git ls-files --abbrev -s sound/usb/midi.c
100644 eeefbce3873c 0	sound/usb/midi.c

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox