Git development
 help / color / mirror / Atom feed
* Re: [PATCHv2 4/4] submodule: add embed-git-dir function
From: Junio C Hamano @ 2016-11-30 22:18 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Duy Nguyen, Brandon Williams, Git Mailing List, Jonathan Nieder,
	Jens Lehmann, Heiko Voigt
In-Reply-To: <CAGZ79kZSAJauwBwrxf+QAhQgyu4ACn+8LrwjpFGVaUQfSzHEAg@mail.gmail.com>

Stefan Beller <sbeller@google.com> writes:

> On Wed, Nov 30, 2016 at 1:39 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Stefan Beller <sbeller@google.com> writes:
>>
>>>     git relocate-git-dir (--into-workingtree|--into-gitdir) \
>>
>> I am not sure if you meant this as a submodule-specific subcommand
>> or more general helper.  "into-workingtree" suggests to me that it
>> is submodule specific, so I'll base my response on that assumption.
>>
>> Would there ever be a situation where you already have submodule
>> repositories in the right place (according to the more modern
>> practice, to keep them in .git/modules/ of superproject) and want to
>> move them to embed them in worktrees of submodules?  I do not think
>> of any.
>
>  "Hi, I made a mistake by using submodules. I don't want to use
>   them any more, I rather want to:
>   A) make it a separate git repo again and I'll keep them in sync myself
>   B) ... "

OK, I can buy that.  Thanks.

^ permalink raw reply

* Re: [RFC/PATCH v3 00/16] Add initial experimental external ODB support
From: Junio C Hamano @ 2016-11-30 22:36 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider,
	Eric Wong, Christian Couder
In-Reply-To: <20161130210420.15982-1-chriscool@tuxfamily.org>

Christian Couder <christian.couder@gmail.com> writes:

> For now there should be one odb ref per blob. Each ref name should be
> refs/odbs/<odbname>/<sha1> where <sha1> is the sha1 of the blob stored
> in the external odb named <odbname>.
>
> These odb refs should all point to a blob that should be stored in the
> Git repository and contain information about the blob stored in the
> external odb. This information can be specific to the external odb.
> The repos can then share this information using commands like:
>
> `git fetch origin "refs/odbs/<odbname>/*:refs/odbs/<odbname>/*"`

Unless this is designed to serve only a handful of blobs, I cannot
see how this design would scale successfully.  I notice you wrote
"For now" at the beginning, but what is the envisioned way this will
evolve in the future?


^ permalink raw reply

* Re: [PATCH] difftool.c: mark a file-local symbol with static
From: Ramsay Jones @ 2016-11-30 22:37 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: Johannes Schindelin, GIT Mailing-list
In-Reply-To: <20161130212510.ihcmvig7jq44p3nx@sigill.intra.peff.net>



On 30/11/16 21:25, Jeff King wrote:
> On Wed, Nov 30, 2016 at 12:40:25PM -0800, Junio C Hamano wrote:
> 
>> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
>>
>>> [I have fixed my config.mak file now, so I don't see the warning
>>> anymore! Having -Wno-format-zero-length in DEVELOPER_CFLAGS, or
>>> not, is a separate matter.]
>>
>> I suspect that 658df95a4a ("add DEVELOPER makefile knob to check for
>> acknowledged warnings", 2016-02-25) took it from me (namely, Make
>> script in my 'todo' branch).  In turn, I added it to my set of flags
>> in order to squelch this exact warning, so...
> 
> For anybody interested in the history, we started using this when
> status_printf() got the format attribute. Relevant patch and discussion:
> 
>   http://public-inbox.org/git/20130710002328.GC19423@sigill.intra.peff.net/T/#u

Ah, thank you! I've been trying to remember this for the last hour or so ...
(I misremembered something about gettext, and went looking in the wrong
direction ;-) ).

> We went with disabling the warning because it really is wrong. It makes
> an assumption that calling a format function with an empty string
> doesn't do anything, but that's only true of the stock printf functions.
> Our custom functions _do_ have a side effect in this case.

Yes, I agree, gcc is making an unwarranted assumption.

> The other options are to have a special function for "print a warning
> (or status) line with no content". Or to teach those functions to handle
> newlines specially. We've often discussed that you should be able to do:
> 
>   warning("foo\nbar");
> 
> and have it print:
> 
>   warning: foo
>   warning: bar
> 
> That's useful in itself, and would probably make cases like this easier
> to handle, too. But it's a pretty big change. Another option would be to
> just teach formatting functions to handle a single "\n" as a synonym for
> the empty string (or even detect trailing newlines and avoid appending
> our own in that case). That would mean you could do:
> 
>   warning("\n");
> 
> to print a blank line. That's arguably more obvious about the intent to
> a reader (I say arguably because the new behavior _is_ subtle if you
> happen to know that warning() usually appends a newline).

Yes, I remember the discussion now and agree that this could
cause problems.

> Anyway. Those are all options, but I don't think there is any problem
> with sticking with warning("") for now. It is not the first part of the
> code that tickles the format-zero-length warning.

Hmm, well I have been building git for some time with the warning
enabled without problem.

[I can see a few examples of 'status_printf_ln(s, c, "%s", "");' in
git-grep output, so ...]

I am not suggesting any change here, and was just curious why it
seemed to be unnecessary now (I knew it was necessary at one time).

ATB,
Ramsay Jones


^ permalink raw reply

* Re: "git add -p ." raises an unexpected "warning: empty strings as pathspecs will be made invalid in upcoming releases. please use . instead if you meant to match all paths"
From: Junio C Hamano @ 2016-11-30 22:40 UTC (permalink / raw)
  To: git; +Cc: Emily Xie
In-Reply-To: <xmqq7f7kd3pj.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> forgot to Cc: the author of the
most relevant change to the issue, d426430e6e ("pathspec: warn on
empty strings as pathspec", 2016-06-22).

> Kevin Daudt <me@ikke.info> writes:
>
>> On Wed, Nov 30, 2016 at 12:31:49PM -0800, Peter Urda wrote:
>>> After upgrading to version 2.11.0 I am getting a warning about empty
>>> strings as pathspecs while using 'patch'
>>> 
>>> - Ran 'git add -p .' from the root of my git repository.
>>> 
>>> - I was able to normally stage my changes, but was presented with a
>>> "warning: empty strings as pathspecs will be made invalid in upcoming
>>> releases. please use . instead if you meant to match all paths"
>>> message.
>>> 
>>> - I expected no warning message since I included a "." with my original command.
>>> 
>>> I believe that I should not be seeing this warning message as I
>>> included the requested "." pathspec.
>
> Yes, this seems to be caused by pathspec.c::prefix_pathspec()
> overwriting the original pathspec "." into "".  The callchain
> looks like this:
>
>     builtin/add.c::interactive_add()
>      -> parse_pathspec()
>         passes argv[] that has "." to the caller,
>         receives pathspec whose pathspec->items[].original
> 	is supposed to point at the unmolested original,
>         but prefix_pathspec() munges "." into ""
>      -> run_add_interactive()
>         which runs "git add--interactive" with
> 	pathspec->items[].original as pathspecs
>
>
> Perhaps this would work it around, but there should be a better way
> to fix it (like, making sure that what we call "original" indeed
> stays "original").
>
>  builtin/add.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/add.c b/builtin/add.c
> index e8fb80b36e..137097192d 100644
> --- a/builtin/add.c
> +++ b/builtin/add.c
> @@ -167,9 +167,18 @@ int run_add_interactive(const char *revision, const char *patch_mode,
>  	if (revision)
>  		argv_array_push(&argv, revision);
>  	argv_array_push(&argv, "--");
> -	for (i = 0; i < pathspec->nr; i++)
> +	for (i = 0; i < pathspec->nr; i++) {
>  		/* pass original pathspec, to be re-parsed */
> +		if (!*pathspec->items[i].original) {
> +			/*
> +			 * work around a misfeature in parse_pathspecs()
> +			 * that munges "." into "".
> +			 */
> +			argv_array_push(&argv, ".");
> +			continue;
> +		}
>  		argv_array_push(&argv, pathspec->items[i].original);
> +	}
>  
>  	status = run_command_v_opt(argv.argv, RUN_GIT_CMD);
>  	argv_array_clear(&argv);
> @@ -180,7 +189,7 @@ int interactive_add(int argc, const char **argv, const char *prefix, int patch)
>  {
>  	struct pathspec pathspec;
>  
> -	parse_pathspec(&pathspec, 0,
> +	parse_pathspec(&pathspec, 0,
>  		       PATHSPEC_PREFER_FULL |
>  		       PATHSPEC_SYMLINK_LEADING_PATH |
>  		       PATHSPEC_PREFIX_ORIGIN,

^ permalink raw reply

* Re: [PATCH] difftool.c: mark a file-local symbol with static
From: Jeff King @ 2016-11-30 23:18 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Junio C Hamano, Johannes Schindelin, GIT Mailing-list
In-Reply-To: <3e6a6685-19ec-4536-4a5f-3a56e30fb530@ramsayjones.plus.com>

On Wed, Nov 30, 2016 at 10:37:40PM +0000, Ramsay Jones wrote:

> > Anyway. Those are all options, but I don't think there is any problem
> > with sticking with warning("") for now. It is not the first part of the
> > code that tickles the format-zero-length warning.
> 
> Hmm, well I have been building git for some time with the warning
> enabled without problem.
> 
> [I can see a few examples of 'status_printf_ln(s, c, "%s", "");' in
> git-grep output, so ...]
> 
> I am not suggesting any change here, and was just curious why it
> seemed to be unnecessary now (I knew it was necessary at one time).

I forgot, we ended up reversing course later and silencing them:

  http://public-inbox.org/git/20140505052117.GC6569@sigill.intra.peff.net/

By the rationale of that conversation, we should be doing:

  warning("%s", "");

here.

-Peff

^ permalink raw reply

* [RFC/PATCH v3 04/16] t0400: add 'put' command to odb-helper script
From: Christian Couder @ 2016-11-30 21:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey,
	Lars Schneider, Eric Wong, Christian Couder
In-Reply-To: <20161130210420.15982-1-chriscool@tuxfamily.org>

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0400-external-odb.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index fe85413725..0f1bb97f38 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -7,6 +7,10 @@ test_description='basic tests for external object databases'
 ALT_SOURCE="$PWD/alt-repo/.git"
 export ALT_SOURCE
 write_script odb-helper <<\EOF
+die() {
+	printf >&2 "%s\n" "$@"
+	exit 1
+}
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
 have)
@@ -16,6 +20,16 @@ have)
 get)
 	cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
 	;;
+put)
+	sha1="$2"
+	size="$3"
+	kind="$4"
+	writen=$(git hash-object -w -t "$kind" --stdin)
+	test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen '$writen'"
+	;;
+*)
+	die "unknown command '$1'"
+	;;
 esac
 EOF
 HELPER="\"$PWD\"/odb-helper"
-- 
2.11.0.rc2.37.geb49ca6


^ permalink raw reply related

* [RFC/PATCH v3 08/16] t0400: add test for external odb write support
From: Christian Couder @ 2016-11-30 21:04 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey,
	Lars Schneider, Eric Wong, Christian Couder
In-Reply-To: <20161130210420.15982-1-chriscool@tuxfamily.org>

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
 t/t0400-external-odb.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 6c6da5cf4f..3c868cad4c 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -66,4 +66,12 @@ test_expect_success 'helper can add objects to alt repo' '
 	test "$size" -eq "$alt_size"
 '
 
+test_expect_success 'commit adds objects to alt repo' '
+	test_config odb.magic.command "$HELPER" &&
+	test_commit three &&
+	hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
+	content=$(cd alt-repo && git show "$hash3") &&
+	test "$content" = "three"
+'
+
 test_done
-- 
2.11.0.rc2.37.geb49ca6


^ permalink raw reply related

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Brandon Williams @ 2016-11-30 23:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20161130195427.GA166433@google.com>

On 11/30, Brandon Williams wrote:
> On 11/29, Jeff King wrote:
> > On Tue, Nov 29, 2016 at 01:37:59AM -0500, Jeff King wrote:
> > 
> > >   2. Grep threads doing more complicated stuff that needs to take a
> > >      lock. You might try building with -fsanitize=thread to see if it
> > >      turns up anything.
> > 
> > I tried this and it didn't find anything useful. It complains about
> > multiple threads calling want_color() at the same time, which you can
> > silence with something like:
> > 
> > diff --git a/builtin/grep.c b/builtin/grep.c
> > index 2c727ef49..d48846f40 100644
> > --- a/builtin/grep.c
> > +++ b/builtin/grep.c
> > @@ -207,6 +207,12 @@ static void start_threads(struct grep_opt *opt)
> >  {
> >  	int i;
> >  
> > +	/*
> > +	 * trigger want_color() for its side effect of caching the result;
> > +	 * otherwise the threads will fight over setting the cache
> > +	 */
> > +	want_color(GIT_COLOR_AUTO);
> > +
> >  	pthread_mutex_init(&grep_mutex, NULL);
> >  	pthread_mutex_init(&grep_read_mutex, NULL);
> >  	pthread_mutex_init(&grep_attr_mutex, NULL);
> > 
> > But the problem persists even with that patch, so it is something else.
> > It may still be a threading problem; -fsanitize=thread isn't perfect. I
> > also couldn't get the stress-test to fail when compiled with it. But
> > that may simply mean that the timing of the resulting binary is changed
> > enough not to trigger the issue.
> > 
> > -Peff
> 
> With you're stress script I'm able to see the failures.  The interesting
> thing is that the entry missing is always from the non-submodule file.

So I couldn't find a race condition in the code.  I tracked the problem
to grep_source_load_file which attempts to run lstat on the file so that
it can read it into a buffer.  The lstat call fails with ENOENT (which
conveniently is skipped by the if statement which calls error_errno).  So
for some reason the file cannot be found and read into memory resulting
in nothing being grep'ed for that particular file (since the buffer is
NULL).

Maybe there is something wrong with the way I wrote the tests
themselves?  So I could try rewriting them.  Any other ideas?

-- 
Brandon Williams

^ permalink raw reply

* Re: [RFC/PATCH v3 01/16] Add initial external odb support
From: Junio C Hamano @ 2016-11-30 23:30 UTC (permalink / raw)
  To: Christian Couder
  Cc: git, Jeff King, Nguyen Thai Ngoc Duy, Mike Hommey, Lars Schneider,
	Eric Wong, Christian Couder
In-Reply-To: <20161130210420.15982-2-chriscool@tuxfamily.org>

Christian Couder <christian.couder@gmail.com> writes:

> From: Jeff King <peff@peff.net>
>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---

By the time the series loses RFC prefix, we'd need to see the above
three lines straightened out a bit more, e.g. a real message and a
more proper sign-off sequence.

> +static struct odb_helper *find_or_create_helper(const char *name, int len)
> +{
> +	struct odb_helper *o;
> +
> +	for (o = helpers; o; o = o->next)
> +		if (!strncmp(o->name, name, len) && !o->name[len])
> +			return o;
> +
> +	o = odb_helper_new(name, len);
> +	*helpers_tail = o;
> +	helpers_tail = &o->next;
> +
> +	return o;
> +}
> +
> +static int external_odb_config(const char *var, const char *value, void *data)
> +{
> +	struct odb_helper *o;
> +	const char *key, *dot;
> +
> +	if (!skip_prefix(var, "odb.", &key))
> +		return 0;
> +	dot = strrchr(key, '.');
> +	if (!dot)
> +		return 0;

Is this something Peff wrote long time ago?  I find it surprising
that he would write this without using parse_config_key().

> +struct odb_helper_cmd {
> +	struct argv_array argv;
> +	struct child_process child;
> +};
> +
> +static void prepare_helper_command(struct argv_array *argv, const char *cmd,
> +				   const char *fmt, va_list ap)
> +{
> +	struct strbuf buf = STRBUF_INIT;
> +
> +	strbuf_addstr(&buf, cmd);
> +	strbuf_addch(&buf, ' ');
> +	strbuf_vaddf(&buf, fmt, ap);
> +
> +	argv_array_push(argv, buf.buf);
> +	strbuf_release(&buf);

This concatenates the cmdname (like "get") and its parameters into a
single string (e.g. "get 454cb6bd52a4de614a3633e4f547af03d5c3b640")
and then places the resulting string as the sole element in argv[]
array, which I find somewhat unusual.  It at least deserves a
comment in front of the function that the callers are responsible to
ensure that the result of vaddf(fmt, ap) is properly shell-quoted.
Otherwise it is inviting quoting errors in the future (even if there
is no such errors in the current code and command set, i.e. a 40-hex
object name that "get" subcommand takes happens to require no
special shell-quoting consideration).

> +int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
> +			    int fd)
> +{
> +	struct odb_helper_object *obj;
> +	struct odb_helper_cmd cmd;
> +	unsigned long total_got;
> +	git_zstream stream;
> +	int zret = Z_STREAM_END;
> +	git_SHA_CTX hash;
> +	unsigned char real_sha1[20];
> +
> +	obj = odb_helper_lookup(o, sha1);
> +	if (!obj)
> +		return -1;
> +
> +	if (odb_helper_start(o, &cmd, "get %s", sha1_to_hex(sha1)) < 0)
> +		return -1;
> +
> +	memset(&stream, 0, sizeof(stream));
> +	git_inflate_init(&stream);
> +	git_SHA1_Init(&hash);
> +	total_got = 0;
> +
> +	for (;;) {
> +		unsigned char buf[4096];
> +		int r;
> +
> +		r = xread(cmd.child.out, buf, sizeof(buf));
> +		if (r < 0) {
> +			error("unable to read from odb helper '%s': %s",
> +			      o->name, strerror(errno));
> +			close(cmd.child.out);
> +			odb_helper_finish(o, &cmd);
> +			git_inflate_end(&stream);
> +			return -1;
> +		}
> +		if (r == 0)
> +			break;
> +
> +		write_or_die(fd, buf, r);
> +
> +		stream.next_in = buf;
> +		stream.avail_in = r;
> +		do {
> +			unsigned char inflated[4096];
> +			unsigned long got;
> +
> +			stream.next_out = inflated;
> +			stream.avail_out = sizeof(inflated);
> +			zret = git_inflate(&stream, Z_SYNC_FLUSH);
> +			got = sizeof(inflated) - stream.avail_out;
> +
> +			git_SHA1_Update(&hash, inflated, got);
> +			/* skip header when counting size */
> +			if (!total_got) {
> +				const unsigned char *p = memchr(inflated, '\0', got);

Does this assume that a single xread() that can result in a
short-read would not return in the middle of "header", and if so, is
that a safe assumption to make?

> +				if (p)
> +					got -= p - inflated + 1;
> +				else
> +					got = 0;
> +			}
> +			total_got += got;
> +		} while (stream.avail_in && zret == Z_OK);
> +	}
> +
> +	close(cmd.child.out);
> +	git_inflate_end(&stream);
> +	git_SHA1_Final(real_sha1, &hash);
> +	if (odb_helper_finish(o, &cmd))
> +		return -1;
> +	if (zret != Z_STREAM_END) {
> +		warning("bad zlib data from odb helper '%s' for %s",
> +			o->name, sha1_to_hex(sha1));
> +		return -1;
> +	}
> +	if (total_got != obj->size) {
> +		warning("size mismatch from odb helper '%s' for %s (%lu != %lu)",
> +			o->name, sha1_to_hex(sha1), total_got, obj->size);
> +		return -1;
> +	}
> +	if (hashcmp(real_sha1, sha1)) {
> +		warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
> +			o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
> +		return -1;
> +	}
> +
> +	return 0;
> +}

OK.  So we call the external helper with "get" command, expecting
that the helper returns the data as a zlib deflated stream, and
validate that the data matches the expected hash and the expected
size, while also saving the contents of the deflated stream to fd.
Presumably the caller opened a fd to write into a loose object?  I
do not see this code actually validating that the loose object
header, i.e. "<type> <len>\0", matches what the caller wanted to see
in obj->size and obj->type.  Shouldn't there be a check for that,
too?

I am tempted to debate myself if it is a sensible design to require
"get" to return a loose object representation, but cannot decide
without seeing the remainder of the series.  An obvious alternative
is to define the "get" request to interface with us via the raw
contents (not even deflated) and leave the deflating to us, i.e. Git
sitting on the receiving end, which would allow us to choose to
store it differently (e.g. we may want to try streaming it into its
own pack using the streaming.h API, for example).


^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Jeff King @ 2016-11-30 23:32 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git
In-Reply-To: <20161130232823.GA192901@google.com>

On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:

> So I couldn't find a race condition in the code.  I tracked the problem
> to grep_source_load_file which attempts to run lstat on the file so that
> it can read it into a buffer.  The lstat call fails with ENOENT (which
> conveniently is skipped by the if statement which calls error_errno).  So
> for some reason the file cannot be found and read into memory resulting
> in nothing being grep'ed for that particular file (since the buffer is
> NULL).

That's definitely weird. Is it possible that any of the underlying calls
from another thread are using chdir()? I think realpath() make do that
behind the scenes, and there may be others.

A full strace from a failing case would be interesting reading. In
theory we should be able to get that by running the stress script for
long enough. :)

-Peff

^ permalink raw reply

* Re: [RFC/PATCH v3 01/16] Add initial external odb support
From: Jeff King @ 2016-11-30 23:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Christian Couder, git, Nguyen Thai Ngoc Duy, Mike Hommey,
	Lars Schneider, Eric Wong, Christian Couder
In-Reply-To: <xmqqmvggbl6m.fsf@gitster.mtv.corp.google.com>

On Wed, Nov 30, 2016 at 03:30:09PM -0800, Junio C Hamano wrote:

> Christian Couder <christian.couder@gmail.com> writes:
> 
> > From: Jeff King <peff@peff.net>
> >
> > Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> > ---
> 
> By the time the series loses RFC prefix, we'd need to see the above
> three lines straightened out a bit more, e.g. a real message and a
> more proper sign-off sequence.

Actually, I would assume that this patch would go away altogether and
get folded into commits that get us closer to the end state. But I
haven't read the series carefully yet, so maybe it really does work
better with this as a base.

I am perfectly OK dropping my commit count by one and getting "based on
a patch by Peff" in a cover letter or elsewhere.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Jeff King @ 2016-11-30 23:40 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git
In-Reply-To: <20161130233204.ihbrjwwu3yiv4ugq@sigill.intra.peff.net>

On Wed, Nov 30, 2016 at 06:32:04PM -0500, Jeff King wrote:

> On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:
> 
> > So I couldn't find a race condition in the code.  I tracked the problem
> > to grep_source_load_file which attempts to run lstat on the file so that
> > it can read it into a buffer.  The lstat call fails with ENOENT (which
> > conveniently is skipped by the if statement which calls error_errno).  So
> > for some reason the file cannot be found and read into memory resulting
> > in nothing being grep'ed for that particular file (since the buffer is
> > NULL).
> 
> That's definitely weird. Is it possible that any of the underlying calls
> from another thread are using chdir()? I think realpath() make do that
> behind the scenes, and there may be others.
> 
> A full strace from a failing case would be interesting reading. In
> theory we should be able to get that by running the stress script for
> long enough. :)

Actually, it failed pretty much immediately. I guess the extra stracing
changes the timing to make the problem _more_ likely.

And indeed, I see:

20867 lstat("fi:le",  <unfinished ...>
20813 <... read resumed> "", 232)       = 0
20871 futex(0x558cdec8b164, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
20813 close(7 <unfinished ...>
20870 <... futex resumed> )             = 0
20869 lstat(".gitmodules",  <unfinished ...>
20813 <... close resumed> )             = 0
20865 set_robust_list(0x7f1df92579e0, 24 <unfinished ...>
20813 lstat("su:b/../.git/modules/su:b/commondir", 0x7ffecc8b3ac0) = -1 ENOENT (No such file or directory)
20865 <... set_robust_list resumed> )   = 0
20868 set_robust_list(0x7f1df7a549e0, 24 <unfinished ...>
20813 access("su:b/../.git/modules/su:b/objects", X_OK) = 0
20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
20813 stat("su:b/../.git/modules/su:b", {st_mode=S_IFDIR|0755, st_size=280, ...}) = 0
20813 getcwd("/var/ram/git-stress/root-4/trash directory.t7814-grep-recurse-submodules/parent", 129) = 80
20869 <... lstat resumed> {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
20813 chdir("su:b/../.git/modules/su:b") = 0
20869 open(".gitmodules", O_RDONLY <unfinished ...>
20813 getcwd( <unfinished ...>
20867 <... lstat resumed> 0x7f1df8254cf0) = -1 ENOENT (No such file or directory)

where 20813 and 20867 are two threads of the main process. One is doing
the lstat and the other calls chdir at the same moment.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Brandon Williams @ 2016-11-30 23:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20161130234056.iltitkszvccbjivp@sigill.intra.peff.net>

On 11/30, Jeff King wrote:
> On Wed, Nov 30, 2016 at 06:32:04PM -0500, Jeff King wrote:
> 
> > On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:
> > 
> > > So I couldn't find a race condition in the code.  I tracked the problem
> > > to grep_source_load_file which attempts to run lstat on the file so that
> > > it can read it into a buffer.  The lstat call fails with ENOENT (which
> > > conveniently is skipped by the if statement which calls error_errno).  So
> > > for some reason the file cannot be found and read into memory resulting
> > > in nothing being grep'ed for that particular file (since the buffer is
> > > NULL).
> > 
> > That's definitely weird. Is it possible that any of the underlying calls
> > from another thread are using chdir()? I think realpath() make do that
> > behind the scenes, and there may be others.
> > 
> > A full strace from a failing case would be interesting reading. In
> > theory we should be able to get that by running the stress script for
> > long enough. :)
> 
> Actually, it failed pretty much immediately. I guess the extra stracing
> changes the timing to make the problem _more_ likely.
> 
> And indeed, I see:
> 
> 20867 lstat("fi:le",  <unfinished ...>
> 20813 <... read resumed> "", 232)       = 0
> 20871 futex(0x558cdec8b164, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> 20813 close(7 <unfinished ...>
> 20870 <... futex resumed> )             = 0
> 20869 lstat(".gitmodules",  <unfinished ...>
> 20813 <... close resumed> )             = 0
> 20865 set_robust_list(0x7f1df92579e0, 24 <unfinished ...>
> 20813 lstat("su:b/../.git/modules/su:b/commondir", 0x7ffecc8b3ac0) = -1 ENOENT (No such file or directory)
> 20865 <... set_robust_list resumed> )   = 0
> 20868 set_robust_list(0x7f1df7a549e0, 24 <unfinished ...>
> 20813 access("su:b/../.git/modules/su:b/objects", X_OK) = 0
> 20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
> 20813 stat("su:b/../.git/modules/su:b", {st_mode=S_IFDIR|0755, st_size=280, ...}) = 0
> 20813 getcwd("/var/ram/git-stress/root-4/trash directory.t7814-grep-recurse-submodules/parent", 129) = 80
> 20869 <... lstat resumed> {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
> 20813 chdir("su:b/../.git/modules/su:b") = 0
> 20869 open(".gitmodules", O_RDONLY <unfinished ...>
> 20813 getcwd( <unfinished ...>
> 20867 <... lstat resumed> 0x7f1df8254cf0) = -1 ENOENT (No such file or directory)
> 
> where 20813 and 20867 are two threads of the main process. One is doing
> the lstat and the other calls chdir at the same moment.
> 
> -Peff

Yeah so it looks like the start_command function calls chdir.  Which
means any uses of the run-command interface are not thread safe....

For now the work around could be to just pass "-C <dir>" to the child
process instead of relying on run-command to chdir.

-- 
Brandon Williams

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Stefan Beller @ 2016-11-30 23:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Brandon Williams, Junio C Hamano, git@vger.kernel.org
In-Reply-To: <20161130234056.iltitkszvccbjivp@sigill.intra.peff.net>

On Wed, Nov 30, 2016 at 3:40 PM, Jeff King <peff@peff.net> wrote:
> On Wed, Nov 30, 2016 at 06:32:04PM -0500, Jeff King wrote:
>
>> On Wed, Nov 30, 2016 at 03:28:23PM -0800, Brandon Williams wrote:
>>
>> > So I couldn't find a race condition in the code.  I tracked the problem
>> > to grep_source_load_file which attempts to run lstat on the file so that
>> > it can read it into a buffer.  The lstat call fails with ENOENT (which
>> > conveniently is skipped by the if statement which calls error_errno).  So
>> > for some reason the file cannot be found and read into memory resulting
>> > in nothing being grep'ed for that particular file (since the buffer is
>> > NULL).
>>
>> That's definitely weird. Is it possible that any of the underlying calls
>> from another thread are using chdir()? I think realpath() make do that
>> behind the scenes, and there may be others.
>>
>> A full strace from a failing case would be interesting reading. In
>> theory we should be able to get that by running the stress script for
>> long enough. :)
>
> Actually, it failed pretty much immediately. I guess the extra stracing
> changes the timing to make the problem _more_ likely.
>
> And indeed, I see:
>
> 20867 lstat("fi:le",  <unfinished ...>
> 20813 <... read resumed> "", 232)       = 0
> 20871 futex(0x558cdec8b164, FUTEX_WAIT_PRIVATE, 7, NULL <unfinished ...>
> 20813 close(7 <unfinished ...>
> 20870 <... futex resumed> )             = 0
> 20869 lstat(".gitmodules",  <unfinished ...>
> 20813 <... close resumed> )             = 0
> 20865 set_robust_list(0x7f1df92579e0, 24 <unfinished ...>
> 20813 lstat("su:b/../.git/modules/su:b/commondir", 0x7ffecc8b3ac0) = -1 ENOENT (No such file or directory)
> 20865 <... set_robust_list resumed> )   = 0
> 20868 set_robust_list(0x7f1df7a549e0, 24 <unfinished ...>
> 20813 access("su:b/../.git/modules/su:b/objects", X_OK) = 0
> 20813 access("su:b/../.git/modules/su:b/refs", X_OK) = 0
> 20813 stat("su:b/../.git/modules/su:b", {st_mode=S_IFDIR|0755, st_size=280, ...}) = 0
> 20813 getcwd("/var/ram/git-stress/root-4/trash directory.t7814-grep-recurse-submodules/parent", 129) = 80
> 20869 <... lstat resumed> {st_mode=S_IFREG|0644, st_size=47, ...}) = 0
> 20813 chdir("su:b/../.git/modules/su:b") = 0
> 20869 open(".gitmodules", O_RDONLY <unfinished ...>
> 20813 getcwd( <unfinished ...>
> 20867 <... lstat resumed> 0x7f1df8254cf0) = -1 ENOENT (No such file or directory)
>
> where 20813 and 20867 are two threads of the main process. One is doing
> the lstat and the other calls chdir at the same moment.
>

Lessons learned here:
The run-command API is not thread safe when used with setting the directory.
If you need to run a thing in a threaded environment run
git -C <dir> ... such that the child chdirs.

Are there any other threaded environments that run things with .dir set?

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Jeff King @ 2016-11-30 23:46 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git
In-Reply-To: <20161130234248.GB192901@google.com>

On Wed, Nov 30, 2016 at 03:42:48PM -0800, Brandon Williams wrote:

> > where 20813 and 20867 are two threads of the main process. One is doing
> > the lstat and the other calls chdir at the same moment.
>
> Yeah so it looks like the start_command function calls chdir.  Which
> means any uses of the run-command interface are not thread safe....

That seems crazy. The chdir should be happening on the child side of the
fork (and looking at the code, it seems to be the case). And on the
Windows side, without fork, it's an option to the spawn call, which
makes sense.

> For now the work around could be to just pass "-C <dir>" to the child
> process instead of relying on run-command to chdir.

Yeah, that would push it after the exec. I just don't understand why
that would be necessary.

-Peff

^ permalink raw reply

* Re: [PATCH] difftool.c: mark a file-local symbol with static
From: Junio C Hamano @ 2016-11-30 23:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Ramsay Jones, Johannes Schindelin, GIT Mailing-list
In-Reply-To: <20161130231848.v5ge6otytim2t6d2@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> I forgot, we ended up reversing course later and silencing them:
>
>   http://public-inbox.org/git/20140505052117.GC6569@sigill.intra.peff.net/
>
> By the rationale of that conversation, we should be doing:
>
>   warning("%s", "");
>
> here.

I forgot too.  Thanks for digging up that thread.


^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Brandon Williams @ 2016-11-30 23:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20161130234636.6az7xfywzprpn6ly@sigill.intra.peff.net>

On 11/30, Jeff King wrote:
> On Wed, Nov 30, 2016 at 03:42:48PM -0800, Brandon Williams wrote:
> 
> > > where 20813 and 20867 are two threads of the main process. One is doing
> > > the lstat and the other calls chdir at the same moment.
> >
> > Yeah so it looks like the start_command function calls chdir.  Which
> > means any uses of the run-command interface are not thread safe....
> 
> That seems crazy. The chdir should be happening on the child side of the
> fork (and looking at the code, it seems to be the case). And on the
> Windows side, without fork, it's an option to the spawn call, which
> makes sense.
> 
> > For now the work around could be to just pass "-C <dir>" to the child
> > process instead of relying on run-command to chdir.
> 
> Yeah, that would push it after the exec. I just don't understand why
> that would be necessary.
> 
> -Peff

You're right, I jumped the gun.  That doesn't seem to fix the problem
(as I'm still seeing the same failure).


-- 
Brandon Williams

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Jeff King @ 2016-11-30 23:59 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git
In-Reply-To: <20161130234636.6az7xfywzprpn6ly@sigill.intra.peff.net>

On Wed, Nov 30, 2016 at 06:46:36PM -0500, Jeff King wrote:

> > For now the work around could be to just pass "-C <dir>" to the child
> > process instead of relying on run-command to chdir.
> 
> Yeah, that would push it after the exec. I just don't understand why
> that would be necessary.

Hmm. It still seems to fail, even with the workaround (the patch in
run-command is there to make sure there's not some other call that we're
not catching):

diff --git a/builtin/grep.c b/builtin/grep.c
index 2c727ef49..3323a3e7f 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -553,6 +553,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 	argv_array_pushf(&cp.args, "--super-prefix=%s%s/",
 			 super_prefix ? super_prefix : "",
 			 name);
+	argv_array_pushl(&cp.args, "-C", gs->path, NULL);
 	argv_array_push(&cp.args, "grep");
 
 	/*
@@ -586,7 +587,6 @@ static int grep_submodule_launch(struct grep_opt *opt,
 	}
 
 	cp.git_cmd = 1;
-	cp.dir = gs->path;
 
 	/*
 	 * Capture output to output buffer and check the return code from the
diff --git a/run-command.c b/run-command.c
index 5a4dbb66d..d040f4f77 100644
--- a/run-command.c
+++ b/run-command.c
@@ -393,6 +393,8 @@ int start_command(struct child_process *cmd)
 			close(cmd->out);
 		}
 
+		if (cmd->dir && git_env_bool("GIT_NO_CHDIR", 0))
+			die("temporarily disallowing chdir");
 		if (cmd->dir && chdir(cmd->dir))
 			die_errno("exec '%s': cd to '%s' failed", cmd->argv[0],
 			    cmd->dir);
diff --git a/t/t7814-grep-recurse-submodules.sh b/t/t7814-grep-recurse-submodules.sh
index 050777186..591ff74ed 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -175,7 +175,7 @@ test_expect_success 'grep recurse submodule colon in name' '
 	fi:le:foobar
 	su:b/fi:le:foobar
 	EOF
-	git -C parent grep -e "foobar" --recurse-submodules >actual &&
+	GIT_NO_CHDIR=1 strace -o foo.out -f git -C parent grep -e "foobar" --recurse-submodules >actual &&
 	test_cmp expect actual &&
 
 	cat >expect <<-\EOF &&


So I think there is some other chdir(). I'm not sure if there is an easy
way to get a backtrace on every call to chdir() in every thread. I'm
sure somebody more clever than me could figure out how to make gdb do it
automatically, but it might be workable manually. I think the chdir was
in the main thread.

-Peff

^ permalink raw reply related

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Jeff King @ 2016-12-01  0:04 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git
In-Reply-To: <20161130235952.4j63gkdlbobjitdb@sigill.intra.peff.net>

On Wed, Nov 30, 2016 at 06:59:52PM -0500, Jeff King wrote:

> So I think there is some other chdir(). I'm not sure if there is an easy
> way to get a backtrace on every call to chdir() in every thread. I'm
> sure somebody more clever than me could figure out how to make gdb do it
> automatically, but it might be workable manually. I think the chdir was
> in the main thread.

Ah, that turned out to be quite easy. The culprit is probably:

(gdb) bt
#0  chdir () at ../sysdeps/unix/syscall-template.S:84
#1  0x00005555555fe259 in real_path_internal (path=0x5555559f6b30 "su:b/../.git/modules/su:b", die_on_error=1)
    at abspath.c:84
#2  0x00005555555fe48a in real_path (path=0x5555559f6b30 "su:b/../.git/modules/su:b") at abspath.c:135
#3  0x00005555556d09e6 in read_gitfile_gently (path=0x5555559f6ac0 "su:b/.git", return_error_code=0x0)
    at setup.c:555
#4  0x00005555556d19cf in resolve_gitdir (suspect=0x5555559f6ac0 "su:b/.git") at setup.c:1021
#5  0x00005555556e7e34 in is_submodule_populated (path=0x5555559f5ec8 "su:b") at submodule.c:244
#6  0x00005555555a0f05 in grep_submodule (opt=0x7fffffffd8b0, sha1=0x0, filename=0x5555559f5ec8 "su:b", 
    path=0x5555559f5ec8 "su:b") at builtin/grep.c:619
#7  0x00005555555a12ac in grep_cache (opt=0x7fffffffd8b0, pathspec=0x7fffffffd880, cached=0) at builtin/grep.c:700
#8  0x00005555555a36cb in cmd_grep (argc=0, argv=0x7fffffffdf40, prefix=0x0) at builtin/grep.c:1257
#9  0x000055555556603b in run_builtin (p=0x5555559b3ad8 <commands+984>, argc=4, argv=0x7fffffffdf40) at git.c:373
#10 0x00005555555662bc in handle_builtin (argc=4, argv=0x7fffffffdf40) at git.c:572
#11 0x000055555556641a in run_argv (argcp=0x7fffffffddfc, argv=0x7fffffffddf0) at git.c:630
#12 0x00005555555665a8 in cmd_main (argc=4, argv=0x7fffffffdf40) at git.c:702
#13 0x00005555555fde47 in main (argc=7, argv=0x7fffffffdf28) at common-main.c:40

So is_submodule_populated() needs to take a lock. But what's really
gross is that the _other_ threads need to lock just to call lstat().
Presumably it could be done as a reader/writer type of lock where many
"reader" threads can take the "I need to lstat()" lock simultaneously,
but block when an "I'm going to chdir()" writer holds it.

-Peff

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Brandon Williams @ 2016-12-01  0:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20161130235952.4j63gkdlbobjitdb@sigill.intra.peff.net>

On 11/30, Jeff King wrote:
> So I think there is some other chdir(). I'm not sure if there is an easy
> way to get a backtrace on every call to chdir() in every thread. I'm
> sure somebody more clever than me could figure out how to make gdb do it
> automatically, but it might be workable manually. I think the chdir was
> in the main thread.
> 
> -Peff

Yeah maybe we're missing something else...

How did you run strace with your stress script?

-- 
Brandon Williams

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Brandon Williams @ 2016-12-01  0:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git
In-Reply-To: <20161201000437.altjlcsn4u7rwe5c@sigill.intra.peff.net>

On 11/30, Jeff King wrote:
> On Wed, Nov 30, 2016 at 06:59:52PM -0500, Jeff King wrote:
> 
> > So I think there is some other chdir(). I'm not sure if there is an easy
> > way to get a backtrace on every call to chdir() in every thread. I'm
> > sure somebody more clever than me could figure out how to make gdb do it
> > automatically, but it might be workable manually. I think the chdir was
> > in the main thread.
> 
> Ah, that turned out to be quite easy. The culprit is probably:
> 
> (gdb) bt
> #0  chdir () at ../sysdeps/unix/syscall-template.S:84
> #1  0x00005555555fe259 in real_path_internal (path=0x5555559f6b30 "su:b/../.git/modules/su:b", die_on_error=1)
>     at abspath.c:84
> #2  0x00005555555fe48a in real_path (path=0x5555559f6b30 "su:b/../.git/modules/su:b") at abspath.c:135
> #3  0x00005555556d09e6 in read_gitfile_gently (path=0x5555559f6ac0 "su:b/.git", return_error_code=0x0)
>     at setup.c:555
> #4  0x00005555556d19cf in resolve_gitdir (suspect=0x5555559f6ac0 "su:b/.git") at setup.c:1021
> #5  0x00005555556e7e34 in is_submodule_populated (path=0x5555559f5ec8 "su:b") at submodule.c:244
> #6  0x00005555555a0f05 in grep_submodule (opt=0x7fffffffd8b0, sha1=0x0, filename=0x5555559f5ec8 "su:b", 
>     path=0x5555559f5ec8 "su:b") at builtin/grep.c:619
> #7  0x00005555555a12ac in grep_cache (opt=0x7fffffffd8b0, pathspec=0x7fffffffd880, cached=0) at builtin/grep.c:700
> #8  0x00005555555a36cb in cmd_grep (argc=0, argv=0x7fffffffdf40, prefix=0x0) at builtin/grep.c:1257
> #9  0x000055555556603b in run_builtin (p=0x5555559b3ad8 <commands+984>, argc=4, argv=0x7fffffffdf40) at git.c:373
> #10 0x00005555555662bc in handle_builtin (argc=4, argv=0x7fffffffdf40) at git.c:572
> #11 0x000055555556641a in run_argv (argcp=0x7fffffffddfc, argv=0x7fffffffddf0) at git.c:630
> #12 0x00005555555665a8 in cmd_main (argc=4, argv=0x7fffffffdf40) at git.c:702
> #13 0x00005555555fde47 in main (argc=7, argv=0x7fffffffdf28) at common-main.c:40
> 
> So is_submodule_populated() needs to take a lock. But what's really
> gross is that the _other_ threads need to lock just to call lstat().
> Presumably it could be done as a reader/writer type of lock where many
> "reader" threads can take the "I need to lstat()" lock simultaneously,
> but block when an "I'm going to chdir()" writer holds it.
> 
> -Peff

Oh interesting, I wonder if there is a way to not have to perform a
chdir since taking a lock to lstat wouldn't be ideal.

Thanks for helping out with this!

-- 
Brandon Williams

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Stefan Beller @ 2016-12-01  0:14 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Jeff King, Junio C Hamano, git@vger.kernel.org
In-Reply-To: <20161201000824.GE192901@google.com>

> Oh interesting, I wonder if there is a way to not have to perform a
> chdir since taking a lock to lstat wouldn't be ideal.

I think we could rewrite is_submodule_populated to be

int is_submodule_populated_cheap_with_no_chdir(char *path)
{
    return stat(path + ".git")
}

i.e. just take the presence of the .git file/dir as a hint to run
the child process?

^ permalink raw reply

* 'git repack' and repack.writeBitmaps=true with kept packs
From: Steven Noonan @ 2016-12-01  0:15 UTC (permalink / raw)
  To: git

I have some unexpected behavior with 'git repack' on git 2.10.2 and 2.11.0.

$ cat /etc/gitconfig
[pack]
        writeBitmapHashCache = true
[repack]
        writeBitmaps = true
$ touch objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.keep
$ find objects
objects
objects/pack
objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.keep
objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.idx
objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.bitmap
objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.pack
objects/info
objects/info/packs
$ git repack -Adfl
Counting objects: 16321, done.
Delta compression using up to 24 threads.
Compressing objects: 100% (16118/16118), done.
Writing objects: 100% (16321/16321), done.
Reusing bitmaps: 110, done.
Selecting bitmap commits: 3257, done.
Building bitmaps: 100% (137/137), done.
Total 16321 (delta 11568), reused 4507 (delta 0)
$ du -sh objects/pack/pack*
100K    objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.bitmap
524K    objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.idx
512     objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.keep
4.3M    objects/pack/pack-3841d81123a96cedeb3c1bd7acf7e29bfba26639.pack
100K    objects/pack/pack-6043151bee7bd61bdae6e3a2ba0f13cd1b0277af.bitmap
524K    objects/pack/pack-6043151bee7bd61bdae6e3a2ba0f13cd1b0277af.idx
5.8M    objects/pack/pack-6043151bee7bd61bdae6e3a2ba0f13cd1b0277af.pack

It seems like it's behaving as though I've provided
--pack-kept-objects. In order to ensure the .bitmap is created, it
repacks everything, including everything in existing .pack files (not
respecting .keep). But then it's not deleting the old .pack file
(oddly, respecting .keep).

What I'd expect it to do here is ignore the 'repack.writeBitmaps =
true' value if there's a .keep that needs to be respected. Is this not
a correct assumption?

^ permalink raw reply

* Re: What's cooking in git.git (Nov 2016, #06; Mon, 28)
From: Jeff King @ 2016-12-01  0:19 UTC (permalink / raw)
  To: Brandon Williams; +Cc: Junio C Hamano, git
In-Reply-To: <20161201000605.GD192901@google.com>

On Wed, Nov 30, 2016 at 04:06:05PM -0800, Brandon Williams wrote:

> On 11/30, Jeff King wrote:
> > So I think there is some other chdir(). I'm not sure if there is an easy
> > way to get a backtrace on every call to chdir() in every thread. I'm
> > sure somebody more clever than me could figure out how to make gdb do it
> > automatically, but it might be workable manually. I think the chdir was
> > in the main thread.
> > 
> > -Peff
> 
> Yeah maybe we're missing something else...
> 
> How did you run strace with your stress script?

It's hidden in the patch I sent a moment ago, but basically just "strace
-o foo.out" will dump the trace in the trash directory. After the stress
script runs, you can "cat fail/trash*/foo.out".

> > (gdb) bt
> > #0  chdir () at ../sysdeps/unix/syscall-template.S:84
> > #1  0x00005555555fe259 in real_path_internal (path=0x5555559f6b30 "su:b/../.git/modules/su:b", die_on_error=1)
> >     at abspath.c:84
> > #2  0x00005555555fe48a in real_path (path=0x5555559f6b30 "su:b/../.git/modules/su:b") at abspath.c:135
> > #3  0x00005555556d09e6 in read_gitfile_gently (path=0x5555559f6ac0 "su:b/.git", return_error_code=0x0)
> >     at setup.c:555
> > #4  0x00005555556d19cf in resolve_gitdir (suspect=0x5555559f6ac0 "su:b/.git") at setup.c:1021
> > #5  0x00005555556e7e34 in is_submodule_populated (path=0x5555559f5ec8 "su:b") at submodule.c:244
> > #6  0x00005555555a0f05 in grep_submodule (opt=0x7fffffffd8b0, sha1=0x0, filename=0x5555559f5ec8 "su:b", 
> >     path=0x5555559f5ec8 "su:b") at builtin/grep.c:619
> > #7  0x00005555555a12ac in grep_cache (opt=0x7fffffffd8b0, pathspec=0x7fffffffd880, cached=0) at builtin/grep.c:700
> > #8  0x00005555555a36cb in cmd_grep (argc=0, argv=0x7fffffffdf40, prefix=0x0) at builtin/grep.c:1257
> > #9  0x000055555556603b in run_builtin (p=0x5555559b3ad8 <commands+984>, argc=4, argv=0x7fffffffdf40) at git.c:373
> > #10 0x00005555555662bc in handle_builtin (argc=4, argv=0x7fffffffdf40) at git.c:572
> > #11 0x000055555556641a in run_argv (argcp=0x7fffffffddfc, argv=0x7fffffffddf0) at git.c:630
> > #12 0x00005555555665a8 in cmd_main (argc=4, argv=0x7fffffffdf40) at git.c:702
> > #13 0x00005555555fde47 in main (argc=7, argv=0x7fffffffdf28) at common-main.c:40
> > 
> > So is_submodule_populated() needs to take a lock. But what's really
> > gross is that the _other_ threads need to lock just to call lstat().
> > Presumably it could be done as a reader/writer type of lock where many
> > "reader" threads can take the "I need to lstat()" lock simultaneously,
> > but block when an "I'm going to chdir()" writer holds it.
> 
> Oh interesting, I wonder if there is a way to not have to perform a
> chdir since taking a lock to lstat wouldn't be ideal.

I don't think so.  It comes from real_path(), which needs to either
chdir(), or start interpreting symbolic links itself (and madness that
way lies).

I think with a reader/writer lock as I described it wouldn't be too bad.
The common case would pay only the locking cost and not ever block,
since submodules are rare (and they're super-heavyweight to descend into
anyway).

I think putting it at the individual lstat() would be way too low, but
probably you could do it right before calling grep_source(). It may even
be possible to do some of the submodule work ahead of time while holding
grep_lock().

> Thanks for helping out with this!

I wasn't planning on it, but this turned into an intriguing puzzle. ;)

-Peff

^ permalink raw reply

* [BUG?] Remote helper's wrong assumption of unchanged ref
From: Jonathan Tan @ 2016-12-01  0:54 UTC (permalink / raw)
  To: git; +Cc: Jonathan Tan, barkalow

Commit f8ec916 ("Allow helpers to report in "list" command that the ref
is unchanged", 2009-11-17) allowed a remote helper to report that a ref
is unchanged even if it did not know the contents of a ref. However,
that commit also made Git assume that such a remote ref has the contents
of the local ref of the same name. If I'm not missing anything, this
assumption seems wrong; the attached test illustrates one case of local
edits being made after cloning with default parameters.

The original e-mail thread [1] seems to indicate that this feature is
meant for a remote helper with no Git-specific code (which is possible
if it supports "import" but not "fetch" - in this case, it would not
deal with SHA-1s at all) to nevertheless indicate "unchanged", most
likely to support optimizations on the client side.

But it seems to me that Git cannot perform this optimization. In other
words, it should just ignore "unchanged". If this makes sense, I'll
prepare a patch to do this.

[1] "[PATCH 00/13] Native and foreign helpers"
    <alpine.LNX.2.00.0908050052390.2147@iabervon.org>
---
 git-remote-testgit.sh     |  8 +++++++-
 t/t5801-remote-helpers.sh | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/git-remote-testgit.sh b/git-remote-testgit.sh
index 752c763..6357868 100755
--- a/git-remote-testgit.sh
+++ b/git-remote-testgit.sh
@@ -52,7 +52,13 @@ do
 		echo
 		;;
 	list)
-		git for-each-ref --format='? %(refname)' 'refs/heads/'
+		if test -n "$GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX"
+		then
+			git for-each-ref --format='? %(refname)' 'refs/heads/' |
+				sed "/${GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX}/s/$/ unchanged/"
+		else
+			git for-each-ref --format='? %(refname)' 'refs/heads/'
+		fi
 		head=$(git symbolic-ref HEAD)
 		echo "@$head HEAD"
 		echo
diff --git a/t/t5801-remote-helpers.sh b/t/t5801-remote-helpers.sh
index 362b158..4a48f2b 100755
--- a/t/t5801-remote-helpers.sh
+++ b/t/t5801-remote-helpers.sh
@@ -301,4 +301,44 @@ test_expect_success 'fetch url' '
 	compare_refs server HEAD local FETCH_HEAD
 '
 
+test_expect_success 'setup remote repository and divergent clone' '
+	git init s2 &&
+	(
+		cd s2 &&
+		test_commit M1 &&
+		git checkout -b mybranch &&
+		test_commit B1
+	) &&
+	git clone "testgit::${PWD}/s2" divergent &&
+
+	(
+		cd divergent &&
+		git checkout master &&
+		test_commit M2 &&
+		git checkout mybranch &&
+		test_commit B2
+	)
+'
+
+test_expect_success 'fetch with unchanged claims' '
+	rm -rf local &&
+	cp -r divergent local &&
+
+	# No unchanged branches
+
+	GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX=ABCDE git -C local fetch &&
+	compare_refs s2 M1 local refs/remotes/origin/master &&
+	compare_refs s2 B1 local refs/remotes/origin/mybranch &&
+
+	# One unchanged branch
+
+	GIT_REMOTE_TESTGIT_UNCHANGED_BRANCH_REGEX=mybranch git -C local fetch &&
+	compare_refs s2 M1 local refs/remotes/origin/master &&
+
+	# I (Jonathan Tan) would expect refs/remotes/origin/mybranch to be B1,
+	# but it is B2.
+	test_must_fail compare_refs s2 B1 local refs/remotes/origin/mybranch &&
+	compare_refs local B2 local refs/remotes/origin/mybranch
+'
+
 test_done
-- 
2.8.0.rc3.226.g39d4020


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox