Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v2 1/1] reset: support the --stdin option
From: Junio C Hamano @ 2017-01-27 18:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git, Jakub Narębski
In-Reply-To: <20170127170422.lvpghp6jdud37zxx@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> A few minor suggestions:
>
>> +--stdin::
>> +	Instead of taking list of paths from the command line,
>> +	read list of paths from the standard input.  Paths are
>> +	separated by LF (i.e. one path per line) by default.
>> +
>> +-z::
>> +	Only meaningful with `--stdin`; paths are separated with
>> +	NUL character instead of LF.
>
> Is it worth clarifying that these are paths, not pathspecs? The word
> "paths" is used to refer to the pathspecs on the command-line elsewhere
> in the document.

If the code forces literal pathspecs, then what the user feeds to
the command is no longer pathspecs from the user's point of view,
and I agree that the distinction is important.  

If the remainder of the documentation is loose in terminology and
the reason why we were able to get away with it was because we
consistently used "paths" when we meant "pathspec", these existing
mention of "paths" have to be tightened, either in this patch or a
preparatory step patch before this one, because the addition of
"this thing reads paths not pathspecs" is what makes them ambiguous.

Thanks.  I re-read the part of the code that reads and unquotes as
necessary in the patch and they looked correct.

^ permalink raw reply

* Re: [PATCH] help: correct behavior for is_executable on Windows
From: Junio C Hamano @ 2017-01-27 18:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Heiko Voigt
In-Reply-To: <c1c6ccae4e60608259809914e8ff3d3d5e1ead5a.1485524999.git.johannes.schindelin@gmx.de>

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> From: Heiko Voigt <hvoigt@hvoigt.net>
>
> The previous implementation said that the filesystem information on
> Windows is not reliable to determine whether a file is executable. To
> gather this information it was peeking into the first two bytes of a
> file to see whether it looks executable.
>
> Apart from the fact that on Windows executables are defined as such by
> their extension (and Git has special code to support "executing" scripts
> when they have a she-bang line) it leads to serious performance
> problems: not only do we have to open many files now, it gets even
> slower when a virus scanner is running.

Heiko, around here (before going into the details of how severe the
problem is and how wonderful the result applying of this change is)
is the best place to summarize the solution.  E.g.

	Because the definition of "executable" on Windows is based
	on the file extension, update the function to declare that a
	file with ".exe" extension without opening and reading the
	early bytes from it.  This avoids serious performance issues.

I paraphrased the rest only so that the description of the solution
(i.e. "instead of opening and peeking, we trust .exe suffix") fits
well in the surrounding text; the important part is to say what the
change does clearly.

I agree with the reasoning and the execution of the patch, except
that 

 - "correct behaviour" in the title makes it appear that this is a
   correctness thing, but this is primarily a performance fix.

 - It is a bit strange that "MZ" is dropped in the same patch
   without any mention.

The latter may be "correctness" fix, in that earlier we treated any
file that happens to begin with MZ as an executable, regardless of
the filename, which may have misidentified a non-executable file as
an executable.  If that is what is going on, it deserves to be
described in the log message.

> diff --git a/help.c b/help.c
> index 53e2a67e00..bc6cd19cf3 100644
> --- a/help.c
> +++ b/help.c
> @@ -105,7 +105,22 @@ static int is_executable(const char *name)
>  		return 0;
>  
>  #if defined(GIT_WINDOWS_NATIVE)
> -{	/* cannot trust the executable bit, peek into the file instead */
> +	/*
> +	 * On Windows there is no executable bit. The file extension
> +	 * indicates whether it can be run as an executable, and Git
> +	 * has special-handling to detect scripts and launch them
> +	 * through the indicated script interpreter. We test for the
> +	 * file extension first because virus scanners may make
> +	 * it quite expensive to open many files.
> +	 */
> +	if (ends_with(name, ".exe"))
> +		return S_IXUSR;
> +
> +{
> +	/*
> +	 * Now that we know it does not have an executable extension,
> +	 * peek into the file instead.
> +	 */
>  	char buf[3] = { 0 };
>  	int n;
>  	int fd = open(name, O_RDONLY);
> @@ -113,8 +128,8 @@ static int is_executable(const char *name)
>  	if (fd >= 0) {
>  		n = read(fd, buf, 2);
>  		if (n == 2)
> -			/* DOS executables start with "MZ" */
> -			if (!strcmp(buf, "#!") || !strcmp(buf, "MZ"))
> +			/* look for a she-bang */
> +			if (!strcmp(buf, "#!"))
>  				st.st_mode |= S_IXUSR;
>  		close(fd);
>  	}
>
> base-commit: 4e59582ff70d299f5a88449891e78d15b4b3fabe

^ permalink raw reply

* Re: [PATCH] test-lib: on FreeBSD, look for unzip(1) in /usr/local/bin/
From: Junio C Hamano @ 2017-01-27 18:44 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Eric Wong, git
In-Reply-To: <alpine.DEB.2.20.1701271502420.3469@virtualbox>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi Junio,
>
> On Thu, 21 Jul 2016, Eric Wong wrote:
>
>> Johannes Schindelin <johannes.schindelin@gmx.de> wrote:
>> > The common work-around is to install Info-Zip on FreeBSD, into
>> > /usr/local/bin/.
>> > 
>> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
>> 
>> Thanks, t5003 now works out-of-the-box.
>> Tested with Info-ZIP unzip installed and uninstalled.
>> 
>> Tested-by: Eric Wong <e@80x24.org>
>
> Did you forget about this?

This fell off the radar.  

You could have resent with Eric's Tested-by: added, to make it
easier to apply.  I'll see if I can find the original but it won't
happen until later this afternoon.

^ permalink raw reply

* Re: [PATCH v4] t/Makefile: add a rule to re-run previously-failed tests
From: Junio C Hamano @ 2017-01-27 18:53 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: git, Jeff King, Sverre Rabbelier,
	Ævar Arnfjörð Bjarmason, Matthieu Moy
In-Reply-To: <0563f07117e828c072ba542c1a57441e2e8efb81.1485537593.git.johannes.schindelin@gmx.de>

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> This patch automates the process of determining which tests failed
> previously and re-running them.
> ...
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>

I stored both versions in files and compared them, and it seems the
single word change in the proposed commit log message is the only
difference.  I would have written "Automate the process...", though.

If you are resending, touching up to cover all points raised by a
reviewer and doing nothing else, having "Reviewed-by: Jeff King
<peff@peff.net>" would have been nicer.  

Will queue.  Thanks.

> ---
> Published-As: https://github.com/dscho/git/releases/tag/failing-tests-v4
> Fetch-It-Via: git fetch https://github.com/dscho/git failing-tests-v4
>
>  t/Makefile | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/t/Makefile b/t/Makefile
> index d613935f14..1bb06c36f2 100644
> --- a/t/Makefile
> +++ b/t/Makefile
> @@ -35,6 +35,12 @@ all: $(DEFAULT_TEST_TARGET)
>  test: pre-clean $(TEST_LINT)
>  	$(MAKE) aggregate-results-and-cleanup
>  
> +failed:
> +	@failed=$$(cd '$(TEST_RESULTS_DIRECTORY_SQ)' && \
> +		grep -l '^failed [1-9]' *.counts | \
> +		sed -n 's/\.counts$$/.sh/p') && \
> +	test -z "$$failed" || $(MAKE) $$failed
> +
>  prove: pre-clean $(TEST_LINT)
>  	@echo "*** prove ***"; $(PROVE) --exec '$(SHELL_PATH_SQ)' $(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS)
>  	$(MAKE) clean-except-prove-cache
>
> base-commit: 4e59582ff70d299f5a88449891e78d15b4b3fabe

^ permalink raw reply

* Re: [PATCH] doc: clarify distinction between sign-off and pgp-signing
From: Cornelius Weig @ 2017-01-27 20:04 UTC (permalink / raw)
  To: git@vger.kernel.org
In-Reply-To: <20170127200136.29949-1-cornelius.weig@tngtech.com>

Sorry, I forgot to mark this patch as follow-up to message
<xmqq60l01jr9.fsf@gitster.mtv.corp.google.com>

^ permalink raw reply

* [PATCH] doc: clarify distinction between sign-off and pgp-signing
From: cornelius.weig @ 2017-01-27 20:01 UTC (permalink / raw)
  To: gitster, philipoakley, sbeller; +Cc: git, Cornelius Weig

From: Cornelius Weig <cornelius.weig@tngtech.com>

The documentation for submission discourages pgp-signing, but demands
a proper sign-off by contributors. However, when skimming the headings,
the wording of the section for sign-off could mistakenly be understood
as concerning pgp-signing. Thus, new contributors could oversee the
necessary sign-off.

This commit improves the wording such that the section about sign-off
cannot be misunderstood as pgp-signing. In addition, the paragraph about
pgp-signing is changed such that it avoids the impression that
pgp-signing could be relevant at later stages of the submission.

Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Philip Oakley <philipoakley@iee.org>
Signed-off-by: Stefan Beller <sbeller@google.com>
---

Notes:
    This patch summarizes the suggested changes.
    
    As I don't know what is appropriate, I took the liberty to add everybody's
    sign-off who was involved in the discussion in alphabetic order.

 Documentation/SubmittingPatches | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 08352de..3faf7eb 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -216,12 +216,11 @@ that it will be postponed.
 Exception:  If your mailer is mangling patches then someone may ask
 you to re-send them using MIME, that is OK.
 
-Do not PGP sign your patch, at least for now.  Most likely, your
-maintainer or other people on the list would not have your PGP
-key and would not bother obtaining it anyway.  Your patch is not
-judged by who you are; a good patch from an unknown origin has a
-far better chance of being accepted than a patch from a known,
-respected origin that is done poorly or does incorrect things.
+Do not PGP sign your patch. Most likely, your maintainer or other people on the
+list would not have your PGP key and would not bother obtaining it anyway.
+Your patch is not judged by who you are; a good patch from an unknown origin
+has a far better chance of being accepted than a patch from a known, respected
+origin that is done poorly or does incorrect things.
 
 If you really really really really want to do a PGP signed
 patch, format it as "multipart/signed", not a text/plain message
@@ -246,7 +245,7 @@ patch.
      *2* The mailing list: git@vger.kernel.org
 
 
-(5) Sign your work
+(5) Certify your work by adding your "Signed-off-by: " line
 
 To improve tracking of who did what, we've borrowed the
 "sign-off" procedure from the Linux kernel project on patches
-- 
2.10.2


^ permalink raw reply related

* Re: [PATCH] doc: clarify distinction between sign-off and pgp-signing
From: Stefan Beller @ 2017-01-27 20:31 UTC (permalink / raw)
  To: Cornelius Weig; +Cc: Junio C Hamano, Philip Oakley, git@vger.kernel.org
In-Reply-To: <20170127200136.29949-1-cornelius.weig@tngtech.com>

On Fri, Jan 27, 2017 at 12:01 PM,  <cornelius.weig@tngtech.com> wrote:
> From: Cornelius Weig <cornelius.weig@tngtech.com>
>
> The documentation for submission discourages pgp-signing, but demands
> a proper sign-off by contributors. However, when skimming the headings,
> the wording of the section for sign-off could mistakenly be understood
> as concerning pgp-signing. Thus, new contributors could oversee the
> necessary sign-off.
>
> This commit improves the wording such that the section about sign-off
> cannot be misunderstood as pgp-signing. In addition, the paragraph about
> pgp-signing is changed such that it avoids the impression that
> pgp-signing could be relevant at later stages of the submission.
>
> Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Philip Oakley <philipoakley@iee.org>
> Signed-off-by: Stefan Beller <sbeller@google.com>
> ---
>
> Notes:
>     This patch summarizes the suggested changes.
>
>     As I don't know what is appropriate, I took the liberty to add everybody's
>     sign-off who was involved in the discussion in alphabetic order.

Heh, my first though was to conclude you haven't read the
sign off part, yet apart from the changed header.
/me goes back and actually reads the DCO again.
And actually these sign offs were there in other patches in this area,
so you'd claim (a) that yours was just created partly by you and having
other patches that were also signed off (b), whose sign offs you
merely copy over to here.

And then after reading I realized I slightly confused the signing
myself as the sign offs are also used to track the flow of a patch.
These sign offs suggest that you made the patch initially, then
passed it to Junio, then to Philip and then to me.
And Junio will sign it again when applying the patch.

So maybe s/signed-off-by/helped-by/?

The patch with the aggregation looks good to me.

Thanks,
Stefan

>
>  Documentation/SubmittingPatches | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 08352de..3faf7eb 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -216,12 +216,11 @@ that it will be postponed.
>  Exception:  If your mailer is mangling patches then someone may ask
>  you to re-send them using MIME, that is OK.
>
> -Do not PGP sign your patch, at least for now.  Most likely, your
> -maintainer or other people on the list would not have your PGP
> -key and would not bother obtaining it anyway.  Your patch is not
> -judged by who you are; a good patch from an unknown origin has a
> -far better chance of being accepted than a patch from a known,
> -respected origin that is done poorly or does incorrect things.
> +Do not PGP sign your patch. Most likely, your maintainer or other people on the
> +list would not have your PGP key and would not bother obtaining it anyway.
> +Your patch is not judged by who you are; a good patch from an unknown origin
> +has a far better chance of being accepted than a patch from a known, respected
> +origin that is done poorly or does incorrect things.
>
>  If you really really really really want to do a PGP signed
>  patch, format it as "multipart/signed", not a text/plain message
> @@ -246,7 +245,7 @@ patch.
>       *2* The mailing list: git@vger.kernel.org
>
>
> -(5) Sign your work
> +(5) Certify your work by adding your "Signed-off-by: " line
>
>  To improve tracking of who did what, we've borrowed the
>  "sign-off" procedure from the Linux kernel project on patches
> --
> 2.10.2
>

^ permalink raw reply

* Re: [PATCH 11/12] receive-pack: treat namespace .have lines like alternates
From: Junio C Hamano @ 2017-01-27 20:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Lukas Fleischer, git
In-Reply-To: <20170127175807.4tjxpenu2gk77dhv@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Fri, Jan 27, 2017 at 06:45:26PM +0100, Lukas Fleischer wrote:
>
>> I think this is already possible using receive.hideRefs (which causes
>> the ref_is_hidden() branch above to return if applicable).
>> ...
>
> Thanks for the pointers. I think a "turn off namespace .have lines"
> option would be easier for some cases, but what you've implemented is
> much more flexible. So if people using namespaces are happy with it, I
> don't see any need to add another way to do the same thing.

Yeah, I agree.  Thanks, both.

^ permalink raw reply

* octopus merge --no-ff claims to fast-forward even though it doesn't
From: Samuel Lijin @ 2017-01-27 20:46 UTC (permalink / raw)
  To: git@vger.kernel.org

I was doing an octopus merge earlier and noticed that it claims to
fast-forward when you specify --no-ff, even though it does actually
abide by --no-ff.

I can consistently reproduce as follows:

$ git clone https://github.com/sxlijin/merge-octopus-experiment
$ cd merge-octopus-experiment
$ git merge --no-ff origin/A origin/B --no-edit
Fast-forwarding to: origin/A
Trying simple merge with origin/B
Merge made by the 'octopus' strategy.
 anotherA | 0
 anotherB | 0
 otherA   | 0
 otherB   | 0
 4 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 anotherA
 create mode 100644 anotherB
 create mode 100644 otherA
 create mode 100644 otherB

$ git log --graph --pretty=oneline --decorate

I've reproduced the issue with 2.11.0 on both a Windows box (MSYS) and
Linux (Arch).

The issue seems to live in git-merge-octopus.sh, specifically in that
--no-ff does not affect the initial value of NON_FF_MERGE. I'm happy
to submit a patch if someone can point me in the right direction.

Sam

^ permalink raw reply

* Re: [PATCH] test-lib: on FreeBSD, look for unzip(1) in /usr/local/bin/
From: Junio C Hamano @ 2017-01-27 20:47 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Eric Wong, git
In-Reply-To: <xmqq8tpwz6jp.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>> On Thu, 21 Jul 2016, Eric Wong wrote:
>>
>>> Thanks, t5003 now works out-of-the-box.
>>> Tested with Info-ZIP unzip installed and uninstalled.
>>> 
>>> Tested-by: Eric Wong <e@80x24.org>
>>
>> Did you forget about this?
>
> This fell off the radar.  
>
> You could have resent with Eric's Tested-by: added, to make it
> easier to apply.  I'll see if I can find the original but it won't
> happen until later this afternoon.

The errand I needed to run earlier in the day turned out to be not
so time consuming---I found the exchange and the patch is now
queued, and will be part of today's pushout.

Thanks, both.

^ permalink raw reply

* Re: [PATCH] doc: clarify distinction between sign-off and pgp-signing
From: Cornelius Weig @ 2017-01-27 20:48 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Junio C Hamano, Philip Oakley, git@vger.kernel.org
In-Reply-To: <CAGZ79kb29usw4WyMdy3c5-rGH1nEcQ1gUabzdAtGgOW9zfTCDA@mail.gmail.com>

> 
> So maybe s/signed-off-by/helped-by/?
> 

This is getting real complex :-/

As I said in the notes for the patch:

>>     As I don't know what is appropriate, I took the liberty to add everybody's
>>     sign-off who was involved in the discussion in alphabetic order.

With your explanation, I guess the most accurate sign-off chain would be:

Signed-off-by: Stefan Beller <sbeller@google.com> (as you sent a patch)
Helped-by: Philip Oakley <philipoakley@iee.org> (no patch, but helpful)
Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com> (this patch)
Signed-off-by: Junio C Hamano <gitster@pobox.com> (once he decides it's good)

^ permalink raw reply

* Re: [PATCH] doc: clarify distinction between sign-off and pgp-signing
From: Stefan Beller @ 2017-01-27 21:01 UTC (permalink / raw)
  To: Cornelius Weig; +Cc: Junio C Hamano, Philip Oakley, git@vger.kernel.org
In-Reply-To: <01fc4c33-2d4e-e19f-d447-6a187e15d2ed@tngtech.com>

On Fri, Jan 27, 2017 at 12:48 PM, Cornelius Weig
<cornelius.weig@tngtech.com> wrote:
>>
>> So maybe s/signed-off-by/helped-by/?
>>
>
> This is getting real complex :-/

uh; sorry for that. I do not mind the patch as posted,
just in case you reroll for another reason, this is worth thinking about.

In fact, as said before I like that patch.

>
> As I said in the notes for the patch:
>
>>>     As I don't know what is appropriate, I took the liberty to add everybody's
>>>     sign-off who was involved in the discussion in alphabetic order.
>
> With your explanation, I guess the most accurate sign-off chain would be:
>
> Signed-off-by: Stefan Beller <sbeller@google.com> (as you sent a patch)

...and here we could continue arguing. ;)
Is the patch I sent note-worthy enough to be deriving work from?
My gut reaction would be "no".

> Helped-by: Philip Oakley <philipoakley@iee.org> (no patch, but helpful)
> Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com> (this patch)
> Signed-off-by: Junio C Hamano <gitster@pobox.com> (once he decides it's good)

^ permalink raw reply

* [PATCH v2 7/7] completion: recognize more long-options
From: cornelius.weig @ 2017-01-27 21:17 UTC (permalink / raw)
  To: j6t; +Cc: szeder.dev, spearce, git, Cornelius Weig
In-Reply-To: <20170127211703.24910-1-cornelius.weig@tngtech.com>

From: Cornelius Weig <cornelius.weig@tngtech.com>

Recognize several new long-options for bash completion in the following
commands:

 - apply: --recount --directory=
 - archive: --output
 - branch: --column --no-column --sort= --points-at
 - clone: --no-single-branch --shallow-submodules
 - commit: --patch --short --date --allow-empty
 - describe: --first-parent
 - fetch, pull: --unshallow --update-shallow
 - fsck: --name-objects
 - grep: --break --heading --show-function --function-context
         --untracked --no-index
 - mergetool: --prompt --no-prompt
 - reset: --keep
 - revert: --strategy= --strategy-option=
 - rm: --force
 - shortlog: --email
 - tag: --merged --no-merged --create-reflog

Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com>
Helped-by: Johannes Sixt <j6t@kdbg.org>
---
 contrib/completion/git-completion.bash | 31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 0e09519..933bb6e 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -936,6 +936,7 @@ _git_apply ()
 			--apply --no-add --exclude=
 			--ignore-whitespace --ignore-space-change
 			--whitespace= --inaccurate-eof --verbose
+			--recount --directory=
 			"
 		return
 	esac
@@ -974,7 +975,7 @@ _git_archive ()
 	--*)
 		__gitcomp "
 			--format= --list --verbose
-			--prefix= --remote= --exec=
+			--prefix= --remote= --exec= --output
 			"
 		return
 		;;
@@ -1029,6 +1030,7 @@ _git_branch ()
 			--track --no-track --contains --merged --no-merged
 			--set-upstream-to= --edit-description --list
 			--unset-upstream --delete --move --remotes
+			--column --no-column --sort= --points-at
 			"
 		;;
 	*)
@@ -1142,6 +1144,8 @@ _git_clone ()
 			--single-branch
 			--branch
 			--recurse-submodules
+			--no-single-branch
+			--shallow-submodules
 			"
 		return
 		;;
@@ -1183,6 +1187,7 @@ _git_commit ()
 			--reset-author --file= --message= --template=
 			--cleanup= --untracked-files --untracked-files=
 			--verbose --quiet --fixup= --squash=
+			--patch --short --date --allow-empty
 			"
 		return
 	esac
@@ -1201,7 +1206,7 @@ _git_describe ()
 	--*)
 		__gitcomp "
 			--all --tags --contains --abbrev= --candidates=
-			--exact-match --debug --long --match --always
+			--exact-match --debug --long --match --always --first-parent
 			"
 		return
 	esac
@@ -1284,6 +1289,7 @@ __git_fetch_recurse_submodules="yes on-demand no"
 __git_fetch_options="
 	--quiet --verbose --append --upload-pack --force --keep --depth=
 	--tags --no-tags --all --prune --dry-run --recurse-submodules=
+	--unshallow --update-shallow
 "
 
 _git_fetch ()
@@ -1333,7 +1339,7 @@ _git_fsck ()
 	--*)
 		__gitcomp "
 			--tags --root --unreachable --cache --no-reflogs --full
-			--strict --verbose --lost-found
+			--strict --verbose --lost-found --name-objects
 			"
 		return
 		;;
@@ -1377,6 +1383,8 @@ _git_grep ()
 			--max-depth
 			--count
 			--and --or --not --all-match
+			--break --heading --show-function --function-context
+			--untracked --no-index
 			"
 		return
 		;;
@@ -1576,7 +1584,7 @@ _git_mergetool ()
 		return
 		;;
 	--*)
-		__gitcomp "--tool="
+		__gitcomp "--tool= --prompt --no-prompt"
 		return
 		;;
 	esac
@@ -2456,7 +2464,7 @@ _git_reset ()
 
 	case "$cur" in
 	--*)
-		__gitcomp "--merge --mixed --hard --soft --patch"
+		__gitcomp "--merge --mixed --hard --soft --patch --keep"
 		return
 		;;
 	esac
@@ -2472,7 +2480,10 @@ _git_revert ()
 	fi
 	case "$cur" in
 	--*)
-		__gitcomp "--edit --mainline --no-edit --no-commit --signoff"
+		__gitcomp "
+			--edit --mainline --no-edit --no-commit --signoff
+			--strategy= --strategy-option=
+			"
 		return
 		;;
 	esac
@@ -2483,7 +2494,7 @@ _git_rm ()
 {
 	case "$cur" in
 	--*)
-		__gitcomp "--cached --dry-run --ignore-unmatch --quiet"
+		__gitcomp "--cached --dry-run --ignore-unmatch --quiet --force"
 		return
 		;;
 	esac
@@ -2500,7 +2511,7 @@ _git_shortlog ()
 		__gitcomp "
 			$__git_log_common_options
 			$__git_log_shortlog_options
-			--numbered --summary
+			--numbered --summary --email
 			"
 		return
 		;;
@@ -2778,8 +2789,8 @@ _git_tag ()
 	--*)
 		__gitcomp "
 			--list --delete --verify --annotate --message --file
-			--sign --cleanup --local-user --force --column --sort
-			--contains --points-at
+			--sign --cleanup --local-user --force --column --sort=
+			--contains --points-at --merged --no-merged --create-reflog
 			"
 		;;
 	esac
-- 
2.10.2


^ permalink raw reply related

* [PATCH v2 0/7] completion: recognize more long-options
From: cornelius.weig @ 2017-01-27 21:17 UTC (permalink / raw)
  To: j6t; +Cc: szeder.dev, spearce, git, Cornelius Weig
In-Reply-To: <74ecd09c-55da-3858-5187-52c286a6bf62@kdbg.org>

From: Cornelius Weig <cornelius.weig@tngtech.com>

This revision addresses Johannes' concerns. Changes wrt v1:

 - fixed the commit message: two of the "dangerous" options erroneously ended
   up in the commit message. These options were already in the list of
   auto-completable options.
 - removed the possibly dangerous option '--unsafe-paths' from git-apply.
 - added my sign-off

Patches 1-6 are not resent, because they have not changed (other than my added sign-off).

Also, I added further people to CC, because nobody actually has looked at the code yet.


Cornelius Weig (7):
  completion: recognize more long-options

 contrib/completion/git-completion.bash | 132 +++++++++++++++++++++++++++------
 1 file changed, 110 insertions(+), 22 deletions(-)

-- 
2.10.2


^ permalink raw reply

* Re: [PATCH] doc: clarify distinction between sign-off and pgp-signing
From: Philip Oakley @ 2017-01-27 21:38 UTC (permalink / raw)
  To: Stefan Beller, Cornelius Weig; +Cc: Junio C Hamano, git
In-Reply-To: <CAGZ79kb29usw4WyMdy3c5-rGH1nEcQ1gUabzdAtGgOW9zfTCDA@mail.gmail.com>

From: "Stefan Beller" <sbeller@google.com>
> On Fri, Jan 27, 2017 at 12:01 PM,  <cornelius.weig@tngtech.com> wrote:
>> From: Cornelius Weig <cornelius.weig@tngtech.com>
>>
>> The documentation for submission discourages pgp-signing, but demands
>> a proper sign-off by contributors. However, when skimming the headings,
>> the wording of the section for sign-off could mistakenly be understood
>> as concerning pgp-signing. Thus, new contributors could oversee the
>> necessary sign-off.
>>
>> This commit improves the wording such that the section about sign-off
>> cannot be misunderstood as pgp-signing. In addition, the paragraph about
>> pgp-signing is changed such that it avoids the impression that
>> pgp-signing could be relevant at later stages of the submission.
>>
>> Signed-off-by: Cornelius Weig <cornelius.weig@tngtech.com>
>> Signed-off-by: Junio C Hamano <gitster@pobox.com>
>> Signed-off-by: Philip Oakley <philipoakley@iee.org>
>> Signed-off-by: Stefan Beller <sbeller@google.com>
>> ---
>>
>> Notes:
>>     This patch summarizes the suggested changes.
>>
>>     As I don't know what is appropriate, I took the liberty to add 
>> everybody's
>>     sign-off who was involved in the discussion in alphabetic order.
>
> Heh, my first though was to conclude you haven't read the
> sign off part, yet apart from the changed header.
> /me goes back and actually reads the DCO again.
> And actually these sign offs were there in other patches in this area,
> so you'd claim (a) that yours was just created partly by you and having
> other patches that were also signed off (b), whose sign offs you
> merely copy over to here.
>
> And then after reading I realized I slightly confused the signing
> myself as the sign offs are also used to track the flow of a patch.
> These sign offs suggest that you made the patch initially, then
> passed it to Junio, then to Philip and then to me.
> And Junio will sign it again when applying the patch.
>
> So maybe s/signed-off-by/helped-by/?

Helped-by: Philip Oakley <philipoakley@iee.org>

is sufficient for me (if that).
>
> The patch with the aggregation looks good to me.
>
> Thanks,
> Stefan
>
>>
>>  Documentation/SubmittingPatches | 13 ++++++-------
>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/SubmittingPatches 
>> b/Documentation/SubmittingPatches
>> index 08352de..3faf7eb 100644
>> --- a/Documentation/SubmittingPatches
>> +++ b/Documentation/SubmittingPatches
>> @@ -216,12 +216,11 @@ that it will be postponed.
>>  Exception:  If your mailer is mangling patches then someone may ask
>>  you to re-send them using MIME, that is OK.
>>
>> -Do not PGP sign your patch, at least for now.  Most likely, your
>> -maintainer or other people on the list would not have your PGP
>> -key and would not bother obtaining it anyway.  Your patch is not
>> -judged by who you are; a good patch from an unknown origin has a
>> -far better chance of being accepted than a patch from a known,
>> -respected origin that is done poorly or does incorrect things.
>> +Do not PGP sign your patch. Most likely, your maintainer or other people 
>> on the
>> +list would not have your PGP key and would not bother obtaining it 
>> anyway.
>> +Your patch is not judged by who you are; a good patch from an unknown 
>> origin
>> +has a far better chance of being accepted than a patch from a known, 
>> respected
>> +origin that is done poorly or does incorrect things.
>>
>>  If you really really really really want to do a PGP signed
>>  patch, format it as "multipart/signed", not a text/plain message
>> @@ -246,7 +245,7 @@ patch.
>>       *2* The mailing list: git@vger.kernel.org
>>
>>
>> -(5) Sign your work
>> +(5) Certify your work by adding your "Signed-off-by: " line
>>
>>  To improve tracking of who did what, we've borrowed the
>>  "sign-off" procedure from the Linux kernel project on patches
>> --
>> 2.10.2
>>
> 


^ permalink raw reply

* Re: [PATCH] doc: clarify distinction between sign-off and pgp-signing
From: Junio C Hamano @ 2017-01-27 21:40 UTC (permalink / raw)
  To: Cornelius Weig; +Cc: git@vger.kernel.org
In-Reply-To: <d65bc165-9bd3-c7a6-9a55-1904d1bc095e@tngtech.com>

Cornelius Weig <cornelius.weig@tngtech.com> writes:

> Sorry, I forgot to mark this patch as follow-up to message
> <xmqq60l01jr9.fsf@gitster.mtv.corp.google.com>

I appreciate that you are very considerate, but in practice, if you
do not have too many topics in flight and your response time is less
than 48 hours, we can tell which new message is about which older
discussion thread.  Don't worry about it too much.

Thanks.

^ permalink raw reply

* Re: octopus merge --no-ff claims to fast-forward even though it doesn't
From: Junio C Hamano @ 2017-01-27 21:59 UTC (permalink / raw)
  To: Samuel Lijin; +Cc: git@vger.kernel.org
In-Reply-To: <CAJZjrdWdRGZ5DC1XV_YiNt-1sKiNgAtiS-eS9L6H2GJ+_8n08w@mail.gmail.com>

Samuel Lijin <sxlijin@gmail.com> writes:

> I was doing an octopus merge earlier and noticed that it claims to
> fast-forward when you specify --no-ff, even though it does actually
> abide by --no-ff.

This was intentional and hasn't changed since it was first designed;
the octopus was to be used only for the simple and obvious merges.

If anything, I think it should error out when the --no-ff option is
given, issuing the same error message as the one given when any step
other than the last one needs manual resolution.

^ permalink raw reply

* [PATCH 0/2] limit reused delta chains based on --depth
From: Jeff King @ 2017-01-27 22:01 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty

Back when we switched pack-objects to visiting packs in
most-recently-used order last August, we realized that this could reuse
cross-pack deltas, and that the result could have longer delta chains
than any single pack contains.

I produced a patch back then[1], but we decided not to follow through
with it. Two things happened to make me revive that patch:

  1. I hit a case in the wild with a really long delta chain that caused
     pack-objects' write_one() to run out of stack space.

  2. We dropped the --aggressive depth to match the normal one. So
     there's less concern about mismatches throwing out on-disk deltas
     from a previous aggressive repack (but see the caveats in the first
     patch's commit message).

So here it is, plus another patch that converts the recursion to
iteration (the stack space needed by the function is small enough that
just the first patch was enough to fix my problem case, but it seemed
like tempting fate to leave it recursive).

  [1/2]: pack-objects: enforce --depth limit in reused deltas
  [2/2]: pack-objects: convert recursion to iteration in break_delta_chain()

 builtin/pack-objects.c      | 133 ++++++++++++++++++++++++++++++++++++--------
 pack-objects.h              |   4 ++
 t/t5316-pack-delta-depth.sh |  93 +++++++++++++++++++++++++++++++
 3 files changed, 207 insertions(+), 23 deletions(-)
 create mode 100755 t/t5316-pack-delta-depth.sh

-Peff

[1] http://public-inbox.org/git/20160811095710.p2bffympjlwmv3gc@sigill.intra.peff.net/

^ permalink raw reply

* [PATCH 1/2] pack-objects: enforce --depth limit in reused deltas
From: Jeff King @ 2017-01-27 22:02 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty
In-Reply-To: <20170127220143.boo5phhgogqlucsj@sigill.intra.peff.net>

Since 898b14c (pack-objects: rework check_delta_limit usage,
2007-04-16), we check the delta depth limit only when
figuring out whether we should make a new delta. We don't
consider it at all when reusing deltas, which means that
packing once with --depth=250, and then again with
--depth=50, the second pack my still contain chains larger
than 50.

This is generally considered a feature, as the results of
earlier high-depth repacks are carried forward, used for
serving fetches, etc. However, since we started using
cross-pack deltas in c9af708b1 (pack-objects: use mru list
when iterating over packs, 2016-08-11), we are no longer
bounded by the length of an existing delta chain in a single
pack.

Here's one particular pathological case: a sequence of N
packs, each with 2 objects, the base of which is stored as a
delta in a previous pack. If we chain all the deltas
together, we have a cycle of length N. We break the cycle,
but the tip delta is still at depth N-1.

This is less unlikely than it might sound. See the included
test for a reconstruction based on real-world actions.  I
ran into such a case in the wild, where a client was rapidly
sending packs, and we had accumulated 10,000 before doing a
server-side repack.  The pack that "git repack" tried to
generate had a very deep chain, which caused pack-objects to
run out of stack space in the recursive write_one().

This patch bounds the length of delta chains in the output
pack based on --depth, regardless of whether they are caused
by cross-pack deltas or existed in the input packs. This
fixes the problem, but does have two possible downsides:

  1. High-depth aggressive repacks followed by "normal"
     repacks will throw away the high-depth chains.

     In the long run this is probably OK; investigation
     showed that high-depth repacks aren't actually
     beneficial, and we dropped the aggressive depth default
     to match the normal case in 07e7dbf0d (gc: default
     aggressive depth to 50, 2016-08-11).

  2. If you really do want to store high-depth deltas on
     disk, they may be discarded and new delta computed when
     serving a fetch, unless you set pack.depth to match
     your high-depth size.

The implementation uses the existing search for delta
cycles.  That lets us compute the depth of any node based on
the depth of its base, because we know the base is DFS_DONE
by the time we look at it (modulo any cycles in the graph,
but we know there cannot be any because we break them as we
see them).

There is some subtlety worth mentioning, though. We record
the depth of each object as we compute it. It might seem
like we could save the per-object storage space by just
keeping track of the depth of our traversal (i.e., have
break_delta_chains() report how deep it went). But we may
visit an object through multiple delta paths, and on
subsequent paths we want to know its depth immediately,
without having to walk back down to its final base (doing so
would make our graph walk quadratic rather than linear).

Likewise, one could try to record the depth not from the
base, but from our starting point (i.e., start
recursion_depth at 0, and pass "recursion_depth + 1" to each
invocation of break_delta_chains()). And then when
recursion_depth gets too big, we know that we must cut the
delta chain.  But that technique is wrong if we do not visit
the nodes in topological order. In a chain A->B->C, it
if we visit "C", then "B", then "A", we will never recurse
deeper than 1 link (because we see at each node that we have
already visited it).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c      | 18 +++++++++
 pack-objects.h              |  4 ++
 t/t5316-pack-delta-depth.sh | 93 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 115 insertions(+)
 create mode 100755 t/t5316-pack-delta-depth.sh

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 8841f8b36..2b08c8121 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1539,6 +1539,8 @@ static int pack_offset_sort(const void *_a, const void *_b)
  *   2. Updating our size/type to the non-delta representation. These were
  *      either not recorded initially (size) or overwritten with the delta type
  *      (type) when check_object() decided to reuse the delta.
+ *
+ *   3. Resetting our delta depth, as we are now a base object.
  */
 static void drop_reused_delta(struct object_entry *entry)
 {
@@ -1552,6 +1554,7 @@ static void drop_reused_delta(struct object_entry *entry)
 			p = &(*p)->delta_sibling;
 	}
 	entry->delta = NULL;
+	entry->depth = 0;
 
 	oi.sizep = &entry->size;
 	oi.typep = &entry->type;
@@ -1570,6 +1573,9 @@ static void drop_reused_delta(struct object_entry *entry)
  * Follow the chain of deltas from this entry onward, throwing away any links
  * that cause us to hit a cycle (as determined by the DFS state flags in
  * the entries).
+ *
+ * We also detect too-long reused chains that would violate our --depth
+ * limit.
  */
 static void break_delta_chains(struct object_entry *entry)
 {
@@ -1587,6 +1593,18 @@ static void break_delta_chains(struct object_entry *entry)
 		 */
 		entry->dfs_state = DFS_ACTIVE;
 		break_delta_chains(entry->delta);
+
+		/*
+		 * Once we've recursed, our base (if we still have one) knows
+		 * its depth, so we can compute ours (and check it against
+		 * the limit).
+		 */
+		if (entry->delta) {
+			entry->depth = entry->delta->depth + 1;
+			if (entry->depth > depth)
+				drop_reused_delta(entry);
+		}
+
 		entry->dfs_state = DFS_DONE;
 		break;
 
diff --git a/pack-objects.h b/pack-objects.h
index cc9b9a9b9..03f119165 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -30,12 +30,16 @@ struct object_entry {
 
 	/*
 	 * State flags for depth-first search used for analyzing delta cycles.
+	 *
+	 * The depth is measured in delta-links to the base (so if A is a delta
+	 * against B, then A has a depth of 1, and B a depth of 0).
 	 */
 	enum {
 		DFS_NONE = 0,
 		DFS_ACTIVE,
 		DFS_DONE
 	} dfs_state;
+	int depth;
 };
 
 struct packing_data {
diff --git a/t/t5316-pack-delta-depth.sh b/t/t5316-pack-delta-depth.sh
new file mode 100755
index 000000000..236d60fe6
--- /dev/null
+++ b/t/t5316-pack-delta-depth.sh
@@ -0,0 +1,93 @@
+#!/bin/sh
+
+test_description='pack-objects breaks long cross-pack delta chains'
+. ./test-lib.sh
+
+# This mirrors a repeated push setup:
+#
+# 1. A client repeatedly modifies some files, makes a
+#      commit, and pushes the result. It does this N times
+#      before we get around to repacking.
+#
+# 2. Each push generates a thin pack with the new version of
+#    various objects. Let's consider some file in the root tree
+#    which is updated in each commit.
+#
+#    When generating push number X, we feed commit X-1 (and
+#    thus blob X-1) as a preferred base. The resulting pack has
+#    blob X as a thin delta against blob X-1.
+#
+#    On the receiving end, "index-pack --fix-thin" will
+#    complete the pack with a base copy of tree X-1.
+#
+# 3. In older versions of git, if we used the delta from
+#    pack X, then we'd always find blob X-1 as a base in the
+#    same pack (and generate a fresh delta).
+#
+#    But with the pack mru, we jump from delta to delta
+#    following the traversal order:
+#
+#      a. We grab blob X from pack X as a delta, putting it at
+#         the tip of our mru list.
+#
+#      b. Eventually we move onto commit X-1. We need other
+#         objects which are only in pack X-1 (in the test code
+#         below, it's the containing tree). That puts pack X-1
+#         at the tip of our mru list.
+#
+#      c. Eventually we look for blob X-1, and we find the
+#         version in pack X-1 (because it's the mru tip).
+#
+# Now we have blob X as a delta against X-1, which is a delta
+# against X-2, and so forth.
+#
+# In the real world, these small pushes would get exploded by
+# unpack-objects rather than "index-pack --fix-thin", but the
+# same principle applies to larger pushes (they only need one
+# repeatedly-modified file to generate the delta chain).
+
+test_expect_success 'create series of packs' '
+	test-genrandom foo 4096 >content &&
+	prev= &&
+	for i in $(test_seq 1 10)
+	do
+		cat content >file &&
+		echo $i >>file &&
+		git add file &&
+		git commit -m $i &&
+		cur=$(git rev-parse HEAD^{tree}) &&
+		{
+			test -n "$prev" && echo "-$prev"
+			echo $cur
+			echo "$(git rev-parse :file) file"
+		} | git pack-objects --stdout >tmp &&
+		git index-pack --stdin --fix-thin <tmp || return 1
+		prev=$cur
+	done
+'
+
+max_chain() {
+	git index-pack --verify-stat-only "$1" >output &&
+	perl -lne '
+	  /chain length = (\d+)/ and $len = $1;
+	  END { print $len }
+	' output
+}
+
+# Note that this whole setup is pretty reliant on the current
+# packing heuristics. We double-check that our test case
+# actually produces a long chain. If it doesn't, it should be
+# adjusted (or scrapped if the heuristics have become too unreliable)
+test_expect_success 'packing produces a long delta' '
+	# Use --window=0 to make sure we are seeing reused deltas,
+	# not computing a new long chain.
+	pack=$(git pack-objects --all --window=0 </dev/null pack) &&
+	test 9 = "$(max_chain pack-$pack.pack)"
+'
+
+test_expect_success '--depth limits depth' '
+	pack=$(git pack-objects --all --depth=5 </dev/null pack) &&
+	test 5 = "$(max_chain pack-$pack.pack)"
+'
+
+test_done
-- 
2.11.0.914.gb3b960f50


^ permalink raw reply related

* [PATCH 2/2] pack-objects: convert recursion to iteration in break_delta_chain()
From: Jeff King @ 2017-01-27 22:05 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty
In-Reply-To: <20170127220143.boo5phhgogqlucsj@sigill.intra.peff.net>

The break_delta_chain() function is recursive over the depth
of a given delta chain, which can lead to possibly running
out of stack space. Normally delta depth is quite small, but
if there _is_ a pathological case, this is where we would
find and fix it, so we should be more careful.

We can do it without recursion at all, but there's a little
bit of cleverness needed to do so. It's easiest to explain
by covering the less-clever strategies first.

The obvious thing to try is just keeping our own stack on
the heap. Whenever we would recurse, push the new entry onto
the stack and loop instead. But this gets tricky; when we
see an ACTIVE entry, we need to care if we just pushed it
(in which case it's a cycle) or if we just popped it (in
which case we dealt with its bases, and no we need to clear
the ACTIVE flag and compute its depth).

You can hack around that in various ways, like keeping a
"just pushed" flag, but the logic gets muddled. However, we
can observe that we do all of our pushes first, and then all
of our pops afterwards. In other words, we can do this in
two passes. First dig down to the base, stopping when we see
a cycle, and pushing each item onto our stack.  Then pop the
stack elements, clearing the ACTIVE flag and computing the
depth for each.

This works, and is reasonably elegant. However, why do we
need the stack for the second pass? We can just walk the
delta pointers again. There's one complication. Popping the
stack went over our list in reverse, so we could compute the
depth of each entry by incrementing the depth of its base,
which we will have just computed.  To go forward in the
second pass, we have to compute the total depth on the way
down, and then assign it as we go.

This patch implements this final strategy, because it not
only keeps the memory off the stack, but it eliminates it
entirely. Credit for the cleverness in that approach goes to
Michael Haggerty; bugs are mine.

Signed-off-by: Jeff King <peff@peff.net>
---
The diff is nearly impossible to read, so I'd recommend just looking at
the end result. I tried to document the tricky parts with comments.
There are a few parts that could be made more terse, but I screwed it up
so many times while writing it that I decided to do it in a way that
carefully documents all of the assumptions.

 builtin/pack-objects.c | 129 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 99 insertions(+), 30 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2b08c8121..c7af47548 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1579,48 +1579,117 @@ static void drop_reused_delta(struct object_entry *entry)
  */
 static void break_delta_chains(struct object_entry *entry)
 {
-	/* If it's not a delta, it can't be part of a cycle. */
-	if (!entry->delta) {
-		entry->dfs_state = DFS_DONE;
-		return;
-	}
+	/*
+	 * The actual depth of each object we will write is stored as an int,
+	 * as it cannot exceed our int "depth" limit. But before we break
+	 * changes based no that limit, we may potentially go as deep as the
+	 * number of objects, which is elsewhere bounded to a uint32_t.
+	 */
+	uint32_t total_depth;
+	struct object_entry *cur, *next;
+
+	for (cur = entry, total_depth = 0;
+	     cur;
+	     cur = cur->delta, total_depth++) {
+		if (cur->dfs_state == DFS_DONE) {
+			/*
+			 * We've already seen this object and know it isn't
+			 * part of a cycle. We do need to append its depth
+			 * to our count.
+			 */
+			total_depth += cur->depth;
+			break;
+		}
 
-	switch (entry->dfs_state) {
-	case DFS_NONE:
 		/*
-		 * This is the first time we've seen the object. We mark it as
-		 * part of the active potential cycle and recurse.
+		 * We break cycles before looping, so an ACTIVE state (or any
+		 * other cruft which made its way into the state variable)
+		 * is a bug.
 		 */
-		entry->dfs_state = DFS_ACTIVE;
-		break_delta_chains(entry->delta);
+		if (cur->dfs_state != DFS_NONE)
+			die("BUG: confusing delta dfs state in first pass: %d",
+			    cur->dfs_state);
 
 		/*
-		 * Once we've recursed, our base (if we still have one) knows
-		 * its depth, so we can compute ours (and check it against
-		 * the limit).
+		 * Now we know this is the first time we've seen the object. If
+		 * it's not a delta, we're done traversing, but we'll mark it
+		 * done to save time on future traversals.
 		 */
-		if (entry->delta) {
-			entry->depth = entry->delta->depth + 1;
-			if (entry->depth > depth)
-				drop_reused_delta(entry);
+		if (!cur->delta) {
+			cur->dfs_state = DFS_DONE;
+			break;
 		}
 
-		entry->dfs_state = DFS_DONE;
-		break;
+		/*
+		 * Mark ourselves as active and see if the next step causes
+		 * us to cycle to another active object. It's important to do
+		 * this _before_ we loop, because it impacts where we make the
+		 * cut, and thus how our total_depth counter works.
+		 * E.g., We may see a partial loop like:
+		 *
+		 *   A -> B -> C -> D -> B
+		 *
+		 * Cutting B->C breaks the cycle. But now the depth of A is
+		 * only 1, and our total_depth counter is at 3. The size of the
+		 * error is always one less than the size of the cycle we
+		 * broke. Commits C and D were "lost" from A's chain.
+		 *
+		 * If we instead cut D->B, then the depth of A is correct at 3.
+		 * We keep all commits in the chain that we examined.
+		 */
+		cur->dfs_state = DFS_ACTIVE;
+		if (cur->delta->dfs_state == DFS_ACTIVE) {
+			drop_reused_delta(cur);
+			cur->dfs_state = DFS_DONE;
+			break;
+		}
+	}
 
-	case DFS_DONE:
-		/* object already examined, and not part of a cycle */
-		break;
+	/*
+	 * And now that we've gone all the way to the bottom of the chain, we
+	 * need to clear the active flags and set the depth fields as
+	 * appropriate. Unlike the loop above, which can quit when it drops a
+	 * delta, we need to keep going to look for more depth cuts. So we need
+	 * an extra "next" pointer to keep going after we reset cur->delta.
+	 */
+	for (cur = entry; cur; cur = next) {
+		next = cur->delta;
 
-	case DFS_ACTIVE:
 		/*
-		 * We found a cycle that needs broken. It would be correct to
-		 * break any link in the chain, but it's convenient to
-		 * break this one.
+		 * We should have a chain of zero or more ACTIVE states down to
+		 * a final DONE. We can quit after the DONE, because either it
+		 * has no bases, or we've already handled them in a previous
+		 * call.
 		 */
-		drop_reused_delta(entry);
-		entry->dfs_state = DFS_DONE;
-		break;
+		if (cur->dfs_state == DFS_DONE)
+			break;
+		else if (cur->dfs_state != DFS_ACTIVE)
+			die("BUG: confusing delta dfs state in second pass: %d",
+			    cur->dfs_state);
+
+		/*
+		 * If the total_depth is more than depth, then we need to snip
+		 * the chain into two or more smaller chains that don't exceed
+		 * the maximum depth. Most of the resulting chains will contain
+		 * (depth + 1) entries (i.e., depth deltas plus one base), and
+		 * the last chain (i.e., the one containing entry) will contain
+		 * whatever entries are left over, namely
+		 * (total_depth % (depth + 1)) of them.
+		 *
+		 * Since we are iterating towards decreasing depth, we need to
+		 * decrement total_depth as we go, and we need to write to the
+		 * entry what its final depth will be after all of the
+		 * snipping. Since we're snipping into chains of length (depth
+		 * + 1) entries, the final depth of an entry will be its
+		 * original depth modulo (depth + 1). Any time we encounter an
+		 * entry whose final depth is supposed to be zero, we snip it
+		 * from its delta base, thereby making it so.
+		 */
+		cur->depth = (total_depth--) % (depth + 1);
+		if (!cur->depth)
+			drop_reused_delta(cur);
+
+		cur->dfs_state = DFS_DONE;
 	}
 }
 
-- 
2.11.0.914.gb3b960f50

^ permalink raw reply related

* Re: [PATCH v2 1/1] reset: support the --stdin option
From: Jeff King @ 2017-01-27 22:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, Jakub Narębski
In-Reply-To: <xmqqh94kz76v.fsf@gitster.mtv.corp.google.com>

On Fri, Jan 27, 2017 at 10:30:48AM -0800, Junio C Hamano wrote:

> > Is it worth clarifying that these are paths, not pathspecs? The word
> > "paths" is used to refer to the pathspecs on the command-line elsewhere
> > in the document.
> 
> If the code forces literal pathspecs, then what the user feeds to
> the command is no longer pathspecs from the user's point of view,
> and I agree that the distinction is important.  
> 
> If the remainder of the documentation is loose in terminology and
> the reason why we were able to get away with it was because we
> consistently used "paths" when we meant "pathspec", these existing
> mention of "paths" have to be tightened, either in this patch or a
> preparatory step patch before this one, because the addition of
> "this thing reads paths not pathspecs" is what makes them ambiguous.

I think a lot of the documentation uses <paths> to refer to pathspecs
(e.g., git-log(1), git-diff(1), etc).  As long as we're consistent with
that convention, I don't think it's that big a deal.

This spot needs a specific mention because it violates the convention.

I don't know if the are other spots where it might be unclear, but I
think we're probably better to tighten those as they come up, rather
than switch to saying "<pathspecs>" everywhere.

That's outside the scope of this series, though, I would think.

-Peff

^ permalink raw reply

* Deadlock between git-remote-http and git fetch-pack
From: tsuna @ 2017-01-27 22:31 UTC (permalink / raw)
  To: git

Hi there,
While investigating a hung job in our CI system today, I think I found
a deadlock in git-remote-http

Git version: 2.9.3
Linux (amd64) kernel 4.9.0

Excerpt from the process list:

jenkins  27316  0.0  0.0  18508  6024 ?        S    19:30   0:00  |
               \_ git -C ../../../arista fetch --unshallow
jenkins  27317  0.0  0.0 169608 10916 ?        S    19:30   0:00  |
                   \_ git-remote-http origin http://gerrit/arista
jenkins  27319  0.0  0.0  24160  8260 ?        S    19:30   0:00  |
                       \_ git fetch-pack --stateless-rpc --stdin
--lock-pack --include-tag --thin --no-progress --depth=2147483647
http://gerrit/arista/

Here PID 27319 (git fetch-pack) is stuck reading on stdin, while its
parent, PID 27317 (git-remote-http) is stuck reading on its child’s
stdout.  Nothing has moved for like 2h, it’s deadlocked.

> strace -fp 27319
strace: Process 27319 attached
read(0,

Here FD 0 is a pipe:

~ @8a33a534e2f7> lsof -np 27319 | grep 0r
git     27319 jenkins    0r  FIFO   0,10      0t0 354519158 pipe

The writing end of which is owned by the parent process:

~ @8a33a534e2f7> lsof -n 2>/dev/null | fgrep 354519158
git-remot 27317                jenkins    4w     FIFO   0,10      0t0
354519158 pipe
git       27319                jenkins    0r     FIFO   0,10      0t0
354519158 pipe

And the parent process (git-remote-http) is stuck reading from another FD:

> strace -fp 27317
strace: Process 27317 attached
read(5,

And here FD 5 is another pipe:

~ @8a33a534e2f7> lsof -np 27317 | grep 5r
git-remot 27317 jenkins    5r  FIFO   0,10      0t0 354519159 pipe

Which is the child’s stdout:

> lsof -n 2>/dev/null | fgrep 354519159
git-remot 27317                jenkins    5r     FIFO   0,10      0t0
354519159 pipe
git       27319                jenkins    1w     FIFO   0,10      0t0
354519159 pipe

Hence the deadlock.

Stack trace in git-remote-http:

(gdb) bt
#0  0x00007f04f1e1363d in read () from target:/lib64/libpthread.so.0
#1  0x0000562417472d73 in xread ()
#2  0x0000562417472f2b in read_in_full ()
#3  0x0000562417438a6e in get_packet_data ()
#4  0x0000562417439129 in packet_read ()
#5  0x00005624174245e0 in rpc_service ()
#6  0x0000562417424f10 in fetch_git ()
#7  0x00005624174233fd in main ()

Stack trace in git fetch-pack:

(gdb) bt
#0  0x00007fb3ab478620 in __read_nocancel () from target:/lib64/libpthread.so.0
#1  0x000055f688827283 in xread ()
#2  0x000055f68882743b in read_in_full ()
#3  0x000055f6887ce35e in get_packet_data ()
#4  0x000055f6887cea19 in packet_read ()
#5  0x000055f6887ceb90 in packet_read_line ()
#6  0x000055f68879dd05 in get_ack ()
#7  0x000055f68879f6b4 in fetch_pack ()
#8  0x000055f688710619 in cmd_fetch_pack ()
#9  0x000055f6886dff7b in handle_builtin ()
#10 0x000055f6886df026 in main ()

I looked at the diff between v2.9.3 and HEAD on fetch-pack.c and
remote-curl.c and didn’t see anything noteworthy in that area of the
code, so I presume the bug is still there in master.

-- 
Benoit "tsuna" Sigoure

^ permalink raw reply

* Re: Deadlock between git-remote-http and git fetch-pack
From: Jonathan Tan @ 2017-01-27 23:19 UTC (permalink / raw)
  To: tsuna, git
In-Reply-To: <CAFKYj4cMSK5nQ1nS66c4Opz8y7x+xQH+OdW8PTi7LmCiGBP1ZA@mail.gmail.com>

On 01/27/2017 02:31 PM, tsuna wrote:
> Hi there,
> While investigating a hung job in our CI system today, I think I found
> a deadlock in git-remote-http
>
> Git version: 2.9.3
> Linux (amd64) kernel 4.9.0
>
> Excerpt from the process list:
>
> jenkins  27316  0.0  0.0  18508  6024 ?        S    19:30   0:00  |
>                \_ git -C ../../../arista fetch --unshallow
> jenkins  27317  0.0  0.0 169608 10916 ?        S    19:30   0:00  |
>                    \_ git-remote-http origin http://gerrit/arista
> jenkins  27319  0.0  0.0  24160  8260 ?        S    19:30   0:00  |
>                        \_ git fetch-pack --stateless-rpc --stdin
> --lock-pack --include-tag --thin --no-progress --depth=2147483647
> http://gerrit/arista/
>
> Here PID 27319 (git fetch-pack) is stuck reading on stdin, while its
> parent, PID 27317 (git-remote-http) is stuck reading on its child’s
> stdout.  Nothing has moved for like 2h, it’s deadlocked.
>
>> strace -fp 27319
> strace: Process 27319 attached
> read(0,
>
> Here FD 0 is a pipe:
>
> ~ @8a33a534e2f7> lsof -np 27319 | grep 0r
> git     27319 jenkins    0r  FIFO   0,10      0t0 354519158 pipe
>
> The writing end of which is owned by the parent process:
>
> ~ @8a33a534e2f7> lsof -n 2>/dev/null | fgrep 354519158
> git-remot 27317                jenkins    4w     FIFO   0,10      0t0
> 354519158 pipe
> git       27319                jenkins    0r     FIFO   0,10      0t0
> 354519158 pipe
>
> And the parent process (git-remote-http) is stuck reading from another FD:
>
>> strace -fp 27317
> strace: Process 27317 attached
> read(5,
>
> And here FD 5 is another pipe:
>
> ~ @8a33a534e2f7> lsof -np 27317 | grep 5r
> git-remot 27317 jenkins    5r  FIFO   0,10      0t0 354519159 pipe
>
> Which is the child’s stdout:
>
>> lsof -n 2>/dev/null | fgrep 354519159
> git-remot 27317                jenkins    5r     FIFO   0,10      0t0
> 354519159 pipe
> git       27319                jenkins    1w     FIFO   0,10      0t0
> 354519159 pipe
>
> Hence the deadlock.
>
> Stack trace in git-remote-http:
>
> (gdb) bt
> #0  0x00007f04f1e1363d in read () from target:/lib64/libpthread.so.0
> #1  0x0000562417472d73 in xread ()
> #2  0x0000562417472f2b in read_in_full ()
> #3  0x0000562417438a6e in get_packet_data ()
> #4  0x0000562417439129 in packet_read ()
> #5  0x00005624174245e0 in rpc_service ()
> #6  0x0000562417424f10 in fetch_git ()
> #7  0x00005624174233fd in main ()
>
> Stack trace in git fetch-pack:
>
> (gdb) bt
> #0  0x00007fb3ab478620 in __read_nocancel () from target:/lib64/libpthread.so.0
> #1  0x000055f688827283 in xread ()
> #2  0x000055f68882743b in read_in_full ()
> #3  0x000055f6887ce35e in get_packet_data ()
> #4  0x000055f6887cea19 in packet_read ()
> #5  0x000055f6887ceb90 in packet_read_line ()
> #6  0x000055f68879dd05 in get_ack ()
> #7  0x000055f68879f6b4 in fetch_pack ()
> #8  0x000055f688710619 in cmd_fetch_pack ()
> #9  0x000055f6886dff7b in handle_builtin ()
> #10 0x000055f6886df026 in main ()
>
> I looked at the diff between v2.9.3 and HEAD on fetch-pack.c and
> remote-curl.c and didn’t see anything noteworthy in that area of the
> code, so I presume the bug is still there in master.
>

I haven't looked into this in detail, but this might be related to 
something I discovered while writing my patch set. I noticed that 
upload-pack (the process on the "other side" of fetch-pack) can die 
without first writing any notification, causing fetch-pack to block 
forever on a read. A fix would probably look like that patch [1].

[1] 
<afe5d7d3f876893fdad318665805df1e056717c6.1485381677.git.jonathantanmy@google.com>

^ permalink raw reply

* Re: [PATCH 1/2] pack-objects: enforce --depth limit in reused deltas
From: Junio C Hamano @ 2017-01-27 23:31 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty
In-Reply-To: <20170127220233.mwg36mgxdklwz7lr@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Since 898b14c (pack-objects: rework check_delta_limit usage,
> 2007-04-16), we check the delta depth limit only when
> figuring out whether we should make a new delta. We don't
> consider it at all when reusing deltas, which means that
> packing once with --depth=250, and then again with
> --depth=50, the second pack my still contain chains larger
> than 50.

"may still contain", methinks.

> ...
>
> This patch bounds the length of delta chains in the output
> pack based on --depth, regardless of whether they are caused
> by cross-pack deltas or existed in the input packs. This
> fixes the problem, but does have two possible downsides:
>
>   1. High-depth aggressive repacks followed by "normal"
>      repacks will throw away the high-depth chains.

I actually think it is a feature that the normal one that runs later
is allowed to fix an over-deep delta chain.

>   2. If you really do want to store high-depth deltas on
>      disk, they may be discarded and new delta computed when
>      serving a fetch, unless you set pack.depth to match
>      your high-depth size.

Likewise.

> ... But we may
> visit an object through multiple delta paths, ...

Yes, good thinking.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 8841f8b36..2b08c8121 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -1539,6 +1539,8 @@ static int pack_offset_sort(const void *_a, const void *_b)
>   *   2. Updating our size/type to the non-delta representation. These were
>   *      either not recorded initially (size) or overwritten with the delta type
>   *      (type) when check_object() decided to reuse the delta.
> + *
> + *   3. Resetting our delta depth, as we are now a base object.
>   */
>  static void drop_reused_delta(struct object_entry *entry)
>  {
> @@ -1552,6 +1554,7 @@ static void drop_reused_delta(struct object_entry *entry)
>  			p = &(*p)->delta_sibling;
>  	}
>  	entry->delta = NULL;
> +	entry->depth = 0;
>  
>  	oi.sizep = &entry->size;
>  	oi.typep = &entry->type;

Makes sense.

>  static void break_delta_chains(struct object_entry *entry)
>  {
> @@ -1587,6 +1593,18 @@ static void break_delta_chains(struct object_entry *entry)
>  		 */
>  		entry->dfs_state = DFS_ACTIVE;
>  		break_delta_chains(entry->delta);
> +
> +		/*
> +		 * Once we've recursed, our base (if we still have one) knows
> +		 * its depth, so we can compute ours (and check it against
> +		 * the limit).
> +		 */
> +		if (entry->delta) {
> +			entry->depth = entry->delta->depth + 1;
> +			if (entry->depth > depth)
> +				drop_reused_delta(entry);
> +		}

;-)  Surprisingly simpple and effective.

> diff --git a/t/t5316-pack-delta-depth.sh b/t/t5316-pack-delta-depth.sh
> new file mode 100755
> index 000000000..236d60fe6
> --- /dev/null
> +++ b/t/t5316-pack-delta-depth.sh
> @@ -0,0 +1,93 @@
> +#!/bin/sh
> +
> +test_description='pack-objects breaks long cross-pack delta chains'
> +. ./test-lib.sh
> +
> +# This mirrors a repeated push setup:
> +#
> +# 1. A client repeatedly modifies some files, makes a
> +#      commit, and pushes the result. It does this N times
> +#      before we get around to repacking.
> +#
> +# 2. Each push generates a thin pack with the new version of
> +#    various objects. Let's consider some file in the root tree
> +#    which is updated in each commit.
> +#
> +#    When generating push number X, we feed commit X-1 (and
> +#    thus blob X-1) as a preferred base. The resulting pack has
> +#    blob X as a thin delta against blob X-1.
> +#
> +#    On the receiving end, "index-pack --fix-thin" will
> +#    complete the pack with a base copy of tree X-1.

blob? tree? I think the argument would work the same way for either
type of objects, but the previous paragraph is using blob as the
example, so s/tree/blob/ here?

> +# 3. In older versions of git, if we used the delta from
> +#    pack X, then we'd always find blob X-1 as a base in the
> +#    same pack (and generate a fresh delta).
> +#
> +#    But with the pack mru, we jump from delta to delta
> +#    following the traversal order:
> +# ...
> +# Now we have blob X as a delta against X-1, which is a delta
> +# against X-2, and so forth.

> +# Note that this whole setup is pretty reliant on the current
> +# packing heuristics. We double-check that our test case
> +# actually produces a long chain. If it doesn't, it should be
> +# adjusted (or scrapped if the heuristics have become too unreliable)

IOW, we want something that says "we first check X and if X still
holds, then we expect Y to also hold; if X no longer hold, do not
bother to test that Y holds".  Nice food for thought.  Perhaps we
want a way to express that in our test framework, or is X in the
above merely another way to say "prerequisite"?

> +test_expect_success 'packing produces a long delta' '
> +	# Use --window=0 to make sure we are seeing reused deltas,
> +	# not computing a new long chain.
> +	pack=$(git pack-objects --all --window=0 </dev/null pack) &&
> +	test 9 = "$(max_chain pack-$pack.pack)"
> +'
> +
> +test_expect_success '--depth limits depth' '
> +	pack=$(git pack-objects --all --depth=5 </dev/null pack) &&
> +	test 5 = "$(max_chain pack-$pack.pack)"
> +'
> +
> +test_done

^ permalink raw reply

* Re: Deadlock between git-remote-http and git fetch-pack
From: Junio C Hamano @ 2017-01-27 23:34 UTC (permalink / raw)
  To: tsuna; +Cc: git
In-Reply-To: <CAFKYj4cMSK5nQ1nS66c4Opz8y7x+xQH+OdW8PTi7LmCiGBP1ZA@mail.gmail.com>

tsuna <tsunanet@gmail.com> writes:

> While investigating a hung job in our CI system today, I think I found
> a deadlock in git-remote-http
> ...
> Here PID 27319 (git fetch-pack) is stuck reading on stdin, while its
> parent, PID 27317 (git-remote-http) is stuck reading on its child’s
> stdout.  Nothing has moved for like 2h, it’s deadlocked.

Hmph, would this be related to 296b847c0d ("remote-curl: don't hang
when a server dies before any output", 2016-11-18) I wonder...

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox