Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH 4/4] unpack-trees: support super-prefix option
From: Junio C Hamano @ 2017-01-11 23:28 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Brandon Williams, David Turner, git@vger.kernel.org
In-Reply-To: <CAGZ79kaHDnVLQr8WLoaD5KKs7EqeW=KbkptF=iHpU5t054Xcdw@mail.gmail.com>

Stefan Beller <sbeller@google.com> writes:

>> Preparing the expected output "expect" outside test_expect_success
>> block is also old-school.  Move it inside the new test?
>
> I looked into that. What is our current stance on using single/double quotes?

Using dq around executable part, i.e.

	test_expect_success title "
		echo \"'foo'\"
	"

is a million-times more grave sin even if the test initially does
not refer to any variable in its body.  Somebody will eventually
need to add a line that refers to a variable and either forgets to
quote \$ in front or properly quotes it.  The former makes the
variable interpolated while the arguments to the test_expect_success
is prepared (which is a bug) and the latter makes the result quite
ugly, like so:

	test_expect_success title "
		sq=\' var=foo &&
		echo \"\${sq}\$value\${sq}\"
	"

Enclosing the body always in sq-pair does mean something ugly like
this from the beginning once you need to use sq inside:

	test_expect_success title '
		sq='\'' &&
		echo "${sq}foo${sq}"
	'

or

	test_expect_success title '
		echo "'\''foo'\''"
	'

but the ugliness is localized to places where single quote is
absolutely needed, and '\'' is something eyes soon get used to as an
idiom [*1*].  It does not affect every reference of variables like
enclosing with dq does.

[Footnote]

*1* That "idiom"-ness is the reason why the last example above is
    not written like this:

	test_expect_success title '
		echo "'\'foo\''"
	'

    even though the sq pair surrounding "foo" is not technically
    necessary.  It lets us think of the string as almost always is
    inside sq context, with the single exception that we step
    outside sq context and append a sq by saying bs-sq (\').

^ permalink raw reply

* Re: [PATCH 4/4] unpack-trees: support super-prefix option
From: Stefan Beller @ 2017-01-11 22:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Brandon Williams, David Turner, git@vger.kernel.org
In-Reply-To: <xmqq37gpnuyi.fsf@gitster.mtv.corp.google.com>

On Wed, Jan 11, 2017 at 1:32 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Stefan Beller <sbeller@google.com> writes:
>
>> Add support for the super-prefix option for commands that unpack trees.
>> For testing purposes enable it in read-tree, which has no other path
>> related output.
>
> "path related output"?  I am not sure I understand this.

Well, s/path related output/output of paths/.

> When read-tree reads N trees, or unpack_trees() is asked to "merge"
> N trees, how does --super-prefix supposed to interact with the paths
> in these trees?  Does the prefix added to paths in all trees?

Internally the super-prefix is ignored. Only the (input and) output
is using that super prefix for messages.

It was introduced for grepping recursively into submodules, i.e.

invoke as

    git grep --recurse-submodules \
      -e path/inside/submodule/and/further/down
    # internally it invokes:
    git -C .. --super-prefix .. grep ..
    # which operates "just normal" except for the
    # input parsing and output

The use case for this patch is working tree related things, i.e.
    git checkout --recurse-submodules
    # internally when recursing we call "git read-tree -u", but
    # reporting could be:
    Your local changes to the following files would be overwritten by checkout:
      path/inside/submodule/file.txt


>
> Did you mean that read-tree (and unpack_trees() in general) is a
> whole-tree operation and there is no path related input (i.e.
> pathspec limiting), but if another command that started in the
> superproject is running us in its submodule, it wants us to prefix
> the path to the submodule when we report errors?

Yes. I tried to explain it better above, but you got it here.

>
> If that is what is going on, I think it makes sense, but it may be
> asking too much from the readers to guess that from "Add support for
> the super-prefix option".

I need to enhance the commit message by a lot then.

>
>> --- a/t/t1001-read-tree-m-2way.sh
>> +++ b/t/t1001-read-tree-m-2way.sh
>> @@ -363,6 +363,15 @@ test_expect_success 'a/b (untracked) vs a, plus c/d case test.' '
>>       test -f a/b
>>  '
>>
>> +cat <<-EOF >expect &&
>> +     error: Updating 'fictional/a' would lose untracked files in it
>> +EOF
>> +
>> +test_expect_success 'read-tree supports the super-prefix' '
>> +     test_must_fail git --super-prefix fictional/ read-tree -u -m "$treeH" "$treeM" 2>actual &&
>> +     test_cmp expect actual
>> +'
>> +
>
> Preparing the expected output "expect" outside test_expect_success
> block is also old-school.  Move it inside the new test?

I looked into that. What is our current stance on using single/double quotes?
Most tests use single quotes for the test title as well as the test
instructions,
such that double quotes can be used naturally in there. But single quotes
cannot be used even escaped, such that we need to do a thing like

sq="'"

test_expect_success 'test title' '
    cat <<-EOF >expect &&
        error for ${sq}path${sq}
    EOF
    test instructions go here
'

though that is not used as often as I would think as
grep \${sq} yields t1507 and t3005.

>
> Hmph, as a reader of the code, I do not even want to wonder how
> expensive get_super_prefix() is.  If the executable part of the
> above were written like this, it would have been easier to
> understand:
>
>         if (super_prefix_len < 0) {
>                 if (!get_super_prefix())
>                         super_prefix_len = 0;
>                 else {
>                         int i;
>                         ... prepare buf[] and set super_prefix_len ...;
>                 }
>         }
>
>         if (!super_prefix_len)
>                 return path;
>
>         ... use buf[] to do the prefixing and return it ...
>

good point. I'll fix that.

^ permalink raw reply

* Re: [RFC PATCH 3/2] vreportf: add prefix to each line
From: Junio C Hamano @ 2017-01-11 22:11 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170111140758.yyfsc3r3spqpi6es@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Wed, Jan 11, 2017 at 09:01:38AM -0500, Jeff King wrote:
>
>>   [1/2]: Revert "vreportf: avoid intermediate buffer"
>>   [2/2]: vreport: sanitize ASCII control chars
>
> We've talked before about repeating the "error:" header for multi-line
> messages. The reversion in patch 1 makes this easy to play with, so I
> did. I kind of hate it. But here it is, for your viewing pleasure.

Thanks.

>
> -- >8 --
> Subject: vreportf: add prefix to each line
>
> Some error and warning messages are multi-line, but we put
> the prefix only on the first line. This means that either
> the subsequent lines don't have any indication that they're
> connected to the original error, or the caller has to make
> multiple calls to error() or warning(). And that's not even
> possible for die().
>
> Instead, let's put the prefix at the start of each line,
> just as advise() does.
>
> Note that this patch doesn't update the tests yet, so it
> causes tons of failures. This is just to see what it might
> look like.
>
> It's actually kind of ugly.  For instance, a failing test in
> t3600 now produces:
>
>    error: the following files have staged content different from both the
>    error: file and the HEAD:
>    error:     bar.txt
>    error:     foo.txt
>    error: (use -f to force removal)
>
> which seems a bit aggressive.  

I agree that it is ugly, but one reason I was hoping to do this
myself (or have somebody else do it by procrastinating) was that I
thought it may help i18n.  That is, for an original

	error(_("we found an error"))

a .po file may translate the string to a multi-line string that has
LFs in it and the output would look correct.  The translator already
can do so by indenting the second and subsequent lines by the same
column-width as "error: " (or its translation in their language, if
we are going to i18n these headers), but that (1) is extra work for
them, and (2) makes it impossible to have the same message appear in
different contexts (i.e. "error:" vs "warning:" that have different
column-widths).

> It also screws up messages which indent with tabs (the prefix eats
> up some of the tabstop, making the indentation smaller).

This is unavoidable and at the same time is a non-issue, isn't it?
Messages that indent the second and subsequent lines with tabs are
compensating the lack of the multi-line support of vreportf(), which
this RFC patch fixes.  They may need to be adjusted to the new world
order, but that is a good thing.  New multi-line messages no longer
have to worry about the prefix that is added only to the first line
when continuing the message to multiple lines.

> And the result is
> a little harder if you need to cut-and-paste the file lines
> (if your terminal lets you triple-click to select a whole
> line, now you have this "error:" cruft on the front).

This might be an issue, but I personally do not care (as I know how
to drive my "screen", namely, use 'c' while cutting) ;-).

> It may be possible to address some of that by using some
> other kind of continuation marker (instead of just repeating
> the prefix), and expanding initial tabs.

Yes indeed.  The "some other kind of continuation marker" could just
be a run of spaces that fill the same column as the "error: " or
other prefix given to the first line.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  usage.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/usage.c b/usage.c
> index ad6d2910f..8a1f6ff4e 100644
> --- a/usage.c
> +++ b/usage.c
> @@ -8,18 +8,30 @@
>  
>  static FILE *error_handle;
>  
> +static void show_line(FILE *fh, const char *prefix, const char *line, int len)
> +{
> +	fprintf(fh, "%s%.*s\n", prefix, len, line);
> +}
> +
>  void vreportf(const char *prefix, const char *err, va_list params)
>  {
>  	char msg[4096];
>  	FILE *fh = error_handle ? error_handle : stderr;
> +	const char *base;
>  	char *p;
>  
>  	vsnprintf(msg, sizeof(msg), err, params);
> +
> +	base = msg;
>  	for (p = msg; *p; p++) {
> -		if (iscntrl(*p) && *p != '\t' && *p != '\n')
> +		if (*p == '\n') {
> +			show_line(fh, prefix, base, p - base);
> +			base = p + 1;
> +		} else if (iscntrl(*p) && *p != '\t')
>  			*p = '?';
>  	}
> -	fprintf(fh, "%s%s\n", prefix, msg);
> +
> +	show_line(fh, prefix, base, p - base);
>  }
>  
>  static NORETURN void usage_builtin(const char *err, va_list params)

^ permalink raw reply

* Re: [PATCHv2 1/2] builtin/commit.c: drop use snprintf via dynamic allocation
From: Junio C Hamano @ 2017-01-11 21:48 UTC (permalink / raw)
  To: Elia Pinto; +Cc: git
In-Reply-To: <20170111071032.27797-1-gitter.spiros@gmail.com>

Elia Pinto <gitter.spiros@gmail.com> writes:

> In general snprintf is bad because it may silently truncate results if we're
> wrong. In this patch where we use PATH_MAX, we'd want to handle larger
> paths anyway, so we switch to dynamic allocation.
>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
> ---
> This is the second version of the patch.
>
> I have split the original commit in two, as discussed here
> http://public-inbox.org/git/20161213132717.42965-1-gitter.spiros@gmail.com/.

And both halves have the same title?

I think the the primary value of this one is that it removes the
PATH_MAX limitation from the environment setting that points to a
filename.  Reducing the number of snprintf() call is a side effect,
even though that may have been what motivated you originally.  The
original code used snprintf() correctly after all.

The primary value of 2/2 is reduction of the cognitive burden from
the programmers---switching to xstrfmt(), they no longer have to
count bytes needed for static allocation.  Reducing the number of
snprintf() call is a side effect, even though that may have been
what motivated you originally.  The original code used snprintf()
correctly after all.

The code looks correct in both patches (except that in 1/2 _clear()
immediately before exit() wouldn't have to be there, but it does not
hurt to have one either).  They need to be better explained.

Thanks.

>  builtin/commit.c | 22 +++++++++++-----------
>  1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 0ed634b26..09bcc0f13 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -960,15 +960,16 @@ static int prepare_to_commit(const char *index_file, const char *prefix,
>  		return 0;
>  
>  	if (use_editor) {
> -		char index[PATH_MAX];
> -		const char *env[2] = { NULL };
> -		env[0] =  index;
> -		snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", index_file);
> -		if (launch_editor(git_path_commit_editmsg(), NULL, env)) {
> +		struct argv_array env = ARGV_ARRAY_INIT;
> +
> +		argv_array_pushf(&env, "GIT_INDEX_FILE=%s", index_file);
> +		if (launch_editor(git_path_commit_editmsg(), NULL, env.argv)) {
>  			fprintf(stderr,
>  			_("Please supply the message using either -m or -F option.\n"));
> +			argv_array_clear(&env);
>  			exit(1);
>  		}
> +		argv_array_clear(&env);
>  	}
>  
>  	if (!no_verify &&
> @@ -1557,23 +1558,22 @@ static int run_rewrite_hook(const unsigned char *oldsha1,
>  
>  int run_commit_hook(int editor_is_used, const char *index_file, const char *name, ...)
>  {
> -	const char *hook_env[3] =  { NULL };
> -	char index[PATH_MAX];
> +	struct argv_array hook_env = ARGV_ARRAY_INIT;
>  	va_list args;
>  	int ret;
>  
> -	snprintf(index, sizeof(index), "GIT_INDEX_FILE=%s", index_file);
> -	hook_env[0] = index;
> +	argv_array_pushf(&hook_env, "GIT_INDEX_FILE=%s", index_file);
>  
>  	/*
>  	 * Let the hook know that no editor will be launched.
>  	 */
>  	if (!editor_is_used)
> -		hook_env[1] = "GIT_EDITOR=:";
> +		argv_array_push(&hook_env, "GIT_EDITOR=:");
>  
>  	va_start(args, name);
> -	ret = run_hook_ve(hook_env, name, args);
> +	ret = run_hook_ve(hook_env.argv,name, args);
>  	va_end(args);
> +	argv_array_clear(&hook_env);
>  
>  	return ret;
>  }

^ permalink raw reply

* Re: [PATCH] index: improve constness for reading blob data
From: Junio C Hamano @ 2017-01-11 21:36 UTC (permalink / raw)
  To: Brandon Williams; +Cc: git
In-Reply-To: <20170110200610.146596-1-bmwill@google.com>

Brandon Williams <bmwill@google.com> writes:

> Improve constness of the index_state parameter to the
> 'read_blob_data_from_index' function.
>
> Signed-off-by: Brandon Williams <bmwill@google.com>
> ---

Thanks.

^ permalink raw reply

* Re: [PATCH 4/4] unpack-trees: support super-prefix option
From: Junio C Hamano @ 2017-01-11 21:32 UTC (permalink / raw)
  To: Stefan Beller; +Cc: bmwill, novalis, git
In-Reply-To: <20170110014542.19352-5-sbeller@google.com>

Stefan Beller <sbeller@google.com> writes:

> Add support for the super-prefix option for commands that unpack trees.
> For testing purposes enable it in read-tree, which has no other path
> related output.

"path related output"?  I am not sure I understand this.

When read-tree reads N trees, or unpack_trees() is asked to "merge"
N trees, how does --super-prefix supposed to interact with the paths
in these trees?  Does the prefix added to paths in all trees?

Did you mean that read-tree (and unpack_trees() in general) is a
whole-tree operation and there is no path related input (i.e.
pathspec limiting), but if another command that started in the
superproject is running us in its submodule, it wants us to prefix
the path to the submodule when we report errors?

If that is what is going on, I think it makes sense, but it may be
asking too much from the readers to guess that from "Add support for
the super-prefix option".

> --- a/t/t1001-read-tree-m-2way.sh
> +++ b/t/t1001-read-tree-m-2way.sh
> @@ -363,6 +363,15 @@ test_expect_success 'a/b (untracked) vs a, plus c/d case test.' '
>  	test -f a/b
>  '
>  
> +cat <<-EOF >expect &&
> +	error: Updating 'fictional/a' would lose untracked files in it
> +EOF
> +
> +test_expect_success 'read-tree supports the super-prefix' '
> +	test_must_fail git --super-prefix fictional/ read-tree -u -m "$treeH" "$treeM" 2>actual &&
> +	test_cmp expect actual
> +'
> +

Preparing the expected output "expect" outside test_expect_success
block is also old-school.  Move it inside the new test?

> diff --git a/unpack-trees.c b/unpack-trees.c
> index 7a6df99d10..bc56195e27 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -52,6 +52,37 @@ static const char *unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
>  	  ? ((o)->msgs[(type)])      \
>  	  : (unpack_plumbing_errors[(type)]) )
>  
> +static const char *super_prefixed(const char *path)
> +{
> +	/*
> +	 * This is used for the error messages above.
> +	 * We need to have exactly two buffer spaces.
> +	 */
> +	static struct strbuf buf[2] = {STRBUF_INIT, STRBUF_INIT};
> +	static int super_prefix_len = -1;
> +	static unsigned idx = 0;
> +
> +	if (!get_super_prefix())
> +		return path;
> +
> +	if (super_prefix_len < 0) {
> +		int i;
> +
> +		for (i = 0; i < ARRAY_SIZE(buf); i++)
> +			strbuf_addstr(&buf[i], get_super_prefix());
> +
> +		super_prefix_len = strlen(get_super_prefix());
> +	}
> +
> +	if (++idx >= ARRAY_SIZE(buf))
> +		idx = 0;
> +
> +	strbuf_setlen(&buf[idx], super_prefix_len);
> +	strbuf_addstr(&buf[idx], path);
> +
> +	return buf[idx].buf;
> +}

Hmph, as a reader of the code, I do not even want to wonder how
expensive get_super_prefix() is.  If the executable part of the
above were written like this, it would have been easier to
understand:

	if (super_prefix_len < 0) {
		if (!get_super_prefix())
			super_prefix_len = 0;
		else {
			int i;
			... prepare buf[] and set super_prefix_len ...;
                }
	}

        if (!super_prefix_len)
		return path;

	... use buf[] to do the prefixing and return it ...

^ permalink raw reply

* Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile
From: Junio C Hamano @ 2017-01-11 21:06 UTC (permalink / raw)
  To: Richard Hansen; +Cc: git
In-Reply-To: <2fb4296d-f084-4a76-44f3-7dc7d7cca7b1@google.com>

Richard Hansen <hansenr@google.com> writes:

>> Back then we didn't even have wildmatch(), and used fnmatch()
>> instead, so forcing FNM_PATHNAME would have meant that people
>> wouldn't be able to say "foo*bar" to match "foo/other/bar"; with
>> wildmatch, "foo**bar" lets you defeat WM_PATHNAME so having
>> WM_PATHNAME always in effect is less of an issue, but with
>> fnmatch(), having FNM_PATHNAME always in effect has a lot of
>> downside.
>
> Ah, that makes sense.
>>
>> I'd expect that orderfile people have today will be broken and
>> require tweaking if you switched WM_PATHNAME on.
>
> OK, so we don't want to turn on WM_PATHNAME unless we do it for a new
> major version.

I do agree with you that if we were starting Git from scratch, or at
least if we were adding diff.orderfile feature today, we would have
used wildmatch(WM_PATHNAME) for this matching.  We would also have
used the same parser as used to read the exclude files (and when we
see negative matching entries in the parsed result, either errored
out or ignored them with warning).  That kind of change unfortunately
would require a major version bump, I am afraid.

^ permalink raw reply

* [PATCH] submodule absorbgitdirs: mention in docstring help
From: Stefan Beller @ 2017-01-11 20:59 UTC (permalink / raw)
  To: gitster; +Cc: git, jrnieder, Stefan Beller

This part was missing in f6f85861 (submodule: add absorb-git-dir function,
2016-12-12).

Noticed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
---
 git-submodule.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/git-submodule.sh b/git-submodule.sh
index 554bd1c494..b43af1742c 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -12,7 +12,8 @@ USAGE="[--quiet] add [-b <branch>] [-f|--force] [--name <name>] [--reference <re
    or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--] [<path>...]
    or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
    or: $dashless [--quiet] foreach [--recursive] <command>
-   or: $dashless [--quiet] sync [--recursive] [--] [<path>...]"
+   or: $dashless [--quiet] sync [--recursive] [--] [<path>...]
+   or: $dashless [--quiet] absorbgitdirs [--] [<path>...]"
 OPTIONS_SPEC=
 SUBDIRECTORY_OK=Yes
 . git-sh-setup
-- 
2.11.0.259.g7b30ecf4f0


^ permalink raw reply related

* Re: [PATCH 1/2] diff: document behavior of relative diff.orderFile
From: Junio C Hamano @ 2017-01-11 20:53 UTC (permalink / raw)
  To: Jeff King; +Cc: Richard Hansen, git
In-Reply-To: <20170111144158.ef6kle3vw3ejgmut@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Tue, Jan 10, 2017 at 02:15:11PM -0800, Junio C Hamano wrote:
>
>> Richard Hansen <hansenr@google.com> writes:
>> ...
>> > I think so.  (For bare repositories anyway; non-bare should be
>> > relative to GIT_WORK_TREE.)  Perhaps git_config_pathname() itself
>> > should convert relative paths to absolute so that every pathname
>> > setting automatically works without changing any calling code.
>> 
>> Yes, that was what I was alluding to.  We might have to wait until
>> major version boundary to do so, but I think that it is the sensible
>> way forward in the longer term to convert relative to absolute in
>> git_config_pathname().
>
> Yeah, I'd agree.
>
> I'm undecided on whether it would need to happen at a major version
> bump. ...
>
> So I dunno. I do hate to break even corner cases, but I'm having trouble
> imagining the scenario where somebody is actually using the current
> behavior in a useful way.

Thanks for a detailed analysis (I probably should have spelt them
out when I said "we might have to" to save you the trouble).


^ permalink raw reply

* Re: [musl] Re: Test failures when Git is built with libpcre and grep is built without it
From: Junio C Hamano @ 2017-01-11 20:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Andreas Schwab, A. Wilcox
In-Reply-To: <20170111100400.vhd5ytarqpujigbn@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Another difference I found is that "[\d]" matches a literal "\" or "d"
> in ERE, but behaves like "[0-9]" in PCRE. I'll work up a patch based on
> that.

Wow, clever.

Thanks.

^ permalink raw reply

* Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
From: Junio C Hamano @ 2017-01-11 20:45 UTC (permalink / raw)
  To: Lars Schneider
  Cc: Git mailing list, Eric Wong, Jakub Narębski,
	Torsten Bögershausen, Taylor Blau
In-Reply-To: <3165D057-7486-4ACB-8336-E63F49182CBE@gmail.com>

Lars Schneider <larsxschneider@gmail.com> writes:

>> Hmm, I would have expected that the basic flow would become
>> 
>> 	for each paths to be processed:
>> 		convert-to-worktree to buf
>> 		if not delayed:
>> 			do the caller's thing to use buf
>> 		else:
>> 			remember path
>> 
>> 	for each delayed paths:
>> 		ensure filter process finished processing for path
>> 		fetch the thing to buf from the process
>> 		do the caller's thing to use buf
>> 
>> and that would make quite a lot of sense.  However, what is actually
>> implemented is a bit disappointing from that point of view.  While
>> its first part is the same as above, the latter part instead does:
>> 
>> 	for each delayed paths:
>> 		checkout the path
>> ...
>
> I am not sure I can follow you here.
> ...
> I implemented the "checkout_delayed_entries" function in v1 because
> it solved the problem with minimal changes in the existing code. Our previous 
> discussion made me think that this is the preferred way:
>
>      I do not think we want to see such a rewrite all over the
>      codepaths.  It might be OK to add such a "these entries are known
>      to be delayed" list in struct checkout so that the above becomes
>      more like this:
>
>        for (i = 0; i < active_nr; i++)
>           checkout_entry(active_cache[i], state, NULL);
>      + checkout_entry_finish(state);
>
>      That is, addition of a single "some of the checkout_entry() calls
>      done so far might have been lazy, and I'll give them a chance to
>      clean up" might be palatable.  Anything more than that on the
>      caller side is not.

But that is apples-and-oranges comparision, no?  The old discussion
assumes there is no "caller's thing to use buf" other than "checkout
to the working tree", which is why the function its main loop calls
is "checkout_entry()" and the caller does not see the contents of
the filtered blob at all.  In that context, checkout_entry_finish()
that equally does not let the caller see the contents of the
filtered blob is quite appropriate.


^ permalink raw reply

* Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
From: Junio C Hamano @ 2017-01-11 20:41 UTC (permalink / raw)
  To: Jakub Narębski
  Cc: Lars Schneider, Git mailing list, Eric Wong, Taylor Blau
In-Reply-To: <17fa31a5-8689-2766-952b-704f433a5b3a@gmail.com>

Jakub Narębski <jnareb@gmail.com> writes:

>> Yes, this problem happens every day with filters that perform network
>> requests (e.g. GitLFS). 
>
> Do I understand it correctly that the expected performance improvement
> thanks to this feature is possible only if there is some amount of
> parallelism and concurrency in the filter?  That is, filter can be sending
> one blob to Git while processing other one, or filter can be fetching blobs
> in parallel.

The first-object latency may not be helped, but by allowing
"delayed", the latency to retrieve the second and subsequent objects
can be hidden, I would think.

^ permalink raw reply

* Re: git cat-file on a submodule
From: Junio C Hamano @ 2017-01-11 18:56 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git
In-Reply-To: <20170111125330.3skwxdleoooacts6@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Tue, Jan 10, 2017 at 07:11:40PM -0500, David Turner wrote:
>
>> Why does git cat-file -t $sha:foo, where foo is a submodule, not work?
> ...
> I'm not sure if I'm complaining or not. I can't immediately think of
> something that would be horribly broken. But it really feels like you
> are using the wrong tool, and patching the tool to handle this case will
> probably lead to weird cognitive dissonance down the road.

"git cat-file [any option] $sha" should fail and complain for any
$sha that does not name an object that exists in the object database
of the repository it is working on.  

So I'd complain if the first example command quoted above from
David's mail stopped failing when the commit bound at 'foo' in the
top-level treeish $sha (i.e. a commit in the submodule) does not
exist in the top-level repository's object database.

> Maybe it would help to describe your use case more fully. If what you
> care about is the presumed type based on the surrounding tree, then
> maybe:
>
>   git --literal-pathspecs ls-tree $sha -- foo
>
> would be a better match.

Yup.

^ permalink raw reply

* [PATCH] lib-submodule-update.sh: drop unneeded shell
From: Stefan Beller @ 2017-01-11 18:47 UTC (permalink / raw)
  To: gitster; +Cc: git, Stefan Beller

In modern Git we prefer "git -C <cmd" over "(cd <somewhere && git <cmd>)"
as it doesn't need an extra shell.

Signed-off-by: Stefan Beller <sbeller@google.com>
---

And because it is in a setup function, we actually save the invocation
of 22 shells for a single run of the whole test suite.

Noticed while adding a lot more in near vincinity, though not as near
to cause merge conflicts, so sending it extra.

Thanks,
Stefan

 t/lib-submodule-update.sh | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 79cdd34a54..915eb4a7c6 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -69,10 +69,7 @@ create_lib_submodule_repo () {
 
 		git checkout -b "replace_sub1_with_directory" "add_sub1" &&
 		git submodule update &&
-		(
-			cd sub1 &&
-			git checkout modifications
-		) &&
+		git -C sub1 checkout modifications &&
 		git rm --cached sub1 &&
 		rm sub1/.git* &&
 		git config -f .gitmodules --remove-section "submodule.sub1" &&
-- 
2.11.0.259.g7b30ecf4f0


^ permalink raw reply related

* Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile
From: Richard Hansen @ 2017-01-11 18:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqqwpe1o43k.fsf@gitster.mtv.corp.google.com>

On 2017-01-11 13:15, Junio C Hamano wrote:
> Richard Hansen <hansenr@google.com> writes:
>
>> On 2017-01-10 21:46, Junio C Hamano wrote:
>>> Richard Hansen <hansenr@google.com> writes:
>>>
>>>> I was looking at the code to see how the two file formats differed and
>>>> noticed that match_order() doesn't set the WM_PATHNAME flag when it
>>>> calls wildmatch().  That's unintentional (a bug), right?
>>>
>>> It has been that way from day one IIRC even before we introduced
>>> wildmatch()---IOW it may be intentional that the current code that
>>> uses wildmatch() does not use WM_PATHNAME.
>>
>> You are the original author (af5323e027 2005-05-30).  Do you remember
>> what your intention was?
>
> Yes.
>
> Back then we didn't even have wildmatch(), and used fnmatch()
> instead, so forcing FNM_PATHNAME would have meant that people
> wouldn't be able to say "foo*bar" to match "foo/other/bar"; with
> wildmatch, "foo**bar" lets you defeat WM_PATHNAME so having
> WM_PATHNAME always in effect is less of an issue, but with
> fnmatch(), having FNM_PATHNAME always in effect has a lot of
> downside.

Ah, that makes sense.

>
> I'd expect that orderfile people have today will be broken and
> require tweaking if you switched WM_PATHNAME on.

OK, so we don't want to turn on WM_PATHNAME unless we do it for a new 
major version.

I'll do another re-roll and document the non-WM_PATHNAME behavior. 
Perhaps I'll encourage users to prefer ** over * if they want to match 
slash (even though they are equivalent) so that migration is easier if 
we ever do turn on WM_PATHNAME.

-Richard

^ permalink raw reply

* Re: git cat-file on a submodule
From: David Turner @ 2017-01-11 18:21 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20170111125330.3skwxdleoooacts6@sigill.intra.peff.net>

On Wed, 2017-01-11 at 07:53 -0500, Jeff King wrote:
> On Tue, Jan 10, 2017 at 07:11:40PM -0500, David Turner wrote:
> 
> > Why does git cat-file -t $sha:foo, where foo is a submodule, not work?
> 
> Because "cat-file" is about inspecting items in the object database, and
> typically the submodule commit is not present in the superproject's
> database. So we cannot know its type. You can infer what it _should_ be
> from the surrounding tree, but you cannot actually do the object lookup.
> 
> Likewise, "git cat-file -t $sha1:Makefile" is not just telling you that
> we found a 100644 entry in the tree, so we expect a blob. It's resolving
> to a sha1, and then checking the type of that sha1 in the database. It
> _should_ be a blob, but if it isn't, then cat-file is the tool that
> should tell you that it is not.
> 
> > git rev-parse $sha:foo works.
> 
> Right. Because that command is about resolving a name to a sha1, which
> we can do even without the object.
> 
> > By "why", I mean "would anyone complain if I fixed it?"  FWIW, I think
> > -p should just return the submodule's sha.
> 
> I'm not sure if I'm complaining or not. I can't immediately think of
> something that would be horribly broken. But it really feels like you
> are using the wrong tool, and patching the tool to handle this case will
> probably lead to weird cognitive dissonance down the road.

OK, this makes sense to me.  I tried cat-file because that is the tool I
was familiar with, but that doesn't mean that it was the right thing to
do.

> Maybe it would help to describe your use case more fully. If what you
> care about is the presumed type based on the surrounding tree, then
> maybe:
> 
>   git --literal-pathspecs ls-tree $sha -- foo

That (minus --literal-pathspecs, which does not exist in the version of
git I happen to be using) is fine for my purposes.  Thanks.


^ permalink raw reply

* Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile
From: Junio C Hamano @ 2017-01-11 18:15 UTC (permalink / raw)
  To: Richard Hansen; +Cc: git
In-Reply-To: <21b416ae-8bf6-4f82-25d3-e51a574e7746@google.com>

Richard Hansen <hansenr@google.com> writes:

> On 2017-01-10 21:46, Junio C Hamano wrote:
>> Richard Hansen <hansenr@google.com> writes:
>>
>>> I was looking at the code to see how the two file formats differed and
>>> noticed that match_order() doesn't set the WM_PATHNAME flag when it
>>> calls wildmatch().  That's unintentional (a bug), right?
>>
>> It has been that way from day one IIRC even before we introduced
>> wildmatch()---IOW it may be intentional that the current code that
>> uses wildmatch() does not use WM_PATHNAME.
>
> You are the original author (af5323e027 2005-05-30).  Do you remember
> what your intention was?

Yes.  

Back then we didn't even have wildmatch(), and used fnmatch()
instead, so forcing FNM_PATHNAME would have meant that people
wouldn't be able to say "foo*bar" to match "foo/other/bar"; with
wildmatch, "foo**bar" lets you defeat WM_PATHNAME so having
WM_PATHNAME always in effect is less of an issue, but with
fnmatch(), having FNM_PATHNAME always in effect has a lot of
downside.

I'd expect that orderfile people have today will be broken and
require tweaking if you switched WM_PATHNAME on.

^ permalink raw reply

* Re: [RFC PATCH 0/5] Localise error headers
From: Junio C Hamano @ 2017-01-11 18:08 UTC (permalink / raw)
  To: Jeff King; +Cc: Stefan Beller, Michael J Gruber, git@vger.kernel.org
In-Reply-To: <20170111113725.avl3wetwrfezdni2@sigill.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Yes, I would think die_errno() is a no-brainer for translation, since
> the strerror() will be translated.
>
>>     apply.c:                die(_("internal error"));
>> 
>> That is funny, too. I think we should substitute that with
>> 
>>     die("BUG: untranslated, but what went wrong instead")
>
> Yep. We did not consistently use "BUG:" in the early days. I would say
> that "BUG" lines do not need to be translated. The point is that nobody
> should ever see them, so it seems like there is little point in giving
> extra work to translators.

In addition, "BUG: " is relatively recent introduction to our
codebase.  Perhaps having a separate BUG(<string>) function help the
distinction further?

^ permalink raw reply

* Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
From: Taylor Blau @ 2017-01-11 17:59 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Junio C Hamano, git, e, jnareb
In-Reply-To: <F533857D-9B51-44C1-8889-AA0542AD8250@gmail.com>

On Wed, Jan 11, 2017 at 11:13:00AM +0100, Lars Schneider wrote:
>
> In v1 I implemented a) with the busy-loop problem in mind.
>
> My thinking was this:
>
> If the filter sees at least one filter request twice then the filter knows that
> Git has already requested all files that require filtering. At that point the
> filter could just block the "delayed" answer to the latest filter request until
> at least one of the previously delayed requests can be fulfilled. Then the filter
> answers "delay" to Git until Git requests the blob that can be fulfilled. This
> process cycles until all requests can be fulfilled. Wouldn't that work?
>
> I think a "done" message by the filter is not easy. Right now the protocol works
> in a mode were Git always asks and the filter always answers. I believe changing
> the filter to be able to initiate a "done" message would complicated the protocol.
>
> > Additionally, the protocol should specify a sentinel "no more entries" value
> > that could be sent from Git to the filter to signal that there are no more files
> > to checkout. Some filters may implement mechanisms for converting files that
> > require a signal to know when all files have been sent. Specifically, Git LFS
> > (https://git-lfs.github.com) batches files to be transferred together, and needs
> > to know when all files have been announced to truncate and send the last batch,
> > if it is not yet full. I'm sure other filter implementations use a similar
> > mechanism and would benefit from this as well.
>
> I agree. I think the filter already has this info implicitly as explained above
> but an explicit message would be better!

This works, but forces some undesirable constraints:

- The filter has to keep track of all items in the checkout to determine how
  many times Git has asked about them. This is probably fine, though annoying.
  Cases where this is not fine is when there are _many_ items in the checkout
  forcing filter implementations to keep track of large sets of data.
- The protocol is now _must_ ask for the status of checkout items in a
  particular order. If the assumption is that "Git has sent me all items in the
  checkout by the point I see Git ask for the status of an item more than once",
  then Git _cannot_ ask for the status of a delayed item while there are still
  more items to checkout. This could be OK, so long as the protocol commits to
  it and documents this behavior.

What are your thoughts?

--
Thanks,
Taylor Blau

ttaylorr@github.com

^ permalink raw reply

* Re: [PATCH 2/2] diff: document the pattern format for diff.orderFile
From: Richard Hansen @ 2017-01-11 17:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <xmqq60lmpb4j.fsf@gitster.mtv.corp.google.com>

On 2017-01-10 21:46, Junio C Hamano wrote:
> Richard Hansen <hansenr@google.com> writes:
>
>> I was looking at the code to see how the two file formats differed and
>> noticed that match_order() doesn't set the WM_PATHNAME flag when it
>> calls wildmatch().  That's unintentional (a bug), right?
>
> It has been that way from day one IIRC even before we introduced
> wildmatch()---IOW it may be intentional that the current code that
> uses wildmatch() does not use WM_PATHNAME.

You are the original author (af5323e027 2005-05-30).  Do you remember 
what your intention was?

I would like to change it to pass WM_PATHNAME, but I'm not sure if that 
would break anyone.  I'm guessing probably not, because users probably 
expect WM_PATHNAME and would be surprised (like I was) to learn otherwise.

If we want to keep it as-is, I'll have to adjust [PATCH v2 2/2].

-Richard

^ permalink raw reply

* Re: [RFC PATCH 0/5] Localise error headers
From: Stefan Beller @ 2017-01-11 17:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Michael J Gruber, git@vger.kernel.org
In-Reply-To: <20170111113725.avl3wetwrfezdni2@sigill.intra.peff.net>

On Wed, Jan 11, 2017 at 3:37 AM, Jeff King <peff@peff.net> wrote:
> On Tue, Jan 10, 2017 at 10:28:42AM -0800, Stefan Beller wrote:
>
>> > And then presumably that mix would gradually move to 100% consistency as
>> > more messages are translated. But the implicit question is: are there
>> > die() messages that should never be translated? I'm not sure.
>>
>> I would assume any plumbing command is not localizing?
>> Because in plumbing land, (easily scriptable) you may find
>> a grep on the output/stderr for a certain condition?
>
> That's the assumption I'm challenging. Certainly the behavior and
> certain aspects of the output of a plumbing command should remain the
> same over time. But error messages to stderr?

In an ideal world that assumption would hold true and any machine
readable information from the plumbing commands are on stdout and
nobody in their sane mind would ever try to parse stderr.

There is no easy way to check though except auditing all the code.
Having pointed out some string handling mistakes in the prior email,
my confidence is not high that plumbing commands are that strict about
separating useful machine output and errors for human consumption.

>
> It seems like they should be translated, because plumbing invoked on
> behalf of porcelain scripts is going to send its stderr directly to the
> user.

Well that could be solved, c.f. unpack-trees.c lines 15-55.

As another data point (coming from that area of strings):
If you grep for these plumbing strings in the project,
i.e.
$ git grep "would be overwritten by merge. Cannot merge"
$ git grep "not uptodate. Cannot merge."
...

you only find the occurrence in the  unpack-trees.c lines 15-55,
which means our test suite at least doesn't grep for error messages
but relies on the exit code of plumbing commands(?!)

>
>> To find a good example, "git grep die" giving me some food of though:
>>
>> die_errno(..) should always take a string marked up for translation,
>> because the errno string is translated?
>
> Yes, I would think die_errno() is a no-brainer for translation, since
> the strerror() will be translated.
>
>>     apply.c:                die(_("internal error"));
>>
>> That is funny, too. I think we should substitute that with
>>
>>     die("BUG: untranslated, but what went wrong instead")
>
> Yep. We did not consistently use "BUG:" in the early days. I would say
> that "BUG" lines do not need to be translated. The point is that nobody
> should ever see them, so it seems like there is little point in giving
> extra work to translators.
>
> -Peff

^ permalink raw reply

* Re: [PATCH v1] convert: add "status=delayed" to filter process protocol
From: Jakub Narębski @ 2017-01-11 14:53 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Junio C Hamano, Git mailing list, Eric Wong, Taylor Blau
In-Reply-To: <9A1064BB-DA72-44DB-A875-39E007708A69@gmail.com>

W dniu 11.01.2017 o 11:20, Lars Schneider pisze: 
> On 10 Jan 2017, at 23:11, Jakub Narębski <jnareb@gmail.com> wrote:
>> W dniu 09.01.2017 o 00:42, Junio C Hamano pisze:
>>> larsxschneider@gmail.com writes:
>>>> From: Lars Schneider <larsxschneider@gmail.com>
>>>>
>>>> Some `clean` / `smudge` filters might require a significant amount of
>>>> time to process a single blob. During this process the Git checkout
>>>> operation is blocked and Git needs to wait until the filter is done to
>>>> continue with the checkout.
>>
>> Lars, what is expected use case for this feature; that is when do you
>> think this problem may happen?  Is it something that happened IRL?
> 
> Yes, this problem happens every day with filters that perform network
> requests (e.g. GitLFS). 

Do I understand it correctly that the expected performance improvement
thanks to this feature is possible only if there is some amount of
parallelism and concurrency in the filter?  That is, filter can be sending
one blob to Git while processing other one, or filter can be fetching blobs
in parallel.

This means that filter process works as a kind of (de)multiplexer for
fetching and/or processing blob contents, I think.

> [...] In GitLFS we even implemented Git wrapper
> commands to address the problem: https://github.com/git-lfs/git-lfs/pull/988
> The ultimate goal of this patch is to be able to get rid of the wrapper 
> commands.

I'm sorry, I don't see it how the wrapper helps here.

> 
>>>> Teach the filter process protocol (introduced in edcc858) to accept the
>>>> status "delayed" as response to a filter request. Upon this response Git
>>>> continues with the checkout operation and asks the filter to process the
>>>> blob again after all other blobs have been processed.
>>>
>>> Hmm, I would have expected that the basic flow would become
>>>
>>> 	for each paths to be processed:
>>> 		convert-to-worktree to buf
>>> 		if not delayed:
>>> 			do the caller's thing to use buf
>>> 		else:
>>> 			remember path
>>>
>>> 	for each delayed paths:
>>> 		ensure filter process finished processing for path
>>> 		fetch the thing to buf from the process
>>> 		do the caller's thing to use buf
>>
>> I would expect here to have a kind of event loop, namely
>>
>>        while there are delayed paths:
>>                get path that is ready from filter
>>                fetch the thing to buf (supporting "delayed")
>>                if path done
>>                        do the caller's thing to use buf 
>>                        (e.g. finish checkout path, eof convert, etc.)
>>
>> We can either trust filter process to tell us when it finished sending
>> delayed paths, or keep list of paths that are being delayed in Git.
> 
> I could implement "get path that is ready from filter" but wouldn't
> that complicate the filter protocol? I think we can use the protocol pretty 
> much as if with the strategy outlined here:
> http://public-inbox.org/git/F533857D-9B51-44C1-8889-AA0542AD8250@gmail.com/

You are talking about the "busy-loop" solution, isn't it?  In the
same notation, it would look like this:

          while there are delayed paths:
                  for each delayed path:
                          request path from filter [1]
                          fetch the thing (supporting "delayed") [2]
                          if path done
                                  do the caller's thing to use buf
                                  remove it from delayed paths list


Footnotes:
----------
1) We don't send the Git-side contents of blob again, isn't it?
   So we need some protocol extension / new understanding anyway.
   for example that we don't send contents if we request path again.
2) If path is not ready at all, filter protocol would send status=delayed
   with empty contents.  This means that we would immediately go to the
   next path, if there is one.

There are some cases where busy loop is preferable, but I don't think
it is the case here.


The event loop solution would require additional protocol extension,
but I don't think those complicate protocol too much:

A. Git would need to signal filter process that it has sent all paths,
and that it should be sending delayed paths when they are ready.  This
could be done for example with "command=continue".

    packet:          git> command=continue


B. Filter driver, in the event-loop phase, when (de)multiplexing fetching
or processing of data, it would need now to initialize transfer, instead
of waiting for Git to ask.  It could look like this:

    packet:          git< status=resumed [3]
    packet:          git< pathname=file/to/be/resumed [4]
    packet:          git< 0000
    packet:          git< SMUDGED_CONTENT_CONTINUED
    packet:          git< 0000
    packet:          git< 0000  # empty list, means "status=success" [5]

Footnotes:
----------
3.) It could be "status=success", "status=delayed", "command=resumed", etc.
4.) In the future we can add byte at which we resume, size of file, etc.
5.) Of course sending reminder of contents may be further delayed.

-- 
Jakub Narębski

^ permalink raw reply

* Re: [PATCH 1/2] diff: document behavior of relative diff.orderFile
From: Jeff King @ 2017-01-11 14:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Richard Hansen, git
In-Reply-To: <xmqqtw96pno0.fsf@gitster.mtv.corp.google.com>

On Tue, Jan 10, 2017 at 02:15:11PM -0800, Junio C Hamano wrote:

> Richard Hansen <hansenr@google.com> writes:
> 
> >> A related tangent.
> >>
> >> I wonder if anything that uses git_config_pathname() should be
> >> relative to GIT_DIR when it is not absolute.
> >
> > I think so.  (For bare repositories anyway; non-bare should be
> > relative to GIT_WORK_TREE.)  Perhaps git_config_pathname() itself
> > should convert relative paths to absolute so that every pathname
> > setting automatically works without changing any calling code.
> 
> Yes, that was what I was alluding to.  We might have to wait until
> major version boundary to do so, but I think that it is the sensible
> way forward in the longer term to convert relative to absolute in
> git_config_pathname().

Yeah, I'd agree.

I'm undecided on whether it would need to happen at a major version
bump. The existing semantics are fairly insane, and would cause a lot of
confusing breakages. We can imagine use of relative paths in a bare
repository falls into one of a few categories:

  1. The user generally runs by "cd /path/to/bare.git && git ...". This
     would be unaffected, as relative and $GIT_DIR are the same.

  2. The user runs via "cd /path/to/bare.git/some-subdir". This would be
     broken, but I have trouble imagining that they really wanted to
     read something like "objects/orderfile".

  3. The user runs via "GIT_DIR=/path/to/bare.git" from various
     directories. This case is probably horribly broken, as things like
     diff.orderFile will complain if they ever run from a directory that
     doesn't have the order file.

  4. They run GIT_DIR=/path/to/bare.git from a consistent origin
     directory. This _does_ work, and we'd be breaking it. Though I kind
     of question why the config in $GIT_DIR is meant to apply to a file
     in a totally unrelated directory.

     I suppose somebody could be relying on the behavior where setting
     GIT_DIR uses the current directory as the working tree (i.e., if
     core.bare is "true" in bare.git). But then, we'd consider their
     working directory as the working tree and read from that anyway. So
     the behavior would stay the same.

So I dunno. I do hate to break even corner cases, but I'm having trouble
imagining the scenario where somebody is actually using the current
behavior in a useful way.

-Peff

^ permalink raw reply

* [RFC PATCH 3/2] vreportf: add prefix to each line
From: Jeff King @ 2017-01-11 14:07 UTC (permalink / raw)
  To: git
In-Reply-To: <20170111140138.5p647xuqpqrej63b@sigill.intra.peff.net>

On Wed, Jan 11, 2017 at 09:01:38AM -0500, Jeff King wrote:

>   [1/2]: Revert "vreportf: avoid intermediate buffer"
>   [2/2]: vreport: sanitize ASCII control chars

We've talked before about repeating the "error:" header for multi-line
messages. The reversion in patch 1 makes this easy to play with, so I
did. I kind of hate it. But here it is, for your viewing pleasure.

-- >8 --
Subject: vreportf: add prefix to each line

Some error and warning messages are multi-line, but we put
the prefix only on the first line. This means that either
the subsequent lines don't have any indication that they're
connected to the original error, or the caller has to make
multiple calls to error() or warning(). And that's not even
possible for die().

Instead, let's put the prefix at the start of each line,
just as advise() does.

Note that this patch doesn't update the tests yet, so it
causes tons of failures. This is just to see what it might
look like.

It's actually kind of ugly.  For instance, a failing test in
t3600 now produces:

   error: the following files have staged content different from both the
   error: file and the HEAD:
   error:     bar.txt
   error:     foo.txt
   error: (use -f to force removal)

which seems a bit aggressive.  It also screws up messages
which indent with tabs (the prefix eats up some of the
tabstop, making the indentation smaller). And the result is
a little harder if you need to cut-and-paste the file lines
(if your terminal lets you triple-click to select a whole
line, now you have this "error:" cruft on the front).

It may be possible to address some of that by using some
other kind of continuation marker (instead of just repeating
the prefix), and expanding initial tabs.

Signed-off-by: Jeff King <peff@peff.net>
---
 usage.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/usage.c b/usage.c
index ad6d2910f..8a1f6ff4e 100644
--- a/usage.c
+++ b/usage.c
@@ -8,18 +8,30 @@

 static FILE *error_handle;

+static void show_line(FILE *fh, const char *prefix, const char *line, int len)
+{
+	fprintf(fh, "%s%.*s\n", prefix, len, line);
+}
+
 void vreportf(const char *prefix, const char *err, va_list params)
 {
 	char msg[4096];
 	FILE *fh = error_handle ? error_handle : stderr;
+	const char *base;
 	char *p;

 	vsnprintf(msg, sizeof(msg), err, params);
+
+	base = msg;
 	for (p = msg; *p; p++) {
-		if (iscntrl(*p) && *p != '\t' && *p != '\n')
+		if (*p == '\n') {
+			show_line(fh, prefix, base, p - base);
+			base = p + 1;
+		} else if (iscntrl(*p) && *p != '\t')
 			*p = '?';
 	}
-	fprintf(fh, "%s%s\n", prefix, msg);
+
+	show_line(fh, prefix, base, p - base);
 }

 static NORETURN void usage_builtin(const char *err, va_list params)
-- 
2.11.0.627.gfa6151259

^ permalink raw reply related

* [PATCH 2/2] vreport: sanitize ASCII control chars
From: Jeff King @ 2017-01-11 14:02 UTC (permalink / raw)
  To: git
In-Reply-To: <20170111140138.5p647xuqpqrej63b@sigill.intra.peff.net>

Our error() and die() calls may report messages with
arbitrary data (e.g., filenames or even data from a remote
server). Let's make it harder to cause confusion with
mischievous filenames. E.g., try:

  git rev-parse "$(printf "\rfatal: this argument is too sneaky")" --

or

  git rev-parse "$(printf "\x1b[5mblinky\x1b[0m")" --

Let's block all ASCII control characters, with the exception
of TAB and LF. We use both in our own messages (and we are
necessarily sanitizing the complete output of snprintf here,
as we do not have access to the individual varargs). And TAB
and LF are unlikely to cause confusion (you could put
"\nfatal: sneaky\n" in your filename, but it would at least
not _cover up_ the message leading to it, unlike "\r").

We'll replace the characters with a "?", which is similar to
how "ls" behaves. It might be nice to do something less
lossy, like converting them to "\x" hex codes. But replacing
with a single character makes it easy to do in-place and
without worrying about length limitations. This feature
should kick in rarely enough that the "?" marks are almost
never seen.

We'll leave high-bit characters as-is, as they are likely to
be UTF-8 (though there may be some Unicode mischief you
could cause, which may require further patches).

Signed-off-by: Jeff King <peff@peff.net>
---
 usage.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/usage.c b/usage.c
index b1cbe6799..ad6d2910f 100644
--- a/usage.c
+++ b/usage.c
@@ -12,7 +12,13 @@ void vreportf(const char *prefix, const char *err, va_list params)
 {
 	char msg[4096];
 	FILE *fh = error_handle ? error_handle : stderr;
+	char *p;
+
 	vsnprintf(msg, sizeof(msg), err, params);
+	for (p = msg; *p; p++) {
+		if (iscntrl(*p) && *p != '\t' && *p != '\n')
+			*p = '?';
+	}
 	fprintf(fh, "%s%s\n", prefix, msg);
 }

-- 
2.11.0.627.gfa6151259

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox