Git development
 help / color / mirror / Atom feed
* Re: Git bisect does not find commit introducing the bug
From: Junio C Hamano @ 2017-02-20  9:02 UTC (permalink / raw)
  To: Christian Couder; +Cc: Alex Hoffman, Johannes Sixt, Stephan Beyer, git
In-Reply-To: <CAP8UFD3ngMvVy2XLzYNn9OFbS+zQpWTW=pravpHhA-0PcDVhfg@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> writes:

> git bisect makes some assumptions that are true most of the time, so
> in practice it works well most of the time.

I think we need to clarify the documentation and ask you to stop
using such fuzzy phrases like "assumptions" and "most of the time"
;-).

For bisection to work, "git bisect" requires a very simple thing,
which is that your history is bisectable wrt one particular "trait",
i.e. what you are interested in (e.g. the trait may be "the commit
has this feature in a broken form").

What does it mean for a history to be "bisectable", then?

A bisectable history has commits with the "trait" and commits
without the "trait".  Given any commit with the "trait", all commits
that are decendant of such commit must have the "trait".  Also given
any commit without the "trait", all commits that are ancestor of
such a commit must not have the "trait".

And your goal is to find one commit with the "trait" whose direct
parent commits all lack the "trait".

If and only if you can formulate your problem into the above form,
"git bisect" can help you by not requiring you to check each and
every commit in the history.

Depending on the way you define the "trait", your history may not be
bisectable, but by formulating the "trait" carefully, many such
history can be made bisectable.  In the "Recently some commit broke
feature X.  Before then the feature used to work.  In an ancient
past, the feature did not even exist" example, if you set "trying to
use feature X breaks" as the "trait", your history is not bisectable
unless you ignore the ancient part that did not even have the
feature.  If you restate your "trait" to "feature X does not exist
in a broken form", however, the history becomes bisectable.

Historically, we called commits with "trait" BAD and others GOOD,
because bisection was used primarily to hunt for bugs.  It may be
easier to understand if the user thinks of commits without "trait"
as OLD (i.e. commits in the older part of the history are not yet
contaminated), and commits with "trait" as NEW (i.e. at some point,
there is an event to introduce the "trait" and newer part of the
history is contaminated by that event ever since).

^ permalink raw reply

* Re: [PATCH] config: preserve <subsection> case for one-shot config on the command line
From: Junio C Hamano @ 2017-02-20  9:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Lars Schneider, Jonathan Tan, git, sbeller
In-Reply-To: <xmqqwpcptxps.fsf@gitster.mtv.corp.google.com>

Junio C Hamano <gitster@pobox.com> writes:

> I still haven't queued any of the variants I posted (and I do not
> think other people sent their own versions, either).  I need to pick
> one and queue, with a test or two.  Perhaps after -rc2.  
>
> Others are welcome to work on it while I cut -rc2 tomorrow, so that
> by the time I see their patch all that is left for me to do is to
> apply it ;-)

Since nothing seems to have happened in the meantime, here is what
I'll queue so that we won't forget for now.  Lars's tests based on
how the scripted "git submodule" uses "git config" may still be
valid, but it is somewhat a roundabout way to demonstrate the
breakage and not very effective way to protect the fix, so I added a
new test that directly tests "git -c <var>=<val> <command>".

I am not sure if this updated one is worth doing, or the previous
"strchr and strrchr" I originally wrote was easier to understand.

One thing I noticed is that "git config --get X" will correctly
diagnose that a dot-less X is not a valid variable name, but we do
not seem to diagnose "git -c X=V <cmd>" as invalid.

Perhaps we should, but it is not the focus of this topic.

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Wed, 15 Feb 2017 15:48:44 -0800
Subject: [PATCH] config: preserve <subsection> case for one-shot config on the
 command line

The "git -c <var>=<val> cmd" mechanism is to pretend that a
configuration variable <var> is set to <val> while the cmd is
running.  The code to do so however downcased <var> in its entirety,
which is wrong for a three-level <section>.<subsection>.<variable>.

The <subsection> part needs to stay as-is.

Reported-by: Lars Schneider <larsxschneider@gmail.com>
Diagnosed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 config.c                  | 30 ++++++++++++++++++++++++++++-
 t/t1351-config-cmdline.sh | 48 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 1 deletion(-)
 create mode 100755 t/t1351-config-cmdline.sh

diff --git a/config.c b/config.c
index 0dfed682b8..ba9a5911b0 100644
--- a/config.c
+++ b/config.c
@@ -199,6 +199,34 @@ void git_config_push_parameter(const char *text)
 	strbuf_release(&env);
 }
 
+/*
+ * downcase the <section> and <variable> in <section>.<variable> or
+ * <section>.<subsection>.<variable> and do so in place.  <subsection>
+ * is left intact.
+ */
+static void canonicalize_config_variable_name(char *varname)
+{
+	char *cp, *last_dot;
+
+	/* downcase the first segment */
+	for (cp = varname; *cp; cp++) {
+		if (*cp == '.')
+			break;
+		*cp = tolower(*cp);
+	}
+	if (!*cp)
+		return;
+
+	/* scan for the last dot */
+	for (last_dot = cp; *cp; cp++)
+		if (*cp == '.')
+			last_dot = cp;
+
+	/* downcase the last segment */
+	for (cp = last_dot; *cp; cp++)
+		*cp = tolower(*cp);
+}
+
 int git_config_parse_parameter(const char *text,
 			       config_fn_t fn, void *data)
 {
@@ -221,7 +249,7 @@ int git_config_parse_parameter(const char *text,
 		strbuf_list_free(pair);
 		return error("bogus config parameter: %s", text);
 	}
-	strbuf_tolower(pair[0]);
+	canonicalize_config_variable_name(pair[0]->buf);
 	if (fn(pair[0]->buf, value, data) < 0) {
 		strbuf_list_free(pair);
 		return -1;
diff --git a/t/t1351-config-cmdline.sh b/t/t1351-config-cmdline.sh
new file mode 100755
index 0000000000..acb8dc3b15
--- /dev/null
+++ b/t/t1351-config-cmdline.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+
+test_description='git -c var=val'
+
+. ./test-lib.sh
+
+test_expect_success 'last one wins: two level vars' '
+	echo VAL >expect &&
+
+	# sec.var and sec.VAR are the same variable, as the first
+	# and the last level of a configuration variable name is
+	# case insensitive.  Test both setting and querying.
+
+	git -c sec.var=val -c sec.VAR=VAL config --get sec.var >actual &&
+	test_cmp expect actual &&
+	git -c SEC.var=val -c sec.var=VAL config --get sec.var >actual &&
+	test_cmp expect actual &&
+
+	git -c sec.var=val -c sec.VAR=VAL config --get SEC.var >actual &&
+	test_cmp expect actual &&
+	git -c SEC.var=val -c sec.var=VAL config --get sec.VAR >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'last one wins: three level vars' '
+	echo val >expect &&
+
+	# v.a.r and v.A.r are not the same variable, as the middle
+	# level of a three-level configuration variable name is
+	# case sensitive.  Test both setting and querying.
+
+	git -c v.a.r=val -c v.A.r=VAL config --get v.a.r >actual &&
+	test_cmp expect actual &&
+	git -c v.a.r=val -c v.A.r=VAL config --get V.a.R >actual &&
+	test_cmp expect actual &&
+
+	echo VAL >expect &&
+	git -c v.a.r=val -c v.a.R=VAL config --get v.a.r >actual &&
+	test_cmp expect actual &&
+	git -c v.a.r=val -c V.a.r=VAL config --get v.a.r >actual &&
+	test_cmp expect actual &&
+	git -c v.a.r=val -c v.a.R=VAL config --get V.a.R >actual &&
+	test_cmp expect actual &&
+	git -c v.a.r=val -c V.a.r=VAL config --get V.a.R >actual &&
+	test_cmp expect actual
+'
+
+test_done
-- 
2.12.0-rc2-221-g8fa194a99f


^ permalink raw reply related

* Re: Cross-referencing the Git mailing list archive with their corresponding commits in `pu`
From: Johannes Schindelin @ 2017-02-17 17:50 UTC (permalink / raw)
  To: Josh Triplett; +Cc: git
In-Reply-To: <alpine.DEB.2.20.1702041206130.3496@virtualbox>

Hi Josh,

On Mon, 6 Feb 2017, Johannes Schindelin wrote:

> as discussed at the GitMerge, I am trying to come up with tooling that
> will allow for substantially less tedious navigation between the local
> repository, the mailing list, and what ends up in the `pu` branch.

I found a little bit more time last Friday to play with the
cross-correlation between commits in `pu` and mails in
public-inbox/git.git and it is worse than I previously assumed.

Just as a reminder: my plan was to start developing tools that will
ultimately help me as well as other contributors with the arcane mailing
list model of patch submission. And my first target was the seemingly
simple task of figuring out the mail corresponding to any given commit in
`pu` (i.e. the mail that contained the patch, and whose mail thread is
hence expected to have the entire patch review, and to which I would be
expected to respond if I find a problem with that commit).

And since it is all-too-common that the oneline is adjusted before
applying the patch, the Subject:/oneline pair is not a good candidate to
find matches.

My next best guess was that the author date would not be touched, so the
pair of Date: and authordate should make a good candidate.

My initial finding was that this is not without problems, as some mails
were sent with identical Date: lines (most likely due to bugs in the
tools, e.g. the well-known and already fixed bug in git-am, and hence
git-rebase, where it would apply all patches using the first patch's
author date), and worse: some of those mails contained actual patch series
that actually made it into Git's commit history.

But those are not the only problems.

For starters, I tried to cross-correlate *just* the commits that entered
`pu` since one week ago (git rev-list --since=1.week.ago upstream/pu) with
mails of the past month in the mailing list archive.

One obvious caveat is that RFC 2822 is ambiguous when it comes to the date
format. While it seems nice that you *can* write single-digit day numbers
as single digit if you want, or with a leading zero, or with a leading
space, it makes it impossible to get away with exact matching. I did not
really want to complicate my research by parsing the dates and normalizing
them to epoch + timezone, also because I wanted results quick, so I simply
normalized the dates to have leading zeroes for single-digit day numbers,
that seems to work for the moment).

The first category of problematic commits come as no surprise: merges. We
do not even have a way to represent them as mails. I simply excluded them
from the remainder of this study.

The second category should not be all that surprising, too: Junio often
adjusts the release notes without sending those patches out for review.
Those commits are:

363588f (### match next, Junio C Hamano 2017-02-17)
2076907 (Git 2.12-rc2, Junio C Hamano 2017-02-17)
076c053 (Hopefully the final batch of mini-topics before the final,
	Junio C Hamano 2017-02-16)
ae86372 (Revert "reset: add an example of how to split a commit into two",
	Junio C Hamano 2017-02-16)
d09b692 (A bit more for -rc2, Junio C Hamano 2017-02-15)

There is a third category, and this one *does* come as a surprise to me.
It appears that at least *some* patches' Date: lines are either ignored or
overridden or changed on their way from the mailing list into Git's commit
history. There was only one commit in that commit range:

3c0cb0c (read_loose_refs(): read refs using resolve_ref_recursively(),
	Michael Haggerty 2017-02-09)

This one was committed with an author date "Thu, 09 Feb 2017 21:53:52
+0100" but it appears that there was no mail sent to the Git mailing list
with that particular Date: header and the *actual* mail containing the
patch was sent with a Date: header "Fri, 10 Feb 2017 12:16:19 +0100"
(Message-ID:
d8e906d969700acbca8dc717673d0a9cdc910f62.1486724698.git.mhagger@alum.mit.edu).

It is labor-intensive, but possible to find the correlation manually in
this case because the Subject: line has been left intact.

However, this points to a serious problem with my approach: I try to
re-create information that is actually not available (which Message-ID
corresponds to a given commit name). Since that information is not
available, it is quite possible that this information cannot be retrieved
accurately (and Michael's commit demonstrates that this is not a merely
theoretic consideration). I do not know that I can fix this on my side.

> P.S.: I used public-inbox.org links instead of commit references to the
> Git repository containing the mailing list archive, because the format
> of said Git repository is so unfavorable that it was determined very
> quickly in a discussion between Patrick Reynolds (GitHub) and myself
> that it would put totally undue burden on GitHub to mirror it there
> (compare also Carlos Nieto's talk at GitMerge titled "Top Ten Worst
> Repositories to host on GitHub").

Since the main problem was the unfavorable commit history structure, I
*think* that it may be possible to auto-process public-inbox.org/git.git
into a frequently-rewritten branch that squashes all commits from past
years into single, per-year commits (and the same for recent months, the
past days, and a single commit accumulating the current day's commits) and
that that may solve the problematic structure. The blob names would remain
identical to what is on public-inbox, of course.

Ciao,
Johannes

P.S.: The *mini* scripts I used were

cat generate-date-index.sh <<\EOF
#! /bin/sh

cd public-inbox-git

since_commit="$1"
test -n "$since_commit" ||
since_commit=$(git rev-list --since=1.month.ago master --reverse | head -n 1)
for sha1 in $(git diff --raw --no-abbrev $since_commit..master | cut -f 4 -d \ )
do
	printf '%s\t%s\n' \
		"$(git cat-file blob $sha1 |
		sed -n \
			-e 's/^Date:[ 	]*\([^,]*,\) *\([1-9] .*\)/\1 0\2/p' \
			-e 's/^Date:[ 	]*\([^,]*,\) *\([0-9][0-9] .*\)/\1 \2/p' \
			-e '/^$/q')" \
		$sha1
done | less -S
EOF

to generate a file date-index.txt containing "date\tblob" pairs where the blob
refers to the SHA-1 of the mail in public-inbox/git.git, and

cat >match-pu.sh <<\EOF
#! /bin/sh

for commit in $(git rev-list --since=1.week.ago --no-merges upstream/pu)
do
	date="$(git show -s --format=%aD $commit |
		sed 's/, \([1-9]\) /, 0\1 /')" # fix up Git's idea of RFC 2822
	mail_id=$(grep "^$date" date-index.txt | sed 's/.*	//')
	case "$mail_id" in
	'')
		echo "ERROR: no mail found for $commit (date $date)" >&2
		git show -s --pretty='tformat:%h (%s, %an %ad)' --date=short \
			$commit >&2
		;;
	*' '*)
		echo "ERROR: multiple candidates found for $commit ($mail_id)" >&2
		;;
	*)
		echo "$date $mail_id"
		;;
	esac
done
EOF

to try to match the author dates with the ones in date-index.txt. The
obvious next improvement is to list also Message-ID in date-index.txt.

^ permalink raw reply

* Re: [PATCH v2 04/16] files-backend: replace *git_path*() with files_path()
From: Michael Haggerty @ 2017-02-20 11:23 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git
  Cc: Junio C Hamano, Johannes Schindelin, Ramsay Jones, Stefan Beller,
	novalis
In-Reply-To: <20170216114818.6080-5-pclouds@gmail.com>

On 02/16/2017 12:48 PM, Nguyễn Thái Ngọc Duy wrote:
> This centralizes all path rewriting of files-backend.c in one place so
> we have easier time removing the path rewriting later. There could be
> some hidden indirect git_path() though, I didn't audit the code to the
> bottom.

Almost all of the calls to `files_path()` [1] take one of the following
forms:

* `files_path(refs, &sb, "packed-refs")`
* `files_path(refs, &sb, "%s", refname)`
* `files_path(refs, &sb, "logs/%s", refname)`

(though sometimes `refname` is not the name of a reference but rather
the name of a directory containing references, like "refs/heads").

This suggests to me that it would be more natural to have three
functions, `files_packed_refs_path()`, `files_loose_ref_path()`, and
`files_reflog_path()`, with no `fmt` arguments. Aside from making the
callers a bit simpler and the implementation of each of the three
functions simpler (they wouldn't have to deal with variable argument
lists), at the cost of needing three similar functions.

But I think the split would also be advantageous from a design point of
view. The relative path locations of loose reference files, reflog
files, and the packed-refs file is kind of a coincidence, and it would
be advantageous to encode that policy in three functions rather than
continuing to spread knowledge of those assumptions around.

It would also make it easier to switch to a new system of encoding
reference names, for example so that reference names that differ only in
case can be stored on a case-insensitive filesystem, or to store reflogs
using a naming scheme that is not susceptible to D/F conflicts so that
we can retain reflogs for deleted references.

Michael

[1] The only exception I see is one call `files_path(refs, &sb,
"logs")`, which is a prelude to iterating over the reflog files. This is
actually an example of the code giving us a hint that the design is
wrong: if you iterate only over the directories under `git_path(logs)`,
you miss the reflogs for worktree references.


^ permalink raw reply

* Re: [PATCH v4 06/15] files-backend: remove the use of git_path()
From: Michael Haggerty @ 2017-02-20 11:34 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git
  Cc: Junio C Hamano, Johannes Schindelin, Ramsay Jones, Stefan Beller,
	novalis
In-Reply-To: <20170218133303.3682-7-pclouds@gmail.com>

On 02/18/2017 02:32 PM, Nguyễn Thái Ngọc Duy wrote:
> Given $GIT_DIR and $GIT_COMMON_DIR, files-backend is now in charge of
> deciding what goes where. The end goal is to pass $GIT_DIR only. A
> refs "view" of a linked worktree is a logical ref store that combines
> two files backends together.
> 
> (*) Not entirely true since strbuf_git_path_submodule() still does path
> translation underneath. But that's for another patch.
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
> ---
>  refs/files-backend.c | 37 +++++++++++++++++++++++++++++++++----
>  1 file changed, 33 insertions(+), 4 deletions(-)
> 
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index b599ddf92..dbcaf9bda 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -924,6 +924,9 @@ struct files_ref_store {
>  	 */
>  	const char *submodule;
>  
> +	struct strbuf gitdir;
> +	struct strbuf gitcommondir;

Is there a reason for these to be `strbuf`s rather than `const char *`?
(One reason would be if you planned to use the `len` field, but I don't
think you do so.)

> @@ -937,15 +940,33 @@ static void files_path(struct files_ref_store *refs, struct strbuf *sb,
>  {
>  	struct strbuf tmp = STRBUF_INIT;
>  	va_list vap;
> +	const char *ref;
>  
>  	va_start(vap, fmt);
>  	strbuf_vaddf(&tmp, fmt, vap);
>  	va_end(vap);
> -	if (refs->submodule)
> +	if (refs->submodule) {
>  		strbuf_git_path_submodule(sb, refs->submodule,
>  					  "%s", tmp.buf);
> -	else
> -		strbuf_git_path(sb, "%s", tmp.buf);
> +	} else if (!strcmp(tmp.buf, "packed-refs") ||
> +		   !strcmp(tmp.buf, "logs")) { /* non refname path */
> +		strbuf_addf(sb, "%s/%s", refs->gitcommondir.buf, tmp.buf);
> +	} else if (skip_prefix(tmp.buf, "logs/", &ref)) { /* reflog */
> +		if (is_per_worktree_ref(ref))
> +			strbuf_addf(sb, "%s/%s", refs->gitdir.buf, tmp.buf);
> +		else
> +			strbuf_addf(sb, "%s/%s", refs->gitcommondir.buf, tmp.buf);

This code would also be simpler if there were separate functions for
packed-refs, loose references, and reflogs.

> [...]

Michael


^ permalink raw reply

* [PATCH v2] send-email: only allow one address per body tag
From: Johan Hovold @ 2017-02-20 11:44 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Matthieu Moy, Jeff King, Kevin Daudt, Larry Finger, git,
	Johan Hovold
In-Reply-To: <xmqqbmu0pgg6.fsf@gitster.mtv.corp.google.com>

Adding comments after a tag in the body is a common practise (e.g. in
the Linux kernel) and git-send-email has been supporting this for years
by removing any trailing cruft after the address.

After some recent changes, any trailing comment is now instead appended
to the recipient name (with some random white space inserted) resulting
in undesirable noise in the headers, for example:

CC: "# 3 . 3 . x : 1b9508f : sched : Rate-limit newidle" <stable@vger.kernel.org>

Revert to the earlier behaviour of discarding anything after the (first)
address in a tag while parsing the body.

Note that multiple addresses after are still allowed after a command
line switch (and in a CC header field).

Also note that --suppress-cc=self was never honoured when using multiple
addresses in a tag.

Fixes: b1c8a11c8024 ("send-email: allow multiple emails using --cc, --to
and --bcc")
Fixes: e3fdbcc8e164 ("parse_mailboxes: accept extra text after <...>
address")
Signed-off-by: Johan Hovold <johan@kernel.org>
---

v2:
 - update the cc-trailer test
 - amend commit message and mention the broken --suppress-cc=self


 git-send-email.perl   | 2 +-
 t/t9001-send-email.sh | 7 +++----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 068d60b3e698..eea0a517f71b 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1563,7 +1563,7 @@ foreach my $t (@files) {
 	# Now parse the message body
 	while(<$fh>) {
 		$message .=  $_;
-		if (/^(Signed-off-by|Cc): (.*)$/i) {
+		if (/^(Signed-off-by|Cc): ([^>]*>?)/i) {
 			chomp;
 			my ($what, $c) = ($1, $2);
 			chomp $c;
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 0f398dd1603d..60a80f60b268 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -148,7 +148,6 @@ cat >expected-cc <<\EOF
 !two@example.com!
 !three@example.com!
 !four@example.com!
-!five@example.com!
 EOF
 "
 
@@ -159,9 +158,9 @@ test_expect_success $PREREQ 'cc trailer with various syntax' '
 	Test Cc: trailers.
 
 	Cc: one@example.com
-	Cc: <two@example.com> # this is part of the name
-	Cc: <three@example.com>, <four@example.com> # not.five@example.com
-	Cc: "Some # Body" <five@example.com> [part.of.name.too]
+	Cc: <two@example.com> # trailing comments are ignored
+	Cc: <three@example.com>, <not.four@example.com> one address per line
+	Cc: "Some # Body" <four@example.com> [ <also.a.comment> ]
 	EOF
 	clean_fake_sendmail &&
 	git send-email -1 --to=recipient@example.com \
-- 
2.11.1


^ permalink raw reply related

* Re: [PATCH v2] send-email: only allow one address per body tag
From: Matthieu Moy @ 2017-02-20 12:10 UTC (permalink / raw)
  To: Johan Hovold; +Cc: Junio C Hamano, Jeff King, Kevin Daudt, Larry Finger, git
In-Reply-To: <20170220114406.19436-1-johan@kernel.org>

Johan Hovold <johan@kernel.org> writes:

> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -1563,7 +1563,7 @@ foreach my $t (@files) {
>  	# Now parse the message body
>  	while(<$fh>) {
>  		$message .=  $_;
> -		if (/^(Signed-off-by|Cc): (.*)$/i) {
> +		if (/^(Signed-off-by|Cc): ([^>]*>?)/i) {

I think this is acceptable, but this doesn't work with trailers like

Cc: "Some > Body" <Some.Body@example.com>

A proper management of this kind of weird address should be doable by
reusing the regexp parsing "..." in parse_mailbox:

	my $re_quote = qr/"(?:[^\"\\]|\\.)*"/;

So the final regex would look like

if (/^(Signed-off-by|Cc): (([^>]*|"(?:[^\"\\]|\\.)*")>?)/i) {

I don't think that should block the patch inclusion, but it may be worth
considering.

Anyway, thanks for the patch!

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply

* Re: [PATCH v4 14/15] files-backend: remove submodule_allowed from files_downcast()
From: Michael Haggerty @ 2017-02-20 12:11 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git
  Cc: Junio C Hamano, Johannes Schindelin, Ramsay Jones, Stefan Beller,
	novalis
In-Reply-To: <20170218133303.3682-15-pclouds@gmail.com>

On 02/18/2017 02:33 PM, Nguyễn Thái Ngọc Duy wrote:
> Since submodule or not is irrelevant to files-backend after the last
> patch, remove the parameter from files_downcast() and kill
> files_assert_main_repository().
> 
> PS. This implies that all ref operations are allowed for submodules. But
> we may need to look more closely to see if that's really true...

I think you are jumping the gun here.

Even after your changes, there is still a significant difference between
the main repository and submodules: we have access to the object
database for the main repository, but not for submodules.

So, for example, the following don't work for submodules:

* `parse_object()` is used to ensure that references are only pointed at
valid objects, and that branches are only pointed at commit objects.

* `peel_object()` is used to write the peeled version of references in
`packed-refs`, and to peel references while they are being iterated
over. Since this doesn't work, I think you can't write `packed-refs` in
a submodule.

These limitations, I think, imply that submodule references have to be
treated as read-only.

When I was working on a patch series similar to yours, I introduced a
boolean "main_repository" flag in `struct ref_store`, and use that
member to implement `files_assert_main_repository()`. That way
`files_ref_store::submodule` can still be removed, which is the more
important goal from a design standpoint.

Michael


^ permalink raw reply

* Re: [PATCH v4 14/15] files-backend: remove submodule_allowed from files_downcast()
From: Duy Nguyen @ 2017-02-20 12:21 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin,
	Ramsay Jones, Stefan Beller, David Turner
In-Reply-To: <25fcb527-595a-7865-41e3-ee7c4c1ad668@alum.mit.edu>

On Mon, Feb 20, 2017 at 7:11 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 02/18/2017 02:33 PM, Nguyễn Thái Ngọc Duy wrote:
>> Since submodule or not is irrelevant to files-backend after the last
>> patch, remove the parameter from files_downcast() and kill
>> files_assert_main_repository().
>>
>> PS. This implies that all ref operations are allowed for submodules. But
>> we may need to look more closely to see if that's really true...
>
> I think you are jumping the gun here.
>
> Even after your changes, there is still a significant difference between
> the main repository and submodules: we have access to the object
> database for the main repository, but not for submodules.
>
> So, for example, the following don't work for submodules:
>
> * `parse_object()` is used to ensure that references are only pointed at
> valid objects, and that branches are only pointed at commit objects.
>
> * `peel_object()` is used to write the peeled version of references in
> `packed-refs`, and to peel references while they are being iterated
> over. Since this doesn't work, I think you can't write `packed-refs` in
> a submodule.
>
> These limitations, I think, imply that submodule references have to be
> treated as read-only.

Behind the scene submodule does add_submodule_odb(), which basically
makes the submodule's odb an alternate of in-core odb. So odb access
works. I was puzzled how submodules could by pass odb access at the
beginning only to find that out. technical/api-ref-iteration.txt also
mentions that you need to add_submodule_odb(), so I think it's
deliberate (albeit hacky) design.

> When I was working on a patch series similar to yours, I introduced a
> boolean "main_repository" flag in `struct ref_store`, and use that
> member to implement `files_assert_main_repository()`. That way
> `files_ref_store::submodule` can still be removed, which is the more
> important goal from a design standpoint.

I could keep the submodule check back (and replace the submodule
string in files_ref_store with just a flag). But I really think all
backend functions work with submodule. Perhaps add some tests to
exercise/verify that files-backend-on-submodule works?
-- 
Duy

^ permalink raw reply

* Re: [PATCH v2 04/16] files-backend: replace *git_path*() with files_path()
From: Duy Nguyen @ 2017-02-20 12:25 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin,
	Ramsay Jones, Stefan Beller, David Turner
In-Reply-To: <330bfd35-ec00-92a2-a221-c7be9f0199e5@alum.mit.edu>

On Mon, Feb 20, 2017 at 6:23 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 02/16/2017 12:48 PM, Nguyễn Thái Ngọc Duy wrote:
>> This centralizes all path rewriting of files-backend.c in one place so
>> we have easier time removing the path rewriting later. There could be
>> some hidden indirect git_path() though, I didn't audit the code to the
>> bottom.
>
> Almost all of the calls to `files_path()` [1] take one of the following
> forms:
>
> * `files_path(refs, &sb, "packed-refs")`
> * `files_path(refs, &sb, "%s", refname)`
> * `files_path(refs, &sb, "logs/%s", refname)`
>
> (though sometimes `refname` is not the name of a reference but rather
> the name of a directory containing references, like "refs/heads").
>
> This suggests to me that it would be more natural to have three
> functions, `files_packed_refs_path()`, `files_loose_ref_path()`, and
> `files_reflog_path()`, with no `fmt` arguments. Aside from making the
> callers a bit simpler and the implementation of each of the three
> functions simpler (they wouldn't have to deal with variable argument
> lists), at the cost of needing three similar functions.
>
> But I think the split would also be advantageous from a design point of
> view. The relative path locations of loose reference files, reflog
> files, and the packed-refs file is kind of a coincidence, and it would
> be advantageous to encode that policy in three functions rather than
> continuing to spread knowledge of those assumptions around.
>
> It would also make it easier to switch to a new system of encoding
> reference names, for example so that reference names that differ only in
> case can be stored on a case-insensitive filesystem, or to store reflogs
> using a naming scheme that is not susceptible to D/F conflicts so that
> we can retain reflogs for deleted references.

I agree. I didn't see it clearly at the beginning (and made several
mistakes in earlier iterations) but as you have seen files_path() the
separation is pretty clear there. I was going to do it when
introducing the "linked worktree" backend. But since you pointed it
out, I'll update it in this series too.

> Michael
>
> [1] The only exception I see is one call `files_path(refs, &sb,
> "logs")`, which is a prelude to iterating over the reflog files. This is
> actually an example of the code giving us a hint that the design is
> wrong: if you iterate only over the directories under `git_path(logs)`,
> you miss the reflogs for worktree references.

Oh yes, I had to fix the reflog iterator [1] exactly for that :)

[1] https://public-inbox.org/git/20170217141908.18012-14-pclouds@gmail.com/T/#u
-- 
Duy

^ permalink raw reply

* Re: Git bisect does not find commit introducing the bug
From: Jakub Narębski @ 2017-02-20 12:27 UTC (permalink / raw)
  To: Oleg Taranenko, Jacob Keller
  Cc: Alex Hoffman, Johannes Sixt, Christian Couder, Stephan Beyer, git
In-Reply-To: <CABEd3j8sgDd8DXW8+2Q7pjANPTr-Ws1Xs1ap875mkxFOfnenYw@mail.gmail.com>

W dniu 20.02.2017 o 08:38, Oleg Taranenko pisze:

>>>> Then you must adjust your definition of "good": All commits that do not have
>>>> the feature, yet, are "good": since they do not have the feature in the
>>>> first place, they cannot have the breakage that you found in the feature.
>>>>
>>>> That is exactly the situation in your original example! But you constructed
>>>> the condition of goodness in such a simplistic way (depending on the
>>>> presence of a string), that it was impossible to distinguish between "does
>>>> not have the feature at all" and "has the feature, but it is broken".
>>>
>>> Johannes, thank you for correctly identifying the error in my logic.
>>> Indeed I was using the term 'bad' also for the commit without the
>>> feature. In order to find the commit introducing the bug in my example
>>> a new state is needed, which would make 'git bisect' a bit more
>>> complicated than the user 'most of the time' probably needs. Or do you
>>> think, it would make sense to ask the user for this state (if e.g 'git
>>> bisect' would be started with a new parameter)?
> 
>> If a commit doesn't have the feature, then it is by definition, not
>> containing a broken feature, and you can simply use the "good" state.
>> There is no need for a different state. If you can't test the commit
>> because it's broken in some other way, you can use "git bisect skip"
>> but that isn't what you want in this case.
> 
> Commits missing feature == 'good' commit is a very confusing one.

Nowadays you can change the names for 'old' and 'new' with
`git bisect terms`.  HTH.
 
> Looks like in real life it happens much often, then git developers can
> imagine. For multi-branch/multi-feature workflow it's pretty easy not
> to recognize whether it is missing or not developed yet, especially on
> retrospective view where cherry-picking/squashing/merging is being
> used. My experience shows most annoying bugs are generating after a
> heavy merge (evil merge) with conflicts resolutions, where developer
> is not involved in the knowing what happens on counterpart changes.
> Then feature can be disappeared after it was worked & tested in its
> own branches.

Good to know about this problem.

> @Alex, I'm pretty interesting in fixing this weird bisect behaviour as
> well, as far as I struggled on it last summer and continue struggling
> so far :) If you want we can join to your efforts on fixing.

Anyway, I don't think it is feasible to weaken the assumption that there
is only one transition from 'old' ('good') to 'new' ('bad'); this is
what allows O(log(N)) operations.  See bisection method of root finding,
that is finding zeros of a continuous function.

Best,
-- 
Jakub Narębski


^ permalink raw reply

* Re: [PATCH v4 14/15] files-backend: remove submodule_allowed from files_downcast()
From: Michael Haggerty @ 2017-02-20 12:30 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin,
	Ramsay Jones, Stefan Beller, David Turner
In-Reply-To: <CACsJy8AZ27O-pxTqHOzYXRBuyv8dkxdGJ_5Z0u3eaxkNdnaEYA@mail.gmail.com>

On 02/20/2017 01:21 PM, Duy Nguyen wrote:
> On Mon, Feb 20, 2017 at 7:11 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> On 02/18/2017 02:33 PM, Nguyễn Thái Ngọc Duy wrote:
>>> Since submodule or not is irrelevant to files-backend after the last
>>> patch, remove the parameter from files_downcast() and kill
>>> files_assert_main_repository().
>>>
>>> PS. This implies that all ref operations are allowed for submodules. But
>>> we may need to look more closely to see if that's really true...
>>
>> I think you are jumping the gun here.
>>
>> Even after your changes, there is still a significant difference between
>> the main repository and submodules: we have access to the object
>> database for the main repository, but not for submodules.
>>
>> So, for example, the following don't work for submodules:
>>
>> * `parse_object()` is used to ensure that references are only pointed at
>> valid objects, and that branches are only pointed at commit objects.
>>
>> * `peel_object()` is used to write the peeled version of references in
>> `packed-refs`, and to peel references while they are being iterated
>> over. Since this doesn't work, I think you can't write `packed-refs` in
>> a submodule.
>>
>> These limitations, I think, imply that submodule references have to be
>> treated as read-only.
> 
> Behind the scene submodule does add_submodule_odb(), which basically
> makes the submodule's odb an alternate of in-core odb. So odb access
> works. I was puzzled how submodules could by pass odb access at the
> beginning only to find that out. technical/api-ref-iteration.txt also
> mentions that you need to add_submodule_odb(), so I think it's
> deliberate (albeit hacky) design.

That's interesting. I didn't know it before.

But I still don't think that means that reference writing can work
correctly. If I try to set a submodule branch to an SHA-1 and I verify
that the SHA-1 points to a valid commit, how do I know that the commit
is in the same submodule that holds the reference?

> [...]

Michael

^ permalink raw reply

* Re: [PATCH v4 06/15] files-backend: remove the use of git_path()
From: Duy Nguyen @ 2017-02-20 12:31 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin,
	Ramsay Jones, Stefan Beller, David Turner
In-Reply-To: <1d750672-441d-14ae-21e9-d7bdd47a50a4@alum.mit.edu>

On Mon, Feb 20, 2017 at 6:34 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 02/18/2017 02:32 PM, Nguyễn Thái Ngọc Duy wrote:
>> Given $GIT_DIR and $GIT_COMMON_DIR, files-backend is now in charge of
>> deciding what goes where. The end goal is to pass $GIT_DIR only. A
>> refs "view" of a linked worktree is a logical ref store that combines
>> two files backends together.
>>
>> (*) Not entirely true since strbuf_git_path_submodule() still does path
>> translation underneath. But that's for another patch.
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
>> ---
>>  refs/files-backend.c | 37 +++++++++++++++++++++++++++++++++----
>>  1 file changed, 33 insertions(+), 4 deletions(-)
>>
>> diff --git a/refs/files-backend.c b/refs/files-backend.c
>> index b599ddf92..dbcaf9bda 100644
>> --- a/refs/files-backend.c
>> +++ b/refs/files-backend.c
>> @@ -924,6 +924,9 @@ struct files_ref_store {
>>        */
>>       const char *submodule;
>>
>> +     struct strbuf gitdir;
>> +     struct strbuf gitcommondir;
>
> Is there a reason for these to be `strbuf`s rather than `const char *`?
> (One reason would be if you planned to use the `len` field, but I don't
> think you do so.)

Nope. I just didn't think about char *. It may have to lose "const"
though because in submodule case we may need a new allocation.

>
>> @@ -937,15 +940,33 @@ static void files_path(struct files_ref_store *refs, struct strbuf *sb,
>>  {
>>       struct strbuf tmp = STRBUF_INIT;
>>       va_list vap;
>> +     const char *ref;
>>
>>       va_start(vap, fmt);
>>       strbuf_vaddf(&tmp, fmt, vap);
>>       va_end(vap);
>> -     if (refs->submodule)
>> +     if (refs->submodule) {
>>               strbuf_git_path_submodule(sb, refs->submodule,
>>                                         "%s", tmp.buf);
>> -     else
>> -             strbuf_git_path(sb, "%s", tmp.buf);
>> +     } else if (!strcmp(tmp.buf, "packed-refs") ||
>> +                !strcmp(tmp.buf, "logs")) { /* non refname path */
>> +             strbuf_addf(sb, "%s/%s", refs->gitcommondir.buf, tmp.buf);
>> +     } else if (skip_prefix(tmp.buf, "logs/", &ref)) { /* reflog */
>> +             if (is_per_worktree_ref(ref))
>> +                     strbuf_addf(sb, "%s/%s", refs->gitdir.buf, tmp.buf);
>> +             else
>> +                     strbuf_addf(sb, "%s/%s", refs->gitcommondir.buf, tmp.buf);
>
> This code would also be simpler if there were separate functions for
> packed-refs, loose references, and reflogs.

And maybe keep the path to packed-refs, the base path up to "logs" in
struct files_ref_store too (they will be calculated at ref store
init)? That way the files_packed_refs_path() does no calculation.
files_reflog_path() and files_ref_path() will just do string
concatenation, no fancy addf.
-- 
Duy

^ permalink raw reply

* Re: [PATCH v4 14/15] files-backend: remove submodule_allowed from files_downcast()
From: Duy Nguyen @ 2017-02-20 12:33 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin,
	Ramsay Jones, Stefan Beller, David Turner
In-Reply-To: <c4265faf-a04a-552b-fd72-1631a220f788@alum.mit.edu>

On Mon, Feb 20, 2017 at 7:30 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 02/20/2017 01:21 PM, Duy Nguyen wrote:
>> On Mon, Feb 20, 2017 at 7:11 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>> On 02/18/2017 02:33 PM, Nguyễn Thái Ngọc Duy wrote:
>>>> Since submodule or not is irrelevant to files-backend after the last
>>>> patch, remove the parameter from files_downcast() and kill
>>>> files_assert_main_repository().
>>>>
>>>> PS. This implies that all ref operations are allowed for submodules. But
>>>> we may need to look more closely to see if that's really true...
>>>
>>> I think you are jumping the gun here.
>>>
>>> Even after your changes, there is still a significant difference between
>>> the main repository and submodules: we have access to the object
>>> database for the main repository, but not for submodules.
>>>
>>> So, for example, the following don't work for submodules:
>>>
>>> * `parse_object()` is used to ensure that references are only pointed at
>>> valid objects, and that branches are only pointed at commit objects.
>>>
>>> * `peel_object()` is used to write the peeled version of references in
>>> `packed-refs`, and to peel references while they are being iterated
>>> over. Since this doesn't work, I think you can't write `packed-refs` in
>>> a submodule.
>>>
>>> These limitations, I think, imply that submodule references have to be
>>> treated as read-only.
>>
>> Behind the scene submodule does add_submodule_odb(), which basically
>> makes the submodule's odb an alternate of in-core odb. So odb access
>> works. I was puzzled how submodules could by pass odb access at the
>> beginning only to find that out. technical/api-ref-iteration.txt also
>> mentions that you need to add_submodule_odb(), so I think it's
>> deliberate (albeit hacky) design.
>
> That's interesting. I didn't know it before.
>
> But I still don't think that means that reference writing can work
> correctly. If I try to set a submodule branch to an SHA-1 and I verify
> that the SHA-1 points to a valid commit, how do I know that the commit
> is in the same submodule that holds the reference?

Good point. So will the new flag be "read_only" (no reference to
submodule), or "submodule"? This flag will be passed in at ref-store
init time and kept in files_ref_store.
-- 
Duy

^ permalink raw reply

* Re: [PATCH v4 14/15] files-backend: remove submodule_allowed from files_downcast()
From: Michael Haggerty @ 2017-02-20 12:38 UTC (permalink / raw)
  To: Duy Nguyen
  Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin,
	Ramsay Jones, Stefan Beller, David Turner
In-Reply-To: <CACsJy8ChAwwPawTrq4gqZqAO3k6PrgzzvceVFjbJmxkE0Rn8SQ@mail.gmail.com>

On 02/20/2017 01:33 PM, Duy Nguyen wrote:
> On Mon, Feb 20, 2017 at 7:30 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>> On 02/20/2017 01:21 PM, Duy Nguyen wrote:
>>> On Mon, Feb 20, 2017 at 7:11 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
>>>> [...]
>>>> These limitations, I think, imply that submodule references have to be
>>>> treated as read-only.
>>>
>>> Behind the scene submodule does add_submodule_odb(), which basically
>>> makes the submodule's odb an alternate of in-core odb. So odb access
>>> works. I was puzzled how submodules could by pass odb access at the
>>> beginning only to find that out. technical/api-ref-iteration.txt also
>>> mentions that you need to add_submodule_odb(), so I think it's
>>> deliberate (albeit hacky) design.
>>
>> That's interesting. I didn't know it before.
>>
>> But I still don't think that means that reference writing can work
>> correctly. If I try to set a submodule branch to an SHA-1 and I verify
>> that the SHA-1 points to a valid commit, how do I know that the commit
>> is in the same submodule that holds the reference?
> 
> Good point. So will the new flag be "read_only" (no reference to
> submodule), or "submodule"? This flag will be passed in at ref-store
> init time and kept in files_ref_store.

I haven't checked carefully whether there are other operations that
don't work and/or don't make sense for submodules. If not, then
`read_only` would also be a fine name for the flag.

Michael


^ permalink raw reply

* Re: [PATCH v4 11/15] refs.c: make get_main_ref_store() public and use it
From: Michael Haggerty @ 2017-02-20 12:37 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git
  Cc: Junio C Hamano, Johannes Schindelin, Ramsay Jones, Stefan Beller,
	novalis
In-Reply-To: <20170218133303.3682-12-pclouds@gmail.com>

On 02/18/2017 02:32 PM, Nguyễn Thái Ngọc Duy wrote:
> get_ref_store() will soon be renamed to get_submodule_ref_store().
> Together with future get_worktree_ref_store(), the three functions
> provide an appropriate ref store for different operation modes. New APIs
> will be added to operate directly on ref stores.

I see where you're going with this, but as of the end of this patch
series, there is still nothing that a caller outside of the refs module
can do with a `struct ref_store *`. This means that it would be enough
to put this declaration (and that of `get_submodule_ref_store()`, added
in a later patch) in refs/refs-internal.h for now.

If you want to move the declarations straight to `refs.h` now to avoid
code churn in some later patch series, then please mention that fact in
the commit message.

Michael


^ permalink raw reply

* slightly confusing message
From: Leon George @ 2017-02-20 12:35 UTC (permalink / raw)
  To: git

Hey, you lovely people <3

Every once in a while this:

£ git add -p .
warning: empty strings as pathspecs will be made invalid in upcoming
releases. please use . instead if you meant to match all paths
No changes.

It seems to happen only when there are no more modified files and git
still works wonderfully otherwise - just wanted to let you know.

enjoy your weeks :-)

^ permalink raw reply

* Re: [PATCH 2/5] hashmap: allow memihash computation to be continued
From: Johannes Schindelin @ 2017-02-20 12:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff Hostetler
In-Reply-To: <xmqq7f4onjrs.fsf@gitster.mtv.corp.google.com>

Hi Junio,

On Fri, 17 Feb 2017, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> 
> > diff --git a/hashmap.c b/hashmap.c
> > index b10b642229c..061b7d61da6 100644
> > --- a/hashmap.c
> > +++ b/hashmap.c
> > @@ -50,6 +50,20 @@ unsigned int memihash(const void *buf, size_t len)
> >  	return hash;
> >  }
> >  
> > +/* Incoporate another chunk of data into a memihash computation. */
> > +unsigned int memihash_continue(unsigned int hash,
> > +			       const void *buf, size_t len)
> > +{
> > +	const unsigned char *p = buf;
> > +	while (len--) {
> > +		unsigned int c = *p++;
> > +		if (c >= 'a' && c <= 'z')
> > +			c -= 'a' - 'A';
> > +		hash = (hash * FNV32_PRIME) ^ c;
> > +	}
> > +	return hash;
> > +}
> 
> This makes me wonder if we want to reduce the duplication (primarily
> to avoid risking the loop body to go out of sync) by doing:
> 
> 	unsigned int memihash(const void *buf, size_t len)
> 	{
> 		return memihash_continue(buf, len, FNV32_BASE);
> 	}                
> 
> If an extra call level really matters, its "inline" equivalent in
> the header would probably be good.

Well, the hashing is supposed to be as fast as possible, so I would like
to avoid that extra call level. However, the end result is not so pretty
because FNV32_BASE needs to be made public (OTOH it removes more lines
than it adds):

-- snipsnap --
diff --git a/hashmap.c b/hashmap.c
index 061b7d61da6..470a0832688 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -4,7 +4,6 @@
 #include "cache.h"
 #include "hashmap.h"
 
-#define FNV32_BASE ((unsigned int) 0x811c9dc5)
 #define FNV32_PRIME ((unsigned int) 0x01000193)
 
 unsigned int strhash(const char *str)
@@ -37,19 +36,6 @@ unsigned int memhash(const void *buf, size_t len)
 	return hash;
 }
 
-unsigned int memihash(const void *buf, size_t len)
-{
-	unsigned int hash = FNV32_BASE;
-	unsigned char *ucbuf = (unsigned char *) buf;
-	while (len--) {
-		unsigned int c = *ucbuf++;
-		if (c >= 'a' && c <= 'z')
-			c -= 'a' - 'A';
-		hash = (hash * FNV32_PRIME) ^ c;
-	}
-	return hash;
-}
-
 /* Incoporate another chunk of data into a memihash computation. */
 unsigned int memihash_continue(unsigned int hash,
 			       const void *buf, size_t len)
diff --git a/hashmap.h b/hashmap.h
index 78e14dfde71..a1a8fd7dc54 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -8,12 +8,17 @@
 
 /* FNV-1 functions */
 
+#define FNV32_BASE ((unsigned int) 0x811c9dc5)
+
 extern unsigned int strhash(const char *buf);
 extern unsigned int strihash(const char *buf);
 extern unsigned int memhash(const void *buf, size_t len);
-extern unsigned int memihash(const void *buf, size_t len);
 extern unsigned int memihash_continue(unsigned int hash_seed,
 				      const void *buf, size_t len);
+static inline unsigned int memihash(const void *buf, size_t len)
+{
+	return memihash_continue(FNV32_BASE, buf, len);
+}
 
 static inline unsigned int sha1hash(const unsigned char *sha1)
 {

^ permalink raw reply related

* Re: [PATCH v4 00/15] Remove submodule from files-backend.c
From: Michael Haggerty @ 2017-02-20 12:42 UTC (permalink / raw)
  To: Nguyễn Thái Ngọc Duy, git
  Cc: Junio C Hamano, Johannes Schindelin, Ramsay Jones, Stefan Beller,
	novalis
In-Reply-To: <20170218133303.3682-1-pclouds@gmail.com>

On 02/18/2017 02:32 PM, Nguyễn Thái Ngọc Duy wrote:
> v4:

I very much like the direction of this patch series. I reviewed the
design pretty carefully and left some comments about it, and skimmed
through the code changes. But I haven't yet reviewed the code in detail.
I'll wait for your reaction to my design comments before doing so.

Thanks,
Michael


^ permalink raw reply

* Re: [PATCH v4 00/15] Remove submodule from files-backend.c
From: Duy Nguyen @ 2017-02-20 12:47 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Git Mailing List, Junio C Hamano, Johannes Schindelin,
	Ramsay Jones, Stefan Beller, David Turner
In-Reply-To: <cdf01f55-f3df-d3ce-149c-0249a03d27bf@alum.mit.edu>

On Mon, Feb 20, 2017 at 7:42 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> On 02/18/2017 02:32 PM, Nguyễn Thái Ngọc Duy wrote:
>> v4:
>
> I very much like the direction of this patch series. I reviewed the
> design pretty carefully and left some comments about it, and skimmed
> through the code changes. But I haven't yet reviewed the code in detail.
> I'll wait for your reaction to my design comments before doing so.

Unless there are some mails in transit, thanks for the review. The
comments I haven't replied to usually mean "I agree" (but not writing
unless I could add anything after).
-- 
Duy

^ permalink raw reply

* Re: Git bisect does not find commit introducing the bug
From: Oleg Taranenko @ 2017-02-20 13:50 UTC (permalink / raw)
  To: Jakub Narębski
  Cc: Jacob Keller, Alex Hoffman, Johannes Sixt, Christian Couder,
	Stephan Beyer, git
In-Reply-To: <58d25138-de2e-6995-444f-79c3ac0bbad2@gmail.com>

> Anyway, I don't think it is feasible to weaken the assumption that there
> is only one transition from 'old' ('good') to 'new' ('bad'); this is
> what allows O(log(N)) operations.  See bisection method of root finding,
> that is finding zeros of a continuous function.

I don't intended to change default behaviour of git bisect, I can
estimate what drawback it could bring.
I'd rather implement something like

git bisect start --bisect-strategy=missing-feature

by default current state is being used.

git config value would be also useful.

Oleg

^ permalink raw reply

* Re: [PATCH 4/4 v4] sha1_name.c: teach get_sha1_1 "-" shorthand for "@{-1}"
From: Siddharth Kannan @ 2017-02-20 14:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Git List, Matthieu Moy, Pranit Bauva, Jeff King, pclouds,
	brian m. carlson
In-Reply-To: <xmqq8tp6x8b6.fsf@gitster.mtv.corp.google.com>

On 17 February 2017 at 00:38, Junio C Hamano <gitster@pobox.com> wrote:
> Having said all that, I do not think the remainder of the code is
> prepared to take "-", not yet anyway [*1*], so turning "-" into
> "@{-1}" this patch does before it calls get_sha1_basic(), while it
> is not an ideal final state, is probably an acceptable milestone to
> stop at.

So, is it okay to stop with just supporting "-" and not support things
like "-@{yesterday}"?

Matthieu's comments on the matter:

    Siddharth Kannan <kannan.siddharth12@gmail.com> writes:

    > As per Matthieu's comments, I have updated the tests, but there
is still one
    > thing that is not working: log -@{yesterday} or log -@{2.days.ago}

    Note that I did not request that these things work, just that they seem
    to be relevant tests: IMHO it's OK to reject them, but for example we
    don't want them to segfault. And having a test is a good hint that you
    thought about what could happen and to document it.

[Quoted from email <vpqa89mnl4z.fsf@anie.imag.fr>]


>
> It is a separate matter if this patch is sufficient to produce
> correct results, though.  I haven't studied the callers of this
> change to make sure yet, and may find bugs in this approach later.
>

-- 

Best Regards,

- Siddharth Kannan.

^ permalink raw reply

* Re: git diff --quiet exits with 1 on clean tree with CRLF conversions
From: Mike Crowe @ 2017-02-20 15:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <20170217221958.GA12163@mcrowe.com>

On Friday 17 February 2017 at 22:19:58 +0000, Mike Crowe wrote:
> On Friday 17 February 2017 at 14:05:17 -0800, Junio C Hamano wrote:
> > Mike Crowe <mac@mcrowe.com> writes:
> > 
> > > If "git diff --quiet" finds it necessary to compare actual file contents,
> > > and a file requires CRLF conversion, then it incorrectly exits with an exit
> > > code of 1 even if there have been no changes.
> > >
> > > The patch below adds a test file that shows the problem.
> > 
> > If "git diff" does not show any output and "git diff --exit-code" or
> > "git diff --quiet" says there are differences, then it is a bug.
> > 
> > I would however have expected that any culprit would involve a code
> > that says "under QUICK option, we do not have to bother doing
> > this".  The part you quoted:
> > 
> > > 	if (!DIFF_FILE_VALID(p->one) || /* (1) */
> > > 	    !DIFF_FILE_VALID(p->two) ||
> > > 	    (p->one->oid_valid && p->two->oid_valid) ||
> > > 	    (p->one->mode != p->two->mode) ||
> > > 	    diff_populate_filespec(p->one, CHECK_SIZE_ONLY) ||
> > > 	    diff_populate_filespec(p->two, CHECK_SIZE_ONLY) ||
> > > 	    (p->one->size != p->two->size) ||
> > > 	    !diff_filespec_is_identical(p->one, p->two)) /* (2) */
> > > 		p->skip_stat_unmatch_result = 1;
> > 
> > is used by "git diff" with and without "--quiet", afacr, so I
> > suspect that the bug lies somewhere else.
> 
> I can't say that I really understand the code fully, but it appears that
> the first pass generates a queue of files that contain differences. The
> result of the quiet diff comes from the size of that queue,
> diff_queued_diff.nr, being non-zero in diffcore_std. I'm assuming that the
> result of the noisy diff comes from actually comparing the files.
> 
> But, I've only spent a short while looking so I may have got the wrong end
> of the stick.

Tricking Git into checking the actual file contents (by passing
--ignore-space-change for example) is sufficient to ensure that the exit
status is never 1 when it should be zero. (Of course that option has other
unwanted effects in this case.)

I think that if there's a risk that file contents will undergo conversion
then this should force the diff to check the file contents. Something like:

diff --git a/diff.c b/diff.c
index 051761b..bee1662 100644
--- a/diff.c
+++ b/diff.c
@@ -3413,13 +3413,14 @@ void diff_setup_done(struct diff_options *options)
 	/*
 	 * Most of the time we can say "there are changes"
 	 * only by checking if there are changed paths, but
-	 * --ignore-whitespace* options force us to look
-	 * inside contents.
+	 * --ignore-whitespace* options or text conversion
+	 * force us to look inside contents.
 	 */
 
 	if (DIFF_XDL_TST(options, IGNORE_WHITESPACE) ||
 	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_CHANGE) ||
-	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL))
+	    DIFF_XDL_TST(options, IGNORE_WHITESPACE_AT_EOL) ||
+	    DIFF_OPT_TST(options, ALLOW_TEXTCONV))
 		DIFF_OPT_SET(options, DIFF_FROM_CONTENTS);
 	else
 		DIFF_OPT_CLR(options, DIFF_FROM_CONTENTS);

This solves the problem for me and my test case now passes. Unfortunately
it breaks the 'removing and adding subproject' test case in
t3040-subprojects-basic at the line:

 test_expect_code 1 git diff -M --name-status --exit-code HEAD^ HEAD

presumably because after the rename has been detected the file contents are
identical. :( A rename of a single file appears to still be handled
correctly.

Mike.

^ permalink raw reply related

* Re: [PATCH] config: preserve <subsection> case for one-shot config on the command line
From: Lars Schneider @ 2017-02-20 17:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Jonathan Tan, git, sbeller
In-Reply-To: <xmqqino5jia8.fsf@gitster.mtv.corp.google.com>


> On 20 Feb 2017, at 10:58, Junio C Hamano <gitster@pobox.com> wrote:
> 
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> I still haven't queued any of the variants I posted (and I do not
>> think other people sent their own versions, either).  I need to pick
>> one and queue, with a test or two.  Perhaps after -rc2.  
>> 
>> Others are welcome to work on it while I cut -rc2 tomorrow, so that
>> by the time I see their patch all that is left for me to do is to
>> apply it ;-)
> 
> Since nothing seems to have happened in the meantime, here is what
> I'll queue so that we won't forget for now.  Lars's tests based on
> how the scripted "git submodule" uses "git config" may still be
> valid, but it is somewhat a roundabout way to demonstrate the
> breakage and not very effective way to protect the fix, so I added a
> new test that directly tests "git -c <var>=<val> <command>".

Agreed. Please ignore my tests.
If you want to you could queue this tiny cleanup, though:
http://public-inbox.org/git/20170215113325.14393-1-larsxschneider@gmail.com/

> ...
> 
> +/*
> + * downcase the <section> and <variable> in <section>.<variable> or
> + * <section>.<subsection>.<variable> and do so in place.  <subsection>
> + * is left intact.
> + */
> +static void canonicalize_config_variable_name(char *varname)
> +{
> +	char *cp, *last_dot;

What does cp stand for? "char pointer"?

> +
> +	/* downcase the first segment */
> +	for (cp = varname; *cp; cp++) {
> +		if (*cp == '.')
> +			break;
> +		*cp = tolower(*cp);
> +	}
> +	if (!*cp)
> +		return;
> +
> +	/* scan for the last dot */
> +	for (last_dot = cp; *cp; cp++)
> +		if (*cp == '.')
> +			last_dot = cp;
> +
> +	/* downcase the last segment */
> +	for (cp = last_dot; *cp; cp++)
> +		*cp = tolower(*cp);
> +}
> +
> int git_config_parse_parameter(const char *text,
> 			       config_fn_t fn, void *data)
> {
> @@ -221,7 +249,7 @@ int git_config_parse_parameter(const char *text,
> 		strbuf_list_free(pair);
> 		return error("bogus config parameter: %s", text);
> 	}
> -	strbuf_tolower(pair[0]);
> +	canonicalize_config_variable_name(pair[0]->buf);
> 	if (fn(pair[0]->buf, value, data) < 0) {
> 		strbuf_list_free(pair);
> 		return -1;
> diff --git a/t/t1351-config-cmdline.sh b/t/t1351-config-cmdline.sh
> new file mode 100755
> index 0000000000..acb8dc3b15
> --- /dev/null
> +++ b/t/t1351-config-cmdline.sh
> @@ -0,0 +1,48 @@
> +#!/bin/sh
> +
> +test_description='git -c var=val'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'last one wins: two level vars' '
> +	echo VAL >expect &&
> +
> +	# sec.var and sec.VAR are the same variable, as the first
> +	# and the last level of a configuration variable name is
> +	# case insensitive.  Test both setting and querying.
> +
> +	git -c sec.var=val -c sec.VAR=VAL config --get sec.var >actual &&
> +	test_cmp expect actual &&
> +	git -c SEC.var=val -c sec.var=VAL config --get sec.var >actual &&
> +	test_cmp expect actual &&
> +
> +	git -c sec.var=val -c sec.VAR=VAL config --get SEC.var >actual &&
> +	test_cmp expect actual &&
> +	git -c SEC.var=val -c sec.var=VAL config --get sec.VAR >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_expect_success 'last one wins: three level vars' '
> +	echo val >expect &&
> +
> +	# v.a.r and v.A.r are not the same variable, as the middle
> +	# level of a three-level configuration variable name is
> +	# case sensitive.  Test both setting and querying.
> +
> +	git -c v.a.r=val -c v.A.r=VAL config --get v.a.r >actual &&
> +	test_cmp expect actual &&
> +	git -c v.a.r=val -c v.A.r=VAL config --get V.a.R >actual &&
> +	test_cmp expect actual &&
> +
> +	echo VAL >expect &&
> +	git -c v.a.r=val -c v.a.R=VAL config --get v.a.r >actual &&
> +	test_cmp expect actual &&
> +	git -c v.a.r=val -c V.a.r=VAL config --get v.a.r >actual &&
> +	test_cmp expect actual &&
> +	git -c v.a.r=val -c v.a.R=VAL config --get V.a.R >actual &&
> +	test_cmp expect actual &&
> +	git -c v.a.r=val -c V.a.r=VAL config --get V.a.R >actual &&
> +	test_cmp expect actual
> +'
> +
> +test_done
> -- 
> 2.12.0-rc2-221-g8fa194a99f
> 

Looks good to me!

Thank you,
Lars


^ permalink raw reply

* Inconsistent results of git blame --porcelain when detecting copies from other files
From: Sokolov, Konstantin @ 2017-02-20 17:59 UTC (permalink / raw)
  To: git@vger.kernel.org

Hi Folks!

The issue is best explained on an example. You can reproduce it using the Lucene repo https://github.com/apache/lucene-solr.git. Tested with the following versions:  1.8.1.6 (Ubuntu), 2.11.0.windows.1, 2.11.1.windows.1.

First, let's produce the correct results without using --procelain:

> git blame --show-name --show-number -s -w --abbrev=40 -C -C -C d1165b19726fa0cd13a539827a7cd43237a4feef..10ba9abeb208d37df985e95a742f756de067353f --not f5dba8b76709ff0ef8715b8b288a4b64d4993fa3 -- lucene/src/java/org/apache/lucene/index/DirectoryReader.java

The following excerpt shows lines 501-505 from the output. In particular we can see that lines 502-503 originate from IndexReader.java.

10ba9abeb208d37df985e95a742f756de067353f lucene/src/java/org/apache/lucene/index/DirectoryReader.java 501 501)    * <p>This method
^d1165b19726fa0cd13a539827a7cd43237a4fee lucene/src/java/org/apache/lucene/index/IndexReader.java     496 502)    * returns the version recorded in the commit that the
^d1165b19726fa0cd13a539827a7cd43237a4fee lucene/src/java/org/apache/lucene/index/IndexReader.java     497 503)    * reader opened.  This version is advanced every time
^d1165b19726fa0cd13a539827a7cd43237a4fee lucene/src/java/org/apache/lucene/index/IndexReader.java     498 504)    * a change is made with {@link IndexWriter}.</p>
10ba9abeb208d37df985e95a742f756de067353f lucene/src/java/org/apache/lucene/index/DirectoryReader.java 505 505)    */

The same information can be obtained as well by using --line-porcelain:

> git blame --show-name --show-number --line-porcelain -s -w --abbrev=40 -C -C -C d1165b19726fa0cd13a539827a7cd43237a4feef..10ba9abeb208d37df985e95a742f756de067353f --not f5dba8b76709ff0ef8715b8b288a4b64d4993fa3 -- lucene/src/java/org/apache/lucene/index/DirectoryReader.java

Here is the output for line 502:

d1165b19726fa0cd13a539827a7cd43237a4feef 496 502 3
author Michael McCandless
author-mail <mikemccand@apache.org>
author-time 1327877325
author-tz +0000
committer Michael McCandless
committer-mail <mikemccand@apache.org>
committer-time 1327877325
committer-tz +0000
summary LUCENE-3725: add optional packing to FSTs
boundary
filename lucene/src/java/org/apache/lucene/index/IndexReader.java
        * returns the version recorded in the commit that the

However, when using --porcelain DirectoryReader.java is reported as the origin of lines 502-504:

> git blame --show-name --show-number --porcelain -s -w --abbrev=40 -C -C -C d1165b19726fa0cd13a539827a7cd43237a4feef..10ba9abeb208d37df985e95a742f756de067353f --not f5dba8b76709ff0ef8715b8b288a4b64d4993fa3 -- lucene/src/java/org/apache/lucene/index/DirectoryReader.java

10ba9abeb208d37df985e95a742f756de067353f 501 501 1
author Uwe Schindler
author-mail <uschindler@apache.org>
author-time 1327879145
author-tz +0000
committer Uwe Schindler
committer-mail <uschindler@apache.org>
committer-time 1327879145
committer-tz +0000
summary Reverse merged revision(s) from lucene/dev/trunk up to 1237502
previous f5dba8b76709ff0ef8715b8b288a4b64d4993fa3 lucene/src/java/org/apache/lucene/index/DirectoryReader.java
filename lucene/src/java/org/apache/lucene/index/DirectoryReader.java
        * <p>This method
d1165b19726fa0cd13a539827a7cd43237a4feef 496 502 3
        * returns the version recorded in the commit that the
d1165b19726fa0cd13a539827a7cd43237a4feef 497 503
        * reader opened.  This version is advanced every time
d1165b19726fa0cd13a539827a7cd43237a4feef 498 504

This is not only inconsistent with the other outputs but the output is also inconsistent in itself because lines 496 -498 do not even exist in a previous version of DirectoryReader.java.

Thanks for any feedback.

Kind Regards
Konstantin Sokolov

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox