Git development
 help / color / mirror / Atom feed
* Re: Regression: git send-email fails with "Use of uninitialized value $address" + "unable to extract a valid address"
From: Uwe Kleine-König @ 2023-10-20 10:04 UTC (permalink / raw)
  To: git; +Cc: Luben Tuikov, Michael Strawbridge, entwicklung
In-Reply-To: <20231013141437.ywrhw65xdapmev7d@pengutronix.de>

[-- Attachment #1: Type: text/plain, Size: 1390 bytes --]

hello,

On Fri, Oct 13, 2023 at 04:14:37PM +0200, Uwe Kleine-König wrote:
> Hello,
> 
> 	$ git send-email --to 'A B <a@b.org>, C D <c@d.org>' lala.patch
> 	Use of uninitialized value $address in sprintf at /usr/lib/git-core/git-send-email line 1172.
> 	error: unable to extract a valid address from:
> 
> This happens for me with git 2.42.0 and also on master (59167d7d09fd, "The seventeenth batch").
> 
> Bisection points at
> 
> 	a8022c5f7b67 ("send-email: expose header information to git-send-email's sendemail-validate hook")
> 
> I didn't try to understand that change and fix the problem.

Another (similar?) problem with non-ascii-chars:

	$ git send-email --to 'Will Deacon <will@kernel.org>' --to 'Krzysztof Wilczyński <kw@linux.com>' --to 'Lorenzo Pieralisi <lpieralisi@kernel.org>' --cc 'Rob Herring <robh@kernel.org>' --to 'Bjorn Helgaas <bhelgaas@google.com>' --cc 'linux-pci@vger.kernel.org' --cc kernel@pengutronix.de -1 --base=@~
	Use of uninitialized value $address in sprintf at /home/uwe/gsrc/git/git-send-email line 1162.
	error: unable to extract a valid address from:

Bisection points to the same commit, when dropping ń in Krzysztof's
name, it works fine.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH 1/3] doc/send-email: mention handling of "reply-to" with --compose
From: Jeff King @ 2023-10-20 10:09 UTC (permalink / raw)
  To: Michael Strawbridge; +Cc: Junio C Hamano, Bagas Sanjaya, Git Mailing List
In-Reply-To: <20231020100343.GA2194322@coredump.intra.peff.net>

The documentation for git-send-email lists the headers handled specially
by --compose in a way that implies that this is the complete set of
headers that are special. But one more was added by d11c943c78
(send-email: support separate Reply-To address, 2018-03-04) and never
documented.

Let's add it, and reword the documentation slightly to avoid having to
specify the list of headers twice (as it is growing and will continue to
do so as we add new features).

If you read the code, you may notice that we also handle MIME-Version
specially, in that we'll avoid over-writing user-provided MIME headers.
I don't think this is worth mentioning, as it's what you'd expect to
happen (as opposed to the other headers, which are picked up to be used
in later emails). And certainly this feature existed when the
documentation was expanded in 01d3861217 (git-send-email.txt: describe
--compose better, 2009-03-16), and we chose not to mention it then.

Signed-off-by: Jeff King <peff@peff.net>
---
Just something I noticed since a later commit touches the same list.

 Documentation/git-send-email.txt | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 492a82323d..021276329c 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -68,11 +68,11 @@ This option may be specified multiple times.
 	Invoke a text editor (see GIT_EDITOR in linkgit:git-var[1])
 	to edit an introductory message for the patch series.
 +
-When `--compose` is used, git send-email will use the From, Subject, and
-In-Reply-To headers specified in the message. If the body of the message
-(what you type after the headers and a blank line) only contains blank
-(or Git: prefixed) lines, the summary won't be sent, but From, Subject,
-and In-Reply-To headers will be used unless they are removed.
+When `--compose` is used, git send-email will use the From, Subject,
+Reply-To, and In-Reply-To headers specified in the message. If the body
+of the message (what you type after the headers and a blank line) only
+contains blank (or Git: prefixed) lines, the summary won't be sent, but
+the headers mentioned above will be used unless they are removed.
 +
 Missing From or In-Reply-To headers will be prompted for.
 +
-- 
2.42.0.980.g8b5f6199be


^ permalink raw reply related

* [PATCH 2/3] Revert "send-email: extract email-parsing code into a subroutine"
From: Jeff King @ 2023-10-20 10:13 UTC (permalink / raw)
  To: Michael Strawbridge; +Cc: Junio C Hamano, Bagas Sanjaya, Git Mailing List
In-Reply-To: <20231020100343.GA2194322@coredump.intra.peff.net>

This reverts commit b6049542b97e7b135e0e82bf996084d461224d32.

Prior to that commit, we read the results of the user editing the
"--compose" message in a loop, picking out parts we cared about, and
streaming the result out to a ".final" file. That commit split the
reading/interpreting into two phases; we'd now read into a hash, and
then pick things out of the hash.

The goal was making the code more readable. And in some ways it did,
because the ugly regexes are confined to the reading phase. But it also
introduced several bugs, because now the two phases need to match each
other. In particular:

  - we pick out headers like "Subject: foo" with a case-insensitive
    regex, and then use the user-provided header name as the key in a
    case-sensitive hash. So if the user wrote "subject: foo", we'd no
    longer recognize it as a subject.

  - the namespace for the hash keys conflates header names with meta
    information like "body". If you put "body: foo" in your message, it
    would be misinterpreted as the actual message body (nobody is likely
    to do that in practice, but it seems like an unnecessary danger).

  - the handling for to/cc/bcc is totally broken. The behavior before
    that commit is to recognize and skip those headers, with a note to
    the user that they are not yet handled. Not great, but OK. But
    after the patch, the reading side now splits the addresses into a
    perl array-ref. But the interpreting side doesn't handle this at
    all, and blindly prints the stringified array-ref value. This leads
    to garbage like:

      (mbox) Adding to: ARRAY (0x555b4345c428) from line 'To: ARRAY(0x555b4345c428)'
      error: unable to extract a valid address from: ARRAY (0x555b4345c428)
      What to do with this address? ([q]uit|[d]rop|[e]dit):

    Probably not a huge deal, since nobody should even try to use those
    headers in the first place (since they were not implemented). But
    the new behavior is worse, and indicative of the sorts of problems
    that come from having the two layers.

The revert had a few conflicts, due to later work in this area from
15dc3b9161 (send-email: rename variable for clarity, 2018-03-04) and
d11c943c78 (send-email: support separate Reply-To address, 2018-03-04).
I've ported the changes from those commits over as part of the conflict
resolution.

The new tests show the bugs. Note the use of GIT_SEND_EMAIL_NOTTY in the
second one. Without it, the test is happy to reach outside the test
harness to the developer's actual terminal (when run with the buggy
state before this patch).

Signed-off-by: Jeff King <peff@peff.net>
---
I guess "readable" is up for debate here, but I find the inline handling
a lot easier to follow (and it's half as many lines; most of the
diffstat is the new tests).

But one thing that gives me pause is that the neither before or after
this patch do we handle continuation lines like:

  Subject: this is the beginning
    and this is more subject

And it would probably be a lot easier to add when storing the headers in
a hash (it's not impossible to do it the other way, but you basically
have to delay processing each line with a small state machine).

So another option is to just fix the individual bugs separately.

 git-send-email.perl   | 120 ++++++++++++++----------------------------
 t/t9001-send-email.sh |  35 ++++++++++++
 2 files changed, 75 insertions(+), 80 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 288ea1ae80..bbda2a931b 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -888,73 +888,59 @@ sub get_patch_subject {
 		do_edit($compose_filename);
 	}
 
+	open my $c2, ">", $compose_filename . ".final"
+		or die sprintf(__("Failed to open %s.final: %s"), $compose_filename, $!);
+
 	open $c, "<", $compose_filename
 		or die sprintf(__("Failed to open %s: %s"), $compose_filename, $!);
 
+	my $need_8bit_cte = file_has_nonascii($compose_filename);
+	my $in_body = 0;
+	my $summary_empty = 1;
 	if (!defined $compose_encoding) {
 		$compose_encoding = "UTF-8";
 	}
-
-	my %parsed_email;
-	while (my $line = <$c>) {
-		next if $line =~ m/^GIT:/;
-		parse_header_line($line, \%parsed_email);
-		if ($line =~ /^$/) {
-			$parsed_email{'body'} = filter_body($c);
+	while(<$c>) {
+		next if m/^GIT:/;
+		if ($in_body) {
+			$summary_empty = 0 unless (/^\n$/);
+		} elsif (/^\n$/) {
+			$in_body = 1;
+			if ($need_8bit_cte) {
+				print $c2 "MIME-Version: 1.0\n",
+					 "Content-Type: text/plain; ",
+					   "charset=$compose_encoding\n",
+					 "Content-Transfer-Encoding: 8bit\n";
+			}
+		} elsif (/^MIME-Version:/i) {
+			$need_8bit_cte = 0;
+		} elsif (/^Subject:\s*(.+)\s*$/i) {
+			$initial_subject = $1;
+			my $subject = $initial_subject;
+			$_ = "Subject: " .
+				quote_subject($subject, $compose_encoding) .
+				"\n";
+		} elsif (/^In-Reply-To:\s*(.+)\s*$/i) {
+			$initial_in_reply_to = $1;
+			next;
+		} elsif (/^Reply-To:\s*(.+)\s*$/i) {
+			$reply_to = $1;
+		} elsif (/^From:\s*(.+)\s*$/i) {
+			$sender = $1;
+			next;
+		} elsif (/^(?:To|Cc|Bcc):/i) {
+			print __("To/Cc/Bcc fields are not interpreted yet, they have been ignored\n");
+			next;
 		}
+		print $c2 $_;
 	}
 	close $c;
+	close $c2;
 
-	open my $c2, ">", $compose_filename . ".final"
-	or die sprintf(__("Failed to open %s.final: %s"), $compose_filename, $!);
-
-
-	if ($parsed_email{'From'}) {
-		$sender = delete($parsed_email{'From'});
-	}
-	if ($parsed_email{'In-Reply-To'}) {
-		$initial_in_reply_to = delete($parsed_email{'In-Reply-To'});
-	}
-	if ($parsed_email{'Reply-To'}) {
-		$reply_to = delete($parsed_email{'Reply-To'});
-	}
-	if ($parsed_email{'Subject'}) {
-		$initial_subject = delete($parsed_email{'Subject'});
-		print $c2 "Subject: " .
-			quote_subject($initial_subject, $compose_encoding) .
-			"\n";
-	}
-
-	if ($parsed_email{'MIME-Version'}) {
-		print $c2 "MIME-Version: $parsed_email{'MIME-Version'}\n",
-				"Content-Type: $parsed_email{'Content-Type'};\n",
-				"Content-Transfer-Encoding: $parsed_email{'Content-Transfer-Encoding'}\n";
-		delete($parsed_email{'MIME-Version'});
-		delete($parsed_email{'Content-Type'});
-		delete($parsed_email{'Content-Transfer-Encoding'});
-	} elsif (file_has_nonascii($compose_filename)) {
-		my $content_type = (delete($parsed_email{'Content-Type'}) or
-			"text/plain; charset=$compose_encoding");
-		print $c2 "MIME-Version: 1.0\n",
-			"Content-Type: $content_type\n",
-			"Content-Transfer-Encoding: 8bit\n";
-	}
-	# Preserve unknown headers
-	foreach my $key (keys %parsed_email) {
-		next if $key eq 'body';
-		print $c2 "$key: $parsed_email{$key}";
-	}
-
-	if ($parsed_email{'body'}) {
-		print $c2 "\n$parsed_email{'body'}\n";
-		delete($parsed_email{'body'});
-	} else {
+	if ($summary_empty) {
 		print __("Summary email is empty, skipping it\n");
 		$compose = -1;
 	}
-
-	close $c2;
-
 } elsif ($annotate) {
 	do_edit(@files);
 }
@@ -1009,32 +995,6 @@ sub ask {
 	return;
 }
 
-sub parse_header_line {
-	my $lines = shift;
-	my $parsed_line = shift;
-	my $addr_pat = join "|", qw(To Cc Bcc);
-
-	foreach (split(/\n/, $lines)) {
-		if (/^($addr_pat):\s*(.+)$/i) {
-		        $parsed_line->{$1} = [ parse_address_line($2) ];
-		} elsif (/^([^:]*):\s*(.+)\s*$/i) {
-		        $parsed_line->{$1} = $2;
-		}
-	}
-}
-
-sub filter_body {
-	my $c = shift;
-	my $body = "";
-	while (my $body_line = <$c>) {
-		if ($body_line !~ m/^GIT:/) {
-			$body .= $body_line;
-		}
-	}
-	return $body;
-}
-
-
 my %broken_encoding;
 
 sub file_declares_8bit_cte {
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 263db3ad17..9644ff5793 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -2505,4 +2505,39 @@ test_expect_success $PREREQ 'test forbidSendmailVariables behavior override' '
 		HEAD^
 '
 
+test_expect_success $PREREQ '--compose handles lowercase headers' '
+	write_script fake-editor <<-\EOF &&
+	sed "s/^From:.*/from: edited-from@example.com/i" "$1" >"$1.tmp" &&
+	mv "$1.tmp" "$1"
+	EOF
+	clean_fake_sendmail &&
+	git send-email \
+		--compose \
+		--from="Example <from@example.com>" \
+		--to=nobody@example.com \
+		--smtp-server="$(pwd)/fake.sendmail" \
+		HEAD^ &&
+	grep "From: edited-from@example.com" msgtxt1
+'
+
+test_expect_success $PREREQ '--compose handles to headers' '
+	write_script fake-editor <<-\EOF &&
+	sed "s/^$/To: edited-to@example.com\n/" <"$1" >"$1.tmp" &&
+	echo this is the body >>"$1.tmp" &&
+	mv "$1.tmp" "$1"
+	EOF
+	clean_fake_sendmail &&
+	GIT_SEND_EMAIL_NOTTY=1 \
+	git send-email \
+		--compose \
+		--from="Example <from@example.com>" \
+		--to=nobody@example.com \
+		--smtp-server="$(pwd)/fake.sendmail" \
+		HEAD^ &&
+	# Ideally the "to" header we specified would be used,
+	# but the program explicitly warns that these are
+	# ignored. For now, just make sure we did not abort.
+	grep "To:" msgtxt1
+'
+
 test_done
-- 
2.42.0.980.g8b5f6199be


^ permalink raw reply related

* [PATCH 3/3] send-email: handle to/cc/bcc from --compose message
From: Jeff King @ 2023-10-20 10:15 UTC (permalink / raw)
  To: Michael Strawbridge; +Cc: Junio C Hamano, Bagas Sanjaya, Git Mailing List
In-Reply-To: <20231020100343.GA2194322@coredump.intra.peff.net>

If the user writes a message via --compose, send-email will pick up
varius headers like "From", "Subject", etc and use them for other
patches as if they were specified on the command-line. But we don't
handle "To", "Cc", or "Bcc" this way; we just tell the user "those
aren't interpeted yet" and ignore them.

But it seems like an obvious thing to want, especially as the same
feature exists when the cover letter is generated separately by
format-patch. There it is gated behind the --to-cover option, but I
don't think we'd need the same control here; since we generate the
--compose template ourselves based on the existing input, if the user
leaves the lines unchanged then the behavior remains the same.

So let's fill in the implementation; like those other headers we already
handle, we just need to assign to the initial_* variables. The only
difference in this case is that they are arrays, so we'll feed them
through parse_address_line() to split them (just like we would when
reading a single string via prompting).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-send-email.txt | 11 ++++++-----
 git-send-email.perl              | 16 ++++++++++++++--
 t/t9001-send-email.sh            | 16 +++++++++++-----
 3 files changed, 31 insertions(+), 12 deletions(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 021276329c..f4d7166275 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -68,11 +68,12 @@ This option may be specified multiple times.
 	Invoke a text editor (see GIT_EDITOR in linkgit:git-var[1])
 	to edit an introductory message for the patch series.
 +
-When `--compose` is used, git send-email will use the From, Subject,
-Reply-To, and In-Reply-To headers specified in the message. If the body
-of the message (what you type after the headers and a blank line) only
-contains blank (or Git: prefixed) lines, the summary won't be sent, but
-the headers mentioned above will be used unless they are removed.
+When `--compose` is used, git send-email will use the From, To, Cc, Bcc,
+Subject, Reply-To, and In-Reply-To headers specified in the message. If
+the body of the message (what you type after the headers and a blank
+line) only contains blank (or Git: prefixed) lines, the summary won't be
+sent, but the headers mentioned above will be used unless they are
+removed.
 +
 Missing From or In-Reply-To headers will be prompted for.
 +
diff --git a/git-send-email.perl b/git-send-email.perl
index bbda2a931b..9e21b0b3f4 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -861,6 +861,9 @@ sub get_patch_subject {
 	my $tpl_subject = $initial_subject || '';
 	my $tpl_in_reply_to = $initial_in_reply_to || '';
 	my $tpl_reply_to = $reply_to || '';
+	my $tpl_to = join(',', @initial_to);
+	my $tpl_cc = join(',', @initial_cc);
+	my $tpl_bcc = join(', ', @initial_bcc);
 
 	print $c <<EOT1, Git::prefix_lines("GIT: ", __(<<EOT2)), <<EOT3;
 From $tpl_sender # This line is ignored.
@@ -872,6 +875,9 @@ sub get_patch_subject {
 Clear the body content if you don't wish to send a summary.
 EOT2
 From: $tpl_sender
+To: $tpl_to
+Cc: $tpl_cc
+Bcc: $tpl_bcc
 Reply-To: $tpl_reply_to
 Subject: $tpl_subject
 In-Reply-To: $tpl_in_reply_to
@@ -928,8 +934,14 @@ sub get_patch_subject {
 		} elsif (/^From:\s*(.+)\s*$/i) {
 			$sender = $1;
 			next;
-		} elsif (/^(?:To|Cc|Bcc):/i) {
-			print __("To/Cc/Bcc fields are not interpreted yet, they have been ignored\n");
+		} elsif (/^To:\s*(.+)\s*$/i) {
+			@initial_to = parse_address_line($1);
+			next;
+		} elsif (/^Cc:\s*(.+)\s*$/i) {
+			@initial_cc = parse_address_line($1);
+			next;
+		} elsif (/^Bcc:/i) {
+			@initial_bcc = parse_address_line($1);
 			next;
 		}
 		print $c2 $_;
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 9644ff5793..2e8e8137fb 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -2522,7 +2522,7 @@ test_expect_success $PREREQ '--compose handles lowercase headers' '
 
 test_expect_success $PREREQ '--compose handles to headers' '
 	write_script fake-editor <<-\EOF &&
-	sed "s/^$/To: edited-to@example.com\n/" <"$1" >"$1.tmp" &&
+	sed "s/^To: .*/&, edited-to@example.com/" <"$1" >"$1.tmp" &&
 	echo this is the body >>"$1.tmp" &&
 	mv "$1.tmp" "$1"
 	EOF
@@ -2534,10 +2534,16 @@ test_expect_success $PREREQ '--compose handles to headers' '
 		--to=nobody@example.com \
 		--smtp-server="$(pwd)/fake.sendmail" \
 		HEAD^ &&
-	# Ideally the "to" header we specified would be used,
-	# but the program explicitly warns that these are
-	# ignored. For now, just make sure we did not abort.
-	grep "To:" msgtxt1
+	# Check both that the cover letter used our modified "to" line,
+	# but also that it was picked up for the patch.
+	q_to_tab >expect <<-\EOF &&
+	To: nobody@example.com,
+	Qedited-to@example.com
+	EOF
+	grep -A1 "^To:" msgtxt1 >msgtxt1.to &&
+	test_cmp expect msgtxt1.to &&
+	grep -A1 "^To:" msgtxt2 >msgtxt2.to &&
+	test_cmp expect msgtxt2.to
 '
 
 test_done
-- 
2.42.0.980.g8b5f6199be

^ permalink raw reply related

* Re: [PATCH 0/8] chunk-format: introduce `pair_chunk_expect()` convenience API
From: Jeff King @ 2023-10-20 10:23 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano
In-Reply-To: <cover.1697225110.git.me@ttaylorr.com>

On Fri, Oct 13, 2023 at 03:25:12PM -0400, Taylor Blau wrote:

> While reviewing this series, I noticed a couple of spots that would be
> helped by having a convenience function that stores the start of a
> chunk in a designated location *after* checking that the chunk has the
> expected size.
> 
> I called this function `pair_chunk_expect()` and touched up seven spots
> that I think could benefit from having a convenience function like this.
> 
> This applies directly on top of 'jk/chunk-bounds'. Thanks in advance for
> your review!

I'm still a little skeptical of this approach, just because I think it
it discourages adding further checks. And in particular, I was planning
to add monotonicity checks to the midx OIDF chunk based on the
discussion in the earlier thread. And likewise, I think I probably
should have put the commit-graph checks into its OIDF reader, rather
than saving them for verify_commit_graph_lite(). That would match the
way we check the validity of the other chunks.

I haven't actually started writing those patches, though, so I'm not
sure of the full details yet.

-Peff

^ permalink raw reply

* Re: [PATCH 5/8] commit-graph: read `BIDX` chunk with `pair_chunk_expect()`
From: Jeff King @ 2023-10-20 10:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git
In-Reply-To: <xmqqcyxhxk0h.fsf@gitster.g>

On Sat, Oct 14, 2023 at 09:10:22AM -0700, Junio C Hamano wrote:

> > @@ -461,8 +449,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
> >  	}
> >  
> >  	if (s->commit_graph_read_changed_paths) {
> > +		if (pair_chunk_expect(cf, GRAPH_CHUNKID_BLOOMINDEXES,
> > +				      &graph->chunk_bloom_indexes,
> > +				      st_mult(graph->num_commits, 4)) == -1)
> > +			warning(_("commit-graph changed-path index chunk is too small (%d)"), graph->num_commits * 4);
> >  		read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
> >  			   graph_read_bloom_data, graph);
> >  	}
> 
> Overall the series looked sane, but the need for each caller to
> supply error messages, when the helper perfectly well knows how many
> bytes the caller expected and how many bytes there are avaiable, was
> a bit disturbing, as the level of detail given per each caller will
> inevitably become uneven.  Even now, some give an error() while
> others give a warning(), even though I suspect all of them should be
> data errors.
> 
> I wonder if it makes sense to stuff the message template in the
> pair_chunk_data structure and do
> 
> static int pair_chunk_expect_fn(const unsigned char *chunk_start,
> 				size_t chunk_size,
> 				void *data)
> {
> 	struct pair_chunk_data *pcd = data;
> 	if (pcd->expected_size != chunk_size)
> 		return error(_(pcd->message), pcd->expected_size, chunk_size);
> 	*pcd->p = chunk_start;
> 	return 0;
> }

One issue with the series as-is is that the "chunk is too small"
messages can be misleading when the chunk is in fact missing. We do say
"missing or corrupt" in the die message (at least for midx; I did not
update the similar ones for commit-graph), but the explicit "too small"
for a missing chunk seems to me to cross the line.

The caller can distinguish the cases by the actual value returned from
pair_chunk_expect(), but doing so makes the code a lot longer and harder
to read.

Your suggestion above takes care of it naturally (in the same way that
the existing code does, which basically is emitting the same message in
each read_chunk callback).

-Peff

^ permalink raw reply

* Re: [PATCH 2/3] Revert "send-email: extract email-parsing code into a subroutine"
From: Oswald Buddenhagen @ 2023-10-20 10:45 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Strawbridge, Junio C Hamano, Bagas Sanjaya,
	Git Mailing List
In-Reply-To: <20231020101310.GB2673716@coredump.intra.peff.net>

On Fri, Oct 20, 2023 at 06:13:10AM -0400, Jeff King wrote:
>But one thing that gives me pause is that the neither before or after
>this patch do we handle continuation lines like:
>
>  Subject: this is the beginning
>    and this is more subject
>
>And it would probably be a lot easier to add when storing the headers in
>a hash (it's not impossible to do it the other way, but you basically
>have to delay processing each line with a small state machine).
>
that seems like a rather significant point, doesn't it?

>So another option is to just fix the individual bugs separately.
>
... so that seems preferable to me, given that the necessary fixes seem 
rather trivial.

> I guess "readable" is up for debate here, but I find the inline handling
> a lot easier to follow
>
any particular reason for that?

> (and it's half as many lines; most of the diffstat is the new tests).

>-	if ($parsed_email{'From'}) {
>-		$sender = delete($parsed_email{'From'});
>-	}

this verbosity could be cut down somewhat using just

   $sender = delete($parsed_email{'From'});

and if the value can be pre-set and needs to be preserved,

   $sender = delete($parsed_email{'From'}) // $sender;

but this seems kind of counter-productive legibility-wise.

regards

^ permalink raw reply

* Re: [PATCH v3 3/3] rev-list: add commit object support in `--missing` option
From: Karthik Nayak @ 2023-10-20 11:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps
In-Reply-To: <xmqqttqmtcc2.fsf@gitster.g>

On Fri, Oct 20, 2023 at 1:35 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Do we really need to allocate a new bit in the object flags, which
> > is already a scarce resource?
>
> Clarification.  I was *not* wondering if we can steal and (re|ab)use
> a bit that is used for other purposes, in order to avoid allocating
> a new bit.
>
> Rather, I was wondering if we need to use object flags to mark these
> objects, or can do what we want to do without using any object flags
> at all.  For the purpose of reporting "missing" objects, wouldn't it
> be sufficient to walk the object graph and report our findings as we
> go?  To avoid reporting the same object twice, as we reasonably can
> expect that the missing objects are minority (compared to the total
> number of objects), perhaps the codepath that makes such a report
> can use a hashmap of object_ids or something, for example.
>

This is kind of hard to do since the revision walking happens in revision.c
and list-objects.c but the missing commits are needed in builtin/rev-list.c.
So even if we build a list of missing commits in list-objects.c, there is no
nice way to send this back to rev-list.c.

The only way we communicate between these layers is through callbacks,
i.e. the show_commit() function in rev-list.c is called for every commit that
the revision walk traverses over.

I'm not entirely sure what the best path to take here to be honest. I will look
and see if there are any bits where overlapping doesn't cause any issues
in the meantime.

^ permalink raw reply

* Re: [Outreachy] Move existing tests to a unit testing framework
From: Achu Luma @ 2023-10-20 13:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Christian Couder
In-Reply-To: <CAFR+8Dwxr3iV+R7een0t2sYXUWu1XHhQcLVuqMhOsSg9Bt4wrg@mail.gmail.com>

Dear Git Community and Mentors,

I hope this email finds you well. As a follow-up to my previous
application, I'd like to provide additional details on the process of
migrating existing unit tests from the t/helper/ directory to the new
Git unit test framework.
-- Identify Target Unit Tests: Start by identifying the specific unit
tests in the t/helper/ directory that we want to port to the new Git
unit test framework. Ensure that the tests are suitable for migration
and that the benefits of doing so outweigh the effort(By avoiding
integration tests). The following points have been developed with on
going work on the unit-tests framework visible here: 1-
https://lore.kernel.org/git/0169ce6fb9ccafc089b74ae406db0d1a8ff8ac65.1688165272.git.steadmon@google.com/
2- https://github.com/steadmon/git/blob/unit-tests-asciidoc/Documentation/technical/unit-tests.adoc

-- Create a New C Test File: For each unit test I plan to migrate,
create a new C source file (.c) in the Git project's test suite
directory(t/unit-tests). Name it appropriately to reflect the purpose
of the test.

--  Include Necessary Headers:In the new C test file, include the
necessary Git unit test framework headers. Typically, this includes
headers like "test-lib.h" and others relevant to the specific test.
#include "test-lib.h"

-- Convert Test Logic: Refactor the test logic from the original Shell
script into the new C-based test format. Use the testing macros
provided by the Git unit test framework, such as test_expect_success,
test_expect_failure, etc., to define the tests.
test_expect_success("simple progress display", "{
    // Test logic here...
}");

-- Add Test Descriptions: Provide clear and informative descriptions
for each test using the testing macros. These descriptions will help
in identifying the purpose of each test when the test suite is run.

-- Define a Test Entry Point: Create a cmd_main function as the entry
point for the C-based tests. Inside this function, include the test
functions using the testing macros.
int cmd_main(int argc, const char **argv) {
    // Test functions...
    return test_done();
}

-- Ensure TAP Format Output: Ensure that the C-based tests produce
output in the Test Anything Protocol (TAP) format. This format
includes the test name, status (ok or not ok), and any diagnostic
information.

-- Test Interaction: Ensure that the migrated tests interact correctly
with the new Git unit test framework and any other tests that may be
relevant. Consider dependencies and interactions with other parts of
the Git project.

-- Test Execution: Run the migrated tests to verify that they produce
the expected results when executed as part of the Git project's test
suite. Use the Git testing framework's test runners to execute the
tests.

-- Documentation Update: Update the Git project's documentation to
reflect the changes made during the migration. Include a reference to
the original unit tests in the t/helper/ directory and indicate that
these tests have been ported to the new Git unit test framework.

By following these points, I think I can successfully port existing
unit tests from the t/helper/ directory to use the new Git unit test
framework. This migration helps standardize and streamline the testing
process within the Git project, improving code quality and
maintainability.

Next Steps:

I am eager to discuss these suggestions and collaborate with the Git
community to ensure the success of this project. I will continue to
engage with the community, seek guidance, and refine my proposal as
per your suggestions.
 I look forward to the opportunity to contribute to the Git project
and help make it even more robust and reliable.

Best Regards,
Achu Luma


On Fri, Oct 20, 2023 at 2:15 PM Achu Luma <ach.lumap@gmail.com> wrote:
>
> Dear Git Community and Mentors,
>
> I hope this email finds you well. As a follow-up to my previous application, I'd like to provide additional details on the process of migrating existing unit tests from the t/helper/ directory to the new Git unit test framework.
> -- Identify Target Unit Tests: Start by identifying the specific unit tests in the t/helper/ directory that we want to port to the new Git unit test framework. Ensure that the tests are suitable for migration and that the benefits of doing so outweigh the effort(By avoiding integration tests). The following points have been developed with on going work on the unit-tests framework visible here: 1- https://lore.kernel.org/git/0169ce6fb9ccafc089b74ae406db0d1a8ff8ac65.1688165272.git.steadmon@google.com/
> 2- https://github.com/steadmon/git/blob/unit-tests-asciidoc/Documentation/technical/unit-tests.adoc
>
> -- Create a New C Test File: For each unit test I plan to migrate, create a new C source file (.c) in the Git project's test suite directory(t/unit-tests). Name it appropriately to reflect the purpose of the test.
>
> --  Include Necessary Headers:In the new C test file, include the necessary Git unit test framework headers. Typically, this includes headers like "test-lib.h" and others relevant to the specific test.
> #include "test-lib.h"
>
> -- Convert Test Logic: Refactor the test logic from the original Shell script into the new C-based test format. Use the testing macros provided by the Git unit test framework, such as test_expect_success, test_expect_failure, etc., to define the tests.
> test_expect_success("simple progress display", "{
>     // Test logic here...
> }");
>
> -- Add Test Descriptions: Provide clear and informative descriptions for each test using the testing macros. These descriptions will help in identifying the purpose of each test when the test suite is run.
>
> -- Define a Test Entry Point: Create a cmd_main function as the entry point for the C-based tests. Inside this function, include the test functions using the testing macros.
> int cmd_main(int argc, const char **argv) {
>     // Test functions...
>     return test_done();
> }
>
> -- Ensure TAP Format Output: Ensure that the C-based tests produce output in the Test Anything Protocol (TAP) format. This format includes the test name, status (ok or not ok), and any diagnostic information.
>
> -- Test Interaction: Ensure that the migrated tests interact correctly with the new Git unit test framework and any other tests that may be relevant. Consider dependencies and interactions with other parts of the Git project.
>
> -- Test Execution: Run the migrated tests to verify that they produce the expected results when executed as part of the Git project's test suite. Use the Git testing framework's test runners to execute the tests.
>
> -- Documentation Update: Update the Git project's documentation to reflect the changes made during the migration. Include a reference to the original unit tests in the t/helper/ directory and indicate that these tests have been ported to the new Git unit test framework.
>
> By following these points, I think I can successfully port existing unit tests from the t/helper/ directory to use the new Git unit test framework. This migration helps standardize and streamline the testing process within the Git project, improving code quality and maintainability.
>
> Next Steps:
>
> I am eager to discuss these suggestions and collaborate with the Git community to ensure the success of this project. I will continue to engage with the community, seek guidance, and refine my proposal as per your suggestions.
>  I look forward to the opportunity to contribute to the Git project and help make it even more robust and reliable.
>
> Best Regards,
> Achu Luma
>
> On Mon, Oct 9, 2023 at 10:15 AM Luma <ach.lumap@gmail.com> wrote:
>>
>> Dear Git Community and Mentors,
>>
>> I hope this email finds you well. My name is Achu Luma, and I am
>> excited to submit my application for the Outreachy program with the
>> Git project.
>> I have been a passionate open-source enthusiast and a dedicated Git
>> user for two years, and I am thrilled at the opportunity to contribute
>> to the Git community.
>>
>> Introduction:
>> ----------------
>> I study Computer Science from the University of Bamenda. Over the past
>> 4 years, I have gained experience in software development and have
>> participated in various class projects.
>>
>> Why I am a Good Fit:
>> ----------------------
>> 1. Proficient with Git: I have a good understanding of Git's version
>> control system and have successfully used it in both personal and
>> educational projects.
>>
>> 2. Strong Programming Skills: My programming skills in python, C etc
>> and experience with git, shell etc make me well-prepared to contribute
>> to Git's codebase.
>>
>> 3. Open Source Involvement: I have actively contributed to git
>> open-source project, including
>> https://public-inbox.org/git/20231003174853.1732-1-ach.lumap@gmail.com/T/#t
>> , where I have submitted a patch that has been well-received.
>>
>> Project Idea - Moving Existing Tests to a Unit Testing Framework:
>> ------------------------------------------------------------------
>> I am excited about "Moving Existing Tests to a Unit Testing Framework".
>> The objective of this project is to enhance the efficiency and
>> maintainability of Git's testing infrastructure by porting existing
>> unit tests to a unit testing framework.
>>
>> **Project Plan**:
>> - Evaluate the existing tests in the `t/helper/` directory to identify
>> those suitable for migration to the unit testing framework.
>> - Develop a migration strategy and create detailed plans for adapting
>> these tests.
>> - Port the identified tests to the unit testing framework while
>> ensuring they maintain their functionality.
>> - Verify the correctness and reliability of the migrated tests through
>> thorough testing and validation.
>> - Collaborate with the Git community to gather feedback and make
>> necessary adjustments.
>>
>> **Timeline**:
>> - Community Bonding (Oct 2 - Nov 20): Familiarize myself with the Git
>> project and establish communication channels.
>> - Coding Phase (Dec 4 - Jan 15): Implement the migration of tests and
>> seek feedback from mentors and the community.
>> - Testing and Validation (Jan 15 - Feb 15): Rigorously test the
>> migrated tests and make improvements based on feedback.
>> - Documentation and Finalization (Feb 15 - March 1): Document the
>> migration process and finalize the project.
>>
>> **Contribution to Git Community**:
>> I have actively participated in Git's mailing-list discussions and
>> submitted a patch(
>> https://public-inbox.org/git/20231003174853.1732-1-ach.lumap@gmail.com/T/#t)
>> for review. I have received positive feedback on my contributions, and
>> it has been queued to be merged into official Git branches maintained
>> by Junio. Additionally, I have been involved in discussions related to
>> the git project.(https://public-inbox.org/git/CAFR+8DyN8vbuvdgZPkSVqS2=sqconwhx3QfcpJ0+Wi_oCA=s0w@mail.gmail.com/T/#t)
>>
>> **Proposal Drafts**:
>> I have shared drafts of this proposal on the Git mailing list
>> git@vger.kernel.org  and will incorporate valuable feedback provided
>> by the community.
>>
>> **Next Steps**:
>> I am eager to discuss my proposal further and collaborate with the Git
>> community to ensure the success of this project. I will continue to
>> engage with the community, seek guidance, and refine my proposal as
>> per your suggestions.
>>
>> Thank you for considering my application. I look forward to the
>> opportunity to contribute to the Git project and help make it even
>> more robust and reliable.
>>
>> Best Regards,
>>
>> On Wed, Oct 4, 2023 at 12:36 AM Luma <ach.lumap@gmail.com> wrote:
>> >
>> > oh yes, "Move existing tests to a unit testing framework" was the
>> > only listed project for this current Outreachy cohort. So, I used it
>> > to express my intent.
>> > I appreciate the clarification on authorship identity for patches. I
>> > will update subsequent patches with a legal full name to conform to
>> > the community rules.
>> >
>> > Regards.
>> >
>> > On Tue, Oct 3, 2023 at 7:51 PM Junio C Hamano <gitster@pobox.com> wrote:
>> > >
>> > > Luma <ach.lumap@gmail.com> writes:
>> > >
>> > > > Hi;
>> > > > My name is Luma, and  I wanted to take a moment to introduce myself
>> > > > and share some
>> > > > insights on an essential aspect of  avoiding pipes in git related
>> > > > commands in test scripts.
>> > > >
>> > > > I am an outreachy applicant for the December 2023 cohort and look
>> > > > forward to learning from you.
>> > >
>> > > I notice that the title of the message and the immediate topic you
>> > > discuss in the body of the message do not match.  I presume that the
>> > > topic on the title is what you prefer to work on if the unit testing
>> > > framework is ready by the time Outreachy program starts, and the
>> > > mention about "do not clobber exit code of Git with pipes in the
>> > > tests" is your "dip the tip of a toe in water" microproject?
>> > >
>> > > Welcome to the Git development community.
>> > >
>> > > Do you have a single word name?  If so please disregard the below,
>> > > but in case "Luma" is just a nickname (e.g. like I am introducing
>> > > myself to my Git friends "Hi, I am Gitster!") you use online, please
>> > > read on.
>> > >
>> > > For signing off your patches, we'd prefer to see your real name
>> > > used, as Signed-off-by: is meant to have legal significance.  And
>> > > because we also expect the authorship identity to match the
>> > > name/e-mail of the sign-off, it would mean your patch submissions
>> > > are expected to look like:
>> > >
>> > >         From: Luma <ach.lumap@gmail.com>
>> > >         Subject: ... title of the patch goes here ...
>> > >
>> > >         ... body of the proposed commit log message goes here...
>> > >
>> > >         Signed-off-by: Luma <ach.lumap@gmail.com>
>> > >
>> > > but "Luma" replaced with your full real name.
>> > >
>> > > Thanks.

^ permalink raw reply

* Thanks and One Suggestion
From: Brock Whitaker @ 2023-10-20 14:20 UTC (permalink / raw)
  To: git

Thank you for the git-scm site. It's an invaluable resource I use often.

My only issue with it is the text isn't as easy to read as I would
like. The orange text on a white background is a little tough to read
sometimes. A higher contrast between text and background color would
help.

Splitting hairs,

B

PS - Thanks again for a super useful resource!

^ permalink raw reply

* Re: [PATCH v3 3/3] rev-list: add commit object support in `--missing` option
From: Karthik Nayak @ 2023-10-20 14:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, ps
In-Reply-To: <CAOLa=ZR4Qd9x-zVv9OtL7KTU00W2uT-kxdhKOLFZbi5cnHdiSQ@mail.gmail.com>

On Fri, Oct 20, 2023 at 1:14 PM Karthik Nayak <karthik.188@gmail.com> wrote:
> I'm not entirely sure what the best path to take here to be honest. I will look
> and see if there are any bits where overlapping doesn't cause any issues
> in the meantime.

Was trying to use bit number 12, which coincides METAINFO_SHOWN in
builtin/blame.c.
From skimming over the code, METAINFO_SHOWN is used only within
blame.c and there
should not be collisions here since blame.c doesn't set the
do_not_die_on_missing_objects bit
either.

The tests seem to pass here too [1]. This could be the potential
solution here, Junio, what do you think?
I could send another version if this approach looks good.

[1] https://gitlab.com/gitlab-org/git/-/jobs/5339132966

Diff against the tip of v3:

diff --git a/object.h b/object.h
index b76830fce1..0cb8c02b95 100644
--- a/object.h
+++ b/object.h
@@ -62,7 +62,7 @@ void object_array_init(struct object_array *array);

 /*
  * object flag allocation:
- * revision.h:               0---------10         15             22------28
+ * revision.h:               0---------10  12    15             22------27
  * fetch-pack.c:             01    67
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
@@ -82,7 +82,7 @@ void object_array_init(struct object_array *array);
  * builtin/show-branch.c:    0-------------------------------------------26
  * builtin/unpack-objects.c:                                 2021
  */
-#define FLAG_BITS  29
+#define FLAG_BITS  28

 #define TYPE_BITS 3

diff --git a/revision.h b/revision.h
index d3c1ca0f4e..d2de9d65e4 100644
--- a/revision.h
+++ b/revision.h
@@ -38,6 +38,9 @@
 #define PATCHSAME    (1u<<9)
 #define BOTTOM        (1u<<10)

+/* WARNING: This is also used as METAINFO_SHOWN in builtin/blame.c. */
+#define MISSING     (1u<<12)
+
 /* WARNING: This is also used as REACHABLE in commit-graph.c. */
 #define PULL_MERGE    (1u<<15)

@@ -53,7 +56,6 @@
 #define ANCESTRY_PATH    (1u<<27)
 #define ALL_REV_FLAGS    (((1u<<11)-1) | NOT_USER_GIVEN |
TRACK_LINEAR | PULL_MERGE)

-#define MISSING (1u<<28)

 #define DECORATE_SHORT_REFS    1
 #define DECORATE_FULL_REFS    2

^ permalink raw reply related

* Re: Thanks and One Suggestion
From: Emily Shaffer @ 2023-10-20 16:08 UTC (permalink / raw)
  To: Brock Whitaker, Git List
In-Reply-To: <1D0EE77C-6DEF-4509-9846-6897C73C7CDD@gmail.com>

Adding back the list - first I accidentally HTML'd my reply, and then
Brock replied directly to me :) In general, please make sure to keep
the mailing list CC'd on responses, Brock. (And turn off HTML in your
MUA. Don't be like me :) )

I agree looking in my own browser that the monospace - both inline and
blockquote - is a little difficult to read. Take
https://git-scm.com/docs/git-commit#Documentation/git-commit.txt---amend
for an example of both. However, I'll leave it up to the git-scm folks
to recommend a fix.

 - Emily

On Fri, Oct 20, 2023 at 8:13 AM Brock Whitaker
<brock.e.whitaker@gmail.com> wrote:
>
> The code examples are the hardest to read. My monitor probably isn’t helping.
>
> Brock
>
> On Oct 20, 2023, at 10:56 AM, Emily Shaffer <nasamuffin@google.com> wrote:
>
> 
> Hi Brock, thanks for your feedback. Could you specify which orange-on-white is difficult for you? There are a few different examples I see at a quick glance - headings, fixed-width text, sidebar. Which is bothering you?
>
>  - Emily
>
> On Fri, Oct 20, 2023 at 7:20 AM Brock Whitaker <brock.e.whitaker@gmail.com> wrote:
>>
>> Thank you for the git-scm site. It's an invaluable resource I use often.
>>
>> My only issue with it is the text isn't as easy to read as I would
>> like. The orange text on a white background is a little tough to read
>> sometimes. A higher contrast between text and background color would
>> help.
>>
>> Splitting hairs,
>>
>> B
>>
>> PS - Thanks again for a super useful resource!
>>

^ permalink raw reply

* [PATCH v2] grep: die gracefully when outside repository
From: Kristoffer Haugsbakk @ 2023-10-20 16:40 UTC (permalink / raw)
  To: code; +Cc: gitster, ks1322, git
In-Reply-To: <087c92e3904dd774f672373727c300bf7f5f6369.1697317276.git.code@khaugsbakk.name>

Die gracefully when `git grep --no-index` is run outside of a Git
repository and the path is outside the directory tree.

If you are not in a Git repository and say:

    git grep --no-index search ..

You trigger a `BUG`:

    BUG: environment.c:213: git environment hasn't been setup
    Aborted (core dumped)

Because `..` is a valid path which is treated as a pathspec. Then
`pathspec` figures out that it is not in the current directory tree. The
`BUG` is triggered when `pathspec` tries to advice the user about how the
path is not in the current (non-existing) repository.

Reported-by: ks1322 ks1322 <ks1322@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>
---

Notes (series):
    v2:
    - Initialize `hint_path` after we know that we are in a Git repository
    - Apply Junio's suggestion for the test: https://lore.kernel.org/git/xmqqzg0hf0g8.fsf@gitster.g/

 pathspec.c      |  7 ++++++-
 t/t7810-grep.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/pathspec.c b/pathspec.c
index 3a3a5724c44..264b4929a55 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -467,7 +467,12 @@ static void init_pathspec_item(struct pathspec_item *item, unsigned flags,
 		match = prefix_path_gently(prefix, prefixlen,
 					   &prefixlen, copyfrom);
 		if (!match) {
-			const char *hint_path = get_git_work_tree();
+			const char *hint_path;
+
+			if (!have_git_dir())
+				die(_("'%s' is outside the directory tree"),
+				    copyfrom);
+			hint_path = get_git_work_tree();
 			if (!hint_path)
 				hint_path = get_git_dir();
 			die(_("%s: '%s' is outside repository at '%s'"), elt,
diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh
index 39d6d713ecb..84838c0fe1b 100755
--- a/t/t7810-grep.sh
+++ b/t/t7810-grep.sh
@@ -1234,6 +1234,33 @@ test_expect_success 'outside of git repository with fallbackToNoIndex' '
 	)
 '
 
+test_expect_success 'no repository with path outside $cwd' '
+	test_when_finished rm -fr non &&
+	rm -fr non &&
+	mkdir -p non/git/sub non/tig &&
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+		test_expect_code 128 git grep --no-index search .. 2>error &&
+		grep "is outside the directory tree" error
+	) &&
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+		test_expect_code 128 git grep --no-index search ../tig 2>error &&
+		grep "is outside the directory tree" error
+	) &&
+	(
+		GIT_CEILING_DIRECTORIES="$(pwd)/non" &&
+		export GIT_CEILING_DIRECTORIES &&
+		cd non/git &&
+		test_expect_code 128 git grep --no-index search ../non 2>error &&
+		grep "no such path in the working tree" error
+	)
+'
+
 test_expect_success 'inside git repository but with --no-index' '
 	rm -fr is &&
 	mkdir -p is/git/sub &&
-- 
2.42.0.2.g879ad04204


^ permalink raw reply related

* Re: [PATCH v3 3/3] rev-list: add commit object support in `--missing` option
From: Junio C Hamano @ 2023-10-20 16:41 UTC (permalink / raw)
  To: Karthik Nayak; +Cc: git, ps
In-Reply-To: <xmqqttqmtcc2.fsf@gitster.g>

Junio C Hamano <gitster@pobox.com> writes:

> Rather, I was wondering if we need to use object flags to mark these
> objects, or can do what we want to do without using any object flags
> at all.  For the purpose of reporting "missing" objects, wouldn't it
> be sufficient to walk the object graph and report our findings as we
> go?  To avoid reporting the same object twice, as we reasonably can
> expect that the missing objects are minority (compared to the total
> number of objects), perhaps the codepath that makes such a report
> can use a hashmap of object_ids or something, for example.

Digging from the bottom,

 * builtin/rev-list.c:show_commit() gets "struct rev_list_info *"
   that has "struct rev_info *" [*].

 * list-objects.c:do_traverse() calls revision.c:get_revision() to
   obtain commits, some of which may be missing ones, and things
   behind get_revision() are responsible for marking the commit as
   missing.  It has "struct traversal_context *", among whose
   members is the "revs" member that is the "struct rev_info *".

 * revision.c:get_revision() and machinery behind it ultimately
   discovers a missing commit in the revision.c:process_parents()
   that loops over the parents commit_list.  It of course has access
   to "struct rev_info *".

So, presumably, if we add a new member to "struct rev_info" that
optionally [*] points at an oidset that records the object names of
missing objects we discovered so far (i.e., the set of missing
objects), the location we set the MISSING bit of a commit can
instead add the object name of the commit to the set.  And we can
export a function that takes "struct rev_info *" and "struct object
*" (or "struct object_id *") to check for membership in the "set of
missing objects", which would be used where we checked the MISSING
bit of a commit.

I do not know the performance implications of going this route, but
if we do not find a suitable vacant bit, we do not have to use any
object flags bit to do this, if we go this route, I would think.  I
may be missing some details that breaks the above outline, though.


[Footnotes]

 * A potential #leftoverbits tangent.

   Why is "rev_list_info" structure declared in <bisect.h>?  I
   suspect that this is a fallout from recent header file shuffling,
   but given who uses it (among which is rev-list:show_commit() that
   has very little to do with bisection and uses the information in
   rev_list_info when doing its normal non-bisect things), it does
   not make much sense.

 * When .do_not_die_on_missing_objects is false, it can and should
   be left NULL, but presumably we use the "do not die" bit even
   when we are not necessarily collecting the missing objects?  So
   the new member cannot replace the "do not die" bit completely.

^ permalink raw reply

* Re: [PATCH] typo: fix the typo 'neeed' into 'needed' in the comment under merge-ort.c
From: Junio C Hamano @ 2023-10-20 16:44 UTC (permalink / raw)
  To: 王常新; +Cc: git
In-Reply-To: <42D0A6FD-1314-4977-92E6-BF6535D82E8A@gmail.com>

王常新 <wchangxin824@gmail.com> writes:

> It is my official name. But the email address is not a valid one. Should I rewrite the commit message?

We try to keep the name and address on Signed-off-by: the official
one that we can give court if/when some copyright troll sues us (see
Documentation/SubmittingPatches:sign-off), and one of them (if more
than one developers signed off the patch) must match the primary
author's name and address.

Thanks.

^ permalink raw reply

* Re: [PATCH v4 0/7] merge-ort: implement support for packing objects together
From: Junio C Hamano @ 2023-10-20 16:53 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, git, Elijah Newren, Eric W. Biederman,
	Patrick Steinhardt
In-Reply-To: <20231020072941.GD1642714@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

>   - Having not worked with the bulk-checkin code much before, I thought
>     it only put in one blob per pack, and thus you'd have to teach it to
>     handle multiple objects, too. But fortunately I was wrong, and it
>     handles this case already. (I mention this mainly because we had
>     talked about it off-list a few weeks ago, and some of my confusion
>     may have seeped into my comments then).

You probably had it mixed up with a real limitation, i.e., "unlike
fast-import that considers the immediate previous object could be a
good delta base, this code does not even attempt any deltification"
;-)

>   - I did briefly wonder if we need this SOURCE abstraction at all. The
>     file source assumes we can seek() and read(), so it must be a
>     regular file. We could probably just mmap() it and treat it as
>     in-core, too (this is what index_fd() and index_path() do under the
>     hood!).

Ahhhh, I've never thought of that one.  I actually was hoping that
we can add an abstraction layer that can be reused elsewhere and
everywhere (not limited to the bulk-checkin codepath for its source
material), which might help folks who want to refactor Git enough to
allow them plug $CORP specific goodies like virtual filesystem layer
;-).

>     But it would probably be a disservice to any platforms that
>     do not support mmap, as the fallback is to read into a full buffer
>     (and the whole point of the bulk checkin code is to avoid that).

Yes.  The result was a pleasant read.

Thanks, both.

^ permalink raw reply

* Re: [PATCH v4 1/7] bulk-checkin: extract abstract `bulk_checkin_source`
From: Junio C Hamano @ 2023-10-20 16:55 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, git, Elijah Newren, Eric W. Biederman,
	Patrick Steinhardt
In-Reply-To: <20231020073500.GE1642714@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Thu, Oct 19, 2023 at 01:28:42PM -0400, Taylor Blau wrote:
>
>> +struct bulk_checkin_source {
>> +	enum { SOURCE_FILE } type;
>> +
>> +	/* SOURCE_FILE fields */
>> +	int fd;
>> +
>> +	/* common fields */
>> +	size_t size;
>> +	const char *path;
>> +};
>
> This is a pretty minor nit, but we may find that "SOURCE_FILE" is not
> sufficiently name-spaced. Enum tags are in the global namespace, so
> the compiler will barf if there are any conflicts.

A very good point.  This was one of the reasons why I suggested
avoiding the switch() based on a new enum altogether and use a
vtable based approach instead.  But it may do while this one is
private to the file and never exposed to other subsystems.

> It might be OK here, since this is local to a single C file (so we at
> least are not hurting other code), but we may be in trouble if code in a
> header file is less careful. There is already a near-miss here with
> GREP_SOURCE_FILE, but fortunately grep.h is indeed careful. :)
>
> (I notice that ref-filter.c similarly uses SOURCE_* for an internal
> enum).

Thanks.

^ permalink raw reply

* Re: Thanks and One Suggestion
From: Eric Sunshine @ 2023-10-20 17:01 UTC (permalink / raw)
  To: Brock Whitaker; +Cc: git
In-Reply-To: <CAGUSxisVxXDLyFQ-KQHQdtTqF5UihhnJ22rtTn-r9J94ToeuXg@mail.gmail.com>

On Fri, Oct 20, 2023 at 10:20 AM Brock Whitaker
<brock.e.whitaker@gmail.com> wrote:
> Thank you for the git-scm site. It's an invaluable resource I use often.
>
> My only issue with it is the text isn't as easy to read as I would
> like. The orange text on a white background is a little tough to read
> sometimes. A higher contrast between text and background color would
> help.

Indeed, a color contrast analyzer, such as [1], indicates that the
specific orange/white combination used by git-scm.com is less than
unsuitable (at least for normal size text).

The git-scm.com site is maintained by a project separate from the Git
project itself. You can report the poor color contrast at their issue
tracker[2].

[1]: https://color.adobe.com/create/color-contrast-analyzer
[2]: https://github.com/git/git-scm.com/issues

^ permalink raw reply

* Re: [PATCH v2] grep: die gracefully when outside repository
From: Eric Sunshine @ 2023-10-20 17:05 UTC (permalink / raw)
  To: Kristoffer Haugsbakk; +Cc: gitster, ks1322, git
In-Reply-To: <5c8ef6bec1c99e0fae7ada903885a8e77f8137f9.1697819838.git.code@khaugsbakk.name>

On Fri, Oct 20, 2023 at 12:40 PM Kristoffer Haugsbakk
<code@khaugsbakk.name> wrote:
> Die gracefully when `git grep --no-index` is run outside of a Git
> repository and the path is outside the directory tree.
>
> If you are not in a Git repository and say:
>
>     git grep --no-index search ..
>
> You trigger a `BUG`:
>
>     BUG: environment.c:213: git environment hasn't been setup
>     Aborted (core dumped)
>
> Because `..` is a valid path which is treated as a pathspec. Then
> `pathspec` figures out that it is not in the current directory tree. The
> `BUG` is triggered when `pathspec` tries to advice the user about how the
> path is not in the current (non-existing) repository.

s/advice/advise/

(probably not worth a reroll)

> Reported-by: ks1322 ks1322 <ks1322@gmail.com>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Kristoffer Haugsbakk <code@khaugsbakk.name>

^ permalink raw reply

* Re: [PATCH] typo: fix the typo 'neeed' into 'needed' in the comment under merge-ort.c
From: Junio C Hamano @ 2023-10-20 17:06 UTC (permalink / raw)
  To: Bagas Sanjaya; +Cc: 王常新, git
In-Reply-To: <facb06e0-9ad7-40eb-83e3-0a951931496d@gmail.com>

Bagas Sanjaya <bagasdotme@gmail.com> writes:

> On 20/10/2023 09:14, 王常新 wrote:
>> It is my official name. But the email address is not a valid one. Should I rewrite the commit message?
>> 
>
> Please don't top-post, reply inline with appropriate context instead.
>
> Did you mean that you can't receive ML traffic on your @qq.com address?
> If so, resend with your @gmail.com address as patch author (you need
> to set user.name and user.email accordingly). 

Isn't that opposite from what we would normally recommend, though?

If the true authorship e-mail is in an environment where sending
patches are inconvenient, you would still want to do your commits
under the identity you want to appear in the final history of the
project, so you do not futz with user.name and user.email; you'd
send a message with in-body header that shows an extra From: line
(followed by a blank line) that records the true authorship from an
environment whose sender e-mail address may differ.

E.g.  You would see these fields in the e-mail heeader:

	From: 王常新 <wchangxin824@gmail.com>
	Subject: [PATCH] merge-ort.c: comment typofix

and your message would begin like so (indented only for illustration
purposes---the real one should be flushed to the left edge of the
page):

	From: 王常新 <real-email-address-of-mr-wang@do.ma.in>

	There is 'needed' misspelt as 'neeed' in the source file;
	fix it.

	Signed-off-by: 王常新 <real-email-address-of-mr-wang@do.ma.in>

This feature is designed so that other people, different from the
author of the patch, can relay it to the recipient(s) while
preserving the authorship information.

Although it is not needed in this case, you can override "Subject:"
the same way with an in-body header, like so:

	From: 王常新 <real-email-address-of-mr-wang@do.ma.in>
	Subject: real title of the patch to be used

	There is 'needed' misspelt as 'neeed' in the source file;
	fix it.

	Signed-off-by: 王常新 <real-email-address-of-mr-wang@do.ma.in>

and it would replace what we read from the Subject: e-mail header.

^ permalink raw reply

* Re: [PATCH v2] builtin/branch.c: adjust error messages to coding guidelines
From: Junio C Hamano @ 2023-10-20 17:10 UTC (permalink / raw)
  To: Christian Couder; +Cc: Isoken Ibizugbe, git
In-Reply-To: <CAP8UFD1U0cz3CDdE_0d0FUgPKP2pX-ZcVDJqz2tW-+rnZ7rvQw@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> writes:

>> Hello Junio, I would appreciate your comment on this.
>
> Please don't expect Junio to give direct feedback on all the patches
> sent to the mailing list. This patch has already been reviewed by
> Rubén.

Correct.

> If Junio doesn't review it and if this patch is not mentioned
> in the next "What's cooking in git.git" email from Junio, it likely
> means that you should send a version 3 addressing Rubén's feedback
> about the commit message.

While the above may be a good advice, I try to be easier to read
than the above tealeaf-reading, especially to new contributors ;-)

At least I should comment on Rubén's comment, but I haven't got
around to it yet.

Thanks.





^ permalink raw reply

* Re: [PATCH v4 00/17] bloom: changed-path Bloom filters v2 (& sundries)
From: Taylor Blau @ 2023-10-20 17:27 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, Elijah Newren, Eric W. Biederman, Jeff King,
	Patrick Steinhardt
In-Reply-To: <xmqq34y71phj.fsf@gitster.g>

On Wed, Oct 18, 2023 at 04:26:48PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > (Rebased onto the tip of 'master', which is 3a06386e31 (The fifteenth
> > batch, 2023-10-04), at the time of writing).
>
> Judging from 17/17 that has a free_commit_graph() call in
> close_commit_graph(), that was merged in the eighteenth batch,
> the above is probably untrue.  I'll apply to the current master and
> see how it goes instead.

Worse than that, I sent this `--in-reply-to` the wrong thread :-<.

Sorry about that, and indeed you are right that the correct base for
this round should be a9ecda2788 (The eighteenth batch, 2023-10-13).

I'm optimistic that with the amount of careful review that this topic
has already received, that this round should do the trick. But if there
are more comments and we end up re-rolling it, I'll break this thread
and split out the v5 into it's thread to avoid further confusion.

> > Thanks to Jonathan, Peff, and SZEDER who have helped a great deal in
> > assembling these patches. As usual, a range-diff is included below.
> > Thanks in advance for your
> > review!
>
> Thanks.

Thank you, and sorry for the mistake on my end.

Thanks,
Taylor

^ permalink raw reply

* Re: [PATCH 3/3] send-email: handle to/cc/bcc from --compose message
From: Eric Sunshine @ 2023-10-20 17:30 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Strawbridge, Junio C Hamano, Bagas Sanjaya,
	Git Mailing List
In-Reply-To: <20231020101524.GA2673857@coredump.intra.peff.net>

On Fri, Oct 20, 2023 at 6:15 AM Jeff King <peff@peff.net> wrote:
> If the user writes a message via --compose, send-email will pick up
> varius headers like "From", "Subject", etc and use them for other

s/varius/various/

> patches as if they were specified on the command-line. But we don't
> handle "To", "Cc", or "Bcc" this way; we just tell the user "those
> aren't interpeted yet" and ignore them.
>
> But it seems like an obvious thing to want, especially as the same
> feature exists when the cover letter is generated separately by
> format-patch. There it is gated behind the --to-cover option, but I
> don't think we'd need the same control here; since we generate the
> --compose template ourselves based on the existing input, if the user
> leaves the lines unchanged then the behavior remains the same.
>
> So let's fill in the implementation; like those other headers we already
> handle, we just need to assign to the initial_* variables. The only
> difference in this case is that they are arrays, so we'll feed them
> through parse_address_line() to split them (just like we would when
> reading a single string via prompting).
>
> Signed-off-by: Jeff King <peff@peff.net>

^ permalink raw reply

* Re: [PATCH v2] builtin/branch.c: adjust error messages to coding guidelines
From: Junio C Hamano @ 2023-10-20 17:31 UTC (permalink / raw)
  To: Rubén Justo; +Cc: Isoken June Ibizugbe, git, christian.couder
In-Reply-To: <331e1ab3-2e8c-497d-a05d-ef197d664188@gmail.com>

Rubén Justo <rjusto@gmail.com> writes:

> On 19-oct-2023 09:40:51, Isoken June Ibizugbe wrote:
>
>> As per the CodingGuidelines document, it is recommended that a single-line
>> message provided to error messages such as die(), error() and warning(),
>
> This is confusing; some multi-line messages are fixed in this series.
>
>> should start with a lowercase letter and should not end with a period.
>> Also this patch fixes the tests broken by the changes.

"Also this patch fixes the tests broken by the changes" -> "Adjust
tests to match updated messages".

> Well done, describing why the tests are touched.

Nicely reviewed.  But it is unclear to me how we want to mention
updates to multi-line messages.  Do we have a good reference in the
guidelines we can copy?

The general desire is for a single-liner to look like so:

    error: the branch 'foo' is not fully merged

and an untold assumption is that we strongly prefer a single
sentence in a single-liner error message---a full-stop to separate
multiple sentences can be omitted for brevity.

But we have follow-up sentences that are not strictly "errors" in
some messages, e.g.,

>> -		error(_("The branch '%s' is not fully merged.\n"
>> +		error(_("the branch '%s' is not fully merged.\n"
>>  		      "If you are sure you want to delete it, "
>> -		      "run 'git branch -D %s'."), branchname, branchname);
>> +		      "run 'git branch -D %s'"), branchname, branchname);

In a modern codebase with facilities from advice.c, we would
probably do "a single-liner error message, optionally followed by a
hint message", i.e.

    error: the branch 'foo' is not fully merged
    hint: If you are sure you want to delete it,
    hint: run "git branch -D foo".

but this message apparently predates wide use of the advice_if() and
friends.

But rewriting this error message to use advice is probably outside
the scope of this patch.  But for completeness it would probably
look something like this (with necessary ADVICE_FOO defined):

	error(_("the branch '%s' is not fully merged"), branchname);
	advice_if_enabled(ADVICE_DELETE_BRANCH,
                	  _("If you are sure you want to delete it,"
			    "run 'git branch -D %s'."));

If I were doing this patch, I'd probably just add

	/* NEEDSWORK: turn the remainder ofthe message into advice */

in front of the existing error() call for this one without touching
the message itself, as it will have to be touched again when such a
change happens.

Thanks.

^ permalink raw reply

* Re: [PATCH] commit: detect commits that exist in commit-graph but not in the ODB
From: Junio C Hamano @ 2023-10-20 17:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git, Karthik Nayak, Taylor Blau
In-Reply-To: <20231020100024.GA2194074@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> I guess that is really like a global variable but even more gross. ;)

Yeah, I tend to agree, but ...

> But it is nice that it can cross process boundaries, because "how
> careful do we want to be" may be in the eye of the caller, especially
> for plumbing commands. E.g., even without --missing, you may want
> "rev-list" to be extra careful about checking the existence of commits.

... these are all very good points.

> The caller in check_connected() could arguably turn that on by default
> (the way we used to turn on GIT_REF_PARANOIA for pruning repacks before
> it became the default).

True.

Thanks.


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox