Git development
 help / color / mirror / Atom feed
* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Junio C Hamano @ 2023-12-13 15:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Ramsay Jones, git, Taylor Blau, Phillip Wood
In-Reply-To: <ZXlfeWtDgr1GQFCL@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

>> Me neither, but once you start thinking about getting rid of the
>> need to use one-file-per-ref filesystem, being able to maintain all
>> refs, including the pseudo refs, in one r/w store backend, becomes a
>> very tempting goal.  From that point of view, I do not have problem
>> with the idea to move _all_ pseudorefs to reftable.
>
> Yes, we're in agreement.
>
>> But I do have reservations on what Patrick, and the code he
>> inherited from Han-Wen, calls "special refs" (which is not defined
>> in the glossary at all), namely, refs.c:is_special_ref() and its
>> callers.
>
> I do not want to add "special refs" to the glossary because ultimately
> they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD.
> Once we're there we can of course discuss whether we want to explicitly
> point them out in the glossary and give them a special name.

OK, I somehow got a (wrong) impression that you are very close to
the finish line.  If the intention is to leave many others still in
the "special" category (for only the reasons of inertia), with a
vision that some selected few must remain "special" with their own
good reasons, then I am perfectly fine.

Thanks.

^ permalink raw reply

* Re: [PATCH] tests: prefer host Git to verify chainlint self-checks
From: Junio C Hamano @ 2023-12-13 15:11 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Eric Sunshine
In-Reply-To: <ZXlbNlG28e1sAYPU@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

>> I do not think "prefer host Git" is necessarily a good idea; falling
>> back to use host Git is perfectly fine, of course.
>
> Why is that, though?

Mostly because your "differences in features supported by just-built
one and what happens to be on $PATH can cause problems" cuts both
ways, and as a general principle to work around such issues, using
just-built one is a better discipline.  The features the build
infrastructure ("self-checks" being discussed is a part of it) of a
particular version of Git source depends on are more likely to be
found in the binary that will be built from the matching source,
than what happens to be on $PATH that may be from a year-old version.

As you said, you'd need to accomodate need for those who are
initially porting or building Git on a host without an installed
one.  If we were doing a cross build where just-built on would not
be executable on the host, whatever version on $PATH (or in
/usr/bin) may have to be used, but then in such a case you would not
be testing on host anyway.  For these two reasons, it is a given
that one of the choices has to be to use just-built one.  We can
safely give lower precedence to the host tool.

The only one-and-half practical reasons we may want to fall back to
whatever happens to be on $PATH are:

 - just-built one is so broken that even the simple use by the build
   infrastructure (like "self-checks") does not work (but then it
   becomes "if it is so broken, fix it before even thinking about
   running tests", and that is why I count it as half a reason), or

 - we are bisecting down to an ancient version, and just-built one
   from such a version may not understand the current repository.

so I do think it is an excellent workaround to fall back to a
version of Git with unknown vintage that happens to be on $PATH,
than failing and stopping by relying only on just-built one.

> We already use host Git in other parts of our build
> infra, and the options we pass to git-diff(1) have been around for ages:

It only argues for "trying host one, if available, before just-built
one does not hurt for this particular case".  But I was interested
in laying out a more general principle we can follow in similar
situations in the future.

^ permalink raw reply

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Junio C Hamano @ 2023-12-13 14:54 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Jeff King, git, Taylor Blau,
	Carlos Andrés Ramírez Cataño
In-Reply-To: <ZXlYIZ0Hb1kN84NU@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

>>  static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
>>  {
>> -	int c;
>>  	int take_next_literally = 0;
>>  
>> -	while ((c = *in++) != 0) {
>> +	while (*in) {
>> +		int c = *in++;
>>  		if (take_next_literally == 1) {
>>  			take_next_literally = 0;
>>  		} else {
>
> I was wondering whether we want to convert `unquote_quoted_pair()` in
> the same way. It's not subject to the same issue because it doesn't
> return `in` to its callers. But it would lower the cognitive overhead a
> bit by keeping the coding style consistent. But please feel free to
> ignore this remark.

Yeah, I was wondering about the value of establishing a pattern that
can be followed safely and with clarity, too.  I also briefly
wondered how bad if we picked the "compensate for the increment done
by the last iteration before leaving the loop by returning (in-1)"
idiom (which Peff called "a hacky way") to be that universal
pattern, but this bug was a clear enough evidence that it does not
work very well in developers' minds.

I actually had trouble with the proposed update, and wondered if

-	while ((c = *in++) != 0) {
+	while ((c = *in)) {
+		in++;

is easier to follow, but then was hit by the possibility that the
same "we have incremented 'in' a bit too early" may exist if such
a loop wants to use 'in' in its body.  Wouldn't it mean that

-	while ((c = *in++) != 0) {
+	for (; c = *in; in++) {

would be even a better rewrite?

Enough bikeshedding...

> Another thing I was wondering about is the recursive nature of these
> functions, and whether it can lead to similar issues like we recently
> had when parsing revisions with stack exhaustion. I _think_ this should
> not be much of a problem in this case though, as we're talking about
> mail headers here. While the length of header values isn't restricted
> per se (the line length is restricted to 1000, but with Comment Folding
> Whitespace values can span multiple lines), but mail provdiers will sure
> clamp down on mails with a "From:" header that is megabytes in size.

Good thinking, and I think Peff's iterative rewrite would be a good
way to deal with it if it ever becomes an issue.

> So taken together, this looks good to me.

Thanks, both, for writing and reviewing.


^ permalink raw reply

* Re: New attempt to export test-lib from Git, maybe Sharness2?
From: Jiang Xin @ 2023-12-13 14:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jiang Xin, Mathias Lafeldt, Christian Couder
In-Reply-To: <xmqqcyvfmccr.fsf@gitster.g>

On Sun, Dec 10, 2023 at 2:59 AM Junio C Hamano <gitster@pobox.com> wrote:
> Is it a viable option to stick to the name "test-lib" (or possibly,
> "git-test-lib" to make it more prominent to say where it came from)?

Will rename the project to "git-test-lib".

> If you do not plan to coordinate with those who work on (the remnant
> of) the original sharness based on an ancient version of our test
> framework, and do not plan to actively transition its users to your
> version, it is less confusing if you named yours differently, as it
> avoids hinting that your version is a successor of theirs.
>
> I am not sure if reusing the history of our project verbatim using
> filter-repo is really a good way to help those who are interested in
> the test framework, by the way.

If all historical commits were squashed, it would be difficult to
track future changes of test-lib in the Git project. Futhermore, who
will be the author of the squashed commit?

> and unedited filter-repo result will describe such a commit
> primarily to explain why the changes in the commit was made on Git
> side.  Most of the changes described in the resulting commit message
> are discarded by filter-repo and the resulting history becomes hard
> to follow.

One solution is to add notes in the README file and users can refer to
the commit history of the git project.

^ permalink raw reply

* [PATCH 2/1] test-lib-functions: add object size functions
From: René Scharfe @ 2023-12-13 12:28 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Ondrej Pohorelsky, brian m . carlson, Junio C Hamano
In-Reply-To: <ff735aac-b60b-4d52-a6dc-180ab504fc8d@web.de>

Add test_object_size and its helpers test_loose_object_size and
test_packed_object_size, which allow determining the size of a Git
object using only the low-level Git commands rev-parse and show-index.

Use it in t6300 to replace the bare-bones function test_object_file_size
as a motivating example.  There it provides the expected output of the
high-level Git command for-each-ref.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
So how about this?  I'm a bit nervous about all the rules about output
descriptors and error propagation and whatnot in the test library, but
this implementation seems simple enough and might be useful in more than
one test.  No idea how to add support for alternate object directories,
but I doubt we'll ever need it.
---
 t/t6300-for-each-ref.sh | 16 ++++++--------
 t/test-lib-functions.sh | 47 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 843a7fe143..4687660f38 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -20,12 +20,6 @@ setdate_and_increment () {
     export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 }

-test_object_file_size () {
-	oid=$(git rev-parse "$1")
-	path=".git/objects/$(test_oid_to_path $oid)"
-	test_file_size "$path"
-}
-
 test_expect_success setup '
 	# setup .mailmap
 	cat >.mailmap <<-EOF &&
@@ -40,7 +34,11 @@ test_expect_success setup '
 	git branch -M main &&
 	setdate_and_increment &&
 	git tag -a -m "Tagging at $datestamp" testtag &&
+	testtag_oid=$(git rev-parse refs/tags/testtag) &&
+	testtag_disksize=$(test_object_size $testtag_oid) &&
 	git update-ref refs/remotes/origin/main main &&
+	commit_oid=$(git rev-parse refs/heads/main) &&
+	commit_disksize=$(test_object_size $commit_oid) &&
 	git remote add origin nowhere &&
 	git config branch.main.remote origin &&
 	git config branch.main.merge refs/heads/main &&
@@ -129,7 +127,7 @@ test_atom head push:strip=1 remotes/myfork/main
 test_atom head push:strip=-1 main
 test_atom head objecttype commit
 test_atom head objectsize $((131 + hexlen))
-test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
+test_atom head objectsize:disk $commit_disksize
 test_atom head deltabase $ZERO_OID
 test_atom head objectname $(git rev-parse refs/heads/main)
 test_atom head objectname:short $(git rev-parse --short refs/heads/main)
@@ -203,8 +201,8 @@ test_atom tag upstream ''
 test_atom tag push ''
 test_atom tag objecttype tag
 test_atom tag objectsize $((114 + hexlen))
-test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag)
-test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main)
+test_atom tag objectsize:disk $testtag_disksize
+test_atom tag '*objectsize:disk' $commit_disksize
 test_atom tag deltabase $ZERO_OID
 test_atom tag '*deltabase' $ZERO_OID
 test_atom tag objectname $(git rev-parse refs/tags/testtag)
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 9c3cf12b26..9b49645f77 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1733,6 +1733,53 @@ test_oid_to_path () {
 	echo "${1%$basename}/$basename"
 }

+test_loose_object_size () {
+	test "$#" -ne 1 && BUG "1 param"
+	local path=$(test_oid_to_path "$1")
+	test_file_size "$(git rev-parse --git-path "objects/$path")" 2>&4
+}
+
+test_packed_object_size () {
+	test "$#" -ne 2 && BUG "2 params"
+	local oid=$1 idx=$2 packsize rawsz end
+
+	packsize=$(test_file_size "${idx%.idx}.pack")
+	rawsz=$(test_oid rawsz)
+	end=$(($packsize - $rawsz))
+
+	git show-index <"$idx" |
+	awk -v oid="$oid" -v end="$end" '
+		$2 == oid {start = $1}
+		{offsets[$1] = 1}
+		END {
+			if (!start || start >= end)
+				exit 1
+			for (o in offsets)
+				if (start < o && o < end)
+					end = o
+			print end - start
+		}
+	' && return 0
+
+	echo >&4 "error: '$oid' not found in '$idx'"
+	return 1
+}
+
+test_object_size () {
+	test "$#" -ne 1 && BUG "1 param"
+	local oid=$1
+
+	test_loose_object_size "$oid" 4>/dev/null && return 0
+
+	for idx in "$(git rev-parse --git-path objects/pack)"/pack-*.idx
+	do
+		test_packed_object_size "$oid" "$idx" 4>/dev/null && return 0
+	done
+
+	echo >&4 "error: '$oid' not found"
+	return 1
+}
+
 # Parse oids from git ls-files --staged output
 test_parse_ls_files_stage_oids () {
 	awk '{print $2}' -
--
2.43.0

^ permalink raw reply related

* Re: Problems with Windows + schannel + http.sslCert
From: Johannes Schindelin @ 2023-12-13 11:56 UTC (permalink / raw)
  To: Ragesh Krishna; +Cc: git@vger.kernel.org
In-Reply-To: <TY0PR06MB544239787E909DD4EAC1CA42D189A@TY0PR06MB5442.apcprd06.prod.outlook.com>

Hi Ragesh,

On Sat, 9 Dec 2023, Ragesh Krishna wrote:

> I'm trying to use SSL client auth on Windows. My git installation
> currently lists only schannel as the supported backend.

There is a reason why only Secure Channel is listed as supported backend,
and it is a relatively boring one: DLL hell.

However, contrary to what `git -c http.sslBackend=list ls-remote <url>`
says on Windows, `git -c http.sslBackend=openssl ls-remote <url>` should
work, too.

Ciao,
Johannes

^ permalink raw reply

* Allow adding files from git submodule to parent git repo if they are in .gitignore in the submodule
From: Alexey Murz Korepov @ 2023-12-13 11:07 UTC (permalink / raw)
  To: git

Git submodules are pretty often used to "import" some external git
repository into your git repository, to not manage its files in your
git tree, that's good.

But very often we need to add some file to that submodule and manage
it in our local git repository, and this file is usually in .gitignore
in the submodule's git.

So, the example task is:

Copy a mysubmodule/config.example.yaml to mysubmodule/config.yaml
(which is in .gitignore in the mysubmodule), make local changes, and
store it in the local (parent) root git repo.

But the problem is that we can't add this mysubmodule/config.yaml file
to our root git repository, because receiving an error:
```
$ git add -f mysubmodule/config.yaml
fatal: Pathspec 'mysubmodule/config.yaml' is in submodule 'mysubmodule'
```
But I see no problems with doing this, if the file is in .gitignore of
the mysubmodule submodule.

So, let's discuss the approach of how we can allow this in git?

Or will be glad to hear and discuss alternatives and workarounds for this task.

^ permalink raw reply

* Communication channel interruption
From: Alexander Zhirov @ 2023-12-13  9:54 UTC (permalink / raw)
  To: git

When cloning a repository from GitHub, I have a channel break, although 
the connection is stable. What could be the problem?

# git clone https://github.com/Thinstation/thinstation.git
Cloning into 'thinstation'...
remote: Enumerating objects: 79839, done.
remote: Counting objects: 100% (535/535), done.
remote: Compressing objects: 100% (319/319), done.
error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: 
CANCEL (err 8)
error: 7473 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

^ permalink raw reply

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Jeff King @ 2023-12-13  8:20 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño
In-Reply-To: <ZXlYIZ0Hb1kN84NU@tanuki>

On Wed, Dec 13, 2023 at 08:07:21AM +0100, Patrick Steinhardt wrote:

> >  static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
> >  {
> > -	int c;
> >  	int take_next_literally = 0;
> >  
> > -	while ((c = *in++) != 0) {
> > +	while (*in) {
> > +		int c = *in++;
> >  		if (take_next_literally == 1) {
> >  			take_next_literally = 0;
> >  		} else {
> 
> I was wondering whether we want to convert `unquote_quoted_pair()` in
> the same way. It's not subject to the same issue because it doesn't
> return `in` to its callers. But it would lower the cognitive overhead a
> bit by keeping the coding style consistent. But please feel free to
> ignore this remark.

I dunno. I don't think the consistency matters that much between
functions, and on its own, the "while ((c = *in++))" pattern is not
harmful. It's used elsewhere in the same file for other functions that
are also fine (for decoding q/b headers).

> Another thing I was wondering about is the recursive nature of these
> functions, and whether it can lead to similar issues like we recently
> had when parsing revisions with stack exhaustion. I _think_ this should
> not be much of a problem in this case though, as we're talking about
> mail headers here. While the length of header values isn't restricted
> per se (the line length is restricted to 1000, but with Comment Folding
> Whitespace values can span multiple lines), but mail provdiers will sure
> clamp down on mails with a "From:" header that is megabytes in size.

It's just unquote_comment() that is recursive, but yeah, there is
nothing to stop it from recursing forever on "((((((...". The stack
requirements are pretty small, though. I needed between 2^17 and 2^18
bytes to segfault on my machine using:

  perl -e 'print "From: ", "(" x 2**18;' |
  git mailinfo /dev/null /dev/null

Absurdly big for an email, but maybe within the realm of possibility? I
think it might be possible to drop the recursion and just use a depth
counter, like this:

diff --git a/mailinfo.c b/mailinfo.c
index 737b9e5e13..db236f9f9f 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -59,6 +59,7 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 {
 	int take_next_literally = 0;
+	int depth = 1;
 
 	strbuf_addch(outbuf, '(');
 
@@ -72,11 +73,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 				take_next_literally = 1;
 				continue;
 			case '(':
-				in = unquote_comment(outbuf, in);
+				strbuf_addch(outbuf, '(');
+				depth++;
 				continue;
 			case ')':
 				strbuf_addch(outbuf, ')');
-				return in;
+				if (!--depth)
+					return in;
+				continue;
 			}
 		}
 
I doubt it's a big deal either way, but if it's that easy to do it might
be worth it.

-Peff

^ permalink raw reply related

* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: Jeff King @ 2023-12-13  8:01 UTC (permalink / raw)
  To: René Scharfe
  Cc: AtariDreams via GitGitGadget, git, AtariDreams, Seija Kijin
In-Reply-To: <8bea38fe-38a3-412a-b189-541a6596d623@web.de>

On Tue, Dec 12, 2023 at 11:30:03PM +0100, René Scharfe wrote:

> Am 12.12.23 um 21:09 schrieb Jeff King:
> > On Tue, Dec 12, 2023 at 05:17:47PM +0000, AtariDreams via GitGitGadget wrote:
> >
> >> diff --git a/diff.c b/diff.c
> >> index 2c602df10a3..91842b54753 100644
> >> --- a/diff.c
> >> +++ b/diff.c
> >> @@ -1191,7 +1191,7 @@ static void mark_color_as_moved(struct diff_options *o,
> >>  							    &pmb_nr);
> >>
> >>  			if (contiguous && pmb_nr && moved_symbol == l->s)
> >> -				flipped_block = (flipped_block + 1) % 2;
> >> +				flipped_block ^= 1;
> >>  			else
> >>  				flipped_block = 0;
> >
> > This one I do not see any problem with changing, though I think it is a
> > matter of opinion on which is more readable (I actually tend to think of
> > "x = 0 - x" as idiomatic for flipping).
> 
> Did you mean "x = 1 - x"?

Oops, yes, of course. I'm not sure how I managed to fumble that.

> I don't particular like this; it repeats x and seems error-prone. ;-)

Yes. :)

Without digging into the code, I had just assumed that flipped_block was
used as an array index. But it really is a boolean, so I actually think
"flipped_block = !flipped_block" would probably be the most clear (but
IMHO not really worth the churn).

> Can we salvage something from this bikeshedding exercise?  I wonder if
> it's time to use the C99 type _Bool in our code.  It would allow
> documenting that only two possible values exist in cases like the one
> above.  That would be even more useful for function returns, I assume.

Hmm, possibly. I guess that might have helped my confusion, and I do
think returning bool for function returns would help make their meaning
more clear (it would help distinguish them from the usual "0 for
success" return values).

I don't even know that we'd need much of a weather-balloon patch. I
think it would be valid to do:

  #ifndef bool
  #define bool int

to handle pre-C99 compilers (if there even are any these days). Of
course we probably need some conditional magic to try to "#include
<stdbool.h>" for the actual C99. I guess we could assume C99 by default
and then add NO_STDBOOL as an escape hatch if anybody complains.

-Peff

^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Patrick Steinhardt @ 2023-12-13  7:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ramsay Jones, git, Taylor Blau, Phillip Wood
In-Reply-To: <xmqq34w7os53.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 3505 bytes --]

On Tue, Dec 12, 2023 at 04:36:24PM -0800, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
> >> "via the refdb" -> "via the refs API" or something here and on the
> >> title, and possibly elsewhere in the proposed log messages and
> >> in-code comments in patches in this series, as I've never seen a
> >> word "refdb" used in the context of this project.
> >> 
> >> I agree it is bad manners to be intimate with the implementation
> >> details of the how files-backend stores HEAD and ORIG_HEAD.
> >
> > Hmm, I have never thought of the 'pseudo-refs' as being a part of
> > the 'reference database' at all. ;)
> 
> Me neither, but once you start thinking about getting rid of the
> need to use one-file-per-ref filesystem, being able to maintain all
> refs, including the pseudo refs, in one r/w store backend, becomes a
> very tempting goal.  From that point of view, I do not have problem
> with the idea to move _all_ pseudorefs to reftable.

Yes, we're in agreement.

> But I do have reservations on what Patrick, and the code he
> inherited from Han-Wen, calls "special refs" (which is not defined
> in the glossary at all), namely, refs.c:is_special_ref() and its
> callers.

I do not want to add "special refs" to the glossary because ultimately
they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD.
Once we're there we can of course discuss whether we want to explicitly
point them out in the glossary and give them a special name.

> Neither am I very much sympathetic to the hardcoded list
> of "known" pseudorefs, refs.c:pseudorefs[].  I cannot quite see why
> we need anything more than


>     any string that passes refs.c:is_pseudoref_syntax() is a
>     pseudoref, is per worktree, and ref backends can store them like
>     any other refs.  Many of them have specific meaning and uses
>     (e.g. HEAD is "the current branch").
> 
> Enumerating existing pseudorefs in files backend may need to
> opendir(".git") + readdir() filtered with is_pseudoref_syntax(),
> and a corresponding implementation for reftable backend may be much
> simpler (because there won't be "other cruft" stored there, unlike
> files backend that needs to worry about files that are not refs,
> like ".git/config" file.
> 
> > We seem to have pseudo-refs, special pseudo-refs and (recently)
> > ex-pseudo-refs!
> >
> > This patch (well series) changes the 'status' of some, *but not all*,
> > pseudo-refs; some graduate to full-blown refs stored as part of *a*
> > reference database (ie reftable).
> 
> Yeah, that leaves bad taste in my mouth, too.

I'm taking an iterative approach to things, which means that we're at
times going to be in an in-between state. I want to avoid changing too
many things at once and overwhelming potential reviewers. But I realize
that I should've done a better job of explaining that this patch series
is not the end goal, but rather a step towards that goal.

The patch series at hand merely records the status quo and rectifies any
inconsistencies we have with accessing such "special" refs. The natural
next step here would be to reduce the list of special refs (like e.g. we
do in patch 4) so that in the end it will only contain those refs which
really are special (FETCH_HEAD, MERGE_HEAD).

Please let me know in case you strongly disagree with my iterative
approach, or whether the issue is rather that I didn't make myself
sufficiently clear.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 2/4] refs: propagate errno when reading special refs fails
From: Patrick Steinhardt @ 2023-12-13  7:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, Phillip Wood
In-Reply-To: <xmqqfs07qi66.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

On Tue, Dec 12, 2023 at 12:28:49PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Some refs in Git are more special than others due to reasons explained
> > in the next commit. These refs are read via `refs_read_special_head()`,
> > but this function doesn't behave the same as when we try to read a
> > normal ref. Most importantly, we do not propagate `failure_errno` in the
> > case where the reference does not exist, which is behaviour that we rely
> > on in many parts of Git.
> >
> > Fix this bug by propagating errno when `strbuf_read_file()` fails.
> 
> Hmph, I thought, judging from what [1/4] did, that your plan was to
> use the refs API, instead of peeking directly into the filesystem,
> to access these pseudo refs, and am a bit puzzled if it makes all
> that much difference to fix errno handling while still reading
> directly from the filesystem.  Perhaps such a conversion happens in
> later steps of this series (or a follow on series after this series
> lands)?

Yes, that's ultimately the goal. The patch series here is only setting
the stage by recording what we have, and addressing cases where we are
inconsistently accessing refs via the filesystem _and_ via the ref API.
Once this lands I do want create a follow up patch series that converts
all refs to be non-special to the extent possible.

I say "to the extent possible" though because in the end there will be
two refs that we must continue to treat specially: FETCH_HEAD and
MERGE_HEAD, which we were treating special before this patch series
already. Both of these are not normal refs because they may contain
multiple values with annotations, so they will need to stay special.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] tests: prefer host Git to verify chainlint self-checks
From: Patrick Steinhardt @ 2023-12-13  7:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine
In-Reply-To: <xmqq8r5zrzg1.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 2052 bytes --]

On Tue, Dec 12, 2023 at 11:30:22AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > To accomodate for cases where the host system has no Git installation we
> > use the locally-compiled version of Git. This can result in problems
> > though when the Git project's repository is using extensions that the
> > locally-compiled version of Git doesn't understand. It will refuse to
> > run and thus cause the checks to fail.
> >
> > Fix this issue by prefering the host's Git resolved via PATH. If it
> > doesn't exist, then we fall back to the locally-compiled Git version and
> > diff as before.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >
> > I've started to dogfood the reftable backend on my local machine and
> > have converted many repositories to use the reftable backend. This
> > surfaced the described issue because the repository now sets up the
> > "extensions.refStorage" extension, and thus "check-chainlint" fails
> > depending on which versions of Git I'm trying to compile and test.
> 
> I do not think "prefer host Git" is necessarily a good idea; falling
> back to use host Git is perfectly fine, of course.

Why is that, though? We already use host Git in other parts of our build
infra, and the options we pass to git-diff(1) have been around for ages:

  - `--no-pager` was introduced in 463a849d00 (Add and document a global
    --no-pager option for git., 2007-08-19).

  - `--no-index` is around since at least fcfa33ec90 (diff: make more
    cases implicit --no-index, 2007-02-25).

  - `-w` is around since 0d21efa51c (Teach diff about -b and -w flags,
    2006-06-14).

So all options have pretty much been there since forever. Which means
that if the host has Git around, and the Git version is at least v1.5.3,
then it also understands all options.

Furthermore, we are accessing the Git repository that the user has set
up with host Git in the first place, so I'd think even conceptually it
is the correct thing to do.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Patrick Steinhardt @ 2023-12-13  7:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño
In-Reply-To: <20231212221243.GA1656116@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 3736 bytes --]

On Tue, Dec 12, 2023 at 05:12:43PM -0500, Jeff King wrote:
> When processing a header like a "From" line, mailinfo uses
> unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized
> comments. It takes a NUL-terminated string on input, and loops over the
> "in" pointer until it sees the NUL. When it finds the start of an
> interesting block, it delegates to helper functions which also increment
> "in", and return the updated pointer.
> 
> But there's a bug here: the helpers find the NUL with a post-increment
> in the loop condition, like:
> 
>    while ((c = *in++) != 0)
> 
> So when they do see a NUL (rather than the correct termination of the
> quote or comment section), they return "in" as one _past_ the NUL
> terminator. And thus the outer loop in unquote_quoted_pair() does not
> realize we hit the NUL, and keeps reading past the end of the buffer.
> 
> We should instead make sure to return "in" positioned at the NUL, so
> that the caller knows to stop their loop, too. A hacky way to do this is
> to return "in - 1" after leaving the inner loop. But a slightly cleaner
> solution is to avoid incrementing "in" until we are sure it contained a
> non-NUL byte (i.e., doing it inside the loop body).
> 
> The two tests here show off the problem. Since we check the output,
> they'll _usually_ report a failure in a normal build, but it depends on
> what garbage bytes are found after the heap buffer. Building with
> SANITIZE=address reliably notices the problem. The outcome (both the
> exit code and the exact bytes) are just what Git happens to produce for
> these cases today, and shouldn't be taken as an endorsement. It might be
> reasonable to abort on an unterminated string, for example. The priority
> for this patch is fixing the out-of-bounds memory access.

Makes sense.

> diff --git a/mailinfo.c b/mailinfo.c
> index a07d2da16d..737b9e5e13 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -58,12 +58,12 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
>  
>  static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  {
> -	int c;
>  	int take_next_literally = 0;
>  
>  	strbuf_addch(outbuf, '(');
>  
> -	while ((c = *in++) != 0) {
> +	while (*in) {
> +		int c = *in++;
>  		if (take_next_literally == 1) {
>  			take_next_literally = 0;
>  		} else {
> @@ -88,10 +88,10 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  
>  static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
>  {
> -	int c;
>  	int take_next_literally = 0;
>  
> -	while ((c = *in++) != 0) {
> +	while (*in) {
> +		int c = *in++;
>  		if (take_next_literally == 1) {
>  			take_next_literally = 0;
>  		} else {

I was wondering whether we want to convert `unquote_quoted_pair()` in
the same way. It's not subject to the same issue because it doesn't
return `in` to its callers. But it would lower the cognitive overhead a
bit by keeping the coding style consistent. But please feel free to
ignore this remark.

Another thing I was wondering about is the recursive nature of these
functions, and whether it can lead to similar issues like we recently
had when parsing revisions with stack exhaustion. I _think_ this should
not be much of a problem in this case though, as we're talking about
mail headers here. While the length of header values isn't restricted
per se (the line length is restricted to 1000, but with Comment Folding
Whitespace values can span multiple lines), but mail provdiers will sure
clamp down on mails with a "From:" header that is megabytes in size.

So taken together, this looks good to me.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: Propose a change in open for passing in the file type.
From: Đoàn Trần Công Danh @ 2023-12-13  4:17 UTC (permalink / raw)
  To: Haritha D; +Cc: git@vger.kernel.org
In-Reply-To: <E1D54D98-3836-41CA-84B5-32AEAF7642D8@ibm.com>

On 2023-12-12 14:46:04+0000, Haritha D <Harithamma.D@ibm.com> wrote:
> Hi Everyone,
>  
> Am working on porting git to z/OS. For reference, the pull request am working on https://github.com/git/git/pull/1537.
>  
> On z/OS there is a notion of file tag attributes. Files can be
> tagged as binary, ASCII, UTF8, EBCDIC, etc. z/OS uses these
> attributes to determine if auto-conversion is necessary. It was
> recommended in PR that we add logic directly to xopen . In order for
> me to do this in xopen , I have to pass an extra parameter to xopen
> that specifies the file type. 
>  
> Ex: 
> xopen(output_file, O_CREAT | O_WRONLY | O_TRUNC, 0666);
>  
> To :
> xopen(output_file, O_CREAT | O_WRONLY | O_TRUNC, 0666, BINARY);
>  
> BINARY: would be an enum value.

Would it work if you always open the file as BINARY? And let's all the
conversion done by git via some configs (core.encoding?)?

-- 
Danh

^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Junio C Hamano @ 2023-12-13  0:36 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Patrick Steinhardt, git, Taylor Blau, Phillip Wood
In-Reply-To: <ac84b1b9-2381-406a-b459-6728bf9f8704@ramsayjones.plus.com>

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

>> "via the refdb" -> "via the refs API" or something here and on the
>> title, and possibly elsewhere in the proposed log messages and
>> in-code comments in patches in this series, as I've never seen a
>> word "refdb" used in the context of this project.
>> 
>> I agree it is bad manners to be intimate with the implementation
>> details of the how files-backend stores HEAD and ORIG_HEAD.
>
> Hmm, I have never thought of the 'pseudo-refs' as being a part of
> the 'reference database' at all. ;)

Me neither, but once you start thinking about getting rid of the
need to use one-file-per-ref filesystem, being able to maintain all
refs, including the pseudo refs, in one r/w store backend, becomes a
very tempting goal.  From that point of view, I do not have problem
with the idea to move _all_ pseudorefs to reftable.

But I do have reservations on what Patrick, and the code he
inherited from Han-Wen, calls "special refs" (which is not defined
in the glossary at all), namely, refs.c:is_special_ref() and its
callers.  Neither am I very much sympathetic to the hardcoded list
of "known" pseudorefs, refs.c:pseudorefs[].  I cannot quite see why
we need anything more than

    any string that passes refs.c:is_pseudoref_syntax() is a
    pseudoref, is per worktree, and ref backends can store them like
    any other refs.  Many of them have specific meaning and uses
    (e.g. HEAD is "the current branch").

Enumerating existing pseudorefs in files backend may need to
opendir(".git") + readdir() filtered with is_pseudoref_syntax(),
and a corresponding implementation for reftable backend may be much
simpler (because there won't be "other cruft" stored there, unlike
files backend that needs to worry about files that are not refs,
like ".git/config" file.

> We seem to have pseudo-refs, special pseudo-refs and (recently)
> ex-pseudo-refs!
>
> This patch (well series) changes the 'status' of some, *but not all*,
> pseudo-refs; some graduate to full-blown refs stored as part of *a*
> reference database (ie reftable).

Yeah, that leaves bad taste in my mouth, too.

^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Ramsay Jones @ 2023-12-12 23:32 UTC (permalink / raw)
  To: Junio C Hamano, Patrick Steinhardt; +Cc: git, Taylor Blau, Phillip Wood
In-Reply-To: <xmqqle9zqidj.fsf@gitster.g>



On 12/12/2023 20:24, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
>> The current code only works by chance because we only have a single
>> reference backend implementation. Refactor it to instead read both refs
>> via the refdb layer so that we'll also be compatible with alternate
>> reference backends.
> 
> "via the refdb" -> "via the refs API" or something here and on the
> title, and possibly elsewhere in the proposed log messages and
> in-code comments in patches in this series, as I've never seen a
> word "refdb" used in the context of this project.
> 
> I agree it is bad manners to be intimate with the implementation
> details of the how files-backend stores HEAD and ORIG_HEAD.

Hmm, I have never thought of the 'pseudo-refs' as being a part of
the 'reference database' at all. ;)

We seem to have pseudo-refs, special pseudo-refs and (recently)
ex-pseudo-refs!

This patch (well series) changes the 'status' of some, *but not all*,
pseudo-refs; some graduate to full-blown refs stored as part of *a*
reference database (ie reftable).

As far as I recall, this has not been discussed on the ML. Why are
only some chosen to become 'full' refs and others not? This is not
discussed in any of the commit messages.

The '.invalid' HEAD hack featured in a recent completion patch as well.
[If this is because the JAVA implementation does it this way, I think
it needs some thought before including it in the Git project].

Anyway, I haven't had the time to study the details here, so please
ignore my uninformed ramblings!

ATB,
Ramsay Jones


^ permalink raw reply

* Re: [PATCH 1/1] attr: add builtin objectmode values support
From: Junio C Hamano @ 2023-12-12 23:17 UTC (permalink / raw)
  To: Joanna Wang; +Cc: git, sunshine, tboegi
In-Reply-To: <CAMmZTi-U_ufzoBLCDWKbrf=3GZzGszxnM1_Pu6ufBeoYjj7Gdw@mail.gmail.com>

Joanna Wang <jojwang@google.com> writes:

>> Cumulatively, aside from the removal of the t/#t* file, here is what
>> I ended up with so far.
>
> I want to double check if I should followup here.
> I assumed that you had already applied these final fixes on my behalf,
> similar to my patch for enabling attr for `git-add`. But if I was wrong,
> I'm happy to send another update with all the fixes.

I've squashed the fix in, so no need to resend only to patch up what
I pointed out earlier.

The end result fails t0003 under GIT_TEST_PASSING_SANITIZE_LEAK
though.  As the synthetic attribute values are allocated without
being in the hashmap based on the value read from .gitattributes
files, somebody needs to hold pointers to them *and* we need to
avoid allocating unbounded number of them.

The attached is one possible way to plug the leak; I am not sure if
it is the best one, though.  One thing I like about the solution is
that the approach makes sure that the mode attributes we would ever
return are very tightly controlled and does not allow a buggy code
to come up with "mode" to be passed to this new helper function to
pass random and unsupported mode bits without triggering the BUG().

 attr.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git c/attr.c w/attr.c
index b03c20f768..679e42258c 100644
--- c/attr.c
+++ w/attr.c
@@ -1250,10 +1250,34 @@ static struct object_id *default_attr_source(void)
 	return &attr_source;
 }
 
+static const char *interned_mode_string(unsigned int mode)
+{
+	static struct {
+		unsigned int val;
+		char str[7];
+	} mode_string[] = {
+		{ .val = 0040000 },
+		{ .val = 0100644 },
+		{ .val = 0100755 },
+		{ .val = 0120000 },
+		{ .val = 0160000 },
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mode_string); i++) {
+		if (mode_string[i].val != mode)
+			continue;
+		if (!*mode_string[i].str)
+			snprintf(mode_string[i].str, sizeof(mode_string[i].str),
+				 "%06o", mode);
+		return mode_string[i].str;
+	}
+	BUG("Unsupported mode 0%o", mode);
+}
+
 static const char *builtin_object_mode_attr(struct index_state *istate, const char *path)
 {
 	unsigned int mode;
-	struct strbuf sb = STRBUF_INIT;
 
 	if (direction == GIT_ATTR_CHECKIN) {
 		struct object_id oid;
@@ -1287,8 +1311,8 @@ static const char *builtin_object_mode_attr(struct index_state *istate, const ch
 		else
 			return ATTR__UNSET;
 	}
-	strbuf_addf(&sb, "%06o", mode);
-	return strbuf_detach(&sb, NULL);
+
+	return interned_mode_string(mode);
 }
 
 


^ permalink raw reply related

* Re: Test breakage with zlib-ng
From: René Scharfe @ 2023-12-12 22:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Ondrej Pohorelsky, git, brian m . carlson, Junio C Hamano
In-Reply-To: <20231212200153.GB1127366@coredump.intra.peff.net>

Am 12.12.23 um 21:01 schrieb Jeff King:
> On Tue, Dec 12, 2023 at 06:04:55PM +0100, René Scharfe wrote:
>
>> Subject: [PATCH] t6300: avoid hard-coding object sizes
>>
>> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
>> hard-coded the expected object sizes.  Coincidentally the size of commit
>> and tag is the same with zlib at the default compression level.
>>
>> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
>> encoded the sizes as a single value, which coincidentally also works
>> with sha256.
>>
>> Different compression libraries like zlib-ng may arrive at different
>> values.  Get them from the file system instead of hard-coding them to
>> make switching the compression library (or changing the compression
>> level) easier.
>
> Yeah, this is definitely the right solution here. I'm surprised the
> hard-coded values didn't cause problems before now. ;)
>
> The patch looks good to me, but a few small comments:
>
>> +test_object_file_size () {
>> +	oid=$(git rev-parse "$1")
>> +	path=".git/objects/$(test_oid_to_path $oid)"
>> +	test_file_size "$path"
>> +}
>
> Here we're assuming the objects are loose. I think that's probably OK
> (and certainly the test will notice if that changes).
>
> We're covering the formatting code paths along with the underlying
> implementation that fills in object_info->disk_sizep for loose objects.
> Which I think is plenty for this particular script, which is about
> for-each-ref.
>
> It would be nice to have coverage of the packed_object_info() code path,
> though. Back when it was added in a4ac106178 (cat-file: add
> %(objectsize:disk) format atom, 2013-07-10), I cowardly punted on this,
> writing:
>
>   This patch does not include any tests, as the exact numbers
>   returned are volatile and subject to zlib and packing
>   decisions. We cannot even reliably guarantee that the
>   on-disk size is smaller than the object content (though in
>   general this should be the case for non-trivial objects).
>
> I don't think it's that big a deal, but I guess we could do something
> like:
>
>   prev=
>   git show-index <$pack_idx |
>   sort -n |
>   grep -A1 $oid |
>   while read ofs oid csum
>   do
>     test -n "$prev" && echo "$((ofs - prev))"
>     prev=$ofs
>   done
>
> It feels a little redundant with what Git is doing under the hood, but
> at least is exercising the code (and we're using the idx directly, so
> we're confirming that the revindex is right).

A generic object size function based on both methods could live in the
test lib and be used for e.g. cat-file tests as well.  Getting such a
function polished and library-worthy is probably more work than I
naively imagine, however -- due to our shunning of pipes alone.

> Anyway, that is all way beyond the scope of your patch, but I wonder if
> it's worth doing on top.
>
>> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
>>  test_atom head push:strip=-1 main
>>  test_atom head objecttype commit
>>  test_atom head objectsize $((131 + hexlen))
>> -test_atom head objectsize:disk $disklen
>> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
>
> These test_object_file_size calls are happening outside of any
> test_expect_* block, so we'd miss failing exit codes (and also the
> helper is not &&-chained), and any stderr would leak to the output.
> That's probably OK in practice, though (if something goes wrong then the
> expected value output will be bogus and the test itself will fail).

Right.  We could also set variables during setup, though, to put
readers' minds at rest.

René

^ permalink raw reply

* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: René Scharfe @ 2023-12-12 22:30 UTC (permalink / raw)
  To: Jeff King, AtariDreams via GitGitGadget; +Cc: git, AtariDreams, Seija Kijin
In-Reply-To: <20231212200920.GC1127366@coredump.intra.peff.net>

Am 12.12.23 um 21:09 schrieb Jeff King:
> On Tue, Dec 12, 2023 at 05:17:47PM +0000, AtariDreams via GitGitGadget wrote:
>
>> diff --git a/diff.c b/diff.c
>> index 2c602df10a3..91842b54753 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -1191,7 +1191,7 @@ static void mark_color_as_moved(struct diff_options *o,
>>  							    &pmb_nr);
>>
>>  			if (contiguous && pmb_nr && moved_symbol == l->s)
>> -				flipped_block = (flipped_block + 1) % 2;
>> +				flipped_block ^= 1;
>>  			else
>>  				flipped_block = 0;
>
> This one I do not see any problem with changing, though I think it is a
> matter of opinion on which is more readable (I actually tend to think of
> "x = 0 - x" as idiomatic for flipping).

Did you mean "x = 1 - x"?

    x 0 - x 1 - x
   -- ----- -----
   -1    +1    +2
    0     0    +1
   +1    -1     0

I don't particular like this; it repeats x and seems error-prone. ;-)

I agree with your assessment of the other three cases in the patch.

Can we salvage something from this bikeshedding exercise?  I wonder if
it's time to use the C99 type _Bool in our code.  It would allow
documenting that only two possible values exist in cases like the one
above.  That would be even more useful for function returns, I assume.

René


^ permalink raw reply

* Re: Test breakage with zlib-ng
From: Junio C Hamano @ 2023-12-12 22:30 UTC (permalink / raw)
  To: René Scharfe; +Cc: Ondrej Pohorelsky, git, brian m . carlson
In-Reply-To: <9feeb6cf-aabf-4002-917f-3f6c27547bc8@web.de>

René Scharfe <l.s.r@web.de> writes:

> Or we could get the sizes of the objects by checking their files,
> which would not require  hard-coding anymore.  Patch below.

That was my first reaction to seeing the original report.  It is a
bit surprising that the necessary fix is so small and makes me wonder
why we initially did the hardcoded values, which feels more work to
initially write the test vector.

Looking good.  Thanks.

> --- >8 ---
> Subject: [PATCH] t6300: avoid hard-coding object sizes
>
> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
> hard-coded the expected object sizes.  Coincidentally the size of commit
> and tag is the same with zlib at the default compression level.
>
> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
> encoded the sizes as a single value, which coincidentally also works
> with sha256.
>
> Different compression libraries like zlib-ng may arrive at different
> values.  Get them from the file system instead of hard-coding them to
> make switching the compression library (or changing the compression
> level) easier.
>
> Reported-by: Ondrej Pohorelsky <opohorel@redhat.com>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  t/t6300-for-each-ref.sh | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 54e2281259..843a7fe143 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -20,12 +20,13 @@ setdate_and_increment () {
>      export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
>  }
>
> -test_expect_success setup '
> -	test_oid_cache <<-EOF &&
> -	disklen sha1:138
> -	disklen sha256:154
> -	EOF
> +test_object_file_size () {
> +	oid=$(git rev-parse "$1")
> +	path=".git/objects/$(test_oid_to_path $oid)"
> +	test_file_size "$path"
> +}
>
> +test_expect_success setup '
>  	# setup .mailmap
>  	cat >.mailmap <<-EOF &&
>  	A Thor <athor@example.com> A U Thor <author@example.com>
> @@ -94,7 +95,6 @@ test_atom () {
>  }
>
>  hexlen=$(test_oid hexsz)
> -disklen=$(test_oid disklen)
>
>  test_atom head refname refs/heads/main
>  test_atom head refname: refs/heads/main
> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
>  test_atom head push:strip=-1 main
>  test_atom head objecttype commit
>  test_atom head objectsize $((131 + hexlen))
> -test_atom head objectsize:disk $disklen
> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
>  test_atom head deltabase $ZERO_OID
>  test_atom head objectname $(git rev-parse refs/heads/main)
>  test_atom head objectname:short $(git rev-parse --short refs/heads/main)
> @@ -203,8 +203,8 @@ test_atom tag upstream ''
>  test_atom tag push ''
>  test_atom tag objecttype tag
>  test_atom tag objectsize $((114 + hexlen))
> -test_atom tag objectsize:disk $disklen
> -test_atom tag '*objectsize:disk' $disklen
> +test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag)
> +test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main)
>  test_atom tag deltabase $ZERO_OID
>  test_atom tag '*deltabase' $ZERO_OID
>  test_atom tag objectname $(git rev-parse refs/tags/testtag)
> --
> 2.43.0

^ permalink raw reply

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
From: Junio C Hamano @ 2023-12-12 22:28 UTC (permalink / raw)
  To: Christian Couder
  Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git
In-Reply-To: <CAP8UFD19phFz54d8fDM=MBRMSD9Rz4R0_463KgptN8eeFs7MnQ@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> writes:

>> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
>> > Merge made by the 'ort' strategy.
>> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
>>
>> Looking into this some it looks like it could be a bash config
>> difference? My machine always runs it all the way through vs
>> failing for recursion depth. Although that would also be an issue
>> which is solved by this fix.
>
> I use Ubuntu where /bin/sh is dash so my current guess is that dash
> might have a smaller recursion limit than bash.

That sounds quite bad.  Does it have to be recursive (iow, if we can
rewrite the logic to be iterative instead, that would be a much better
way to fix the issue)?

^ permalink raw reply

* Re: Test breakage with zlib-ng
From: brian m. carlson @ 2023-12-12 22:18 UTC (permalink / raw)
  To: René Scharfe; +Cc: Ondrej Pohorelsky, git, Junio C Hamano
In-Reply-To: <9feeb6cf-aabf-4002-917f-3f6c27547bc8@web.de>

[-- Attachment #1: Type: text/plain, Size: 984 bytes --]

On 2023-12-12 at 17:04:55, René Scharfe wrote:
> --- >8 ---
> Subject: [PATCH] t6300: avoid hard-coding object sizes
> 
> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
> hard-coded the expected object sizes.  Coincidentally the size of commit
> and tag is the same with zlib at the default compression level.
> 
> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
> encoded the sizes as a single value, which coincidentally also works
> with sha256.
> 
> Different compression libraries like zlib-ng may arrive at different
> values.  Get them from the file system instead of hard-coding them to
> make switching the compression library (or changing the compression
> level) easier.

This definitely seems like the right thing to do.  I was a bit lazy at
the time and probably could have improved it then, but it's at least
good that we're doing it now.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Jeff King @ 2023-12-12 22:12 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Carlos Andrés Ramírez Cataño

When processing a header like a "From" line, mailinfo uses
unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized
comments. It takes a NUL-terminated string on input, and loops over the
"in" pointer until it sees the NUL. When it finds the start of an
interesting block, it delegates to helper functions which also increment
"in", and return the updated pointer.

But there's a bug here: the helpers find the NUL with a post-increment
in the loop condition, like:

   while ((c = *in++) != 0)

So when they do see a NUL (rather than the correct termination of the
quote or comment section), they return "in" as one _past_ the NUL
terminator. And thus the outer loop in unquote_quoted_pair() does not
realize we hit the NUL, and keeps reading past the end of the buffer.

We should instead make sure to return "in" positioned at the NUL, so
that the caller knows to stop their loop, too. A hacky way to do this is
to return "in - 1" after leaving the inner loop. But a slightly cleaner
solution is to avoid incrementing "in" until we are sure it contained a
non-NUL byte (i.e., doing it inside the loop body).

The two tests here show off the problem. Since we check the output,
they'll _usually_ report a failure in a normal build, but it depends on
what garbage bytes are found after the heap buffer. Building with
SANITIZE=address reliably notices the problem. The outcome (both the
exit code and the exact bytes) are just what Git happens to produce for
these cases today, and shouldn't be taken as an endorsement. It might be
reasonable to abort on an unterminated string, for example. The priority
for this patch is fixing the out-of-bounds memory access.

Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
This was reported to the security list, but because it's just an
out-of-bounds read, we won't do a coordinated disclosure.

 mailinfo.c          |  8 ++++----
 t/t5100-mailinfo.sh | 22 ++++++++++++++++++++++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index a07d2da16d..737b9e5e13 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -58,12 +58,12 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 
 static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 {
-	int c;
 	int take_next_literally = 0;
 
 	strbuf_addch(outbuf, '(');
 
-	while ((c = *in++) != 0) {
+	while (*in) {
+		int c = *in++;
 		if (take_next_literally == 1) {
 			take_next_literally = 0;
 		} else {
@@ -88,10 +88,10 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 
 static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
 {
-	int c;
 	int take_next_literally = 0;
 
-	while ((c = *in++) != 0) {
+	while (*in) {
+		int c = *in++;
 		if (take_next_literally == 1) {
 			take_next_literally = 0;
 		} else {
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index db11cababd..654d8cf3ee 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -268,4 +268,26 @@ test_expect_success 'mailinfo warn CR in base64 encoded email' '
 	test_must_be_empty quoted-cr/0002.err
 '
 
+test_expect_success 'from line with unterminated quoted string' '
+	echo "From: bob \"unterminated string smith <bob@example.com>" >in &&
+	git mailinfo /dev/null /dev/null <in >actual &&
+	cat >expect <<-\EOF &&
+	Author: bob unterminated string smith
+	Email: bob@example.com
+
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'from line with unterminated comment' '
+	echo "From: bob (unterminated comment smith <bob@example.com>" >in &&
+	git mailinfo /dev/null /dev/null <in >actual &&
+	cat >expect <<-\EOF &&
+	Author: bob (unterminated comment smith
+	Email: bob@example.com
+
+	EOF
+	test_cmp expect actual
+'
+
 test_done
-- 
2.43.0.310.g85bf1f633d

^ permalink raw reply related

* Re: [PATCH] revision: parse integer arguments to --max-count, --skip, etc., more carefully
From: Jeff King @ 2023-12-12 22:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Britton Kerin
In-Reply-To: <xmqqjzpjsbjl.fsf@gitster.g>

On Tue, Dec 12, 2023 at 07:09:02AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This all looks pretty reasonable to me.
> >
> > I couldn't help but think, though, that surely we have some helpers for
> > this already? But the closest seems to be git_parse_int(), which also
> > allows unit factors. I'm not sure if allowing "-n 1k" would be a feature
> > or a bug. ;)
> 
> The change in question is meant to be a pure fix to replace a careless
> use of atoi().  I do not mind to see a separate patch to add such a
> feature later on top.

Oh, I mostly meant that I would have turned to git_parse_int() as that
already-existing helper, but it is not suitable because of the extra
unit-handling. I think your patch draws the line in the right place.

> > I wonder if there are more spots that could benefit.
> 
> "git grep -e 'atoi('" would give somebody motivated a decent set of
> #microproject ideas, but many hits are not suited for strtol_i(),
> which is designed to parse an integer at the end of a string.  Some
> places use atoi() immediately followed by strspn() to skip over
> digits, which means they are parsing an integer and want to continue
> reading after the integer, which is incompatible with what
> strtol_i() wants to do.  They need either a separate helper or an
> updated strtol_i() that optionally allows you to parse the prefix
> and report where the integer ended, e.g., something like:

Yeah, I agree this might be a good microproject (or leftoverbits) area,
and the semantics for the helper you propose make sense to me.

-Peff

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox