Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH v2 2/4] refs: propagate errno when reading special refs fails
From: Patrick Steinhardt @ 2023-12-13  7:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, Phillip Wood
In-Reply-To: <xmqqfs07qi66.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

On Tue, Dec 12, 2023 at 12:28:49PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Some refs in Git are more special than others due to reasons explained
> > in the next commit. These refs are read via `refs_read_special_head()`,
> > but this function doesn't behave the same as when we try to read a
> > normal ref. Most importantly, we do not propagate `failure_errno` in the
> > case where the reference does not exist, which is behaviour that we rely
> > on in many parts of Git.
> >
> > Fix this bug by propagating errno when `strbuf_read_file()` fails.
> 
> Hmph, I thought, judging from what [1/4] did, that your plan was to
> use the refs API, instead of peeking directly into the filesystem,
> to access these pseudo refs, and am a bit puzzled if it makes all
> that much difference to fix errno handling while still reading
> directly from the filesystem.  Perhaps such a conversion happens in
> later steps of this series (or a follow on series after this series
> lands)?

Yes, that's ultimately the goal. The patch series here is only setting
the stage by recording what we have, and addressing cases where we are
inconsistently accessing refs via the filesystem _and_ via the ref API.
Once this lands I do want create a follow up patch series that converts
all refs to be non-special to the extent possible.

I say "to the extent possible" though because in the end there will be
two refs that we must continue to treat specially: FETCH_HEAD and
MERGE_HEAD, which we were treating special before this patch series
already. Both of these are not normal refs because they may contain
multiple values with annotations, so they will need to stay special.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] tests: prefer host Git to verify chainlint self-checks
From: Patrick Steinhardt @ 2023-12-13  7:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Eric Sunshine
In-Reply-To: <xmqq8r5zrzg1.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 2052 bytes --]

On Tue, Dec 12, 2023 at 11:30:22AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > To accomodate for cases where the host system has no Git installation we
> > use the locally-compiled version of Git. This can result in problems
> > though when the Git project's repository is using extensions that the
> > locally-compiled version of Git doesn't understand. It will refuse to
> > run and thus cause the checks to fail.
> >
> > Fix this issue by prefering the host's Git resolved via PATH. If it
> > doesn't exist, then we fall back to the locally-compiled Git version and
> > diff as before.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >
> > I've started to dogfood the reftable backend on my local machine and
> > have converted many repositories to use the reftable backend. This
> > surfaced the described issue because the repository now sets up the
> > "extensions.refStorage" extension, and thus "check-chainlint" fails
> > depending on which versions of Git I'm trying to compile and test.
> 
> I do not think "prefer host Git" is necessarily a good idea; falling
> back to use host Git is perfectly fine, of course.

Why is that, though? We already use host Git in other parts of our build
infra, and the options we pass to git-diff(1) have been around for ages:

  - `--no-pager` was introduced in 463a849d00 (Add and document a global
    --no-pager option for git., 2007-08-19).

  - `--no-index` is around since at least fcfa33ec90 (diff: make more
    cases implicit --no-index, 2007-02-25).

  - `-w` is around since 0d21efa51c (Teach diff about -b and -w flags,
    2006-06-14).

So all options have pretty much been there since forever. Which means
that if the host has Git around, and the Git version is at least v1.5.3,
then it also understands all options.

Furthermore, we are accessing the Git repository that the user has set
up with host Git in the first place, so I'd think even conceptually it
is the correct thing to do.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Patrick Steinhardt @ 2023-12-13  7:07 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño
In-Reply-To: <20231212221243.GA1656116@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 3736 bytes --]

On Tue, Dec 12, 2023 at 05:12:43PM -0500, Jeff King wrote:
> When processing a header like a "From" line, mailinfo uses
> unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized
> comments. It takes a NUL-terminated string on input, and loops over the
> "in" pointer until it sees the NUL. When it finds the start of an
> interesting block, it delegates to helper functions which also increment
> "in", and return the updated pointer.
> 
> But there's a bug here: the helpers find the NUL with a post-increment
> in the loop condition, like:
> 
>    while ((c = *in++) != 0)
> 
> So when they do see a NUL (rather than the correct termination of the
> quote or comment section), they return "in" as one _past_ the NUL
> terminator. And thus the outer loop in unquote_quoted_pair() does not
> realize we hit the NUL, and keeps reading past the end of the buffer.
> 
> We should instead make sure to return "in" positioned at the NUL, so
> that the caller knows to stop their loop, too. A hacky way to do this is
> to return "in - 1" after leaving the inner loop. But a slightly cleaner
> solution is to avoid incrementing "in" until we are sure it contained a
> non-NUL byte (i.e., doing it inside the loop body).
> 
> The two tests here show off the problem. Since we check the output,
> they'll _usually_ report a failure in a normal build, but it depends on
> what garbage bytes are found after the heap buffer. Building with
> SANITIZE=address reliably notices the problem. The outcome (both the
> exit code and the exact bytes) are just what Git happens to produce for
> these cases today, and shouldn't be taken as an endorsement. It might be
> reasonable to abort on an unterminated string, for example. The priority
> for this patch is fixing the out-of-bounds memory access.

Makes sense.

> diff --git a/mailinfo.c b/mailinfo.c
> index a07d2da16d..737b9e5e13 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -58,12 +58,12 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
>  
>  static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  {
> -	int c;
>  	int take_next_literally = 0;
>  
>  	strbuf_addch(outbuf, '(');
>  
> -	while ((c = *in++) != 0) {
> +	while (*in) {
> +		int c = *in++;
>  		if (take_next_literally == 1) {
>  			take_next_literally = 0;
>  		} else {
> @@ -88,10 +88,10 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  
>  static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
>  {
> -	int c;
>  	int take_next_literally = 0;
>  
> -	while ((c = *in++) != 0) {
> +	while (*in) {
> +		int c = *in++;
>  		if (take_next_literally == 1) {
>  			take_next_literally = 0;
>  		} else {

I was wondering whether we want to convert `unquote_quoted_pair()` in
the same way. It's not subject to the same issue because it doesn't
return `in` to its callers. But it would lower the cognitive overhead a
bit by keeping the coding style consistent. But please feel free to
ignore this remark.

Another thing I was wondering about is the recursive nature of these
functions, and whether it can lead to similar issues like we recently
had when parsing revisions with stack exhaustion. I _think_ this should
not be much of a problem in this case though, as we're talking about
mail headers here. While the length of header values isn't restricted
per se (the line length is restricted to 1000, but with Comment Folding
Whitespace values can span multiple lines), but mail provdiers will sure
clamp down on mails with a "From:" header that is megabytes in size.

So taken together, this looks good to me.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: Propose a change in open for passing in the file type.
From: Đoàn Trần Công Danh @ 2023-12-13  4:17 UTC (permalink / raw)
  To: Haritha D; +Cc: git@vger.kernel.org
In-Reply-To: <E1D54D98-3836-41CA-84B5-32AEAF7642D8@ibm.com>

On 2023-12-12 14:46:04+0000, Haritha D <Harithamma.D@ibm.com> wrote:
> Hi Everyone,
>  
> Am working on porting git to z/OS. For reference, the pull request am working on https://github.com/git/git/pull/1537.
>  
> On z/OS there is a notion of file tag attributes. Files can be
> tagged as binary, ASCII, UTF8, EBCDIC, etc. z/OS uses these
> attributes to determine if auto-conversion is necessary. It was
> recommended in PR that we add logic directly to xopen . In order for
> me to do this in xopen , I have to pass an extra parameter to xopen
> that specifies the file type. 
>  
> Ex: 
> xopen(output_file, O_CREAT | O_WRONLY | O_TRUNC, 0666);
>  
> To :
> xopen(output_file, O_CREAT | O_WRONLY | O_TRUNC, 0666, BINARY);
>  
> BINARY: would be an enum value.

Would it work if you always open the file as BINARY? And let's all the
conversion done by git via some configs (core.encoding?)?

-- 
Danh

^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Junio C Hamano @ 2023-12-13  0:36 UTC (permalink / raw)
  To: Ramsay Jones; +Cc: Patrick Steinhardt, git, Taylor Blau, Phillip Wood
In-Reply-To: <ac84b1b9-2381-406a-b459-6728bf9f8704@ramsayjones.plus.com>

Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

>> "via the refdb" -> "via the refs API" or something here and on the
>> title, and possibly elsewhere in the proposed log messages and
>> in-code comments in patches in this series, as I've never seen a
>> word "refdb" used in the context of this project.
>> 
>> I agree it is bad manners to be intimate with the implementation
>> details of the how files-backend stores HEAD and ORIG_HEAD.
>
> Hmm, I have never thought of the 'pseudo-refs' as being a part of
> the 'reference database' at all. ;)

Me neither, but once you start thinking about getting rid of the
need to use one-file-per-ref filesystem, being able to maintain all
refs, including the pseudo refs, in one r/w store backend, becomes a
very tempting goal.  From that point of view, I do not have problem
with the idea to move _all_ pseudorefs to reftable.

But I do have reservations on what Patrick, and the code he
inherited from Han-Wen, calls "special refs" (which is not defined
in the glossary at all), namely, refs.c:is_special_ref() and its
callers.  Neither am I very much sympathetic to the hardcoded list
of "known" pseudorefs, refs.c:pseudorefs[].  I cannot quite see why
we need anything more than

    any string that passes refs.c:is_pseudoref_syntax() is a
    pseudoref, is per worktree, and ref backends can store them like
    any other refs.  Many of them have specific meaning and uses
    (e.g. HEAD is "the current branch").

Enumerating existing pseudorefs in files backend may need to
opendir(".git") + readdir() filtered with is_pseudoref_syntax(),
and a corresponding implementation for reftable backend may be much
simpler (because there won't be "other cruft" stored there, unlike
files backend that needs to worry about files that are not refs,
like ".git/config" file.

> We seem to have pseudo-refs, special pseudo-refs and (recently)
> ex-pseudo-refs!
>
> This patch (well series) changes the 'status' of some, *but not all*,
> pseudo-refs; some graduate to full-blown refs stored as part of *a*
> reference database (ie reftable).

Yeah, that leaves bad taste in my mouth, too.

^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Ramsay Jones @ 2023-12-12 23:32 UTC (permalink / raw)
  To: Junio C Hamano, Patrick Steinhardt; +Cc: git, Taylor Blau, Phillip Wood
In-Reply-To: <xmqqle9zqidj.fsf@gitster.g>

On 12/12/2023 20:24, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
>> The current code only works by chance because we only have a single
>> reference backend implementation. Refactor it to instead read both refs
>> via the refdb layer so that we'll also be compatible with alternate
>> reference backends.
> 
> "via the refdb" -> "via the refs API" or something here and on the
> title, and possibly elsewhere in the proposed log messages and
> in-code comments in patches in this series, as I've never seen a
> word "refdb" used in the context of this project.
> 
> I agree it is bad manners to be intimate with the implementation
> details of the how files-backend stores HEAD and ORIG_HEAD.

Hmm, I have never thought of the 'pseudo-refs' as being a part of
the 'reference database' at all. ;)

We seem to have pseudo-refs, special pseudo-refs and (recently)
ex-pseudo-refs!

This patch (well series) changes the 'status' of some, *but not all*,
pseudo-refs; some graduate to full-blown refs stored as part of *a*
reference database (ie reftable).

As far as I recall, this has not been discussed on the ML. Why are
only some chosen to become 'full' refs and others not? This is not
discussed in any of the commit messages.

The '.invalid' HEAD hack featured in a recent completion patch as well.
[If this is because the JAVA implementation does it this way, I think
it needs some thought before including it in the Git project].

Anyway, I haven't had the time to study the details here, so please
ignore my uninformed ramblings!

ATB,
Ramsay Jones

^ permalink raw reply

* Re: [PATCH 1/1] attr: add builtin objectmode values support
From: Junio C Hamano @ 2023-12-12 23:17 UTC (permalink / raw)
  To: Joanna Wang; +Cc: git, sunshine, tboegi
In-Reply-To: <CAMmZTi-U_ufzoBLCDWKbrf=3GZzGszxnM1_Pu6ufBeoYjj7Gdw@mail.gmail.com>

Joanna Wang <jojwang@google.com> writes:

>> Cumulatively, aside from the removal of the t/#t* file, here is what
>> I ended up with so far.
>
> I want to double check if I should followup here.
> I assumed that you had already applied these final fixes on my behalf,
> similar to my patch for enabling attr for `git-add`. But if I was wrong,
> I'm happy to send another update with all the fixes.

I've squashed the fix in, so no need to resend only to patch up what
I pointed out earlier.

The end result fails t0003 under GIT_TEST_PASSING_SANITIZE_LEAK
though.  As the synthetic attribute values are allocated without
being in the hashmap based on the value read from .gitattributes
files, somebody needs to hold pointers to them *and* we need to
avoid allocating unbounded number of them.

The attached is one possible way to plug the leak; I am not sure if
it is the best one, though.  One thing I like about the solution is
that the approach makes sure that the mode attributes we would ever
return are very tightly controlled and does not allow a buggy code
to come up with "mode" to be passed to this new helper function to
pass random and unsupported mode bits without triggering the BUG().

 attr.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git c/attr.c w/attr.c
index b03c20f768..679e42258c 100644
--- c/attr.c
+++ w/attr.c
@@ -1250,10 +1250,34 @@ static struct object_id *default_attr_source(void)
 	return &attr_source;
 }
 
+static const char *interned_mode_string(unsigned int mode)
+{
+	static struct {
+		unsigned int val;
+		char str[7];
+	} mode_string[] = {
+		{ .val = 0040000 },
+		{ .val = 0100644 },
+		{ .val = 0100755 },
+		{ .val = 0120000 },
+		{ .val = 0160000 },
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mode_string); i++) {
+		if (mode_string[i].val != mode)
+			continue;
+		if (!*mode_string[i].str)
+			snprintf(mode_string[i].str, sizeof(mode_string[i].str),
+				 "%06o", mode);
+		return mode_string[i].str;
+	}
+	BUG("Unsupported mode 0%o", mode);
+}
+
 static const char *builtin_object_mode_attr(struct index_state *istate, const char *path)
 {
 	unsigned int mode;
-	struct strbuf sb = STRBUF_INIT;
 
 	if (direction == GIT_ATTR_CHECKIN) {
 		struct object_id oid;
@@ -1287,8 +1311,8 @@ static const char *builtin_object_mode_attr(struct index_state *istate, const ch
 		else
 			return ATTR__UNSET;
 	}
-	strbuf_addf(&sb, "%06o", mode);
-	return strbuf_detach(&sb, NULL);
+
+	return interned_mode_string(mode);
 }
 
 


^ permalink raw reply related

* Re: Test breakage with zlib-ng
From: René Scharfe @ 2023-12-12 22:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Ondrej Pohorelsky, git, brian m . carlson, Junio C Hamano
In-Reply-To: <20231212200153.GB1127366@coredump.intra.peff.net>

Am 12.12.23 um 21:01 schrieb Jeff King:
> On Tue, Dec 12, 2023 at 06:04:55PM +0100, René Scharfe wrote:
>
>> Subject: [PATCH] t6300: avoid hard-coding object sizes
>>
>> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
>> hard-coded the expected object sizes.  Coincidentally the size of commit
>> and tag is the same with zlib at the default compression level.
>>
>> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
>> encoded the sizes as a single value, which coincidentally also works
>> with sha256.
>>
>> Different compression libraries like zlib-ng may arrive at different
>> values.  Get them from the file system instead of hard-coding them to
>> make switching the compression library (or changing the compression
>> level) easier.
>
> Yeah, this is definitely the right solution here. I'm surprised the
> hard-coded values didn't cause problems before now. ;)
>
> The patch looks good to me, but a few small comments:
>
>> +test_object_file_size () {
>> +	oid=$(git rev-parse "$1")
>> +	path=".git/objects/$(test_oid_to_path $oid)"
>> +	test_file_size "$path"
>> +}
>
> Here we're assuming the objects are loose. I think that's probably OK
> (and certainly the test will notice if that changes).
>
> We're covering the formatting code paths along with the underlying
> implementation that fills in object_info->disk_sizep for loose objects.
> Which I think is plenty for this particular script, which is about
> for-each-ref.
>
> It would be nice to have coverage of the packed_object_info() code path,
> though. Back when it was added in a4ac106178 (cat-file: add
> %(objectsize:disk) format atom, 2013-07-10), I cowardly punted on this,
> writing:
>
>   This patch does not include any tests, as the exact numbers
>   returned are volatile and subject to zlib and packing
>   decisions. We cannot even reliably guarantee that the
>   on-disk size is smaller than the object content (though in
>   general this should be the case for non-trivial objects).
>
> I don't think it's that big a deal, but I guess we could do something
> like:
>
>   prev=
>   git show-index <$pack_idx |
>   sort -n |
>   grep -A1 $oid |
>   while read ofs oid csum
>   do
>     test -n "$prev" && echo "$((ofs - prev))"
>     prev=$ofs
>   done
>
> It feels a little redundant with what Git is doing under the hood, but
> at least is exercising the code (and we're using the idx directly, so
> we're confirming that the revindex is right).

A generic object size function based on both methods could live in the
test lib and be used for e.g. cat-file tests as well.  Getting such a
function polished and library-worthy is probably more work than I
naively imagine, however -- due to our shunning of pipes alone.

> Anyway, that is all way beyond the scope of your patch, but I wonder if
> it's worth doing on top.
>
>> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
>>  test_atom head push:strip=-1 main
>>  test_atom head objecttype commit
>>  test_atom head objectsize $((131 + hexlen))
>> -test_atom head objectsize:disk $disklen
>> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
>
> These test_object_file_size calls are happening outside of any
> test_expect_* block, so we'd miss failing exit codes (and also the
> helper is not &&-chained), and any stderr would leak to the output.
> That's probably OK in practice, though (if something goes wrong then the
> expected value output will be bogus and the test itself will fail).

Right.  We could also set variables during setup, though, to put
readers' minds at rest.

René

^ permalink raw reply

* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: René Scharfe @ 2023-12-12 22:30 UTC (permalink / raw)
  To: Jeff King, AtariDreams via GitGitGadget; +Cc: git, AtariDreams, Seija Kijin
In-Reply-To: <20231212200920.GC1127366@coredump.intra.peff.net>

Am 12.12.23 um 21:09 schrieb Jeff King:
> On Tue, Dec 12, 2023 at 05:17:47PM +0000, AtariDreams via GitGitGadget wrote:
>
>> diff --git a/diff.c b/diff.c
>> index 2c602df10a3..91842b54753 100644
>> --- a/diff.c
>> +++ b/diff.c
>> @@ -1191,7 +1191,7 @@ static void mark_color_as_moved(struct diff_options *o,
>>  							    &pmb_nr);
>>
>>  			if (contiguous && pmb_nr && moved_symbol == l->s)
>> -				flipped_block = (flipped_block + 1) % 2;
>> +				flipped_block ^= 1;
>>  			else
>>  				flipped_block = 0;
>
> This one I do not see any problem with changing, though I think it is a
> matter of opinion on which is more readable (I actually tend to think of
> "x = 0 - x" as idiomatic for flipping).

Did you mean "x = 1 - x"?

    x 0 - x 1 - x
   -- ----- -----
   -1    +1    +2
    0     0    +1
   +1    -1     0

I don't particular like this; it repeats x and seems error-prone. ;-)

I agree with your assessment of the other three cases in the patch.

Can we salvage something from this bikeshedding exercise?  I wonder if
it's time to use the C99 type _Bool in our code.  It would allow
documenting that only two possible values exist in cases like the one
above.  That would be even more useful for function returns, I assume.

René


^ permalink raw reply

* Re: Test breakage with zlib-ng
From: Junio C Hamano @ 2023-12-12 22:30 UTC (permalink / raw)
  To: René Scharfe; +Cc: Ondrej Pohorelsky, git, brian m . carlson
In-Reply-To: <9feeb6cf-aabf-4002-917f-3f6c27547bc8@web.de>

René Scharfe <l.s.r@web.de> writes:

> Or we could get the sizes of the objects by checking their files,
> which would not require  hard-coding anymore.  Patch below.

That was my first reaction to seeing the original report.  It is a
bit surprising that the necessary fix is so small and makes me wonder
why we initially did the hardcoded values, which feels more work to
initially write the test vector.

Looking good.  Thanks.

> --- >8 ---
> Subject: [PATCH] t6300: avoid hard-coding object sizes
>
> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
> hard-coded the expected object sizes.  Coincidentally the size of commit
> and tag is the same with zlib at the default compression level.
>
> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
> encoded the sizes as a single value, which coincidentally also works
> with sha256.
>
> Different compression libraries like zlib-ng may arrive at different
> values.  Get them from the file system instead of hard-coding them to
> make switching the compression library (or changing the compression
> level) easier.
>
> Reported-by: Ondrej Pohorelsky <opohorel@redhat.com>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  t/t6300-for-each-ref.sh | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 54e2281259..843a7fe143 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -20,12 +20,13 @@ setdate_and_increment () {
>      export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
>  }
>
> -test_expect_success setup '
> -	test_oid_cache <<-EOF &&
> -	disklen sha1:138
> -	disklen sha256:154
> -	EOF
> +test_object_file_size () {
> +	oid=$(git rev-parse "$1")
> +	path=".git/objects/$(test_oid_to_path $oid)"
> +	test_file_size "$path"
> +}
>
> +test_expect_success setup '
>  	# setup .mailmap
>  	cat >.mailmap <<-EOF &&
>  	A Thor <athor@example.com> A U Thor <author@example.com>
> @@ -94,7 +95,6 @@ test_atom () {
>  }
>
>  hexlen=$(test_oid hexsz)
> -disklen=$(test_oid disklen)
>
>  test_atom head refname refs/heads/main
>  test_atom head refname: refs/heads/main
> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
>  test_atom head push:strip=-1 main
>  test_atom head objecttype commit
>  test_atom head objectsize $((131 + hexlen))
> -test_atom head objectsize:disk $disklen
> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
>  test_atom head deltabase $ZERO_OID
>  test_atom head objectname $(git rev-parse refs/heads/main)
>  test_atom head objectname:short $(git rev-parse --short refs/heads/main)
> @@ -203,8 +203,8 @@ test_atom tag upstream ''
>  test_atom tag push ''
>  test_atom tag objecttype tag
>  test_atom tag objectsize $((114 + hexlen))
> -test_atom tag objectsize:disk $disklen
> -test_atom tag '*objectsize:disk' $disklen
> +test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag)
> +test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main)
>  test_atom tag deltabase $ZERO_OID
>  test_atom tag '*deltabase' $ZERO_OID
>  test_atom tag objectname $(git rev-parse refs/tags/testtag)
> --
> 2.43.0

^ permalink raw reply

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
From: Junio C Hamano @ 2023-12-12 22:28 UTC (permalink / raw)
  To: Christian Couder
  Cc: Zach FettersMoore, Zach FettersMoore via GitGitGadget, git
In-Reply-To: <CAP8UFD19phFz54d8fDM=MBRMSD9Rz4R0_463KgptN8eeFs7MnQ@mail.gmail.com>

Christian Couder <christian.couder@gmail.com> writes:

>> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
>> > Merge made by the 'ort' strategy.
>> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
>>
>> Looking into this some it looks like it could be a bash config
>> difference? My machine always runs it all the way through vs
>> failing for recursion depth. Although that would also be an issue
>> which is solved by this fix.
>
> I use Ubuntu where /bin/sh is dash so my current guess is that dash
> might have a smaller recursion limit than bash.

That sounds quite bad.  Does it have to be recursive (iow, if we can
rewrite the logic to be iterative instead, that would be a much better
way to fix the issue)?

^ permalink raw reply

* Re: Test breakage with zlib-ng
From: brian m. carlson @ 2023-12-12 22:18 UTC (permalink / raw)
  To: René Scharfe; +Cc: Ondrej Pohorelsky, git, Junio C Hamano
In-Reply-To: <9feeb6cf-aabf-4002-917f-3f6c27547bc8@web.de>

[-- Attachment #1: Type: text/plain, Size: 984 bytes --]

On 2023-12-12 at 17:04:55, René Scharfe wrote:
> --- >8 ---
> Subject: [PATCH] t6300: avoid hard-coding object sizes
> 
> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
> hard-coded the expected object sizes.  Coincidentally the size of commit
> and tag is the same with zlib at the default compression level.
> 
> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
> encoded the sizes as a single value, which coincidentally also works
> with sha256.
> 
> Different compression libraries like zlib-ng may arrive at different
> values.  Get them from the file system instead of hard-coding them to
> make switching the compression library (or changing the compression
> level) easier.

This definitely seems like the right thing to do.  I was a bit lazy at
the time and probably could have improved it then, but it's at least
good that we're doing it now.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Jeff King @ 2023-12-12 22:12 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, Carlos Andrés Ramírez Cataño

When processing a header like a "From" line, mailinfo uses
unquote_quoted_pair() to handle double-quotes and rfc822 parenthesized
comments. It takes a NUL-terminated string on input, and loops over the
"in" pointer until it sees the NUL. When it finds the start of an
interesting block, it delegates to helper functions which also increment
"in", and return the updated pointer.

But there's a bug here: the helpers find the NUL with a post-increment
in the loop condition, like:

   while ((c = *in++) != 0)

So when they do see a NUL (rather than the correct termination of the
quote or comment section), they return "in" as one _past_ the NUL
terminator. And thus the outer loop in unquote_quoted_pair() does not
realize we hit the NUL, and keeps reading past the end of the buffer.

We should instead make sure to return "in" positioned at the NUL, so
that the caller knows to stop their loop, too. A hacky way to do this is
to return "in - 1" after leaving the inner loop. But a slightly cleaner
solution is to avoid incrementing "in" until we are sure it contained a
non-NUL byte (i.e., doing it inside the loop body).

The two tests here show off the problem. Since we check the output,
they'll _usually_ report a failure in a normal build, but it depends on
what garbage bytes are found after the heap buffer. Building with
SANITIZE=address reliably notices the problem. The outcome (both the
exit code and the exact bytes) are just what Git happens to produce for
these cases today, and shouldn't be taken as an endorsement. It might be
reasonable to abort on an unterminated string, for example. The priority
for this patch is fixing the out-of-bounds memory access.

Reported-by: Carlos Andrés Ramírez Cataño <antaigroupltda@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
This was reported to the security list, but because it's just an
out-of-bounds read, we won't do a coordinated disclosure.

 mailinfo.c          |  8 ++++----
 t/t5100-mailinfo.sh | 22 ++++++++++++++++++++++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/mailinfo.c b/mailinfo.c
index a07d2da16d..737b9e5e13 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -58,12 +58,12 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)

 static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 {
-	int c;
 	int take_next_literally = 0;

 	strbuf_addch(outbuf, '(');

-	while ((c = *in++) != 0) {
+	while (*in) {
+		int c = *in++;
 		if (take_next_literally == 1) {
 			take_next_literally = 0;
 		} else {
@@ -88,10 +88,10 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)

 static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
 {
-	int c;
 	int take_next_literally = 0;

-	while ((c = *in++) != 0) {
+	while (*in) {
+		int c = *in++;
 		if (take_next_literally == 1) {
 			take_next_literally = 0;
 		} else {
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index db11cababd..654d8cf3ee 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -268,4 +268,26 @@ test_expect_success 'mailinfo warn CR in base64 encoded email' '
 	test_must_be_empty quoted-cr/0002.err
 '

+test_expect_success 'from line with unterminated quoted string' '
+	echo "From: bob \"unterminated string smith <bob@example.com>" >in &&
+	git mailinfo /dev/null /dev/null <in >actual &&
+	cat >expect <<-\EOF &&
+	Author: bob unterminated string smith
+	Email: bob@example.com
+
+	EOF
+	test_cmp expect actual
+'
+
+test_expect_success 'from line with unterminated comment' '
+	echo "From: bob (unterminated comment smith <bob@example.com>" >in &&
+	git mailinfo /dev/null /dev/null <in >actual &&
+	cat >expect <<-\EOF &&
+	Author: bob (unterminated comment smith
+	Email: bob@example.com
+
+	EOF
+	test_cmp expect actual
+'
+
 test_done
-- 
2.43.0.310.g85bf1f633d

^ permalink raw reply related

* Re: [PATCH] revision: parse integer arguments to --max-count, --skip, etc., more carefully
From: Jeff King @ 2023-12-12 22:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Britton Kerin
In-Reply-To: <xmqqjzpjsbjl.fsf@gitster.g>

On Tue, Dec 12, 2023 at 07:09:02AM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > This all looks pretty reasonable to me.
> >
> > I couldn't help but think, though, that surely we have some helpers for
> > this already? But the closest seems to be git_parse_int(), which also
> > allows unit factors. I'm not sure if allowing "-n 1k" would be a feature
> > or a bug. ;)
> 
> The change in question is meant to be a pure fix to replace a careless
> use of atoi().  I do not mind to see a separate patch to add such a
> feature later on top.

Oh, I mostly meant that I would have turned to git_parse_int() as that
already-existing helper, but it is not suitable because of the extra
unit-handling. I think your patch draws the line in the right place.

> > I wonder if there are more spots that could benefit.
> 
> "git grep -e 'atoi('" would give somebody motivated a decent set of
> #microproject ideas, but many hits are not suited for strtol_i(),
> which is designed to parse an integer at the end of a string.  Some
> places use atoi() immediately followed by strspn() to skip over
> digits, which means they are parsing an integer and want to continue
> reading after the integer, which is incompatible with what
> strtol_i() wants to do.  They need either a separate helper or an
> updated strtol_i() that optionally allows you to parse the prefix
> and report where the integer ended, e.g., something like:

Yeah, I agree this might be a good microproject (or leftoverbits) area,
and the semantics for the helper you propose make sense to me.

-Peff

^ permalink raw reply

* Re: [PATCH v2 2/4] refs: propagate errno when reading special refs fails
From: Junio C Hamano @ 2023-12-12 20:28 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, Phillip Wood
In-Reply-To: <24032a62e54fb37dad46c3ede7151efc8a7a8818.1702365291.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> Some refs in Git are more special than others due to reasons explained
> in the next commit. These refs are read via `refs_read_special_head()`,
> but this function doesn't behave the same as when we try to read a
> normal ref. Most importantly, we do not propagate `failure_errno` in the
> case where the reference does not exist, which is behaviour that we rely
> on in many parts of Git.
>
> Fix this bug by propagating errno when `strbuf_read_file()` fails.

Hmph, I thought, judging from what [1/4] did, that your plan was to
use the refs API, instead of peeking directly into the filesystem,
to access these pseudo refs, and am a bit puzzled if it makes all
that much difference to fix errno handling while still reading
directly from the filesystem.  Perhaps such a conversion happens in
later steps of this series (or a follow on series after this series
lands)?

>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs.c              |  4 +++-
>  t/t1403-show-ref.sh | 10 ++++++++++
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/refs.c b/refs.c
> index fcae5dddc6..00e72a2abf 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1806,8 +1806,10 @@ static int refs_read_special_head(struct ref_store *ref_store,
>  	int result = -1;
>  	strbuf_addf(&full_path, "%s/%s", ref_store->gitdir, refname);
>  
> -	if (strbuf_read_file(&content, full_path.buf, 0) < 0)
> +	if (strbuf_read_file(&content, full_path.buf, 0) < 0) {
> +		*failure_errno = errno;
>  		goto done;
> +	}
>  
>  	result = parse_loose_ref_contents(content.buf, oid, referent, type,
>  					  failure_errno);
> diff --git a/t/t1403-show-ref.sh b/t/t1403-show-ref.sh
> index b50ae6fcf1..66e6e77fa9 100755
> --- a/t/t1403-show-ref.sh
> +++ b/t/t1403-show-ref.sh
> @@ -266,4 +266,14 @@ test_expect_success '--exists with directory fails with generic error' '
>  	test_cmp expect err
>  '
>  
> +test_expect_success '--exists with non-existent special ref' '
> +	test_expect_code 2 git show-ref --exists FETCH_HEAD
> +'
> +
> +test_expect_success '--exists with existing special ref' '
> +	test_when_finished "rm .git/FETCH_HEAD" &&
> +	git rev-parse HEAD >.git/FETCH_HEAD &&
> +	git show-ref --exists FETCH_HEAD
> +'
> +
>  test_done

^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Junio C Hamano @ 2023-12-12 20:24 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, Phillip Wood
In-Reply-To: <1db3eb3945432964aabe1c559db4c3ac251e83fd.1702365291.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> The current code only works by chance because we only have a single
> reference backend implementation. Refactor it to instead read both refs
> via the refdb layer so that we'll also be compatible with alternate
> reference backends.

"via the refdb" -> "via the refs API" or something here and on the
title, and possibly elsewhere in the proposed log messages and
in-code comments in patches in this series, as I've never seen a
word "refdb" used in the context of this project.

I agree it is bad manners to be intimate with the implementation
details of the how files-backend stores HEAD and ORIG_HEAD.

> Note that we pass `RESOLVE_REF_NO_RECURSE` to `read_ref_full()`. This is
> because we didn't resolve symrefs before either, and in practice none of
> the refs in "rebase-merge/" would be symbolic. We thus don't want to
> resolve symrefs with the new code either to retain the old behaviour.

Good to see a rewrite being careful like this.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  wt-status.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/wt-status.c b/wt-status.c
> index 9f45bf6949..fe9e590b80 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1295,26 +1295,27 @@ static char *read_line_from_git_path(const char *filename)
>  static int split_commit_in_progress(struct wt_status *s)
>  {
>  	int split_in_progress = 0;
> -	char *head, *orig_head, *rebase_amend, *rebase_orig_head;
> +	struct object_id head_oid, orig_head_oid;
> +	char *rebase_amend, *rebase_orig_head;
>  
>  	if ((!s->amend && !s->nowarn && !s->workdir_dirty) ||
>  	    !s->branch || strcmp(s->branch, "HEAD"))
>  		return 0;
>  
> -	head = read_line_from_git_path("HEAD");
> -	orig_head = read_line_from_git_path("ORIG_HEAD");
> +	if (read_ref_full("HEAD", RESOLVE_REF_NO_RECURSE, &head_oid, NULL) ||
> +	    read_ref_full("ORIG_HEAD", RESOLVE_REF_NO_RECURSE, &orig_head_oid, NULL))
> +		return 0;
> +

This made me wonder if we have changed behaviour when on an unborn
branch.  In such a case, the original most likely would have read
"ref: blah" in "head" and compared it with "rebase_amend", which
would be a good way to ensure they would not match.  I would not
know offhand what the updated code would do, but head_oid would be
uninitialized in such a case, so ...?

>  	rebase_amend = read_line_from_git_path("rebase-merge/amend");
>  	rebase_orig_head = read_line_from_git_path("rebase-merge/orig-head");
>  
> -	if (!head || !orig_head || !rebase_amend || !rebase_orig_head)
> +	if (!rebase_amend || !rebase_orig_head)
>  		; /* fall through, no split in progress */
>  	else if (!strcmp(rebase_amend, rebase_orig_head))
> -		split_in_progress = !!strcmp(head, rebase_amend);
> -	else if (strcmp(orig_head, rebase_orig_head))
> +		split_in_progress = !!strcmp(oid_to_hex(&head_oid), rebase_amend);
> +	else if (strcmp(oid_to_hex(&orig_head_oid), rebase_orig_head))
>  		split_in_progress = 1;
>  
> -	free(head);
> -	free(orig_head);
>  	free(rebase_amend);
>  	free(rebase_orig_head);

^ permalink raw reply

* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: Jeff King @ 2023-12-12 20:09 UTC (permalink / raw)
  To: AtariDreams via GitGitGadget; +Cc: git, AtariDreams, Seija Kijin
In-Reply-To: <pull.1620.git.git.1702401468082.gitgitgadget@gmail.com>

On Tue, Dec 12, 2023 at 05:17:47PM +0000, AtariDreams via GitGitGadget wrote:

> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 70aff515acb..f9f2c9dd850 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -593,8 +593,8 @@ static void anonymize_ident_line(const char **beg, const char **end)
>  	struct ident_split split;
>  	const char *end_of_header;
>  
> -	out = &buffers[which_buffer++];
> -	which_buffer %= ARRAY_SIZE(buffers);
> +	out = &buffers[which_buffer];
> +	which_buffer ^= 1;

In the current code, if the size of "buffers" is increased then
everything would just work. But your proposed code (rather subtly) makes
the assumption that ARRAY_SIZE(buffers) is 2.

So even leaving aside questions of readability, I think the existing
code is much more maintainable.

> diff --git a/diff.c b/diff.c
> index 2c602df10a3..91842b54753 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1191,7 +1191,7 @@ static void mark_color_as_moved(struct diff_options *o,
>  							    &pmb_nr);
>  
>  			if (contiguous && pmb_nr && moved_symbol == l->s)
> -				flipped_block = (flipped_block + 1) % 2;
> +				flipped_block ^= 1;
>  			else
>  				flipped_block = 0;

This one I do not see any problem with changing, though I think it is a
matter of opinion on which is more readable (I actually tend to think of
"x = 0 - x" as idiomatic for flipping).

> diff --git a/ident.c b/ident.c
> index cc7afdbf819..188826eed63 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -459,7 +459,7 @@ const char *fmt_ident(const char *name, const char *email,
>  	int want_name = !(flag & IDENT_NO_NAME);
>  
>  	struct strbuf *ident = &ident_pool[index];
> -	index = (index + 1) % ARRAY_SIZE(ident_pool);
> +	index ^= 1;
>  
>  	if (!email) {
>  		if (whose_ident == WANT_AUTHOR_IDENT && git_author_email.len)

This has the same problem as the first case.

> diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
> index 70396fa3845..241136148a5 100644
> --- a/t/helper/test-path-utils.c
> +++ b/t/helper/test-path-utils.c
> @@ -185,7 +185,7 @@ static int check_dotfile(const char *x, const char **argv,
>  	int res = 0, expect = 1;
>  	for (; *argv; argv++) {
>  		if (!strcmp("--not", *argv))
> -			expect = !expect;
> +			expect ^= 1;

This one is not wrong, but IMHO it is more clear to express negation of
a boolean using "!" (i.e., what the code is already doing).


So of the four hunks, only the second one seems like a possible
improvement, and even there I am not sure the readability is better.

-Peff

^ permalink raw reply

* Re: Test breakage with zlib-ng
From: Jeff King @ 2023-12-12 20:01 UTC (permalink / raw)
  To: René Scharfe
  Cc: Ondrej Pohorelsky, git, brian m . carlson, Junio C Hamano
In-Reply-To: <9feeb6cf-aabf-4002-917f-3f6c27547bc8@web.de>

On Tue, Dec 12, 2023 at 06:04:55PM +0100, René Scharfe wrote:

> Subject: [PATCH] t6300: avoid hard-coding object sizes
> 
> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
> hard-coded the expected object sizes.  Coincidentally the size of commit
> and tag is the same with zlib at the default compression level.
> 
> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
> encoded the sizes as a single value, which coincidentally also works
> with sha256.
> 
> Different compression libraries like zlib-ng may arrive at different
> values.  Get them from the file system instead of hard-coding them to
> make switching the compression library (or changing the compression
> level) easier.

Yeah, this is definitely the right solution here. I'm surprised the
hard-coded values didn't cause problems before now. ;)

The patch looks good to me, but a few small comments:

> +test_object_file_size () {
> +	oid=$(git rev-parse "$1")
> +	path=".git/objects/$(test_oid_to_path $oid)"
> +	test_file_size "$path"
> +}

Here we're assuming the objects are loose. I think that's probably OK
(and certainly the test will notice if that changes).

We're covering the formatting code paths along with the underlying
implementation that fills in object_info->disk_sizep for loose objects.
Which I think is plenty for this particular script, which is about
for-each-ref.

It would be nice to have coverage of the packed_object_info() code path,
though. Back when it was added in a4ac106178 (cat-file: add
%(objectsize:disk) format atom, 2013-07-10), I cowardly punted on this,
writing:

  This patch does not include any tests, as the exact numbers
  returned are volatile and subject to zlib and packing
  decisions. We cannot even reliably guarantee that the
  on-disk size is smaller than the object content (though in
  general this should be the case for non-trivial objects).

I don't think it's that big a deal, but I guess we could do something
like:

  prev=
  git show-index <$pack_idx |
  sort -n |
  grep -A1 $oid |
  while read ofs oid csum
  do
    test -n "$prev" && echo "$((ofs - prev))"
    prev=$ofs
  done

It feels a little redundant with what Git is doing under the hood, but
at least is exercising the code (and we're using the idx directly, so
we're confirming that the revindex is right).

Anyway, that is all way beyond the scope of your patch, but I wonder if
it's worth doing on top.

> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
>  test_atom head push:strip=-1 main
>  test_atom head objecttype commit
>  test_atom head objectsize $((131 + hexlen))
> -test_atom head objectsize:disk $disklen
> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main)

These test_object_file_size calls are happening outside of any
test_expect_* block, so we'd miss failing exit codes (and also the
helper is not &&-chained), and any stderr would leak to the output.
That's probably OK in practice, though (if something goes wrong then the
expected value output will be bogus and the test itself will fail).

-Peff

^ permalink raw reply

* Re: [PATCH] tests: prefer host Git to verify chainlint self-checks
From: Junio C Hamano @ 2023-12-12 19:30 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Eric Sunshine
In-Reply-To: <4112adbe467c14a8f22a87ea41aa4705f8760cf6.1702380646.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> To accomodate for cases where the host system has no Git installation we
> use the locally-compiled version of Git. This can result in problems
> though when the Git project's repository is using extensions that the
> locally-compiled version of Git doesn't understand. It will refuse to
> run and thus cause the checks to fail.
>
> Fix this issue by prefering the host's Git resolved via PATH. If it
> doesn't exist, then we fall back to the locally-compiled Git version and
> diff as before.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>
> I've started to dogfood the reftable backend on my local machine and
> have converted many repositories to use the reftable backend. This
> surfaced the described issue because the repository now sets up the
> "extensions.refStorage" extension, and thus "check-chainlint" fails
> depending on which versions of Git I'm trying to compile and test.

I do not think "prefer host Git" is necessarily a good idea; falling
back to use host Git is perfectly fine, of course.

Other than that, I agree with the motivation.

>  t/Makefile | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/t/Makefile b/t/Makefile
> index 225aaf78ed..8b7f7aceaa 100644
> --- a/t/Makefile
> +++ b/t/Makefile
> @@ -111,7 +111,9 @@ check-chainlint:
>  	if test -f ../GIT-BUILD-OPTIONS; then \
>  		. ../GIT-BUILD-OPTIONS; \
>  	fi && \
> -	if test -x ../git$$X; then \
> +	if command -v git >/dev/null 2>&1; then \
> +		DIFFW="git --no-pager diff -w --no-index"; \
> +	elif test -x ../git$$X; then \
>  		DIFFW="../git$$X --no-pager diff -w --no-index"; \
>  	else \
>  		DIFFW="diff -w -u"; \

^ permalink raw reply

* Re: [PATCH v2 0/7] clone: fix init of refdb with wrong object format
From: Junio C Hamano @ 2023-12-12 19:20 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <cover.1702361370.git.ps@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> this is the second version of my patch series to make git-clone(1)
> initialize the reference database with the correct hash format. This is
> a preparatory step for the reftable backend, which encodes the hash
> format on-disk and thus requires us to get the hash format right at the
> time of initialization.
>
> Changes compared to v1:
>
>   - Patch 1: Extend the comment explaining why we always create the
>     "refs/" directory.
>
>   - Patch 3: Explain better why the patch is required in the first
>     place.
>
>   - Patch 4: Fix a typo.
>
>   - Patch 7: Adapt a failing test to now assert the new behaviour.
>
> Thanks for your reviews!

Everything I see in the range-diff looks very sensible.

Will replace.  Thanks.

^ permalink raw reply

* Re: Propose a change in open for passing in the file type.
From: Torsten Bögershausen @ 2023-12-12 17:29 UTC (permalink / raw)
  To: Haritha D; +Cc: git@vger.kernel.org
In-Reply-To: <E1D54D98-3836-41CA-84B5-32AEAF7642D8@ibm.com>

On Tue, Dec 12, 2023 at 02:46:04PM +0000, Haritha D wrote:
> Hi Everyone,
>
> Am working on porting git to z/OS. For reference, the pull request am working on https://github.com/git/git/pull/1537.
>
> On z/OS there is a notion of file tag attributes. Files can be tagged as binary, ASCII, UTF8, EBCDIC, etc. z/OS uses these attributes to determine if auto-conversion is necessary. It was recommended in PR that we add logic directly to xopen . In order for me to do this in xopen , I have to pass an extra parameter to xopen that specifies the file type. 
>  
> Ex:
> xopen(output_file, O_CREAT | O_WRONLY | O_TRUNC, 0666);
>
> To :
> xopen(output_file, O_CREAT | O_WRONLY | O_TRUNC, 0666, BINARY);
>
> BINARY: would be an enum value.
>
> Would this be okay to do? Or are there other recommendations?

I think that the suggestion was to embedd the BINARY thing into xopen(),
not to all callers of xopen().

If you have this type of special handling for one specific OS,
call it quirks, if you want, they are better placed in an own file.

Please see e.g. compat/mingw.c as an example, how things are solved
for Git for windows.
Then there are some include files, which re-define things like open(),
if needed.

Another example could be
compat/win32/dirent.c

Where exactly things are placed, is may be a matter of taste,
that may be decided later.
In your case a file like
compat/z_os.c and z_os.h may make clear what this is about to everybody.

>
> Best regards
> Haritha
>
>

^ permalink raw reply

* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: Dragan Simic @ 2023-12-12 17:29 UTC (permalink / raw)
  To: AtariDreams via GitGitGadget; +Cc: git, AtariDreams, Seija Kijin
In-Reply-To: <pull.1620.git.git.1702401468082.gitgitgadget@gmail.com>

On 2023-12-12 18:17, AtariDreams via GitGitGadget wrote:
> From: Seija Kijin <doremylover123@gmail.com>
> 
> If it is known that an int is either 1 or 0,
> doing an exclusive or to switch instead of a
> modulus makes more sense and is more efficient.

Quite frankly, this doesn't seem like an improvement to me.  It makes 
the code much less readable, more error-prone, and may even end up 
producing code that isn't portable.

Regarding the efficiency, such optimizations may be perfectly fine as a 
trade-off in some critical paths, but these cases don't seem like that.

> Signed-off-by: Seija Kijin doremylover123@gmail.com
> ---
>     Use ^=1 to toggle between 0 and 1
> 
>     If it is known that an int is either 1 or 0, doing an exclusive or 
> to
>     switch instead of a modulus makes more sense and is more efficient.
> 
>     Signed-off-by: Seija Kijin doremylover123@gmail.com
> 
> Published-As:
> https://github.com/gitgitgadget/git/releases/tag/pr-git-1620%2FAtariDreams%2Fbuffer-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git
> pr-git-1620/AtariDreams/buffer-v1
> Pull-Request: https://github.com/git/git/pull/1620
> 
>  builtin/fast-export.c      | 4 ++--
>  diff.c                     | 2 +-
>  ident.c                    | 2 +-
>  t/helper/test-path-utils.c | 2 +-
>  4 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/builtin/fast-export.c b/builtin/fast-export.c
> index 70aff515acb..f9f2c9dd850 100644
> --- a/builtin/fast-export.c
> +++ b/builtin/fast-export.c
> @@ -593,8 +593,8 @@ static void anonymize_ident_line(const char **beg,
> const char **end)
>  	struct ident_split split;
>  	const char *end_of_header;
> 
> -	out = &buffers[which_buffer++];
> -	which_buffer %= ARRAY_SIZE(buffers);
> +	out = &buffers[which_buffer];
> +	which_buffer ^= 1;
>  	strbuf_reset(out);
> 
>  	/* skip "committer", "author", "tagger", etc */
> diff --git a/diff.c b/diff.c
> index 2c602df10a3..91842b54753 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1191,7 +1191,7 @@ static void mark_color_as_moved(struct 
> diff_options *o,
>  							    &pmb_nr);
> 
>  			if (contiguous && pmb_nr && moved_symbol == l->s)
> -				flipped_block = (flipped_block + 1) % 2;
> +				flipped_block ^= 1;
>  			else
>  				flipped_block = 0;
> 
> diff --git a/ident.c b/ident.c
> index cc7afdbf819..188826eed63 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -459,7 +459,7 @@ const char *fmt_ident(const char *name, const char 
> *email,
>  	int want_name = !(flag & IDENT_NO_NAME);
> 
>  	struct strbuf *ident = &ident_pool[index];
> -	index = (index + 1) % ARRAY_SIZE(ident_pool);
> +	index ^= 1;
> 
>  	if (!email) {
>  		if (whose_ident == WANT_AUTHOR_IDENT && git_author_email.len)
> diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
> index 70396fa3845..241136148a5 100644
> --- a/t/helper/test-path-utils.c
> +++ b/t/helper/test-path-utils.c
> @@ -185,7 +185,7 @@ static int check_dotfile(const char *x, const char 
> **argv,
>  	int res = 0, expect = 1;
>  	for (; *argv; argv++) {
>  		if (!strcmp("--not", *argv))
> -			expect = !expect;
> +			expect ^= 1;
>  		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
>  			res = error("'%s' is %s.git%s", *argv,
>  				    expect ? "not " : "", x);
> 
> base-commit: 1a87c842ece327d03d08096395969aca5e0a6996

^ permalink raw reply

* [PATCH] Use ^=1 to toggle between 0 and 1
From: AtariDreams via GitGitGadget @ 2023-12-12 17:17 UTC (permalink / raw)
  To: git; +Cc: AtariDreams, Seija Kijin

From: Seija Kijin <doremylover123@gmail.com>

If it is known that an int is either 1 or 0,
doing an exclusive or to switch instead of a
modulus makes more sense and is more efficient.

Signed-off-by: Seija Kijin doremylover123@gmail.com
---
    Use ^=1 to toggle between 0 and 1
    
    If it is known that an int is either 1 or 0, doing an exclusive or to
    switch instead of a modulus makes more sense and is more efficient.
    
    Signed-off-by: Seija Kijin doremylover123@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1620%2FAtariDreams%2Fbuffer-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1620/AtariDreams/buffer-v1
Pull-Request: https://github.com/git/git/pull/1620

 builtin/fast-export.c      | 4 ++--
 diff.c                     | 2 +-
 ident.c                    | 2 +-
 t/helper/test-path-utils.c | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 70aff515acb..f9f2c9dd850 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -593,8 +593,8 @@ static void anonymize_ident_line(const char **beg, const char **end)
 	struct ident_split split;
 	const char *end_of_header;
 
-	out = &buffers[which_buffer++];
-	which_buffer %= ARRAY_SIZE(buffers);
+	out = &buffers[which_buffer];
+	which_buffer ^= 1;
 	strbuf_reset(out);
 
 	/* skip "committer", "author", "tagger", etc */
diff --git a/diff.c b/diff.c
index 2c602df10a3..91842b54753 100644
--- a/diff.c
+++ b/diff.c
@@ -1191,7 +1191,7 @@ static void mark_color_as_moved(struct diff_options *o,
 							    &pmb_nr);
 
 			if (contiguous && pmb_nr && moved_symbol == l->s)
-				flipped_block = (flipped_block + 1) % 2;
+				flipped_block ^= 1;
 			else
 				flipped_block = 0;
 
diff --git a/ident.c b/ident.c
index cc7afdbf819..188826eed63 100644
--- a/ident.c
+++ b/ident.c
@@ -459,7 +459,7 @@ const char *fmt_ident(const char *name, const char *email,
 	int want_name = !(flag & IDENT_NO_NAME);
 
 	struct strbuf *ident = &ident_pool[index];
-	index = (index + 1) % ARRAY_SIZE(ident_pool);
+	index ^= 1;
 
 	if (!email) {
 		if (whose_ident == WANT_AUTHOR_IDENT && git_author_email.len)
diff --git a/t/helper/test-path-utils.c b/t/helper/test-path-utils.c
index 70396fa3845..241136148a5 100644
--- a/t/helper/test-path-utils.c
+++ b/t/helper/test-path-utils.c
@@ -185,7 +185,7 @@ static int check_dotfile(const char *x, const char **argv,
 	int res = 0, expect = 1;
 	for (; *argv; argv++) {
 		if (!strcmp("--not", *argv))
-			expect = !expect;
+			expect ^= 1;
 		else if (expect != (is_hfs(*argv) || is_ntfs(*argv)))
 			res = error("'%s' is %s.git%s", *argv,
 				    expect ? "not " : "", x);

base-commit: 1a87c842ece327d03d08096395969aca5e0a6996
-- 
gitgitgadget

^ permalink raw reply related

* Re: Test breakage with zlib-ng
From: René Scharfe @ 2023-12-12 17:04 UTC (permalink / raw)
  To: Ondrej Pohorelsky, git; +Cc: brian m . carlson, Junio C Hamano
In-Reply-To: <CA+B51BEpSh1wT627Efpysw3evVocpiDCoQ3Xaza6jKE3B62yig@mail.gmail.com>

Am 12.12.23 um 15:16 schrieb Ondrej Pohorelsky:
> Hi everyone,
>
> As some might have heard, there is a proposal for Fedora 40 to
> transition from zlib to zlib-ng[0]. Because of this, there has been a
> rebuild of all packages to ensure every package works under zlib-ng.
>
> Git test suit has three breakages in t6300-for-each-ref.sh.
> To be precise, it is:
>
> not ok 35 - basic atom: refs/heads/main objectsize:disk
> not ok 107 - basic atom: refs/tags/testtag objectsize:disk
> not ok 108 - basic atom: refs/tags/testtag *objectsize:disk
>
>
> All of these tests are atomic, and they compare the result against
> $disklen.

Why do these three objects (HEAD commit of main, testtag and testtag
target) have the same size?  Half of the answer is that testtag points
to the HEAD of main.  But the other half is pure coincidence as far as I
can see.

> I can easily patch these tests in Fedora to be compatible with zlib-ng
> only by not comparing to $disklen, but a concrete value, however I
> would like to have a universal solution that works with both zlib and
> zlib-ng. So if anyone has an idea on how to do it, please let me know.

The test stores the expected values at the top, in the following lines,
for the two possible repository formats:

	test_expect_success setup '
		test_oid_cache <<-EOF &&
		disklen sha1:138
		disklen sha256:154
		EOF

So it's using hard-coded values already, which breaks when the
compression rate changes.

We could set core.compression to 0 to take compression out of the
picture.

Or we could get the sizes of the objects by checking their files,
which would not require  hard-coding anymore.  Patch below.

--- >8 ---
Subject: [PATCH] t6300: avoid hard-coding object sizes

f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
hard-coded the expected object sizes.  Coincidentally the size of commit
and tag is the same with zlib at the default compression level.

1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
encoded the sizes as a single value, which coincidentally also works
with sha256.

Different compression libraries like zlib-ng may arrive at different
values.  Get them from the file system instead of hard-coding them to
make switching the compression library (or changing the compression
level) easier.

Reported-by: Ondrej Pohorelsky <opohorel@redhat.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t6300-for-each-ref.sh | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 54e2281259..843a7fe143 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -20,12 +20,13 @@ setdate_and_increment () {
     export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 }

-test_expect_success setup '
-	test_oid_cache <<-EOF &&
-	disklen sha1:138
-	disklen sha256:154
-	EOF
+test_object_file_size () {
+	oid=$(git rev-parse "$1")
+	path=".git/objects/$(test_oid_to_path $oid)"
+	test_file_size "$path"
+}

+test_expect_success setup '
 	# setup .mailmap
 	cat >.mailmap <<-EOF &&
 	A Thor <athor@example.com> A U Thor <author@example.com>
@@ -94,7 +95,6 @@ test_atom () {
 }

 hexlen=$(test_oid hexsz)
-disklen=$(test_oid disklen)

 test_atom head refname refs/heads/main
 test_atom head refname: refs/heads/main
@@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
 test_atom head push:strip=-1 main
 test_atom head objecttype commit
 test_atom head objectsize $((131 + hexlen))
-test_atom head objectsize:disk $disklen
+test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
 test_atom head deltabase $ZERO_OID
 test_atom head objectname $(git rev-parse refs/heads/main)
 test_atom head objectname:short $(git rev-parse --short refs/heads/main)
@@ -203,8 +203,8 @@ test_atom tag upstream ''
 test_atom tag push ''
 test_atom tag objecttype tag
 test_atom tag objectsize $((114 + hexlen))
-test_atom tag objectsize:disk $disklen
-test_atom tag '*objectsize:disk' $disklen
+test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag)
+test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main)
 test_atom tag deltabase $ZERO_OID
 test_atom tag '*deltabase' $ZERO_OID
 test_atom tag objectname $(git rev-parse refs/tags/testtag)
--
2.43.0

^ permalink raw reply related

* Re: [PATCH 02/24] pack-bitmap-write: deep-clear the `bb_commit` slab
From: Taylor Blau @ 2023-12-12 16:48 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Patrick Steinhardt, Junio C Hamano
In-Reply-To: <20231212070406.GA1117953@coredump.intra.peff.net>

On Tue, Dec 12, 2023 at 02:04:06AM -0500, Jeff King wrote:
> On Tue, Nov 28, 2023 at 02:07:59PM -0500, Taylor Blau wrote:
>
> > +static void clear_bb_commit(struct bb_commit *commit)
> > +{
> > +	free(commit->reverse_edges);
> > +	bitmap_free(commit->commit_mask);
> > +	bitmap_free(commit->bitmap);
> > +}
>
> I think these bitmaps may sometimes be NULL. But double-checking
> bitmap_free(), it sensibly is noop when passed NULL. So this look good
> to me.

Yeah, bitamp_free() handles a NULL input correctly, so we can pass a
possibly-NULL `commit->commit_mask` or `commit->bitmap` argument and be
OK.

Thanks,
Taylor

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox