Git development

Git development
 help / color / mirror / Atom feed

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Patrick Steinhardt @ 2023-12-14  7:41 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño
In-Reply-To: <20231213082027.GB1684525@coredump.intra.peff.net>

[-- Attachment #1: Type: text/plain, Size: 2643 bytes --]

On Wed, Dec 13, 2023 at 03:20:27AM -0500, Jeff King wrote:
> On Wed, Dec 13, 2023 at 08:07:21AM +0100, Patrick Steinhardt wrote:
[snip]
> > Another thing I was wondering about is the recursive nature of these
> > functions, and whether it can lead to similar issues like we recently
> > had when parsing revisions with stack exhaustion. I _think_ this should
> > not be much of a problem in this case though, as we're talking about
> > mail headers here. While the length of header values isn't restricted
> > per se (the line length is restricted to 1000, but with Comment Folding
> > Whitespace values can span multiple lines), but mail provdiers will sure
> > clamp down on mails with a "From:" header that is megabytes in size.
> 
> It's just unquote_comment() that is recursive, but yeah, there is
> nothing to stop it from recursing forever on "((((((...". The stack
> requirements are pretty small, though. I needed between 2^17 and 2^18
> bytes to segfault on my machine using:
> 
>   perl -e 'print "From: ", "(" x 2**18;' |
>   git mailinfo /dev/null /dev/null
> 
> Absurdly big for an email, but maybe within the realm of possibility? I
> think it might be possible to drop the recursion and just use a depth
> counter, like this:

It's definitely not as large as I'd have expected it to be, we're only
talking about kilobytes of data. Feels like it might be in the realm of
possibility to get transferred by a mail provider.

> diff --git a/mailinfo.c b/mailinfo.c
> index 737b9e5e13..db236f9f9f 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -59,6 +59,7 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
>  static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  {
>  	int take_next_literally = 0;
> +	int depth = 1;
>  
>  	strbuf_addch(outbuf, '(');
>  
> @@ -72,11 +73,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
>  				take_next_literally = 1;
>  				continue;
>  			case '(':
> -				in = unquote_comment(outbuf, in);
> +				strbuf_addch(outbuf, '(');
> +				depth++;
>  				continue;
>  			case ')':
>  				strbuf_addch(outbuf, ')');
> -				return in;
> +				if (!--depth)
> +					return in;
> +				continue;
>  			}
>  		}
>  
> I doubt it's a big deal either way, but if it's that easy to do it might
> be worth it.

Isn't this only protecting against unbalanced braces? That might be a
sensible check to do regardless, but does it protect against recursion
blowing the stack if you just happen to have many opening braces?

Might be I'm missing something.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH] tests: prefer host Git to verify chainlint self-checks
From: Eric Sunshine @ 2023-12-14  3:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Patrick Steinhardt, git
In-Reply-To: <xmqqr0jqnnmn.fsf@gitster.g>

On Wed, Dec 13, 2023 at 10:11 AM Junio C Hamano <gitster@pobox.com> wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> >> I do not think "prefer host Git" is necessarily a good idea; falling
> >> back to use host Git is perfectly fine, of course.
> >
> > Why is that, though?
>
> Mostly because your "differences in features supported by just-built
> one and what happens to be on $PATH can cause problems" cuts both
> ways [...]

I sent an alternative solution[1] which should sidestep this objection.

As usual, I forgot to use --in-reply-to=<this-thread> when sending the
patch, even though I reminded myself to use it only a minute or so
earlier. Sorry.

[1]: https://lore.kernel.org/git/20231214032248.1615-1-ericsunshine@charter.net/

^ permalink raw reply

* [PATCH] tests: drop dependency on `git diff` in check-chainlint
From: Eric Sunshine @ 2023-12-14  3:22 UTC (permalink / raw)
  To: git; +Cc: Eric Sunshine, Junio C Hamano, Patrick Steinhardt, Eric Sunshine

From: Eric Sunshine <sunshine@sunshineco.com>

The "check-chainlint" target runs automatically when running tests and
performs self-checks to verify that the chainlinter itself produces the
expected output. Originally, the chainlinter was implemented via sed,
but the infrastructure has been rewritten in fb41727b7e (t: retire
unused chainlint.sed, 2022-09-01) to use a Perl script instead.

The rewrite caused some slight whitespace changes in the output that are
ultimately not of much importance. In order to be able to assert that
the actual chainlinter errors match our expectations we thus have to
ignore whitespace characters when diffing them. As the `-w` flag is not
in POSIX we try to use `git diff -w --no-index` before we fall back to
non-standard `diff -w -u`.

To accommodate for cases where the host system has no Git installation
we use the locally-compiled version of Git. This can result in problems
though when the Git project's repository is using extensions that the
locally-compiled version of Git doesn't understand, in which case `git`
may refuse to run and thus cause the checks to fail.

Work around this issue by normalizing whitespace via sed before invoking
diff, which allows any platform diff implementation to be used, thus
eliminating the dependency upon `git diff` and the non-POSIX `-w` flag.

Reported-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---

This is an alternative solution to the issue Patrick's patch[1]
addresses. Hopefully, this approach should avoid the sort of push-back
Patrick's patch received[2].

I shamelessly stole most of Patrick's commit message.

The sed expressions for normalizing whitespace prior to `diff` may look
a bit hairy, but they are simple enough in concept:

* collapse runs of whitespace to a single SP
* drop blank lines (this step is not new)
* fold out possible SP at beginning and end of each line
* fold out SP surrounding common punctuation characters used in shell
  scripts, such as `>`, `|`, `;`, etc.

By the way, I'm somewhat surprised that this issue crops up at all
considering that --no-index is being used with git-diff. As such, I
would have thought that the local repository's format would not have
been interrogated at all. If that's a bug in `git diff --no-index`, then
fixing that could be considered yet another alternative solution to the
issue raised here.

[1]: https://lore.kernel.org/git/4112adbe467c14a8f22a87ea41aa4705f8760cf6.1702380646.git.ps@pks.im/
[2]: https://lore.kernel.org/git/xmqqr0jqnnmn.fsf@gitster.g/

 t/Makefile | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/t/Makefile b/t/Makefile
index 225aaf78ed..656ff10afa 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -103,20 +103,12 @@ check-chainlint:
 		echo "# chainlint: $(CHAINLINTTMP_SQ)/tests" && \
 		for i in $(CHAINLINTTESTS); do \
 			echo "# chainlint: $$i" && \
-			sed -e '/^[ 	]*$$/d' chainlint/$$i.expect; \
+			sed -e 's/[ 	][ 	]*/ /g;/^ *$$/d;s/^ //;s/ $$//;s/\([<>|();&]\) /\1/g;s/ \([<>|();&]\)/\1/g' chainlint/$$i.expect; \
 		done \
 	} >'$(CHAINLINTTMP_SQ)'/expect && \
 	$(CHAINLINT) --emit-all '$(CHAINLINTTMP_SQ)'/tests | \
-		sed -e 's/^[1-9][0-9]* //;/^[ 	]*$$/d' >'$(CHAINLINTTMP_SQ)'/actual && \
-	if test -f ../GIT-BUILD-OPTIONS; then \
-		. ../GIT-BUILD-OPTIONS; \
-	fi && \
-	if test -x ../git$$X; then \
-		DIFFW="../git$$X --no-pager diff -w --no-index"; \
-	else \
-		DIFFW="diff -w -u"; \
-	fi && \
-	$$DIFFW '$(CHAINLINTTMP_SQ)'/expect '$(CHAINLINTTMP_SQ)'/actual
+		sed -e 's/^[1-9][0-9]* //;s/[ 	][ 	]*/ /g;/^ *$$/d;s/^ //;s/ $$//;s/\([<>|();&]\) /\1/g;s/ \([<>|();&]\)/\1/g' >'$(CHAINLINTTMP_SQ)'/actual && \
+	diff -u '$(CHAINLINTTMP_SQ)'/expect '$(CHAINLINTTMP_SQ)'/actual

 test-lint: test-lint-duplicates test-lint-executable test-lint-shell-syntax \
 	test-lint-filenames
-- 
2.43.0

^ permalink raw reply related

* Re: Communication channel interruption
From: brian m. carlson @ 2023-12-14  2:27 UTC (permalink / raw)
  To: Alexander Zhirov; +Cc: git
In-Reply-To: <71f2b28b-3e27-4773-8408-2f4c13757b73@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1726 bytes --]

On 2023-12-13 at 09:54:30, Alexander Zhirov wrote:
> When cloning a repository from GitHub, I have a channel break, although the
> connection is stable. What could be the problem?
> 
> # git clone https://github.com/Thinstation/thinstation.git
> Cloning into 'thinstation'...
> remote: Enumerating objects: 79839, done.
> remote: Counting objects: 100% (535/535), done.
> remote: Compressing objects: 100% (319/319), done.
> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: CANCEL
> (err 8)
> error: 7473 bytes of body are still expected
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output

It would be helpful to know what version of Git, libcurl, and operating
system you're using.  This type of error is often caused by proxies,
including non-default antivirus or firewall programs (very especially on
Windows), TLS middleboxes, and monitoring software.  If you have such
software, it should be completely uninstalled and you should reboot, and
usually that will fix the problem.  (Simply disabling it often does not
fix things.)

I should point out that while I'm aware that this does occur for many
people, I have never, ever seen it when using GitHub, nor do I believe
it is actually sent by GitHub servers.  Thus, my experience leads me to
believe the problem lies elsewhere.  It is theoretically possible that
libcurl or one of its dependencies has a bug in your version, but
without more details, it's hard to tell.

As Junio suggested, you could try forcing HTTP/1.1 and see if that fixes
the problem for you as well.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: End-of-line comments are prompted with "is not a valid attribute name"
From: Junio C Hamano @ 2023-12-13 19:22 UTC (permalink / raw)
  To: D无数; +Cc: git
In-Reply-To: <CAOQ=bxz8txyOt6p5L0qfx5DFKfxUhWvHW0pJP+YTbWVfBpvYxg@mail.gmail.com>

D无数 <wushuripple@gmail.com> writes:

> This is my .gitattributes:
> ```
> res/csv/*.txt eol=lf # 保证csv为lf，以匹配解析格式

I do not think of any version of Git ignoring what you wrote after #
(including #) in the middle of the line.

> # Custom for Visual Studio

I know I designed the parser to allow this as a comment, though.

So

> When performing certain operations:
> ···shell
>> git status
> # is not a valid attribute name: .gitattributes:1

This is totally expected; nothing to see here.

^ permalink raw reply

* Re: [PATCH 05/24] midx: implement `DISP` chunk
From: Junio C Hamano @ 2023-12-13 19:20 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, git, Patrick Steinhardt
In-Reply-To: <ZXn33HVBFqidAThn@nand.local>

Taylor Blau <me@ttaylorr.com> writes:

>   - Turn any cross-pack deltas (which are stored as OFS_DELTAs) into
>     REF_DELTAs. We already do this today when reusing an OFS_DELTA
>     without `--delta-base-offset` enabled, so it's not a huge stretch to
>     do the same for cross-pack deltas even when `--delta-base-offset` is
>     enabled.
>
>     This would work, but would obviously result in larger-than-necessary
>     packs, as we in theory *could* represent these cross-pack deltas by
>     patching an existing OFS_DELTA. But it's not clear how much that
>     would matter in practice. I suspect it would have a lot to do with
>     how you pack your repository in the first place.

I have to wonder if the cost of additional computation to see when
is safe to allow cross-pack delta can be kept reasonably low, as
you'd need to prove that you are not introducing a cycle by doing
so.  Compared to that, spending a dozen or so bytes for the offset
for rare cases of having to express such a cross-pack delta pair
would be much less worrisome to me.

> Thanks for the terrific suggestion. I'll clean up the results of trying
> to implement it, and share it with the list soon (ideally before the end
> of this week, after which I'm on vacation until the new year).

Good to hear that you'll be recharging soon.  Enjoy.

^ permalink raw reply

* Re: [PATCH 05/24] midx: implement `DISP` chunk
From: Taylor Blau @ 2023-12-13 18:28 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Patrick Steinhardt
In-Reply-To: <20231212080332.GC1117953@coredump.intra.peff.net>

Hi Peff,

On Tue, Dec 12, 2023 at 03:03:32AM -0500, Jeff King wrote:
> [...]

Well this took me longer to respond to than I thought it would ;-).

> So it kind of seems to me that this would "just work" if
> try_partial_reuse() did something like for the midx case:
>
>   - convert offset of base object into an object id hash using the pack
>     revindex (similar to offset_to_pack_pos)
>
>   - look up the object id in the midx to get a pack/offset combo
>
>   - use the midx revindex to convert that into a bit position
>
>   - check if that bit is marked for verbatim reuse
>
> The assumption there is that in the second step, the midx has resolved
> duplicates by putting all of pack A before pack B, and so on, as above.
> It also assumes that we are trying verbatim reuse in the same order
> (though a different order would not produce wrong results, it would only
> result in less verbatim reuse).

After much thinking, I agree with your conclusion here. Which is great
news indeed, since it allows us to implement multi-pack reuse without
requiring that the candidate packs be disjoint with respect to their
objects.

Even though we have some protections in place for ensuring these packs'
disjointed-ness, I agree with Junio upthread that this is likely the
most fragile part of this series. That is, even though we check in
midx.c::get_sorted_entries() that the marked packs are disjoint, if we
miss something, the results would be rather bad. At best it would result
in sending a corrupt packfile to the client, and at worst it could
result in permanent data corruption if repacking a repository with
multi-pack reuse enabled.

> If we assume that any cross-pack deltas are a problem, we could always
> just skip them for verbatim reuse. That is, once we look up the object
> id in the midx, we can see if it's in the same pack we're currently
> processing. If not, we could punt and let the non-verbatim code paths
> handle it as usual.

Let me think aloud for a second, since it took me quite a lot of
thinking to fully wrap my head around this. Suppose we have two packs,
P1 and P2 where P1 has object A delta'd against B, and P2 has its own
copy of object B. If the MIDX chose the copy of B from P2, but also
decided to send P1 (which it chose from A), then if P1 is stored as an
OFS delta, we would be sending a corrupt packfile to the client.

I think there are a few things that we could reasonably do here:

  - Reject cross-pack deltas entirely (as you suggest). This is the
    easiest implementation choice in this already-complex series, and it
    doesn't paint us into a corner in the sense that we could relax
    these requirements in the future.

  - Turn any cross-pack deltas (which are stored as OFS_DELTAs) into
    REF_DELTAs. We already do this today when reusing an OFS_DELTA
    without `--delta-base-offset` enabled, so it's not a huge stretch to
    do the same for cross-pack deltas even when `--delta-base-offset` is
    enabled.

    This would work, but would obviously result in larger-than-necessary
    packs, as we in theory *could* represent these cross-pack deltas by
    patching an existing OFS_DELTA. But it's not clear how much that
    would matter in practice. I suspect it would have a lot to do with
    how you pack your repository in the first place.

  - Finally, we could patch OFS_DELTAs across packs in a similar fashion
    as we do today for OFS_DELTAs within a single pack on either side of
    a gap. This would result in the smallest packs of the three options
    here, but implementing this would be more involved.

    At minimum, you'd have to keep the reusable chunks list for all
    reused packs, not just the one we're currently processing. And you'd
    have to ensure that any bases which are a part of cross-pack deltas
    appear before the delta. I think this is possible to do, but would
    require assembling the reusable chunks list potentially in a
    different order than they appear in the source packs.

> And of course there's a bunch of hard work teaching all of those
> functions about midx's and multiple packs in the first place, but you
> already had to do all that in the latter part of your series, and it
> would still be applicable.

Yep, all of that is still a requirement here, and lives on in the
revised version of this series.

The naive implementation where we call try_partial_reuse() on every
object which is a candidate for reuse and check for cross-pack deltas
would work, but have poor performance. The reason is that we would be
doing a significant amount of cache-inefficient work to determine
whether or not the base for some delta/base-pair resides in the same
pack:

  - If you see a delta in some pack while processing a MIDX bitmap for
    reuse, you first have to figure out the location of its base in that
    same pack. (Note: this may or may not be the copy of the base object
    chosen by the MIDX).

  - To figure out whether or not the MIDX chose that copy, you would
    naively have to do something like:

      - Convert the base object's offset into a packfile position using
        the pack revindex.

      - Convert the base object's packfile position into an index
        position, which would then be used to obtain its OID.

      - Then binary search through the MIDX for that OID found in the
        previous step, filling out the MIDX entry for that object.

      - Toss out the cross-pack delta/base pair if the MIDX entry in the
        previous step indicates that the MIDX chose a copy of the base
        from a different pack than the one we're currently processing
        (i.e. where the delta resides).

That's rather inefficient. But, we can do better. Instead of going back
and forth through both the pack and MIDX's reverse index, we can simply
binary search in the MIDX's reverse index for the (pack_id, offset) pair
corresponding to the base.

If we find a match, then we know that the MIDX chose its copy of the
base object from the same pack, and we can reuse the delta/base-pair. If
we don't, then we know that the MIDX chose its copy of the base object
from a different pack, and we have to throw out the delta/base-pair.

This is a bit more involved than the naive implementation, but it's (a)
efficient, and (b) most of the code for it already exists in the form of
midx_to_pack_pos(), which implements a binary search over the MIDX
bitmap's pseudo-pack order.

With some light refactoring, we can repurpose this code to perform a
binary search for a given (pack, offset) pair instead of starting with a
MIDX lex position and converting it into the (pack, offset) pair. So
that works, and is what I've done in the revised version of this series.

There is one other snag, which is that we can no longer blindly reuse
whole-words from the reuse bitmap if we have non-disjoint packs. That
is, we can't do something like:

    while (pos < result->word_alloc && result->words[pos] == (eword_t)~0)
        pos++;

when processing anything but the first pack.

The reason is that we know the first pack has all duplicate object ties
broken in its favor, but we don't have the same guarantee for subsequent
packs. So we have to be more careful about which bits we reuse from
those subsequent packs, since we may inadvertently pick up a cross-pack
delta/base pair without inspecting it more closely.

As I mentioned, you can still perform this optimization over the first
pack, and I think that will be sufficient for most repositories. It's
not clear to me exactly how much this optimization is helping us in
contrast to all of the other work that pack-objects is doing, but that
is probably something worth measuring.

Thanks for the terrific suggestion. I'll clean up the results of trying
to implement it, and share it with the list soon (ideally before the end
of this week, after which I'm on vacation until the new year).

Thanks,
Taylor

^ permalink raw reply

* End-of-line comments are prompted with "is not a valid attribute name"
From: D无数 @ 2023-12-13 17:32 UTC (permalink / raw)
  To: git

> macos 13.5.2
> git version 2.37.0 (Apple Git-136)

I use end-of-line comments in my .gitattributes file, and it is often
(not always present, but often) prompted with '# is not a valid
attribute name: .gitattributes:1' when performing many operations.

This is my .gitattributes:
```
res/csv/*.txt eol=lf # 保证csv为lf，以匹配解析格式

# Custom for Visual Studio
*.cs     diff=csharp
```

When performing certain operations:
···shell
> git status
# is not a valid attribute name: .gitattributes:1
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
···

Other repositories that have .gitattributes files and include
end-of-line comments do the same thing. I'm sure there are no special
characters in my files.

^ permalink raw reply

* Assistance Needed
From: micah veintetres @ 2023-12-13 15:52 UTC (permalink / raw)
  To: git@vger.kernel.org

I am in the middle of the worst tragedy I think man will ever see.

The only safety I have found online is with github, who has been hit with several NSLs regarding a repository I setup documenting what the massacre of my church during the height of Sunday morning worship services in April.

I'm asking that everyone please clone this repository: https://github.com/micahonamission/mcbcm.git

The government has worked tirelessly to compromise my iPhone and finally stole it at gunpoint, it is the second time they have stolen one.  My iPhones/iCloud held the only passkey to login.  So I've created a new one account and cloned the repository knowing they will be trying to modify it.  

I need clones so they cannot keep this hidden.  The existing coverup operation has turned into a genocide in Kentucky.

Please clone and fetch it often.  I'm going to pull in all of the untouched wiki content into this repository later from the existing forks.

Thank you,
-micah

p.s. Before you consume any information about my church, the Creator asks you to pray for guidance as there is alot of information that is circulating intended to confuse you or cover the matter up.  The Creator will let you know, He's just not intrusive, so you need to ask Him for some guidance first.  Thanks!

^ permalink raw reply

* Re: Allow adding files from git submodule to parent git repo if they are in .gitignore in the submodule
From: Junio C Hamano @ 2023-12-13 15:38 UTC (permalink / raw)
  To: Alexey Murz Korepov; +Cc: git
In-Reply-To: <CAL5pyKseyxEfLbgEkDHSqWQVLGkrrcZU=BzRx2zZOaCdQbMaoA@mail.gmail.com>

Alexey Murz Korepov <murznn@gmail.com> writes:

> ```
> $ git add -f mysubmodule/config.yaml
> fatal: Pathspec 'mysubmodule/config.yaml' is in submodule 'mysubmodule'
> ```
> But I see no problems with doing this, if the file is in .gitignore of
> the mysubmodule submodule.
>
> So, let's discuss the approach of how we can allow this in git?

What should happen when mysubmodule/.gitignore changes?  Who
guarantees that such a file stolen by the superproject will stay in
the .gitignore file in the submodule?

I do not think you want to go there, and I do not think I want to
see such a behaviour in Git, as we would be the ones who will be
blamed for the resulting mess.

The right solution probably is to keep such a file in a directory
that is separate from the directory where you bind your submodule.
Track the file in the superproject, and have a build procedure to
copy it into the submodule directory (as an untracked file, of
course).

^ permalink raw reply

* Re: Communication channel interruption
From: Junio C Hamano @ 2023-12-13 15:32 UTC (permalink / raw)
  To: Alexander Zhirov; +Cc: git
In-Reply-To: <71f2b28b-3e27-4773-8408-2f4c13757b73@gmail.com>

Alexander Zhirov <azhirov1991@gmail.com> writes:

> When cloning a repository from GitHub, I have a channel break,
> although the connection is stable. What could be the problem?
>
> # git clone https://github.com/Thinstation/thinstation.git
> Cloning into 'thinstation'...
> remote: Enumerating objects: 79839, done.
> remote: Counting objects: 100% (535/535), done.
> remote: Compressing objects: 100% (319/319), done.
> error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly:
> CANCEL (err 8)
> error: 7473 bytes of body are still expected
> fetch-pack: unexpected disconnect while reading sideband packet
> fatal: early EOF
> fatal: fetch-pack: invalid index-pack output

A guess in the dark without knowing Git version or platform on which
you are seeing this symptom, but it would help to read "What does
http.postBuffer really do?" in 

    $ git help faq

and then to ask Google about "RPC failed; curl 92", which would show
mainly two suggestions from various places.  Perhaps

    $ git -c http.version=HTTP/1.1 clone https://github.com/Thinstation/thinstation.git

may help?


^ permalink raw reply

* Re: [PATCH v6] subtree: fix split processing with multiple subtrees present
From: Zach FettersMoore @ 2023-12-13 15:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Christian Couder, Zach FettersMoore via GitGitGadget, git
In-Reply-To: <xmqqzfyfoy2w.fsf@gitster.g>

Christian Couder <christian.couder@gmail.com> writes:

>>> > $ git subtree split --prefix=apollo-ios-codegen --squash --rejoin
>>> > Merge made by the 'ort' strategy.
>>> > e274aed3ba6d0659fb4cc014587cf31c1e8df7f4
>>>
>>> Looking into this some it looks like it could be a bash config
>>> difference? My machine always runs it all the way through vs
>>> failing for recursion depth. Although that would also be an issue
>>> which is solved by this fix.
>>
>> I use Ubuntu where /bin/sh is dash so my current guess is that dash
>> might have a smaller recursion limit than bash.
>
> That sounds quite bad. Does it have to be recursive (iow, if we can
> rewrite the logic to be iterative instead, that would be a much better
> way to fix the issue)?

I don't think an iterative vs recursive approach fixes this
particular issue, the root of the issue this patch is fixing
is that lots of commits from the history of subtrees not
being acted upon are being processed when they don't need to
be. So the iterative approach would likely resolve the
recursion limit issue for some shells, but in my instance
I don't see a recursion limit error, it just takes an
extraordinary amount of time to run the split command
because of all the unnecessary processing which needs to be
avoided which this patch fixes.

^ permalink raw reply

* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: Junio C Hamano @ 2023-12-13 15:17 UTC (permalink / raw)
  To: Jeff King
  Cc: René Scharfe, AtariDreams via GitGitGadget, git, Seija Kijin
In-Reply-To: <20231213080143.GA1684525@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Without digging into the code, I had just assumed that flipped_block was
> used as an array index. But it really is a boolean, so I actually think
> "flipped_block = !flipped_block" would probably be the most clear (but
> IMHO not really worth the churn).

;-)

> I don't even know that we'd need much of a weather-balloon patch. I
> think it would be valid to do:
>
>   #ifndef bool
>   #define bool int
>
> to handle pre-C99 compilers (if there even are any these days). Of
> course we probably need some conditional magic to try to "#include
> <stdbool.h>" for the actual C99. I guess we could assume C99 by default
> and then add NO_STDBOOL as an escape hatch if anybody complains.

Sounds good.


^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Junio C Hamano @ 2023-12-13 15:15 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Ramsay Jones, git, Taylor Blau, Phillip Wood
In-Reply-To: <ZXlfeWtDgr1GQFCL@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

>> Me neither, but once you start thinking about getting rid of the
>> need to use one-file-per-ref filesystem, being able to maintain all
>> refs, including the pseudo refs, in one r/w store backend, becomes a
>> very tempting goal.  From that point of view, I do not have problem
>> with the idea to move _all_ pseudorefs to reftable.
>
> Yes, we're in agreement.
>
>> But I do have reservations on what Patrick, and the code he
>> inherited from Han-Wen, calls "special refs" (which is not defined
>> in the glossary at all), namely, refs.c:is_special_ref() and its
>> callers.
>
> I do not want to add "special refs" to the glossary because ultimately
> they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD.
> Once we're there we can of course discuss whether we want to explicitly
> point them out in the glossary and give them a special name.

OK, I somehow got a (wrong) impression that you are very close to
the finish line.  If the intention is to leave many others still in
the "special" category (for only the reasons of inertia), with a
vision that some selected few must remain "special" with their own
good reasons, then I am perfectly fine.

Thanks.

^ permalink raw reply

* Re: [PATCH] tests: prefer host Git to verify chainlint self-checks
From: Junio C Hamano @ 2023-12-13 15:11 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Eric Sunshine
In-Reply-To: <ZXlbNlG28e1sAYPU@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

>> I do not think "prefer host Git" is necessarily a good idea; falling
>> back to use host Git is perfectly fine, of course.
>
> Why is that, though?

Mostly because your "differences in features supported by just-built
one and what happens to be on $PATH can cause problems" cuts both
ways, and as a general principle to work around such issues, using
just-built one is a better discipline.  The features the build
infrastructure ("self-checks" being discussed is a part of it) of a
particular version of Git source depends on are more likely to be
found in the binary that will be built from the matching source,
than what happens to be on $PATH that may be from a year-old version.

As you said, you'd need to accomodate need for those who are
initially porting or building Git on a host without an installed
one.  If we were doing a cross build where just-built on would not
be executable on the host, whatever version on $PATH (or in
/usr/bin) may have to be used, but then in such a case you would not
be testing on host anyway.  For these two reasons, it is a given
that one of the choices has to be to use just-built one.  We can
safely give lower precedence to the host tool.

The only one-and-half practical reasons we may want to fall back to
whatever happens to be on $PATH are:

 - just-built one is so broken that even the simple use by the build
   infrastructure (like "self-checks") does not work (but then it
   becomes "if it is so broken, fix it before even thinking about
   running tests", and that is why I count it as half a reason), or

 - we are bisecting down to an ancient version, and just-built one
   from such a version may not understand the current repository.

so I do think it is an excellent workaround to fall back to a
version of Git with unknown vintage that happens to be on $PATH,
than failing and stopping by relying only on just-built one.

> We already use host Git in other parts of our build
> infra, and the options we pass to git-diff(1) have been around for ages:

It only argues for "trying host one, if available, before just-built
one does not hurt for this particular case".  But I was interested
in laying out a more general principle we can follow in similar
situations in the future.

^ permalink raw reply

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Junio C Hamano @ 2023-12-13 14:54 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Jeff King, git, Taylor Blau,
	Carlos Andrés Ramírez Cataño
In-Reply-To: <ZXlYIZ0Hb1kN84NU@tanuki>

Patrick Steinhardt <ps@pks.im> writes:

>>  static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
>>  {
>> -	int c;
>>  	int take_next_literally = 0;
>>  
>> -	while ((c = *in++) != 0) {
>> +	while (*in) {
>> +		int c = *in++;
>>  		if (take_next_literally == 1) {
>>  			take_next_literally = 0;
>>  		} else {
>
> I was wondering whether we want to convert `unquote_quoted_pair()` in
> the same way. It's not subject to the same issue because it doesn't
> return `in` to its callers. But it would lower the cognitive overhead a
> bit by keeping the coding style consistent. But please feel free to
> ignore this remark.

Yeah, I was wondering about the value of establishing a pattern that
can be followed safely and with clarity, too.  I also briefly
wondered how bad if we picked the "compensate for the increment done
by the last iteration before leaving the loop by returning (in-1)"
idiom (which Peff called "a hacky way") to be that universal
pattern, but this bug was a clear enough evidence that it does not
work very well in developers' minds.

I actually had trouble with the proposed update, and wondered if

-	while ((c = *in++) != 0) {
+	while ((c = *in)) {
+		in++;

is easier to follow, but then was hit by the possibility that the
same "we have incremented 'in' a bit too early" may exist if such
a loop wants to use 'in' in its body.  Wouldn't it mean that

-	while ((c = *in++) != 0) {
+	for (; c = *in; in++) {

would be even a better rewrite?

Enough bikeshedding...

> Another thing I was wondering about is the recursive nature of these
> functions, and whether it can lead to similar issues like we recently
> had when parsing revisions with stack exhaustion. I _think_ this should
> not be much of a problem in this case though, as we're talking about
> mail headers here. While the length of header values isn't restricted
> per se (the line length is restricted to 1000, but with Comment Folding
> Whitespace values can span multiple lines), but mail provdiers will sure
> clamp down on mails with a "From:" header that is megabytes in size.

Good thinking, and I think Peff's iterative rewrite would be a good
way to deal with it if it ever becomes an issue.

> So taken together, this looks good to me.

Thanks, both, for writing and reviewing.

^ permalink raw reply

* Re: New attempt to export test-lib from Git, maybe Sharness2?
From: Jiang Xin @ 2023-12-13 14:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jiang Xin, Mathias Lafeldt, Christian Couder
In-Reply-To: <xmqqcyvfmccr.fsf@gitster.g>

On Sun, Dec 10, 2023 at 2:59 AM Junio C Hamano <gitster@pobox.com> wrote:
> Is it a viable option to stick to the name "test-lib" (or possibly,
> "git-test-lib" to make it more prominent to say where it came from)?

Will rename the project to "git-test-lib".

> If you do not plan to coordinate with those who work on (the remnant
> of) the original sharness based on an ancient version of our test
> framework, and do not plan to actively transition its users to your
> version, it is less confusing if you named yours differently, as it
> avoids hinting that your version is a successor of theirs.
>
> I am not sure if reusing the history of our project verbatim using
> filter-repo is really a good way to help those who are interested in
> the test framework, by the way.

If all historical commits were squashed, it would be difficult to
track future changes of test-lib in the Git project. Futhermore, who
will be the author of the squashed commit?

> and unedited filter-repo result will describe such a commit
> primarily to explain why the changes in the commit was made on Git
> side.  Most of the changes described in the resulting commit message
> are discarded by filter-repo and the resulting history becomes hard
> to follow.

One solution is to add notes in the README file and users can refer to
the commit history of the git project.

^ permalink raw reply

* [PATCH 2/1] test-lib-functions: add object size functions
From: René Scharfe @ 2023-12-13 12:28 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Ondrej Pohorelsky, brian m . carlson, Junio C Hamano
In-Reply-To: <ff735aac-b60b-4d52-a6dc-180ab504fc8d@web.de>

Add test_object_size and its helpers test_loose_object_size and
test_packed_object_size, which allow determining the size of a Git
object using only the low-level Git commands rev-parse and show-index.

Use it in t6300 to replace the bare-bones function test_object_file_size
as a motivating example.  There it provides the expected output of the
high-level Git command for-each-ref.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
So how about this?  I'm a bit nervous about all the rules about output
descriptors and error propagation and whatnot in the test library, but
this implementation seems simple enough and might be useful in more than
one test.  No idea how to add support for alternate object directories,
but I doubt we'll ever need it.
---
 t/t6300-for-each-ref.sh | 16 ++++++--------
 t/test-lib-functions.sh | 47 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 843a7fe143..4687660f38 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -20,12 +20,6 @@ setdate_and_increment () {
     export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 }

-test_object_file_size () {
-	oid=$(git rev-parse "$1")
-	path=".git/objects/$(test_oid_to_path $oid)"
-	test_file_size "$path"
-}
-
 test_expect_success setup '
 	# setup .mailmap
 	cat >.mailmap <<-EOF &&
@@ -40,7 +34,11 @@ test_expect_success setup '
 	git branch -M main &&
 	setdate_and_increment &&
 	git tag -a -m "Tagging at $datestamp" testtag &&
+	testtag_oid=$(git rev-parse refs/tags/testtag) &&
+	testtag_disksize=$(test_object_size $testtag_oid) &&
 	git update-ref refs/remotes/origin/main main &&
+	commit_oid=$(git rev-parse refs/heads/main) &&
+	commit_disksize=$(test_object_size $commit_oid) &&
 	git remote add origin nowhere &&
 	git config branch.main.remote origin &&
 	git config branch.main.merge refs/heads/main &&
@@ -129,7 +127,7 @@ test_atom head push:strip=1 remotes/myfork/main
 test_atom head push:strip=-1 main
 test_atom head objecttype commit
 test_atom head objectsize $((131 + hexlen))
-test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
+test_atom head objectsize:disk $commit_disksize
 test_atom head deltabase $ZERO_OID
 test_atom head objectname $(git rev-parse refs/heads/main)
 test_atom head objectname:short $(git rev-parse --short refs/heads/main)
@@ -203,8 +201,8 @@ test_atom tag upstream ''
 test_atom tag push ''
 test_atom tag objecttype tag
 test_atom tag objectsize $((114 + hexlen))
-test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag)
-test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main)
+test_atom tag objectsize:disk $testtag_disksize
+test_atom tag '*objectsize:disk' $commit_disksize
 test_atom tag deltabase $ZERO_OID
 test_atom tag '*deltabase' $ZERO_OID
 test_atom tag objectname $(git rev-parse refs/tags/testtag)
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 9c3cf12b26..9b49645f77 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1733,6 +1733,53 @@ test_oid_to_path () {
 	echo "${1%$basename}/$basename"
 }

+test_loose_object_size () {
+	test "$#" -ne 1 && BUG "1 param"
+	local path=$(test_oid_to_path "$1")
+	test_file_size "$(git rev-parse --git-path "objects/$path")" 2>&4
+}
+
+test_packed_object_size () {
+	test "$#" -ne 2 && BUG "2 params"
+	local oid=$1 idx=$2 packsize rawsz end
+
+	packsize=$(test_file_size "${idx%.idx}.pack")
+	rawsz=$(test_oid rawsz)
+	end=$(($packsize - $rawsz))
+
+	git show-index <"$idx" |
+	awk -v oid="$oid" -v end="$end" '
+		$2 == oid {start = $1}
+		{offsets[$1] = 1}
+		END {
+			if (!start || start >= end)
+				exit 1
+			for (o in offsets)
+				if (start < o && o < end)
+					end = o
+			print end - start
+		}
+	' && return 0
+
+	echo >&4 "error: '$oid' not found in '$idx'"
+	return 1
+}
+
+test_object_size () {
+	test "$#" -ne 1 && BUG "1 param"
+	local oid=$1
+
+	test_loose_object_size "$oid" 4>/dev/null && return 0
+
+	for idx in "$(git rev-parse --git-path objects/pack)"/pack-*.idx
+	do
+		test_packed_object_size "$oid" "$idx" 4>/dev/null && return 0
+	done
+
+	echo >&4 "error: '$oid' not found"
+	return 1
+}
+
 # Parse oids from git ls-files --staged output
 test_parse_ls_files_stage_oids () {
 	awk '{print $2}' -
--
2.43.0

^ permalink raw reply related

* Re: Problems with Windows + schannel + http.sslCert
From: Johannes Schindelin @ 2023-12-13 11:56 UTC (permalink / raw)
  To: Ragesh Krishna; +Cc: git@vger.kernel.org
In-Reply-To: <TY0PR06MB544239787E909DD4EAC1CA42D189A@TY0PR06MB5442.apcprd06.prod.outlook.com>

Hi Ragesh,

On Sat, 9 Dec 2023, Ragesh Krishna wrote:

> I'm trying to use SSL client auth on Windows. My git installation
> currently lists only schannel as the supported backend.

There is a reason why only Secure Channel is listed as supported backend,
and it is a relatively boring one: DLL hell.

However, contrary to what `git -c http.sslBackend=list ls-remote <url>`
says on Windows, `git -c http.sslBackend=openssl ls-remote <url>` should
work, too.

Ciao,
Johannes

^ permalink raw reply

* Allow adding files from git submodule to parent git repo if they are in .gitignore in the submodule
From: Alexey Murz Korepov @ 2023-12-13 11:07 UTC (permalink / raw)
  To: git

Git submodules are pretty often used to "import" some external git
repository into your git repository, to not manage its files in your
git tree, that's good.

But very often we need to add some file to that submodule and manage
it in our local git repository, and this file is usually in .gitignore
in the submodule's git.

So, the example task is:

Copy a mysubmodule/config.example.yaml to mysubmodule/config.yaml
(which is in .gitignore in the mysubmodule), make local changes, and
store it in the local (parent) root git repo.

But the problem is that we can't add this mysubmodule/config.yaml file
to our root git repository, because receiving an error:
```
$ git add -f mysubmodule/config.yaml
fatal: Pathspec 'mysubmodule/config.yaml' is in submodule 'mysubmodule'
```
But I see no problems with doing this, if the file is in .gitignore of
the mysubmodule submodule.

So, let's discuss the approach of how we can allow this in git?

Or will be glad to hear and discuss alternatives and workarounds for this task.

^ permalink raw reply

* Communication channel interruption
From: Alexander Zhirov @ 2023-12-13  9:54 UTC (permalink / raw)
  To: git

When cloning a repository from GitHub, I have a channel break, although 
the connection is stable. What could be the problem?

# git clone https://github.com/Thinstation/thinstation.git
Cloning into 'thinstation'...
remote: Enumerating objects: 79839, done.
remote: Counting objects: 100% (535/535), done.
remote: Compressing objects: 100% (319/319), done.
error: RPC failed; curl 92 HTTP/2 stream 5 was not closed cleanly: 
CANCEL (err 8)
error: 7473 bytes of body are still expected
fetch-pack: unexpected disconnect while reading sideband packet
fatal: early EOF
fatal: fetch-pack: invalid index-pack output

^ permalink raw reply

* Re: [PATCH] mailinfo: fix out-of-bounds memory reads in unquote_quoted_pair()
From: Jeff King @ 2023-12-13  8:20 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: git, Taylor Blau, Carlos Andrés Ramírez Cataño
In-Reply-To: <ZXlYIZ0Hb1kN84NU@tanuki>

On Wed, Dec 13, 2023 at 08:07:21AM +0100, Patrick Steinhardt wrote:

> >  static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
> >  {
> > -	int c;
> >  	int take_next_literally = 0;
> >  
> > -	while ((c = *in++) != 0) {
> > +	while (*in) {
> > +		int c = *in++;
> >  		if (take_next_literally == 1) {
> >  			take_next_literally = 0;
> >  		} else {
> 
> I was wondering whether we want to convert `unquote_quoted_pair()` in
> the same way. It's not subject to the same issue because it doesn't
> return `in` to its callers. But it would lower the cognitive overhead a
> bit by keeping the coding style consistent. But please feel free to
> ignore this remark.

I dunno. I don't think the consistency matters that much between
functions, and on its own, the "while ((c = *in++))" pattern is not
harmful. It's used elsewhere in the same file for other functions that
are also fine (for decoding q/b headers).

> Another thing I was wondering about is the recursive nature of these
> functions, and whether it can lead to similar issues like we recently
> had when parsing revisions with stack exhaustion. I _think_ this should
> not be much of a problem in this case though, as we're talking about
> mail headers here. While the length of header values isn't restricted
> per se (the line length is restricted to 1000, but with Comment Folding
> Whitespace values can span multiple lines), but mail provdiers will sure
> clamp down on mails with a "From:" header that is megabytes in size.

It's just unquote_comment() that is recursive, but yeah, there is
nothing to stop it from recursing forever on "((((((...". The stack
requirements are pretty small, though. I needed between 2^17 and 2^18
bytes to segfault on my machine using:

  perl -e 'print "From: ", "(" x 2**18;' |
  git mailinfo /dev/null /dev/null

Absurdly big for an email, but maybe within the realm of possibility? I
think it might be possible to drop the recursion and just use a depth
counter, like this:

diff --git a/mailinfo.c b/mailinfo.c
index 737b9e5e13..db236f9f9f 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -59,6 +59,7 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 {
 	int take_next_literally = 0;
+	int depth = 1;
 
 	strbuf_addch(outbuf, '(');
 
@@ -72,11 +73,14 @@ static const char *unquote_comment(struct strbuf *outbuf, const char *in)
 				take_next_literally = 1;
 				continue;
 			case '(':
-				in = unquote_comment(outbuf, in);
+				strbuf_addch(outbuf, '(');
+				depth++;
 				continue;
 			case ')':
 				strbuf_addch(outbuf, ')');
-				return in;
+				if (!--depth)
+					return in;
+				continue;
 			}
 		}
 
I doubt it's a big deal either way, but if it's that easy to do it might
be worth it.

-Peff

^ permalink raw reply related

* Re: [PATCH] Use ^=1 to toggle between 0 and 1
From: Jeff King @ 2023-12-13  8:01 UTC (permalink / raw)
  To: René Scharfe
  Cc: AtariDreams via GitGitGadget, git, AtariDreams, Seija Kijin
In-Reply-To: <8bea38fe-38a3-412a-b189-541a6596d623@web.de>

On Tue, Dec 12, 2023 at 11:30:03PM +0100, René Scharfe wrote:

> Am 12.12.23 um 21:09 schrieb Jeff King:
> > On Tue, Dec 12, 2023 at 05:17:47PM +0000, AtariDreams via GitGitGadget wrote:
> >
> >> diff --git a/diff.c b/diff.c
> >> index 2c602df10a3..91842b54753 100644
> >> --- a/diff.c
> >> +++ b/diff.c
> >> @@ -1191,7 +1191,7 @@ static void mark_color_as_moved(struct diff_options *o,
> >>  							    &pmb_nr);
> >>
> >>  			if (contiguous && pmb_nr && moved_symbol == l->s)
> >> -				flipped_block = (flipped_block + 1) % 2;
> >> +				flipped_block ^= 1;
> >>  			else
> >>  				flipped_block = 0;
> >
> > This one I do not see any problem with changing, though I think it is a
> > matter of opinion on which is more readable (I actually tend to think of
> > "x = 0 - x" as idiomatic for flipping).
> 
> Did you mean "x = 1 - x"?

Oops, yes, of course. I'm not sure how I managed to fumble that.

> I don't particular like this; it repeats x and seems error-prone. ;-)

Yes. :)

Without digging into the code, I had just assumed that flipped_block was
used as an array index. But it really is a boolean, so I actually think
"flipped_block = !flipped_block" would probably be the most clear (but
IMHO not really worth the churn).

> Can we salvage something from this bikeshedding exercise?  I wonder if
> it's time to use the C99 type _Bool in our code.  It would allow
> documenting that only two possible values exist in cases like the one
> above.  That would be even more useful for function returns, I assume.

Hmm, possibly. I guess that might have helped my confusion, and I do
think returning bool for function returns would help make their meaning
more clear (it would help distinguish them from the usual "0 for
success" return values).

I don't even know that we'd need much of a weather-balloon patch. I
think it would be valid to do:

  #ifndef bool
  #define bool int

to handle pre-C99 compilers (if there even are any these days). Of
course we probably need some conditional magic to try to "#include
<stdbool.h>" for the actual C99. I guess we could assume C99 by default
and then add NO_STDBOOL as an escape hatch if anybody complains.

-Peff

^ permalink raw reply

* Re: [PATCH v2 1/4] wt-status: read HEAD and ORIG_HEAD via the refdb
From: Patrick Steinhardt @ 2023-12-13  7:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ramsay Jones, git, Taylor Blau, Phillip Wood
In-Reply-To: <xmqq34w7os53.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 3505 bytes --]

On Tue, Dec 12, 2023 at 04:36:24PM -0800, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
> >> "via the refdb" -> "via the refs API" or something here and on the
> >> title, and possibly elsewhere in the proposed log messages and
> >> in-code comments in patches in this series, as I've never seen a
> >> word "refdb" used in the context of this project.
> >> 
> >> I agree it is bad manners to be intimate with the implementation
> >> details of the how files-backend stores HEAD and ORIG_HEAD.
> >
> > Hmm, I have never thought of the 'pseudo-refs' as being a part of
> > the 'reference database' at all. ;)
> 
> Me neither, but once you start thinking about getting rid of the
> need to use one-file-per-ref filesystem, being able to maintain all
> refs, including the pseudo refs, in one r/w store backend, becomes a
> very tempting goal.  From that point of view, I do not have problem
> with the idea to move _all_ pseudorefs to reftable.

Yes, we're in agreement.

> But I do have reservations on what Patrick, and the code he
> inherited from Han-Wen, calls "special refs" (which is not defined
> in the glossary at all), namely, refs.c:is_special_ref() and its
> callers.

I do not want to add "special refs" to the glossary because ultimately
they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD.
Once we're there we can of course discuss whether we want to explicitly
point them out in the glossary and give them a special name.

> Neither am I very much sympathetic to the hardcoded list
> of "known" pseudorefs, refs.c:pseudorefs[].  I cannot quite see why
> we need anything more than


>     any string that passes refs.c:is_pseudoref_syntax() is a
>     pseudoref, is per worktree, and ref backends can store them like
>     any other refs.  Many of them have specific meaning and uses
>     (e.g. HEAD is "the current branch").
> 
> Enumerating existing pseudorefs in files backend may need to
> opendir(".git") + readdir() filtered with is_pseudoref_syntax(),
> and a corresponding implementation for reftable backend may be much
> simpler (because there won't be "other cruft" stored there, unlike
> files backend that needs to worry about files that are not refs,
> like ".git/config" file.
> 
> > We seem to have pseudo-refs, special pseudo-refs and (recently)
> > ex-pseudo-refs!
> >
> > This patch (well series) changes the 'status' of some, *but not all*,
> > pseudo-refs; some graduate to full-blown refs stored as part of *a*
> > reference database (ie reftable).
> 
> Yeah, that leaves bad taste in my mouth, too.

I'm taking an iterative approach to things, which means that we're at
times going to be in an in-between state. I want to avoid changing too
many things at once and overwhelming potential reviewers. But I realize
that I should've done a better job of explaining that this patch series
is not the end goal, but rather a step towards that goal.

The patch series at hand merely records the status quo and rectifies any
inconsistencies we have with accessing such "special" refs. The natural
next step here would be to reduce the list of special refs (like e.g. we
do in patch 4) so that in the end it will only contain those refs which
really are special (FETCH_HEAD, MERGE_HEAD).

Please let me know in case you strongly disagree with my iterative
approach, or whether the issue is rather that I didn't make myself
sufficiently clear.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 2/4] refs: propagate errno when reading special refs fails
From: Patrick Steinhardt @ 2023-12-13  7:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, Phillip Wood
In-Reply-To: <xmqqfs07qi66.fsf@gitster.g>

[-- Attachment #1: Type: text/plain, Size: 1736 bytes --]

On Tue, Dec 12, 2023 at 12:28:49PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Some refs in Git are more special than others due to reasons explained
> > in the next commit. These refs are read via `refs_read_special_head()`,
> > but this function doesn't behave the same as when we try to read a
> > normal ref. Most importantly, we do not propagate `failure_errno` in the
> > case where the reference does not exist, which is behaviour that we rely
> > on in many parts of Git.
> >
> > Fix this bug by propagating errno when `strbuf_read_file()` fails.
> 
> Hmph, I thought, judging from what [1/4] did, that your plan was to
> use the refs API, instead of peeking directly into the filesystem,
> to access these pseudo refs, and am a bit puzzled if it makes all
> that much difference to fix errno handling while still reading
> directly from the filesystem.  Perhaps such a conversion happens in
> later steps of this series (or a follow on series after this series
> lands)?

Yes, that's ultimately the goal. The patch series here is only setting
the stage by recording what we have, and addressing cases where we are
inconsistently accessing refs via the filesystem _and_ via the ref API.
Once this lands I do want create a follow up patch series that converts
all refs to be non-special to the extent possible.

I say "to the extent possible" though because in the end there will be
two refs that we must continue to treat specially: FETCH_HEAD and
MERGE_HEAD, which we were treating special before this patch series
already. Both of these are not normal refs because they may contain
multiple values with annotations, so they will need to stay special.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox