Git development
 help / color / mirror / Atom feed
* Re: [PATCH] t5512.40 sometimes dies by SIGPIPE
From: Eric Sunshine @ 2024-09-13 19:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Patrick Steinhardt, Jeff King
In-Reply-To: <xmqqmskbwe1a.fsf@gitster.g>

On Fri, Sep 13, 2024 at 3:26 PM Junio C Hamano <gitster@pobox.com> wrote:
> The last test in t5512 we recently added seems to be flaky.
> Running
>
>     $ make && cd t && sh ./t5512-ls-remote.sh --stress
>
> shows that "git ls-remote foo::bar" exited with status 141, which
> means we got a SIGPIPE.  This test piece was introduced by 9e89dcb6
> (builtin/ls-remote: fall back to SHA1 outside of a repo, 2024-08-02)
> and is pretty much independent from all other tests in the script
> (it can even run standalone with everything before it removed).
>
> The transport-helper.c:get_helper() function tries to write to the
> helper.  As we can see the helper script is very short and can exit
> even before it reads anything, when get_helper() tries to give the
> first command, "capabilities", the helper may already be gone.
>
> A trivial fix, presented here, os to make sure that the helper reads

s/os/is/

> the first command it is given, as what it writes later is a response
> to that command.
>
> I however would wonder if the interactions with the helper initiated
> by get_helper() should be done on a non-blocking I/O (we do check
> the return value from our write(2) system calls, do we?).
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply

* [PATCH] t5512.40 sometimes dies by SIGPIPE
From: Junio C Hamano @ 2024-09-13 19:26 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Jeff King

The last test in t5512 we recently added seems to be flaky.
Running

    $ make && cd t && sh ./t5512-ls-remote.sh --stress

shows that "git ls-remote foo::bar" exited with status 141, which
means we got a SIGPIPE.  This test piece was introduced by 9e89dcb6
(builtin/ls-remote: fall back to SHA1 outside of a repo, 2024-08-02)
and is pretty much independent from all other tests in the script
(it can even run standalone with everything before it removed).

The transport-helper.c:get_helper() function tries to write to the
helper.  As we can see the helper script is very short and can exit
even before it reads anything, when get_helper() tries to give the
first command, "capabilities", the helper may already be gone.

A trivial fix, presented here, os to make sure that the helper reads
the first command it is given, as what it writes later is a response
to that command.

I however would wonder if the interactions with the helper initiated
by get_helper() should be done on a non-blocking I/O (we do check
the return value from our write(2) system calls, do we?).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t5512-ls-remote.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git c/t/t5512-ls-remote.sh w/t/t5512-ls-remote.sh
index d64b40e408..64b3491e4e 100755
--- c/t/t5512-ls-remote.sh
+++ w/t/t5512-ls-remote.sh
@@ -406,6 +406,7 @@ test_expect_success 'v0 clients can handle multiple symrefs' '
 test_expect_success 'helper with refspec capability fails gracefully' '
 	mkdir test-bin &&
 	write_script test-bin/git-remote-foo <<-EOF &&
+	read capabilities
 	echo import
 	echo refspec ${SQ}*:*${SQ}
 	EOF

^ permalink raw reply related

* Re: [PATCH v3 6/6] Makefile: add option to build and test libgit-rs and libgit-rs-sys
From: brian m. carlson @ 2024-09-13 19:01 UTC (permalink / raw)
  To: Sean Allred
  Cc: Calvin Wan, git, steadmon, spectral, emilyshaffer, emrass,
	rsbecker, gitster, mh, Jason, dsimic
In-Reply-To: <m0seubo5q7.fsf@epic96565.epic.com>

[-- Attachment #1: Type: text/plain, Size: 1373 bytes --]

On 2024-09-07 at 15:15:12, Sean Allred wrote:
> Calvin Wan <calvinwan@google.com> writes:
> > Add libgitrs, libgitrs-sys, libgitrs-test, and libgitrs-sys-test targets
> > to their respective Makefiles so they can be built and tested without
> > having to run cargo build/test.
> 
> I feel like clippy should be run as part of these somehow, but I'm not
> sure where.

Yes, that seems like a good idea in CI.

> > +libgitrs-sys:
> > +	$(QUIET)(\
> > +		cd contrib/libgit-rs/libgit-sys && \
> > +		cargo build \
> > +	)
> > +.PHONY: libgitrs
> > +libgitrs:
> > +	$(QUIET)(\
> > +		cd contrib/libgit-rs && \
> > +		cargo build \
> > +	)
> 
> We should definitely be setting `RUSTFLAGS=-Dwarnings` as an analog to
> `-Wall` in the C world, no? These crates should build without warnings.

I believe -Dwarnings turns warnings into errors (at least it does in my
tests), which is equivalent to -Werror.  We don't want that because it
breaks compiling older code with newer versions of the compiler, which
makes it harder to bisect changes or compiler on the system compiler (or
sometimes, other architectures or OSes).

That would be fine for clippy, however, because that would only run in
CI, where we _would_ want to catch newer changes, but we want to
compile nonetheless.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: SyntaxWarning for   '\S'
From: Stephen Smith @ 2024-09-13 18:28 UTC (permalink / raw)
  To: brian m. carlson; +Cc: git
In-Reply-To: <ZuSDQPssBOujNCrF@tapette.crustytoothpaste.net>

Given that you don’t see it and that Ubuntu release is due a month from now I will see if it is part of the update.  If the fix isn’t part of the update I will report it. 

Thanks.  
SPS

> On Sep 13, 2024, at 11:24 AM, brian m. carlson <sandals@crustytoothpaste.net> wrote:
> 
> On 2024-09-10 at 03:49:33, Stephen Smith wrote:
>> When compiling Git from source on my Ubuntu machine I've lately been getting
>> some warnings when the docs are built.
>> 
>> An example of is:
>> 
>> ASCIIDOC git-sh-i18n--envsubst.html
>> <unknown>:1: SyntaxWarning: invalid escape sequence '\S'
>> <unknown>:1: SyntaxWarning: invalid escape sequence '\S'
>> 
>> This syntax warning shows up for nearly every man page or html file.
>> 
>> Is there are current documented solution?   If there isn't a documented
>> solution, where do I start looking and I will craft a patch and submit it.
> 
> I believe this is a warning from Python about the asciidoc program
> itself.  I don't see this on my Debian unstable system, so it may be
> that it's fixed upstream (you can check at
> https://gitlab.com/asciidoc3/asciidoc3/) and if so, you can open an
> issue with Ubuntu to get it fixed.
> 
> If it bothers you, you could also use Asciidoctor, which is written in
> Ruby, and which won't have that warning.
> --
> brian m. carlson (they/them or he/him)
> Toronto, Ontario, CA
> <signature.asc>

^ permalink raw reply

* Re: SyntaxWarning for   '\S'
From: brian m. carlson @ 2024-09-13 18:24 UTC (permalink / raw)
  To: Stephen Smith; +Cc: git
In-Reply-To: <2214912.irdbgypaU6@thunderbird>

[-- Attachment #1: Type: text/plain, Size: 1065 bytes --]

On 2024-09-10 at 03:49:33, Stephen Smith wrote:
> When compiling Git from source on my Ubuntu machine I've lately been getting
> some warnings when the docs are built.
>  
> An example of is:
> 
> ASCIIDOC git-sh-i18n--envsubst.html
> <unknown>:1: SyntaxWarning: invalid escape sequence '\S'
> <unknown>:1: SyntaxWarning: invalid escape sequence '\S'
> 
> This syntax warning shows up for nearly every man page or html file.
> 
> Is there are current documented solution?   If there isn't a documented
> solution, where do I start looking and I will craft a patch and submit it.

I believe this is a warning from Python about the asciidoc program
itself.  I don't see this on my Debian unstable system, so it may be
that it's fixed upstream (you can check at
https://gitlab.com/asciidoc3/asciidoc3/) and if so, you can open an
issue with Ubuntu to get it fixed.

If it bothers you, you could also use Asciidoctor, which is written in
Ruby, and which won't have that warning.
-- 
brian m. carlson (they/them or he/him)
Toronto, Ontario, CA

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 262 bytes --]

^ permalink raw reply

* Re: [PATCH v4 0/3] doc: introducing synopsis para
From: Junio C Hamano @ 2024-09-13 18:15 UTC (permalink / raw)
  To: Jean-Noël Avila via GitGitGadget
  Cc: git, Eric Sunshine, Jean-Noël Avila
In-Reply-To: <pull.1766.v4.git.1725573126.gitgitgadget@gmail.com>

"Jean-Noël Avila via GitGitGadget" <gitgitgadget@gmail.com> writes:

> In the continuation of the simplification of manpage editing, the synopsis
> processing that was developed for synopsis paragraph style is also applied
> to all inline backquoted texts.
>
> Refining the magic regexp took more time than expected, but this one should
> really enhance writers'experience. I had to fight a bit more with
> asciidoctor, due to discrepancies between version 2.0 on my laptop and the
> 1.5.6 used by Github actions.
>
> The git-init and git-clone manpages are converted to this new system.

The fact that such a "magic" processing will hide the gory details
from those whose primary interest is to describe the commands and
their options cuts both ways.  While I can understand that purists
would find it ugly, as `backticks` is now much more than a mark-up
that means "this text is typeset in monospace", it is a very welcome
thing for developers around here, who just want to write their
document in a way even whose source is readable without having to
worry about suh gory details.  Maybe this gets popular enough after
other projects notice what you did to AsciiDoctor, love it, adopt
it, and eventually it feeds back to improve AsciiDoctor proper ;-).

So, unless there are objections and people want to discuss it further,
I'll mark the topic for 'next' soonish.

Thanks.

^ permalink raw reply

* Re: [PATCH v2 2/3] builtin: remove USE_THE_REPOSITORY_VARIABLE from builtin.h
From: John Cai @ 2024-09-13 18:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: John Cai via GitGitGadget, git, Patrick Steinhardt
In-Reply-To: <xmqqikv26oq4.fsf@gitster.g>

On Wed, Sep 11, 2024 at 2:15 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "John Cai via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: John Cai <johncai86@gmail.com>
> >
> > Instead of including USE_THE_REPOSITORY_VARIABLE by default on every
> > builtin, remove it from builtin.h and add it to all the builtins that
> > reference the_repository.
> >
> > Also, remove the include statement for repository.h since it gets
> > brought in through builtin.h.
>
> Can we have _all_ builtin/*.c files that include "builtin.h" to gain
> "#define USE_THE_REPOSITORY_VARIABLE" in this step to make it more
> mechanical?  That way we do not have to go through this large patch
> manually to review it.
>
> Then another patch can immediately remove the "#define" (and doing
> nothing else) from some of the files in builtin/*.c with its commit
> message saying "These do not need implicit or explicit accesses to
> the_repository as-is", which would make it trivially reviewable,
> because such a claim in its commit message can trivially be verified
> by simply compiling these files.
>
> After that, manual work to remove implicit or explicit accesses to
> the_repository, which would remove the "#define" that becomes
> unnecessary, one-patch-per-file can build on top.  Each of them
> would be reviewable again.

Okay I see what you mean. Yeah I can do that.

>
> > The next step will be to migrate each builtin
> > from having to use the_repository.
>
> I am not sure what this "to migrate" refers to.  Is it referring
> exactly the same thing as what I called "manual work" above?

Yes, by migrate I meant the process of removing #define and passing in
the repository
argument through to replace the_repository global.

> Thanks.

^ permalink raw reply

* Re: [PATCH v2 1/3] builtin: add a repository parameter for builtin functions
From: John Cai @ 2024-09-13 17:54 UTC (permalink / raw)
  To: Patrick Steinhardt
  Cc: Jeff King, Junio C Hamano, John Cai via GitGitGadget, git
In-Reply-To: <ZuLItS8eruz2b_D7@pks.im>

On Thu, Sep 12, 2024 at 6:55 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> On Thu, Sep 12, 2024 at 06:50:40AM -0400, Jeff King wrote:
> > On Thu, Sep 12, 2024 at 06:43:20AM -0400, Jeff King wrote:
> >
> > > I do feel like I have only started seeing it in the last month or so. I
> > > wonder if a new version of binutils changed behavior or something.
> >
> > Since I had that Makefile reproduction, I ran it against an older
> > version of binutils. And yeah, it produces the same outcome. So it is
> > probably just some kind of Baader-Meinhof phenomenon.
>
> I was also reading through recent changes in binutils and couldn't spot
> anything. But in any case, I also had the feeling that this only started
> to pop up recently. Maybe it's not a change in binutils, but in our own
> build instructions. I couldn't find any smoking guns there either,
> though.

Sorry everyone. I totally missed that the garbage file got checked in
accidentally. my apologies.

>
> Patrick

^ permalink raw reply

* Re: [PATCH 4/4] remote: check branch names
From: Junio C Hamano @ 2024-09-13 17:49 UTC (permalink / raw)
  To: phillip.wood123
  Cc: Phillip Wood via GitGitGadget, git, Han Jiang, Phillip Wood
In-Reply-To: <4915a1ba-eda9-435b-b615-4f78c7fe25f7@gmail.com>

phillip.wood123@gmail.com writes:

> Thanks for the patch, I'll re-roll based on that. I wonder if we
> really want to support "@{-N}" when setting remote tracking branches
> though - should we be using INTERPRET_BRANCH_REMOTE instead when
> calling strbuf_branchname()?

Perhaps.  Users try to use "-" in surprising places, though ;-)


^ permalink raw reply

* Re: [PATCH 0/2] fix bare repositories with Git.pm
From: Junio C Hamano @ 2024-09-13 17:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Rodrigo, git
In-Reply-To: <20240912223413.GA649897@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> Yikes, good catch. That's a pretty bad bug. I'm surprised we didn't
> cover this in the tests, but it's specific to bare repositories.
>
>> Bug hunting through the Git.pm code and skimming through the Git SCM
>> repo, there's a significant change (commit 20da61f25) that makes the
>> recent Git.pm rely on:
>> 
>>      git rev-parse --is-bare-repository --git-dir
>
> Yep, I confirmed via bisection that that commit is the culprit.

Yeah, it is surprising that nobody noticed it since Dec 2022.

> It does fix all cases, but it leaves some redundant code in place.
> Here are two patches. The first does the minimal fix within the code
> (what 20da61f25 should have done!) and corrects the problem. The second
> switches to --absolute-git-dir and drops the now-redundant code.
>
> Thank you for a very thorough bug report!

Yup, thanks, both.

Queued.

^ permalink raw reply

* Re: [PATCH 2/4] remote: print an error if refspec cannot be removed
From: Junio C Hamano @ 2024-09-13 17:38 UTC (permalink / raw)
  To: phillip.wood123
  Cc: Phillip Wood via GitGitGadget, git, Han Jiang, Phillip Wood
In-Reply-To: <acd287c8-990b-42b7-85dd-a206a887b8ee@gmail.com>

phillip.wood123@gmail.com writes:

> On 11/09/2024 21:52, Junio C Hamano wrote:
>> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>   	if (!add_mode && remove_all_fetch_refspecs(key.buf)) {
>>> +		error(_("could not remove existing fetch refspec"));
>>>   		strbuf_release(&key);
>>>   		return 1;
>>>   	}
>> It is a minor point, but would it help to say what we tried to
>> remove (e.g. "from remote X") or is it too obvious to the end user
>> in the context they get this error?
>
> The user has to give the remote name on the command line so I think it
> should be obvious to the user.

That makes sense.  Thanks.

^ permalink raw reply

* Re: [PATCH 0/2] Simplify "commented" API functions
From: Junio C Hamano @ 2024-09-13 17:23 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git
In-Reply-To: <ZuPYZW3jYas4kJzC@pks.im>

Patrick Steinhardt <ps@pks.im> writes:

> I'm not quite sure I agree. The comment strings we have are in theory
> broken, because they can be configured differently per repository. So if
> you happen to have a Git command that operates on multiple repositories
> at once and that needs to pay attention to that config then it would now
> use the same comment character for both repositories, which I'd argue is
> the wrong thing to do.

Correct.  It needs a move from global to a member in a repository
instance, but the same "hey do not keep passing the same parameter"
motivation behind these patches applies, as the existing call sites
most likely will instead pass "the_repository->comment_line_str" to
these two functions.  The simplification would move that reference
to "the_repository->comment_line_str" down to these two functions.

> The recent patch series that makes "environment.c" aware of
> `USE_THE_REPOSITORY_VARIABLE` already converted some of the global
> config variables to be per-repository, because ultimately all of them
> are broken in the described way. So from my point of view we should aim
> to convert remaining ones to be per-repository, as well.

Yes, I view the environement change somewhat incomplete and it was
annoying to see things other than the_repository itself and those
that implicitly refer to the_repository covered by the CPP macro.

But we need to step back a bit in order to make the environment
change better.  Not everything works inside a repository and you may
not even have a repository but want to refer to a comment character
(say, "git bugreport" run outside a repository, perhaps, and the
bugreport may want to honor end-user configuration for commentChar
to mark its various sections).  Earlier I said it may make sense to
reimplement the global as a member of a repository instance, but
that is not entirely true.  Such a member in a repository struct may
be a good implementation detail for anybody who asks "what comment
character should I be using in the context I am called?", and there
may be "const char *get_comment_line_str(struct repository *)" that
accepts which repository to work with, but such a function would
need to be prepared to work without any repository, working out of
the system and per-user configuration files.

>   - It depends on a repository, but I'd argue it shouldn't such that we
>     can also query configuration in repo-less settings.
>
>   - `prepare_repo_settings()` makes up for a bad calling convention. I
>     think that lazy accessors are way easier to use.
>
> But it is a start, and something we can continue to build on.

Yup.

^ permalink raw reply

* [PATCH v4 5/5] ref: add symlink ref content check for files backend
From: shejialuo @ 2024-09-13 17:18 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano
In-Reply-To: <ZuRzCyjQFilGhj8j@ArchLinux>

We have already introduced "files_fsck_symref_target". We should reuse
this function to handle the symrefs which use legacy symbolic links. We
should not check the trailing garbage for symbolic refs. Add a new
parameter "symbolic_link" to disable some checks which should only be
executed for textual symrefs.

We firstly use the "strbuf_add_real_path" to resolve the symlink and
get the absolute path "referent_path" which the symlink ref points
to. Then we can get the absolute path "abs_gitdir" of the "gitdir".
By combining "referent_path" and "abs_gitdir", we can extract the
"referent". Thus, we can reuse "files_fsck_symref_target" function to
seamlessly check the symlink refs.

Because we consider deprecating writing the symbolic links and for
reading, we may or may not deprecate. We first need to asses whether
symbolic links may still be used. So, add a new fsck message
"symlinkRef(INFO)" to let the user be aware of this information.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  5 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 65 ++++++++++++++++++-----
 t/t0602-reffiles-fsck.sh      | 97 +++++++++++++++++++++++++++++++++++
 4 files changed, 154 insertions(+), 14 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 03bcb77972..31626e765b 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -186,6 +186,11 @@
 	(INFO) A ref does not end with newline. This will be
 	considered an error in the future.
 
+`symlinkRef`::
+	(INFO) A symref uses the symbolic link. This kind of symref may
+	be considered ERROR in the future when totally dropping the
+	symlink support.
+
 `trailingRefContent`::
 	(INFO) A ref has trailing content. This will be
 	considered an error in the future.
diff --git a/fsck.h b/fsck.h
index c90561c6db..b72ee632a4 100644
--- a/fsck.h
+++ b/fsck.h
@@ -89,6 +89,7 @@ enum fsck_msg_type {
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
 	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(SYMLINK_REF, INFO) \
 	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0cb4a2da73..c511deb509 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,4 +1,5 @@
 #include "../git-compat-util.h"
+#include "../abspath.h"
 #include "../copy.h"
 #include "../environment.h"
 #include "../gettext.h"
@@ -1950,10 +1951,13 @@ static int commit_ref_update(struct files_ref_store *refs,
 	return 0;
 }
 
+#ifdef NO_SYMLINK_HEAD
+#define create_ref_symlink(a, b) (-1)
+#else
 static int create_ref_symlink(struct ref_lock *lock, const char *target)
 {
 	int ret = -1;
-#ifndef NO_SYMLINK_HEAD
+
 	char *ref_path = get_locked_file_path(&lock->lk);
 	unlink(ref_path);
 	ret = symlink(target, ref_path);
@@ -1961,13 +1965,12 @@ static int create_ref_symlink(struct ref_lock *lock, const char *target)
 
 	if (ret)
 		fprintf(stderr, "no symlink - falling back to symbolic ref\n");
-#endif
 	return ret;
 }
+#endif
 
-static int create_symref_lock(struct files_ref_store *refs,
-			      struct ref_lock *lock, const char *refname,
-			      const char *target, struct strbuf *err)
+static int create_symref_lock(struct ref_lock *lock, const char *target,
+			      struct strbuf *err)
 {
 	if (!fdopen_lock_file(&lock->lk, "w")) {
 		strbuf_addf(err, "unable to fdopen %s: %s",
@@ -2583,8 +2586,7 @@ static int lock_ref_for_update(struct files_ref_store *refs,
 	}
 
 	if (update->new_target && !(update->flags & REF_LOG_ONLY)) {
-		if (create_symref_lock(refs, lock, update->refname,
-				       update->new_target, err)) {
+		if (create_symref_lock(lock, update->new_target, err)) {
 			ret = TRANSACTION_GENERIC_ERROR;
 			goto out;
 		}
@@ -3436,12 +3438,15 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 
 /*
  * Check the symref "referent" and "referent_path". For textual symref,
- * "referent" would be the content after "refs:".
+ * "referent" would be the content after "refs:". For symlink ref,
+ * "referent" would be the relative path agaignst "gitdir" which should
+ * be the same as the textual symref literally.
  */
 static int files_fsck_symref_target(struct fsck_options *o,
 				    struct fsck_ref_report *report,
 				    struct strbuf *referent,
-				    struct strbuf *referent_path)
+				    struct strbuf *referent_path,
+				    unsigned int symbolic_link)
 {
 	size_t len = referent->len - 1;
 	struct stat st;
@@ -3454,14 +3459,16 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
-	if (referent->buf[referent->len - 1] != '\n') {
+	if (!symbolic_link && referent->buf[referent->len - 1] != '\n') {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_REF_MISSING_NEWLINE,
 				      "missing newline");
 		len++;
 	}
 
-	strbuf_rtrim(referent);
+	if (!symbolic_link)
+		strbuf_rtrim(referent);
+
 	if (check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_BAD_REFERENT_NAME,
@@ -3469,7 +3476,7 @@ static int files_fsck_symref_target(struct fsck_options *o,
 		goto out;
 	}
 
-	if (len != referent->len) {
+	if (!symbolic_link && len != referent->len) {
 		ret = fsck_report_ref(o, report,
 				      FSCK_MSG_TRAILING_REF_CONTENT,
 				      "trailing garbage in ref");
@@ -3509,6 +3516,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 {
 	struct strbuf referent_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf abs_gitdir = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = {0};
@@ -3521,8 +3529,35 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
 	report.path = refname.buf;
 
-	if (S_ISLNK(iter->st.st_mode))
+	if (S_ISLNK(iter->st.st_mode)) {
+		const char* relative_referent_path;
+
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_SYMLINK_REF,
+				      "use deprecated symbolic link for symref");
+
+		strbuf_add_absolute_path(&abs_gitdir, ref_store->gitdir);
+		strbuf_normalize_path(&abs_gitdir);
+		if (!is_dir_sep(abs_gitdir.buf[abs_gitdir.len - 1]))
+			strbuf_addch(&abs_gitdir, '/');
+
+		strbuf_add_real_path(&referent_path, iter->path.buf);
+
+		if (!skip_prefix(referent_path.buf,
+				 abs_gitdir.buf,
+				 &relative_referent_path)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_ESCAPE_REFERENT,
+					      "point to target outside gitdir");
+			goto cleanup;
+		}
+
+		strbuf_addstr(&referent, relative_referent_path);
+		ret = files_fsck_symref_target(o, &report,
+					       &referent, &referent_path, 1);
+
 		goto cleanup;
+	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
 		ret = error_errno(_("unable to read ref '%s/%s'"),
@@ -3563,7 +3598,8 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		strbuf_rtrim(&referent_path);
 		ret = files_fsck_symref_target(o, &report,
 					       &referent,
-					       &referent_path);
+					       &referent_path,
+					       0);
 	}
 
 cleanup:
@@ -3571,6 +3607,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
 	strbuf_release(&referent_path);
+	strbuf_release(&abs_gitdir);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 9580c340ab..7c3579705f 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -326,4 +326,101 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success SYMLINKS 'symlink symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-good &&
+	test_cmp expect err &&
+
+	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-1: escapeReferent: point to target outside gitdir
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-1 &&
+	test_cmp expect err &&
+
+	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-2: escapeReferent: points to ref outside the refs directory
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-2 &&
+	test_cmp expect err &&
+
+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-3 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
+	error: refs/heads/branch-symbolic-3: badReferentName: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-symbolic-3 &&
+	test_cmp expect err &&
+
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-1: badReferentName: points to refname with invalid format
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-1 &&
+	test_cmp expect err &&
+
+	ln -sf ./ $tag_dir_prefix/tag-symbolic-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	error: refs/tags/tag-symbolic-2: badReferentFiletype: points to the directory
+	EOF
+	rm $tag_dir_prefix/tag-symbolic-2 &&
+	test_cmp expect err
+'
+
+test_expect_success SYMLINKS 'symlink symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	ln -sf ./main $branch_dir_prefix/branch-symbolic-good &&
+	ln -sf ../../../../branch $branch_dir_prefix/branch-symbolic-1 &&
+	ln -sf ../../logs/branch-bad $branch_dir_prefix/branch-symbolic-2 &&
+	ln -sf ./"branch   space" $branch_dir_prefix/branch-symbolic-3 &&
+	ln -sf ./".tag" $tag_dir_prefix/tag-symbolic-1 &&
+	ln -sf ./ $tag_dir_prefix/tag-symbolic-2 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-symbolic-1: escapeReferent: point to target outside gitdir
+	error: refs/heads/branch-symbolic-2: escapeReferent: points to ref outside the refs directory
+	error: refs/heads/branch-symbolic-3: badReferentName: points to refname with invalid format
+	error: refs/tags/tag-symbolic-1: badReferentName: points to refname with invalid format
+	error: refs/tags/tag-symbolic-2: badReferentFiletype: points to the directory
+	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/heads/branch-symbolic-good: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
+	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.0


^ permalink raw reply related

* [PATCH v4 4/5] ref: add symref content check for files backend
From: shejialuo @ 2024-09-13 17:18 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano
In-Reply-To: <ZuRzCyjQFilGhj8j@ArchLinux>

We have already introduced the checks for regular refs. There is no need
to check the consistency of the target which the symref points to.
Instead, we just need to check the content of the symref itself.

A regular file is accepted as a textual symref if it begins with
"ref:", followed by zero or more whitespaces, followed by the full
refname, followed only by whitespace characters. We always write
a single SP after "ref:" and a single LF after the refname, but
third-party reimplementations of Git may have taken advantage of the
looser syntax. Put it more specific, we accept the following contents
of the symref:

1. "ref: refs/heads/master   "
2. "ref: refs/heads/master   \n  \n"
3. "ref: refs/heads/master\n\n"

Thus, we could reuse "refMissingNewline" and "trailingRefContent"
FSCK_INFOs to do the same retroactive tightening as we introduce for
regular references.

But we do not allow any other trailing garbage. The followings are bad
symref contents which will be reported as fsck error by "git-fsck(1)".

1. "ref: refs/heads/master garbage\n"
2. "ref: refs/heads/master \n\n\n garbage  "

And we introduce a new "badReferentName(ERROR)" fsck message to report
above errors to the user.

In order to check the content of the symref, create a function
"files_fsck_symref_target". It will first check whether the "referent"
is under the "refs/" directory, if not, we will report "escapeReferent"
fsck error message to notify the user this situation.

Then, we will first check whether the symref content misses the newline
by peeking the last byte of the "referent" to see whether it is '\n'.

And we will remember the untrimmed length of the "referent" and call
"strbuf_rtrim()" on "referent". Then, we will call "check_refname_format"
to check whether the trimmed referent format is valid. If not, we will
report to the user that the symref points to referent which has invalid
format. If it is valid, we will compare the untrimmed length and trimmed
length, if they are not the same, we need to warn the user there is some
trailing garbage in the symref content.

At last, we need to check whether the referent is a directory. We cannot
distinguish whether a given reference like "refs/heads/a" is a file or a
directory by using "check_refname_format". We have already checked bad
file type when iterating the "refs/" directory but we ignore the
directory. Thus, we need to explicitly add check here.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |   9 +++
 fsck.h                        |   3 +
 refs/files-backend.c          |  81 +++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 117 ++++++++++++++++++++++++++++++++++
 4 files changed, 210 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 8827137ef0..03bcb77972 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -28,6 +28,12 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
+`badReferentFiletype`::
+	(ERROR) The referent of a symref has a bad file type.
+
+`badReferentName`::
+	(ERROR) The referent name of a symref is invalid.
+
 `badTagName`::
 	(INFO) A tag has an invalid format.
 
@@ -49,6 +55,9 @@
 `emptyName`::
 	(WARN) A path contains an empty name.
 
+`escapeReferent`::
+	(ERROR) The referent of a symref is outside the "ref" directory.
+
 `extraHeaderEntry`::
 	(IGNORE) Extra headers found after `tagger`.
 
diff --git a/fsck.h b/fsck.h
index b85072df57..c90561c6db 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,11 +34,14 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
+	FUNC(BAD_REFERENT_FILETYPE, ERROR) \
+	FUNC(BAD_REFERENT_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
 	FUNC(BAD_TYPE, ERROR) \
 	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(ESCAPE_REFERENT, ERROR) \
 	FUNC(MISSING_AUTHOR, ERROR) \
 	FUNC(MISSING_COMMITTER, ERROR) \
 	FUNC(MISSING_EMAIL, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index df4ce270ae..0cb4a2da73 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3434,11 +3434,80 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+/*
+ * Check the symref "referent" and "referent_path". For textual symref,
+ * "referent" would be the content after "refs:".
+ */
+static int files_fsck_symref_target(struct fsck_options *o,
+				    struct fsck_ref_report *report,
+				    struct strbuf *referent,
+				    struct strbuf *referent_path)
+{
+	size_t len = referent->len - 1;
+	struct stat st;
+	int ret = 0;
+
+	if (!starts_with(referent->buf, "refs/")) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_ESCAPE_REFERENT,
+				      "points to ref outside the refs directory");
+		goto out;
+	}
+
+	if (referent->buf[referent->len - 1] != '\n') {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_REF_MISSING_NEWLINE,
+				      "missing newline");
+		len++;
+	}
+
+	strbuf_rtrim(referent);
+	if (check_refname_format(referent->buf, 0)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT_NAME,
+				      "points to refname with invalid format");
+		goto out;
+	}
+
+	if (len != referent->len) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_TRAILING_REF_CONTENT,
+				      "trailing garbage in ref");
+	}
+
+	/*
+	 * Dangling symrefs are common and so we don't report them.
+	 */
+	if (lstat(referent_path->buf, &st)) {
+		if (errno != ENOENT) {
+			ret = error_errno(_("unable to stat '%s'"),
+					  referent_path->buf);
+		}
+		goto out;
+	}
+
+	/*
+	 * We cannot distinguish whether "refs/heads/a" is a directory or not by
+	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
+	 * check the file type of the target.
+	 */
+	if (S_ISDIR(st.st_mode)) {
+		ret = fsck_report_ref(o, report,
+				      FSCK_MSG_BAD_REFERENT_FILETYPE,
+				      "points to the directory");
+		goto out;
+	}
+
+out:
+	return ret;
+}
+
 static int files_fsck_refs_content(struct ref_store *ref_store,
 				   struct fsck_options *o,
 				   const char *refs_check_dir,
 				   struct dir_iterator *iter)
 {
+	struct strbuf referent_path = STRBUF_INIT;
 	struct strbuf ref_content = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
@@ -3484,12 +3553,24 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 					      "trailing garbage in ref");
 			goto cleanup;
 		}
+	} else {
+		strbuf_addf(&referent_path, "%s/%s",
+			    ref_store->gitdir, referent.buf);
+		/*
+		 * the referent may contain the spaces and the newline, need to
+		 * trim for path.
+		 */
+		strbuf_rtrim(&referent_path);
+		ret = files_fsck_symref_target(o, &report,
+					       &referent,
+					       &referent_path);
 	}
 
 cleanup:
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
 	strbuf_release(&referent);
+	strbuf_release(&referent_path);
 	return ret;
 }
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index a06ad044f2..9580c340ab 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -209,4 +209,121 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_cmp expect sorted_err
 '
 
+test_expect_success 'textual symref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	git refs verify 2>err &&
+	rm $branch_dir_prefix/branch-good &&
+	test_must_be_empty err &&
+
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-no-newline-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-1 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-trailing-3 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/a/b/branch-complicated &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format
+	EOF
+	rm $branch_dir_prefix/branch-bad-1 &&
+	test_cmp expect err &&
+
+	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory
+	EOF
+	rm $branch_dir_prefix/branch-bad-2 &&
+	test_cmp expect err &&
+
+	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory
+	EOF
+	rm $branch_dir_prefix/branch-bad-3 &&
+	test_cmp expect err
+'
+
+test_expect_success 'textual symref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "ref: refs/heads/branch\n" >$branch_dir_prefix/branch-good &&
+	printf "ref: refs/heads/branch" >$branch_dir_prefix/branch-no-newline-1 &&
+	printf "ref: refs/heads/branch     " >$branch_dir_prefix/a/b/branch-trailing-1 &&
+	printf "ref: refs/heads/branch\n\n" >$branch_dir_prefix/a/b/branch-trailing-2 &&
+	printf "ref: refs/heads/branch \n" >$branch_dir_prefix/a/b/branch-trailing-3 &&
+	printf "ref: refs/heads/branch \n  " >$branch_dir_prefix/a/b/branch-complicated &&
+	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
+	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
+	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format
+	error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory
+	error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory
+	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
+	warning: refs/heads/a/b/branch-trailing-1: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/a/b/branch-trailing-2: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/a/b/branch-trailing-3: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/branch-no-newline-1: refMissingNewline: missing newline
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.0


^ permalink raw reply related

* [PATCH v4 3/5] ref: add more strict checks for regular refs
From: shejialuo @ 2024-09-13 17:17 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano
In-Reply-To: <ZuRzCyjQFilGhj8j@ArchLinux>

We have already used "parse_loose_ref_contents" function to check
whether the ref content is valid in files backend. However, by
using "parse_loose_ref_contents", we allow the ref's content to end with
garbage or without a newline.

Even though we never create such loose refs ourselves, we have accepted
such loose refs. So, it is entirely possible that some third-party tools
may rely on such loose refs being valid. We should not report an error
fsck message at current. We should notify the users about such
"curiously formatted" loose refs so that adequate care is taken before
we decide to tighten the rules in the future.

And it's not suitable either to report a warn fsck message to the user.
We don't yet want the "--strict" flag that controls this bit to end up
generating errors for such weirdly-formatted reference contents, as we
first want to assess whether this retroactive tightening will cause
issues for any tools out there. It may cause compatibility issues which
may break the repository. So we add the following two fsck infos to
represent the situation where the ref content ends without newline or
has trailing garbages:

1. refMissingNewline(INFO): A ref does not end with newline. This will
   be considered an error in the future.
2. trailingRefContent(INFO): A ref has trailing content. This will be
   considered an error in the future.

It might appear that we can't provide the user with any warnings by
using FSCK_INFO. However, in "fsck.c::fsck_vreport", we will convert
FSCK_INFO to FSCK_WARN and we can still warn the user about these
situations when using "git refs verify" without introducing
compatibility issues.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  8 +++++
 fsck.h                        |  2 ++
 refs.c                        |  2 +-
 refs/files-backend.c          | 27 ++++++++++++++--
 refs/refs-internal.h          |  2 +-
 t/t0602-reffiles-fsck.sh      | 60 +++++++++++++++++++++++++++++++++++
 6 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 22c385ea22..8827137ef0 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -173,6 +173,14 @@
 `nullSha1`::
 	(WARN) Tree contains entries pointing to a null sha1.
 
+`refMissingNewline`::
+	(INFO) A ref does not end with newline. This will be
+	considered an error in the future.
+
+`trailingRefContent`::
+	(INFO) A ref has trailing content. This will be
+	considered an error in the future.
+
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
 
diff --git a/fsck.h b/fsck.h
index 0d99a87911..b85072df57 100644
--- a/fsck.h
+++ b/fsck.h
@@ -85,6 +85,8 @@ enum fsck_msg_type {
 	FUNC(MAILMAP_SYMLINK, INFO) \
 	FUNC(BAD_TAG_NAME, INFO) \
 	FUNC(MISSING_TAGGER_ENTRY, INFO) \
+	FUNC(REF_MISSING_NEWLINE, INFO) \
+	FUNC(TRAILING_REF_CONTENT, INFO) \
 	/* ignored (elevated when requested) */ \
 	FUNC(EXTRA_HEADER_ENTRY, IGNORE)
 
diff --git a/refs.c b/refs.c
index 74de3d3009..5e74881945 100644
--- a/refs.c
+++ b/refs.c
@@ -1758,7 +1758,7 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	}
 
 	result = parse_loose_ref_contents(ref_store->repo->hash_algo, content.buf,
-					  oid, referent, type, failure_errno);
+					  oid, referent, type, NULL, failure_errno);
 
 done:
 	strbuf_release(&full_path);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index b1ed2e5c04..df4ce270ae 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -560,7 +560,7 @@ static int read_ref_internal(struct ref_store *ref_store, const char *refname,
 	buf = sb_contents.buf;
 
 	ret = parse_loose_ref_contents(ref_store->repo->hash_algo, buf,
-				       oid, referent, type, &myerr);
+				       oid, referent, type, NULL, &myerr);
 
 out:
 	if (ret && !myerr)
@@ -597,7 +597,7 @@ static int files_read_symbolic_ref(struct ref_store *ref_store, const char *refn
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno)
+			     const char **trailing, int *failure_errno)
 {
 	const char *p;
 	if (skip_prefix(buf, "ref:", &buf)) {
@@ -619,6 +619,10 @@ int parse_loose_ref_contents(const struct git_hash_algo *algop,
 		*failure_errno = EINVAL;
 		return -1;
 	}
+
+	if (trailing)
+		*trailing = p;
+
 	return 0;
 }
 
@@ -3439,6 +3443,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = {0};
+	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
 	struct object_id oid;
@@ -3458,13 +3463,29 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 
 	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
 				     ref_content.buf, &oid, &referent,
-				     &type, &failure_errno)) {
+				     &type, &trailing, &failure_errno)) {
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_BAD_REF_CONTENT,
 				      "invalid ref content");
 		goto cleanup;
 	}
 
+	if (!(type & REF_ISSYMREF)) {
+		if (!*trailing) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_REF_MISSING_NEWLINE,
+					      "missing newline");
+			goto cleanup;
+		}
+
+		if (*trailing != '\n' || *(trailing + 1)) {
+			ret = fsck_report_ref(o, &report,
+					      FSCK_MSG_TRAILING_REF_CONTENT,
+					      "trailing garbage in ref");
+			goto cleanup;
+		}
+	}
+
 cleanup:
 	strbuf_release(&refname);
 	strbuf_release(&ref_content);
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 2313c830d8..73b05f971b 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -715,7 +715,7 @@ struct ref_store {
 int parse_loose_ref_contents(const struct git_hash_algo *algop,
 			     const char *buf, struct object_id *oid,
 			     struct strbuf *referent, unsigned int *type,
-			     int *failure_errno);
+			     const char **trailing, int *failure_errno);
 
 /*
  * Fill in the generic part of refs and add it to our collection of
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index a1205b3a3b..a06ad044f2 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -101,6 +101,54 @@ test_expect_success 'regular ref content should be checked (individual)' '
 	git refs verify 2>err &&
 	test_must_be_empty err &&
 
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-no-newline: refMissingNewline: missing newline
+	EOF
+	rm $branch_dir_prefix/branch-no-newline &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/heads/branch-garbage: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $branch_dir_prefix/branch-garbage &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-1: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-1 &&
+	test_cmp expect err &&
+
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-2: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-2 &&
+	test_cmp expect err &&
+
+	printf "%s    garbage\n\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
+	git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	warning: refs/tags/tag-garbage-3: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-3 &&
+	test_cmp expect err &&
+
+	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
+	test_must_fail git -c fsck.trailingRefContent=error refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-garbage-4: trailingRefContent: trailing garbage in ref
+	EOF
+	rm $tag_dir_prefix/tag-garbage-4 &&
+	test_cmp expect err &&
+
 	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
@@ -135,6 +183,12 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	test_commit default &&
 	mkdir -p "$branch_dir_prefix/a/b" &&
 
+	printf "%s" "$(git rev-parse main)" >$branch_dir_prefix/branch-no-newline &&
+	printf "%s garbage" "$(git rev-parse main)" >$branch_dir_prefix/branch-garbage &&
+	printf "%s\n\n\n" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-1 &&
+	printf "%s\n\n\n  garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-2 &&
+	printf "%s    garbage\n\na" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-3 &&
+	printf "%s garbage" "$(git rev-parse main)" >$tag_dir_prefix/tag-garbage-4 &&
 	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
 	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
 	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
@@ -144,6 +198,12 @@ test_expect_success 'regular ref content should be checked (aggregate)' '
 	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
 	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
 	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	warning: refs/heads/branch-garbage: trailingRefContent: trailing garbage in ref
+	warning: refs/heads/branch-no-newline: refMissingNewline: missing newline
+	warning: refs/tags/tag-garbage-1: trailingRefContent: trailing garbage in ref
+	warning: refs/tags/tag-garbage-2: trailingRefContent: trailing garbage in ref
+	warning: refs/tags/tag-garbage-3: trailingRefContent: trailing garbage in ref
+	warning: refs/tags/tag-garbage-4: trailingRefContent: trailing garbage in ref
 	EOF
 	sort err >sorted_err &&
 	test_cmp expect sorted_err
-- 
2.46.0


^ permalink raw reply related

* [PATCH v4 2/5] ref: port git-fsck(1) regular refs check for files backend
From: shejialuo @ 2024-09-13 17:17 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano
In-Reply-To: <ZuRzCyjQFilGhj8j@ArchLinux>

We implicitly rely on "git-fsck(1)" to check the consistency of regular
refs. However, we have already set up the infrastructure of the ref
consistency checks. We need to port original checks from "git-fsck(1)".
Thus, we could clean the "git-fsck(1)" code by removing these implicit
checks.

The "git-fsck(1)" command reports an error when the ref content is
invalid. Following this, add a similar check to "git refs verify".
Add a new fsck error message called "badRefContent(ERROR)" to represent
that a ref has an invalid content.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 Documentation/fsck-msgids.txt |  3 ++
 fsck.h                        |  1 +
 refs/files-backend.c          | 43 +++++++++++++++++++++++++
 t/t0602-reffiles-fsck.sh      | 60 +++++++++++++++++++++++++++++++++++
 4 files changed, 107 insertions(+)

diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 68a2801f15..22c385ea22 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -19,6 +19,9 @@
 `badParentSha1`::
 	(ERROR) A commit object has a bad parent sha1.
 
+`badRefContent`::
+	(ERROR) A ref has bad content.
+
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
 
diff --git a/fsck.h b/fsck.h
index 500b4c04d2..0d99a87911 100644
--- a/fsck.h
+++ b/fsck.h
@@ -31,6 +31,7 @@ enum fsck_msg_type {
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PARENT_SHA1, ERROR) \
+	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 890d0324e1..b1ed2e5c04 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3430,6 +3430,48 @@ typedef int (*files_fsck_refs_fn)(struct ref_store *ref_store,
 				  const char *refs_check_dir,
 				  struct dir_iterator *iter);
 
+static int files_fsck_refs_content(struct ref_store *ref_store,
+				   struct fsck_options *o,
+				   const char *refs_check_dir,
+				   struct dir_iterator *iter)
+{
+	struct strbuf ref_content = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct strbuf refname = STRBUF_INIT;
+	struct fsck_ref_report report = {0};
+	unsigned int type = 0;
+	int failure_errno = 0;
+	struct object_id oid;
+	int ret = 0;
+
+	strbuf_addf(&refname, "%s/%s", refs_check_dir, iter->relative_path);
+	report.path = refname.buf;
+
+	if (S_ISLNK(iter->st.st_mode))
+		goto cleanup;
+
+	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
+		ret = error_errno(_("unable to read ref '%s/%s'"),
+				  refs_check_dir, iter->relative_path);
+		goto cleanup;
+	}
+
+	if (parse_loose_ref_contents(ref_store->repo->hash_algo,
+				     ref_content.buf, &oid, &referent,
+				     &type, &failure_errno)) {
+		ret = fsck_report_ref(o, &report,
+				      FSCK_MSG_BAD_REF_CONTENT,
+				      "invalid ref content");
+		goto cleanup;
+	}
+
+cleanup:
+	strbuf_release(&refname);
+	strbuf_release(&ref_content);
+	strbuf_release(&referent);
+	return ret;
+}
+
 static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 				struct fsck_options *o,
 				const char *refs_check_dir,
@@ -3512,6 +3554,7 @@ static int files_fsck_refs(struct ref_store *ref_store,
 {
 	files_fsck_refs_fn fsck_refs_fn[]= {
 		files_fsck_refs_name,
+		files_fsck_refs_content,
 		NULL,
 	};
 
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index 71a4d1a5ae..a1205b3a3b 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -89,4 +89,64 @@ test_expect_success 'ref name check should be adapted into fsck messages' '
 	test_must_be_empty err
 '
 
+test_expect_success 'regular ref content should be checked (individual)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	git refs verify 2>err &&
+	test_must_be_empty err &&
+
+	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-bad-1 &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	EOF
+	rm $tag_dir_prefix/tag-bad-2 &&
+	test_cmp expect err &&
+
+	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
+	EOF
+	rm $branch_dir_prefix/a/b/branch-bad &&
+	test_cmp expect err
+'
+
+test_expect_success 'regular ref content should be checked (aggregate)' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	branch_dir_prefix=.git/refs/heads &&
+	tag_dir_prefix=.git/refs/tags &&
+	cd repo &&
+	test_commit default &&
+	mkdir -p "$branch_dir_prefix/a/b" &&
+
+	printf "%sx" "$(git rev-parse main)" >$tag_dir_prefix/tag-bad-1 &&
+	printf "xfsazqfxcadas" >$tag_dir_prefix/tag-bad-2 &&
+	printf "xfsazqfxcadas" >$branch_dir_prefix/a/b/branch-bad &&
+
+	test_must_fail git refs verify 2>err &&
+	cat >expect <<-EOF &&
+	error: refs/heads/a/b/branch-bad: badRefContent: invalid ref content
+	error: refs/tags/tag-bad-1: badRefContent: invalid ref content
+	error: refs/tags/tag-bad-2: badRefContent: invalid ref content
+	EOF
+	sort err >sorted_err &&
+	test_cmp expect sorted_err
+'
+
 test_done
-- 
2.46.0


^ permalink raw reply related

* [PATCH v4 1/5] ref: initialize "fsck_ref_report" with zero
From: shejialuo @ 2024-09-13 17:17 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano
In-Reply-To: <ZuRzCyjQFilGhj8j@ArchLinux>

In "fsck.c::fsck_refs_error_function", we need to tell whether "oid" and
"referent" is NULL. So, we need to always initialize these parameters to
NULL instead of letting them point to anywhere when creating a new
"fsck_ref_report" structure.

The original code explicitly initializes the "path" member in the
"struct fsck_ref_report" to NULL (which implicitly 0-initializes other
members in the struct). It is more customary to use " {0} " to express
that we are 0-initializing everything. In order to be align with the the
codebase, initialize "fsck_ref_report" with zero.

Mentored-by: Patrick Steinhardt <ps@pks.im>
Mentored-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: shejialuo <shejialuo@gmail.com>
---
 refs/files-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 8d6ec9458d..890d0324e1 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3446,7 +3446,7 @@ static int files_fsck_refs_name(struct ref_store *ref_store UNUSED,
 		goto cleanup;
 
 	if (check_refname_format(iter->basename, REFNAME_ALLOW_ONELEVEL)) {
-		struct fsck_ref_report report = { .path = NULL };
+		struct fsck_ref_report report = { 0 };
 
 		strbuf_addf(&sb, "%s/%s", refs_check_dir, iter->relative_path);
 		report.path = sb.buf;
-- 
2.46.0


^ permalink raw reply related

* [PATCH v4 0/5] add ref content check for files backend
From: shejialuo @ 2024-09-13 17:14 UTC (permalink / raw)
  To: git; +Cc: Patrick Steinhardt, Karthik Nayak, Junio C Hamano
In-Reply-To: <Ztb-mgl50cwGVO8A@ArchLinux>

Hi All:

This version handles some minor problems mainly focus at the improving
commit messages, comments and some minor problems.

1. Split [PATCH v3 2/4] into two commits [PATCH v4 2/5] and [PATCH v4
3/5]. [PATCH v4 2/5] integrates "git-fsck(1)"'s check and [PATCH v4 3/5]
tightens rules to check the refs with trailing garbage and refs without
newline.

2. Handle a lot of typo errors in original [PATCH v3 2/4]. And improve
the fsck-msgids documentation.

3. Improve [PATCH v4 4/5]'s commit message to first introduce the
tighten rules to be consistent with the [PATCH v4 3/5].

4. Remove "badSymrefTarget(ERROR)" fsck message. Add three new messages
to be more specific:

  1. badReferentFiletype(ERROR): The referent of a symref has a bad file
  type.

  2. badReferentName(ERROR): The referent name of a symref is invalid.

  3. escapeReferent(ERROR): The referent of a symref is outside the
  ref directory

5. Handle typos and some minor problems.

Because I add more commits, I provide the "--interdiff" here to make the
reviewer's life easier.

However, because I have not merged the latest ci fixup, so I cannot
verify some jobs in CIs. May need the help from Junio to verify.

Thanks,
Jialuo

shejialuo (5):
  ref: initialize "fsck_ref_report" with zero
  ref: port git-fsck(1) regular refs check for files backend
  ref: add more strict checks for regular refs
  ref: add symref content check for files backend
  ref: add symlink ref content check for files backend

 Documentation/fsck-msgids.txt |  25 +++
 fsck.h                        |   7 +
 refs.c                        |   2 +-
 refs/files-backend.c          | 202 +++++++++++++++++++-
 refs/refs-internal.h          |   2 +-
 t/t0602-reffiles-fsck.sh      | 334 ++++++++++++++++++++++++++++++++++
 6 files changed, 560 insertions(+), 12 deletions(-)

Interdiff against v3:
diff --git a/Documentation/fsck-msgids.txt b/Documentation/fsck-msgids.txt
index 9e8e1ac7f0..31626e765b 100644
--- a/Documentation/fsck-msgids.txt
+++ b/Documentation/fsck-msgids.txt
@@ -20,7 +20,7 @@
 	(ERROR) A commit object has a bad parent sha1.
 
 `badRefContent`::
-	(ERROR) A ref has a bad content.
+	(ERROR) A ref has bad content.
 
 `badRefFiletype`::
 	(ERROR) A ref has a bad file type.
@@ -28,9 +28,11 @@
 `badRefName`::
 	(ERROR) A ref has an invalid format.
 
-`badSymrefTarget`::
-	(ERROR) The symref target points outside the ref directory or
-	the name of the symref target is invalid.
+`badReferentFiletype`::
+	(ERROR) The referent of a symref has a bad file type.
+
+`badReferentName`::
+	(ERROR) The referent name of a symref is invalid.
 
 `badTagName`::
 	(INFO) A tag has an invalid format.
@@ -53,6 +55,9 @@
 `emptyName`::
 	(WARN) A path contains an empty name.
 
+`escapeReferent`::
+	(ERROR) The referent of a symref is outside the "ref" directory.
+
 `extraHeaderEntry`::
 	(IGNORE) Extra headers found after `tagger`.
 
@@ -178,8 +183,8 @@
 	(WARN) Tree contains entries pointing to a null sha1.
 
 `refMissingNewline`::
-	(INFO) A ref does not end with newline. This kind of ref may
-	be considered ERROR in the future.
+	(INFO) A ref does not end with newline. This will be
+	considered an error in the future.
 
 `symlinkRef`::
 	(INFO) A symref uses the symbolic link. This kind of symref may
@@ -187,8 +192,8 @@
 	symlink support.
 
 `trailingRefContent`::
-	(INFO) A ref has trailing contents. This kind of ref may be
-	considered ERROR in the future.
+	(INFO) A ref has trailing content. This will be
+	considered an error in the future.
 
 `treeNotSorted`::
 	(ERROR) A tree is not properly sorted.
diff --git a/fsck.h b/fsck.h
index 1c6f750812..b72ee632a4 100644
--- a/fsck.h
+++ b/fsck.h
@@ -34,12 +34,14 @@ enum fsck_msg_type {
 	FUNC(BAD_REF_CONTENT, ERROR) \
 	FUNC(BAD_REF_FILETYPE, ERROR) \
 	FUNC(BAD_REF_NAME, ERROR) \
-	FUNC(BAD_SYMREF_TARGET, ERROR) \
+	FUNC(BAD_REFERENT_FILETYPE, ERROR) \
+	FUNC(BAD_REFERENT_NAME, ERROR) \
 	FUNC(BAD_TIMEZONE, ERROR) \
 	FUNC(BAD_TREE, ERROR) \
 	FUNC(BAD_TREE_SHA1, ERROR) \
 	FUNC(BAD_TYPE, ERROR) \
 	FUNC(DUPLICATE_ENTRIES, ERROR) \
+	FUNC(ESCAPE_REFERENT, ERROR) \
 	FUNC(MISSING_AUTHOR, ERROR) \
 	FUNC(MISSING_COMMITTER, ERROR) \
 	FUNC(MISSING_EMAIL, ERROR) \
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 2a1b952f0d..c511deb509 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3449,14 +3449,12 @@ static int files_fsck_symref_target(struct fsck_options *o,
 				    unsigned int symbolic_link)
 {
 	size_t len = referent->len - 1;
-	const char *p = NULL;
 	struct stat st;
 	int ret = 0;
 
-	if (!skip_prefix(referent->buf, "refs/", &p)) {
-
+	if (!starts_with(referent->buf, "refs/")) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_BAD_SYMREF_TARGET,
+				      FSCK_MSG_ESCAPE_REFERENT,
 				      "points to ref outside the refs directory");
 		goto out;
 	}
@@ -3473,7 +3471,7 @@ static int files_fsck_symref_target(struct fsck_options *o,
 
 	if (check_refname_format(referent->buf, 0)) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_BAD_SYMREF_TARGET,
+				      FSCK_MSG_BAD_REFERENT_NAME,
 				      "points to refname with invalid format");
 		goto out;
 	}
@@ -3485,22 +3483,24 @@ static int files_fsck_symref_target(struct fsck_options *o,
 	}
 
 	/*
-	 * Missing target should not be treated as any error worthy event and
-	 * not even warn. It is a common case that a symbolic ref points to a
-	 * ref that does not exist yet. If the target ref does not exist, just
-	 * skip the check for the file type.
+	 * Dangling symrefs are common and so we don't report them.
 	 */
-	if (lstat(referent_path->buf, &st))
+	if (lstat(referent_path->buf, &st)) {
+		if (errno != ENOENT) {
+			ret = error_errno(_("unable to stat '%s'"),
+					  referent_path->buf);
+		}
 		goto out;
+	}
 
 	/*
-	 * We cannot distinguish whether "refs/heads/a" is directory or nots by
+	 * We cannot distinguish whether "refs/heads/a" is a directory or not by
 	 * using "check_refname_format(referent->buf, 0)". Instead, we need to
 	 * check the file type of the target.
 	 */
 	if (S_ISDIR(st.st_mode)) {
 		ret = fsck_report_ref(o, report,
-				      FSCK_MSG_BAD_SYMREF_TARGET,
+				      FSCK_MSG_BAD_REFERENT_FILETYPE,
 				      "points to the directory");
 		goto out;
 	}
@@ -3520,7 +3520,6 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	struct strbuf referent = STRBUF_INIT;
 	struct strbuf refname = STRBUF_INIT;
 	struct fsck_ref_report report = {0};
-	unsigned int symbolic_link = 0;
 	const char *trailing = NULL;
 	unsigned int type = 0;
 	int failure_errno = 0;
@@ -3533,7 +3532,6 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	if (S_ISLNK(iter->st.st_mode)) {
 		const char* relative_referent_path;
 
-		symbolic_link = 1;
 		ret = fsck_report_ref(o, &report,
 				      FSCK_MSG_SYMLINK_REF,
 				      "use deprecated symbolic link for symref");
@@ -3549,21 +3547,20 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 				 abs_gitdir.buf,
 				 &relative_referent_path)) {
 			ret = fsck_report_ref(o, &report,
-					      FSCK_MSG_BAD_SYMREF_TARGET,
+					      FSCK_MSG_ESCAPE_REFERENT,
 					      "point to target outside gitdir");
 			goto cleanup;
 		}
 
 		strbuf_addstr(&referent, relative_referent_path);
 		ret = files_fsck_symref_target(o, &report,
-					       &referent, &referent_path,
-					       symbolic_link);
+					       &referent, &referent_path, 1);
 
 		goto cleanup;
 	}
 
 	if (strbuf_read_file(&ref_content, iter->path.buf, 0) < 0) {
-		ret = error_errno(_("%s/%s: unable to read the ref"),
+		ret = error_errno(_("unable to read ref '%s/%s'"),
 				  refs_check_dir, iter->relative_path);
 		goto cleanup;
 	}
@@ -3578,14 +3575,14 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 	}
 
 	if (!(type & REF_ISSYMREF)) {
-		if (*trailing == '\0') {
+		if (!*trailing) {
 			ret = fsck_report_ref(o, &report,
 					      FSCK_MSG_REF_MISSING_NEWLINE,
 					      "missing newline");
 			goto cleanup;
 		}
 
-		if (*trailing != '\n' || (*(trailing + 1) != '\0')) {
+		if (*trailing != '\n' || *(trailing + 1)) {
 			ret = fsck_report_ref(o, &report,
 					      FSCK_MSG_TRAILING_REF_CONTENT,
 					      "trailing garbage in ref");
@@ -3602,7 +3599,7 @@ static int files_fsck_refs_content(struct ref_store *ref_store,
 		ret = files_fsck_symref_target(o, &report,
 					       &referent,
 					       &referent_path,
-					       symbolic_link);
+					       0);
 	}
 
 cleanup:
diff --git a/t/t0602-reffiles-fsck.sh b/t/t0602-reffiles-fsck.sh
index e735816d5b..7c3579705f 100755
--- a/t/t0602-reffiles-fsck.sh
+++ b/t/t0602-reffiles-fsck.sh
@@ -268,7 +268,7 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: refs/heads/.branch\n" >$branch_dir_prefix/branch-bad-1 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-1: badSymrefTarget: points to refname with invalid format
+	error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format
 	EOF
 	rm $branch_dir_prefix/branch-bad-1 &&
 	test_cmp expect err &&
@@ -276,7 +276,7 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: reflogs/heads/main\n" >$branch_dir_prefix/branch-bad-2 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-2: badSymrefTarget: points to ref outside the refs directory
+	error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory
 	EOF
 	rm $branch_dir_prefix/branch-bad-2 &&
 	test_cmp expect err &&
@@ -284,7 +284,7 @@ test_expect_success 'textual symref content should be checked (individual)' '
 	printf "ref: refs/heads/a\n" >$branch_dir_prefix/branch-bad-3 &&
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-3: badSymrefTarget: points to the directory
+	error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory
 	EOF
 	rm $branch_dir_prefix/branch-bad-3 &&
 	test_cmp expect err
@@ -311,9 +311,9 @@ test_expect_success 'textual symref content should be checked (aggregate)' '
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-bad-1: badSymrefTarget: points to refname with invalid format
-	error: refs/heads/branch-bad-2: badSymrefTarget: points to ref outside the refs directory
-	error: refs/heads/branch-bad-3: badSymrefTarget: points to the directory
+	error: refs/heads/branch-bad-1: badReferentName: points to refname with invalid format
+	error: refs/heads/branch-bad-2: escapeReferent: points to ref outside the refs directory
+	error: refs/heads/branch-bad-3: badReferentFiletype: points to the directory
 	warning: refs/heads/a/b/branch-complicated: refMissingNewline: missing newline
 	warning: refs/heads/a/b/branch-complicated: trailingRefContent: trailing garbage in ref
 	warning: refs/heads/a/b/branch-trailing-1: refMissingNewline: missing newline
@@ -347,7 +347,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
-	error: refs/heads/branch-symbolic-1: badSymrefTarget: point to target outside gitdir
+	error: refs/heads/branch-symbolic-1: escapeReferent: point to target outside gitdir
 	EOF
 	rm $branch_dir_prefix/branch-symbolic-1 &&
 	test_cmp expect err &&
@@ -356,7 +356,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
-	error: refs/heads/branch-symbolic-2: badSymrefTarget: points to ref outside the refs directory
+	error: refs/heads/branch-symbolic-2: escapeReferent: points to ref outside the refs directory
 	EOF
 	rm $branch_dir_prefix/branch-symbolic-2 &&
 	test_cmp expect err &&
@@ -365,7 +365,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
-	error: refs/heads/branch-symbolic-3: badSymrefTarget: points to refname with invalid format
+	error: refs/heads/branch-symbolic-3: badReferentName: points to refname with invalid format
 	EOF
 	rm $branch_dir_prefix/branch-symbolic-3 &&
 	test_cmp expect err &&
@@ -374,7 +374,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/tags/tag-symbolic-1: symlinkRef: use deprecated symbolic link for symref
-	error: refs/tags/tag-symbolic-1: badSymrefTarget: points to refname with invalid format
+	error: refs/tags/tag-symbolic-1: badReferentName: points to refname with invalid format
 	EOF
 	rm $tag_dir_prefix/tag-symbolic-1 &&
 	test_cmp expect err &&
@@ -383,7 +383,7 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (individu
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
 	warning: refs/tags/tag-symbolic-2: symlinkRef: use deprecated symbolic link for symref
-	error: refs/tags/tag-symbolic-2: badSymrefTarget: points to the directory
+	error: refs/tags/tag-symbolic-2: badReferentFiletype: points to the directory
 	EOF
 	rm $tag_dir_prefix/tag-symbolic-2 &&
 	test_cmp expect err
@@ -407,11 +407,11 @@ test_expect_success SYMLINKS 'symlink symref content should be checked (aggregat
 
 	test_must_fail git refs verify 2>err &&
 	cat >expect <<-EOF &&
-	error: refs/heads/branch-symbolic-1: badSymrefTarget: point to target outside gitdir
-	error: refs/heads/branch-symbolic-2: badSymrefTarget: points to ref outside the refs directory
-	error: refs/heads/branch-symbolic-3: badSymrefTarget: points to refname with invalid format
-	error: refs/tags/tag-symbolic-1: badSymrefTarget: points to refname with invalid format
-	error: refs/tags/tag-symbolic-2: badSymrefTarget: points to the directory
+	error: refs/heads/branch-symbolic-1: escapeReferent: point to target outside gitdir
+	error: refs/heads/branch-symbolic-2: escapeReferent: points to ref outside the refs directory
+	error: refs/heads/branch-symbolic-3: badReferentName: points to refname with invalid format
+	error: refs/tags/tag-symbolic-1: badReferentName: points to refname with invalid format
+	error: refs/tags/tag-symbolic-2: badReferentFiletype: points to the directory
 	warning: refs/heads/branch-symbolic-1: symlinkRef: use deprecated symbolic link for symref
 	warning: refs/heads/branch-symbolic-2: symlinkRef: use deprecated symbolic link for symref
 	warning: refs/heads/branch-symbolic-3: symlinkRef: use deprecated symbolic link for symref
-- 
2.46.0


^ permalink raw reply related

* Re: [PATCH] remote: introduce config to set prefetch refs
From: Junio C Hamano @ 2024-09-13 16:58 UTC (permalink / raw)
  To: Shubham Kanodia
  Cc: Shubham Kanodia via GitGitGadget, git, Patrick Steinhardt [ ],
	Derrick Stolee [ ]
In-Reply-To: <CAG=Um+3WSckyZ2P2o2igQr4hbMyMNTDZ_kqjrfdufvL6hUhMjA@mail.gmail.com>

Shubham Kanodia <shubham.kanodia10@gmail.com> writes:

> Ideally, a repository should be able to specify (say):
>
> remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
> remote.origin.prefetchref=refs/heads/main
>
> This configuration would maintain the normal behavior for fetches, but
> only prefetch the main branch.
> The rationale for this is that the main branch typically serves as the
> HEAD from which future branches will be forked in an active
> repository.

Oh, that is 100% agreeable.  All I wanted to caution you about was
what should happen when remote.origin.prefetchref in the above is
replaced to something like:

    [remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
        prefetchref = refs/notes/*

That is, if your refspec used for your real fetch (i.e. "git fetch"
without the "--prefetch" option) does not fetch anything from
"refs/notes/" hierarchy, prefetching from the hierarchy does not
help the real fetch.  I do not have a strong preference between
marking it as an error and silently ignoring the prefetch but
leaning towards the latter, and that is why my suggestion to
implement this new "prefetchref" as something that extends the
existing filter_prefetch_refspec(), which already filters out
refspec that fetches from the refs/tags/ namespace (and the ones
that do not store by having NULL in the .dst side).

> Regarding:
>
>> If prefetch_refs contains only positive patterns, then a refspec whose source
>> doesn't match any of these patterns is rejected.
>
> Simply rejecting a source refspec pattern in `remote.<remote>.fetch`
> wouldn't achieve our goal here.

I used the verb "reject" to mean "filter out", just like a refspec
with left-hand-side that begins with "refs/tags/" is filtered out
in the current filter_prefetch_refspec().  And that is exactly what
we want to achieve our goal here.

IOW, you would

 * read their ref advertisement, and pick only the ones that have a
   matching pattern in the left-hand-side of a remote.$name.fetch
   element.  With a more recent protocol, remote.$name.fetch may
   have already participated in narrowing what they advertise to
   begin with, but the end result is the same.

 * give it to filter_prefetch_refspec().

 * filter_prefetch_refspec() inspects the refspec elements, and
   rejects ones with no right-hand-side, and ones with
   left-hand-side that begin with refs/tags/.  The current code
   without your patch already works this way up to this point.

 * We extend the above filtering so that in addition to the two
   kinds we currently reject, reject the ones that do not match the
   prefetchref criteria.  This is what is needed to implement
   "prefetchref configuration limits the set of refs that get
   prefetched".

And what you quoted is a beginning of how "prefetchref configuration
limits".  It cannot be "add to what filter_prefetch_refspec() did",
like done by the implementation in the patch we are discussing.

If your configuration were this:

    [remote "origin"]
        fetch = +refs/heads/*:refs/remotes/origin/*

you would want a way to say things like

 (1) I want to prefetch everything I usually fetch

 (2) Among the ones I usually fetch, I only want to prefetch master
     and next branches.

 (3) I want to prefetch only refs/heads/jk/* branches, but not
     refs/heads/jk/wip/* branches.

 (4) I want to prefetch everything I usually fetch, except for
     refs/heads/wip/* branches.

The case (1) is the simplest.  You will leave .prefetchref empty.

For the case (2), you would write something like

    [remote "origin"]
	prefetchref = refs/heads/master
	prefetchref = refs/heads/next

So, when your prefetchref has all positive patterns, after the
existing conditional in filter_prefetch_refspec() passes a refspec
whose right-hand-side (i.e., .dst) is not NULL and whose
left-hand-side (i.e., .src) does not begin with "refs/tags/", you
further inspect and make sure it matches one of these prefetchref
patterns.  In example (2), if they advertised master, next, and seen
branches, refs/heads/seen would be filtered out because it matches
neither of the two patterns, so we would end up prefetching master
and next branches.

For the case (3), you would want to say something like

    [remote "origin"]
	prefetchref = refs/heads/jk/*
	prefetchref = !refs/heads/jk/wip/*

Now your prefetchref has some negative pattern.  When filtering what
the existing conditional in filter_prefetch_refspec() passed, you'd
inspect the refspec element and see if it matches any of the
positive patterns, and also if it does not match any of the negative
ones.  refs/heads/next does not match any positive ones and gets
rejected.  refs/heads/jk/main does match the positive pattern
'refs/heads/jk/*', and it does not match the negative pattern
'refs/heads/jk/wip/*', so it passes and will get prefetched.

For the case (4), you would write something like

    [remote "origin"]
	prefetchref = !refs/heads/wip/*

There is no positive pattern, so if you blindly apply the rule you
used for (3) above, everything will get rejected, which is not what
you want.  refs/heads/main does not match any positive patterns
(because there are no positive patterns given), but it does not
match any negative ones, so it passes and will get prefetched.

The condition to implement the above four cases (which I think
covers all the cases we care about, but I won't guarantee it is
exhaustive---you'd need to sanity check) would be

 - If there is 1 or more positive prefetchref patterns, the refspec
   element must match one of them to be considered for the next
   rule.  Otherwise, it will not be prefetched.

 - If the refspec element matches any of negative prefetchref
   patterns, it will not be prefetched.


^ permalink raw reply

* Re: [PATCH 5/4] ci: add Ubuntu 16.04 job to GitLab CI
From: Junio C Hamano @ 2024-09-13 16:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Patrick Steinhardt, git
In-Reply-To: <20240913062113.GA1232933@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> On Fri, Sep 13, 2024 at 07:52:51AM +0200, Patrick Steinhardt wrote:
>
>> In the preceding commits we had to convert the linux32 job to be based
>> on Ubuntu 20.04 instead of Ubuntu 16.04 due to a limitation in GitHub
>> Workflows. This was the only job left that still tested against this old
>> but supported Ubuntu version, and we have no other jobs that test with a
>> comparatively old Linux distribution.
>> 
>> Add a new job to GitLab CI that tests with Ubuntu 16.04 to cover the
>> resulting test gap. GitLab doesn't modify Docker images in the same way
>> GitHub does and thus doesn't fall prey to the same issue. There are two
>> compatibility issues uncovered by this:
>> 
>>   - Ubuntu 16.04 does not support HTTP/2 in Apache. We thus cannot set
>>     `GIT_TEST_HTTPD=true`, which would otherwise cause us to fail when
>>     Apache fails to start.
>> 
>>   - Ubuntu 16.04 cannot use recent JGit versions as they depend on a
>>     more recent Java runtime than we have available. We thus disable
>>     installing any kind of optional dependencies that do not come from
>>     the package manager.
>
> OK, this looks reasonable to me. I do think we could have our cake and
> eat it too on the Apache support if we added a GIT_TEST_HTTP2 knob. But
> it's probably not all that big a deal in practice, and after another 1.5
> years I think we'd drop this 16.04 job anyway (since it will be out of
> LTS then).
>
> Thanks for putting this together.

Yes, thanks, both.  Queued.

^ permalink raw reply

* Re: [PATCH 2/4] remote: print an error if refspec cannot be removed
From: phillip.wood123 @ 2024-09-13 15:11 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood via GitGitGadget
  Cc: git, Han Jiang, Phillip Wood
In-Reply-To: <xmqqseu56hhb.fsf@gitster.g>

On 11/09/2024 21:52, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>   	if (!add_mode && remove_all_fetch_refspecs(key.buf)) {
>> +		error(_("could not remove existing fetch refspec"));
>>   		strbuf_release(&key);
>>   		return 1;
>>   	}
> 
> It is a minor point, but would it help to say what we tried to
> remove (e.g. "from remote X") or is it too obvious to the end user
> in the context they get this error?

The user has to give the remote name on the command line so I think it 
should be obvious to the user.

> The reason why I had the above question was because inserting error()
> before strbuf_release(&key) looked curious and I initially suspected
> that it was because key was used in the error message somehow, but it
> turns out that is not the case at all.

Arguably we should refactor this to use our standard "goto cleanup" pattern.

Best Wishes

Phillip

> IOW, I would have expected something more like this:
> 
>   	if (!add_mode && remove_all_fetch_refspecs(key.buf)) {
>   		strbuf_release(&key);
> +		return error(_("failed to remove fetch refspec from '%s'"),
> +				remotename);
> 
>   	}
> 

^ permalink raw reply

* Re: [PATCH 4/4] remote: check branch names
From: phillip.wood123 @ 2024-09-13 15:09 UTC (permalink / raw)
  To: Junio C Hamano, Phillip Wood via GitGitGadget
  Cc: git, Han Jiang, Phillip Wood
In-Reply-To: <xmqqfrq686n5.fsf@gitster.g>

On 11/09/2024 18:03, Junio C Hamano wrote:
> "Phillip Wood via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
> The authoritative logic in the above
> however may further evolve, and we need to make sure that these two
> checks from drifting away from each other over time.  We probably
> should refactor the leaf function in the above call chain so that
> both places can use it (the main difference is that you allow '*' in
> yours when calling check_refname_format()).
> 
>      Side note: we *should* lose "strbuf_" from its name, as it is
>                 not about string manipulation but the "strbuf'-ness
>                 of the function is merely that as the side effect of
>                 checking it computes a full refname and it happens to
>                 use strbuf as a mechanism to return it.
> 
> Something like the patch attached at the end.

Thanks for the patch, I'll re-roll based on that. I wonder if we really 
want to support "@{-N}" when setting remote tracking branches though - 
should we be using INTERPRET_BRANCH_REMOTE instead when calling 
strbuf_branchname()?

Best Wishes

Phillip

>>   static const char mirror_advice[] =
>>   N_("--mirror is dangerous and deprecated; please\n"
>>      "\t use --mirror=fetch or --mirror=push instead");
>> @@ -203,6 +216,9 @@ static int add(int argc, const char **argv, const char *prefix)
>>   	if (!valid_remote_name(name))
>>   		die(_("'%s' is not a valid remote name"), name);
>>   
>> +	if (check_branch_names(track.v))
>> +		exit(128);
>> +
> 
> Seeing that the loop in check_branch_names() is brand new and you
> could have iterated over a string-list just as easily, I somehow
> doubt that step [3/4] was fully warranted.
> 
>> @@ -1601,6 +1617,9 @@ static int set_remote_branches(const char *remotename, const char **branches,
>>   		exit(2);
>>   	}
>>   
>> +	if (check_branch_names(branches))
>> +		exit(128);
> 
> But here you are already passed "const char *branches[]" to this caller,
> and it would be hassle to turn it into string_list, so [3/4] is fine
> after all.
> 
> 
> 
>   object-name.h |  2 ++
>   object-name.c | 23 +++++++++++++++++++++--
>   2 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git i/object-name.h w/object-name.h
> index 8dba4a47a4..fa70d42044 100644
> --- i/object-name.h
> +++ w/object-name.h
> @@ -130,4 +130,6 @@ struct object *repo_peel_to_type(struct repository *r,
>   /* used when the code does not know or care what the default abbrev is */
>   #define FALLBACK_DEFAULT_ABBREV 7
>   
> +/* Check if "name" is allowed as a branch */
> +int valid_branch_name(const char *name, int allow_wildcard);
>   #endif /* OBJECT_NAME_H */
> diff --git i/object-name.c w/object-name.c
> index 09c1bd93a3..e3bed5a664 100644
> --- i/object-name.c
> +++ w/object-name.c
> @@ -1747,7 +1747,8 @@ void strbuf_branchname(struct strbuf *sb, const char *name, unsigned allowed)
>   	strbuf_add(sb, name + used, len - used);
>   }
>   
> -int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
> +static int full_ref_from_branch_name_internal(struct strbuf *sb, const char *name,
> +					      int crf_flags)
>   {
>   	if (startup_info->have_repository)
>   		strbuf_branchname(sb, name, INTERPRET_BRANCH_LOCAL);
> @@ -1766,7 +1767,25 @@ int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
>   	    !strcmp(sb->buf, "refs/heads/HEAD"))
>   		return -1;
>   
> -	return check_refname_format(sb->buf, 0);
> +	return check_refname_format(sb->buf, crf_flags);
> +}
> +
> +/* NEEDSWORK: rename this to full_ref_from_branch_name */
> +int strbuf_check_branch_ref(struct strbuf *sb, const char *name)
> +{
> +	return full_ref_from_branch_name_internal(sb, name, 0);
> +}
> +
> +int valid_branch_name(const char *name, int allow_wildcard)
> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	int ret;
> +	int flags;
> +
> +	flags = allow_wildcard ? REFNAME_REFSPEC_PATTERN : 0;
> +	ret = full_ref_from_branch_name_internal(&sb, name, flags);
> +	strbuf_release(&sb);
> +	return ret;
>   }
>   
>   void object_context_release(struct object_context *ctx)

^ permalink raw reply

* Re: Commit signing with SSH key uses SSH_AUTH_SOCK but ignores IdentityAgent
From: Phillip Wood @ 2024-09-13 15:05 UTC (permalink / raw)
  To: Justin Su, git
In-Reply-To: <CAB=S_8JhN=WSuYRMWbGz7gZMRX9dSb3k8rJZ7zrxkbHKOqfzww@mail.gmail.com>

Hi Justin

On 13/09/2024 10:58, Justin Su wrote:
> I use Secretive (https://github.com/maxgoedjen/secretive) to store my
> SSH keys on macOS. I've configured my ssh_config to use it as the
> IdentityAgent, and Git can push and pull just fine.
> 
> However, it seems that Git ignores IdentityAgent when signing commits,
> resulting in the following error message:

Git just runs "ssh -Y". I can reproduce this on linux - I suspect the 
problem is that ssh does not read the IdentityAgent config when signing 
even if it is outside a Host/Match in the config file.

Best Wishes

Phillip

> error: No private key found for public key "foo.pub"?
> fatal: failed to write commit object
> 
> I've worked around this by setting SSH_AUTH_SOCK, but this doesn't
> feel correct to me. Is this intended behaviour?
> 
> Thanks,
> Justin
> 

^ permalink raw reply

* Re: [PATCH 0/6] refs/reftable: wire up exclude patterns
From: karthik nayak @ 2024-09-13 12:48 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Taylor Blau
In-Reply-To: <cover.1725881266.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 1378 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> this patch series wires up exclude patterns for the reftable backend.
> Exclude patterns allow the backend to skip references internally that
> match a specific pattern. This avoids having to read references that the
> caller would discard immediately anyway.
>
> The series is structured as follows:
>
>   - Patches 1 and 2 fix two separate bugs in how we currently handle
>     exclude patterns in combination with namespaces. We didn't happen to
>     stumble over these bugs before because exclude patterns are only
>     implemented for the "packed" backend. But once you start to pack
>     refs we exclude references based on their full name instead of the
>     name with the prefixed stripped. For the reftable backend we'd
>     always hit those bugs because it always uses exclude patterns when
>     passed.
>
>   - Patches 3 to 5 wire up proper re-seeking of reftable iterators and
>     adds some tests to demonstrate that this works as expected. This is
>     a prerequisite for handling exclude patterns.
>
>   - Patch 6 wires up exclude patterns in the reftable backend by
>     re-seeking iterators once we hit an excluded reference.
>
> Thanks!
>
> Patrick
>

This was a bit more intensive so I took my time with the review. Overall
I have some questions/comments. But the series looks good. Thanks!

Karthik

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply

* Re: [PATCH 6/6] refs/reftable: wire up support for exclude patterns
From: karthik nayak @ 2024-09-13 12:47 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Taylor Blau
In-Reply-To: <f3922b81db69cd3bbdddcfbe02c99613448fd9ed.1725881266.git.ps@pks.im>

[-- Attachment #1: Type: text/plain, Size: 6356 bytes --]

Patrick Steinhardt <ps@pks.im> writes:

[snip]

> hidden refs. The following benchmark prints one reference with 1 million
> hidden references:
>
>     Benchmark 1: HEAD~
>       Time (mean ± σ):      93.3 ms ±   2.1 ms    [User: 90.3 ms, System: 2.5 ms]
>       Range (min … max):    89.8 ms …  97.2 ms    33 runs
>
>     Benchmark 2: HEAD
>       Time (mean ± σ):       4.2 ms ±   0.6 ms    [User: 2.2 ms, System: 1.8 ms]
>       Range (min … max):     3.1 ms …   8.1 ms    765 runs
>
>     Summary
>       HEAD ran
>        22.15 ± 3.19 times faster than HEAD~
>

Nice.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs/reftable-backend.c | 125 +++++++++++++++++++++++++++++++++++++++-
>  t/t1419-exclude-refs.sh |  33 ++++++++---
>  trace2.h                |   1 +
>  trace2/tr2_ctr.c        |   5 ++
>  4 files changed, 152 insertions(+), 12 deletions(-)
>
> diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
> index 1c4b19e737f..5c241097a4e 100644
> --- a/refs/reftable-backend.c
> +++ b/refs/reftable-backend.c
> @@ -21,6 +21,7 @@
>  #include "../reftable/reftable-iterator.h"
>  #include "../setup.h"
>  #include "../strmap.h"
> +#include "../trace2.h"
>  #include "parse.h"
>  #include "refs-internal.h"
>
> @@ -447,10 +448,81 @@ struct reftable_ref_iterator {
>
>  	const char *prefix;
>  	size_t prefix_len;
> +	char **exclude_patterns;
> +	size_t exclude_patterns_index;
> +	size_t exclude_patterns_strlen;
>  	unsigned int flags;
>  	int err;
>  };
>
> +/*
> + * Handle exclude patterns. Returns either `1`, which tells the caller that the
> + * current reference shall not be shown. Or `0`, which indicates that it should
> + * be shown.
> + */
> +static int should_exclude_current_ref(struct reftable_ref_iterator *iter)
> +{
> +	while (iter->exclude_patterns[iter->exclude_patterns_index]) {
> +		const char *pattern = iter->exclude_patterns[iter->exclude_patterns_index];
> +		char *ref_after_pattern;
> +		int cmp;
> +
> +		/*
> +		 * Lazily cache the pattern length so that we don't have to
> +		 * recompute it every time this function is called.
> +		 */
> +		if (!iter->exclude_patterns_strlen)
> +			iter->exclude_patterns_strlen = strlen(pattern);
> +
> +		/*
> +		 * When the reference name is lexicographically bigger than the
> +		 * current exclude pattern we know that it won't ever match any
> +		 * of the following references, either. We thus advance to the
> +		 * next pattern and re-check whether it matches.

So this means that the exclude patterns were lexicographically sorted.
Otherwise this would work.

> +		 * Otherwise, if it's smaller, then we do not have a match and
> +		 * thus want to show the current reference.
> +		 */
> +		cmp = strncmp(iter->ref.refname, pattern,
> +			      iter->exclude_patterns_strlen);
> +		if (cmp > 0) {
> +			iter->exclude_patterns_index++;
> +			iter->exclude_patterns_strlen = 0;
> +			continue;
> +		}
> +		if (cmp < 0)
> +			return 0;
> +
> +		/*
> +		 * The reference shares a prefix with the exclude pattern and
> +		 * shall thus be omitted. We skip all references that match the
> +		 * pattern by seeking to the first reference after the block of
> +		 * matches.
> +		 *
> +		 * This is done by appending the highest possible character to
> +		 * the pattern. Consequently, all references that have the
> +		 * pattern as prefix and whose suffix starts with anything in
> +		 * the range [0x00, 0xfe] are skipped. And given that 0xff is a
> +		 * non-printable character that shouldn't ever be in a ref name,
> +		 * we'd not yield any such record, either.
> +		 *

This is simple yet clever.

> +		 * Note that the seeked-to reference may also be excluded. This
> +		 * is not handled here though, but the caller is expected to
> +		 * loop and re-verify the next reference for us.
> +		 */

The seeked-to reference here being the one with 0xff. We could get rid
of this by doing something like this:

    int last_char_idx = iter->exclude_patterns_strlen - 1
    ref_after_pattern = xstrfmt("%s", pattern);
    ref_after_pattern[last_char_idx] = ref_after_pattern[last_char_idx] + 1;

instead no?

> +		ref_after_pattern = xstrfmt("%s%c", pattern, 0xff);
> +		iter->err = reftable_iterator_seek_ref(&iter->iter, ref_after_pattern);
> +		iter->exclude_patterns_index++;
> +		iter->exclude_patterns_strlen = 0;
> +		trace2_counter_add(TRACE2_COUNTER_ID_REFTABLE_RESEEKS, 1);
> +
> +		free(ref_after_pattern);
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
>  static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  {
>  	struct reftable_ref_iterator *iter =
> @@ -481,6 +553,9 @@ static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
>  			break;
>  		}
>
> +		if (iter->exclude_patterns && should_exclude_current_ref(iter))
> +			continue;
> +
>  		if (iter->flags & DO_FOR_EACH_PER_WORKTREE_ONLY &&
>  		    parse_worktree_ref(iter->ref.refname, NULL, NULL, NULL) !=
>  			    REF_WORKTREE_CURRENT)
> @@ -570,6 +645,11 @@ static int reftable_ref_iterator_abort(struct ref_iterator *ref_iterator)
>  		(struct reftable_ref_iterator *)ref_iterator;
>  	reftable_ref_record_release(&iter->ref);
>  	reftable_iterator_destroy(&iter->iter);
> +	if (iter->exclude_patterns) {
> +		for (size_t i = 0; iter->exclude_patterns[i]; i++)
> +			free(iter->exclude_patterns[i]);
> +		free(iter->exclude_patterns);
> +	}
>  	free(iter);
>  	return ITER_DONE;
>  }
> @@ -580,9 +660,45 @@ static struct ref_iterator_vtable reftable_ref_iterator_vtable = {
>  	.abort = reftable_ref_iterator_abort
>  };
>
> +static char **filter_exclude_patterns(const char **exclude_patterns)
> +{
> +	size_t filtered_size = 0, filtered_alloc = 0;
> +	char **filtered = NULL;
> +
> +	if (!exclude_patterns)
> +		return NULL;
> +
> +	for (size_t i = 0; ; i++) {
> +		const char *exclude_pattern = exclude_patterns[i];
> +		int has_glob = 0;
> +
> +		if (!exclude_pattern)
> +			break;
> +
> +		for (const char *p = exclude_pattern; *p; p++) {
> +			has_glob = is_glob_special(*p);
> +			if (has_glob)
> +				break;
> +		}

Why do we need to filter excludes here? Don't the callee's already do
something like this?

[snip]

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 690 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox